Publications

(2024). Triple-A: Early Operand Collector Allocation for Maximizing GPU Register Bank Utilization. IEEE Embedded Systems Letters (early access).

(2024). Intel Accelerator Ecosystem: An SoC-Oriented Perspective. IEEE/ACM International Symposium on Computer Architecture (ISCA, accepted).

(2024). HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing. IEEE/ACM International Symposium on Computer Architecture (ISCA, accepted).

(2024). TAROT: A CXL SmartNIC-Based Defense Against Multi-bit Errors by Row Hammer Attacks. ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

(2024). A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors. ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

ISCA 2023 Tutorial

(2024). ScaleCache: A Scalable Page Cache for Multiple Solid-State Drives. ACM SIGOPS European Conference on Computer Systems (EuroSys).

(2024). A Multi-DNN Acceleration Architecture for Balanced QoS and Throughput. International Conference on Electronics, Information, and Communication (ICEIC).

(2023). A Convertible Neural Processor Supporting Adaptive Quantization for Real-Time Neural Networks. Journal of Systems Architecture.

(2023). Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices. IEEE/ACM International Symposium on Microarchitecture (MICRO).

(2023). INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core. ACM International Conference on Parallel Architectures and Compilation Techniques (PACT).

(2023). LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads. IEEE Computer Architecture Letters (Volume: 22, Issue: 2).

(2023). A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors. arXiv Preprint.

ISCA 2023 Tutorial

(2023). Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices. arXiv Preprint.

(2022). CASH-RF: A Compiler-Assisted Hierarchical Register File in GPUs. IEEE Embedded Systems Letters (Volume: 14, Issue: 4).

(2022). Reconstructing Out-of-Order Issue Queue. IEEE/ACM International Symposium on Microarchitecture (MICRO).

(2022). TEA-RC: Thread Context-Aware Register Cache for GPUs. IEEE Access (Volume: 10).

(2020). CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows. IEEE International Symposium on High Performance Computer Architecture (HPCA).

(2019). OverCome: Coarse-Grained Instruction Commit with Handover Register Renaming. IEEE Transactions on Computers (Volume: 68, Issue: 12).

(2018). Constructing Resilient Region in Dynamic Optimization Systems via Dynamic Adjustment of Bias Thresholds. IEEE International Conference on Consumer Electronics Asia (ICCE-ASIA).

(2017). Parallel In-Order Execution Architecture for Low-Power Processor. International SoC Design Conference (ISOCC).

(2017). Dynamic Warp Scheduler Selection Policy Using Linear Regression for GPUs. International Conference on Electronics, Information, and Communication (ICEIC).

(2016). Heterogeneous Core Microarchitecture with Functional Unit Gating for High Energy Efficiency. Annual Summer Conference of IEIE.

(2016). Analyzing Development Trends and Performance/Power Characteristics of Multi-Core Processors. Annual Summer Conference of IEIE.

(2014). Exploiting Back-End Fusion in Multi-Core Processors. Annual Fall Conference of KIPS.