Ipoom Jeong
Ipoom Jeong
Home
Publications
Projects
Posts
Events
Contact
Light
Dark
Automatic
Microarchitecture
A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices
This work uncovers two previously unknown sources of Last-Level Cache (LLC) contention in Intel Xeon CPUs caused by high-bandwidth I/O devices and proposes A4, a runtime LLC management framework that mitigates these issues. A4 improves performance for latency-sensitive workloads by 51% without significantly affecting low-priority workloads.
Haneul Park
,
Jiaqi Lou
,
Sangjin Lee
,
Yifan Yuan
,
KyoungSoo Park
,
Yongseok Son
,
Ipoom Jeong
,
Nam Sung Kim
PDF
Cite
Intel® In-Memory Analytics Accelerator: Performance Characterization and Guidelines
The rapid advancements in CPU performance have slowed due to the end of Dennard scaling and the exponential growth of data, making it …
Jaeyoung Kang
,
Qirong Xia
,
Ipoom Jeong
,
Yongjoo Park
,
Nam Sung Kim
Warped-Compaction: Maximizing GPU Register File Bandwidth Utilization via Operand Compaction
The GPU has been successfully used for diverse emerging compute-intensive applications, including imaging, computer vision, and more …
Eunbi Jeong
,
Ipoom Jeong
,
Myung Kuk Yoon
,
Nam Sung Kim
PDF
Cite
Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput
Virtual memory, with the support of address translation hardware, is a key technique in expanding programmability and memory management …
Jiwon Lee
,
Gun Ko
,
Myung Kuk Yoon
,
Ipoom Jeong
,
Yunho Oh
,
Won Woo Ro
PDF
Cite
Triple-A: Early Operand Collector Allocation for Maximizing GPU Register Bank Utilization
Recent GPUs provisioned with large register files cannot fully utilize the bandwidth between the register files and execution …
Ipoom Jeong
,
Eunbi Jeong
,
Nam Sung Kim
,
Myung Kuk Yoon
PDF
Cite
A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors
In this work, we set out to introduce the latest features supported by Intel DSA (Data Streaming Accelerator), deep-dive into its versatility, and analyze its throughput benefits through a comprehensive evaluation.
Reese Kuper
,
Ipoom Jeong
,
Yifan Yuan
,
Ren Wang
,
Narayan Ranganathan
,
Nikhil Rao
,
Jiayu Hu
,
Sanjay Kumar
,
Philip Lantz
,
Nam Sung Kim
PDF
Cite
ISCA 2023 Tutorial
INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core
TBD
Jae Seok Kwak
,
Myung Kuk Yoon
,
Ipoom Jeong
,
Seunghyun Jin
,
Won Woo Ro
PDF
Cite
A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors
In this work, we set out to introduce the latest features supported by Intel DSA (Data Streaming Accelerator), deep-dive into its versatility, and analyze its throughput benefits through a comprehensive evaluation.
Reese Kuper
,
Ipoom Jeong
,
Yifan Yuan
,
Jiayu Hu
,
Ren Wang
,
Narayan Ranganathan
,
Nam Sung Kim
PDF
Cite
ISCA 2023 Tutorial
CASH-RF: A Compiler-Assisted Hierarchical Register File in GPUs
Spin-transfer torque magnetic random-access memory (STT-MRAM) is an emerging nonvolatile memory technology that has been received …
Yunho Oh
,
Ipoom Jeong
,
Won Woo Ro
,
Myung Kuk Yoon
PDF
Cite
Reconstructing Out-of-Order Issue Queue
In this work, we propose an energy-efficient microarchitecture named Ballerino, carrying out BALanced and cache-miss toLERable dynamic scheduling via cascaded and clustered IN-Order IQs. The proposed microarchitecture is built upon three key principles that drive dynamic scheduling: instruction readiness, memory/register dependences, and oldest-first selection.
Ipoom Jeong
,
Jiwon Lee
,
Myung Kuk Yoon
,
Won Woo Ro
PDF
Cite
»
Cite
×