Virtual memory, supported by address translation hardware, is a key technique for improving programmability and memory management in GPUs. However, the GPU execution model places heavy pressure on this translation hardware, owing to a mismatch between the behavior of page table walkers and that of the thousands of concurrently running threads. In GPU workloads, many threads simultaneously access a large number of pages, necessitating a substantial number of translations, whereas each walker handles only a single walk request at a time. This limitation significantly increases the queueing latency of walk requests, which we observe to be a major bottleneck in servicing page table walks on GPUs. To tackle this challenge, we investigate a page walker design that allows multiple walk requests to be handled together in batches, and we make two observations: 1) allowing a page walker to issue more than one memory request at a time significantly improves walker throughput, and 2) GPU applications tend to concurrently access pages across wide address ranges. Leveraging these observations, we propose Marching Page Walks (MPW), which effectively mitigates contention in GPU page table walkers. MPW scans pending walk requests to identify those that can be grouped together, then batches them and handles them concurrently by issuing multiple memory requests. Experiments show that MPW reduces the queueing latency of page walks by 86.7% and improves GPU performance by 55.6% over the baseline design.
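The abstract does not specify MPW's grouping criterion or batch size, so the following C++ sketch is only a rough behavioral model of one batching scan, not the authors' hardware design: it assumes requests are grouped when their virtual page numbers share the same upper-level page-table region, and the names `WalkRequest`, `kBatchSize`, `kRegionShift`, and `issue_memory_access` are all hypothetical.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <vector>

// Hypothetical model of one MPW batching scan (not the paper's RTL):
// scan the pending queue, group walk requests whose virtual page numbers
// fall in the same upper-level page-table region, and issue their memory
// accesses together instead of one walk at a time.

struct WalkRequest {
    uint64_t vpn;  // virtual page number awaiting translation
};

constexpr size_t kBatchSize = 8;  // assumed per-walker batch capacity
constexpr int kRegionShift = 9;   // assumed: VPNs sharing one 512-entry
                                  // last-level page-table page group together

// Placeholder for the memory request a real walker would issue per level.
void issue_memory_access(uint64_t vpn) {
    std::cout << "walk access for VPN 0x" << std::hex << vpn << std::dec << "\n";
}

// One scan: take the region of the oldest request, pull up to kBatchSize
// pending requests from that region, and issue their accesses together
// (concurrency is modeled here as a single loop over the batch).
std::vector<WalkRequest> batch_and_issue(std::deque<WalkRequest>& pending) {
    std::vector<WalkRequest> batch;
    if (pending.empty()) return batch;

    const uint64_t region = pending.front().vpn >> kRegionShift;
    for (auto it = pending.begin();
         it != pending.end() && batch.size() < kBatchSize;) {
        if ((it->vpn >> kRegionShift) == region) {
            batch.push_back(*it);
            it = pending.erase(it);  // request leaves the queue once batched
        } else {
            ++it;
        }
    }
    for (const auto& req : batch) issue_memory_access(req.vpn);
    return batch;
}

int main() {
    // The three 0x1xxx requests share a region and are batched together;
    // 0x9000 stays queued for a later scan.
    std::deque<WalkRequest> pending = {{0x1000}, {0x1001}, {0x9000}, {0x1002}};
    batch_and_issue(pending);
}
```

Under these assumptions, one scan serves several translations with a shared burst of memory requests, which is the mechanism the abstract credits for reducing walk-request queueing latency.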