
High Performance Computing Laboratory

Texas A&M University College of Engineering

A Domain-Specific On-Chip Network Design for Large Scale Cache Systems

Y. Jin, E. J. Kim, and K. H. Yum

Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA-13), Phoenix, 2007

As circuit integration technology advances, the design of efficient interconnects has become critical. On-chip networks have been adopted to overcome the scalability and poor resource-sharing problems of shared buses or dedicated wires. However, using a general-purpose on-chip network for a specific domain may cause underutilization of network resources and large network delays because the interconnects are not optimized for the domain. Addressing these two issues is challenging because in-depth knowledge of both interconnects and the specific domain is required. Recently proposed Non-Uniform Cache Architectures (NUCAs) use wormhole-routed 2D mesh networks to improve the performance of on-chip L2 caches. We observe that network resources in NUCAs are underutilized and occupy considerable chip area (52% of cache area), and that the network delay is significant (63% of cache access time). Motivated by these observations, we investigate how to optimize cache operations and design the network in large-scale cache systems. First, we propose a single-cycle router architecture that can efficiently support multicasting in on-chip caches. Next, we present Fast-LRU replacement, in which cache replacement overlaps with data request delivery. Finally, we propose a deadlock-free XYX routing algorithm and a new halo network topology to minimize the number of links in the network. Simulation results show that our networked cache system improves the average IPC by 38% over the mesh network design with Multicast Promotion replacement while using only 23% of the interconnection area. Specifically, Multicast Fast-LRU replacement improves the average IPC by 20% compared with Multicast Promotion replacement, and the halo topology design improves the average IPC by a further 18% over a mesh topology.
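
As a quick consistency check on the reported figures, the short Python sketch below shows how the two individual IPC gains relate to the overall 38% figure. It assumes both gains are percentage points measured against the same baseline (the mesh design with Multicast Promotion replacement). That is one plausible reading of the abstract, not something it states explicitly, and the constant and function names are illustrative only.

```python
# Figures quoted in the abstract (percent improvement in average IPC).
FAST_LRU_GAIN = 20  # Multicast Fast-LRU vs. Multicast Promotion replacement
HALO_GAIN = 18      # halo topology vs. mesh topology (additional gain)
TOTAL_GAIN = 38     # overall improvement reported in the abstract


def additive_total(gains):
    """Sum gains as percentage points over a common baseline (assumption)."""
    return sum(gains)


def compounded_total(gains):
    """Combine gains as successive speedups (alternative reading)."""
    factor = 1.0
    for gain in gains:
        factor *= 1 + gain / 100
    return (factor - 1) * 100


additive = additive_total([FAST_LRU_GAIN, HALO_GAIN])
compounded = compounded_total([FAST_LRU_GAIN, HALO_GAIN])

print(f"additive reading:   {additive}%")
print(f"compounded reading: {compounded:.1f}%")
```

Under the additive reading the two gains sum to the reported 38%, whereas compounding them would give roughly 41.6%. This suggests the 20% and 18% figures are both expressed relative to the same baseline.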
