With the widespread use of cluster systems and ever increasing threat to computer security, it becomes more necessary to design and build secure cluster systems. Most cluster systems rely on security products like firewalls for their security, but they cannot guarantee security of intra-cluster communications, which can be a weak spot that hackers exploit for further security attacks. A recent study by Lee and Kim (2007) [22] proposed a security framework to protect intra-cluster communications by encrypting and authenticating all packets with fine-grained security where any two communicating processes dynamically generate and share a cryptographic key, called a session key. However, the fine-grained security scheme can incur serious performance degradation in large-scale cluster systems since it may take a long time to access session keys. To solve this problem, we propose to incorporate a session key cache inside a cluster interconnect card to speed up accesses to the session keys and build an analytical cluster traffic model to estimate the behavior of the cache in large-scale cluster systems. For further performance improvement, we propose a prefetching scheme speculating job scheduler’s decision without OS interventions. Simulation results indicate that the session key cache with the prefetching scheme decreases the network latency by 50% on average, compared to the configurations without the enhancements.
Journals
Fast Secure Communications in Shared Memory Multiprocessor Systems
Protection and security are becoming essential requirements in commercial servers. To provide secure memory and cache-to-cache communications, we presented Interconnect-Independent Security Enhanced Shared Memory Multiprocessor System (I2SEMS), mainly focusing on how to manage a global counter to encrypt, decrypt, and authenticate data messages with little performance overhead. However, I2SEMS was vulnerable to replay attacks on data messages and integrity attacks on control and counter messages. This paper proposes three authentication schemes to remove those security vulnerabilities. First, we prevent replay attacks on data messages by inserting Request Counter (RC) into request messages. Second, we also use RC to detect integrity attacks on control messages. Third, we propose a new counter, referred to as GCC Counter (GC), to protect the global counter messages. We simulated our design with SPLASH-2 benchmarks on up to 16-processor shared memory multiprocessor systems by using Simics with Wisconsin multifacet General Execution-driven Multiprocessor Simulator (GEMS). Simulation results show that the overall performance slowdown is 4 percent on average with the highest keystream hit rate of 78 percent
Communication-Aware Globally-Coordinated On-Chip Networks
With continued Moore’s law scaling, multicore-based architectures are becoming the de facto design paradigm for achieving low-cost and performance/power-efficient processing systems through effective exploitation of available parallelism in software and hardware. A crucial subsystem within multicores is the on-chip interconnection network that orchestrates high-bandwidth, low-latency, and low-power communication of data. Much previous work has focused on improving the design of on-chip networks but without more fully taking into consideration the on-chip communication behavior of application workloads that can be exploited by the network design. A significant portion of this paper analyzes and models on-chip network traffic characteristics of representative application workloads. Leveraged by this, the notion of globally coordinated on-chip networks is proposed in which application communication behavior-captured by traffic profiling-is utilized in the design and configuration of on-chip networks so as to support prevailing traffic flows well, in a globally coordinated manner. This is applied to the design of a hybrid network consisting of a mesh augmented with configurable multidrop (bus-like) spanning channels that serve as express paths for traffic flows benefiting from them, according to the characterized traffic profile. Evaluations reveal that network latency and energy consumption for a 64-core system running OpenMP benchmarks can be improved on average by 15 and 27 percent, respectively, with globally coordinated on-chip networks.
