With the widespread use of cluster systems and ever-increasing threats to computer security, it is increasingly necessary to design and build secure cluster systems. Most cluster systems rely on security products such as firewalls, but these cannot guarantee the security of intra-cluster communications, which can become a weak spot that attackers exploit to launch further attacks. A recent study by Lee and Kim (2007) [22] proposed a security framework that protects intra-cluster communications by encrypting and authenticating all packets with fine-grained security, in which any two communicating processes dynamically generate and share a cryptographic key, called a session key. However, this fine-grained scheme can seriously degrade performance in large-scale cluster systems because accessing session keys may take a long time. To solve this problem, we propose incorporating a session key cache inside the cluster interconnect card to speed up session key accesses, and we build an analytical cluster traffic model to estimate the behavior of the cache in large-scale cluster systems. To further improve performance, we propose a prefetching scheme that speculates on the job scheduler's decisions without OS intervention. Simulation results indicate that the session key cache with the prefetching scheme reduces network latency by 50% on average compared to configurations without these enhancements.
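To make the caching idea concrete, here is a minimal sketch of a session key cache as an LRU table indexed by a (source, destination) process pair, with a prefetch hook that installs keys before they are requested. All names are hypothetical, and the LRU replacement policy and the dictionary standing in for the slow key store are assumptions. The job scheduler's speculated decisions are simply passed in as a list of process pairs. It does not model the interconnect card hardware or the analytical traffic model.

```python
from collections import OrderedDict


class SessionKeyCache:
    """Hypothetical LRU cache of session keys, indexed by a (source, destination) process pair."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # assumed slow path, e.g. a key table in host memory
        self.entries = OrderedDict()        # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def _install(self, pair, key):
        self.entries[pair] = key
        self.entries.move_to_end(pair)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key

    def lookup(self, src, dst):
        pair = (src, dst)
        if pair in self.entries:
            self.hits += 1
            self.entries.move_to_end(pair)
            return self.entries[pair]
        self.misses += 1
        key = self.backing_store[pair]  # slow access on a miss
        self._install(pair, key)
        return key

    def prefetch(self, pairs):
        """Speculatively load keys for process pairs predicted to communicate soon."""
        for pair in pairs:
            if pair not in self.entries and pair in self.backing_store:
                self._install(pair, self.backing_store[pair])


store = {(0, 1): b"k01", (0, 2): b"k02", (1, 2): b"k12"}
cache = SessionKeyCache(capacity=2, backing_store=store)
cache.prefetch([(0, 1), (0, 2)])  # pairs a speculated scheduling decision would bring together
cache.lookup(0, 1)
cache.lookup(0, 2)
cache.lookup(1, 2)  # miss; evicts (0, 1), the least recently used entry
print(cache.hits, cache.misses)  # 2 1
```

Without the prefetch call, the first two lookups would also be misses. That latency is what speculation on the scheduler is meant to hide.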
Publications
Fast Secure Communications in Shared Memory Multiprocessor Systems
Protection and security are becoming essential requirements in commercial servers. To provide secure memory and cache-to-cache communications, we previously presented the Interconnect-Independent Security Enhanced Shared Memory Multiprocessor System (I2SEMS), which focused mainly on managing a global counter to encrypt, decrypt, and authenticate data messages with little performance overhead. However, I2SEMS was vulnerable to replay attacks on data messages and to integrity attacks on control and counter messages. This paper proposes three authentication schemes to remove these vulnerabilities. First, we prevent replay attacks on data messages by inserting a Request Counter (RC) into request messages. Second, we also use the RC to detect integrity attacks on control messages. Third, we propose a new counter, referred to as the GCC Counter (GC), to protect global counter messages. We simulated our design with the SPLASH-2 benchmarks on shared memory multiprocessor systems with up to 16 processors, using Simics with the Wisconsin Multifacet General Execution-driven Multiprocessor Simulator (GEMS). Simulation results show an average overall performance slowdown of 4 percent, with a keystream hit rate of up to 78 percent.
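The role of the Request Counter in replay detection can be sketched in software as follows. The responder echoes the RC, a MAC binds the RC to the address and data, and the requester accepts each RC at most once. This is a hedged illustration only. The truncated HMAC-SHA256 is a stand-in for the hardware authentication used in I2SEMS, and the message layout, class, and function names are hypothetical. The global counter and keystream machinery are not modeled.

```python
import hashlib
import hmac


def mac(key, addr, rc, data):
    """Truncated HMAC-SHA256 (assumed stand-in) binding data to its address and Request Counter."""
    msg = f"{addr}:{rc}:".encode() + data
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


class Requester:
    def __init__(self, key):
        self.key = key
        self.rc = 0               # Request Counter, incremented per request
        self.outstanding = set()  # RC values still awaiting a response

    def make_request(self, addr):
        self.rc += 1
        self.outstanding.add(self.rc)
        return {"addr": addr, "rc": self.rc}

    def accept_response(self, resp):
        expected = mac(self.key, resp["addr"], resp["rc"], resp["data"])
        if not hmac.compare_digest(expected, resp["mac"]):
            return False  # integrity violation: data, address, or RC was altered
        if resp["rc"] not in self.outstanding:
            return False  # replay: this RC was already consumed or never issued
        self.outstanding.remove(resp["rc"])
        return True


def respond(key, req, data):
    """Responder echoes the request's RC and authenticates it together with the data."""
    return {"addr": req["addr"], "rc": req["rc"], "data": data,
            "mac": mac(key, req["addr"], req["rc"], data)}


key = b"shared-secret"
cpu = Requester(key)
resp = respond(key, cpu.make_request(0x40), b"cache line")
print(cpu.accept_response(resp))  # True
print(cpu.accept_response(resp))  # False: the same response replayed
```

Because the RC is covered by the MAC, an attacker cannot make an old response acceptable by rewriting its counter value.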
Communication-Aware Globally-Coordinated On-Chip Networks
With continued Moore's law scaling, multicore architectures are becoming the de facto design paradigm for achieving low-cost, performance- and power-efficient processing systems through effective exploitation of the parallelism available in software and hardware. A crucial subsystem within multicores is the on-chip interconnection network, which orchestrates high-bandwidth, low-latency, and low-power data communication. Much previous work has focused on improving on-chip network design without fully considering the on-chip communication behavior of application workloads, which the network design can exploit. A significant portion of this paper analyzes and models the on-chip network traffic characteristics of representative application workloads. Building on this analysis, we propose globally coordinated on-chip networks, in which application communication behavior, captured by traffic profiling, is used in the design and configuration of the on-chip network so that prevailing traffic flows are well supported in a globally coordinated manner. We apply this notion to the design of a hybrid network consisting of a mesh augmented with configurable multidrop (bus-like) spanning channels that serve as express paths for the traffic flows that benefit from them, according to the characterized traffic profile. Evaluations show that, with globally coordinated on-chip networks, network latency and energy consumption for a 64-core system running OpenMP benchmarks improve on average by 15 and 27 percent, respectively.
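The abstract does not give the channel-selection algorithm. The following is therefore only a hypothetical illustration of how a traffic profile might drive express-path selection. Each profiled flow is scored by its volume multiplied by its hop distance on a 2D mesh, on the reasoning that long, heavy flows gain the most from a multidrop express channel. The row-major node numbering, the scoring rule, and the function names are all assumptions.

```python
def mesh_hops(a, b, width):
    """Minimal (XY) hop count between nodes a and b of a mesh with row-major numbering."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def rank_express_candidates(profile, width, top_n):
    """Rank profiled flows {(src, dst): flits} by flits x hops; single-hop flows gain nothing."""
    scored = []
    for (src, dst), flits in profile.items():
        hops = mesh_hops(src, dst, width)
        if hops > 1:
            scored.append((flits * hops, src, dst))
    scored.sort(reverse=True)
    return [(src, dst) for _, src, dst in scored[:top_n]]


# A toy profile for a 4 x 4 mesh (16 cores).
profile = {(0, 3): 100, (0, 1): 1000, (5, 15): 50, (12, 3): 10}
print(rank_express_candidates(profile, width=4, top_n=2))  # [(0, 3), (5, 15)]
```

The heavy flow (0, 1) is ignored because it is already a single hop, which shows why volume alone is a poor criterion for express paths.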
A PROactive Request Distribution (PRORD) Using Web Log Mining in Cluster-Based Web Server
Widely adopted distributor-based systems forward user requests to a balanced set of waiting servers, completely transparently to the users. The policy used to forward requests from the front-end distributor to the back-end servers plays an important role in overall system performance. The locality-aware request distribution (LARD) scheme improves system response time by having requests serviced by the web servers that already hold the requested data in their caches. In this paper, we propose proactive request distribution (PRORD), which applies intelligent proactive distribution at the front end and complementary prefetching at the back-end server nodes to bring data into their caches. The prefetching scheme loads web pages into memory in advance based on each page's confidence value, which is predicted by the proactive distribution scheme. Proactive distribution relies on both online and offline analysis of the website's log files, which capture user navigation patterns. Designed to work with prevailing web technologies such as HTTP 1.1, our scheme aims to reduce response time for users. Simulations with traces derived from the log files of real web servers show a performance improvement of 15-45% over existing distribution policies.
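One simple way to derive a confidence value from navigation logs is a first-order transition estimate. Here the confidence of page B following page A is count(A → B) divided by the number of transitions out of A, and back-end servers would prefetch pages whose confidence exceeds a threshold. This is an assumed model for illustration, not necessarily the one PRORD uses. The function names and the threshold are hypothetical, and log parsing and session splitting are left out.

```python
from collections import Counter, defaultdict


def build_transition_confidence(sessions):
    """Estimate P(next page | current page) from per-user page sequences mined from logs."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for cur, nxt in zip(pages, pages[1:]):
            counts[cur][nxt] += 1
    confidence = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        confidence[cur] = {nxt: n / total for nxt, n in nexts.items()}
    return confidence


def pages_to_prefetch(confidence, current_page, threshold):
    """Pages worth prefetching after current_page, most confident first."""
    candidates = confidence.get(current_page, {})
    chosen = [p for p, c in candidates.items() if c >= threshold]
    return sorted(chosen, key=lambda p: -candidates[p])


sessions = [["/", "/a", "/b"], ["/", "/a"], ["/", "/c"]]
conf = build_transition_confidence(sessions)
print(pages_to_prefetch(conf, "/", threshold=0.5))  # ['/a']
```

Lowering the threshold prefetches more pages, trading cache space and back-end load for a higher chance that the next request hits in memory.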
Assuring K-Coverage in the Presence of Mobility in Wireless Sensor Networks
Along with energy conservation, maintaining a desired degree of coverage has been a critical issue in wireless sensor networks (WSNs), especially in mobile environments. By enhancing a variant of the Random Waypoint (RWP) model [1], we propose Mobility Resilient Coverage Control (MRCC) to assure K-coverage in the presence of mobility. Our basic goals are 1) to derive the probability of breaking K-coverage from moving-in and moving-out probabilities, and 2) to issue wake-up calls to sleeping sensors so that the user's K-coverage requirement is met even in the presence of mobility. Furthermore, by separating mobility behavior into average and individual components, the probability of breaking K-coverage can be calculated precisely, which reduces the number of sensors that must be awakened. Our NS2 experiments show that MRCC with the individual probability achieves 1.4% better coverage with 22% fewer active sensors than the existing Coverage Configuration Protocol (CCP) [2].
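As a rough illustration of how moving-in and moving-out probabilities combine, assume (hypothetically) that each of the n sensors currently covering a point leaves independently with probability p_out, and each of m nearby sensors arrives independently with probability p_in. The number that stay and the number that arrive are then independent binomials, and K-coverage breaks when their sum falls below K. This independence model and the wake-up rule are assumptions for the sketch, not the MRCC derivation. Awakened sensors are assumed to cover the point and to be subject to the same p_out.

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list indexed by count."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_break_k_coverage(k, n_covering, p_out, n_nearby, p_in):
    """P(stayers + arrivals < k) under the assumed independent move-in/move-out model."""
    stay = binom_pmf(n_covering, 1 - p_out)
    arrive = binom_pmf(n_nearby, p_in)
    return sum(ps * pa
               for s, ps in enumerate(stay)
               for a, pa in enumerate(arrive)
               if s + a < k)


def sensors_to_wake(k, n_covering, p_out, n_nearby, p_in, max_risk, n_sleeping):
    """Smallest number of sleeping sensors to wake so the break probability is <= max_risk."""
    for w in range(n_sleeping + 1):
        if prob_break_k_coverage(k, n_covering + w, p_out, n_nearby, p_in) <= max_risk:
            return w
    return None  # even waking every sleeping sensor cannot meet the requirement


print(sensors_to_wake(k=1, n_covering=1, p_out=0.5, n_nearby=0, p_in=0.0,
                      max_risk=0.2, n_sleeping=5))  # 2
```

A more precise per-sensor (individual) probability would replace the single p_out with one value per sensor, which is the refinement the abstract credits with reducing the number of awakened sensors.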
An Overview on Security Issues in Cluster Interconnects
The widespread use of cluster systems in a diverse set of applications has spurred significant interest in high-performance cluster interconnects. A major inefficiency in using such interconnects has been the send/receive communication overhead at the sender and receiver hosts. Various techniques, such as User-Level Communication (ULC), have been proposed to mitigate this inefficiency. However, recent security breaches have spurred research on cluster communication security. Such research is non-trivial because of the high-speed nature of cluster interconnects. This paper surveys the four most popular cluster interconnects used in Top500 supercomputers and explores possible schemes for securing intra-cluster communication that involve the host processor, a secure coprocessor, and the Network Interface Card (NIC), illustrating the challenges of each. We then compare these schemes in terms of host processor offload, end-to-end latency, security transparency, and cryptographic processing performance. Finally, we give an overview of security issues in these cluster interconnect designs.
I/O Node Placement for Performance and Reliability in Torus Networks
When a cluster system interconnects processor and I/O nodes through a network, optimal placement of the I/O nodes is critical for improving overall system performance by reducing communication latency. In this paper, we propose an efficient and scalable I/O node placement scheme, called relaxed quasi-perfect placement, for torus-based interconnection networks using Lee distance error-correcting codes. It provides a more general placement than the previous quasi-perfect placement [1]. We also propose a fault-tolerant scheme based on our I/O placement model to guarantee performance. Simulation results show that our scheme provides a 53% speed-up over quasi-perfect placement. The fault-tolerant scheme also degrades gracefully, especially while the number of faulty I/O nodes remains below half of the initial number of I/O nodes.
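Placement via Lee distance codes rests on one metric. In a k-ary n-dimensional torus, the Lee distance between two nodes is the sum, over dimensions, of the shorter wraparound distance min(|a_i - b_i|, k - |a_i - b_i|). A good placement keeps every node within a small Lee distance of some I/O node. The sketch computes that coverage radius for the classic perfect Lee code on a 5 x 5 torus, where I/O nodes are placed at (x, y) with (x + 2y) mod 5 = 0, so every node is within distance 1 of exactly one I/O node. This only illustrates the error-correcting-code view of placement. It is not the relaxed quasi-perfect scheme proposed here, and the function names are hypothetical.

```python
from itertools import product


def lee_distance(a, b, k):
    """Lee distance between two nodes of a k-ary torus: sum of per-dimension wraparound distances."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def coverage_radius(io_nodes, k, n):
    """Largest Lee distance from any node of a k-ary n-dimensional torus to its nearest I/O node."""
    return max(
        min(lee_distance(node, io, k) for io in io_nodes)
        for node in product(range(k), repeat=n)
    )


# Perfect 1-error-correcting Lee code on a 5 x 5 torus.
io_nodes = [(x, y) for x, y in product(range(5), repeat=2) if (x + 2 * y) % 5 == 0]
print(len(io_nodes), coverage_radius(io_nodes, k=5, n=2))  # 5 1
```

Perfect codes like this exist only for particular torus sizes, which is why more general placements (quasi-perfect, and the relaxed variant proposed here) are needed in practice.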
A Heuristic for Peak Power Constrained Design of Network on Chip (NoC) based Multimode System
Designing NoC-based systems has become increasingly complex as they support multiple functionalities. Decisions about the interconnections between heterogeneous system components and the routing of system communication affect both system performance and power consumption. This research provides a heuristic that determines the neighborhood configuration of each component. Simulation results with synthetic and real workloads indicate that, by controlling communication bandwidth allocation, our heuristic can control peak power consumption, but at the cost of throughput degradation.
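The abstract does not spell out the heuristic, so this sketch only illustrates the tradeoff it reports. Capping bandwidth allocation bounds peak power across modes and costs throughput. The following are all hypothetical assumptions: communication power in a mode is modeled as linear in allocated bandwidth times hop count, and a mode over budget has every flow's bandwidth scaled down uniformly until it meets the cap. The neighborhood-configuration step is not modeled.

```python
def mode_power(flows, energy_per_bit_hop):
    """Assumed linear power model: sum of bandwidth x hops x energy per bit per hop."""
    return sum(bw * hops * energy_per_bit_hop for bw, hops in flows)


def cap_peak_power(modes, energy_per_bit_hop, power_cap):
    """Scale each mode's flow bandwidths uniformly so that no mode exceeds power_cap."""
    capped = {}
    for name, flows in modes.items():
        power = mode_power(flows, energy_per_bit_hop)
        scale = min(1.0, power_cap / power) if power > 0 else 1.0
        capped[name] = [(bw * scale, hops) for bw, hops in flows]
    return capped


# Flows are (bandwidth, hop count) pairs; units are arbitrary in this toy example.
modes = {"video": [(100.0, 2), (50.0, 4)], "audio": [(10.0, 1)]}
capped = cap_peak_power(modes, energy_per_bit_hop=1.0, power_cap=200.0)
print(capped["video"])  # [(50.0, 2), (25.0, 4)]
```

Here the video mode halves its bandwidth to meet the cap while the audio mode is untouched, a crude version of the power/throughput tradeoff described above.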
Security Enhancement in InfiniBand Architecture
The InfiniBand™ Architecture (IBA) is a promising new I/O communication standard positioned for building clusters and System Area Networks (SANs). However, the IBA specification leaves out security, resulting in potential vulnerabilities that could be exploited with moderate effort. In this paper, we examine these vulnerabilities from three classical security aspects: availability, confidentiality, and authentication. For better availability, we recommend that switches be able to enforce partitioning of data packets, and we propose an efficient implementation method using trap messages. For confidentiality, we encrypt only secret keys to minimize performance degradation. The most serious vulnerability in IBA concerns authentication, since IBA authenticates packets solely by checking for the presence of plaintext keys in the packet. We propose a new authentication mechanism, compatible with the current IBA specification, that treats the Invariant CRC (ICRC) field as an authentication tag. Analyzing the performance of our authentication approach alongside other authentication algorithms, we observe that it dramatically enhances IBA's authentication capability without sacrificing IBA's performance benefits. Furthermore, simulation results indicate that our methods enhance security in IBA with marginal performance overhead.
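The sketch contrasts a conventional 32-bit CRC, which anyone can recompute after tampering, with a keyed 32-bit tag that occupies the same 4-byte ICRC slot. It illustrates the idea of reusing the ICRC field as an authentication tag. Truncated HMAC-SHA256 is an assumed stand-in, not necessarily the MAC algorithm the paper uses. `zlib.crc32` only approximates the real ICRC computation, which also masks variant header fields. The function names are hypothetical.

```python
import hashlib
import hmac
import struct
import zlib

ICRC_BYTES = 4  # the ICRC field of an IBA packet is 32 bits


def plain_icrc(invariant_fields: bytes) -> bytes:
    """Conventional integrity check (approximated by CRC-32): unkeyed, so forgeable by anyone."""
    return struct.pack(">I", zlib.crc32(invariant_fields))


def auth_icrc(secret_key: bytes, invariant_fields: bytes) -> bytes:
    """Keyed tag truncated to fit the ICRC field (assumed HMAC-SHA256 stand-in)."""
    return hmac.new(secret_key, invariant_fields, hashlib.sha256).digest()[:ICRC_BYTES]


def verify(secret_key: bytes, invariant_fields: bytes, tag: bytes) -> bool:
    """Receiver recomputes the tag with the shared key and compares in constant time."""
    return hmac.compare_digest(auth_icrc(secret_key, invariant_fields), tag)


key = bytes(16)  # placeholder shared secret, e.g. tied to a partition or queue pair
packet = b"LRH|BTH|payload"
tag = auth_icrc(key, packet)
print(len(tag), verify(key, packet, tag))  # 4 True
```

With only 32 bits, a blind forgery still succeeds with probability about 2**-32 per attempt. The gain over a CRC is that an attacker who modifies the packet can no longer simply recompute a matching check value.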