Publications

MediaWorm: A QoS Capable Router Architecture for Clusters

K. H. Yum, E. J. Kim, C. R. Das, and A. Vaidya

IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 12, pp. 1261-1274, December 2002.

With the increasing use of clusters in real-time applications, it has become essential to design high-performance networks with quality-of-service (QoS) guarantees. We explore the feasibility of providing QoS in wormhole-switched routers, which are widely used in designing scalable, high-performance cluster interconnects. In particular, we are interested in supporting multimedia video streams with CBR and VBR traffic, in addition to conventional best-effort traffic. The proposed MediaWorm router uses a rate-based bandwidth allocation mechanism, called Fine-Grained VirtualClock (FGVC), to schedule network resources for different traffic classes. Our simulation results on an 8-port router indicate that it is possible to provide jitter-free delivery to VBR/CBR traffic up to an input load of 70-80 percent of link bandwidth, and that the presence of best-effort traffic has no adverse effect on real-time traffic. Although the MediaWorm router performs slightly worse than a pipelined circuit-switched (PCS) router, the commercial success of wormhole switching, coupled with its simpler and cheaper design, makes it an attractive alternative. Simulation of a 2×2 fat-mesh using this router shows performance comparable to that of a single switch and suggests that clusters designed with an appropriate bandwidth balance between links can provide the required performance for different types of traffic.
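The abstract names a rate-based scheduler, Fine-Grained VirtualClock (FGVC). As a point of reference only, the sketch below implements the classic VirtualClock discipline that FGVC's name refers to; it is not the MediaWorm router's FGVC logic, whose details are not given here. Each arriving packet is stamped with max(now, flow clock) + size/rate, and the smallest stamp is served first, so each flow's share tracks its reserved rate. The class and method names and the use of a software heap are illustrative assumptions; a router would implement this in hardware per output port.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Flow:
    rate: float          # reserved bandwidth (e.g. bytes per time unit)
    aux_vc: float = 0.0  # the flow's virtual clock


class VirtualClockScheduler:
    """Classic VirtualClock: stamp each packet, always serve the smallest stamp."""

    def __init__(self):
        self.flows = {}
        self.queue = []  # heap of (stamp, seq, flow_id, size)
        self.seq = 0     # tie-breaker: equal stamps are served in arrival order

    def add_flow(self, flow_id, rate):
        self.flows[flow_id] = Flow(rate)

    def enqueue(self, flow_id, size, now):
        flow = self.flows[flow_id]
        # An idle flow cannot bank credit: its clock restarts at 'now'.
        flow.aux_vc = max(now, flow.aux_vc) + size / flow.rate
        heapq.heappush(self.queue, (flow.aux_vc, self.seq, flow_id, size))
        self.seq += 1

    def dequeue(self):
        """Remove and return (flow_id, size) of the packet with the smallest stamp."""
        _, _, flow_id, size = heapq.heappop(self.queue)
        return flow_id, size
```

With flow A reserved at twice B's rate and equal packet sizes, A's stamps advance half as fast as B's, so A is favored in proportion to its reservation. Restarting the clock at `now` keeps an idle flow from saving up credit and later bursting past its reservation.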

Performance Analysis of a QoS Capable Cluster Interconnect

E. J. Kim, K. H. Yum, and C. R. Das

Performance Evaluation, Vol. 60, Issues 1-4, pp. 275-302, May 2005.

The growing use of clusters in diverse applications, many of which have real-time constraints, requires quality-of-service (QoS) support from the underlying cluster interconnect. All prior studies on QoS-aware cluster routers/networks have used simulation for performance evaluation. In this paper, we present an analytical model for a wormhole-switched router with QoS provisioning. In particular, the model captures message blocking due to wormhole switching in a pipelined router, and bandwidth sharing due to a rate-based scheduling mechanism called VirtualClock. We then extend the model to a hypercube-style cluster network. Average message latency for different traffic classes and the deadline-missing probability for real-time applications are computed using the model. We evaluate a 16-port router and hypercubes of different dimensions with a mixed workload of real-time and best-effort (BE) traffic. Comparison with simulation results shows that the single-router and network models provide quite accurate performance estimates and can thus be used as efficient design tools.

A Holistic Approach to Designing Energy-Efficient Cluster Interconnects

E. J. Kim, G. M. Link, K. H. Yum, V. Narayanan, M. Kandemir, M. J. Irwin, C. R. Das

IEEE Transactions on Computers, Vol. 54, No. 6, pp. 660-671, June 2005

Designing energy-efficient clusters has recently become an important concern in making these systems economically attractive for many applications. Since the cluster interconnect is a major part of the system, the focus of this paper is to characterize and optimize the energy consumption of the entire interconnect. Using a cycle-accurate simulator of an InfiniBand Architecture (IBA) compliant interconnect fabric and actual designs of its components, we investigate the energy behavior of regular and irregular interconnects. The energy profile of the three major components (switches, network interface cards (NICs), and links) reveals that the links and switch buffers consume the major portion of the power budget. Hence, we focus on energy optimization of these two components. To minimize power in the links, we first investigate the dynamic voltage scaling (DVS) algorithm and then propose a novel dynamic link shutdown (DLS) technique. The DLS technique uses an appropriate adaptive routing algorithm to shut down links intelligently. We also present an optimized buffer design for reducing leakage energy in 70nm technology. Our analysis of different networks reveals that, while DVS is an effective energy conservation technique, it incurs a significant performance penalty at low to medium workloads. Moreover, the energy savings from DVS shrink as buffer leakage current becomes significant in 70nm designs. On the other hand, the proposed DLS technique can provide optimized performance-energy behavior (up to 40 percent energy savings with less than 5 percent performance degradation in the best case) for cluster interconnects.

Exploring IBA Design Space for Improved Performance

E. J. Kim, K. H. Yum, C. R. Das, M. Yousif, and J. Duato

IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 18, No. 4, pp. 498-510, April 2007

InfiniBand Architecture (IBA) is envisioned to be the default communication fabric for future system area networks (SANs) or clusters. However, IBA design is currently in its infancy, since the released specification outlines only higher-level functionalities, leaving various design alternatives open for exploration. In this paper, we investigate four correlated techniques for providing high and predictable performance in IBA. These are: 1) using the Shortest Path First (SPF) algorithm for deterministic packet routing, 2) developing a multipath routing mechanism to minimize congestion, 3) developing a selective packet dropping scheme to handle deadlock and congestion, and 4) providing multicasting support for customized applications. These designs are implemented in a pipelined, IBA-style switch architecture and are evaluated on a versatile IBA simulation testbed using an integrated workload consisting of MPEG-2 video streams, best-effort traffic, and control traffic. Simulation results with 15-node and 30-node irregular networks indicate that the SPF routing, multipath routing, packet dropping, and multicasting schemes are quite effective in delivering high and assured performance in clusters.
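To make item 1 concrete, here is a minimal sketch of shortest-path-first route computation: Dijkstra's algorithm run from one switch of an irregular topology, recording the first hop toward every destination, which is the kind of entry a deterministic forwarding table holds. The dictionary topology format, the link costs, and the function name are assumptions for illustration. The sketch ignores the deadlock-freedom and virtual-lane concerns that a real IBA routing scheme must also address.

```python
import heapq


def spf_next_hops(graph, source):
    """Dijkstra from `source`; returns {destination: first hop} for a forwarding table.

    graph: {switch: {neighbor: link_cost}}, with each undirected link listed both ways.
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]  # (distance, node, first hop used to reach node)
    visited = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry: a shorter path was already settled
        visited.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in visited and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the first hop is the neighbor itself;
                # further out, it is inherited from the path so far.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return first_hop
```

Running it once per switch fills every forwarding table. In a topology where S1 links directly to S3 at cost 5 but reaches it through S2 at total cost 2, all traffic from S1 to S3 and beyond leaves via S2.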

Bandwidth Estimation in Wireless LANs for Multimedia Streaming Services

H. K. Lee, V. Hall, K. H. Yum, K. I. Kim and E. J. Kim

Advances in Multimedia, vol. 2007, Article ID 70429, 7 pages, 2007

The popularity of multimedia streaming services over wireless networks presents major challenges in the management of network bandwidth. One challenge is to estimate the available bandwidth quickly and precisely in order to decide the streaming rates of layered and scalable multimedia services. Previous methods designed for wired networks are too burdensome to apply to multimedia applications in wireless networks. In this paper, we propose a new method, IdleGap, that estimates the available bandwidth of a wireless LAN using information from a low layer of the protocol stack. We use the network simulator NS-2 to evaluate the method with various ranges of cross-traffic and observation times. Our simulation results show that IdleGap accurately estimates the available bandwidth for all ranges of cross-traffic (100 Kbps to 1 Mbps) with a very short observation time of 10 seconds.
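IdleGap itself is only summarized above. One simple way to turn low-layer channel information into a bandwidth estimate, shown here purely as an assumed illustration and not claimed to be IdleGap's formula, is to scale the link capacity by the fraction of an observation window during which the MAC layer saw the medium idle. The input format (idle intervals as start/end times) and the function name are hypothetical.

```python
def idle_fraction_bandwidth(idle_intervals, window, capacity_bps):
    """Estimate available bandwidth as (idle time / window) * capacity.

    idle_intervals: non-overlapping (start, end) times, in seconds, during which
    the MAC layer observed the channel idle. Portions outside [0, window] are
    clipped. (Hypothetical input format.)
    """
    idle = sum(
        max(0.0, min(end, window) - max(start, 0.0))
        for start, end in idle_intervals
    )
    return capacity_bps * idle / window
```

For a 10-second window on a nominal 1 Mbps channel with 4.5 s of observed idle time, the estimate is 450 kbps.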

A Comprehensive Framework for Enhancing Security in InfiniBand Architecture

M. Lee and E. J. Kim

IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 18, No. 10, pp. 1393-1406, Oct. 2007

The InfiniBand Architecture (IBA) is a promising communication standard for building clusters and system area networks. However, the IBA specification has left out security aspects, resulting in potential security vulnerabilities that could be exploited with moderate effort. In this paper, we view these vulnerabilities from three classical security aspects (confidentiality, authentication, and availability) and investigate the following security issues. First, as groundwork for secure services in IBA, we present partition-level and queue-pair-level key management schemes, both of which can be easily integrated into IBA. Second, for confidentiality and authentication, we present a method to incorporate a scalable encryption and authentication algorithm into IBA with little performance overhead. Third, for better availability, we propose a stateful ingress filtering mechanism to block denial-of-service (DoS) attacks. Finally, to further improve availability, we provide a scalable packet marking method for tracing back DoS attacks. Simulation results of an IBA network show that the overhead of encryption/authentication on network latency ranges from 0.7 percent to 12.4 percent. Since stateful ingress filtering is enabled only when a DoS attack is active, it incurs no performance overhead in normal operation.

ROAL: A Randomly Ordered Activation and Layering Protocol for Ensuring K-Coverage in Wireless Sensor Networks

H. Kim, E. J. Kim, and K. H. Yum

Journal of Networks (JNW), Vol. 3, No. 1, pp. 43-52, Jan 2008.

In this paper, we propose a Randomly Ordered Activation and Layering (ROAL) protocol. Under ROAL, each node decides its eligibility with respect to a given coverage degree K at a randomly generated activation time, using only the coverage status reported by neighbor nodes located within its sensing region. A new concept of layer coverage also provides a simple and effective reconfiguration method for energy balancing. Using the layer concept, we also propose a circulation scheme that reconfigures the set of working nodes autonomously, with a small and almost constant energy cost. We also provide a model of the expected coverage and connectivity for layer coverage and show the proper range in which only one node can be activated, with respect to node density and the sensing radius of a node. Simulation results show that ROAL can guarantee K-coverage with a coverage ratio above 95%, which is close to the coverage ratio achieved using geographic coordinates. A significantly extended network lifetime is also observed compared with the original topology of a given network. Meanwhile, experimental results on the circulation scheme show that the fraction of total reconfiguration energy becomes less than 1% of the energy consumed for the reconfiguration. We also obtain a greatly reduced packet latency, only 5% of the delay that occurred in the ROAL protocol.
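One simple way to measure a K-coverage ratio like the one quoted above is to sample the field on a grid and count, for each sample point, how many sensing disks contain it. The sketch below does exactly that, assuming the binary disk sensing model, a rectangular field anchored at the origin, and a regular sampling grid; all names and parameters are illustrative and are not taken from the ROAL paper.

```python
import math


def coverage_count(point, sensors, radius):
    """Number of sensors whose sensing disk (binary disk model) contains `point`."""
    px, py = point
    return sum(1 for sx, sy in sensors if math.hypot(px - sx, py - sy) <= radius)


def k_coverage_ratio(sensors, radius, k, width, height, step=1.0):
    """Fraction of grid points in a width x height field covered by >= k sensors."""
    points = [
        (x * step, y * step)
        for x in range(int(width / step) + 1)
        for y in range(int(height / step) + 1)
    ]
    covered = sum(1 for p in points if coverage_count(p, sensors, radius) >= k)
    return covered / len(points)
```

A finer `step` gives a more accurate ratio at quadratic cost in sample points; a protocol node would run only the `coverage_count`-style check over its own sensing region, using neighbor reports rather than global positions.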

Assuring K-Coverage in the Presence of Mobility and Wear-Out Failures in Wireless Sensor Networks

J. Iyer, H. Yu, H. Kim, E. J. Kim, K. H. Yum, and P. S. Mah

Along with energy conservation, maintaining a desired degree of coverage has been a critical issue in Wireless Sensor Networks (WSNs). In this paper, we consider more realistic WSN environments in which sensor nodes move around and can disappear due to wear-out failures. By enhancing a variant of the random waypoint model (Li et al., 2005), we propose Mobility Resilient Coverage Control (MRCC) to assure K-coverage in the presence of mobility. Our basic goals are (1) to formulate the probability of breaking K-coverage in terms of moving-in and moving-out probabilities and (2) to issue wake-up calls to sleeping sensors so that the user requirement of K-coverage is met even in the presence of mobility. Furthermore, to show the impact of wear-out failures on the achieved coverage, we adopt a lognormal distribution to model the conditional probability of failure and observe the influence of the reduced number of active nodes on coverage. Our experiments with Network Survivability – Double Link Failure show that MRCC achieves 1.4% better coverage with 22% fewer active sensors than the existing Coverage Configuration Protocol (CCP). When node reliability is taken into account, coverage drops by 3.7% (for coverage > 1) while the number of sensor nodes is reduced by 18.19%, compared with pure MRCC. Comparing CCP and MRCC with reliability, we observe a 3.4% reduction in coverage for the average probabilistic case and 5.78% for the individual probabilistic case, while achieving 12.82% and 28.2% reductions in the number of nodes, respectively.
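The abstract models wear-out with a lognormal lifetime. Under that assumption, the conditional probability that a node still alive at time t fails within the next interval dt is (F(t + dt) - F(t)) / (1 - F(t)), where F is the lognormal CDF. The sketch below evaluates this with the standard library only; the parameter names `mu` and `sigma` (mean and standard deviation of the log-lifetime) are illustrative notation, and the parameter values used in the study are not given here.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """P(lifetime <= t) for a lognormal lifetime with log-mean mu and log-std sigma."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def conditional_failure_prob(t, dt, mu, sigma):
    """P(node fails in (t, t + dt] | node still alive at time t)."""
    f_t = lognormal_cdf(t, mu, sigma)
    survive = 1.0 - f_t
    if survive <= 0.0:
        return 1.0  # numerically certain to have failed already
    return (lognormal_cdf(t + dt, mu, sigma) - f_t) / survive
```

A coverage simulator could draw, at each time step, whether each active node fails with this probability, which lowers the number of active nodes over time in the way the abstract describes.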

Integration of Admission, Congestion, and Peak Power Control in QoS-Aware Clusters

K. H. Yum, Y. Jin, E. J. Kim, and C. R. Das

The Journal of Parallel and Distributed Computing (JPDC), 2010

Admission, congestion, and peak power control mechanisms are essential parts of a cluster network design for supporting integrated traffic. While an admission control algorithm helps in delivering assured performance, a congestion control algorithm regulates traffic injection to avoid network saturation. Peak power control enforces pre-specified power constraints while maintaining service quality by regulating the injection of packets. In this paper, we propose these control algorithms for clusters, which are increasingly being used in a diverse set of applications that require QoS guarantees. The uniqueness of our approach is that we develop these algorithms for wormhole-switched networks, which have been used in designing clusters. We use QoS-capable wormhole routers and QoS-capable network interface cards (NICs), referred to as Host Channel Adapters (HCAs) in InfiniBand™ Architecture (IBA), to evaluate the effectiveness of these algorithms. Admission control is applied at the HCAs and the routers, while congestion control and peak power control are deployed only at the HCAs. A mixed workload consisting of best-effort, real-time, and control traffic is used to investigate the effectiveness of the proposed schemes. Simulation results with a single-router (8-port) cluster and a 2-D mesh network cluster indicate that the admission, congestion, and peak power control algorithms are quite effective in delivering assured performance. The proposed credit-based congestion control algorithm is simple and practical in that it relies on hardware already available in the HCA/NIC to regulate traffic injection.
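The abstract does not spell out the credit-based congestion control algorithm, so the following is only a generic sketch of credit-based injection throttling at an HCA/NIC, with hypothetical names. The interface holds a credit count representing free downstream buffer space, injects a queued packet only when enough credits are available, and resumes when credits are returned, so a congested path automatically slows injection.

```python
from collections import deque


class CreditInjector:
    """Generic credit-based injection throttle at a network interface (hypothetical sketch)."""

    def __init__(self, credits):
        self.credits = credits   # free downstream buffer space (e.g. in flits)
        self.pending = deque()   # packets waiting at the interface, FIFO

    def submit(self, packet_id, size):
        """Queue a packet; return the ids injected as a result."""
        self.pending.append((packet_id, size))
        return self._drain()

    def return_credits(self, n):
        """Receiver freed n units of buffer; return the ids injected as a result."""
        self.credits += n
        return self._drain()

    def _drain(self):
        """Inject queued packets in FIFO order while credits allow."""
        sent = []
        while self.pending and self.pending[0][1] <= self.credits:
            packet_id, size = self.pending.popleft()
            self.credits -= size
            sent.append(packet_id)
        return sent
```

Because the queue is strict FIFO, a large packet at the head blocks smaller ones behind it; a design supporting mixed traffic might keep one such queue per traffic class so that best-effort backlog cannot delay real-time packets.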

Hierarchical Multiplexing Interconnection Structure for Cost-Effective Stage-Level Reconfigurable Chip Multiprocessor

Y. Kim, E. J. Kim, and R. N. Mahapatra

Stage-level reconfigurable chip multiprocessors (CMPs) aim to achieve highly reliable, fault-tolerant computing by interweaving pipeline stages that communicate with each other through an on-chip interconnect. Existing crossbar-switch-based stage-level reconfigurable CMPs offer high reliability at the cost of significant area and power overheads. These overheads make large CMPs prohibitive because of the area and power consumed by heavy interconnection networks. On the other hand, area- and power-efficient architectures offer lower reliability and inefficient stage-level resource utilization. In this paper, we propose a hierarchical multiplexing interconnection structure in lieu of a crossbar interconnect to design an area- and power-efficient stage-level reconfigurable CMP. The proposed approach retains the reliability offered by the crossbar switch while reducing the area and power overheads. Experimental results show that the proposed approach reduces area by up to 21% and power by up to 32% compared with the crossbar-switch-based interconnection network.