Conference & Workshops

Hybrid Dynamic Thermal Management Based on Statistical Characteristics of Multimedia Applications

I. Yeo and E. J. Kim

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), Bangalore, India, August 2008

Multimedia applications have recently become some of the most popular applications on mobile devices such as wireless phones, PDAs, and laptops. However, typical mobile systems are not equipped with cooling components, which eventually causes critical thermal deficiencies. Although many low-power and low-temperature multimedia playback techniques have been proposed, they fail to provide QoS (Quality of Service) while controlling temperature because they lack a proper understanding of multimedia applications. We propose Hybrid Dynamic Thermal Management (HDTM), which exploits thermal characteristics of both multimedia applications and systems. Specifically, we model application characteristics as the probability distribution of the number of cycles required to decode a frame. We also improve existing system thermal models by considering the effect of workload. At runtime, this scheme finds an optimal clock frequency that prevents overheating with minimal performance degradation. The proposed scheme is implemented on Linux on a Pentium M processor, which provides variable clock frequencies. To evaluate the proposed scheme, we use three major codecs, namely MPEG-4, H.264/AVC and H.264/AVC streaming. Our results show that, compared to previous thermal management schemes such as feedback control DTM [8], Frame-based DTM [5] and GOP-based DTM [15], HDTM lowers the overall temperature by 15°C and the peak temperature by 20°C while keeping the frame drop ratio under 0.2%.
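The idea of choosing a clock frequency from a distribution of per-frame decoding cycles can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pick_frequency`, the use of an empirical quantile, and all parameter names are assumptions made for illustration, and the thermal model is left out entirely.

```python
import math


def pick_frequency(cycle_samples, frame_period_s, available_hz, max_drop_ratio=0.002):
    """Return the lowest available frequency whose estimated deadline-miss
    probability stays within max_drop_ratio (hypothetical helper)."""
    if not cycle_samples:
        raise ValueError("need at least one cycle sample")
    ordered = sorted(cycle_samples)
    # Smallest rank such that at least (1 - max_drop_ratio) of the samples
    # need no more cycles than the sample at that rank.
    rank = max(math.ceil((1.0 - max_drop_ratio) * len(ordered)), 1)
    budget_cycles = ordered[rank - 1]
    # Frequency needed to finish budget_cycles within one frame period.
    required_hz = budget_cycles / frame_period_s
    for hz in sorted(available_hz):
        if hz >= required_hz:
            return hz
    # No setting is fast enough; fall back to the fastest one.
    return max(available_hz)
```

A lower frequency generally means less heat, so picking the slowest setting that still meets the frame deadline with the target probability is one simple way to trade temperature against frame drops.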

Predictive Dynamic Thermal Management for Multicore Systems

I. Yeo, C. C. Liu and E. J. Kim

Proceedings of the 45th Design Automation Conference (DAC), Anaheim, CA, June 2008

Recently, processor power density has been increasing at an alarming rate, resulting in high on-chip temperatures. Higher temperature increases leakage current and reduces reliability. In this paper, we propose Predictive Dynamic Thermal Management (PDTM) for multicore systems, based on an Application-based Thermal Model (ABTM) and a Core-based Thermal Model (CBTM). ABTM predicts future temperature from application-specific thermal behavior, while CBTM estimates the core temperature pattern from steady-state temperature and workload. Our prediction model has an average error of 1.6%, compared to the model in HybDTM [8], which has at most 5% error. Based on the temperatures predicted by ABTM and CBTM, PDTM can keep the system temperature below a desired level by moving the running application from a core that is likely to overheat to the core predicted to be coolest (migration) and by reducing the processor resources it receives (priority scheduling). PDTM enables the exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems. We implement PDTM on an Intel quad-core system with a specific device driver to access the Digital Thermal Sensor (DTS). Compared with the standard Linux scheduler, PDTM decreases average temperature by about 10% and peak temperature by 5°C, with a negligible performance impact of under 1%, when running a single SPEC2006 benchmark. Moreover, PDTM outperforms HRTM [10], reducing average temperature by about 7% and peak temperature by about 3°C with a performance overhead of 0.15% when running a single benchmark.
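Prediction-driven migration can be sketched as follows. This is only an illustration under stated assumptions, not the ABTM or CBTM models from the paper: the first-order exponential approach to a steady-state temperature, the `rate` parameter, and the function names are all hypothetical.

```python
import math


def predict_temperature(current_c, steady_state_c, rate_per_s, horizon_s):
    """Assumed first-order model: temperature approaches its steady state
    exponentially with time constant 1 / rate_per_s."""
    decay = math.exp(-rate_per_s * horizon_s)
    return steady_state_c - (steady_state_c - current_c) * decay


def choose_migration_target(cores, running_on, threshold_c, horizon_s):
    """Return the id of the core to migrate to, or None if the current core
    is not predicted to reach threshold_c within horizon_s."""
    predicted = {
        core_id: predict_temperature(c["temp_c"], c["steady_c"], c["rate"], horizon_s)
        for core_id, c in cores.items()
    }
    if predicted[running_on] < threshold_c:
        return None
    # Migrate to the core with the lowest predicted temperature.
    coolest = min(predicted, key=predicted.get)
    return None if coolest == running_on else coolest
```

Acting on the predicted rather than the measured temperature lets the scheduler move work before a core actually overheats.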

Effective Dynamic Thermal Management for MPEG-4 Decoding

I. Yeo, H. K. Lee, K. H. Yum and E. J. Kim

IEEE International Conference on Computer Design (ICCD), Lake Tahoe, USA, October, 2007

This paper proposes Dynamic Thermal Management (DTM) based on a dynamic voltage and frequency scaling (DVFS) technique for MPEG-4 decoding to guarantee thermal safety while maintaining a quality of service (QoS) constraint. Although many low-power and low-temperature multimedia playback techniques have been proposed, most of them are impractical for real-time use and rely on several restrictive assumptions. Multimedia data consists of frames that require different decoding effort. Since both the temperature and the performance of a multimedia system are affected by the complexity of scenes, our main idea is to use information on scene complexity to find an appropriate frequency. To predict the complexity of the current scene, we extract information from the previous group of pictures (GOP) using feedback control with a display buffer. Experimental results with twelve movies show that our DTM scheme guarantees the temperature threshold (70°C) while maintaining a 0% frame miss ratio. Our DTM scheme also decreases the average temperature by up to 13% without any additional hardware or playback latency.

I2SEMS: Interconnects-Independent Security Enhanced Shared Memory Multiprocessor Systems

M. Lee, M. Ahn, and E. J. Kim

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), Brasov, Romania, September, 2007.

Protection and security are becoming essential requirements in commercial servers. In this paper, we present a fast and efficient method for providing secure memory and cache-to-cache communications in shared memory multiprocessor systems, which are becoming very popular in server designs for various applications. Since our scheme is independent of the underlying interconnects and cache coherence protocols, we refer to it as Interconnects-Independent Security Enhanced Shared Memory Multiprocessor Systems (I2SEMS). The main challenge in designing I2SEMS is how to precompute keystreams in a timely manner, which is critical to minimizing performance overhead. We achieve this goal by adopting a single system-wide Global Counter Controller (GCC) and three additional components for each processor: a keystream queue, a keystream cache, and a keystream pool. The GCC assigns each processor a unique range of counters so that processors can precompute the keystreams for those counters. We have implemented I2SEMS using Simics with the Wisconsin Multifacet General Execution-driven Multiprocessor Simulator (GEMS). We tested our design with SPLASH-2 benchmarks on shared memory multiprocessor systems with up to 16 processors. Simulation results show that the overall performance slowdown is 4% on average and the keystream hit rate is as high as 78%. The stable keystream hit rate shows that I2SEMS works well with both memory-read and memory-write dominant applications. As with a conventional cache, a larger keystream pool yields higher hit rates.
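A rough software sketch of counter-based keystream precomputation is shown below. It is not the I2SEMS hardware design: the class and function names are hypothetical, the pool is a plain dictionary, and SHA-256 stands in for the block cipher that a real counter-mode scheme would use.

```python
import hashlib


class GlobalCounterController:
    """Hands out disjoint, consecutive counter ranges (illustrative only)."""

    def __init__(self, range_size):
        self.range_size = range_size
        self.next_start = 0

    def assign_range(self):
        start = self.next_start
        self.next_start += self.range_size
        return range(start, start + self.range_size)


def keystream(key, counter, length=32):
    # Assumption: SHA-256 is a stand-in for a block cipher such as AES.
    if length > 32:
        raise ValueError("this sketch yields at most 32 bytes per counter")
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()[:length]


def xor_with_pad(data, pad):
    """Encrypt or decrypt by XOR with a precomputed pad of sufficient length."""
    if len(data) > len(pad):
        raise ValueError("pad shorter than data")
    return bytes(a ^ b for a, b in zip(data, pad))


def precompute_pool(key, counters):
    """Precompute pads for an assigned counter range, keyed by counter."""
    return {c: keystream(key, c) for c in counters}
```

Because the pad depends only on the key and the counter, it can be computed before the data arrives, so the work left on the critical path is a single XOR.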

Design of Active Set Top Box in a Wireless Network for Scalable Streaming Services

H. K. Lee, V. Hall, K. H. Yum, K. I. Kim and E. J. Kim

Proceedings of the 2007 International Conference on Image Processing (ICIP), San Antonio, 2007

The popularity of multimedia streaming services over wireless home networks poses major challenges for improving the quality of services delivered through a set top box (STB). Even though scalable methods have been suggested to enhance the quality of multimedia streaming services, providing scalable streaming services in wireless home networks remains challenging. Previous studies on scalable streaming services eliminate the corrupted stream at the multimedia client. In this paper, we propose a new method, ActiveSTB, which removes distorted or unsuitable multimedia data early to save scarce resources. We use a network simulation tool, NS-2, to evaluate our method over various ranges of cross traffic and error rates. The simulation results show that ActiveSTB supports 12.5% to 40% more packets than the original STB for all ranges of cross traffic and error rates.

A Randomly Ordered Activation and Layering Protocol for Ensuring K-Coverage in Wireless Sensor Networks

H. Kim, E. J. Kim, and K. H. Yum

Proceedings of the 3rd International Conference on Wireless and Mobile Communications, ICWMC '07, March 2007, Guadeloupe, French Caribbean.

K-coverage in wireless sensor networks (WSNs) is defined as ensuring that every point in the area is monitored by at least K different sensor nodes. In this paper, we propose a new K-coverage algorithm for sensor networks, called Randomly Ordered Activation and Layering (ROAL), that solves the K-coverage problem in a small constant time in a distributed manner while providing simple and efficient dynamic reconfiguration for WSNs. The simulation results show that ROAL can provide K-coverage with less than 5% of the area uncovered when a sufficient number of sensor nodes is available, and that the lifetime of the sensor network is extended by more than 400%.
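The K-coverage definition and the "uncovered area" metric can be made concrete with a small check. This sketch is not the ROAL protocol; it assumes a disk sensing model with a common sensing range and estimates the uncovered fraction by sampling a regular grid, and all names are hypothetical.

```python
import math


def coverage_degree(point, sensors, sensing_range):
    """Number of sensors whose sensing disk contains the point."""
    return sum(1 for s in sensors if math.dist(point, s) <= sensing_range)


def uncovered_fraction(sensors, sensing_range, k, width, height, step):
    """Fraction of grid sample points monitored by fewer than k sensors."""
    nx = int(round(width / step)) + 1
    ny = int(round(height / step)) + 1
    uncovered = 0
    for i in range(nx):
        for j in range(ny):
            point = (i * step, j * step)
            if coverage_degree(point, sensors, sensing_range) < k:
                uncovered += 1
    return uncovered / (nx * ny)
```

For example, with sensors at (0, 0), (1, 0) and (0, 1) and a sensing range of 1.5, every sampled point of the unit square is 3-covered, so the uncovered fraction for K = 3 is zero.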

A Domain-Specific On-Chip Network Design for Large Scale Cache Systems

Y. Jin, E. J. Kim, and K. H. Yum

Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA-13), Phoenix, 2007

As circuit integration technology advances, the design of efficient interconnects has become critical. On-chip networks have been adopted to overcome the scalability and poor resource-sharing problems of shared buses and dedicated wires. However, using a general on-chip network for a specific domain may cause underutilization of the network resources and large network delays because the interconnects are not optimized for the domain. Addressing these two issues is challenging because in-depth knowledge of both interconnects and the specific domain is required. Recently proposed Non-Uniform Cache Architectures (NUCAs) use wormhole-routed 2D mesh networks to improve the performance of on-chip L2 caches. We observe that network resources in NUCAs are underutilized and occupy considerable chip area (52% of cache area). The network delay is also large (63% of cache access time). Motivated by these observations, we investigate how to optimize cache operations and design the network in large scale cache systems. We propose a single-cycle router architecture that can efficiently support multicasting in on-chip caches. Next, we present Fast-LRU replacement, where cache replacement overlaps with data request delivery. Finally, we propose a deadlock-free XYX routing algorithm and a new halo network topology to minimize the number of links in the network. Simulation results show that our networked cache system improves the average IPC by 38% over the mesh network design with Multicast Promotion replacement while using only 23% of the interconnection area. Specifically, Multicast Fast-LRU replacement improves the average IPC by 20% compared with Multicast Promotion replacement. A halo topology design additionally improves the average IPC by 18% over a mesh topology.

Bandwidth Estimation In Wireless LANs For Multimedia Streaming Services

H. K. Lee, V. Hall, K. H. Yum, K. I. Kim and E. J. Kim

Proceedings of the 2006 International Conference on Multimedia & Expo (ICME), Toronto, 2006

The popularity of multimedia streaming services over wireless networks presents major challenges in the management of network bandwidth. One challenge is to quickly and precisely estimate the available bandwidth in order to decide the streaming rates of layered and scalable multimedia services. Previous methods designed for wired networks are too burdensome to apply to multimedia applications in wireless networks. In this paper, we propose a new method, IdleGap, which estimates the available bandwidth of a wireless LAN based on information from a low layer in the protocol stack. We use a network simulation tool, NS-2, to evaluate our new method over various ranges of cross traffic and observation times. Our simulation results show that IdleGap accurately estimates the available bandwidth for all ranges of cross traffic (100 Kbps to 1 Mbps) with a very short observation time of 10 seconds.

A PROactive Request Distribution (PRORD) Using Web Log Mining in Cluster-Based Web Server

H. K. Lee, G. Vageesan, K. H. Yum and E. J. Kim

Proceedings of the 2006 International Conference on Parallel Processing (ICPP), Columbus, 2006.

Widely adopted distributor-based systems forward user requests to a balanced set of waiting servers in complete transparency to the users. The policy employed in forwarding requests from the front-end distributor to the back-end servers plays an important role in overall system performance. The locality-aware request distribution (LARD) scheme improves the system response time by having requests serviced by the web servers that hold the data in their caches. In this paper, we propose proactive request distribution (PRORD), which applies intelligent proactive distribution at the front end and complementary pre-fetching at the back-end server nodes to bring the data into their caches. The pre-fetching scheme fetches web pages into memory in advance based on a confidence value for each page, which is predicted by the proactive distribution scheme. The proactive distribution depends on both online and offline analysis of the website log files, which capture user navigation patterns on the website. Designed to work with prevailing web technologies such as HTTP 1.1, our scheme aims to reduce the response time seen by users. Simulations carried out with traces derived from the log files of real web servers show a performance improvement of 15-45% compared to existing distribution policies.
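One simple way to derive a per-page confidence value from navigation logs is a first-order transition estimate, sketched below. This is an assumption made for illustration and may differ from the mining technique used in the paper; the function names and the session format (a list of page paths per user session) are hypothetical.

```python
from collections import Counter, defaultdict


def transition_confidence(sessions):
    """Estimate P(next page | current page) from observed page sequences."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        current: {page: n / sum(c.values()) for page, n in c.items()}
        for current, c in counts.items()
    }


def pages_to_prefetch(confidence, current_page, min_confidence):
    """Pages worth prefetching after current_page, most likely first."""
    candidates = confidence.get(current_page, {})
    chosen = [p for p, conf in candidates.items() if conf >= min_confidence]
    return sorted(chosen, key=lambda p: -candidates[p])
```

A back-end node could call `pages_to_prefetch` when it serves a page and load the returned pages into its cache; the threshold controls how aggressively memory is spent on speculative fetches.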

Assuring K-Coverage in the Presence of Mobility in Wireless Sensor Networks

H. Yu, J. Iyer, H. Kim, E. J. Kim, K. H. Yum and P. S. Mah

Proceedings of IEEE GLOBECOM 2006

Along with energy conservation, maintaining a desired degree of coverage has been a critical issue in wireless sensor networks (WSNs), especially in a mobile environment. By enhancing a variant of the Random Waypoint (RWP) model [1], we propose Mobility Resilient Coverage Control (MRCC) to assure K-coverage in the presence of mobility. Our basic goals are 1) to characterize the probability of breaking K-coverage in terms of moving-in and moving-out probabilities, and 2) to issue wake-up calls to sleeping sensors to meet the user requirement of K-coverage even in the presence of mobility. Furthermore, by separating the mobility behavior into average and individual components, the probability of breaking K-coverage can be calculated precisely, which reduces the number of sensors that must be awakened. Our experiments with NS-2 show that MRCC with the individual probability achieves 1.4% better coverage with 22% fewer active sensors than the existing Coverage Configuration Protocol (CCP) [2].