Thread Migration for High-Performance Distributed Computing

Dr. Steve J. Chapin, Department of Mathematics and Computer Science, Kent State University; Dr. Janche Sang, Dr. Ben A. Blake and Dr. Chien-Hua Lin, Department of Computer and Information Science, Cleveland State University


Contact Dr. Steve J. Chapin or Dr. Janche Sang via e-mail for more details.


Objectives:

Conventional systems software, notably UNIX, supports the notion of a process: a single instruction stream within an address space. This model has two major limitations. First, each process carries a large amount of state, including page tables, file descriptors, and register values, that must be manipulated during execution. This volume of state makes processes expensive to create, synchronize, and maintain. Second, the process structure is well suited to conventional uniprocessor systems such as workstations, but not to multiprocessor systems. To address these problems, recent operating systems and languages support lightweight processes, or threads, as the basic representation of concurrent computational entities. Threads are multiple sequences of instruction execution within a single address space. Their advantage over conventional processes is that the per-thread state is small, so the runtime overhead of managing that state is lower.

On multiprocessor systems, spreading threads over several processors can exploit parallelism and thus improve performance. However, two factors can erode the performance gains of multithreading in these systems. The first factor is load imbalance.

During program execution, a dense cluster of threads may reside on a single processor while only a few threads exist on the others. Redistributing threads so that heavily loaded processors shed work to lightly loaded ones gives the system an opportunity to achieve better overall throughput. The second factor is non-local data access. During a threaded program's execution, threads will often access remote data, which requires inter-processor communication. If a thread accesses remote data frequently, relocating it to the site that hosts that data can reduce inter-processor communication traffic. These two factors provide the impetus to develop a dynamic migration capability for threads. In the past, thread migration in distributed applications was hindered by high communication latency. High-speed networks can make thread migration much cheaper, and therefore more attractive to programmers.
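The two factors above suggest two simple migration triggers: move a thread off an overloaded processor, or move a thread toward the processor that holds most of its data. The Python sketch below is hypothetical. The `Thread` record, the per-node access counts, and the thresholds `min_fraction` and `max_imbalance` are illustrative assumptions, not part of the proposed system, and load is measured crudely as the number of resident threads.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Thread:
    tid: int
    node: int  # processor currently hosting the thread
    # Hypothetical profile: node id -> number of data accesses served by that node
    accesses_by_node: Counter = field(default_factory=Counter)


def pick_locality_migration(thread, min_fraction=0.5):
    """Return the node that serves most of this thread's accesses, if that node
    is not its current host and its share is at least min_fraction; else None."""
    total = sum(thread.accesses_by_node.values())
    if total == 0:
        return None
    target, count = thread.accesses_by_node.most_common(1)[0]
    if target != thread.node and count / total >= min_fraction:
        return target
    return None


def pick_load_migration(threads, nodes, max_imbalance=2):
    """If the busiest and idlest nodes differ by more than max_imbalance threads,
    return (thread, destination); otherwise None. Load = resident thread count."""
    load = {n: 0 for n in nodes}
    for t in threads:
        load[t.node] += 1
    busiest = max(nodes, key=lambda n: load[n])
    idlest = min(nodes, key=lambda n: load[n])
    if load[busiest] - load[idlest] <= max_imbalance:
        return None
    # Shed the thread that makes the fewest accesses to data on the busy node,
    # so that balancing the load costs as little locality as possible.
    candidate = min((t for t in threads if t.node == busiest),
                    key=lambda t: t.accesses_by_node[busiest])
    return candidate, idlest


# Usage: four threads crowded on node 0, one on node 1.
threads = [
    Thread(0, 0, Counter({0: 10})),
    Thread(1, 0, Counter({0: 2, 1: 8})),
    Thread(2, 0, Counter({0: 5})),
    Thread(3, 0, Counter({0: 7})),
    Thread(4, 1, Counter({1: 3})),
]
moved, dest = pick_load_migration(threads, [0, 1])
print(moved.tid, dest)                       # prints "1 1"
print(pick_locality_migration(threads[1]))   # prints "1"
print(pick_locality_migration(threads[0]))   # prints "None"
```

Choosing which thread to shed is where the two goals can pull against each other: here the balancer prefers the thread with the weakest tie to the overloaded node's data, which in this example is also the thread the locality trigger would have moved anyway.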

Methodology:

Faculty at Kent State and Cleveland State will pursue cooperative research to build a concurrent programming environment that uses threads on a distributed system of workstations and small-scale parallel processors connected by an ATM network. We will investigate techniques for migrating threads efficiently between heterogeneous machines, and schemes for balancing load across processors and increasing locality of reference.

This environment is designed for the needs of high-performance distributed computing, such as numerical computation and simulation; therefore, efficiency is our major concern. Our second goal is to build on our multithreaded library to design and implement a high-performance parallel simulation system with a graphical user interface. This system is based on the mobile-process approach we proposed previously, and can be used in a variety of computationally intensive applications such as molecular dynamics in chemical physics, battle management models, and genetic algorithms.

In parallel simulation, the need to visualize simulation results and to interact with the application while it is running has been ignored. Dynamic visualization is especially useful in long-running simulations, because it lets users correct some parameters online so that the remainder of the run produces useful results.

Drs. Chapin, Sang, and Blake will be primarily responsible for the thread support software. Drs. Lin and Sang will be primarily responsible for the simulation system.

Dr. Chapin is being funded by the Department of Energy to research task migration support for heterogeneous hardware and software. Dr. Sang's current research emphasis is on the use of portable thread libraries for parallel computing and simulation, transaction processing, software testing, and practical parallel algorithms. Dr. Lin's current research interests include wireless computer networks, software reliability, and neural networks.

We plan to design and implement a portable threads system that supports thread migration between workstations clustered over an ATM network. To implement thread migration in a heterogeneous environment, we need enough knowledge of the underlying systems to generate a machine-independent representation of a thread's running state. This representation can be transferred from a source machine to a destination machine, where it is used to regenerate an equivalent state for the migrant thread.
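The requirement that a thread's running state be expressed machine-independently can be illustrated with a deliberately small Python sketch. It assumes a hypothetical `PortableThreadState` record in which the thread is stopped at a numbered migration point (`resume_point`) and its live variables are already known; every name here is illustrative, not the representation being developed in the project. The technique shown is a standard one: write every value at a fixed width in a canonical big-endian byte order, so that a little-endian source and a big-endian destination interpret the same bytes the same way. Capturing stack frames, pointers, and heap data, the hard parts of real heterogeneous migration, is out of scope.

```python
import struct
from dataclasses import dataclass


@dataclass
class PortableThreadState:
    thread_id: int
    resume_point: int    # logical migration point in the program, not a raw PC
    ints: list           # live integer variables (each fits in 64 bits)
    floats: list         # live floating-point variables (IEEE 754 doubles)


# Fixed-width, big-endian ("network order") header: id, resume point, #ints, #floats.
# The ">" prefix also disables native alignment padding, so the layout is the
# same on every host regardless of its byte order or word size.
_HEADER = struct.Struct(">IIHH")


def encode(state):
    """Serialize a state record into a machine-independent byte string."""
    header = _HEADER.pack(state.thread_id, state.resume_point,
                          len(state.ints), len(state.floats))
    ints = struct.pack(f">{len(state.ints)}q", *state.ints)
    floats = struct.pack(f">{len(state.floats)}d", *state.floats)
    return header + ints + floats


def decode(data):
    """Rebuild an equivalent state record on the destination machine."""
    tid, resume, n_ints, n_floats = _HEADER.unpack_from(data, 0)
    offset = _HEADER.size
    ints = list(struct.unpack_from(f">{n_ints}q", data, offset))
    offset += struct.calcsize(f">{n_ints}q")
    floats = list(struct.unpack_from(f">{n_floats}d", data, offset))
    return PortableThreadState(tid, resume, ints, floats)


# Usage: round-trip a small state record.
src = PortableThreadState(thread_id=7, resume_point=3, ints=[42, -1], floats=[2.5])
wire = encode(src)
print(len(wire))            # prints "36"  (12-byte header + 2*8 + 1*8)
print(decode(wire) == src)  # prints "True"
```

Recording a logical resume point instead of a raw program counter is deliberate: code addresses differ between heterogeneous machines, so the destination would have to map the point number to its own code, a step this sketch does not model.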

Our research will take advantage of Dr. Chapin's ongoing work for the Department of Energy, which aims to develop a machine-independent task representation. We will also develop schemes for partitioning tasks and data and mapping them onto processors, for both scheduling and load balancing.

We intend to measure performance experimentally, and to quantify data locality and workload in distributed simulation analytically.

We will design and conduct experiments to provide empirical data on the evaluation of the locality and load balancing policies. The goal of these experiments will be to answer the following questions:

  1. On a high-speed network, how do locality management and load balancing policies affect application performance?
  2. Which factor has a larger impact?
  3. Is it possible to optimize both data locality and load balance, or will increasing locality severely imbalance the load, and vice versa?

The prototype system will be implemented and tested locally at CSU and KSU, using OCARnet for connectivity between the two sites. We will use sample applications from the Liquid Crystal Institute (LCI) at Kent State to benchmark and test our system. When the system is complete and has been tested, it will be ported to machines at the Ohio Supercomputer Center, and also made available to the other OCARnet sites.

In addition, we will continue to pursue collaboration with external organizations such as Los Alamos National Laboratory and Sandia National Laboratories.

Prior Work of Investigators:

Dr. Blake's research background is in task scheduling for distributed systems, and his work applies directly to the problem of load balancing for threads in distributed systems. Dr. Chapin has a nine-month, $84,000 research contract with the Department of Energy, starting 1 January 1996 and renewable for two more years. Under this contract, he is researching task migration for heterogeneous operating systems, work that is directly related to this project. His past research experience is in operating systems, computer networking, and distributed scheduling. Dr. Lin's recent project, "Wireless Internet Access Using Spread Spectrum Technology," was supported by the NASA Lewis Research Center, and a proposal for its continuation is under review. Dr. Sang has received the Research and Creative Activities Award from the CSU Graduate Council for research on enhancing data locality to speed up distributed simulations.

In addition to the core work described above, OCARnet will also be useful to other faculty in related areas. For example, Paul Farrell and Arden Ruttan of Kent State will use the proposed equipment to extend their collaborative project with the Liquid Crystal Institute at Kent, undertaken as part of the theory effort of the ALCOM NSF Science and Technology Center. This project involves an environment for computational steering and visualization, applied to modeling the behavior of liquid crystals in two- and three-dimensional regions by minimizing the Landau-de Gennes free energy. This requires solving for a symmetric traceless tensor with five independent components. The project will use the workstation cluster for algorithm and code development and for initial testing. Once code development is complete, full-scale computations will be run on the machines at OSC, and the ATM network will be used to allow interactive visualization and steering of the computations from the Hewlett-Packard workstations at Kent.
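The count of five components follows from a short argument. A symmetric 3-by-3 tensor has six independent entries (three diagonal, three off-diagonal), and requiring a zero trace fixes one diagonal entry in terms of the other two, leaving 6 - 1 = 5. One way to write such a tensor, using illustrative labels q_1 through q_5 that are not taken from the original project, is:

```latex
Q = \begin{pmatrix}
  q_1 & q_2 & q_3 \\
  q_2 & q_4 & q_5 \\
  q_3 & q_5 & -(q_1 + q_4)
\end{pmatrix},
\qquad Q = Q^{\mathsf{T}},
\qquad \operatorname{tr} Q = q_1 + q_4 - (q_1 + q_4) = 0 .
```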

Paul Wang of Kent State is leading several research projects in parallel and distributed symbolic computing; several of them will use the workstation cluster, as well as the high-speed connection to facilities at the other institutions, especially the Ohio Supercomputer Center. He and his research group have conducted research in parallelism and symbolic computation for a number of years with support from the National Science Foundation and the Army Research Office. This research has focused on polynomial operations, including polynomial GCD (greatest common divisor) computation and polynomial factoring.