Associative Cluster Computing (ACC)

 

 

Authors:

Dr. Jerry Potter, Department of Math & Computer Science, Kent State University, Kent, OH 44242

Srinivas Rajagopalan, Department of Math & Computer Science, Kent State University, Kent, OH 44242

 

Abstract:

The Associative Computing paradigm allows the development of new, innovative algorithms that take direct advantage of massive parallelism. The computational power provided by PC clusters can be better utilized by providing the Associative Computing model on top of those clusters. This paper discusses the Associative Cluster Computing model, which combines the best of both worlds.

 

Cluster Computing:

Cluster computing [1] exploits the computational power of PC clusters, which now exceeds, at a fraction of the hardware cost, the power that supercomputers provided only a few years ago. But more effort is needed to reduce programming complexity and software costs, and to increase the effective parallelism of the cluster computing environment. We feel that effective parallelism cannot be achieved by hardware duplication using conventional algorithms, programming models, and data representation models. This opinion applies to clusters as well, since they are a form of parallel computing.

The problem:

The basic von Neumann model of computation is a centralized processor shared among the elements of a single monolithic address space of data. (We avoid the use of RAM or other models because the issue is fundamental to all of conventional computing.) We will refer to the address space and the data it contains as the von Neumann address space. In this model, the CPU is a relatively expensive common resource, and a program is the means whereby the data stored in inexpensive memory share the processor. While the storing of both the program and the data in the same memory is an important aspect of the von Neumann model, we will not address it here; that is, for this analysis, we assume that there is no memory contention between program and data. Consequently, in this basic von Neumann model, the programmer has sole discretion to move and store data in memory as he chooses.

            In the early days, when CPUs and memory were "evenly" matched and the OS consisted of little more than a bootstrap program, there was essentially a one-to-one mapping of the address space model onto the hardware's memory. In modern operating systems, however, memory is a shared resource. That is, multitasking OSs impose a model of computing on top of the von Neumann model in which the physical memory is a single resource shared by the multiple tasks executed on the single CPU. Yet, because of virtual memory mapping, the von Neumann address space model is still used by application algorithm developers and programmers.

            The abstract concept of multitasking is easily transferred to cluster and parallel computing. In essence, parallel hardware allows multiple independent tasks to be executed simultaneously. As long as the OS ensures that independent tasks are being executed, only straightforward modifications to multitasking algorithms are needed to handle multiple CPUs. However, a major purpose of cluster and parallel computers is to address those problems that cannot be executed in an acceptable amount of time on a single computer. That is, by definition, the problems that we want to execute on a cluster are those tasks that are not mutually independent and therefore share at least one common resource: the von Neumann address space.

            Because parallel and cluster computing under the von Neumann address space model forces the partitioning of this shared resource, a major change is induced in the nature of the computation. Specifically, the problem's von Neumann address space must now be mapped onto multiple tasks on multiple processors, each with its own virtual address space, using OS techniques such as semaphores, barriers, and rendezvous to coordinate access to the single von Neumann address space resource. In the typical algorithm development model for parallel programs, the partitioning of the global von Neumann address space into virtual address spaces is done by imposing OS resource-sharing mechanisms on top of the von Neumann address space model. This is a poor programming model and is one reason that many consider parallel programming difficult.

            Dynamic jobs such as air traffic control and dynamic data mining require that the data be initially distributed and terminally retrieved; in addition, the dynamic and global nature of the data requires that the address space be reorganized (retrieved, sorted, and redistributed) whenever new data is added to the system. This requires the dynamic scheduling of n processors over a single shared resource: the global von Neumann address space. Over the years, considerable research has been conducted on the nature of special-purpose OSs, particularly for real-time command and control applications. The conclusion is that dynamic scheduling of a shared resource is NP-hard [2]. Thus the conclusion must be that the problems of interest that remain to be solved using the shared-resource von Neumann model on a cluster probably require dynamic resource allocation and are therefore NP-hard under the von Neumann model. This conclusion is supported by empirical evidence such as the Air Traffic Control (ATC) problem: literally billions of dollars have been spent trying to provide greater capacity for the ATC system using multitasking parallelism under the von Neumann model, to no avail.

 

            It is important to point out that it is the parallel algorithms based on the von Neumann model that are NP-hard, not the problems the algorithms address. (Indeed, the ATC problem was solved in the 1970s by a non-von Neumann model in an associative computing environment [3].)

 

Problem Existence in Cluster Computing:

            Parallel computers have always been plagued by I/O bottlenecks. The cluster I/O problem is the same as the parallel I/O problem because it is due in part to the shared-resource aspect of the von Neumann model. In the case of clusters, the problem is most dominant in the form of disk I/O management. That is, spreading a single file, or a single logical database perhaps composed of multiple files (i.e., a single address space), over many disk drives in a cluster creates a shared-resource database and a corresponding I/O problem that is not present on a single-processor computer.

 

 

Associative Computing:

            Associative computing seeks to achieve computational power not just by hardware duplication, but through associative parallel algorithms, associative programming models, and associative data representations [4][5]. In the associative model, each processor-memory pair is an associative cell with its own unique address space. Thus, in the associative model, cells and their corresponding data can be identified and accessed by content. A global address is not needed, so data can be located anywhere in an associative processor, and there is no need to sort or order the data. Since each cell has its own processor, data movement can be minimized and the classic memory-to-processor bottleneck is greatly reduced. In the associative model, the data is sedentary; the program comes to it. In problems where the data is vastly larger than the program, this is a desirable approach.
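
            As a concrete, greatly simplified sketch of this cell model, the Python fragment below holds one record per cell and selects responders purely by content. The names Cell and broadcast_query are illustrative only and are not part of any existing ASC implementation; the broadcast is simulated with a loop.

# Illustrative sketch: each associative cell is a processor-memory pair that
# holds its record locally and answers broadcast queries by content.

class Cell:
    def __init__(self, record):
        self.record = record      # the data stays in the cell; no global address is needed
        self.responder = False    # set when this cell matches a broadcast query

def broadcast_query(cells, **pattern):
    # Every cell compares the broadcast pattern against its own data
    # (conceptually in parallel; simulated here with a loop).
    for cell in cells:
        cell.responder = all(cell.record.get(field) == value
                             for field, value in pattern.items())
    return [cell for cell in cells if cell.responder]

# The records may sit in any cell, in any order: no sorting, no shared address space.
cells = [Cell({"id": 7, "status": "active"}),
         Cell({"id": 3, "status": "idle"}),
         Cell({"id": 9, "status": "active"})]
responders = broadcast_query(cells, status="active")   # cells selected purely by content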

 

Proposed solution:

We feel that in a cluster system, a cluster node, i.e., a computer and its disks, can be thought of as an associative cell. The intent is to apply the associative model to maximize the cluster computing power and minimize the cluster I/O problem. In the associative computing model, it is assumed that there is much more data than program. Thus the program is sent to the individual cells in the cluster so that the data need not be moved.

            In addition to homogeneous clusters, clusters of heterogeneous computers can be handled, as the Heterogeneous Associative Computing model shows [6]. Most of today's clusters consist of homogeneous systems. However, heterogeneous clusters have the advantage that they can be built out of whatever processors are available. In addition, if a cluster must address problems that require a special capability, such as a dedicated visualization node, the heterogeneous associative model would support it.

            The ACC architecture is depicted below as a stack of three layers:

Associative Computing Paradigm (ACP)

Logical Cluster Computing Environment (LCCE)

Physical Cluster

 

The top layer, the associative computing paradigm (ACP), provides end users with an easy-to-use programming and data representation model that allows them to achieve maximal parallelism in the cluster environment. The logical cluster computing environment (LCCE) seamlessly maps the associative cells of the ACP layer onto the computers that form the cluster in the physical layer. The physical cluster layer is the actual network of computers.

            The middle layer is the key part of this architecture. Each associative cell in this proposed model represents a node in the cluster. Different subsets of nodes in the cluster can be part of different associations. New associative cells can easily be added to the cluster at this level; that is, new hardware in the physical cluster can easily be added as new cells in the LCCE layer, as the sketch below suggests.
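
            One way the LCCE layer could be realized (a sketch only; the data structures and host names below are hypothetical and not tied to any particular cluster toolkit) is as a registry that maps logical cells onto physical nodes and groups them into associations:

# Sketch of an LCCE-style registry (illustrative data structures only).
lcce = {
    "cells": {},          # cell id -> physical node (e.g., a host name)
    "associations": {},   # association name -> set of cell ids
}

def add_cell(cell_id, host):
    # Adding new hardware to the physical cluster simply adds a new cell here.
    lcce["cells"][cell_id] = host

def join_association(cell_id, association):
    # An association is just a named subset of cells; a cell may belong to several.
    lcce["associations"].setdefault(association, set()).add(cell_id)

add_cell(2, "node02.cluster.local")      # hypothetical host names
add_cell(5, "node05.cluster.local")
join_association(2, "cars")
join_association(5, "cars")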

            The individual cells in a cluster are much more powerful than the cells of a typical associative machine. We can make use of this power as follows. Whereas associative cells typically store only data, we can also store subprograms in these cells along with the data they operate on. High-level associative instructions can then be broadcast to the cells. These "meta" instructions cause the nodes to execute the subprograms stored in the cell's memory. The only requirement is that every cell that responds to a meta instruction must hold the subprogram as well as the data required by that instruction. For a given associative program, one of the nodes in the cluster acts as the meta instruction stream controller.
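
            A rough sketch of how such a meta instruction might be dispatched is given below; the broadcast is simulated with a loop, and the function and field names are our own rather than an existing API.

# Sketch of meta-instruction dispatch (illustrative only).  Each node stores
# subprograms alongside the data they operate on; the controller broadcasts the
# name of a meta instruction, and only the nodes that hold both the subprogram
# and its data execute it locally, so the data never leaves the node.

def average(values):                      # an example subprogram
    return sum(values) / len(values)

nodes = [
    {"subprograms": {"summarize": average}, "data": [4.0, 6.0]},  # holds program and data
    {"subprograms": {}, "data": [1.0, 2.0]},                      # data only: does not respond
]

def broadcast_meta(nodes, instruction):
    results = {}
    for node_id, node in enumerate(nodes):
        subprogram = node["subprograms"].get(instruction)
        if subprogram is not None and node["data"]:
            results[node_id] = subprogram(node["data"])   # executed on the node's local data
    return results

results = broadcast_meta(nodes, "summarize")              # only node 0 responds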

Node 0 (controller node): the associative cluster program being executed (a mixture of simple and meta instructions)

Node 1: Subprogram 1, and the data for subprogram 1

Node 2: red, small, fast, Toyota

Node 3: Subprogram 1, and the data for subprogram 1

Node 4: Subprogram 2, and the data for subprogram 2

Node 5: green, medium, fast, Honda

Node 6: blue, large, slow, Buick

Figure 1 - ACC

 

            Figure 1 provides a simple illustration of an associative cluster computing program. In this example, there are seven nodes. Node 0 acts as the controller node for the associative cluster program. Nodes 2, 5, and 6 form the three cells of one association, which contains only data. Nodes 1 and 3 are cells of another association, and node 4 is part of yet another association. Upon receiving a meta instruction from the controller node equivalent to "Find the fast cars," the associative cells represented by nodes 2 and 5 will respond and perform the subsequent commands that are broadcast.
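
            In code, the query of Figure 1 might look like the following sketch (again, the broadcast is simulated with a simple loop, and the record layout merely mirrors the figure):

# Sketch of the Figure 1 query "Find the fast cars" (illustrative only).
# Nodes 2, 5, and 6 form the data-only association of car records.
car_cells = {
    2: {"color": "red",   "size": "small",  "speed": "fast", "make": "Toyota"},
    5: {"color": "green", "size": "medium", "speed": "fast", "make": "Honda"},
    6: {"color": "blue",  "size": "large",  "speed": "slow", "make": "Buick"},
}

# Broadcast the query; each cell checks its own record by content.
responders = [node for node, record in car_cells.items() if record["speed"] == "fast"]
# responders == [2, 5]; only these cells act on the commands that follow.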

 

Further research:

 

Currently we have developed the initial design of the ACC system and are in the process of implementing it. The resulting cluster computing environment will then be evaluated and used as a test bed for further high-performance associative algorithm development.

 

References:

1. "Cluster Computing White Paper" – Mark Baker (Ed.), University of Portsmouth, UK, 2000.

2. "Real-Time and Embedded Systems" – J. A. Stankovic, in The Computer Science and Engineering Handbook, A. B. Tucker, Jr., Ed., CRC Press, 1997, pp. 1709-1724.

3. "ATC Computers - Yesterday, Today, Tomorrow" – W. C. Meilander and J. Baker, in Proceedings of the 1998 43rd Annual Air Traffic Control Association Fall Conference, 1998, pp. 91-95.

4. "ASC – An Associative Computing Paradigm" – Jerry Potter et al., Computer, November 1994, pp. 19ff.

5. "Associative Computing on the Connection Machine" – Srinivas Rajagopalan, Master's Thesis, Department of Math and Computer Science, Kent State University, Kent, OH, 1991.

6. "Heterogeneous Associative Computing" – Jerry Potter and Stephen Scott, Artech House, Inc., 1996.

7. "Heterogeneous Associative Computing" – in Proceedings of the Workshop on Heterogeneous Processing, Newport Beach, Calif., April 13-16, 1993, pp. 3-11.