#### **GAPP Overview**

- Developed in 1982-1984 by Martin Marietta Aerospace, manufactured by NCR Corporation
  - GAPP II had 6x12=72 cells/PEs per chip
  - Largest (?) GAPP produced contained 288x288=82,944 processing elements
- Hord does not mention this, but the GAPP can be thought of as a systolic array!!

## **GAPP Cell / PE Overview**

- GAPP II had 6x12=72 cells/PEs per chip
- Cells are simple to allow many cells per chip: four 1-bit registers, a 1-bit full adder / subtractor (FAS), and 128 bits of memory
  - FAS gets inputs from 3 of the registers: North-South (NS), East-West (EW), and carry / borrow (C)
    - NS register connects to NS registers in north and south neighbors, EW register...
  - FAS implements a truth table to perform all arithmetic and logical functions in bit-serial fashion
  - FAS outputs are sum (SM), carry (CY), and borrow (BW)
  - Control signals to select data paths, addresses to select memory element
    - SIMD, so all cells do same thing
    - Cells can be deactivated
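
To make the bit-serial idea concrete, here is a minimal sketch of how a single cell could add two multi-bit numbers one bit per step, using only a 1-bit adder and the C register. The function names (`full_adder`, `bit_serial_add`, `to_bits`, `from_bits`) and the least-significant-bit-first ordering are assumptions made for illustration; they are not GAPP instructions.

```python
def full_adder(ns, ew, c):
    """1-bit full adder: returns (sum bit, carry-out bit)."""
    sm = ns ^ ew ^ c
    cy = (ns & ew) | (ns & c) | (ew & c)
    return sm, cy


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length bit lists (least significant bit first),
    one bit per step, as a single cell might."""
    c = 0                              # C register starts cleared
    result = []
    for a, b in zip(a_bits, b_bits):
        ns, ew = a, b                  # operand bits loaded into NS and EW
        sm, c = full_adder(ns, ew, c)  # carry goes back into C
        result.append(sm)              # sum bit stored (e.g. in cell RAM)
    result.append(c)                   # final carry is the top bit
    return result


def to_bits(value, width):
    """Hypothetical helper: integer -> LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `from_bits(bit_serial_add(to_bits(13, 8), to_bits(200, 8)))` evaluates to 213. In a SIMD array, every active cell would run the same loop on its own operand bits at the same time.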

Fall 1999, Lecture 13


## GAPP PE Interconnection (Within a 6x12 PE Chip)


- Each GAPP chip has 6 sets of 12-bit NS shift registers, and 12 sets of 6-bit EW shift registers
  - Contents of EW registers can be transferred to NS registers and vice-versa
  - Each cell also has a one-bit CM register (communication), and thus each chip has 6 sets of 12-bit CM registers
    - Input enters chip at south, output flows north through CM registers
- On every instruction, a 72-bit "plane" of data involving NS, EW, C, CM, and/or RAM planes can move around in the chip
  - Edges of the NS, EW, and CM planes exit the chip in the obvious directions, through 6-bit ports on n & s edges and 12-bit ports on e & w edges
  - Can have as many as three simultaneous inputs and three simultaneous outputs
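
The plane movement described above can be sketched as a shift of a 12-row by 6-column bit grid. This is only an illustration under stated assumptions: the grid orientation (row 0 at the north edge, column 0 at the west edge), the single shift direction shown per plane, and the function names `shift_north` and `shift_east` are all hypothetical.

```python
ROWS, COLS = 12, 6  # 72 cells: 6-bit north/south ports, 12-bit east/west ports


def shift_north(plane, south_in):
    """Shift a plane one row north.

    plane: list of ROWS rows (row 0 = north edge), each a list of COLS bits.
    south_in: COLS bits entering at the south port.
    Returns (new_plane, north_out), where north_out leaves via the north port.
    """
    north_out = plane[0]
    new_plane = plane[1:] + [list(south_in)]
    return new_plane, north_out


def shift_east(plane, west_in):
    """Shift a plane one column east.

    west_in: ROWS bits entering at the west port.
    Returns (new_plane, east_out), where east_out leaves via the east port.
    """
    east_out = [row[-1] for row in plane]
    new_plane = [[w] + row[:-1] for w, row in zip(west_in, plane)]
    return new_plane, east_out
```

On a multi-chip board, the bits returned as `north_out` or `east_out` would become the `south_in` or `west_in` of the neighboring chip.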

# **GAPP PE Array**

- Each <u>chip</u> connects to its 4 neighbors, with wrap-around on the edges
  - A 14"x16" board holds 48x132=6336 cells
- Operation:
  - Data enters at CM south port, flows north, leaves at CM north port
  - Simultaneous I/O possible
    - Result available at start of I/O operation
    - Plane of input data in external world
    - Algorithm must take N clocks, where N is size of array in north-south direction
    - Systolic operation:
      - Result loaded from RAM or register into CM plane, CM register plane shifts north, while data is output from northern edge of CM plane and more data is input from southern edge of CM plane
      - Continues for N clocks, and during this time other operations can occur that don't use the CM plane
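
The N-clock systolic I/O step can be sketched as below. It is a simplified model, assuming the CM plane is a list of rows with row 0 at the north edge; the function name `cm_systolic_io` and its arguments are hypothetical.

```python
def cm_systolic_io(result_plane, input_rows):
    """Shift a result plane out of the north edge while a new input
    plane enters at the south edge, one row per clock.

    result_plane: N rows already loaded into the CM plane (row 0 = north).
    input_rows:   N rows from the external world, in the order they are fed.
    Returns (cm_plane_after, output_rows).
    """
    n = len(result_plane)
    cm = [list(row) for row in result_plane]
    output_rows = []
    for clock in range(n):
        output_rows.append(cm[0])                 # north row leaves the array
        cm = cm[1:] + [list(input_rows[clock])]   # plane shifts north, new row enters south
    return cm, output_rows
```

After N clocks the whole result plane has left through the north edge and the CM plane holds the new input, with the first row fed sitting at the north edge. Since only the CM plane is used, the cells are free to compute on other planes during those N clocks.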


#### **GAPP PEs & Interconnection**

- PE connections
  - N/S and E/W lines to pass data to neighbors, plus CMN and CMS
  - 7 RAM address lines, 13 control lines
- PE internals (see figure)
  - 4 registers / latches: CM N/S E/W C
    - Inputs from different sources, fed by multiplexors controlled by 13 control lines
  - Full adder / subtractor
    - C, NS, EW inputs to multiplexors are the outputs of the registers
    - Sum (SM) output goes to RAM and to any of the 4 registers
    - Carry (CY) and borrow (BW) outputs go to C register
    - Truth table specifies various operations on data in NS and EW registers
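
As an illustration of the three FAS outputs, the sketch below tabulates SM, CY, and BW for every combination of NS, EW, and C. The operand order for subtraction (NS minus EW minus C) is an assumption, since the slides do not state it, and the function name `fas` is hypothetical.

```python
from itertools import product


def fas(ns, ew, c):
    """1-bit full adder / subtractor.

    Returns (SM, CY, BW):
      SM - sum bit (also the difference bit, since both are ns ^ ew ^ c)
      CY - carry out of ns + ew + c
      BW - borrow out of ns - ew - c (operand order is an assumption)
    """
    sm = ns ^ ew ^ c
    cy = (ns & ew) | (ns & c) | (ew & c)
    not_ns = 1 - ns
    bw = (not_ns & ew) | (not_ns & c) | (ew & c)
    return sm, cy, bw


def truth_table():
    """All 8 input combinations mapped to their (SM, CY, BW) outputs."""
    return {bits: fas(*bits) for bits in product((0, 1), repeat=3)}


if __name__ == "__main__":
    print("NS EW C | SM CY BW")
    for (ns, ew, c), (sm, cy, bw) in truth_table().items():
        print(f" {ns}  {ew} {c} |  {sm}  {cy}  {bw}")
```

Routing CY or BW back into the C register is what would let the same cell perform either addition or subtraction bit-serially.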
