Chapter 5 - Computer Systems Organization

(5.1) Introduction

Chapter 4 introduced the basic building blocks of a computer: transistors, gates, and circuits. This chapter puts those blocks together to build the major sub-systems of a computer.

(5.2) The Components of a Computer System

The earliest computing devices were programmed by punching up a deck of IBM cards and/or wiring up a plug-board. In 1946, John von Neumann suggested it would be better to store the program for a computing device in the same memory as its data so it could fetch its instructions at a fast rate and also operate on its own program if need be. We call this the stored program concept.

Figure 5.2 shows the major sub-systems of a computer with the Von Neumann Architecture. The architecture has a Memory holding both data and instructions to be executed by a computer; a Processor comprising an Arithmetic-Logic Unit (ALU) to operate on the data under control of a Control Unit executing instructions read from memory one at a time; and Input-Output devices to input data and programs from the outside world and to send results to the outside world.

(5.2.1) Memory and Cache: The state of each memory bit is held in a bi-stable device called a flip-flop. There are many different ways to implement a flip-flop from the basic logic gates - the following diagram shows a flip-flop built from a NOT-gate, an AND-gate, and an OR-gate:


Note the feedback path in this circuit - the output of the OR-gate is fed back to the AND-gate whose output feeds the OR-gate. To act like a memory every flip-flop needs a feedback path to hold the state of its stored bit.

Raising the S input of this flip-flop to the 1-value sets the state of its stored bit, Q, to the 1-state. Raising the R input to the 1-value resets the state of Q to the 0-state. When S = R = 0 the flip-flop acts like a memory and holds the current state of its stored bit, Q. The computer should never try to both set and reset the flip-flop at the same time; i.e., it should never raise both inputs S and R to the 1-value simultaneously.

There are only two inputs, S and R, in this circuit but to see how the flip-flop behaves we must also consider the current state of the stored bit, Q, and draw a truth table with 8 rows:

INPUTS   Current      Internal Lines               New state of
S   R    state of Q   a = NOT(R)   b = a AND Q     Q = S OR b
0   0        0            1             0               0
0   0        1            1             1               1
0   1        0            0             0               0
0   1        1            0             0               0
1   0        0            1             0               1
1   0        1            1             1               1
1   1        0                  Not Allowed
1   1        1                  Not Allowed

From the truth table we see that the flip-flop behaves like it should:

  1. When S = R = 0, the new state of Q equals the current state, so the flip-flop holds its stored bit.
  2. When S = 1 and R = 0, the new state of Q is 1 regardless of the current state (set).
  3. When S = 0 and R = 1, the new state of Q is 0 regardless of the current state (reset).
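A minimal sketch in Python (the function name and the test loop are ours, not the text's) reproduces this behavior directly from the three gates:

    def flip_flop_step(S, R, Q):
        # One step of the flip-flop: compute the new state of Q
        # from the inputs S and R and the current state of Q.
        assert not (S == 1 and R == 1), "raising S and R together is not allowed"
        a = 1 - R       # a = NOT(R)
        b = a & Q       # b = a AND Q (the feedback path)
        return S | b    # new state of Q = S OR b

    # Reproduce the six legal rows of the truth table:
    for S in (0, 1):
        for R in (0, 1):
            for Q in (0, 1):
                if not (S == 1 and R == 1):
                    print(S, R, Q, "->", flip_flop_step(S, R, Q))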
Random Access Memory: The following diagram shows a random access memory (RAM) with 2^N memory cells with each cell holding W bits. There wasn't any standard value for memory cell width in the early days - computers were built with W = 6, 8, 12, 16, 24, 30, 32, 36, 40, 48, and 60 bits. All modern computers use a cell width, W, of 8 bits and the 8-bit value stored in each cell is called a byte.
Each cell in the RAM is identified by a unique integer address: 0, 1, 2, ..., 2^N - 1. Each address requires N bits. The value 2^N is the maximum memory size or address space of the computer. The following table shows the address space for some values of N:

N    Address Space (in bytes)   Usually Written
16   65,536                     64 Kilobytes
20   1,048,576                  1 Megabyte
22   4,194,304                  4 Megabytes
24   16,777,216                 16 Megabytes
26   67,108,864                 64 Megabytes
28   268,435,456                256 Megabytes
30   1,073,741,824              1 Gigabyte
32   4,294,967,296              4 Gigabytes
40   1,099,511,627,776          1 Terabyte

Powers of Ten and Powers of Two

The metric system of measurements uses certain prefixes for certain powers of ten:

kilo   10^3  = 1,000
mega   10^6  = 1,000,000
giga   10^9  = 1,000,000,000
tera   10^12 = 1,000,000,000,000
peta   10^15 = 1,000,000,000,000,000

It's only because certain powers of two happen to be close to these powers of ten that the computer field uses these same prefixes to measure memory capacities: e.g., a megabyte memory actually contains 2^20 = 1,048,576 bytes instead of 10^6 = 1,000,000 bytes.

The computer field only does this with memory capacities - all other quantities are measured with powers of ten. For example, an 800-megahertz clock emits exactly 800,000,000 clock ticks per second and a network link with a 10-megabyte/second bandwidth transmits exactly 10,000,000 bytes per second across the link.
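A quick check in Python (a two-line sketch, not part of the text) shows how close the two values are:

    print(2**20, 10**6)     # 1048576 versus 1000000
    print(2**20 / 10**6)    # about 1.049 - within 5 percent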


Each 16-bit integer occupies two successive bytes of a RAM, each 32-bit integer occupies four successive bytes, each 64-bit real number occupies eight successive bytes, etc. For example, the 16-bit integer 0000 0111 1101 0001 (which is 2001 in decimal) might be stored in bytes 42 and 43 of a RAM with byte 42 holding 00000111 and byte 43 holding 11010001.
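Python's built-in int.to_bytes can reproduce this layout. Storing the most significant byte in the lower address (big-endian order) follows the example above; note that real machines differ on byte order:

    value = 2001
    high, low = value.to_bytes(2, byteorder="big")
    print(format(high, "08b"))   # 00000111  (stored in byte 42)
    print(format(low, "08b"))    # 11010001  (stored in byte 43)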

A text string of m characters occupies m successive bytes of a RAM with each byte holding the ASCII code for one character. For example, the 10-character string, KENT STATE, might occupy bytes 160 through 169 of a RAM as follows:

ADDRESS   CONTENTS   ASCII Character
160       01001011   K
161       01000101   E
162       01001110   N
163       01010100   T
164       00100000   (space)
165       01010011   S
166       01010100   T
167       01000001   A
168       01010100   T
169       01000101   E
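The same layout is easy to check in Python (a sketch; the starting address 160 comes from the example above):

    for address, code in enumerate("KENT STATE".encode("ascii"), start=160):
        print(address, format(code, "08b"), chr(code))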

The Memory Unit also has a Memory Address Register (MAR) and a Memory Data Register (MDR) for communication with other parts of the computer. The MAR holds the N-bit address of a memory cell and the MDR holds the data being fetched from that cell or the data being stored into that cell. The size of the MDR is some multiple of W bits and usually agrees with the size of the processor: e.g., a 32-bit processor can fetch or store 32 memory bits at one time so the MDR holds 4 bytes.

A memory fetch operation retrieves data from a storage location of the RAM using the following algorithm:

  1. Copy the address of the storage location into the MAR.
  2. Decode the contents of the MAR to select the desired storage location of the RAM.
  3. Copy the contents of that storage location into the MDR.
A memory store operation stores a value into a storage location of the RAM using the following algorithm:
  1. Copy the address of the storage location into the MAR.
  2. Copy the value to be stored into the MDR.
  3. Decode the contents of the MAR to select the desired storage location of the RAM.
  4. Copy the contents of the MDR into that storage location.
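Both algorithms can be mimicked in a short Python sketch (the class and its one-byte cell size are our assumptions; only the MAR, MDR, and RAM come from the text):

    class MemoryUnit:
        # A toy RAM of one-byte cells plus the MAR and MDR registers
        # it uses to communicate with the rest of the computer.
        def __init__(self, n_address_bits):
            self.ram = [0] * (2 ** n_address_bits)
            self.mar = 0    # Memory Address Register
            self.mdr = 0    # Memory Data Register

        def fetch(self, address):
            self.mar = address             # 1. copy the address into the MAR
            cell = self.ram[self.mar]      # 2. decode the MAR to select the cell
            self.mdr = cell                # 3. copy the cell into the MDR
            return self.mdr

        def store(self, address, value):
            self.mar = address             # 1. copy the address into the MAR
            self.mdr = value               # 2. copy the value into the MDR
            self.ram[self.mar] = self.mdr  # 3-4. decode the MAR, store the MDR

    memory = MemoryUnit(16)      # 2^16 = 65,536 cells
    memory.store(42, 0b00000111)
    print(memory.fetch(42))      # prints 7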
A RAM with N address bits and 2^N memory cells needs an N-to-2^N decoder circuit to select the particular cell to be fetched or stored. A RAM is usually organized as a rectangular array of cells because that's the easiest way of building large decoders (see section 4.5 of these course notes.)

A large RAM is usually too slow for the processor so modern computers use a small fast cache memory as well as the RAM. Most computer programs exhibit a property called locality of reference: if the program has recently referenced a particular storage location in the RAM it will most likely want to reference the same location (or its neighbors) in the near future. The cache memory holds recently-referenced memory items and their neighbors so future references to these items can be performed much faster.
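A few lines of Python sketch the idea (the dictionary stands in for the cache hardware; everything here is our illustration, not a real cache design):

    cache = {}

    def read(address, ram):
        if address in cache:       # hit: the item was recently referenced
            return cache[address]
        value = ram[address]       # miss: go to the slow RAM
        cache[address] = value     # keep it for likely future references
        return value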

A Read-Only Memory (ROM) is a RAM without the store operation: its contents are fixed when it is built. One common use of a ROM is to initialize a computer properly whenever it is turned on.

(5.2.2) Input/Output and Mass Storage: There is a wide variety of input/output (I/O) devices that computers connect to:

Disks are direct access storage devices (DASDs). The time to access a sector of data on a disk is the sum of the seek time required to move the read head to the appropriate track on the disk; the latency required for the disk to rotate to the appropriate sector on that track; and the transfer time required to let that sector pass under the read head.
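Since the three delays simply add, the access time is easy to estimate. The numbers below are illustrative assumptions, not figures from the text:

    seek_time = 0.008               # 8 ms to move the read head (assumed)
    rpm = 7200                      # rotation speed (assumed)
    rotation = 60.0 / rpm           # one full rotation takes about 8.3 ms
    latency = rotation / 2          # on average, wait half a rotation
    transfer = rotation / 200       # assumed 200 sectors per track
    print(seek_time + latency + transfer)   # about 0.0122 seconds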

Tapes are sequential access storage devices (SASDs). To find a particular data record on the tape the computer starts at the beginning of the tape and examines the records one at a time until it finds the one sought.

I/O devices are much slower than the other subsystems of a computer so the I/O controllers contain buffers to temporarily hold I/O data. The processor executes instructions while I/O data is being transferred.

(5.2.3) The Arithmetic/Logic Unit: The arithmetic/logic unit (ALU) is usually shown as a vee-shaped block to emphasize the fact that it combines two input numbers, A and B, with an arithmetic operation (add, subtract, multiply, etc.) to produce one number as a result. The operation is specified by a set of control lines fed by the Control Unit:

The ALU can also compare the two input numbers to see if A is Greater-Than, EQual to, or Less-Than B - for this operation the ALU sets one of the three condition-code bits (GT, EQ, or LT) to the 1-state.

To speed up arithmetic, modern computers couple the ALU with a set of very fast memory cells called registers. Figure 5.11 shows such a datapath with an ALU coupled to 16 registers, R0, R1, ..., R15. The organization is very flexible; for example, the ALU can subtract the value in any register from the value in any register and store the difference into any register. Storing the result of an ALU operation into a register overwrites the previous value stored in that register.

In a b-bit processor each register holds b bits and the ALU performs operations on b-bit data items (b is usually 32 or 64.)
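This register-file organization can be mimicked in Python (the operation names and the condition-code dictionary are our assumptions):

    registers = [0] * 16     # R0 through R15

    def alu(op, src1, src2, dest):
        # Combine two source registers and overwrite the destination register.
        a, b = registers[src1], registers[src2]
        registers[dest] = a + b if op == "ADD" else a - b

    def compare(src1, src2):
        # Set exactly one of the three condition codes to 1.
        a, b = registers[src1], registers[src2]
        return {"GT": int(a > b), "EQ": int(a == b), "LT": int(a < b)}

    registers[1], registers[2] = 7, 5
    alu("SUB", 1, 2, 3)      # R3 = R1 - R2
    print(registers[3])      # 2
    print(compare(1, 2))     # {'GT': 1, 'EQ': 0, 'LT': 0}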

(5.2.4) The Control Unit: The stored program concept means that machine instructions are stored in memory as sequences of binary bits. The control unit:

  1. fetches the next instruction from memory;
  2. decodes the instruction to determine what is to be done; and
  3. executes the instruction by issuing the appropriate control signals to the ALU, the memory, and the I/O controllers.
These three steps are repeated over and over again until a halting instruction (called HALT, STOP, or QUIT on various machines) is fetched and executed.

Each instruction is written in machine language as a sequence of 0-bits and 1-bits with a format like:

Operation code | Address field 1 | Address field 2 | Address field 3 | . . .

Each operation in the instruction set of the computer is assigned a unique unsigned-integer operation code (or opcode). For example, opcode 0 might specify the ADD operation, opcode 1 might specify the COMPARE operation, etc. An opcode field of k bits can specify any one of 2^k different operations.

The address fields specify the locations of the source operands and the result of the operation. For example, an instruction to subtract the contents of register R2 from the contents of register R1 and put the difference in register R3 might look like:

0101 0001 0010 0011
SUB  R1   R2   R3

where 0101 is the opcode for subtraction and each 4-bit address field specifies one of the sixteen registers. Address fields specifying memory locations are much longer since it takes an n-bit address to specify one of the locations in a 2^n-byte RAM. Some processors use only two address fields for arithmetic operations - the result of an operation overwrites one of the source operands.
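Bit shifts in Python show how the four 4-bit fields pack into this 16-bit instruction (a sketch of the encoding in the example above):

    opcode, r1, r2, r3 = 0b0101, 1, 2, 3     # SUB R1 R2 R3
    instruction = (opcode << 12) | (r1 << 8) | (r2 << 4) | r3
    print(format(instruction, "016b"))       # 0101000100100011

    # Unpack the fields again, as an instruction decoder would:
    print((instruction >> 12) & 0xF)         # 5, the SUB opcode
    print((instruction >> 8) & 0xF,
          (instruction >> 4) & 0xF,
          instruction & 0xF)                 # 1 2 3, the register fields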

In the seventies and the eighties the typical processor had several hundred opcodes. The add operation, for example, might have three different opcodes: one to add the contents of two registers (ADDR), one to add a memory word to a register, and one to add two memory words and put the sum in a third memory word (ADD3).

These processors had variable-length instructions. For example, the ADD3 instruction with its three long memory address fields required many more bytes than the ADDR instruction with its short register address fields.

Machine languages for modern processors are much simpler, with fewer than 50 operations. These machines are called Reduced Instruction Set Computers or RISC machines. Fixed-length instructions (usually 32 bits) are used and arithmetic operations only use register operands. Even though a RISC machine must execute more instructions to perform a typical algorithm, each instruction runs much faster, so the total running time of the algorithm is shorter.

There are four basic classes of instructions:

  1. Data Transfer: to move operands between registers and memory words, etc.
  2. Arithmetic: to add, subtract, multiply, divide, and logically combine operands.
  3. Compare: to compare two values and set the states of the condition code bits depending on the outcome of the comparison.
  4. Branch: to alter the flow of control through the instructions based on the states of the condition code bits.
Figure 5.16 shows the organization of a control unit with the Instruction Register or IR, the Program Counter or PC, and the instruction decoding circuit.

(5.3) Putting All the Pieces Together: Figure 5.18 shows the organization of a Von Neumann computer with a number of ALU registers: R0, R1, R2, R3, ...

Figure 5.19 shows a hypothetical instruction set for a Von Neumann computer:

Binary Opcode   Operation     Meaning
0000            LOAD X        CON(X) --> R
0001            STORE X       R --> CON(X)
0010            CLEAR X       0 --> CON(X)
0011            ADD X         R + CON(X) --> R
0100            INCREMENT X   CON(X) + 1 --> CON(X)
0101            SUBTRACT X    R - CON(X) --> R
0110            DECREMENT X   CON(X) - 1 --> CON(X)
0111            COMPARE X     if CON(X) > R then set GT to 1, else 0
                              if CON(X) = R then set EQ to 1, else 0
                              if CON(X) < R then set LT to 1, else 0
1000            JUMP X        Jump to location X
1001            JUMPGT X      Jump to location X if GT = 1
1010            JUMPEQ X      Jump to location X if EQ = 1
1011            JUMPLT X      Jump to location X if LT = 1
1100            JUMPNEQ X     Jump to location X if EQ = 0
1101            IN X          Input an integer into location X
1110            OUT X         Output an integer from location X
1111            HALT          Stop program execution

The control unit uses the following algorithm to execute a program:

  Set the PC to the address of the first instruction in the program.
  Repeat until a HALT instruction or a fatal error is encountered:
      Fetch phase
      Decode phase
      Execute phase
  End of the loop

Fetch Phase: There are four steps in the fetch phase:
  1. Copy the address of the instruction in the PC to the MAR.
  2. Initiate a fetch operation in the memory to put a copy of the instruction in the MDR.
  3. Copy the instruction in the MDR to the instruction register (IR.)
  4. Increment the value in the PC to the address of the following instruction.
Decode Phase: The decode phase has only one step - send a copy of the opcode field of the instruction in the IR to the decoder.

Execute Phase: The steps in the execute phase depend on the opcode of the instruction. For example, the steps for three of the instructions in figure 5.19 are:

LOAD X:
  1. Copy the address field, X, of the instruction in the IR to the MAR.
  2. Initiate a fetch operation in the memory to put CON(X) in the MDR.
  3. Copy the contents of the MDR to register R.
ADD X:
  1. Copy the address field, X, of the instruction in the IR to the MAR.
  2. Initiate a fetch operation in the memory to put CON(X) in the MDR.
  3. Send R and the contents of the MDR to the ALU with an ADD command and put the sum back in R.
JUMP X:
  1. Copy the address field, X, of the instruction in the IR to the PC.
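The whole fetch-decode-execute cycle for figure 5.19 can be simulated in a short Python sketch. The encoding below (a 4-bit opcode in the top of a 16-bit word, with the address field X in the remaining 12 bits) and the subset of opcodes implemented are our assumptions:

    def run(memory):
        # A toy simulator for the machine of figure 5.19.
        R = 0                  # the single arithmetic register
        GT = EQ = LT = 0       # condition codes
        pc = 0                 # program counter
        while True:
            ir = memory[pc]    # fetch phase: PC -> MAR, memory -> MDR -> IR
            pc += 1            # increment the PC
            opcode, x = ir >> 12, ir & 0x0FFF    # decode phase
            if opcode == 0b0000:      # LOAD X:  CON(X) --> R
                R = memory[x]
            elif opcode == 0b0001:    # STORE X: R --> CON(X)
                memory[x] = R
            elif opcode == 0b0011:    # ADD X:   R + CON(X) --> R
                R = R + memory[x]
            elif opcode == 0b0111:    # COMPARE X: set the condition codes
                GT = int(memory[x] > R)
                EQ = int(memory[x] == R)
                LT = int(memory[x] < R)
            elif opcode == 0b1000:    # JUMP X
                pc = x
            elif opcode == 0b1001:    # JUMPGT X
                if GT == 1:
                    pc = x
            elif opcode == 0b1110:    # OUT X
                print(memory[x])
            elif opcode == 0b1111:    # HALT
                return

    # A four-instruction program: R = CON(4) + CON(5); CON(6) = R; halt.
    program = [
        (0b0000 << 12) | 4,     # LOAD 4
        (0b0011 << 12) | 5,     # ADD 5
        (0b0001 << 12) | 6,     # STORE 6
        (0b1111 << 12),         # HALT
        20, 22, 0,              # data: CON(4) = 20, CON(5) = 22, CON(6) = 0
    ]
    run(program)
    print(program[6])           # 42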

(5.4) Parallel Processing: The title of section 5.4 in the text is Non-Von Neumann Architectures but the author of these notes doesn't like that title because Von Neumann was one of the first persons to describe parallel processors.

Figure 5.20 graphs processor speeds from the mid-1940s to the present. At first processor speeds grew exponentially - quadrupling about every three years. But lately the growth rate has been slower. To get higher processing speed several processing elements are coupled together in a parallel processor. Parallel processors come in two different flavors: SIMD and MIMD.

SIMD: Figure 5.21 is a block diagram of a Single-Instruction-Multiple-Data (SIMD) processor - several ALUs (usually in the thousands) reading and writing data in their own local memories under command of a single control unit executing instructions from a single program. For example, an ADD instruction in the program will cause every ALU to perform an ADD operation on its own data. Unfortunately, the figure doesn't show the interconnection network between the ALUs that every SIMD processor has.

MIMD: Figure 5.22 is a block diagram of a Multiple-Instruction-Multiple-Data (MIMD) processor - several processors (each with its own ALU and control unit) execute instructions from their own programs in their own local memories. For example, while some processors are executing ADD instructions others might be executing SUB instructions.

SIMD vs. MIMD: The single control unit in a SIMD processor keeps all ALUs in lock-step. This simplifies communication of data between the ALUs - a single Data Transfer instruction in the program can command every ALU to pass a data item through the interconnection network to another ALU at the same time.

Inter-processor communication in a MIMD processor is much slower. For example, to send a data item from processor A to processor B, the program in A must be synchronized with the program in B so B expects to receive the item when A sends it.
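A Python sketch makes the contrast concrete (ordinary lists stand in for the local memories; real SIMD machines do this in hardware, not in a loop):

    # SIMD: one ADD instruction from the single control unit is
    # carried out by every ALU on its own local data in lock-step.
    local_a = [1, 2, 3, 4]
    local_b = [10, 20, 30, 40]
    print([a + b for a, b in zip(local_a, local_b)])   # [11, 22, 33, 44]

    # One Data Transfer instruction can likewise shift every ALU's
    # value to its neighbor in a single lock-step move:
    print(local_a[-1:] + local_a[:-1])                 # [4, 1, 2, 3]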


Kenneth E. Batcher - 9/19/2006