Homework Four - Elementary MPI programming

Due: Thursday December 3, 2015

Assignment:

Parallelize the program described below using MPI. To do this, follow the instructions in:

  • Using OpenMPI on Gollum
  • hostfile for all nodes and cores using gigabit ethernet for spawning
  • hostfile for all nodes but only 2 cores per node using gigabit ethernet for spawning
  • When compiling and running you must NOT use gollum itself but one of the nodes node1 to node12.

Present the results as a table of run time and speedup. Speedup is defined as T1/Tp, where T1 is the time on 1 process and Tp is the time on p processes. Comment on the results.
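As a small illustration of the speedup definition, the sketch below computes T1/Tp and prints one table row per run. The function names `speedup` and `print_row` and the column layout are assumptions made for this example, not part of the assignment.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup as defined in the assignment: T1 / Tp. */
double speedup(double t1, double tp)
{
    assert(tp > 0.0);   /* a zero or negative time means the timing failed */
    return t1 / tp;
}

/* Print one table row: process count, run time on p processes, speedup.
   Returns the number of characters written, as printf does. */
int print_row(int p, double t1, double tp)
{
    return printf("%4d  %10.4f  %8.2f\n", p, tp, speedup(t1, tp));
}
```

Calling print_row once for each process count, with the measured times, gives one line of the table per run.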

    The preferred way to time the code is to call MPI_Wtime() before and after the relevant part, e.g.

           double starttime, endtime;
           starttime = MPI_Wtime();
           /* ....  stuff to be timed  ... */
           endtime   = MPI_Wtime();
           printf("That took %f seconds\n", endtime - starttime);
    
    1. Implement and check the time to do a matrix-matrix product of a 1500x900 matrix A with a 900x1200 matrix B of doubles, as in the first OpenMP exercise.
      Define Aij = (i+1)*(j+1) and Bij = 1/((double)(i+1)*(double)(j+1)).
      The result matrix C=A*B should be Cij = 900*(double)(i+1)/(double)(j+1). You should check that the result is correct in each case by comparing A*B with a matrix C with these values.
    2. The matrices A and B should initially be distributed by block rows over the processes used. The matrix multiply can then be accomplished by a variation of the matrix-vector multiply code, such as
      for each column x of B
          compute the parallel matrix-vector product Ax
      
    3. Note that you should not read in the matrices but calculate them using the formula given. You should also not write them out but verify that the results are correct in a manner similar to the OpenMP homework.
    4. You should run the code using 1, 2, 3, 4, 5, 6, 7 and 8 processes on gollum in two configurations: with up to 8 processes on one node, and with at most 2 processes on each node.
    5. List the various run times and speedups in separate tables for each case and comment on the speedup in each.
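The block-row distribution and the column-by-column product in item 2 can be sketched in plain C as below. This shows only the per-process arithmetic and makes no MPI calls. The names `block_rows` and `local_matvec`, and the choice to give the first few processes one extra row when the row count does not divide evenly (as with 1500 rows on 7 processes), are assumptions for illustration. In an MPI version the full column x would first have to be assembled from the pieces of B held on each process, for example with MPI_Allgatherv.

```c
#include <assert.h>

/* Rows owned by process `rank` when `nrows` rows are split into
   contiguous blocks over `nprocs` processes. The first (nrows % nprocs)
   processes get one extra row. */
void block_rows(int nrows, int nprocs, int rank, int *first, int *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    int base  = nrows / nprocs;
    int extra = nrows % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}

/* Local part of y = A x. a_local holds `local_rows` rows of A in
   row-major order with n columns each; x is the full column of length n. */
void local_matvec(const double *a_local, int local_rows, int n,
                  const double *x, double *y_local)
{
    for (int i = 0; i < local_rows; i++) {
        double sum = 0.0;
        for (int k = 0; k < n; k++)
            sum += a_local[i * n + k] * x[k];
        y_local[i] = sum;
    }
}
```

Each process would call block_rows once for A (1500 rows) and once for B (900 rows), then call local_matvec once per column of B to fill its own rows of C.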

      Note that to implement this program you may need to increase the default stack size. If you do not, you will get a segmentation fault. To do this you need to execute shell commands similar to:

      ulimit -s unlimited
      

      on each node. Note: when timing the code, remember not to print anything inside the timed region.
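For the correctness check asked for in items 1 and 3, one possible approach is to compare each computed entry with the closed-form value and report the largest relative error, outside the timed region. The sketch below is plain C with hypothetical helper names (`a_entry`, `b_entry`, `c_expected`, `max_rel_error`). Because the sums are accumulated in floating point, the comparison uses a tolerance instead of exact equality; the tolerance itself is an assumption to be chosen by the reader.

```c
#include <assert.h>

/* Matrix entries from the assignment, with 0-based i and j. */
double a_entry(int i, int j) { return (double)(i + 1) * (double)(j + 1); }
double b_entry(int i, int j) { return 1.0 / ((double)(i + 1) * (double)(j + 1)); }

/* Expected entry of C; `inner` is the shared dimension (900 here). */
double c_expected(int i, int j, int inner)
{
    return inner * (double)(i + 1) / (double)(j + 1);
}

/* Largest relative error of a block of computed rows against the formula.
   c_local is local_rows x ncols in row-major order; first_row is the
   global index of its first row. */
double max_rel_error(const double *c_local, int first_row, int local_rows,
                     int ncols, int inner)
{
    assert(ncols > 0 && inner > 0);
    double worst = 0.0;
    for (int i = 0; i < local_rows; i++) {
        for (int j = 0; j < ncols; j++) {
            double want = c_expected(first_row + i, j, inner);  /* always > 0 */
            double diff = c_local[i * ncols + j] - want;
            if (diff < 0.0)
                diff = -diff;
            if (diff / want > worst)
                worst = diff / want;
        }
    }
    return worst;
}
```

Each process can run max_rel_error on its own block of rows and compare the result against a small tolerance after the timing has finished.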

    This assignment is an individual assignment, to be done on your own without help from other students in the class. However, you may use any materials from any written resource, including web resources.

    Instructions for submitting the homework using svn are contained in this file.