CSCI 476

Parallel Programming

Coordinator: Muhammad Chaudhary

Credits: 4.0

Description

Overview of parallel computing through the study of parallel programming. Topics include message passing, highly parallel computations, partitioning and divide-and-conquer strategies, pipelined and synchronous computations, load balancing and termination detection, programming with shared-memory systems, parallel sorting algorithms, numerical algorithms, image processing, searching and optimization, and parallel programming using current technology. Offered periodically.

Prerequisites

C- or higher in CSCI 362 and CSCI 370.

Sample Textbooks

Course Outcomes

  1. Able to describe various parallel computer architectures.
  2. Able to describe general parallel programming techniques such as Data Parallelism, Data Partitioning, Synchronous Iteration, Replicated Workers, and Pipelined computation.
  3. Able to describe some of the major sources of performance degradation in parallel programs, in particular, memory contention, excessive sequential code, process creation time, communication delay, synchronization delay, and load imbalance.
  4. Able to formulate efficient parallel algorithms by implementing, debugging, and experimenting with the various alternatives dictated by the target computer architecture.

Major Topics Covered

  1. WHY PARALLEL PROGRAMMING?
    1. Sequential vs. Parallel
    2. Parallel Computer Systems
    3. Multiprocessor Architecture
    4. Multi-computers
    5. Parallel Programming
    6. The Multi-Pascal Language
    7. Types of Parallel Algorithms
  2. DATA PARALLELISM
    1. Parallel Sorting
    2. Nested Loops
    3. Shared and Local Variables
    4. The FORK Operator
    5. Amdahl's Law
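Amdahl's Law, the last item above, bounds the speedup a parallel program can achieve by its serial fraction. A minimal sketch in Go (used here purely for illustration; the course itself works in Multi-Pascal, C#, and MPI), where `amdahlSpeedup` is a name chosen for this example:

```go
package main

import "fmt"

// amdahlSpeedup returns the predicted speedup for a program whose
// serial (non-parallelizable) fraction is f, run on p processors:
// S(p) = 1 / (f + (1-f)/p).
func amdahlSpeedup(f float64, p int) float64 {
	return 1.0 / (f + (1.0-f)/float64(p))
}

func main() {
	// Even with only 5% serial code, speedup saturates far below p:
	// as p grows without bound, S(p) approaches 1/f = 20.
	for _, p := range []int{2, 8, 64, 1024} {
		fmt.Printf("p=%4d  speedup=%.2f\n", p, amdahlSpeedup(0.05, p))
	}
}
```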
  3. MULTIPROCESSOR ARCHITECTURE
    1. Bus-Oriented Systems
    2. Cache Memory
    3. Multiple Memory Modules
    4. Processor-Memory Interconnection Networks
  4. PROCESS COMMUNICATION
    1. Process Communication Channels
    2. Channel Variables
    3. Pipeline Parallelism
    4. Solution to Linear Equations
    5. Channels and Structured Types
    6. Bitonic Merge Sort
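Pipeline parallelism over communication channels, as in items 1-3 above, can be sketched in a few lines of Go (an illustrative stand-in for the channel variables of Multi-Pascal; the stage names are invented for this example). Each stage is a concurrent process connected to the next by a channel:

```go
package main

import "fmt"

// generate feeds the integers 1..n into a channel, then closes it.
func generate(n int) <-chan int {
	out := make(chan int)
	go func() {
		for i := 1; i <= n; i++ {
			out <- i
		}
		close(out)
	}()
	return out
}

// square is one pipeline stage: it reads each value from in,
// writes its square to out, and closes out when in is drained.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		for x := range in {
			out <- x * x
		}
		close(out)
	}()
	return out
}

func main() {
	// The two stages run concurrently; values stream through the
	// pipeline one at a time, overlapping production and consumption.
	sum := 0
	for v := range square(generate(4)) {
		sum += v
	}
	fmt.Println(sum) // 1 + 4 + 9 + 16 = 30
}
```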
  5. DATA SHARING
    1. Atomic Operations
    2. Spinlocks
    3. Contention for Shared Data
    4. Numerical Integration
    5. Comparing Spinlocks and Channels
  6. SYNCHRONOUS PARALLELISM
    1. Linear Barrier Implementation
    2. Binary Tree Implementation of Barriers
    3. Local Synchronization
    4. Broadcasting and Aggregation
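The linear barrier of item 1 can be sketched as a counter plus a condition variable: each arriving process increments the counter, and the last arrival releases everyone. This Go version is illustrative only (the `Barrier` type is written for this example); the phase field makes the barrier reusable across synchronous iterations:

```go
package main

import (
	"fmt"
	"sync"
)

// Barrier is a reusable counting ("linear") barrier for n processes.
type Barrier struct {
	mu    sync.Mutex
	cond  *sync.Cond
	n     int // number of processes expected at the barrier
	count int // arrivals so far in the current phase
	phase int // completed phases; lets waiters detect release
}

func NewBarrier(n int) *Barrier {
	b := &Barrier{n: n}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// Wait blocks until all n processes have called Wait for this phase.
func (b *Barrier) Wait() {
	b.mu.Lock()
	phase := b.phase
	b.count++
	if b.count == b.n {
		// Last arrival: reset for the next phase and wake everyone.
		b.count = 0
		b.phase++
		b.cond.Broadcast()
	} else {
		for b.phase == phase {
			b.cond.Wait()
		}
	}
	b.mu.Unlock()
}

func main() {
	const workers = 4
	b := NewBarrier(workers)
	var wg sync.WaitGroup
	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// ... iteration k of a synchronous computation ...
			b.Wait() // no worker starts iteration k+1 early
			// ... iteration k+1 ...
		}(id)
	}
	wg.Wait()
	fmt.Println("all workers passed the barrier")
}
```

Because every arrival touches one shared counter, this implementation serializes at the barrier; the binary-tree variant of item 2 exists precisely to spread that contention across a tree of partial counters.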
  7. MULTI-COMPUTER ARCHITECTURE
    1. Processor Communication
    2. Multi-computer Topology
    3. Broadcasting and Aggregation
    4. Hypercube Embedding
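A standard construction for hypercube embedding (item 4) maps ring position i to hypercube node gray(i), the binary-reflected Gray code, so that consecutive ring positions land on node labels differing in exactly one bit, i.e. on directly connected hypercube neighbors. A small illustrative check in Go (function names are this example's, not the course's):

```go
package main

import "fmt"

// gray returns the i-th binary-reflected Gray code.
func gray(i int) int { return i ^ (i >> 1) }

// hammingDistance counts the differing bits between two node labels;
// in a hypercube, nodes are adjacent exactly when this distance is 1.
func hammingDistance(a, b int) int {
	d := 0
	for x := a ^ b; x != 0; x &= x - 1 {
		d++
	}
	return d
}

func main() {
	const dim = 3 // 3-dimensional hypercube: 8 nodes
	n := 1 << dim
	for i := 0; i < n; i++ {
		a, b := gray(i), gray((i+1)%n) // consecutive ring positions
		fmt.Printf("ring %d -> node %03b (next node differs in %d bit)\n",
			i, a, hammingDistance(a, b))
	}
}
```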
  8. MESSAGE PASSING PROGRAMS
    1. Communication Ports
    2. Language Support for Message-Passing
    3. Pipeline Programs on Multi-computers
    4. Communication Delay
    5. Multiple Port Communications
    6. Programming Projects
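Multiple-port communication (item 5) means a process can wait on several incoming channels at once and serve whichever delivers a message first. A hedged sketch in Go, where channels stand in for communication ports and `select` plays the role of a multi-port receive (the `server` process is invented for this example):

```go
package main

import "fmt"

// server owns two communication "ports" (channels); select blocks
// until one of them has a message, then forwards a tagged reply.
func server(portA, portB <-chan string, done chan<- string) {
	for i := 0; i < 2; i++ {
		select {
		case m := <-portA:
			done <- "A:" + m
		case m := <-portB:
			done <- "B:" + m
		}
	}
}

func main() {
	portA := make(chan string)
	portB := make(chan string)
	done := make(chan string)
	go server(portA, portB, done)

	portA <- "hello"
	fmt.Println(<-done) // A:hello
	portB <- "world"
	fmt.Println(<-done) // B:world
}
```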
  9. DATA PARTITIONING
    1. Communication Overhead
    2. Matrix Multiplication
    3. Jacobi Relaxation
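Data partitioning for matrix multiplication (item 2) typically gives each process a contiguous block of result rows, so processes never write the same data and the only overhead is distributing the input. An illustrative Go sketch using goroutines in place of multi-computer processes (the function name is this example's own):

```go
package main

import (
	"fmt"
	"sync"
)

// multiplyRows computes C = A x B with the rows of C block-partitioned
// among `workers` concurrent workers: worker w owns rows [w*n/workers,
// (w+1)*n/workers), so no synchronization is needed on the output.
func multiplyRows(A, B [][]int, workers int) [][]int {
	n, m, p := len(A), len(B), len(B[0])
	C := make([][]int, n)
	for i := range C {
		C[i] = make([]int, p)
	}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*n/workers, (w+1)*n/workers
			for i := lo; i < hi; i++ {
				for j := 0; j < p; j++ {
					sum := 0
					for k := 0; k < m; k++ {
						sum += A[i][k] * B[k][j]
					}
					C[i][j] = sum
				}
			}
		}(w)
	}
	wg.Wait()
	return C
}

func main() {
	A := [][]int{{1, 2}, {3, 4}}
	B := [][]int{{5, 6}, {7, 8}}
	fmt.Println(multiplyRows(A, B, 2)) // [[19 22] [43 50]]
}
```

On a real multi-computer each worker would also need its row block of A and all of B shipped to it, which is the communication overhead of item 1.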
  10. REPLICATED WORKERS
    1. Work Pools
    2. Shortest Path Algorithms
    3. Implementation of Work Pools
    4. Eliminating Contention
    5. N-Queen Problem
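The replicated-worker pattern with a work pool (items 1 and 3) can be sketched in Go, with a channel serving as the shared pool: identical workers repeatedly take a task until the pool is exhausted. The task here (squaring integers) is a deliberately trivial stand-in for the shortest-path and N-Queens work units named above, and `sumSquares` is a name chosen for this example:

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares squares the integers 1..n using `workers` replicated
// workers that draw tasks from a shared pool, then sums the results.
func sumSquares(n, workers int) int {
	tasks := make(chan int, n)   // the work pool
	results := make(chan int, n) // buffered so workers never block

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ { // identical, replicated workers
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				results <- t * t // the "work": square the task
			}
		}()
	}

	for t := 1; t <= n; t++ {
		tasks <- t
	}
	close(tasks) // no more work; workers drain the pool and exit
	wg.Wait()
	close(results)

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares(10, 4)) // 1^2 + 2^2 + ... + 10^2 = 385
}
```

Because each worker grabs the next task only when it finishes the previous one, the pool load-balances automatically even when tasks take unequal time.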
  11. DISTRIBUTED TERMINATION DETECTION
    1. Multi-computer Shortest Path Program
    2. Work Pool Implementation
    3. Performance Analysis
    4. Work Compression
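When workers can generate new work, the pool can never be closed up front, so termination must be detected. One simple scheme, sketched here in Go as an illustration (a shared atomic counter rather than a truly distributed algorithm, and all names are this example's own): count outstanding tasks, incrementing on enqueue and decrementing on completion; when the count hits zero, no task is queued and no worker is busy, so no new work can ever appear.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// countTasks processes a binary tree of self-spawning tasks: a task
// with value t > 0 spawns two tasks with value t-1. It returns the
// number of tasks processed, terminating when the outstanding-task
// count drops to zero.
func countTasks(depth, workers int) int64 {
	// Assumes the buffer is large enough to hold all pending tasks,
	// so workers never block while enqueueing spawned work.
	tasks := make(chan int, 1<<(depth+2))
	var outstanding, processed int64
	done := make(chan struct{})

	put := func(t int) {
		atomic.AddInt64(&outstanding, 1)
		tasks <- t
	}
	finish := func() {
		if atomic.AddInt64(&outstanding, -1) == 0 {
			close(done) // count hit zero: tell every worker to stop
		}
	}

	for w := 0; w < workers; w++ { // replicated workers
		go func() {
			for {
				select {
				case t := <-tasks:
					atomic.AddInt64(&processed, 1)
					if t > 0 {
						put(t - 1) // spawn before finish, so the
						put(t - 1) // count never dips to zero early
					}
					finish()
				case <-done:
					return
				}
			}
		}()
	}

	put(depth) // seed task
	<-done
	return atomic.LoadInt64(&processed)
}

func main() {
	// A depth-3 tree has 1 + 2 + 4 + 8 = 15 tasks.
	fmt.Println(countTasks(3, 4)) // 15
}
```

On a multi-computer, where no shared counter exists, this detection problem is what the distributed algorithms of this chapter solve with message passing.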

Sample Laboratory Projects

In most projects, the student first creates a sequential algorithm to solve the assigned problem, then implements and tests it. The student then converts the sequential program to a parallel one and debugs it to make certain that it works. Finally, the student runs the program on larger data sets and observes the gains in performance.

Generally, a programming project is finished in one lab session. A closed two-hour lab session is held once every two weeks. In addition, students also work on their projects on their own time.

The following is a representative set of laboratories:

Lab #1: C# Program to Merge Lists in Parallel

Lab #2: Multi-threaded Implementation of John Conway's Game of Life in C#

Lab #3: Implementation of Bitonic Merge Sort in MPI

Lab #4: Parallel Matrix Multiplication

Final Project