OpenMP Overview

OpenMP is not magic!

Shared Variables

Parallel Programming using Threads


Most applications run a single thread per process, but a single process can contain multiple threads. Each thread behaves like a lightweight child process contained within the parent process.

  • Threads can see all data in parent process.

  • Threads can run on different cores.

  • Threads have potential for parallel speedup.

Analogy:

  • Huge whiteboard: shared memory

  • Different people: threads

  • Do not write in the same place: threads must not interfere with each other

  • Have a place inaccessible to the others: private data

Each thread has its own program counter (PC), which holds the address of the next instruction to execute, and its own private data. It also shares data with all other threads.
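The whiteboard analogy can be sketched in C with OpenMP. This is only an illustrative sketch: the function name `fill_squares` and the variable names `board` and `scratch` are hypothetical and do not come from the notes. The array plays the role of the shared whiteboard, and each loop iteration writes to its own slot, so no two threads write in the same place.

```c
/* Hypothetical helper: fill board[i] with i * i for i in [0, n). */
void fill_squares(int *board, int n)
{
    /* board is shared: every thread sees the same array. */
    #pragma omp parallel for shared(board)
    for (int i = 0; i < n; i++) {   /* the loop counter i is private */
        int scratch = i * i;        /* private: declared inside the loop body */
        board[i] = scratch;         /* each slot is written by exactly one iteration */
    }
}
```

Built with an OpenMP-enabled compiler (for example `gcc -fopenmp`), the iterations are divided among threads. Without the flag the pragma is ignored and the loop simply runs serially with the same result.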

Synchronisation:

  • Crucial for shared variables approach.

  • Most commonly done with a global barrier (coarse-grained). Locks (fine-grained) and atomic instructions such as CAS (compare-and-swap, guaranteed atomic by the hardware) can also be used.

  • Writing parallel code is relatively straightforward: access shared data as and when it is needed.

  • Getting correct code can be difficult.
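As a minimal sketch of fine-grained synchronisation, the following C function protects a shared counter with `#pragma omp atomic`. The function name `count_multiples_of_three` and the counter `hits` are hypothetical, chosen only for illustration. Without the atomic directive, concurrent increments of `hits` could race and some updates could be lost, which is the kind of correctness problem the list above warns about.

```c
/* Hypothetical helper: count the integers in [0, n) divisible by 3. */
long count_multiples_of_three(int n)
{
    long hits = 0;                  /* shared between all threads */

    #pragma omp parallel for shared(hits)
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0) {
            /* Fine-grained synchronisation: the increment is performed
               atomically, so concurrent updates are not lost. */
            #pragma omp atomic
            hits++;
        }
    }   /* implicit barrier: all threads finish before hits is returned */

    return hits;
}
```

Replacing `#pragma omp atomic` with `#pragma omp critical` would give lock-like behaviour instead: still correct, but usually slower for a single increment.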

Example:

  • Computing $asum = a_0+a_1+...+a_7$

    • shared:

      • main array: a[8]

      • result: asum

    • private:

      • loop counter: i

      • loop limits: istart, istop

      • local sum: mysum

    • synchronisation:

      • thread0: asum += mysum

      • barrier

      • thread1: asum += mysum
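The example above could be written in C with OpenMP roughly as follows. This is a sketch under assumptions: the function name `array_sum` is hypothetical, the work is split into contiguous blocks by thread ID, and a `critical` section is used so that only one thread at a time updates `asum`, in place of the explicit thread-by-thread ordering with barriers shown in the list.

```c
#ifdef _OPENMP
#include <omp.h>   /* omp_get_thread_num, omp_get_num_threads */
#endif

#define N 8

/* Hypothetical helper: returns a[0] + a[1] + ... + a[N-1]. */
int array_sum(const int a[N])
{
    int asum = 0;                             /* shared: the result */

    #pragma omp parallel shared(a, asum)
    {
        /* Variables declared inside the region are private to each thread. */
        int nthreads = 1, tid = 0;
#ifdef _OPENMP
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
#endif
        int istart = tid * N / nthreads;        /* private loop limits */
        int istop  = (tid + 1) * N / nthreads;
        int mysum  = 0;                         /* private local sum */

        for (int i = istart; i < istop; i++)    /* private loop counter */
            mysum += a[i];

        /* Synchronisation: one thread at a time adds its local sum. */
        #pragma omp critical
        asum += mysum;
    }   /* implicit barrier: every thread has contributed by this point */

    return asum;
}
```

For `a = {1, 2, 3, 4, 5, 6, 7, 8}` the function returns 36 regardless of how many threads run. If the code is compiled without OpenMP support, the pragmas are ignored and a single thread sums the whole array.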

Threads in HPC

  • Threads existed before parallel computers

    • designed for concurrency

    • many more threads running than physical cores

      • scheduled / de-scheduled when needed

      • scheduling policy: e.g. FIFO, round-robin, priority-based

  • For parallel computing

    • typically run a single thread per core (for affinity and to avoid oversubscribing resources)

    • want them all to run all the time (avoids context-switch overhead)

  • OS optimisations

    • place threads on selected cores (taskset on Linux, OMP_PROC_BIND and OMP_PLACES in standard OpenMP, KMP_AFFINITY in the Intel OpenMP runtime, numactl on NUMA systems)

    • stop them from migrating (reduces cache misses and context switches)
