Python

Description

The Blue Waters Python software stack (bwpy) is an expansive suite of Python software components supporting a large class of high-performance and scientific computing applications. The bwpy suite implements the scipy stack specification plus many other scientific Python packages. Over 180 Python packages are available, including mpi4py, pycuda, h5py, netcdf4-python, and pandas. The core numerical libraries are linked against Cray's libsci, and the stack is built to run on both login and compute nodes.

How to use the bwpy suite

The components of the bwpy suite are installed into two modules, "bwpy" and "bwpy-mpi". The base module, "bwpy", provides the Python interpreters plus most packages. The module "bwpy-mpi" adds an additional Python site directory at higher priority to override packages with optional MPI functionality and to provide MPI-only modules such as mpi4py. This way, MPI support such as mpi4py can be imported optionally, and packages do not break on login nodes. The "bwpy" module must be loaded first to gain access to the "bwpy-mpi" module.
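User code can mirror this optional-MPI arrangement with a guarded import. The following is a minimal sketch rather than part of bwpy: the helper name get_rank_and_size is hypothetical, and it assumes that mpi4py is simply absent (so the import raises ImportError) when only the base "bwpy" module is loaded.

```python
# Guarded import: fall back to serial behaviour when mpi4py is unavailable.
# Assumption: without bwpy-mpi loaded, "import mpi4py" raises ImportError.
try:
    from mpi4py import MPI
except ImportError:
    MPI = None


def get_rank_and_size():
    """Return (rank, size); (0, 1) when running without MPI."""
    if MPI is None:
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    rank, size = get_rank_and_size()
    print("rank %d of %d" % (rank, size))
```

A script written this way can be tested serially on a login node and launched unchanged under aprun inside a job.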

Besides bwpy, there are several other installations of Python, so users should take care to ensure that they are using the correct Python software environment. The module "bw-python" is a deprecated Python suite, the Anaconda install is only partially functional, and the OS-provided Python is old with limited packages available. For example, in a default starting environment the Python installation in a user's $PATH is the operating system-managed Python. This installation is not desirable for intensive computation or jobs on the compute nodes.

One may load bwpy with:

 $ module load bwpy

To load bwpy with MPI functionality:

 $ module load bwpy
 $ module load bwpy-mpi

After the module is loaded, the path to the Python binary should point to the bwpy installation:

 $ which python
 /mnt/bwpy/single/usr/bin/python

The version reported by Python should be the latest version of the Python 2 branch:

 $ python --version
 Python 2.7.14
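The same check can be made from inside a running script, which is useful in job scripts where the wrong interpreter might be picked up silently. This is a rough sketch: the function names are hypothetical, and the "/mnt/bwpy/" prefix is an assumption taken from the path shown above, so it may differ between bwpy releases.

```python
import platform
import sys


def is_bwpy_path(path, prefix="/mnt/bwpy/"):
    """True if an interpreter path lies under the assumed bwpy mount point."""
    return (path or "").startswith(prefix)


def describe_interpreter():
    """Collect the facts that 'which python' and 'python --version' report."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "implementation": platform.python_implementation(),
        "looks_like_bwpy": is_bwpy_path(sys.executable),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```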

Entering the full bwpy environment with bwpy-environ

bwpy is now installed into ext3 images, which must be mounted before use. This change dramatically improves Python start-up times and allows for more frequent updates. A small program, bwpy-environ, performs the mount. It acts as a wrapper and should be invoked as bwpy-environ -- program [args...]. If no arguments are given, it opens a new bash instance within the mount namespace it creates, which is useful for interactive tasks. Wrappers exist for some commonly used executables, but bwpy-environ is needed to access the full range of executables and should be used if python is to be invoked multiple times.

Example 1:

 $ aprun -n 1 -- bwpy-environ -- python -c "import numpy"

Example 2:

 $ aprun -n 1 -- bwpy-environ -- myscript.sh
 $ cat myscript.sh
 #!/bin/bash
 python script1.py
 python script2.py
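When a launch line like the ones in these examples is generated from a Python driver script, it can help to build the argument list in one place. The sketch below is illustrative only: bwpy_command is a hypothetical helper, and it does nothing more than assemble the "aprun -n N -- bwpy-environ -- program [args...]" form shown above without running anything.

```python
def bwpy_command(program, args=(), ranks=1):
    """Build an argv list: aprun -n <ranks> -- bwpy-environ -- program [args...]."""
    command = ["aprun", "-n", str(ranks), "--", "bwpy-environ", "--", program]
    command.extend(args)
    return command


if __name__ == "__main__":
    # Print the command; inside a job it could be passed to subprocess.call().
    print(" ".join(bwpy_command("python", ["script1.py"])))
```

Keeping the command as a list avoids shell quoting problems when an argument contains spaces.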

 

When possible, it is still advisable to condense multiple invocations of Python into a single Python script. For example, the following job starts a new interpreter for every loop iteration:

 $ cat job.pbs
 ...
 module load bwpy
 aprun -n 1 bwpy-environ -- myscript.sh
 $ cat myscript.sh
 #!/bin/bash
 for i in {0..999}; do python script1.py $i; done
 for i in {0..499}; do python script2.py $i; done
 $ cat script1.py
 #!/usr/bin/env python
 import sys
 print("script 1 %s" % sys.argv[1])
 $ cat script2.py
 #!/usr/bin/env python
 import sys
 print("script 2 %s" % sys.argv[1])

This should be rewritten to:

 $ cat myscript.sh
 #!/bin/bash
 python outer.py 1000 500
 $ cat outer.py
 #!/usr/bin/env python
 from script1 import script1_main
 from script2 import script2_main
 def outer_main(range1, range2):
     for i in range(range1):
         script1_main(i)
     for i in range(range2):
         script2_main(i)
 if __name__ == "__main__":
     import sys
     outer_main(int(sys.argv[1]), int(sys.argv[2]))
 $ cat script1.py
 #!/usr/bin/env python
 def script1_main(arg):
     print("script 1 %s" % arg)
 if __name__ == "__main__":
     import sys
     script1_main(sys.argv[1])
 $ cat script2.py
 #!/usr/bin/env python
 def script2_main(arg):
     print("script 2 %s" % arg)
 if __name__ == "__main__":
     import sys
     script2_main(sys.argv[1])

 

This can be made to run in parallel with minimal changes:

 $ cat outer.py
 #!/usr/bin/env python
 import os
 from script1 import script1_main
 from script2 import script2_main
 pes = int(os.environ.get('PBS_NP', 1))
 rank = int(os.environ.get('ALPS_APP_PE', 0))
 def outer_main(range1, range2):
     for i in range(range1):
         if i % pes == rank:
             script1_main(i)
     for i in range(range2):
         if i % pes == rank:
             script2_main(i)
 if __name__ == "__main__":
     import sys
     outer_main(int(sys.argv[1]), int(sys.argv[2]))
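The test "i % pes == rank" deals the loop indices out round-robin, so every index is handled by exactly one rank. The sketch below isolates that rule so it can be checked without launching a job; the helper names indices_for_rank and covers_all_indices are hypothetical and are not part of the scripts above.

```python
def indices_for_rank(count, pes, rank):
    """Indices in range(count) that the given rank would process."""
    return [i for i in range(count) if i % pes == rank]


def covers_all_indices(count, pes):
    """True if the ranks together process each index exactly once."""
    seen = []
    for rank in range(pes):
        seen.extend(indices_for_rank(count, pes, rank))
    return sorted(seen) == list(range(count))


if __name__ == "__main__":
    print(indices_for_rank(10, 4, 1))   # prints [1, 5, 9]
    print(covers_all_indices(1000, 32))  # prints True
```

Round-robin assignment needs no communication between ranks, but it balances the load well only when the iterations take roughly equal time.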

Python Implementations and Packages

The bwpy module provides several implementations and versions of Python. Those in the latest release are:

  • Python 2.7
  • Python 3.5
  • PyPy
  • PyPy3

These can be accessed through python, python2, python3, python2.7, python3.5, pypy, and pypy3. The default python is 2.7 and the default python3 is 3.5.

It is recommended to use #!/usr/bin/env python2 or #!/usr/bin/env python3 for your shebangs.

To list the available packages for each implementation, use the command `pip list`. PyPy and PyPy3 have only a limited numpy implementation, but thanks to JIT compilation to machine code they can run pure-Python code considerably faster than CPython.
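Because the package set differs between implementations, a script may want to check at run time whether the modules it needs are present for the interpreter it was started with. The following sketch uses only the standard library; available_modules is a hypothetical helper, and it inspects importable top-level modules on sys.path rather than pip's package metadata, so its names can differ from those printed by `pip list`.

```python
import pkgutil


def available_modules(names):
    """Map each requested top-level module name to True or False.

    Only modules found on sys.path are seen; interpreter built-ins
    such as sys are not reported by pkgutil.iter_modules().
    """
    found = set(name for _finder, name, _ispkg in pkgutil.iter_modules())
    return dict((name, name in found) for name in names)


if __name__ == "__main__":
    wanted = ["numpy", "mpi4py", "h5py"]
    for name, present in sorted(available_modules(wanted).items()):
        print("%-8s %s" % (name, "available" if present else "missing"))
```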

Virtualenv

Simply being able to use the pip and python executables is enough for most users. However, the bwpy module also supports the usage of virtualenv.

Virtualenv creates Python containers for building multiple Python environments with different packages and package versions. The containers have python and pip wrappers which set up the environment for the active virtualenv. For information on virtualenv, please read its package documentation.
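To confirm from inside a script that a virtualenv is active, the interpreter prefixes can be compared. This is a minimal sketch with a hypothetical helper name; it relies on older virtualenv releases setting sys.real_prefix and on newer releases (and the standard venv module) making sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtualenv or venv."""
    # Older virtualenv releases record the original prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv and venv keep the original prefix in base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active: %s" % in_virtualenv())
    print("prefix: %s" % sys.prefix)
```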

Building software against bwpy libraries

To build software against the bwpy libraries, use the PrgEnv-gnu compiler environment and export the following variables:

 $ export CPATH="${BWPY_INCLUDE_PATH}"
 $ export LIBRARY_PATH="${BWPY_LIBRARY_PATH}"
 $ export LDFLAGS="${LDFLAGS} -Wl,--rpath=${BWPY_LIBRARY_PATH}"

These paths are treated as system paths for bwpy. Using CPATH and LIBRARY_PATH will ensure that these paths are searched after any -I and -L options for correctness. The CMake in bwpy also treats these paths as system paths for the same reason, and won't generate -I or -L flags for libraries in these paths. Thus, these environment variables must be set for bwpy's CMake to function correctly.
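When a build is driven from a Python script instead of an interactive shell, the same three variables can be prepared as an environment dictionary. The sketch below is an illustration under stated assumptions: bwpy_build_env is a hypothetical helper, and it expects BWPY_INCLUDE_PATH and BWPY_LIBRARY_PATH to be set already, which should be the case once the bwpy module is loaded.

```python
import os


def bwpy_build_env(environ=None):
    """Return a copy of the environment with CPATH, LIBRARY_PATH and LDFLAGS set.

    Raises KeyError if the BWPY_* variables are missing, which usually
    means the bwpy module has not been loaded.
    """
    env = dict(os.environ if environ is None else environ)
    include_path = env["BWPY_INCLUDE_PATH"]
    library_path = env["BWPY_LIBRARY_PATH"]
    env["CPATH"] = include_path
    env["LIBRARY_PATH"] = library_path
    # Append the rpath flag to any LDFLAGS that are already set.
    rpath_flag = "-Wl,--rpath=" + library_path
    env["LDFLAGS"] = (env.get("LDFLAGS", "") + " " + rpath_flag).strip()
    return env


if __name__ == "__main__":
    # The paths below are placeholders used only for demonstration.
    demo = {"BWPY_INCLUDE_PATH": "/example/include",
            "BWPY_LIBRARY_PATH": "/example/lib"}
    for key in ("CPATH", "LIBRARY_PATH", "LDFLAGS"):
        print("%s=%s" % (key, bwpy_build_env(demo)[key]))
```

The returned dictionary can be passed as the env argument of subprocess.check_call when invoking make or cmake.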

Known Limitations

  • Applications that use Tkinter, matplotlib, or any other GUI backend will require a connection to a properly configured X server.
  • MPI applications will not run on a login node; they must be run in a job using aprun. This restriction applies even to runs with only one rank.
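For the first limitation, batch jobs that only write plots to files commonly select a non-GUI matplotlib backend when no X display is available. The sketch below is a suggestion, not documented bwpy behaviour: pick_matplotlib_backend is a hypothetical helper, it treats the DISPLAY environment variable as the sign of an X connection, and it assumes the installed matplotlib provides the standard "Agg" and "TkAgg" backends.

```python
import os


def pick_matplotlib_backend(environ=None):
    """Return a GUI backend name if DISPLAY is set, else the file-only Agg."""
    environ = os.environ if environ is None else environ
    return "TkAgg" if environ.get("DISPLAY") else "Agg"


if __name__ == "__main__":
    print(pick_matplotlib_backend())
    # Typical use, before pyplot is imported:
    #   import matplotlib
    #   matplotlib.use(pick_matplotlib_backend())
    #   import matplotlib.pyplot as plt
```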

Resources