AI Software Environment¶
The AI Software Environment by LUMI AI Factory is a comprehensive, ready-to-use containerised stack for AI and machine learning workloads on the LUMI supercomputer. The environment is designed to address the complexity of deploying and maintaining AI/ML software in a high-performance computing (HPC) setting.
All build artifacts are publicly available, including the full recipe, Containerfiles, build logs, and the resulting final container images. This transparent approach enables full customization for special use cases, reuse on other similar systems, and adaptation of the images to run in cloud environments.
Available container images¶
At the moment each release includes the following container images, each building on the previous one by adding new major functionality in the following order:
- lumi-multitorch-rocm-*: Starts from an Ubuntu base image and adds ROCm
- lumi-multitorch-libfabric-*: Adds libfabric to the ROCm image
- lumi-multitorch-mpich-*: Adds MPICH with GPU support to the libfabric image
- lumi-multitorch-torch-*: Adds PyTorch to the MPICH image
- lumi-multitorch-full-*: Adds a selection of AI and ML libraries (e.g., Bitsandbytes, DeepSpeed, Flash Attention, Megatron-LM, vLLM) to the PyTorch image
The releases on GitHub also include full details of the included software of each image.
Each container name includes a timestamp and a version identifier; the naming scheme is explained in the releases on GitHub.
For users running AI applications based on PyTorch, the containers starting with lumi-multitorch-full-* are most likely the best starting point. Advanced users can build on the intermediate containers to customize them for their use cases.
Access to container images¶
The container images are available from the following locations:
- LUMI supercomputer, in the directory /appl/local/laifs/containers/
- Docker Hub, in the LUMI AI Factory organisation
The GitHub releases in a public GitHub repository include full details of the provided container images.
Examples for using the container images¶
This list only includes some examples for using the container images. More examples can be found in the LUMI AI guide.
lumi-aif-singularity-bindings module
To give LUMI containers access to the file system of the working directory, some additional bindings are required. As setting these bindings manually can be quite cumbersome, we provide a module that does this for you; the commands for loading it are shown in the examples below.
If you prefer to set the bindings manually, we recommend taking a look at the Running containers on LUMI lecture from the LUMI AI workshop material.
Run PyTorch using the container¶
module purge
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings
export SIF=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260124_092648/lumi-multitorch-full-u24r64f21m43t29-20260124_092648.sif
srun -A <your-project-id> -p small-g -n 1 --gpus-per-task=1 singularity run $SIF python -c "import torch; print(torch.cuda.device_count())"
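For a slightly more informative check than the one-liner above, a short script along the following lines could be run inside the container. This is an illustrative sketch, not part of the provided images: the describe_gpus helper is a made-up name, and the script falls back to reporting zero GPUs when PyTorch is not importable.

```python
# Illustrative GPU sanity check; describe_gpus is a hypothetical helper.
import importlib.util

def describe_gpus(count):
    """Return a human-readable summary of the detected GPU count."""
    return f"{count} GPU(s) visible to PyTorch"

# Only query PyTorch if it is importable, so the script also runs
# outside the container without crashing.
if importlib.util.find_spec("torch") is not None:
    import torch
    print(describe_gpus(torch.cuda.device_count()))
else:
    print(describe_gpus(0))
```

Saved as, say, check_gpus.py, it could be launched the same way as the one-liner, replacing the `python -c` part with `python check_gpus.py`.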
List pip packages in container¶
To inspect which specific packages are included in the images, you can use this simple command:
export SIF=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260124_092648/lumi-multitorch-full-u24r64f21m43t29-20260124_092648.sif
singularity run $SIF pip list
Alternatively, you can have a look at the software bill of materials (SBOM) .json file in the releases on GitHub or in the directory of the container on LUMI.
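If you want to process an SBOM programmatically rather than read it by hand, a Python sketch like the following can extract package names and versions. It assumes a CycloneDX-style layout with a top-level components list, which you should verify against the actual SBOM files; the embedded JSON here is made-up sample data.

```python
import json

# Made-up CycloneDX-style SBOM fragment; real SBOMs come from the
# GitHub releases or the container directory on LUMI.
sbom_text = """
{
  "components": [
    {"name": "torch", "version": "1.2.3"},
    {"name": "h5py", "version": "4.5.6"}
  ]
}
"""

sbom = json.loads(sbom_text)
# Collect (name, version) pairs from the components list.
packages = [(c["name"], c["version"]) for c in sbom.get("components", [])]
for name, version in packages:
    print(f"{name}=={version}")
```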
Add more pip packages to container¶
You might find yourself in a situation where none of the provided containers contain all Python packages you need. There are multiple ways of adding more pip packages:
- If you need many or large pip packages, you should build new containers based on the existing images.
- If you only need a few small pip packages, you can use a virtual environment together with the provided containers and follow this example.
For this example, we need to add the HDF5 Python package h5py to the environment:
module purge
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings
export SIF=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260124_092648/lumi-multitorch-full-u24r64f21m43t29-20260124_092648.sif
singularity shell $SIF
Singularity> python -m venv h5-env --system-site-packages
Singularity> source h5-env/bin/activate
(h5-env) Singularity> pip install h5py
This will create an h5-env environment in the working directory. The --system-site-packages flag gives the virtual environment access to the packages from the container.
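As a quick illustration of what --system-site-packages does, the standard venv module records the flag in the environment's pyvenv.cfg file, which you can inspect. The sketch below creates a throwaway environment in a temporary directory; the name demo-env is arbitrary.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"
    # Mirror the --system-site-packages flag used above; skip pip
    # installation to keep the example fast and offline.
    venv.EnvBuilder(system_site_packages=True, with_pip=False).create(env_dir)
    # The flag is recorded in pyvenv.cfg inside the environment.
    cfg = (env_dir / "pyvenv.cfg").read_text()
    flag_line = next(line for line in cfg.splitlines()
                     if line.startswith("include-system-site-packages"))
    print(flag_line)
```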
Strain on Lustre file system
Installing Python packages typically creates thousands of small files. This puts a lot of strain on the Lustre file system and might exceed your file quota. This problem can be solved by building new containers based on the images.
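To get a feel for how many files an environment contains, and thus how close it is to your file quota, you can count them with only the standard library. The sketch below runs on a small throwaway directory tree as a stand-in for a real virtual environment:

```python
import os
import tempfile
from pathlib import Path

def count_files(root):
    """Count regular files under root, as a rough quota check."""
    return sum(len(files) for _, _, files in os.walk(root))

# Small throwaway tree as a stand-in for a venv directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "site-packages" / "demo"
    pkg.mkdir(parents=True)
    for i in range(5):
        (pkg / f"module_{i}.py").write_text("# placeholder\n")
    n_files = count_files(tmp)
    print(f"{n_files} files")
```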
Now one can execute a script that imports the h5py package. To run a script called my-script.py within the container using the virtual environment, include the additional activation command:
export SIF=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260124_092648/lumi-multitorch-full-u24r64f21m43t29-20260124_092648.sif
singularity run $SIF bash -c 'source h5-env/bin/activate && python my-script.py'
Build new containers based on the images¶
It is possible to create new containers based on the existing containers. In general, the instructions to extend singularity images can be followed.
GPU support and communication libraries
Note that for some packages, installing with GPU support or against the correct communication libraries might require significant work. To mitigate this risk, we recommend starting from a suitable image that already contains most of the required software.
To build a new container, the following steps are required:

1. Check what software you want to install. Note: for a few small pip packages, a virtual environment might be sufficient.
2. Identify the correct base container from the releases on GitHub. Examples:
   - For installing an alternative deep learning framework like JAX, start with the lumi-multitorch-mpich-* image.
   - For installing a system package while requiring PyTorch and vLLM, start with the lumi-multitorch-full-* image.
3. Create a Singularity definition file. Examples:
   - To add a pip package like scikit-learn to lumi-multitorch-torch-*, save the definition file as scikit.def.
   - To add a system package like nvtop to lumi-multitorch-full-*, save the definition file as nvtop.def.
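The original definition files are not reproduced here. As an illustration only, a minimal Singularity definition file for the scikit-learn case might look like the following; the base-image filename is a placeholder that you need to point at the actual .sif file, and you should verify the pip invocation against the image you build on:

```
Bootstrap: localimage
From: lumi-multitorch-torch-<version>.sif

%post
    # Install the additional pip package on top of the base image
    pip install scikit-learn
```

The nvtop.def case would be analogous, installing the system package in the %post section (e.g. via apt-get on the Ubuntu-based images).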
4. Build the container using the instructions to extend singularity images. This is possible either on your own hardware or directly on LUMI using PRoot.

   Memory requirements

   Creating new containers based on the provided containers might require significant memory. One option is to use an interactive Slurm job with sufficient memory.

   Example:

   singularity build scikit.sif scikit.def
   singularity build nvtop.sif nvtop.def
5. Check that the package has been installed correctly. How to do this depends on the specific package, but for packages using GPUs it is worth checking that the GPU is detected, and for packages using multiple nodes it is recommended to verify that all nodes are actually used by your application.