Revision 39 as of 2019-05-31 07:02:16

Setting up a personal python development infrastructure

This page shows how to set up a personal python development infrastructure, how to use it, how to maintain it and make backups of your project environments.

Some examples of software installation in the field of data science are provided.

The infrastructure is driven by the conda package manager, which accesses the Anaconda repositories to install software.

After familiarizing yourself with conda, read this collection of hints and explanations about available platforms on which to use your infrastructure and particularities of the software packages involved.

Installing conda

conda is provided by miniconda, the minimal Anaconda distribution. It can be installed and configured for the D-ITET infrastructure with the following bash script:

#!/bin/bash

# If local scratch is made available through scratch_net, use its path in
# order to be able to access it on other hosts through scratch_net
scratch_net="/scratch_net/$(hostname -s)"
if mountpoint -q "${scratch_net}"; then
    LOCAL_SCRATCH="${scratch_net}/${USER}"
else
    LOCAL_SCRATCH="/scratch/${USER}"
fi

NET_SCRATCH="/itet-stor/${USER}/net_scratch"

# Locations for conda installation, package cache and environments
# Replace NET_SCRATCH with LOCAL_SCRATCH according to your needs
CONDA_INSTALL_DIR="${NET_SCRATCH}/conda"
CONDA_PACKET_CACHE="${NET_SCRATCH}/conda_pkgs"
CONDA_ENV_DEFAULT="${NET_SCRATCH}/conda_envs"
CONDA_ENV_LOCAL="${LOCAL_SCRATCH}/conda_envs"

# Installer of choice for conda
CONDA_INSTALLER_URL='https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh'

line=$(printf '%*s\n' "${COLUMNS:-$(tput cols)}" '' |tr ' ' '-')

# Display underlined title to improve readability of script output
function title()
{
    echo
    echo "$@"
    echo "${line}"
}

# Unset pre-existing python paths
[[ -z ${PYTHONPATH} ]] || unset PYTHONPATH

# Download the latest version of miniconda and install it
title 'Downloading and installing conda'
wget -O miniconda.sh "${CONDA_INSTALLER_URL}" \
    && chmod +x miniconda.sh \
    && ./miniconda.sh -b -p "${CONDA_INSTALL_DIR}" \
    && rm ./miniconda.sh

# Configure conda
title 'Configuring conda:'
eval "$(${CONDA_INSTALL_DIR}/bin/conda shell.bash hook)"
conda config --add pkgs_dirs "${CONDA_PACKET_CACHE}" --system
conda config --add envs_dirs "${CONDA_ENV_LOCAL}" --system
conda config --add envs_dirs "${CONDA_ENV_DEFAULT}" --system
conda config --set auto_activate_base false
conda deactivate

# Update conda and conda base environment
title 'Updating conda and conda base environment:'
conda update conda --yes
conda update -n 'base' --update-all --yes

# Clean installation
title 'Removing unused packages and caches:'
conda clean --all --yes

# Display information about this conda installation
title 'Information about this conda installation:'
conda info

# Show how to initialize conda
title 'Initialize conda immediately:'
echo "eval \"\$(${CONDA_INSTALL_DIR}/bin/conda shell.bash hook)\""
title 'Automatically initialize conda for future shell sessions:'
echo "echo '[[ -f ${CONDA_INSTALL_DIR}/bin/conda ]] && eval \"\$(${CONDA_INSTALL_DIR}/bin/conda shell.bash hook)\"' >> ${HOME}/.bashrc"

# Show how to remove conda
title 'Completely remove conda:'
echo "rm -r ${CONDA_INSTALL_DIR} ${CONDA_PACKET_CACHE} ${CONDA_ENV_DEFAULT} ${CONDA_ENV_LOCAL} ${HOME}/.conda"

Save this script as install_conda.sh, make it executable with

chmod +x install_conda.sh

and execute the script by issuing

./install_conda.sh
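
After the script has run, a check along the following lines can confirm that conda ended up where expected. This is only a sketch: the helper name conda_install_ok is made up for illustration, and the path assumes the default NET_SCRATCH location used in the script above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if DIR contains an executable conda binary
conda_install_ok() {
    [ -x "$1/bin/conda" ]
}

# Assumed default installation directory from install_conda.sh
CONDA_INSTALL_DIR="/itet-stor/${USER}/net_scratch/conda"
if conda_install_ok "${CONDA_INSTALL_DIR}"; then
    "${CONDA_INSTALL_DIR}/bin/conda" --version
else
    echo "No conda installation found in ${CONDA_INSTALL_DIR}"
fi
```

If you changed CONDA_INSTALL_DIR in the installation script, adjust the path here accordingly.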

conda storage locations

The directories listed in the command for complete conda removal contain the following data:

/itet-stor/$USER/net_scratch/conda

The miniconda installation

/itet-stor/$USER/net_scratch/conda_pkgs

Downloaded packages

/itet-stor/$USER/net_scratch/conda_envs

Virtual environments on NAS where startup time is not important

/scratch/$USER/conda_envs

Virtual environments on local disk which need to start fast

/home/$USER/.conda

Personal conda configuration
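
To see how much space each of these locations currently occupies, something like the following sketch may be useful. The helper name conda_storage_usage is hypothetical, and the paths are the defaults from the installation script; replace them if you chose other locations.

```shell
#!/bin/bash
# Print the disk usage of each given directory, or a note if it does not exist
conda_storage_usage() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            du -sh "$dir"
        else
            echo "(missing) $dir"
        fi
    done
}

# Default locations as configured by the installation script
conda_storage_usage \
    "/itet-stor/${USER}/net_scratch/conda" \
    "/itet-stor/${USER}/net_scratch/conda_pkgs" \
    "/itet-stor/${USER}/net_scratch/conda_envs" \
    "/scratch/${USER}/conda_envs" \
    "${HOME}/.conda"
```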

The purpose of this configuration is to store data according to its importance and prevent using up your quota. If you intend to deviate from the default configuration, consult the storage overview to choose your storage locations adequately and follow these recommendations:

Using conda

conda lets you separate installed software packages from each other by creating so-called environments. Using environments is best practice to generate deterministic and reproducible tools.

conda takes care of dependencies common to the packages it is asked to install. If two packages have a common dependency but define differing version ranges for it, conda chooses the highest version that satisfies both. This means the dependency installed in an environment containing both packages might have a lower version number than in environments that keep the two packages separate.

It is best practice to separate packages into different environments if they don't need to interact.
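
The effect of combining version ranges can be illustrated with a toy model. The sketch below is not how conda resolves dependencies internally; it only picks the highest version that lies inside two half-open ranges. The package names, version numbers and helper functions are made up for illustration, and GNU sort with the -V option is assumed.

```shell
#!/bin/bash
# Toy model only: conda's real solver is far more elaborate than this.

# Succeed if version $1 is lower than or equal to version $2 (needs GNU sort -V)
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the highest VERSION inside both ranges [MIN_A, MAX_A) and [MIN_B, MAX_B)
# Usage: highest_common MIN_A MAX_A MIN_B MAX_B VERSION...
highest_common() {
    local min_a=$1 max_a=$2 min_b=$3 max_b=$4 v best=''
    shift 4
    for v in $(printf '%s\n' "$@" | sort -V); do
        if version_le "$min_a" "$v" && version_le "$min_b" "$v" \
            && ! version_le "$max_a" "$v" && ! version_le "$max_b" "$v"; then
            best=$v
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Made-up example: package_a needs dep >=1.2,<2.0 and package_b needs dep >=1.0,<1.5
available='1.0 1.2 1.4 1.6 1.9 2.1'
echo "package_a alone: $(highest_common 1.2 2.0 1.2 2.0 $available)"
echo "both together:   $(highest_common 1.2 2.0 1.0 1.5 $available)"
```

In this example package_a alone would get version 1.9 of the dependency, while an environment containing both packages would be limited to 1.4.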

For a complete guide to conda see the official documentation.

The official cheat sheet contains a compact summary of common commands. An abbreviated list to get you started is shown below.

Installation examples

For conda, python itself is just a software package like any other. After analyzing all packages to be installed, conda decides which python version works for the whole environment. This means different environments may contain differing versions of python.

Creating an environment with a specific python version

conda create --name py37 python=3.7.3

Creating an environment with the GPU version of pytorch and CUDA toolkit 10

conda create --name pytcu10 pytorch torchvision cudatoolkit=10.0 --channel pytorch

Creating an environment with the GPU version of tensorflow and CUDA toolkit 10

conda create --name tencu10 tensorflow-gpu cudatoolkit=10.0

Environments

conda automatically installs a default environment called base with a python interpreter, pip and other tools to start coding in python. Whether you want to use and extend this environment or create your own is up to you. At the time of writing, it is not possible to remove the base environment.

Create an environment called "my_env" with packages "package1" and "package2" installed

conda create --name my_env package1 package2

Activate the environment called "my_env"

conda activate my_env

Deactivate the current environment

conda deactivate

List available environments

conda env list

Remove the environment called "my_env"

conda remove --name my_env --all

Create a cloned environment named "cloned_env" from "original_env"

conda create --name cloned_env --clone original_env

Export the active environment definition to the file "my_env.yml"

This command is also the basis for backing up an environment.

conda env export > my_env.yml

Recreate a previously exported environment

conda env create --file my_env.yml
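
An exported file records the environment name in a top-level name: entry, and that name is used when the environment is recreated. To check which name a given export would recreate, a small helper like the following could be used. It is a sketch under the assumption that the file has the usual name: line; the function name env_name_from_yml is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the environment name stored in an export file
env_name_from_yml() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Only report if the example export file from above exists
if [ -f my_env.yml ]; then
    echo "my_env.yml recreates: $(env_name_from_yml my_env.yml)"
fi
```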

Create the environment "my_env" in the specified location

This example creates the environment on local scratch for faster disk access.

conda create --prefix /scratch/$USER/conda_envs/my_env

Update an active environment

Make sure to create a backup by exporting the active environment before updating.

conda update --update-all
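
Export and update can be combined so that the backup is never forgotten. The following is a minimal sketch, not a tested tool: the function name update_with_backup and the backup file name are assumptions, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```shell
#!/bin/bash
# Hypothetical helper: export the active environment, then update it
update_with_backup() {
    local env=${CONDA_DEFAULT_ENV:-}
    if [ -z "$env" ]; then
        echo 'No active conda environment' >&2
        return 1
    fi
    # Use only the last path component in case the environment was activated by prefix
    local backup
    backup="$(basename "$env")_$(date '+%Y-%m-%d_%H-%M-%S').yml"
    # Update only if the export succeeded
    conda env export > "$backup" && conda update --update-all
}
```

Call update_with_backup after activating the environment you want to update; the export file is written to the current directory.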

Packages

Search for a package named "package1"

conda search package1

Install the package named "package1" in the active environment

conda install package1

List packages installed in the active environment

conda list

Add software channels

The list of available software can be extended by adding channels of selected repositories. Each added channel is placed at the top of the list, so the channel added last has the highest priority. In the following example, conda-forge has the highest priority, followed by bioconda, with the defaults channel at the lowest priority.

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
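
Why the last channel added ends up first can be mimicked with a plain shell variable. The sketch below does not call conda at all; add_channel is a made-up helper that prepends a name to a list and moves it to the top if it is already present, which is the behaviour assumed here for conda config --add.

```shell
#!/bin/bash
# Simulation only: channel list as newline-separated string, highest priority first
channels=''

# Hypothetical stand-in for 'conda config --add channels NAME'
add_channel() {
    # Prepend the new channel and drop any earlier entry of the same name
    channels=$(printf '%s\n%s\n' "$1" "$channels" | awk 'NF && !seen[$0]++')
}

add_channel defaults
add_channel bioconda
add_channel conda-forge

echo "$channels"
```

The list printed at the end starts with conda-forge and ends with defaults.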

Show software channels

The following command shows the available channels in order of priority (highest first):

conda config --show channels

Miscellaneous

Display information about the current conda installation

conda info

Maintenance

The cache of installed packages will consume a lot of space over time. The default location set for the package cache resides on NetScratch, and the terms of use for this storage area require you to clean your cache regularly.

Remove index cache, lock files, unused cache packages, and tarballs

conda clean --all
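
Regular cleaning could also be tied to the size of the cache. The following sketch cleans only when the cache directory exceeds a limit; the helper name cache_exceeds and the limit of 5000 megabytes are arbitrary assumptions, and the path is the default package cache from the installation script.

```shell
#!/bin/bash
# Hypothetical helper: succeed if DIR exists and uses more than LIMIT_MB megabytes
cache_exceeds() {
    local dir=$1 limit_mb=$2 used_mb
    [ -d "$dir" ] || return 1
    used_mb=$(du -sm "$dir" | cut -f1)
    [ "$used_mb" -gt "$limit_mb" ]
}

# Assumed default cache location and an arbitrary example limit
CONDA_PACKET_CACHE="/itet-stor/${USER}/net_scratch/conda_pkgs"
if cache_exceeds "${CONDA_PACKET_CACHE}" 5000 && command -v conda >/dev/null 2>&1; then
    conda clean --all --yes
fi
```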

Update conda without any active environment

conda update conda

Backup

Regular backups are recommended to be able to reproduce an environment used at a certain point in time. Before installing or updating an environment, a backup should always be created in order to be able to revert the changes.

It is not necessary to back up the environments themselves; backing up the environment export files is sufficient to recreate them exactly.

For a simple backup of all environments the following script can be used:

#!/bin/bash

BACKUP_DIR="${HOME}/conda_env_backup"
MY_TIME_FORMAT='%Y-%m-%d_%H-%M-%S'

NOW=$(date "+${MY_TIME_FORMAT}")
# Create the backup directory if it does not exist yet
mkdir -p "${BACKUP_DIR}"
# Names of all environments (lines of 'conda env list' starting with a word character)
ENVS=$(conda env list |grep '^\w' |cut -d' ' -f1)
for env in $ENVS; do
    echo "Exporting ${env} to ${BACKUP_DIR}/${env}_${NOW}.yml"
    conda env export --name "${env}" > "${BACKUP_DIR}/${env}_${NOW}.yml"
done
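
To revert to a backup, the most recent export of an environment has to be found first. Since the time stamp format used above sorts chronologically, this can be sketched as follows. The helper name latest_backup is hypothetical, and the file name pattern assumes backups were written by the script above.

```shell
#!/bin/bash
BACKUP_DIR="${BACKUP_DIR:-${HOME}/conda_env_backup}"

# Hypothetical helper: print the newest backup file of the given environment
latest_backup() {
    local env=$1 f latest=''
    for f in "${BACKUP_DIR}/${env}"_[0-9][0-9][0-9][0-9]-*.yml; do
        [ -e "$f" ] || continue
        # Glob expansion is sorted, so the last match has the newest time stamp
        latest=$f
    done
    [ -n "$latest" ] && echo "$latest"
}
```

The result can then be passed to conda env create, for example conda env create --file "$(latest_backup my_env)". As far as we know, an existing environment of the same name has to be removed before it can be recreated from the file.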