Differences between revisions 21 and 22
Revision 21 as of 2019-05-14 10:47:17
Size: 9031
Editor: stroth
Comment:
Revision 22 as of 2019-05-14 10:53:57
Size: 9033
Editor: stroth
Comment:

Setting up a personal python development infrastructure

This page shows how to set up a personal python development infrastructure, how to use it with examples for software installation in the field of data sciences, how to maintain it and make backups of your project environments.

After familiarizing yourself with the tool you'll learn how to use here, read

The infrastructure is driven by the conda package manager, which accesses the Anaconda repositories to install software.

Install conda

  • Time to install: ~1 minute
  • Space required: ~350M

To provide conda, Miniconda (the minimal Anaconda distribution) can be installed and configured for the D-ITET infrastructure with the following bash script:

#!/bin/bash

# Locations to store environments
# net_scratch is used as default, local scratch needs to be chosen explicitly
LOCAL_SCRATCH="/scratch/${USER}"
NET_SCRATCH="/itet-stor/${USER}/net_scratch"

# Installer of choice for conda
CONDA_INSTALLER_URL='https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh'

# Unset pre-existing python paths
[[ -z ${PYTHONPATH} ]] || unset PYTHONPATH

# Download latest version of miniconda and install it
wget -O miniconda.sh "${CONDA_INSTALLER_URL}" \
    && chmod +x miniconda.sh \
    && ./miniconda.sh -b -p "${NET_SCRATCH}/conda" \
    && rm ./miniconda.sh

# Configure conda
eval "$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)"
conda config --add pkgs_dirs "${NET_SCRATCH}/conda_pkgs" --system
conda config --add envs_dirs "${LOCAL_SCRATCH}/conda_envs" --system
conda config --add envs_dirs "${NET_SCRATCH}/conda_envs" --system
conda config --set auto_activate_base false
conda deactivate

# Update conda and conda base environment
conda update conda --yes
conda update -n 'base' --update-all --yes

# Show how to initialize conda
echo
echo 'Initialize conda immediately:'
echo "eval \"\$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)\""
echo
echo 'Automatically initialize conda for future shell sessions:'
echo "echo 'eval \"\$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)\"' >> ${HOME}/.bashrc"

# Show how to remove conda
echo
echo 'Completely remove conda:'
echo "rm -r ${NET_SCRATCH}/conda ${NET_SCRATCH}/conda_pkgs ${NET_SCRATCH}/conda_envs ${LOCAL_SCRATCH}/conda_envs ${HOME}/.conda"

Save this script as install_conda.sh, make it executable with

chmod +x install_conda.sh

and execute the script by issuing

./install_conda.sh

Choose your preferred method of initializing conda as recommended by the script.
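
Once conda is initialized in the current shell, the settings written by the installation script can be checked. The following is a minimal sketch; the function name show_conda_setup is a hypothetical helper and not part of conda or of the script above.

```shell
# Hypothetical helper: print the conda version and the storage
# directories that install_conda.sh configured.
show_conda_setup() {
    conda --version
    # pkgs_dirs and envs_dirs are the configuration keys set by the install script
    conda config --show pkgs_dirs envs_dirs
}
```

Calling show_conda_setup should list the net_scratch package directory and both environment directories if the installation succeeded.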

Conda storage locations

The directories listed in the command for complete conda removal contain the following data:

/itet-stor/$USER/net_scratch/conda

The miniconda installation

/itet-stor/$USER/net_scratch/conda_pkgs

Downloaded packages

/itet-stor/$USER/net_scratch/conda_envs

Virtual environments on NAS

/scratch/$USER/conda_envs

Virtual environments on local disk

/home/$USER/.conda

Personal conda configuration

The purpose of this configuration is to store reproducible and space-consuming data outside of your $HOME so that it does not use up your quota.
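
To see how much space each of these locations currently occupies, a small helper can be used. This is a sketch only; the function name conda_dirs_usage is hypothetical, and the directories are passed as arguments so that nothing about your setup is assumed.

```shell
# Hypothetical helper: report the disk usage of every directory given
# as an argument and flag directories that do not exist (yet).
conda_dirs_usage() {
    for dir in "$@"; do
        if [ -d "${dir}" ]; then
            du -sh "${dir}"
        else
            echo "missing: ${dir}"
        fi
    done
}

# Example call with the locations listed above:
# conda_dirs_usage "/itet-stor/${USER}/net_scratch/conda" \
#     "/itet-stor/${USER}/net_scratch/conda_pkgs" \
#     "/itet-stor/${USER}/net_scratch/conda_envs" \
#     "/scratch/${USER}/conda_envs" "${HOME}/.conda"
```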

Using Conda

conda allows you to separate installed software packages from each other by creating so-called environments. Using environments is best practice for building deterministic and reproducible tool sets.

conda takes care of dependencies common to the packages it is asked to install. If two packages have a common dependency but define differing version ranges for it, conda chooses the highest version that satisfies both. This means the dependency installed in an environment containing both packages might have a lower version number than in environments that hold each package separately.

It is best practice to separate packages into different environments if they don't need to interact.
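
To inspect which versions conda would select for a combination of packages without creating anything, a dry run can help. The sketch below assumes the --dry-run option of conda create; the function name preview_env and the throwaway environment name dry_run_preview are hypothetical.

```shell
# Hypothetical helper: show the package plan conda would install for the
# given packages, without creating an environment.
preview_env() {
    # dry_run_preview is a placeholder name; nothing is created with --dry-run
    conda create --name dry_run_preview --dry-run "$@"
}

# Example: compare the resolved dependency versions
# preview_env package1
# preview_env package1 package2
```

Comparing the plan for one package with the plan for both packages shows whether a shared dependency is pinned to a lower version in the combined case.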

For a complete guide to conda see the official documentation.

The official cheat sheet is a compact summary of common commands to get you started. An abbreviated list is shown here:

Environments

The name of the automatically installed default environment is base.

Create an environment called "my_env" with packages "package1" and "package2" installed

conda create --name my_env package1 package2

Activate the environment called "my_env"

conda activate my_env

Deactivate the current environment

conda deactivate

List available environments

conda env list

Remove the environment called "my_env"

conda remove --name my_env --all

Create a cloned environment named "cloned_env" from "original_env"

conda create --name cloned_env --clone original_env

Export the active environment definition to the file "my_env.yml"

conda env export > my_env.yml

Recreate a previously exported environment

conda env create --file my_env.yml

Create the environment "my_env" in the specified location

This example creates the environment on local scratch for faster disk access:

conda create --prefix /scratch/$USER/conda_envs/my_env
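
An environment created with --prefix can also be activated by its full path. The following sketch wraps both steps; the function name create_local_env is hypothetical, and the path layout is the one configured by the install script above.

```shell
# Hypothetical helper: create an environment on local scratch and print
# the command that activates it by path.
create_local_env() (
    if [ "$#" -lt 1 ]; then
        echo "usage: create_local_env NAME [PACKAGE...]" >&2
        exit 1   # leaves only the subshell that forms the function body
    fi
    env_path="/scratch/${USER}/conda_envs/$1"
    shift
    conda create --prefix "${env_path}" "$@" \
        && echo "Activate with: conda activate ${env_path}"
)
```

Keep in mind that local scratch is a local disk, so an environment created this way is only available on the machine where it was created.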

Update an active environment

Make sure to create a backup by exporting the active environment before updating.

conda update --update-all
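
The backup and the update can be combined so that the export is never forgotten. This is a minimal sketch; the function name update_with_backup and the default file name are hypothetical. It relies on the CONDA_DEFAULT_ENV variable, which conda activate sets to the active environment.

```shell
# Hypothetical helper: export the active environment to a file, then update it.
update_with_backup() (
    # CONDA_DEFAULT_ENV is empty if no environment is active
    if [ -z "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "No active conda environment" >&2
        exit 1   # leaves only the subshell that forms the function body
    fi
    env_name="$(basename "${CONDA_DEFAULT_ENV}")"
    backup_file="${1:-${env_name}_before_update.yml}"
    echo "Exporting ${env_name} to ${backup_file}"
    # Only update if the export succeeded
    conda env export > "${backup_file}" \
        && conda update --update-all
)
```

If the update breaks something, the environment can be removed and recreated from the exported file with conda env create --file.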

Packages

Search for a package named "package1"

conda search package1

Install the package named "package1" in the active environment

conda install package1

List packages installed in the active environment

conda list

Maintenance

The cache of installed packages will consume a lot of space over time. The default location set for the package cache resides on NetScratch; the terms of use for this storage area imply that the cache should be cleaned regularly.

Remove index cache, lock files, unused cache packages, and tarballs

conda clean --all
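
To see how much space a cleanup frees, the size of the package cache can be printed before and after cleaning. This is a sketch; the function name clean_conda_cache is hypothetical, and the default path is the package cache location set by the install script above.

```shell
# Hypothetical helper: show the package cache size, clean the cache,
# then show the size again.
clean_conda_cache() (
    pkgs_dir="${1:-/itet-stor/${USER}/net_scratch/conda_pkgs}"
    if [ ! -d "${pkgs_dir}" ]; then
        echo "Package cache not found: ${pkgs_dir}" >&2
        exit 1   # leaves only the subshell that forms the function body
    fi
    du -sh "${pkgs_dir}"
    # --yes skips the confirmation prompt
    conda clean --all --yes
    du -sh "${pkgs_dir}"
)
```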

Update conda without any active environment

conda update conda

Installation examples

/!\ Time to install and space required need to be re-measured

For conda, python itself is just a software package like any other. Based on the requested packages and their version constraints, conda decides which python version works with all other packages. This means different environments may contain different versions of python.

Creating an environment with a specific python version

conda create --name py37 python=3.7.3
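
The python version that ended up in an environment can be checked without activating it. The sketch below assumes a conda version that provides the conda run subcommand; the function name env_python_version is hypothetical.

```shell
# Hypothetical helper: print the python version installed in the named environment.
env_python_version() {
    conda run --name "$1" python --version
}

# Example:
# env_python_version py37
```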

Creating an environment with the GPU version of pytorch and CUDA toolkit 10

  • Time to install: ~5 minutes
  • Space required: ~2G, ~1.5G packages before cleanup, ~130M packages after cleanup

conda create --name pytcu10 pytorch torchvision cudatoolkit=10.0 --channel pytorch
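
After installation it is worth checking that pytorch actually sees a GPU. This is a minimal sketch, assuming it is run on a machine with an NVIDIA GPU and driver and that conda run is available; the function name check_torch_gpu is hypothetical.

```shell
# Hypothetical helper: print the pytorch version and whether CUDA is usable
# in the given environment (defaults to the example environment pytcu10).
check_torch_gpu() {
    conda run --name "${1:-pytcu10}" python -c \
        'import torch; print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())'
}
```

On a machine without a GPU the check is expected to report that CUDA is not available even if the installation itself is fine.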

Creating an environment with the GPU version of tensorflow and CUDA toolkit 10

  • Time to install: ~5 minutes
  • Space required: ~2G, ~1.5G packages before cleanup, ~130M packages after cleanup

conda create --name tencu10 tensorflow-gpu cudatoolkit=10.0
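
A comparable check for tensorflow is sketched below. It assumes that the tensorflow release resolved together with cudatoolkit 10.0 still provides tf.test.is_gpu_available(); newer releases deprecate that call in favor of tf.config.list_physical_devices('GPU'). The function name check_tf_gpu is hypothetical.

```shell
# Hypothetical helper: print the tensorflow version and whether a GPU is usable
# in the given environment (defaults to the example environment tencu10).
check_tf_gpu() {
    # tf.test.is_gpu_available() is an assumption about the installed release
    conda run --name "${1:-tencu10}" python -c \
        'import tensorflow as tf; print("tensorflow", tf.__version__, "GPU available:", tf.test.is_gpu_available())'
}
```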

Backup

Regular backups are recommended to be able to reproduce an environment used at a certain point in time. Before installing or updating an environment, a backup should always be created in order to be able to revert the changes.

It is not necessary to back up the environments themselves; backing up the exported environment files is sufficient to recreate them exactly.

For a simple backup of all environments the following script can be used:

#!/bin/bash

# Directory in which the exported environment files are collected
BACKUP_DIR="${HOME}/conda_env_backup"
# Timestamp format appended to each export file name
MY_TIME_FORMAT='%Y-%m-%d_%H-%M-%S'

NOW=$(date "+${MY_TIME_FORMAT}")
mkdir -p "${BACKUP_DIR}"
# Names of all named environments: skips comment lines and entries without a name
ENVS=$(conda env list | grep '^\w' | cut -d' ' -f1)
for env in ${ENVS}; do
    echo "Exporting ${env} to ${BACKUP_DIR}/${env}_${NOW}.yml"
    conda env export --name "${env}" > "${BACKUP_DIR}/${env}_${NOW}.yml"
done
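
Restoring from such a backup could look like the following sketch. It assumes the file naming used by the backup script above (environment name, underscore, timestamp) and that the environment to restore does not exist anymore; the function name restore_latest_backup is hypothetical.

```shell
# Hypothetical helper: recreate an environment from its most recent export.
restore_latest_backup() (
    env="$1"
    backup_dir="${2:-${HOME}/conda_env_backup}"
    # Timestamps of the form %Y-%m-%d_%H-%M-%S sort chronologically as plain text
    latest="$(ls -1 "${backup_dir}/${env}_"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*.yml 2>/dev/null \
        | sort | tail -n 1)"
    if [ -z "${latest}" ]; then
        echo "No backup found for ${env} in ${backup_dir}" >&2
        exit 1   # leaves only the subshell that forms the function body
    fi
    echo "Recreating ${env} from ${latest}"
    conda env create --name "${env}" --file "${latest}"
)
```

If the environment still exists, remove it first with conda remove --name my_env --all and then call restore_latest_backup my_env.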

Programming/Languages/Conda (last edited 2023-07-11 19:50:53 by stroth)