Contents
- Setting up a personal python development infrastructure
  - Installing conda
  - conda storage locations
  - Using conda
    - Environments
      - Create an environment called "my_env" with packages "package1" and "package2" installed
      - Activate the environment called "my_env"
      - Deactivate the current environment
      - List available environments
      - Remove the environment called "my_env"
      - Create a cloned environment named "cloned_env" from "original_env"
      - Export the active environment definition to the file "my_env.yml"
      - Recreate a previously exported environment
      - Creates the environment "my_env" in the specified location
      - Update an active environment
    - Packages
    - Maintenance
    - Backup
    - Installation examples
Setting up a personal python development infrastructure
This page shows how to set up a personal python development infrastructure, how to use it, how to maintain it and how to make backups of your project environments.
Some examples for software installation in the field of data sciences are provided.
The infrastructure is driven by the conda package manager, which accesses the Anaconda repositories to install software.
After familiarizing yourself with conda, read further information about available platforms on which to use your infrastructure and particularities of the software packages involved.
Installing conda
- Time to install: ~1.5 minutes
- Space required: ~370M
To provide conda, miniconda, the minimal Anaconda distribution, can be installed and configured for the D-ITET infrastructure with the following bash script:
#!/bin/bash
# Locations to store environments
# net_scratch is used as default, local scratch needs to be chosen explicitly
LOCAL_SCRATCH="/scratch/${USER}"
NET_SCRATCH="/itet-stor/${USER}/net_scratch"
# Installer of choice for conda
CONDA_INSTALLER_URL='https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh'
# Unset pre-existing python paths
[[ -z ${PYTHONPATH} ]] || unset PYTHONPATH
# Download latest version of miniconda and install it, abort on failure
wget -O miniconda.sh "${CONDA_INSTALLER_URL}" \
&& chmod +x miniconda.sh \
&& ./miniconda.sh -b -p "${NET_SCRATCH}/conda" \
&& rm ./miniconda.sh \
|| exit 1
# Configure conda
eval "$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)"
conda config --add pkgs_dirs "${NET_SCRATCH}/conda_pkgs" --system
conda config --add envs_dirs "${LOCAL_SCRATCH}/conda_envs" --system
conda config --add envs_dirs "${NET_SCRATCH}/conda_envs" --system
conda config --set auto_activate_base false
conda deactivate
# Update conda and conda base environment
conda update conda --yes
conda update -n 'base' --update-all --yes
# Show how to initialize conda
echo
echo 'Initialize conda immediately:'
echo "eval \"\$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)\""
echo
echo 'Automatically initialize conda for future shell sessions:'
echo "echo 'eval \"\$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)\"' >> ${HOME}/.bashrc"
# Show how to remove conda
echo
echo 'Completely remove conda:'
echo "rm -r ${NET_SCRATCH}/conda ${NET_SCRATCH}/conda_pkgs ${NET_SCRATCH}/conda_envs ${LOCAL_SCRATCH}/conda_envs ${HOME}/.conda"
Save this script as install_conda.sh, make it executable with
chmod +x install_conda.sh
and execute the script by issuing
./install_conda.sh
When the script ends, it prints commands to initialize conda either immediately or every time you log in, as well as a command to completely remove your conda installation.
Choose your preferred method of initializing conda as recommended by the script and note down the deletion command.
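If you choose automatic initialization, keep in mind that the printed `>> ${HOME}/.bashrc` command appends the line again every time it is run. The following minimal sketch only appends the initialization line if it is not already present. The function name `add_conda_init` is hypothetical, and the installation path is assumed to be the default used by `install_conda.sh`.

```bash
# Hypothetical helper: add the conda initialization line to a shell startup
# file (default: ~/.bashrc) unless exactly this line is already present.
# Assumes conda was installed to the default location of install_conda.sh.
add_conda_init() {
    local rc_file="${1:-${HOME}/.bashrc}"
    local conda_bin="/itet-stor/${USER}/net_scratch/conda/bin/conda"
    local init_line="eval \"\$(${conda_bin} shell.bash hook)\""
    # grep options: -q quiet, -x whole line, -F fixed string instead of regex
    if [[ -f "${rc_file}" ]] && grep -qxF "${init_line}" "${rc_file}"; then
        echo "conda initialization already present in ${rc_file}"
    else
        printf '%s\n' "${init_line}" >> "${rc_file}"
        echo "Added conda initialization to ${rc_file}"
    fi
}

# Example: add_conda_init
```

Calling `add_conda_init` a second time leaves the file unchanged.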
conda storage locations
The directories listed in the command for complete conda removal contain the following data:
Directory | Content
/itet-stor/$USER/net_scratch/conda | The miniconda installation
/itet-stor/$USER/net_scratch/conda_pkgs | Downloaded packages
/itet-stor/$USER/net_scratch/conda_envs | Virtual environments on NAS where startup time is not important
/scratch/$USER/conda_envs | Virtual environments on local disk which need to start fast
/home/$USER/.conda | Personal conda configuration
The purpose of this configuration is to store data according to its importance and to avoid using up your quota. If you intend to deviate from the default configuration, consult the storage overview to choose appropriate storage locations and follow these recommendations:
- Reproducible, space-consuming data like environments and the package cache belongs in storage class SCRATCH.
- Code written by yourself should be backed up regularly. Since it consumes little space, its ideal location is storage class HOME, and it should additionally be checked into your git repository.
- Data generated over a long time period, which would be time-consuming to recreate from scratch and is used regularly, should be stored in storage class PROJECT.
- Data generated as a final result, which is not needed for ongoing work but needs to remain available for later generations, should be stored in storage class ARCHIVE.
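To check how much space the conda directories listed above currently take up, a short sketch like the following can be used. The function name `conda_usage` is hypothetical; directories that do not exist on the machine where it runs, for example a local scratch directory that was only created on another machine, are reported as missing.

```bash
# Hypothetical helper: print the total size of each given directory and
# report directories that do not exist on this machine.
conda_usage() {
    local dir
    for dir in "$@"; do
        if [[ -d "${dir}" ]]; then
            du -sh "${dir}"
        else
            echo "not present: ${dir}"
        fi
    done
}

# Example:
# conda_usage "/itet-stor/${USER}/net_scratch/conda_pkgs" \
#     "/itet-stor/${USER}/net_scratch/conda_envs" "/scratch/${USER}/conda_envs"
```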
Using conda
conda allows you to separate installed software packages from each other by creating so-called environments. Using environments is best practice for building deterministic and reproducible tools.
conda takes care of dependencies shared by the packages it is asked to install. If two packages have a common dependency but require differing version ranges of it, conda chooses the highest version that satisfies both ranges. This means that the dependency installed in an environment containing both packages may have a lower version number than in environments that keep the packages separate.
It is best practice to separate packages into different environments if they don't need to interact.
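The version selection described above can be pictured with a deliberately simplified toy model: each package accepts a list of versions of the shared dependency, and the highest version accepted by both is chosen. This is only an illustration of the idea and not how conda's solver is implemented; the function name and the version lists are made up for the example.

```bash
# Toy model (not conda's actual solver): print the highest version that
# appears in both space-separated lists of acceptable versions.
highest_common_version() {
    local version
    # Keep every version of the first list that also occurs in the second
    for version in $1; do
        case " $2 " in
            *" ${version} "*) printf '%s\n' "${version}" ;;
        esac
    done | sort -V | tail -n 1
}

# package1 accepts 1.0, 1.1, 1.2 and 2.0, package2 accepts 1.1, 1.2 and 1.3
highest_common_version "1.0 1.1 1.2 2.0" "1.1 1.2 1.3"   # prints 1.2
```

In an environment containing only package1, version 2.0 of the dependency could be installed instead.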
For a complete guide to conda see the official documentation.
The official cheat sheet contains a compact summary of common commands. An abbreviated list to get you started is shown below.
Environments
conda automatically installs a default environment called base with a python interpreter, pip and other tools to start coding in python. Whether you use and extend this environment or create your own is up to you. At the time of writing, it is not possible to remove the base environment.
Create an environment called "my_env" with packages "package1" and "package2" installed
conda create --name my_env package1 package2
Activate the environment called "my_env"
conda activate my_env
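`conda activate` relies on the shell function defined by the conda initialization line. In non-interactive scripts, for example batch jobs, that line from `~/.bashrc` is usually not executed, so activation fails. A minimal sketch of a helper that loads the conda shell hook first if needed; the function name `activate_env`, the variable `CONDA_BIN` and `my_script.py` are hypothetical, and the default path assumes the installation location of `install_conda.sh`.

```bash
# Hypothetical helper for non-interactive scripts: make sure the conda shell
# function is defined, then activate the given environment.
# CONDA_BIN defaults to the location used by install_conda.sh.
activate_env() {
    local env_name="$1"
    local conda_bin="${CONDA_BIN:-/itet-stor/${USER}/net_scratch/conda/bin/conda}"
    local hook
    # 'conda activate' only works once the hook has defined the conda
    # shell function in the current shell
    if [[ "$(type -t conda)" != "function" ]]; then
        hook="$("${conda_bin}" shell.bash hook)" || return 1
        eval "${hook}"
    fi
    conda activate "${env_name}"
}

# Example in a job script:
# activate_env my_env && python my_script.py
```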
Deactivate the current environment
conda deactivate
List available environments
conda env list
Remove the environment called "my_env"
conda remove --name my_env --all
Create a cloned environment named "cloned_env" from "original_env"
conda create --name cloned_env --clone original_env
Export the active environment definition to the file "my_env.yml"
This command is also the basis for backing up an environment.
conda env export > my_env.yml
Recreate a previously exported environment
conda env create --file my_env.yml
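A full export pins the exact version and build of every package, which suits reproducing an environment on the same platform. Newer conda releases additionally offer `conda env export --from-history`, which only lists the packages you explicitly requested and therefore lets conda resolve current versions when the environment is recreated. A minimal sketch writing both variants; the function name `export_env` and the file naming are hypothetical.

```bash
# Hypothetical helper: export an environment twice into a directory
# (default: current directory):
#   <name>.yml          all packages with exact versions and builds
#   <name>_history.yml  only the packages that were explicitly requested
export_env() {
    local env_name="$1"
    local out_dir="${2:-.}"
    conda env export --name "${env_name}" > "${out_dir}/${env_name}.yml" || return 1
    conda env export --name "${env_name}" --from-history \
        > "${out_dir}/${env_name}_history.yml"
}

# Example: export_env my_env
```

Either file can be passed to `conda env create --file`; adding `--name` recreates the environment under a different name, e.g. `conda env create --name my_env_copy --file my_env.yml`.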
Creates the environment "my_env" in the specified location
This example creates the environment on local scratch for faster disk access.
conda create --prefix /scratch/$USER/conda_envs/my_env
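Local scratch resides on the disk of the individual machine, so an environment created there is only available on that machine and has to be created again on every host where you need it. A minimal sketch of a helper that creates such an environment only if it does not exist yet; the function name `ensure_local_env` is hypothetical, and `LOCAL_SCRATCH` mirrors the variable of the same name in `install_conda.sh`.

```bash
# Hypothetical helper: create an environment in local scratch of the current
# machine unless it already exists there. Extra arguments are passed on to
# 'conda create' (e.g. package names).
ensure_local_env() {
    local env_name="$1"
    shift
    local prefix="${LOCAL_SCRATCH:-/scratch/${USER}}/conda_envs/${env_name}"
    if [[ -d "${prefix}" ]]; then
        echo "${prefix} already exists"
    else
        conda create --yes --prefix "${prefix}" "$@"
    fi
}

# Example: ensure_local_env my_env package1 package2
```

An environment created with `--prefix` can be activated by its path, e.g. `conda activate /scratch/$USER/conda_envs/my_env`.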
Update an active environment
Make sure to create a backup by exporting the active environment before updating.
conda update --update-all
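The export and the update can be combined so that the update only runs if the export succeeded. A minimal sketch for the active environment; the function name `backup_and_update` and the file name pattern `<environment>_<timestamp>.yml` are hypothetical.

```bash
# Hypothetical helper: export the active environment to a time-stamped file
# in the current directory and update the environment only if that worked.
backup_and_update() {
    local env_name backup_file
    # CONDA_DEFAULT_ENV holds the name (or path) of the active environment
    env_name="$(basename "${CONDA_DEFAULT_ENV:-env}")"
    backup_file="${env_name}_$(date '+%Y-%m-%d_%H-%M-%S').yml"
    conda env export > "${backup_file}" || return 1
    echo "Backup written to ${backup_file}"
    conda update --update-all
}
```

If the update breaks something, the backup file can be used with `conda env create --name <new name> --file <backup file>` to recreate the previous state.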
Packages
Search for a package named "package1"
conda search package1
Install the package named "package1" in the active environment
conda install package1
List packages installed in the active environment
conda list
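Because dependency resolution may install a different version than expected, it can be useful to check which version of a package actually ended up in an environment. `conda list` accepts `--name` to inspect an environment without activating it. The parsing below assumes the usual column layout (name, version, build, channel) with header lines starting with `#`; the function name `installed_version` is hypothetical.

```bash
# Hypothetical helper: print the installed version of a package in the given
# environment, or nothing if the package is not installed there.
installed_version() {
    local env_name="$1"
    local package="$2"
    # Skip comment lines and compare the first column exactly, so that
    # e.g. "numpy" does not also match "numpy-base"
    conda list --name "${env_name}" \
        | awk -v pkg="${package}" '!/^#/ && $1 == pkg { print $2 }'
}

# Example: installed_version my_env package1
```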
Maintenance
The cache of installed packages will consume a lot of space over time. The default location set for the package cache resides on NetScratch, the terms of use for this storage area require you to clean your cache regularly.
Remove index cache, lock files, unused cache packages, and tarballs
conda clean --all
Update conda without any active environment
conda update conda
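To see how much space a cleanup actually frees in the package cache, its size can be measured before and after. A minimal sketch, assuming the default package cache location configured by `install_conda.sh`; the function name `clean_conda_cache` is hypothetical. If you first want to see what would be removed, `conda clean --all --dry-run` lists it without deleting anything.

```bash
# Hypothetical helper: clean the conda caches without asking for
# confirmation and report the package cache size before and after.
clean_conda_cache() {
    local pkgs_dir="${1:-/itet-stor/${USER}/net_scratch/conda_pkgs}"
    echo "Package cache before: $(du -sh "${pkgs_dir}" | cut -f1)"
    conda clean --all --yes || return 1
    echo "Package cache after:  $(du -sh "${pkgs_dir}" | cut -f1)"
}
```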
Backup
Regular backups make it possible to reproduce an environment as it was at a certain point in time. Always create a backup before installing packages into or updating an environment, so that you can revert the changes.
It is not necessary to back up the environments themselves; backing up the files of their exports is sufficient to recreate them exactly.
For a simple backup of all environments the following script can be used:
#!/bin/bash
# Directory in $HOME where the exported environment definitions are stored
BACKUP_DIR="${HOME}/conda_env_backup"
# Timestamp appended to each file name
MY_TIME_FORMAT='%Y-%m-%d_%H-%M-%S'
NOW=$(date "+${MY_TIME_FORMAT}")
mkdir -p "${BACKUP_DIR}"
# Names of all environments (skips comment lines and environments without a name)
ENVS=$(conda env list | grep '^\w' | cut -d' ' -f1)
for env in ${ENVS}; do
    echo "Exporting ${env} to ${BACKUP_DIR}/${env}_${NOW}.yml"
    conda env export --name "${env}" > "${BACKUP_DIR}/${env}_${NOW}.yml"
done
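To restore an environment, the newest export of that environment has to be picked from the backup directory. Because the timestamp format used by the backup script sorts chronologically, the last matching file name is the most recent one. A minimal sketch; the function name `latest_backup` is hypothetical.

```bash
# Hypothetical helper: print the most recent backup file of an environment
# in the backup directory (default: ~/conda_env_backup).
latest_backup() {
    local env_name="$1"
    local backup_dir="${2:-${HOME}/conda_env_backup}"
    local file latest=""
    # Require a timestamp right after the name, so that backups of
    # "my_env_2" are not mistaken for backups of "my_env"
    for file in "${backup_dir}/${env_name}_"[0-9][0-9][0-9][0-9]-*.yml; do
        # Skip the unexpanded pattern when no backup exists
        [[ -e "${file}" ]] || continue
        # Glob matches are sorted, so the last match is the newest
        latest="${file}"
    done
    if [[ -z "${latest}" ]]; then
        echo "No backup found for ${env_name} in ${backup_dir}" >&2
        return 1
    fi
    printf '%s\n' "${latest}"
}
```

The result can be passed directly to conda, e.g. `conda env create --name my_env_restored --file "$(latest_backup my_env)"`.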
Installation examples
For conda, python itself is just a software package like any other. After analyzing all packages to be installed, conda decides which python version works for the whole environment. This means different environments may contain differing versions of python.
Creating an environment with a specific python version
- Time to install: ~1 minute
- Space required: ~140M
conda create --name py37 python=3.7.3
Creating an environment with the GPU version of pytorch and CUDA toolkit 10
- Time to install: ~5 minutes
- Space required: ~2.5G
conda create --name pytcu10 pytorch torchvision cudatoolkit=10.0 --channel pytorch
Creating an environment with the GPU version of tensorflow and CUDA toolkit 10
- Time to install: ~5 minutes
- Space required: ~2G
conda create --name tencu10 tensorflow-gpu cudatoolkit=10.0
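After installing one of the GPU environments, you can check whether the framework actually detects a GPU. A minimal sketch using `conda run`, which executes a command inside an environment without activating it. The function name `check_gpu` is hypothetical, `tf.test.is_gpu_available()` is assumed to be the check available in the TensorFlow versions built for CUDA 10.0, and the result is only meaningful on a machine with a GPU.

```bash
# Hypothetical helper: ask PyTorch or TensorFlow inside an environment
# whether a GPU can be used. Run it on a machine with a GPU.
check_gpu() {
    local env_name="$1"
    local framework="$2"
    local check
    case "${framework}" in
        pytorch)
            check='import torch; print("GPU available:", torch.cuda.is_available())' ;;
        tensorflow)
            check='import tensorflow as tf; print("GPU available:", tf.test.is_gpu_available())' ;;
        *)
            echo "unknown framework: ${framework}" >&2
            return 1 ;;
    esac
    conda run --name "${env_name}" python -c "${check}"
}

# Examples:
# check_gpu pytcu10 pytorch
# check_gpu tencu10 tensorflow
```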