Contents
- Setting up a personal python development infrastructure
  - Install conda
  - Conda storage locations
  - Using Conda
    - Environments
      - Create an environment called "my_env" with packages "package1" and "package2" installed
      - Activate the environment called "my_env"
      - Deactivate the current environment
      - List available environments
      - Remove the environment called "my_env"
      - Create a cloned environment named "cloned_env" from "original_env"
      - Export the active environment definition to the file "my_env.yml"
      - Recreate a previously exported environment
      - Create the environment "my_env" in the specified location
      - Update an active environment
    - Packages
    - Maintenance
    - Installation examples
    - Backup
Setting up a personal python development infrastructure
This page shows how to set up a personal python development infrastructure, how to use it (with examples of installing data science software), how to maintain it, and how to back up your project environments.
The infrastructure is driven by the conda package manager, which accesses the Anaconda repositories to install software.
After familiarizing yourself with conda, read further information about available platforms on which to use your infrastructure and particularities of the software packages involved.
Install conda
- Time to install: ~1 minute
- Space required: ~350M
To provide conda, Miniconda (the minimal Anaconda distribution) can be installed and configured for the D-ITET infrastructure with the following bash script:
#!/bin/bash
# Locations to store environments
# net_scratch is used as default, local scratch needs to be chosen explicitly
LOCAL_SCRATCH="/scratch/${USER}"
NET_SCRATCH="/itet-stor/${USER}/net_scratch"
# Installer of choice for conda
CONDA_INSTALLER_URL='https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh'
# Unset pre-existing python paths
[[ -z ${PYTHONPATH} ]] || unset PYTHONPATH
# Download latest version of miniconda and install it
wget -O miniconda.sh "${CONDA_INSTALLER_URL}" \
&& chmod +x miniconda.sh \
&& ./miniconda.sh -b -p "${NET_SCRATCH}/conda" \
&& rm ./miniconda.sh
# Configure conda
eval "$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)"
conda config --add pkgs_dirs "${NET_SCRATCH}/conda_pkgs" --system
conda config --add envs_dirs "${LOCAL_SCRATCH}/conda_envs" --system
conda config --add envs_dirs "${NET_SCRATCH}/conda_envs" --system
conda config --set auto_activate_base false
conda deactivate
# Update conda and conda base environment
conda update conda --yes
conda update -n 'base' --update-all --yes
# Show how to initialize conda
echo
echo 'Initialize conda immediately:'
echo "eval \"\$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)\""
echo
echo 'Automatically initialize conda for future shell sessions:'
echo "echo 'eval \"\$(${NET_SCRATCH}/conda/bin/conda shell.bash hook)\"' >> ${HOME}/.bashrc"
# Show how to remove conda
echo
echo 'Completely remove conda:'
echo "rm -r ${NET_SCRATCH}/conda ${NET_SCRATCH}/conda_pkgs ${NET_SCRATCH}/conda_envs ${LOCAL_SCRATCH}/conda_envs ${HOME}/.conda"
Save this script as install_conda.sh, make it executable with
chmod +x install_conda.sh
and execute the script by issuing
./install_conda.sh
Choose your preferred method of initializing conda as recommended by the script.
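Once conda is initialized in your shell, you can check that the configuration written by the script took effect. The following is a minimal sketch; the function name check_conda_setup is made up for this illustration and is not part of the install script.

```bash
#!/bin/bash
# Hypothetical helper: report the conda version and the configured directories.
check_conda_setup() {
    # Stop early if conda has not been initialized in this shell
    if ! command -v conda >/dev/null 2>&1; then
        echo 'conda not found; run the "eval" line printed by install_conda.sh first' >&2
        return 1
    fi
    conda --version
    # "conda config --add" prepends to a list, so the directory added last
    # (net_scratch) should be listed first and serve as the default location
    conda config --show envs_dirs pkgs_dirs
}
```

Call it with `check_conda_setup`. With the configuration from the script, the net_scratch directory should be the first entry of envs_dirs.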
Conda storage locations
The directories listed in the command for complete conda removal contain the following data:
Directory | Contents
/itet-stor/$USER/net_scratch/conda | The miniconda installation
/itet-stor/$USER/net_scratch/conda_pkgs | Downloaded packages
/itet-stor/$USER/net_scratch/conda_envs | Virtual environments on NAS
/scratch/$USER/conda_envs | Virtual environments on local disk
/home/$USER/.conda | Personal conda configuration
The purpose of this configuration is to store reproducible, space-consuming data outside of your $HOME so that it does not use up your quota.
Using Conda
conda lets you separate installed software packages from each other by creating so-called environments. Using environments is best practice for building deterministic and reproducible tools.
conda takes care of dependencies common to the packages it is asked to install. If two packages share a dependency but specify different version ranges for it, conda chooses the highest version that satisfies both requirements. This means the dependency installed in an environment containing both packages might have a lower version number than in environments that keep the two packages separate.
It is best practice to separate packages into different environments if they don't need to interact.
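To see which version of a shared dependency ended up in a given environment, you can query the package list of each environment. This is a minimal sketch; the function name dep_version and the environment and package names in the usage note are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the installed version of PACKAGE in environment ENV_NAME.
# usage: dep_version ENV_NAME PACKAGE
dep_version() {
    local env_name="$1" package="$2"
    # "conda list" prints header lines starting with '#', followed by
    # rows of the form: name version build [channel]
    conda list --name "${env_name}" --full-name "${package}" \
        | awk '!/^#/ && NF {print $2}'
}
```

For example, `dep_version env_a numpy` and `dep_version env_both numpy` would show whether combining packages in env_both led to a lower numpy version than in env_a.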
For a complete guide to conda see the official documentation.
The official cheat sheet is a compact summary of common commands to get you started. An abbreviated list is shown here:
Environments
The name of the automatically installed default environment is base.
Create an environment called "my_env" with packages "package1" and "package2" installed
conda create --name my_env package1 package2
Activate the environment called "my_env"
conda activate my_env
Deactivate the current environment
conda deactivate
List available environments
conda env list
Remove the environment called "my_env"
conda remove --name my_env --all
Create a cloned environment named "cloned_env" from "original_env"
conda create --name cloned_env --clone original_env
Export the active environment definition to the file "my_env.yml"
conda env export > my_env.yml
Recreate a previously exported environment
conda env create --file my_env.yml
Create the environment "my_env" in the specified location
This example creates the environment on local scratch for faster disk access.
conda create --prefix /scratch/$USER/conda_envs/my_env
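An environment created with --prefix can be activated by passing the same path to conda activate. The sketch below wraps both steps; the function name create_local_env is an assumption for illustration. Keep in mind that local scratch is a disk of the machine you are working on, so such an environment is presumably only usable on that host.

```bash
#!/bin/bash
# Hypothetical helper: create an environment on local scratch and activate it.
# usage: create_local_env ENV_NAME [PACKAGE...]
create_local_env() {
    local env_name="$1"
    shift
    local prefix="/scratch/${USER}/conda_envs/${env_name}"
    # Remaining arguments are passed on as packages to install
    conda create --prefix "${prefix}" --yes "$@" \
        && conda activate "${prefix}"
}
```

For example: `create_local_env my_env python=3.7`.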
Update an active environment
Make sure to create a backup by exporting the active environment before updating.
conda update --update-all
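The export and the update can be combined so that the update only runs if the export succeeded. This is a minimal sketch, not an official procedure; the function name update_with_backup and the file naming are assumptions. It relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```bash
#!/bin/bash
# Hypothetical helper: export the active environment, then update it.
update_with_backup() {
    if [[ -z "${CONDA_DEFAULT_ENV:-}" ]]; then
        echo 'No active conda environment' >&2
        return 1
    fi
    # The variable may hold a path for environments created with --prefix,
    # so only the last path component is used for the file name
    local export_file
    export_file="$(basename "${CONDA_DEFAULT_ENV}")_$(date '+%Y-%m-%d_%H-%M-%S').yml"
    # Update only if the export succeeded
    conda env export > "${export_file}" \
        && conda update --update-all --yes
}
```

Run `update_with_backup` inside the activated environment; the export file is written to the current directory.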
Packages
Search for a package named "package1"
conda search package1
Install the package named "package1" in the active environment
conda install package1
List packages installed in the active environment
conda list
Maintenance
The cache of installed packages will consume a lot of space over time. The default location of the package cache is on NetScratch, and the terms of use for this storage area require you to clean your cache regularly.
Remove index cache, lock files, unused cache packages, and tarballs
conda clean --all
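To see how much space a cleanup frees, you can measure the package cache before and after. The following is a sketch under the assumption that the cache lives in the conda_pkgs directory configured by the install script; the function name clean_and_report is made up for this example.

```bash
#!/bin/bash
# Hypothetical helper: show the size of the package cache before and after cleaning.
# usage: clean_and_report [PKGS_DIR]
clean_and_report() {
    local pkgs_dir="${1:-/itet-stor/${USER}/net_scratch/conda_pkgs}"
    # du separates size and path with a tab, so field 1 is the size
    echo "Before: $(du -sh "${pkgs_dir}" | cut -f1)"
    conda clean --all --yes
    echo "After:  $(du -sh "${pkgs_dir}" | cut -f1)"
}
```

To preview what would be removed without deleting anything, `conda clean --all --dry-run` can be run first.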
Update conda without any active environment
conda update conda
Installation examples
Note: the installation times and space requirements listed below need to be re-measured.
For conda, python itself is just a software package like any other. Based on the requirements of all packages requested for an environment, conda decides which python version is compatible with them. This means different environments may contain different versions of python.
Creating an environment with a specific python version
conda create --name py37 python=3.7.3
Creating an environment with the GPU version of pytorch and CUDA toolkit 10
- Time to install: ~5 minutes
- Space required: ~2G, ~1.5G packages before cleanup, ~130M packages after cleanup
conda create --name pytcu10 pytorch torchvision cudatoolkit=10.0 --channel pytorch
Creating an environment with the GPU version of tensorflow and CUDA toolkit 10
- Time to install: ~5 minutes
- Space required: ~2G, ~1.5G packages before cleanup, ~130M packages after cleanup
conda create --name tencu10 tensorflow-gpu cudatoolkit=10.0
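After installation, a quick check shows whether pytorch can see a GPU. This is a sketch that assumes a conda version providing the `conda run` subcommand; the function name check_torch_cuda is hypothetical. The check can only report True on a machine with an NVIDIA GPU and a driver compatible with the installed CUDA toolkit.

```bash
#!/bin/bash
# Hypothetical helper: print the pytorch version and whether CUDA is usable.
# usage: check_torch_cuda ENV_NAME
check_torch_cuda() {
    local env_name="$1"
    # Runs python inside the named environment without activating it
    conda run --name "${env_name}" \
        python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
}
```

For the environment created above: `check_torch_cuda pytcu10`. The driver version installed on the machine can be looked up with `nvidia-smi`.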
Backup
Regular backups are recommended so that an environment used at a certain point in time can be reproduced. Before installing into or updating an environment, always create a backup so that the changes can be reverted.
It is not necessary to back up the environments themselves. It is sufficient to back up the environment export files, from which the environments can be recreated exactly.
For a simple backup of all environments the following script can be used:
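#!/bin/bash
# Export every named conda environment to a timestamped YAML file
BACKUP_DIR="${HOME}/conda_env_backup"
MY_TIME_FORMAT='%Y-%m-%d_%H-%M-%S'
NOW=$(date "+${MY_TIME_FORMAT}")
# Create the backup directory if it does not exist yet
[[ ! -d "${BACKUP_DIR}" ]] && mkdir "${BACKUP_DIR}"
# Environment names are the first column of all lines starting with a word character
ENVS=$(conda env list | grep '^\w' | cut -d' ' -f1)
for env in ${ENVS}; do
    echo "Exporting ${env} to ${BACKUP_DIR}/${env}_${NOW}.yml"
    conda env export --name "${env}" > "${BACKUP_DIR}/${env}_${NOW}.yml"
done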
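To revert to a backup, the newest export of an environment can be fed back to `conda env create`. The following is a minimal sketch; the function name restore_latest_backup is an assumption. It relies on the timestamp format used by the backup script, which makes file names sort chronologically.

```bash
#!/bin/bash
# Hypothetical helper: recreate an environment from its newest export file.
# usage: restore_latest_backup ENV_NAME [BACKUP_DIR]
restore_latest_backup() {
    local env_name="$1"
    local backup_dir="${2:-${HOME}/conda_env_backup}"
    local latest
    # Timestamps sort lexicographically, so the last entry is the newest.
    # Caveat: the pattern also matches environments whose names start with "ENV_NAME_".
    latest=$(printf '%s\n' "${backup_dir}/${env_name}_"*.yml | sort | tail -n 1)
    if [[ ! -f "${latest}" ]]; then
        echo "No backup found for ${env_name} in ${backup_dir}" >&2
        return 1
    fi
    echo "Recreating ${env_name} from ${latest}"
    # The environment name is taken from the export file; an existing environment
    # of that name has to be removed first (conda remove --name ENV_NAME --all)
    conda env create --file "${latest}"
}
```

For example: `restore_latest_backup my_env`.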