Working with GPU or CPU

Calculations in data science run on CPUs, GPUs or both. If you use tools or write code in this field, at some point you will need to decide where the calculations are executed. The following information is intended to help with that decision.

Platform information

The D-ITET infrastructure managed by ISG uses NVIDIA GPUs and Intel CPUs exclusively. Available platforms are either managed Linux workstations with a single GPU or GPU clusters.

Information about these components can be shown by issuing the following commands in a shell:

  • lscpu
      Shows information about the CPUs, most relevantly the number of CPU cores available in the line starting with CPU(s):

  • nvidia-smi
      Shows the NVIDIA driver version, the highest CUDA version supported by that driver, and the GPUs with their memory (used and total)
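The same information can also be collected from Python, for example to log it at the start of a job. The following is a minimal sketch rather than a tool provided by ISG: the helper names parse_gpu_query and query_gpus are made up for illustration, and it assumes the installed nvidia-smi accepts the --query-gpu and --format=csv,noheader options.

```python
import os
import shutil
import subprocess

# Fields requested from nvidia-smi (assumed to be supported by the installed driver)
QUERY_FIELDS = "name,driver_version,memory.total"


def parse_gpu_query(text):
    """Parse CSV lines as printed by nvidia-smi --query-gpu (hypothetical helper)."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def query_gpus():
    """Return a list of GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY_FIELDS, "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs, like the CPU(s): line of lscpu
    print("CPU cores:", os.cpu_count())
    for gpu in query_gpus():
        print(gpu)
```

On a machine without an NVIDIA GPU the sketch prints only the CPU core count, since query_gpus() returns an empty list.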

NVIDIA CUDA Toolkit

The CUDA toolkit provides a development environment for creating high performance GPU-accelerated applications. It is a necessary software dependency for tools used in GPU computing.

Matching driver and toolkit versions

It is crucial to match the CUDA toolkit used in a project to the NVIDIA driver installed on the platform the project is supposed to run on.

The CUDA compatibility document by NVIDIA contains a dependency matrix matching driver and toolkit versions.
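
Such a matrix can be turned into a small lookup. The sketch below is an illustration under assumptions: the minimum Linux driver versions are taken from NVIDIA's release notes for the listed toolkit releases and should be verified against the current compatibility document before relying on them. The function name newest_supported_toolkit is hypothetical.

```python
# Minimum Linux driver version per CUDA toolkit release.
# Assumed values from NVIDIA's release notes; verify them against the
# CUDA compatibility document before use.
MIN_DRIVER = {
    "9.0": "384.81",
    "9.2": "396.26",
    "10.0": "410.48",
    "10.1": "418.39",
}


def as_tuple(version):
    """Turn '410.48' into (410, 48) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_supported_toolkit(driver_version):
    """Return the newest toolkit in MIN_DRIVER usable with the given driver, or None."""
    driver = as_tuple(driver_version)
    supported = [
        toolkit
        for toolkit, minimum in MIN_DRIVER.items()
        if driver >= as_tuple(minimum)
    ]
    return max(supported, key=as_tuple) if supported else None


print(newest_supported_toolkit("410.79"))  # 10.0
```

The driver version passed in is the one reported by nvidia-smi on the target platform.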

Installing a specific toolkit version with conda

Assuming the CUDA toolkit is to be installed in a conda environment, available versions can be shown with

conda search cudatoolkit

The version matching the driver can then be installed with the following command in an active environment:

conda install cudatoolkit=10.0

cuDNN library

The cuDNN library is a GPU-accelerated library of primitives for deep neural networks. It is another dependency for GPU computing. In order to use it, NVIDIA asks you to read the Software License Agreement for the library. The library is registered by ISG to be used for research at D-ITET. If you use the library for any other purpose, you are obliged to register it yourself.

A CPU version of tensorflow optimized for Intel CPUs exists, which might be a tempting choice. Be aware that this version of tensorflow and its installed dependencies will differ from the versions installed from the default channel in the examples above.

As shown in the examples above, environments can be tailored to a platform for optimal performance. Make sure you set up an environment for each platform you intend to use. If you follow the examples, the list of installed packages and their version numbers should be identical in all environments. Identical versions make sure your environments behave identically on all platforms.

pytorch

TODO: describe what it is and how to check and test for CUDA/CPU.

tensorflow

Testing installations

Testing pytorch

To verify the successful installation of pytorch, run the following Python code in your Python interpreter:

from __future__ import print_function  # only needed on Python 2
import torch

# Create a 5x3 tensor of uniform random values in [0, 1) and print it
x = torch.rand(5, 3)
print(x)

The output should be similar to the following:

tensor([[0.4813, 0.8839, 0.1568],
        [0.0485, 0.9338, 0.1582],
        [0.1453, 0.5322, 0.8509],
        [0.2104, 0.4154, 0.9658],
        [0.6050, 0.9571, 0.3570]])

To verify CUDA availability in pytorch, run the following code:

import torch
torch.cuda.is_available()

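It should return True on a platform with an NVIDIA GPU and a driver matching the installed CUDA toolkit. On a machine without a usable GPU it returns False.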
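
The result of this check is typically used to decide where tensors are placed. The following is a minimal sketch of that pattern, not a prescribed setup: the helper name pick_device_name is made up for illustration, and the import is guarded so the snippet also runs in an environment without pytorch.

```python
def pick_device_name(cuda_available):
    """Return 'cuda' if a usable GPU was detected, otherwise 'cpu' (hypothetical helper)."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch
except ImportError:
    torch = None

if torch is not None:
    device = torch.device(pick_device_name(torch.cuda.is_available()))
    # Create the tensor directly on the selected device
    x = torch.rand(5, 3, device=device)
    print("tensor lives on:", x.device)
    # CUDA version pytorch was built with; None for CPU-only builds
    print("pytorch built for CUDA:", torch.version.cuda)
else:
    print("pytorch is not installed in this environment")
```

Writing code against a device variable like this lets the same script run in the GPU and the CPU environments without changes.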

Testing TensorFlow

The following code prints information about your tensorflow installation. It uses the TensorFlow 1.x API:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Lines containing device: XLA_ show which CPU/GPU devices are available.

A line containing cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version means that the NVIDIA driver installed on the system running the code is too old for the CUDA toolkit installed in the environment the code is run from.

Additional buzzwords to find this article

  • Deep learning
  • Machine learning
  • Neural networks
  • Big Data

Programming/Languages/GPUCPU (last edited 2023-10-16 13:52:05 by alders)