Working with GPU or CPU

Calculations in data science run on CPUs, GPUs, or both. If you use tools or write code in this field, at some point you will have to decide where the calculations are executed. The following information is meant to help with that decision.

Platform information

The D-ITET infrastructure managed by ISG uses NVIDIA GPUs and Intel CPUs exclusively. Available platforms are either managed Linux workstations with a single GPU or GPU clusters.

Information about these components can be shown by issuing the following commands in a shell:

lscpu
Shows information about the CPUs, most relevantly the number of CPU cores available in the line starting with CPU(s):
nvidia-smi
Shows the NVIDIA driver version, the highest CUDA version supported by that driver, and the GPUs with their memory usage
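
The same information can also be collected from within Python, for example at the start of a script that should adapt to the platform it runs on. The following is a minimal sketch rather than an official tool: the function name platform_info is made up for this example, and it assumes nvidia-smi is on the PATH whenever a GPU is present.

```python
import os
import shutil
import subprocess


def platform_info():
    """Return the CPU count and a list of GPUs (hypothetical helper)."""
    info = {"cpus": os.cpu_count(), "gpus": []}
    # nvidia-smi not found: report no GPUs
    if shutil.which("nvidia-smi") is None:
        return info
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode == 0:
        # One CSV line per GPU: name, total memory, driver version
        for line in result.stdout.strip().splitlines():
            fields = [field.strip() for field in line.rsplit(",", 2)]
            if len(fields) == 3:
                name, memory, driver = fields
                info["gpus"].append(
                    {"name": name, "memory": memory, "driver": driver}
                )
    return info


print(platform_info())
```

Note that os.cpu_count() reports the logical CPUs of the machine, which can be more than the number of cores a cluster job is allowed to use.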

NVIDIA CUDA Toolkit

The CUDA toolkit provides a development environment for creating high-performance GPU-accelerated applications. It is a necessary software dependency for many tools used in data science.

Driver and toolkit versions

The CUDA compatibility document by NVIDIA contains a dependency matrix matching driver and toolkit versions.
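
Reading that matrix boils down to comparing the installed driver version with the minimum driver version a toolkit requires. A minimal sketch of such a comparison follows. The helper names are made up for this example, and the minimum of 410.48 for CUDA toolkit 10.0 on Linux is used as an illustration only; verify it against NVIDIA's current table before relying on it.

```python
def parse_version(version):
    """Turn a dotted version string like '418.87.01' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def driver_is_sufficient(driver_version, minimum_required):
    """Return True if the driver is at least as new as the required minimum."""
    return parse_version(driver_version) >= parse_version(minimum_required)


# Assumed minimum Linux driver for CUDA toolkit 10.0; check NVIDIA's table.
MIN_DRIVER_CUDA_10_0 = "410.48"

print(driver_is_sufficient("418.87.01", MIN_DRIVER_CUDA_10_0))  # True
print(driver_is_sufficient("396.54", MIN_DRIVER_CUDA_10_0))     # False
```

The driver version to pass in is the one reported by nvidia-smi.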

Installing a specific toolkit version with conda

Assuming the CUDA toolkit is to be installed in a conda environment, available versions can be shown with

conda search cudatoolkit

The version matching the installed driver can then be installed with the following command in an active environment:

conda install cudatoolkit=10.0

conda installs pytorch and tensorflow together with their non-Python dependencies, such as the CUDA toolkit and the cuDNN library.

A CPU version of tensorflow optimized for Intel CPUs exists, which might be a tempting choice. Be aware that this version of tensorflow and installed dependencies will differ from versions installed from the default channel in the examples above.

As shown in the examples above, environments can be tailored to a platform for optimal performance. Make sure you set up an environment for each platform you intend to use. If you follow the examples, the list of installed packages and their version numbers should be identical in all environments. Identical versions make sure your environments behave identically on all platforms.

pytorch

what is it checks test for cuda/cpu

tensorflow

Testing installations

Testing pytorch

To verify that pytorch was installed successfully, run the following Python code in your Python interpreter:

from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x)

The output should be similar to the following:

tensor([[0.4813, 0.8839, 0.1568],
        [0.0485, 0.9338, 0.1582],
        [0.1453, 0.5322, 0.8509],
        [0.2104, 0.4154, 0.9658],
        [0.6050, 0.9571, 0.3570]])

To verify CUDA availability in pytorch, run the following code:

import torch
torch.cuda.is_available()

It should return True.
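
In your own code, the result of this check is typically used to decide where calculations are executed. The following is a minimal sketch of that pattern, not a prescribed setup: the function name pick_device is made up for this example, and the import is guarded only so the snippet also runs in an environment without pytorch.

```python
try:
    import torch
except ImportError:  # keeps the sketch runnable without pytorch
    torch = None


def pick_device():
    """Return 'cuda' if pytorch can use a GPU, otherwise 'cpu'."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print("Using device:", device)

if torch is not None:
    # Create the tensor directly on the chosen device
    x = torch.rand(5, 3, device=device)
    print(x.device)
    # CUDA version pytorch was built against; None for CPU-only builds
    print(torch.version.cuda)
```

Code written this way runs unchanged on a workstation with a GPU and on a CPU-only machine.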

Testing TensorFlow

The following code prints information about your tensorflow installation:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Lines containing device: XLA_ show which CPU/GPU devices are available.

A line containing cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version means that the NVIDIA driver installed on the system is not compatible with the CUDA toolkit installed in the environment from which you run the code.
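
tf.Session and tf.ConfigProto belong to the TensorFlow 1.x API and are not available under these names in TensorFlow 2.x. For newer installations, the GPUs visible to tensorflow can be listed as sketched below. This assumes TensorFlow 2.1 or later; the function name visible_gpus is made up for this example, and the import is guarded only so the snippet also runs without tensorflow installed.

```python
def visible_gpus():
    """Return the names of GPUs tensorflow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # tensorflow is not installed in this environment
        return []
    # Assumes TensorFlow >= 2.1, where this function is available
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPUs visible to tensorflow:", gpus if gpus else "none")
```

An empty list on a machine with a GPU usually points to the same driver and toolkit mismatch described above, or to a CPU-only tensorflow build.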

Additional buzzwords to find this article

Deep learning
Machine learning
Neural networks
Big Data
