Working with GPU or CPU in data sciences

Calculations in data sciences run on CPUs and/or GPUs. If you use tools or write code in this field, you have to decide where your calculations are executed. The following information is meant to help with that decision.

Platform information

The D-ITET infrastructure managed by ISG uses NVIDIA GPUs and Intel CPUs exclusively. Available platforms are either managed Linux workstations with a single GPU or GPU clusters.

Information about these components can be shown by issuing the following commands in a shell:

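lscpu
Shows information about the CPUs, most relevantly the number of CPU cores available in the line starting with CPU(s):
nvidia-smi
Shows the NVIDIA driver version, the highest CUDA version supported by that driver, and the GPUs with their total and used memory. Note that the CUDA version shown here is not necessarily the version of a CUDA toolkit installed in your environment.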
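
The same information can be collected from within Python. The following is a minimal sketch, not an official tool: the function name platform_summary is hypothetical, and os.cpu_count() reports all logical CPUs of the machine, which may be more than a cluster job is allowed to use. If nvidia-smi is not installed, the GPU list simply stays empty.

```python
import os
import shutil
import subprocess


def platform_summary():
    """Collect the CPU count and, if nvidia-smi is present, one CSV line per GPU."""
    summary = {"cpu_count": os.cpu_count(), "gpus": []}
    if shutil.which("nvidia-smi") is None:
        return summary  # no NVIDIA driver tools found on this machine
    result = subprocess.run(
        ["nvidia-smi", "--format=csv,noheader",
         "--query-gpu=index,name,driver_version,memory.total"],
        stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode == 0:
        # Keep one non-empty line per GPU reported by the driver
        summary["gpus"] = [line.strip() for line in result.stdout.splitlines()
                           if line.strip()]
    return summary


print(platform_summary())
```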

NVIDIA CUDA Toolkit

The CUDA toolkit provides a development environment for creating high performance GPU-accelerated applications. It is a necessary software dependency for tools used in GPU computing.

Matching driver and toolkit versions

It is crucial to match the CUDA toolkit used in a project to the NVIDIA driver installed on the platform the project will run on: each toolkit version requires a minimum driver version.

The CUDA compatibility document by NVIDIA contains a dependency matrix matching driver and toolkit versions.
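
Such a check can be sketched in a few lines of Python. The table below is an assumption: it lists minimum Linux driver versions for a few toolkit releases as an illustration and must be verified against NVIDIA's compatibility document before relying on it. The names MIN_DRIVER, parse_version and driver_supports are hypothetical helpers.

```python
# Minimum Linux driver version per CUDA toolkit release.
# Assumption: illustrative values, verify against NVIDIA's compatibility document.
MIN_DRIVER = {
    "9.0": (384, 81),
    "9.2": (396, 26),
    "10.0": (410, 48),
    "10.1": (418, 39),
}


def parse_version(text):
    """Turn a version string such as '410.48' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, toolkit_version):
    """Return True if the driver is at least the minimum for the given toolkit."""
    return parse_version(driver_version) >= MIN_DRIVER[toolkit_version]


print(driver_supports("410.48", "10.0"))  # driver equals the minimum
print(driver_supports("396.26", "10.0"))  # driver is older than the minimum
```

The driver version to pass in is the one reported by nvidia-smi.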

Installing a specific toolkit version with conda

Assuming the CUDA toolkit is to be installed in a conda environment, available versions can be shown with

conda search cudatoolkit

The version matching the driver can then be installed in an active environment with the following command:

conda install cudatoolkit=10.0
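
To confirm which toolkit version ended up in the environment, the JSON output of conda list --json can be inspected. The sketch below only parses such output. The helper name find_package_version and the sample data are hypothetical, and the assumption is that each entry carries a "name" and a "version" field.

```python
import json


def find_package_version(conda_list_json, name="cudatoolkit"):
    """Return the version of `name` from `conda list --json` output, or None."""
    for package in json.loads(conda_list_json):
        if package.get("name") == name:
            return package.get("version")
    return None


# Hypothetical sample in the shape `conda list --json` is assumed to produce
sample = ('[{"name": "cudatoolkit", "version": "10.0.130"}, '
          '{"name": "python", "version": "3.7.3"}]')
print(find_package_version(sample))  # prints 10.0.130
```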

cuDNN library

The cuDNN library is a GPU-accelerated library of primitives for deep neural networks. It is another dependency for GPU computing. Before you use it, NVIDIA asks you to read the Software License Agreement for the library. The library is registered by ISG to be used for research at D-ITET. If you use the library for any other purpose, you are obliged to register it yourself.

pytorch

pytorch is one of the main open source deep learning platforms in use at the time of writing this page.

A good starting point for further information is the official pytorch documentation.

Testing pytorch

To verify the successful installation of pytorch run the following python code in your python interpreter:

import torch
x = torch.rand(5, 3)
print(x)

The output should be similar to the following:

tensor([[0.4813, 0.8839, 0.1568],
        [0.0485, 0.9338, 0.1582],
        [0.1453, 0.5322, 0.8509],
        [0.2104, 0.4154, 0.9658],
        [0.6050, 0.9571, 0.3570]])

Environment and platform information

The following example shows how to gather information which you can use, for example, to decide whether to run your code on CPU or GPU:

import subprocess
import sys

import torch

print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
# CUDA version pytorch was built with (None for CPU-only builds)
print('__CUDA VERSION:', torch.version.cuda)
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
if torch.cuda.is_available():
    print('__Devices:')
    # Ask the driver for details about every GPU
    subprocess.call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
    print('Active CUDA Device: GPU', torch.cuda.current_device())
else:
    # torch.cuda.current_device() would raise an error without a usable GPU
    print('No CUDA device available, calculations will run on the CPU')
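
Based on this information, a common pattern is to select the device once and create all tensors on it. The following is a minimal sketch, assuming a pytorch installation as tested above. The function name pick_device is hypothetical.

```python
import torch


def pick_device():
    """Return the CUDA device if one is usable, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()
# Create the tensor directly on the selected device
x = torch.rand(5, 3, device=device)
# The calculation runs wherever its input tensors live
y = (x @ x.t()).sum()
print("Running on:", device, "result:", y.item())
```

Code written this way runs unchanged on a workstation without a GPU and on a GPU cluster node.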

tensorflow

tensorflow is another popular open source platform for machine learning.

Choose from the available tutorials to learn how to use it.

Platform information

The following code prints information about the capabilities of the platform you run your environment on:

import tensorflow as tf
# tensorflow 1.x API; in tensorflow 2.x use tf.compat.v1.Session and tf.compat.v1.ConfigProto
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Lines containing device:XLA_ show which CPU/GPU devices are available.

A line containing cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version means that the NVIDIA driver installed on the system you run the code on is too old for the CUDA toolkit installed in the environment you run the code from.

An extensive list of device information can be shown with:

from tensorflow.python.client import device_lib
# print() is needed when running this as a script instead of in an interactive interpreter
print(device_lib.list_local_devices())
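
The device list can also be filtered, for example to find out whether tensorflow sees any GPU at all. This is a minimal sketch, assuming that each entry returned by list_local_devices() has the attributes name and device_type. The function name local_gpu_names is hypothetical.

```python
from tensorflow.python.client import device_lib


def local_gpu_names():
    """Return the names of all local devices whose type is 'GPU'."""
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


gpu_names = local_gpu_names()
print("GPUs:", gpu_names if gpu_names else "none, tensorflow will use the CPU")
```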

The module tf.test contains helpful functions to gather platform information, such as tf.test.is_built_with_cuda(), tf.test.is_gpu_available() and tf.test.gpu_device_name().

Revision 9 as of 2019-05-14 15:28:47, editor: stroth