Working with GPU or CPU in data sciences

Calculations in data sciences run on CPUs and/or GPUs. If you use tools or write code in this field, you will have to decide where your calculations are executed. The following information should help with that decision.

The Guided Data Science Resources (https://github.com/Chris-Engelhardt/data_sci_guide) is a community-sourced data science repository containing open source learning material.

Platform information

The D-ITET infrastructure managed by ISG uses NVIDIA GPUs and Intel CPUs exclusively. Available platforms are either managed Linux workstations with a single GPU or GPU clusters.

Information about these components can be shown by issuing the following commands in a shell:

  • lscpu: shows information about the CPUs, most relevantly the number of CPU cores available in the line starting with CPU(s):

  • nvidia-smi: shows the NVIDIA driver version, the CUDA toolkit version and the GPUs with their available memory
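
If you only need the number of CPU cores, for example in a job script, it can be extracted directly from lscpu; the grep pattern below is just one way to do it:

lscpu | grep '^CPU(s):'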

NVIDIA CUDA Toolkit

The CUDA toolkit provides a development environment for creating high performance GPU-accelerated applications. It is a necessary software dependency for tools used in GPU computing.

Matching toolkit versions to installed driver

The version of the NVIDIA driver installed on a platform limits the range of CUDA toolkit versions that work with that driver. The driver version is subject to operating system update policies and cannot be changed by a user with normal privileges. It is not uniform across servers and desktop clients.

For your projects to work it is crucial to

  • check the driver version with nvidia-smi,

  • consult NVIDIA's dependency matrix (https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver) and

  • choose the toolkit version matching the driver installed on the platform you use, as shown below.
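
To read just the driver version for the matrix lookup, nvidia-smi can be queried for a single field; driver_version is one of its supported query fields:

nvidia-smi --query-gpu=driver_version --format=csv,noheader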

Installing a specific toolkit version with conda

The easiest way to install the CUDA toolkit is by using conda. Available versions can be shown with

conda search cudatoolkit

And the version matching the driver can be installed with the following command in an active environment:

conda install cudatoolkit=10.0
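
To double-check which toolkit version ended up in the active environment, list it by name:

conda list cudatoolkit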

Important reminder about working locally

If you're working locally, meaning on a managed Linux desktop or your private machine, always keep in mind the following:

  • The local GPU might not have enough memory for your project

  • The CUDA version you're using in your project environment might be too new for the driver installed locally
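
Before starting a longer job locally it is worth checking how much GPU memory is actually free. The query below uses the same fields as the pytorch example further down:

nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv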

cuDNN library

The cuDNN library is a GPU-accelerated library of primitives for deep neural networks. It is another dependency for GPU computing. To use it, NVIDIA asks you to read the Software License Agreement for the library (https://docs.nvidia.com/deeplearning/sdk/cudnn-sla/index.html). The library is registered by ISG for research use at D-ITET. If you use the library differently, you are obliged to register it yourself.

conda automatically installs this library if it is a dependency of another package being installed.
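
Once pytorch is installed (see the next section), the cuDNN version it actually uses can be queried from Python; this is the same call used in the larger example further down:

import torch
print(torch.backends.cudnn.version())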

pytorch

pytorch is one of the main open source deep learning platforms in use at the time of writing. If you haven't done so already, read the installation example on the Programming/Languages/Conda wiki page.

A good starting point for further information is the official pytorch documentation.

Testing pytorch

To verify the successful installation of pytorch, run the following Python code in your Python interpreter:

import torch
x = torch.rand(5, 3)
print(x)

The output should be similar to the following:

tensor([[0.4813, 0.8839, 0.1568],
        [0.0485, 0.9338, 0.1582],
        [0.1453, 0.5322, 0.8509],
        [0.2104, 0.4154, 0.9658],
        [0.6050, 0.9571, 0.3570]])

Environment and platform information

The following example shows how to gather information you can use, for example, to decide whether to run your code on the CPU or the GPU:

import sys
from subprocess import call

import torch

print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION:', torch.version.cuda)
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__Devices:')
# Let nvidia-smi list the GPUs; this works independently of pytorch
call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
if torch.cuda.is_available():
    print('Active CUDA Device: GPU', torch.cuda.current_device())
else:
    print('No CUDA device available, running on the CPU')
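
A common follow-up is to select the device once and move tensors to it; this minimal sketch falls back to the CPU transparently:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.rand(5, 3).to(device)  # lands on the GPU if one is available
print(x.device)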

tensorflow

tensorflow is another popular open source platform for machine learning. If you haven't done so already, read the installation example on the Programming/Languages/Conda wiki page.

Choose from the available tutorials to learn how to use it.

Platform information

The following code prints information about the capabilities of the platform you run your environment on:

import tensorflow as tf

# log_device_placement=True makes tensorflow log, for every operation,
# which CPU/GPU device it was placed on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Lines containing device:XLA_ show which CPU/GPU devices are available.

A line containing cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version means the NVIDIA driver installed on the system you run the code on is not compatible with the CUDA toolkit installed in the environment you run the code from.

An extensive list of device information can be shown with:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

The module tf.test (https://www.tensorflow.org/api_docs/python/tf/test) contains helpful functions to gather platform information:

  • tf.test.is_gpu_available

  • tf.test.gpu_device_name
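
Both can be called directly; a quick interactive check could look like this:

import tensorflow as tf

# True if a CUDA-capable GPU is usable by this tensorflow build
print(tf.test.is_gpu_available())
# Name of the first GPU device, or an empty string if none is available
print(tf.test.gpu_device_name())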

Managing GPU resources

If your code is going to run on a GPU cluster, you need to make sure you manage your use of GPU resources and use the following recommended configuration. allow_growth makes tensorflow allocate GPU memory on demand instead of reserving all of it up front, and allow_soft_placement lets operations without a GPU implementation fall back to the CPU:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.allow_soft_placement = True
sess = tf.Session(config=config)
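
If a hard upper bound is preferred over on-demand growth, ConfigProto also supports capping the GPU memory used per process; the value 0.5 below is a placeholder, not a recommendation:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # use at most half the GPU memory
sess = tf.Session(config=config)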
