Torch¶
Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.
Torch Docker Container¶
Torch is available on JADE through the use of a Docker container. For more information on JADE’s use of containers, see Using Containerised Applications.
Using Torch Interactively¶
All containerised applications are launched interactively in the same way, on one compute node at a time. The number of GPUs to use per node is requested with the “gres” option. To request an interactive session on a compute node, issue the following command from the login node:
# Requesting 2 GPUs for Torch image version 17.04
srun --gres=gpu:2 --pty /jmain01/apps/docker/torch 17.04
This command prints the following banner, and the resulting shell is running on a compute node:
  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch
NVIDIA Release 17.04 (build 17724)
Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2016, Soumith Chintala, Ronan Collobert, Koray Kavukcuoglu, Clement Farabet
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
groups: cannot find name for group ID 30773
I have no name!@f1915084ec5f:/home_directory$
Note
The “cannot find name for group ID” warning and the “I have no name!” prompt can safely be ignored.
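The launch command above takes two values: the GPU count passed to “gres” and the image version. As a minimal sketch, the helper below assembles that command from two arguments and only prints it. The function name `torch_srun_cmd` and its defaults are illustrative assumptions, not part of JADE.

```shell
#!/bin/bash
# Hypothetical helper (not provided by JADE): prints the interactive launch
# command for a given GPU count and Torch image version.
torch_srun_cmd() {
    local gpus="${1:-1}"          # GPUs per node, passed to --gres
    local version="${2:-17.04}"   # image version; 17.04 is the one used in this page
    # Reject anything that is not a positive integer before building the command.
    case "$gpus" in
        *[!0-9]*|0) echo "GPU count must be a positive integer" >&2; return 1 ;;
    esac
    echo "srun --gres=gpu:${gpus} --pty /jmain01/apps/docker/torch ${version}"
}

# Print the command for 2 GPUs; run the printed line on the login node.
torch_srun_cmd 2 17.04
```

Printing rather than executing keeps the sketch safe to try anywhere; on the login node the printed line can be pasted or passed to `eval`.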
Note
Inside the container, your home directory on the outside (e.g. /jmain01/home/JAD00X/test/test1-test) is mapped to the /home_directory folder. You can test this with the command:
ls /home_directory
You are now inside the container where Torch is installed.
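The home directory mapping described in the note can be made concrete with a small path translation. This is a sketch only: the function name `to_container_path` is hypothetical, and it assumes the mapping is a plain prefix substitution of the outside home directory with /home_directory.

```shell
#!/bin/bash
# Hypothetical helper: translate a path under the outside home directory
# into the path seen inside the container (assumes a plain prefix swap).
to_container_path() {
    local outside_home="${1%/}"   # outside home directory, trailing slash removed
    local path="$2"               # path to translate
    case "$path" in
        "$outside_home")
            echo "/home_directory" ;;
        "$outside_home"/*)
            echo "/home_directory/${path#"$outside_home"/}" ;;
        *)
            echo "not under $outside_home: $path" >&2
            return 1 ;;
    esac
}

# Example using the home directory from the note.
to_container_path /jmain01/home/JAD00X/test/test1-test \
    /jmain01/home/JAD00X/test/test1-test/test.lua
```

For the example call, the translated path is /home_directory/test.lua. Paths outside the home directory are rejected, since the note only describes the home directory as being mapped.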
Torch console¶
Torch can be used interactively with the th command:
th
You will then see the Torch command prompt:
  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |  Type ? for help
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch
th>
When you’re done, type exit and then y to exit the Torch console:
th> exit
Do you really want to exit ([y]/n)? y
I have no name!@f1915084ec5f:/home_directory$
Using a Lua script¶
It is also possible to pass a Lua script to the th command. For example, create a test.lua file in the current directory with the following contents:
torch.manualSeed(1234)
-- choose a dimension
N = 5
-- create a random NxN matrix
A = torch.rand(N, N)
-- make it symmetric positive semi-definite
A = A*A:t()
-- add a small multiple of the identity to make it positive definite
A:add(0.001, torch.eye(N))
-- add a linear term
b = torch.rand(N)
-- create the quadratic form
function J(x)
   return 0.5*x:dot(A*x)-b:dot(x)
end
print(J(torch.rand(N)))
Run the test.lua script with the command:
th test.lua
This prints the following result:
0.72191523289161
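For readers who want the mathematics behind test.lua, the construction can be written out as follows. The symbol R for the initial random matrix is notation introduced here, not something the script names; the rest follows directly from the code.

```latex
A = R R^{\top} + 0.001\, I, \qquad
x^{\top} A x = \lVert R^{\top} x \rVert^{2} + 0.001\, \lVert x \rVert^{2} > 0 \quad (x \neq 0)

J(x) = \tfrac{1}{2}\, x^{\top} A x - b^{\top} x, \qquad
\nabla J(x) = A x - b, \qquad
x^{\ast} = A^{-1} b
```

Because A is symmetric positive definite, J has a unique minimiser x*. The script only evaluates J at one random point, which is the number printed above.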
Using Torch in Batch Mode¶
Wrappers are provided for launching the containers in batch mode.
First, navigate to the folder you wish to launch your script from. This example uses the home directory:
cd ~
It is recommended that you create a script file, e.g. script.sh:
#!/bin/bash
# Runs a script called test.lua
# see above section for contents
th test.lua
Remember to make script.sh executable:
chmod +x script.sh
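A forgotten chmod is easy to miss until the job has already been queued, so a quick check before submission can help. The sketch below is an illustrative assumption, not a JADE tool: `check_launch_script` is a hypothetical name, and the demonstration runs in a scratch directory rather than your home directory.

```shell
#!/bin/bash
# Hypothetical pre-flight check: is the launch script present and executable?
check_launch_script() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "missing: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script (run: chmod +x $script)" >&2
        return 1
    fi
    echo "ok: $script"
}

# Demonstration in a scratch directory, so no real files are touched.
demo_dir="$(mktemp -d)"
printf '#!/bin/bash\nth test.lua\n' > "$demo_dir/script.sh"
check_launch_script "$demo_dir/script.sh" || true   # fails: executable bit not set yet
chmod +x "$demo_dir/script.sh"
check_launch_script "$demo_dir/script.sh"           # passes after chmod
```

In practice you would call `check_launch_script ./script.sh` from the directory you intend to submit from.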
Then create a Slurm batch script to launch the code, e.g. batch.sh:
#!/bin/bash
# set the number of nodes
#SBATCH --nodes=1
# set max wallclock time
#SBATCH --time=01:00:00
# set name of job
#SBATCH -J JobName
# set number of GPUs
#SBATCH --gres=gpu:8
# mail alert at start, end and abortion of execution
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=your.mail@yourdomain.com
# launch the commands within script.sh
/jmain01/apps/docker/torch-batch -c ./script.sh
You can then submit the job using sbatch:
sbatch batch.sh
On successful submission, a job ID is given:
Submitted batch job 7800
The output will appear in the Slurm standard output file named after the job ID (in this case slurm-7800.out). Its content is as follows:
  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch
NVIDIA Release 17.04 (build 17724)
Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2016, Soumith Chintala, Ronan Collobert, Koray Kavukcuoglu, Clement Farabet
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
0.72191523289161
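When scripting around batch jobs, it can be handy to derive the output file name from the submission message. The sketch below assumes the confirmation line has exactly the form shown above (“Submitted batch job 7800”) and that the default slurm-<jobid>.out naming is in use; both helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation line.
job_id_from_sbatch() {
    local line="$1"
    case "$line" in
        # Expected form: "Submitted batch job <id>"; keep the last word.
        "Submitted batch job "*) echo "${line##* }" ;;
        *) echo "unexpected sbatch output: $line" >&2; return 1 ;;
    esac
}

# Default output file name for a job ID, as described in this section.
slurm_output_file() {
    echo "slurm-$1.out"
}

# Example with the confirmation line shown above.
# On the cluster, the literal could be replaced with "$(sbatch batch.sh)".
job_id="$(job_id_from_sbatch "Submitted batch job 7800")"
slurm_output_file "$job_id"
```

For the example line this yields slurm-7800.out. Once the job has finished, `tail -n 1` on that file shows the value printed by test.lua.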