
Parallel and cluster computation
To speed up computation, the batss.glm function can use
parallelisation on a single machine by setting
computation = "parallel" (the default; see ?batss.glm
for details).
When using multiple machines and/or a cluster, parallelisation can be
achieved by splitting the set of seeds (one seed per simulated trial)
between machines or cluster CPUs, saving the batss.glm outputs, and
merging them with the function batss.combine.
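Any partition of the seeds works, provided that no seed is evaluated twice. As a minimal base-R sketch (split_seeds() is a hypothetical helper, not a BATSS function), 10,000 seeds can be cut into contiguous, non-overlapping blocks, one per machine:

```r
# Hypothetical helper (not part of BATSS): split a vector of seeds into
# n.machines contiguous, non-overlapping blocks of (almost) equal size
split_seeds <- function(seeds, n.machines) {
  block <- cut(seq_along(seeds), breaks = n.machines, labels = FALSE)
  unname(split(seeds, block))
}

blocks <- split_seeds(1:10000, n.machines = 2)
lengths(blocks)      # 5000 5000
range(blocks[[2]])   # 5001 10000
```

Each block can then be passed as the argument R of batss.glm on the corresponding machine.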
We describe here how to use the functions batss.glm and
batss.combine in two settings:
- using several machines,
- using a cluster.
Several machines
Let’s assume a BATSS user wants to perform a Monte Carlo simulation considering 10,000 trials and has two computers, each of them with 10 CPUs.
The strategy we suggest consists of
- running batss.glm on each computer for a different subset of 5,000 seeds among the 10,000 seeds of interest, specified in the argument R (so that each computer evaluates a different set of seeds), with
  - the argument computation set to "parallel" and the argument mc.cores set to 10 (or parallel::detectCores()), the number of CPUs of the computer of interest,
  - the argument extended set to 1 or 2 (check ?batss.glm for details),
- saving the batss.glm output as an RData file with the function save, under a name specific to the task of each computer (such as the first or last seed evaluated by that computer),
- finally, using the function batss.combine to combine these outputs.
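The per-computer settings above can be computed rather than typed. The sketch below assumes two hypothetical helpers that are not part of BATSS: n_cores() returns the value to pass to mc.cores (falling back to one core when parallel::detectCores() cannot determine it), and seed_file() builds the output file name from the first and last seed handled by the computer.

```r
# Hypothetical helper (not part of BATSS): number of cores for mc.cores;
# detectCores() may return NA, in which case we fall back to one core
n_cores <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) 1L else n
}

# Hypothetical helper (not part of BATSS): output file named after the
# first and last seed evaluated by the computer
seed_file <- function(seeds, dir = "~") {
  file.path(dir, sprintf("seed%dto%d.rdata", min(seeds), max(seeds)))
}

seed_file(5001:10000)   # "~/seed5001to10000.rdata"
```

On the second computer, for instance, one could then use mc.cores = n_cores() in batss.glm and file = seed_file(5001:10000) in save.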
Let’s have the first computer run seeds 1 to 5,000 and save the
output under seed1to5000.rdata:
library(BATSS)
##
## run seeds 1 to 5000 (example of batss.glm)
##
seed1to5000 = batss.glm(
model = y ~ group,
var = list(y = rnorm,
group = alloc.balanced),
var.control = list(y = list(sd = 5)),
beta = c(1, 1, 2),
which = c(2:3),
alternative = "greater",
R = 1:5000,
N = 200,
interim = list(recruited = seq(100, 180, 20)),
prob0 = c(C = 1/3, T1 = 1/3, T2 = 1/3),
eff.arm = eff.arm.simple,
eff.arm.control = list(b = 0.975),
fut.arm = fut.arm.simple,
fut.arm.control = list(b = 0.05),
computation = "parallel",
H0 = TRUE,
mc.cores = 10,
extended = 1)
##
## save results as an rdata file
##
# specify here the folder in which to save the output
path_computer1 = "~/"
# save
save(seed1to5000, file = paste0(path_computer1,"seed1to5000.rdata"))
Let’s now have the second computer run seeds 5,001 to 10,000 and save
the output under seed5001to10000.rdata:
library(BATSS)
##
## run seeds 5001 to 10000 (example of batss.glm)
##
seed5001to10000 = batss.glm(
model = y ~ group,
var = list(y = rnorm,
group = alloc.balanced),
var.control = list(y = list(sd = 5)),
beta = c(1, 1, 2),
which = c(2:3),
alternative = "greater",
R = 5001:10000,
N = 200,
interim = list(recruited = seq(100, 180, 20)),
prob0 = c(C = 1/3, T1 = 1/3, T2 = 1/3),
eff.arm = eff.arm.simple,
eff.arm.control = list(b = 0.975),
fut.arm = fut.arm.simple,
fut.arm.control = list(b = 0.05),
computation = "parallel",
H0 = TRUE,
mc.cores = 10,
extended = 1)
##
## save results as an rdata file
##
# specify here the folder in which to save the output
path_computer2 = "~/"
# save
save(seed5001to10000, file = paste0(path_computer2,"seed5001to10000.rdata"))
Transfer the files seed1to5000.rdata and
seed5001to10000.rdata to the same folder of the same
computer, for example path_computer1 above, and merge the
objects with the function batss.combine as follows:
# combine
seed1to10000 = batss.combine(
paths = paste0(path_computer1, c("seed1to5000","seed5001to10000"),".rdata"))
# look at combined results
summary(seed1to10000)
Cluster
Let’s assume a BATSS user wants to perform a Monte Carlo simulation considering 10,000 trials and has access to 500 CPUs of a cluster.
The strategy we suggest consists of
- running batss.glm on each CPU for a specific subset of the 10,000 seeds of interest, specified in the argument R (so that each CPU evaluates a different set of seeds), with the argument computation set to "sequential" (as parallelisation is already achieved by the large number of CPUs),
- saving each CPU’s batss.glm output as an RData file using the function save, with a file name indicating the task (e.g., based on the first or last seed evaluated by each CPU),
- using the function batss.combine to merge all these outputs into a single one.
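One possible assignment, and the one used in the 1-run.r script described below, interleaves the seeds: the CPU with Slurm task ID k handles seeds k, k + 500, k + 1000, and so on. A quick base-R check (seeds_for_task() is a hypothetical helper) confirms that each CPU receives 20 seeds and that every seed is assigned exactly once:

```r
# Interleaved assignment: task `task` handles seeds task, task + n.cpus, ...
n.sim  <- 10000   # number of simulated trials (seeds)
n.cpus <- 500     # number of CPUs, i.e. Slurm array tasks 1 to 500
seeds_for_task <- function(task) seq(task, n.sim, by = n.cpus)

head(seeds_for_task(1))      # 1 501 1001 1501 2001 2501
length(seeds_for_task(1))    # 20

# every seed is assigned to exactly one CPU
all.seeds <- unlist(lapply(1:n.cpus, seeds_for_task))
stopifnot(!anyDuplicated(all.seeds), length(all.seeds) == n.sim)
```

Contiguous blocks (as for several machines) would work equally well; interleaving simply lets each task derive its seeds from its own task ID.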
There are multiple ways to accomplish this. In the following, we describe the approach we follow. Let us assume that
- the cluster runs a Linux OS with Slurm as workload manager (a common setup in cluster computing),
- the working directory for this simulation is ~/batss-example/, which contains:
  - a folder ~/batss-example/in/ with (optional) inputs, like parameter values saved in RData format, or an R script containing user-defined functions to be used by batss.glm (like the function treatalloc.fun described in the ANCOVA help page) that is to be sourced by each CPU,
  - a folder ~/batss-example/out/ where all outputs from the CPUs will be stored,
  - an R script ~/batss-example/1-run.r, which
    - optionally loads and sources elements of the folder ~/batss-example/in/,
    - runs the batss.glm function for a set of seeds assigned by Slurm,
    - saves the output in the folder ~/batss-example/out/,
  - an R script ~/batss-example/2-combine.r, which merges the 500 batss.glm outputs with the function batss.combine,
  - a shell script ~/batss-example/batss_sim.sh containing the instructions for each CPU.
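The two folders can be created once, for instance from an R session on the cluster's login node. The helper setup_batss_dirs() below is a hypothetical sketch, not a BATSS function; the R and shell scripts still have to be written by the user.

```r
# Hypothetical helper (not part of BATSS): create the in/ and out/ folders
# of the working directory; existing folders are left untouched
setup_batss_dirs <- function(root = "~/batss-example") {
  root <- path.expand(root)
  for (d in c("in", "out")) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(root)
}

# setup_batss_dirs()   # run once before submitting the job
```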
The simulation is conducted as follows:
- i/ the sbatch command is invoked, which
- ii/ runs the shell script ~/batss-example/batss_sim.sh on each CPU, which
- iii/ starts R and executes the ~/batss-example/1-run.r script that performs the simulation for the seeds assigned to that CPU;
- iv/ finally, once all tasks are complete, the ~/batss-example/2-combine.r script is executed to merge all simulation results.
Let’s describe each step:
sbatch command
The following command corresponds to the Slurm
sbatch submission command. It submits a job array with task
IDs from 1 to 500 (i.e., the number of CPUs), executing the script
batss_sim.sh and passing the folder names in (input folder) and
out (output folder) as arguments to each task:
# move to the folder of interest
cd ~/batss-example
# sbatch function
sbatch --array=1-500 batss_sim.sh in out
For additional options, such as the maximum computation time, memory limits or email notifications, please refer to the help of the sbatch command or consult your cluster’s documentation.
batss_sim.sh shell script
The following command of the shell script passed to the
sbatch call above tells each CPU to start R
(in both slave and vanilla mode) and run the
script 1-run.r with
- the arguments $SLURM_ARRAY_TASK_ID, $1 and $2, where $SLURM_ARRAY_TASK_ID is the task ID provided by Slurm (i.e., the CPU job), and $1 and $2 correspond respectively to the in and out folders specified at the end of the call to sbatch,
- the location where to save the Rout file of each task: these files are saved in the out folder (indicated as $2) and named after the task ID.
R --slave --vanilla < 1-run.r --args $SLURM_ARRAY_TASK_ID $1 $2 > $2/${SLURM_ARRAY_TASK_ID}.Rout 2>&1
Note that sbatch expects the first line of batss_sim.sh to be an interpreter line such as #!/bin/bash, placed before this command. Note also that the R program may not be directly available via a call to R, and that you might need to
- specify the full path to R,
- add to the shell script a way to make R available (like the
module command, for example).
You can check this with your cluster administrator.
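With the command above, 1-run.r receives three trailing arguments (the task ID, the input folder and the output folder), which R exposes through commandArgs(trailingOnly = TRUE). The sketch below uses a hypothetical helper, parse_task_args(), that takes these values as a plain character vector, so that the parsing and the resulting seeds can be checked interactively before submitting the job:

```r
# Hypothetical helper (not part of BATSS): turn the trailing arguments
# c(task id, in folder, out folder) into the quantities used by 1-run.r
parse_task_args <- function(args, n.sim = 10000, n.cpus = 500) {
  stopifnot(length(args) >= 3)
  task <- as.integer(args[1])
  list(task     = task,
       path_in  = paste0(args[2], "/"),
       path_out = paste0(args[3], "/"),
       seeds    = seq(task, n.sim, by = n.cpus))
}

# what task 7 would receive from "... --args 7 in out"
str(parse_task_args(c("7", "in", "out")))
```

Inside 1-run.r, the same helper would be called as parse_task_args(commandArgs(trailingOnly = TRUE)).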
1-run.r R script
The following code shows the content of the R script run by each CPU. The code
- loads the library BATSS,
- reads the arguments passed by batss_sim.sh and defines the vector of seeds related to the job ID attributed by Slurm,
- runs batss.glm for that vector of seeds with computation set to "sequential" and extended set to 1 (skipping this step if the output file already exists),
- saves the results in the output folder under a name that corresponds to the first seed.
###################################
## setup
###################################
# load library
library(BATSS)
# read the arguments passed by batss_sim.sh:
# - args[1] is the job (task) id attributed by slurm,
#   which we also use as first seed
# - args[2] and args[3] are the in/ and out/ folders
args <- commandArgs(trailingOnly = TRUE)
# define list of seeds for the CPU to handle, where
# - 10000 is the number of simulations (i.e., seeds)
# - 500 is the number of CPUs
seed.list = seq(as.numeric(args[1]),10000,500)
# define path to in/ and out/ folders (input of sbatch)
path_in <- paste0(args[2],"/")
path_out <- paste0(args[3],"/")
# optionally load/source info from relevant files
# of "in/" here
###################################
## simulation
###################################
# only compute the results if they do not exist yet
if(!file.exists(paste0(path_out,seed.list[1],".rdata"))){
##
## batss.glm
##
start = Sys.time()
sim = batss.glm(
model = y ~ group,
var = list(y = rnorm,
group = alloc.balanced),
var.control = list(y = list(sd = 5)),
beta = c(1, 1, 2),
which = c(2:3),
alternative = "greater",
R = seed.list,
N = 200,
interim = list(recruited = seq(100, 180, 20)),
prob0 = c(C = 1/3, T1 = 1/3, T2 = 1/3),
eff.arm = eff.arm.simple,
eff.arm.control = list(b = 0.975),
fut.arm = fut.arm.simple,
fut.arm.control = list(b = 0.05),
computation = "sequential",
H0 = TRUE,
extended = 1)
finish = Sys.time()
## print required time
cat("\trequired time: ", format(finish - start), "\n\n")
## store results
save(sim,file=paste0(path_out,seed.list[1],".rdata"))
}# end if
cat("\n\t",date(),"\n")
cat("\n\t DONE!\n")
q("no")
2-combine.r R script
Once all tasks are complete, the following code, run from the ~/batss-example folder (the paths to out/ being relative to it), merges all results into a single object:
# load library
library(BATSS)
# define list of 'successful' jobs, i.e. the <task id>.rdata files
job.list = dir("out/", pattern = "^[0-9]+\\.rdata$")
# list of potential 'unsuccessful' slurm jobs to be
# investigated by looking at the corresponding
# 'out' (slurm) and 'Rout' (R) files
print(seq(1,500)[is.na(match(paste0(seq(1,500),".rdata"),job.list))])
# merge
sim = batss.combine(paste0("out/",job.list))
# store results
save(sim, file = "out/full.rdata")
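If some tasks failed (for instance because they reached the time limit), they can be resubmitted on their own before combining, since the --array option of sbatch also accepts a comma-separated list of task IDs. The helpers below are a hypothetical sketch, not BATSS functions: missing_tasks() lists the task IDs without a <task>.rdata file in out/, and array_option() formats them for sbatch.

```r
# Hypothetical helper (not part of BATSS): task IDs 1..n.cpus whose
# output file <task>.rdata is missing from out.dir
missing_tasks <- function(out.dir = "out", n.cpus = 500) {
  expected <- file.path(out.dir, paste0(seq_len(n.cpus), ".rdata"))
  which(!file.exists(expected))
}

# Hypothetical helper: format task IDs as a Slurm --array option
array_option <- function(tasks) {
  paste0("--array=", paste(tasks, collapse = ","))
}

# example with made-up task IDs
array_option(c(3, 17, 250))
```

With the made-up IDs above, the resubmission would read sbatch --array=3,17,250 batss_sim.sh in out, after which 2-combine.r can be run again so that the combined object covers all 10,000 seeds.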