Cluster computation
To speed up the computation, the batss.glm function can use parallelisation on single machines when computation = "parallel" (which is the default).
When using a cluster, parallelisation is best achieved by letting the cluster workload manager, typically Slurm on clusters running Linux, split the set of seeds (each seed corresponding to one simulated trial) between cluster nodes and CPUs.
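One common way to let Slurm do this splitting is a job array, in which each task can read its own index from the SLURM_ARRAY_TASK_ID environment variable. The lines below are a minimal sketch of that idea, not a prescribed setup: the name task_id and the fallback value "1" (used so that the script also runs outside Slurm) are assumptions made for illustration.

```r
# Read the index of the current Slurm array task.
# SLURM_ARRAY_TASK_ID is set by Slurm for job-array tasks; the fallback "1"
# is an assumption for illustration so the script also runs outside Slurm.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))

# Stop early if the index is missing or not a positive integer.
stopifnot(!is.na(task_id), task_id >= 1L)
```

The index can then be used to decide which subset of seeds a given task evaluates.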
Let’s assume a BATSS user wants to perform a Monte Carlo simulation considering 10’000 trials and has 500 CPUs to do so. The strategy we suggest consists of
- running batss.glm on each CPU with a subset of the 10’000 seeds of interest specified in the argument R, so that each CPU evaluates a different set of seeds,
- saving the 500 batss.glm outputs as RData files with the function save under different names (for example, named after one of the seeds evaluated by the CPU, such as the first or the last one),
- finally, using the function batss.combine to combine these outputs.
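The seed bookkeeping behind this strategy can be sketched in base R as follows. This is only an illustration under stated assumptions: the seeds are taken to be 1 to 10’000, split into consecutive chunks of equal size, and the names chunks, task_id, my_seeds, out_file and sim, as well as the file-name pattern, are hypothetical. The batss.glm and save calls are shown as comments, with all trial-specific arguments omitted.

```r
n_trials <- 10000L  # total number of seeds (simulated trials)
n_cpus   <- 500L    # number of CPUs, one task per CPU

# Split the seeds 1..10000 into 500 consecutive chunks of 20 seeds each.
seeds  <- seq_len(n_trials)
chunks <- split(seeds, rep(seq_len(n_cpus), each = n_trials / n_cpus))

# Index of the task handled by this CPU (hypothetical value; on a cluster
# it would typically be supplied by the workload manager).
task_id  <- 3L
my_seeds <- chunks[[task_id]]

# Name the output file after the first seed evaluated by this task.
out_file <- sprintf("batss_seed_%05d.RData", my_seeds[1])

# On the cluster, each task would then run something like (not executed here):
# sim <- batss.glm(..., R = my_seeds)  # other arguments omitted
# save(sim, file = out_file)
```

Naming each file after its first seed keeps the 500 outputs distinct and makes it easy to spot a missing chunk before combining them with batss.combine.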
In the next section, we show examples of how to use the function batss.combine.