Random notes mostly on Machine Learning

Exploiting Multiple Machines for Embarrassingly Parallel Applications

While working on my machine learning project I needed to run some computation-heavy calculations several times, each time with slightly different inputs. These calculations were CPU- and memory-bound, so spawning them all at once would only increase the overall running time because of the extra context switches. Running 4 of them at a time (the number of cores in my CPU; actually 3, since other applications need CPU too) should speed things up.
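The "leave one core free" rule can be written down in shell. This is only a minimal sketch, not part of the setup described here: the function name job_slots is hypothetical, and it assumes that either nproc (GNU coreutils) or getconf is available on the machine.

```shell
#!/bin/sh
# job_slots: print how many jobs to run at once on this machine,
# i.e. the number of online cores minus one, but never less than 1.
# The name job_slots is hypothetical, chosen for illustration.
job_slots() {
    # nproc comes with GNU coreutils; fall back to getconf where it is missing.
    cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    slots=$((cores - 1))
    [ "$slots" -ge 1 ] || slots=1
    echo "$slots"
}
```

The result could be handed to a job runner as its concurrency limit, for example as -j "$(job_slots)" with GNU Parallel. Keep in mind that -j sets the number of job slots per machine, so a value computed locally would apply to every host.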

Fortunately, I have an old laptop with 2 cores as well as access to a somewhat more modern machine with 4 cores. That gives 10 cores spread across 3 machines (all of them have some version of GNU/Linux installed). The question was how to exploit such a treasure.

The answer is GNU Parallel with some additional bells and whistles. GNU Parallel executes commands in parallel and can even distribute them across several machines.

The command was as follows:

parallel -u --wd ... -S :,host1,host2 --trc {}.emb "sh {}"
Here we have:
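  • -u (ungroup) prints each job’s output as soon as it is produced instead of collecting it per job
  • --wd stands for working directory. The three dots tell parallel to use a temporary folder of its own
  • -S contains the list of hosts, with : being localhost
  • --trc stands for “Transfer, Return, Cleanup”: transfer the input file (here, the script to run) to the target host, return the specified file ({}.emb, where {} is replaced by the input file name), and clean up afterwards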

parallel reads a list of arguments (file names here) from standard input and executes the command (sh in my case) for each of them:

ls -1 jobs/* | parallel -u --wd ... -S :,host1,host2 --trc {}.emb "sh {}"
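How the jobs/ directory gets populated is not shown here. One possible way is sketched below, assuming each job is a small shell script that writes its result next to itself with an .emb suffix, so that {}.emb names the file to bring back. The function name make_jobs, the parameter values and the echo payload are hypothetical placeholders for the real computation.

```shell
#!/bin/sh
# make_jobs DIR PARAM...
# Write one job script per parameter value into DIR.
# Hypothetical helper: the payload below is a stand-in for real work.
make_jobs() {
    dir=$1
    shift
    mkdir -p "$dir"
    for param in "$@"; do
        # $param is expanded now; \$0 is left for the job script itself,
        # so the result lands in "<script name>.emb".
        cat > "$dir/job_$param.sh" <<EOF
#!/bin/sh
echo "result for $param" > "\$0.emb"
EOF
    done
}
```

After make_jobs jobs 1 2 3, running sh jobs/job_2.sh would leave its result in jobs/job_2.sh.emb, which is the name that {}.emb expands to for that input.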

There’s a problem: we usually need more than one file to do useful stuff. There are several solutions to that problem:

  • Bring all files manually
    It’s a solution, but a somewhat tedious one: setting up a computing environment on several machines is dull.
  • tar everything and unpack it as part of the command
    Looks better, but some shell kung-fu is required.
  • Use shar
    Basically it’s an archive, like tar, wrapped in shell commands for (self-)extracting. I chose this way and glued in some of my own code.
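The idea of a self-extracting job can be illustrated without shar itself. The sketch below is not the script used here: it is a hand-rolled stand-in that embeds a tar archive, encoded with base64, into a generated shell script. The function name make_bundle and the PAYLOAD_END marker are hypothetical, and it assumes tar and base64 (with the GNU -d flag for decoding) exist on every host.

```shell
#!/bin/sh
# make_bundle OUT CMD FILE...
# Write a self-extracting script OUT that unpacks FILE... into the
# current directory and then runs CMD.
# Hypothetical helper; tar + base64 stand in for shar here.
make_bundle() {
    out=$1
    cmd=$2
    shift 2
    {
        echo '#!/bin/sh'
        echo 'set -e'
        echo '# Unpack the embedded archive into the current directory.'
        echo "base64 -d <<'PAYLOAD_END' | tar -xzf -"
        # base64 output never contains "_", so the marker cannot clash.
        tar -czf - "$@" | base64
        echo 'PAYLOAD_END'
        echo '# Run the actual job.'
        echo "$cmd"
    } > "$out"
}
```

For instance, make_bundle jobs/job1.sh 'sh run.sh > "$0.emb"' run.sh data.txt would produce a single file that carries run.sh and data.txt with it and writes its result to jobs/job1.sh.emb when run as sh jobs/job1.sh. That is the one-file-per-job shape that --trc {}.emb expects.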
Tagged as gnu parallel, linux