GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. If you use ppss or pexec you will find GNU parallel will often make the command easier to read. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
Release Notes: '-L n --pipe' will use records of n lines. This is useful when processing data that have fixed records with a fixed number of lines (e.g. fastq). --filter-hosts will remove down hosts. For each remote host, the program checks that login through ssh works. If not, the host will not be used. Currently you cannot put --filter-hosts in a profile, $PARALLEL, /etc/parallel/config, or similar. --pipe now uses fork() instead of busy wait. The performance should be better on computers with more than 10 cores while remaining the same on computers with fewer cores.
Tags: Text Processing, parallel, Parallel processing, Multicore, Clustering/Distributed Networks, Command Line Tools, Filters, System Administration