Downloading multiple files from the command line can be a time-consuming process, especially when dealing with large datasets or numerous web pages. Fortunately, curl, a versatile command-line tool, offers several methods to download files in parallel, significantly speeding up the process. This article explores various techniques for achieving parallel downloads using curl, providing practical examples and explanations.
Parallel downloading involves initiating multiple download streams simultaneously, which makes better use of available bandwidth and reduces the total time needed to retrieve many files.
xargs for Parallel Execution

The xargs command is a powerful utility for building and executing command lines from standard input. It can be used to run multiple curl processes in parallel.
seq 1 10 | xargs -n1 -P2 bash -c 'i=$0; url="http://example.com/?page${i}.html"; curl -O -s "$url"'
Explanation:

- seq 1 10: Generates a sequence of numbers from 1 to 10, representing the page numbers to download.
- xargs -n1: Instructs xargs to process one input argument (a single page number) at a time.
- -P2: Tells xargs to keep two subprocesses running concurrently.
- bash -c '...': Executes the specified command in a new Bash shell.
- i=$0: Assigns the input argument (the page number) to the variable i.
- url="http://example.com/?page${i}.html": Constructs the URL for the page to download.
- curl -O -s "$url": Downloads the file with curl, saving it under the filename taken from the URL (-O) and suppressing progress output (-s).

Alternative with the -I flag:
seq 1 10 | xargs -I{} -P2 -- curl -O -s 'http://example.com/?page{}.html'
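The same xargs pattern works when the URLs are already listed in a file rather than generated from a sequence. A minimal sketch, assuming a hypothetical urls.txt containing one URL per line:

xargs -P 4 -n 1 curl -O -s < urls.txt

Here xargs hands curl one URL at a time (-n 1) and keeps up to four downloads running at once (-P 4).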
Background curl Processes

Another approach involves running multiple curl commands in the background using the & operator.
urls="http://example.com/?page1.html http://example.com?page2.html"
for url in $urls; do
echo "fetching $url"
curl "$url" -O -s &
done
wait
Explanation:

- urls="http://example.com/?page1.html ...": Defines a string containing the URLs to download.
- for url in $urls; do ... done: Iterates through each URL in the string.
- curl "$url" -O -s &: Executes the curl command in the background, allowing the loop to continue without waiting for the download to finish.
- wait: Waits for all background processes to complete before the script exits.
Limiting Concurrent Processes:

To prevent overwhelming the system, you can implement a mechanism to limit the number of concurrent curl processes.
max_processes=3
current_processes=0
urls=( "url1" "url2" "url3" "url4" "url5" )

for url in "${urls[@]}"; do
  # If the limit is reached, wait for any one background job to finish (wait -n needs Bash 4.3+)
  while [ "$current_processes" -ge "$max_processes" ]; do
    wait -n
    current_processes=$((current_processes-1))
  done
  curl "$url" -O -s &
  current_processes=$((current_processes+1))
done
wait
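If you also want to know whether any of the background downloads failed, one option is to record each process ID and check its exit status with wait. A minimal sketch that reuses the urls array from the example above and relies on curl's -f flag to turn HTTP errors into non-zero exit codes:

pids=()
for url in "${urls[@]}"; do
  curl -O -sf "$url" &   # -f: report HTTP errors as a failed transfer
  pids+=("$!")
done

failures=0
for pid in "${pids[@]}"; do
  wait "$pid" || failures=$((failures+1))
done
echo "$failures download(s) failed"

Note that this launches every download at once; to cap concurrency it would need to be combined with the counting logic shown above.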
Using GNU parallel

GNU Parallel is a powerful tool specifically designed for parallel execution of commands.
parallel --jobs 2 curl -O -s http://example.com/?page{}.html ::: {1..10}
Explanation:

- parallel --jobs 2: Specifies that parallel should run two jobs concurrently.
- curl -O -s http://example.com/?page{}.html: The command to execute for each input.
- ::: {1..10}: Provides the input arguments to the command (the numbers 1 to 10).
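GNU Parallel can also read its arguments from standard input instead of an ::: list, which is convenient when the URLs live in a file. A minimal sketch, assuming a hypothetical urls.txt with one URL per line:

parallel --jobs 4 curl -O -s {} < urls.txt

Each input line becomes one value of {}, and at most four curl processes run at a time.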
curl's Built-in Parallel Support

As of version 7.66.0, curl has built-in support for parallel downloads.
curl -Z 'http://httpbin.org/anything/[1-9].{txt,html}' -o '#1.#2'
Explanation:

- -Z (long form --parallel): Enables parallel transfer mode.
- 'http://httpbin.org/anything/[1-9].{txt,html}': Specifies the URLs to download, using curl's URL globbing (a numeric range plus a set) to generate multiple URLs.
- -o '#1.#2': Defines the output filename pattern, where #1 expands to the current value of the first glob ([1-9]) and #2 to the second ({txt,html}), producing names such as 1.txt and 2.html.
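Parallel mode is not limited to globbed URL patterns; you can also list individual URLs and give each one its own -O so it is saved under its remote filename. A minimal sketch with hypothetical URLs:

curl -Z -O 'http://example.com/file1.zip' -O 'http://example.com/file2.zip' -O 'http://example.com/file3.zip'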
Limiting Concurrent Connections:

You can limit the number of simultaneous transfers using the --parallel-max option (the default limit is 50).
curl --parallel --parallel-max 3 -O "http://example.com/file[1-5].txt"
curl can also read multiple URLs and their output filenames from a config file. Combined with --parallel, this drives a whole batch of downloads at once; the --parallel-immediate option (added in 7.68.0) tells curl to prefer opening new connections immediately rather than waiting to reuse existing ones via multiplexing.
curl --parallel --parallel-immediate --parallel-max 3 --config urls.txt
urls.txt:
url = "example1.com"
output = "example1.html"
url = "example2.com"
output = "example2.html"
url = "example3.com"
output = "example3.html"
url = "example4.com"
output = "example4.html"
url = "example5.com"
output = "example5.html"
While curl is a powerful tool, other utilities can also be used for parallel downloading:

- aria2: aria2c -x 5 <url>
- axel: axel -n 5 <url>
- wget2, the successor to wget with support for multi-threading: wget2 --max-threads=5 <url>
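aria2 can likewise read its URL list from a file and cap how many downloads run at once. A minimal sketch, assuming a hypothetical urls.txt with one URL per line:

aria2c -j 3 -i urls.txt

Here -i names the input file and -j limits the number of concurrent downloads.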
Parallel downloading with curl can significantly improve the efficiency of retrieving multiple files from the command line. By leveraging tools like xargs, GNU parallel, or curl's built-in parallel transfer mode, you can optimize your network bandwidth and reduce download times. Choose the method that best suits your needs and system configuration to unlock the full potential of parallel downloading.