The best way to copy a lot of data fast in Linux is to use tar. It's much faster than cp or copying in any file manager. Here's a simple command with a progress bar (pv needs to be installed); run it inside the folder that you want to copy recursively:
$ tar cf - . | pv | (cd /destination/path; tar xf -)
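If you'd rather not cd into the source first, an equivalent form of the same pipe is sketched below (assuming GNU or BSD tar, both of which support -C; both paths are placeholders):
$ tar cf - -C /source/path . | pv | tar xf - -C /destination/path
Here -C tells tar to change into the given directory before reading or writing its file arguments, so the subshell and the cd are no longer needed.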
Top comments (3)
This is not universal advice.
Tar (for the younger audience: Tape ARchiving) is a way to present a set of files as a single continuous stream that can be written to or read from magnetic tape storage. So for this method to be advantageous, the additional CPU time on both sides must be lower than the cost of doing a full synchronous disk operation for each file in turn. That is where tar | xxx | tar shines, for example compared to scp, which has to process each file sequentially and wait for a network confirmation from the other side.
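To illustrate that remote pattern, a streamed network copy could look roughly like this (a sketch only; the user name, the host name remote-host, and the paths are placeholders, and it assumes tar is available on both ends):
$ tar cf - . | ssh user@remote-host 'cd /destination/path && tar xf -'
A single tar stream travels over one ssh connection instead of a per-file round trip.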
I just did a quick benchmark on a 20 GB repository with 3000 files on an APFS filesystem on a PCIe NVMe 3.0 disk:
tar | tar took 20s
cp -r managed to finish in 12s
cp -r -c (Copy on Write) finished in 1.3s
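For context, that last number reflects APFS cloning rather than copying: the BSD cp that ships with macOS has a clone flag, so the command would look roughly like this (my assumption; the paths are placeholders):
$ cp -r -c /source/path /destination/path
With -c, cp asks APFS to clone the file blocks via clonefile(2) instead of rewriting them, which is why it finishes almost instantly.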
I had to copy files from HDD to SSD, SSD to SSD, SSD to NVMe 3, and NVMe 3 to NVMe 3. The data folders ranged from 80 GB to 2 TB.
Tar was always dramatically faster. What tar did in minutes took cp hours. So from my perspective it is universal advice whenever someone has lots of data (which for me means more than 100 GB) and a lot of small files in it (cp fails completely in this case). I guess cp could work if someone is just copying a few big files, but that's not the case in most backup situations.
Hours? I think something was really off with your system configuration (journaling issues? incomplete RAID array? not enough PCIe lanes? kernel setup?). That is extremely slow even for a SATA 3.0 SSD, which should crunch a 2 TB folder with a moderate number of files in it in ~1h using pure cp.
Anyway, tar is helpful when the full synchronous round trip of copying each single file is costly. But for those cases I prefer a find | parallel combo, because it works with tar, cp, scp, rsync, etc.
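For completeness, a minimal sketch of that find | parallel approach (assuming GNU parallel and GNU cp with --parents are installed; the destination path and the job count of 8 are placeholders):
$ mkdir -p /destination/path
$ cd /source/path
$ find . -type f -print0 | parallel -0 -j 8 cp --parents {} /destination/path/
Every file becomes its own cp job, so many copies run concurrently instead of one synchronous copy at a time, and cp --parents recreates the source directory structure under the destination.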