DEV Community

t-o-d


Split a CSV by line count using only the shell

  • I sometimes have to work with large CSV files.
  • Splitting such a file and sorting the pieces into folders in advance makes it easier to handle.
  • This post shows how to split a file by a fixed number of lines and place each piece in its own folder, using only the shell.

Result

  • The following is the directory structure before execution.
.
├── sample.csv
├── main.sh
  • The contents of main.sh are shown below.
    • sample.csv contains 100 rows of data.
    • Note: error handling is omitted.
#!/bin/sh
set -e

# Input file path (exit if the file does not exist)
[ -e "$1" ] || exit 1
datafile="$1"
# Strip the file extension
filename="${datafile%.*}"
# Count the lines (grep -c '' also counts a final line without a newline)
row=$(grep -c '' "$datafile")
# Number of lines per split file
sep="$2"
# Number of directories to create: row / sep, rounded up
dir_cnt=$(awk -v row="$row" -v sep="$sep" 'BEGIN {
    printf("%d\n", int((row + sep - 1) / sep))
    }
    '
)
# Create the directories ${filename}_1 ... ${filename}_N
seq -f "${filename}_%01.0f" 1 "${dir_cnt}" |
xargs mkdir -p
# Split the file into ${filename}_aa, ${filename}_ab, ...
split -l "${sep}" -a 2 "$datafile" "${filename}_"
# Move each piece into its directory, adding the .csv extension
count=1
for i in $(find . -type f -name "${filename}_*" | sort)
do
    mv "$i" "${filename}_${count}/${filename}_${count}.csv"
    count=$((count + 1))
done
  • Run as follows.
sh main.sh sample.csv 25
  • After executing, check that the directory structure is as follows.
.
├── main.sh
├── sample.csv
├── sample_1
│   ├── sample_1.csv
├── sample_2
│   ├── sample_2.csv
├── sample_3
│   ├── sample_3.csv
├── sample_4
│   ├── sample_4.csv
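Beyond looking at the directory listing, one way to check the result is to confirm that no lines were lost. The following is a minimal sketch, assuming the layout shown above (one .csv file per numbered directory). The function name total_split_lines is hypothetical and not part of main.sh.

```shell
#!/bin/sh
# total_split_lines is a hypothetical helper, not part of main.sh.
# It prints the total number of lines across <prefix>_N/<prefix>_N.csv.
total_split_lines() {
    prefix="$1"
    total=0
    for f in "${prefix}"_*/"${prefix}"_*.csv; do
        [ -f "$f" ] || continue   # skip if the glob matched nothing
        total=$(( total + $(grep -c '' "$f") ))
    done
    echo "$total"
}

# Example, run in the directory shown above:
# [ "$(total_split_lines sample)" = "$(grep -c '' sample.csv)" ] && echo OK
```

If the total matches the line count of the original file, every row ended up in exactly one piece.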

Supplement

Number of created directories

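  • Rounding up
    • When the line count is not evenly divisible by the split size, as in 100/15, the quotient is a fraction, and without rounding one directory too few would be created.
    • The awk step therefore rounds the quotient up to an integer before printing it with printf.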
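The rounding-up step can also be sketched with plain shell integer arithmetic instead of awk. This is an illustrative alternative, not the code used in main.sh; the function name ceil_div is hypothetical, and it assumes both arguments are positive integers.

```shell
#!/bin/sh
# ceil_div is a hypothetical helper, not part of main.sh.
# It divides $1 by $2 and rounds the result up, using integers only:
# adding (divisor - 1) before dividing pushes any remainder to the next integer.
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

ceil_div 100 25   # evenly divisible: prints 4
ceil_div 100 15   # 6.67 rounded up: prints 7
```

Because no floating-point value is involved, this form gives the same answer no matter how small the fractional part of the quotient is.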

File splitting and moving

  • The .csv extension is added by mv.
    • The split option for adding an extension (--additional-suffix) is not available by default on macOS and some other systems.
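The idea of adding the extension after splitting can be shown in isolation. The following is a minimal sketch, assuming the pieces were produced with a two-character suffix (split -a 2). The function name add_csv_suffix is hypothetical and not part of main.sh.

```shell
#!/bin/sh
# add_csv_suffix is a hypothetical helper, not part of main.sh.
# It appends ".csv" to every file named <prefix> plus two characters,
# which is the naming that split -a 2 produces (aa, ab, ac, ...).
add_csv_suffix() {
    prefix="$1"
    for piece in "${prefix}"??; do
        [ -f "$piece" ] || continue   # skip directories and unmatched globs
        mv "$piece" "${piece}.csv"
    done
}

# Example: split sample.csv into 25-line pieces, then add the extension.
# split -l 25 -a 2 sample.csv sample_
# add_csv_suffix sample_
```

This relies only on split -l and -a plus mv, so it does not depend on --additional-suffix being available.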
