jorin

Posted on Nov 10, 2017

CSV Challenge

#fun #coding #challenge #puzzle

You got your hands on some data that was leaked from a social network and you want to help the poor people.

Luckily you know a government service to automatically block a list of credit cards.

The service is a little old school though and you have to upload a CSV file in the exact format. The upload fails if the CSV file contains invalid data.

The CSV file should have two columns, Name and Credit Card. It must also be named after the following pattern:

YYYYMMDD.csv.
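
For illustration, here is a minimal Python sketch of building that file name. It assumes the date is meant to be the current day, which the post does not state, and `csv_filename` is a made-up helper name.

```python
from datetime import date


def csv_filename(day=None):
    """Return the upload file name in YYYYMMDD.csv form.

    Assumption: the date in the name is today's date unless one is passed in.
    """
    day = day or date.today()
    return day.strftime("%Y%m%d") + ".csv"


print(csv_filename(date(2017, 11, 10)))  # prints 20171110.csv
```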

The leaked data doesn't have credit card details for every user and you need to pick only the affected users.
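
A rough Python sketch of that filtering step could look like the following. The field names `name` and `creditcard` are taken from the solutions in the comments. Whether a missing card shows up as `null`, an empty string, or an absent key is an assumption here, so the sketch treats all three the same way. `affected_users` is a hypothetical helper name.

```python
def affected_users(records):
    """Keep only records that have both a name and a credit card number."""
    return [
        (r["name"], r["creditcard"])
        for r in records
        # .get() covers absent keys; the truthiness check covers None and "".
        if r.get("name") and r.get("creditcard")
    ]


sample = [
    {"name": "Dax Brekke II", "creditcard": "1234-2121-1221-1211"},
    {"name": "Elza Bauch", "creditcard": None},
    {"name": "Lacey McDermott PhD"},
]
print(affected_users(sample))
```

Only the first sample record survives, since the other two carry no card number.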

The data was published here:

data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?

Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user who would write this as a one-liner? How would you solve this in your favorite programming language?

Show your solution in the comments below!

Top comments (33)

Thomas Rayner • Nov 10 '17

PowerShell to the rescue!

$json = invoke-webrequest 'gist.githubusercontent.com/jorinvo...' | convertfrom-json

$json | where creditcard | select name,creditcard | export-csv "$(get-date -format yyyyMMdd).csv" -NoTypeInformation

Daniel Coturel • Nov 10 '17

Excellent, man

Tobias Salzmann • Nov 10 '17 • Edited

ramda-cli:

curl -s https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| ramda 'filter where name: (complement isNil), creditcard: (complement isNil)' 'map (x) -> x.name + ", " + x.creditcard' -o raw > `date +%Y%m%d.csv`

scala:

import java.io.{BufferedWriter, FileOutputStream, OutputStreamWriter}
import java.text.SimpleDateFormat
import java.util.Date

import io.circe.generic.auto._
import io.circe.parser._

object Data extends App {
  case class CCInfo(name: Option[String], creditcard: Option[String])

  val url = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
  val json = scala.io.Source.fromURL(url).mkString

  val infos = decode[List[CCInfo]](json).toOption.get

  val lines = infos.collect{case CCInfo(Some(name), Some(creditcard)) => s"$name, $creditcard"}

  Helper.writeToFile(lines, s"${Helper.formatDate("yyyyMMdd")}.csv")
}

object Helper {
  def writeToFile(lines: TraversableOnce[String], fileName: String): Unit = {
    val writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileName)))
    for (x <- lines) {
      writer.write(x + "\n")
    }
    writer.close()
  }

  // Format the date that was passed in (defaults to now).
  def formatDate(format: String, date: Date = new Date()): String =
    new SimpleDateFormat(format).format(date)
}

Francesco Cogno • Nov 12 '17

Aaaand Rust :)

Really an overkill for this task but fun nevertheless!

extern crate chrono;
extern crate csv;
extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;
extern crate tokio_core;

use futures::prelude::*;
use futures::future::ok;
use tokio_core::reactor::Core;
use hyper::client::Client;
use hyper_tls::HttpsConnector;
use chrono::{DateTime, FixedOffset};
use std::collections::HashMap;
use csv::Writer;
use std::fs::File;


#[derive(Debug, Deserialize, Clone)]
struct Record {
    name: String,
    email: Option<String>,
    city: Option<String>,
    mac: String,
    timestamp: String,
    creditcard: Option<String>,
}

#[derive(Debug, Clone)]
struct RecordParsed {
    record: Record,
    ts: DateTime<FixedOffset>,
}

const FORMAT: &'static str = "%Y%m%d";

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::configure()
        .connector(HttpsConnector::new(4, &core.handle()).unwrap())
        .build(&core.handle());

    let uri = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
        .parse()
        .unwrap();

    let fut = client.get(uri).and_then(move |resp| {
        resp.body().concat2().and_then(move |body| {
            let array: Vec<Record> = serde_json::from_slice(&body as &[u8]).unwrap();
            let mut a_parsed: HashMap<String, Vec<RecordParsed>> = HashMap::new();

            array
                .into_iter()
                .filter(|item| item.creditcard.is_some())
                .map(|item| {
                    let dt =
                        DateTime::parse_from_str(&item.timestamp, "%Y-%m-%d %H:%M:%S %z").unwrap();

                    let rp = RecordParsed {
                        record: item,
                        ts: dt,
                    };

                    let date_only = format!("{}.csv", rp.ts.format(FORMAT).to_string());

                    let ret = match a_parsed.get_mut(&date_only) {
                        Some(ar) => {
                            ar.push(rp);
                            None
                        }
                        None => {
                            let mut ar: Vec<RecordParsed> = Vec::new();
                            ar.push(rp);
                            Some(ar)
                        }
                    };

                    if let Some(ar) = ret {
                        a_parsed.insert(date_only, ar);
                    }
                })
                .collect::<()>();

            a_parsed
                .iter()
                .map(|(key, array)| {
                    println!("generating file == {:?}", key);
                    let file = File::create(key).unwrap();
                    let mut wr = Writer::from_writer(file);

                    array
                        .iter()
                        .map(|record| {
                            let creditcard = match record.record.creditcard {
                                Some(ref c) => c,
                                None => panic!("should have filtered those!"),
                            };
                            wr.write_record(&[&record.record.name, creditcard]).unwrap();
                        })
                        .collect::<()>();
                })
                .collect::<()>();

            ok(())
        })
    });

    core.run(fut).unwrap();
}

Ayman Nedjmeddine • Nov 11 '17 • Edited

A oneliner if you're a linuxer 😉

curl -sSLo- https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| jq -r '.[] | {name: .name, creditcard: .creditcard} | join(",")' \
> `date +%Y%m%d`.csv

However, there is something you have not mentioned in your post: Should the CSV file have the header line?

If yes, then use this:

echo 'name,creditcard' > `date +%Y%m%d`.csv && \
curl -sSLo- https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| jq -r '.[] | {name: .name, creditcard: .creditcard} | join(",")' \
>> `date +%Y%m%d`.csv
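
For comparison, a small Python sketch where the header line is a switch. The header text "Name" and "Credit Card" follows the column names in the post. Whether the service actually expects a header row is still an open question, and `to_csv` is a made-up helper name.

```python
import csv
import io


def to_csv(rows, with_header=True):
    """Render (name, creditcard) pairs as CSV text, optionally with a header."""
    buf = io.StringIO()
    # lineterminator is set explicitly; the csv module defaults to "\r\n".
    writer = csv.writer(buf, lineterminator="\n")
    if with_header:
        writer.writerow(["Name", "Credit Card"])
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv([("Dax Brekke II", "1234-2121-1221-1211")]), end="")
```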

Devin Weaver • Nov 12 '17

This adds quotes.

"Dax Brekke II,1234-2121-1221-1211"
"Brando Stanton Jr.,1228-1221-1221-1431"
"Lacey McDermott PhD,"
"Elza Bauch,"

Maybe adding this sed command:

curl -sSLo- https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| jq '.[] | {name: .name, creditcard: .creditcard} | join(",")' \
| sed -e 's/^"//' -e 's/"$//' -e 's/\\"/"/g' \
> "$(date +%Y%m%d).csv"

Richard Metzler • Nov 11 '17

Doesn't the second solution need a >> in the last line, so the output is appended?

Ayman Nedjmeddine • Nov 12 '17

Yes, it does. (Didn't copy the correct version)

Thanks ☺

Timur Zurbaev • Nov 12 '17

PHP:

<?php

$json = json_decode(file_get_contents('https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json'), true);

$users = array_filter($json, function (array $item) {
    return !empty($item['name']) && !empty($item['creditcard']);
});

$file = fopen(date('Ymd').'.csv', 'w+');

foreach ($users as $user) {
    fputcsv($file, [$user['name'], $user['creditcard']]);
}

fclose($file);

Michael Orji • Nov 12 '17 • Edited

You beat me to the PHP implementation. And your solution is so elegant.

Devin Weaver • Nov 12 '17

Since the input JSON could be really large, here is a Node.JS streaming version (using the stream-json package):

#!/usr/bin/env node
let fs = require('fs');
let { Transform } = require('stream');
let StreamArray = require("stream-json/utils/StreamArray");
let stream = StreamArray.make();

function escapeCSV(str) {
  if (str == null) { return ''; }
  // CSV escapes a quote by doubling it, not with a backslash.
  return /[",\n]/.test(str) ? `"${str.replace(/"/g, '""')}"` : str;
}

class CsvStream extends Transform {
  constructor() {
    super({objectMode: true});
  }
  _transform(chunk, enc, cb) {
    let { name, creditcard } = chunk.value;
    let line = [name, creditcard].map(escapeCSV).join(',');
    this.push(`${line}\n`);
    cb();
  }
}

process.stdin
  .pipe(stream.input);

stream.output
  .pipe(new CsvStream())
  .pipe(process.stdout);

jorin • Nov 12 '17

Nice! There is also csv-write-stream, which could save you some code :)

Ryan Palo • Nov 13 '17

Using the CSV module to avoid any quoting pitfalls. :)

require 'csv'
require 'date'
require 'json'

data = JSON.parse(`curl #{ARGV[0]}`)
filename = Date.today.strftime('%Y%m%d') + '.csv'

CSV.open(filename, 'w') do |csv|
  data
    .select { |item| item['name'] && item['creditcard'] }
    .map { |item| [item['name'], item['creditcard']] }
    .sort
    .each { |item| csv << item }
end
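
To show the kind of quoting pitfall a CSV library takes care of, here is a minimal Python sketch using the standard `csv` module. The name in the example is made up to contain both a comma and quotes, and `csv_line` is a hypothetical helper.

```python
import csv
import io


def csv_line(fields):
    """Render one row as a CSV line, letting the csv module handle quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# A field with a comma or a quote gets wrapped in quotes,
# and embedded quotes are doubled.
print(csv_line(['Stanton, Brando "Jr."', "1228-1221-1221-1431"]), end="")
# prints "Stanton, Brando ""Jr.""",1228-1221-1221-1431
```

Joining fields with a plain "," would instead split that name across two columns.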

jorin • Nov 13 '17 • Edited

Ruby is still one of the prettiest languages!
Maybe you can use open(url).read from require 'open-uri' instead of curl, so it also runs on systems without curl 🙂

Alternatively, it could look like this:

CSV.open "#{Date.today.strftime '%Y%m%d'}.csv", 'w' do |csv|
  JSON.parse(open(ARGV[0]).read).each { |x| csv << x.values_at('name', 'creditcard') if x['creditcard'] }
end

Ryan Palo • Nov 13 '17

Oh, I like that!

I didn't know about those extra options for CSV. Awesome.
I didn't know about the open-uri built-in. Also awesome.
I love the short and sweet each block! It even feels a little Pythonic, which is nice. Also also awesome!

Josh Cheek • Nov 12 '17

A few things to note: cache is a program I wrote that caches command-line invocations. It makes iterating cheap (e.g. you don't have to hit the network each time): github.com/JoshCheek/dotfiles/blob...

My shell is fish (fishshell.com) which allows multi-line editing, and the parentheses in fish are like backticks in bash, so the > (...) is redirecting the output into a file whose name is the result of the ...

Al • Nov 13 '17 • Edited

library("jsonlite")

frames <- fromJSON("https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json")
frames <- frames[!is.na(frames$creditcard),]
frames <- frames[,c("name","creditcard")]

write.csv(frames, file=paste0(format(Sys.Date(), "%Y%m%d"), ".csv"), row.names=FALSE)

Devin Weaver • Nov 12 '17

A vanilla Node.JS version:

#!/usr/bin/env node

function escapeCSV(str) {
  if (str == null) { return ''; }
  // CSV escapes a quote by doubling it, not with a backslash.
  return /[",\n]/.test(str) ? `"${str.replace(/"/g, '""')}"` : str;
}

let data = require('./sample.json');
process.stdout.write('Name,Credit Card\n');
for (let { name, creditcard } of data) {
  let line = [name, creditcard].map(escapeCSV).join(',');
  process.stdout.write(`${line}\n`);
}
