Just when you thought it was safe outside: here are some statistics on traffic collisions in New York City.
Background
NYC publishes vehicle collision data, which anyone can access using their API. You can also download this information in standard CSV (Comma-Separated Values) file format.
The file is fairly large, roughly 400 MB, with almost 2 million lines.
-rw-rw-r-- 1 austin austin 402M Mar 4 20:38 all_motor_vehicle_collision_data.csv
…
bash > wc -l all_motor_vehicle_collision_data.csv
1972886 all_motor_vehicle_collision_data.csv
Display the first five records of the dataset using head
bash > head -n5 all_motor_vehicle_collision_data.csv
CRASH DATE,CRASH TIME,BOROUGH,ZIP CODE,LATITUDE,LONGITUDE,LOCATION,ON STREET NAME,CROSS STREET NAME,OFF STREET NAME,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED,NUMBER OF PEDESTRIANS INJURED,NUMBER OF PEDESTRIANS KILLED,NUMBER OF CYCLIST INJURED,NUMBER OF CYCLIST KILLED,NUMBER OF MOTORIST INJURED,NUMBER OF MOTORIST KILLED,CONTRIBUTING FACTOR VEHICLE 1,CONTRIBUTING FACTOR VEHICLE 2,CONTRIBUTING FACTOR VEHICLE 3,CONTRIBUTING FACTOR VEHICLE 4,CONTRIBUTING FACTOR VEHICLE 5,COLLISION_ID,VEHICLE TYPE CODE 1,VEHICLE TYPE CODE 2,VEHICLE TYPE CODE 3,VEHICLE TYPE CODE 4,VEHICLE TYPE CODE 5
09/11/2021,2:39,,,,,,WHITESTONE EXPRESSWAY,20 AVENUE,,2,0,0,0,0,0,2,0,Aggressive Driving/Road Rage,Unspecified,,,,4455765,Sedan,Sedan,,,
03/26/2022,11:45,,,,,,QUEENSBORO BRIDGE UPPER,,,1,0,0,0,0,0,1,0,Pavement Slippery,,,,,4513547,Sedan,,,,
06/29/2022,6:55,,,,,,THROGS NECK BRIDGE,,,0,0,0,0,0,0,0,0,Following Too Closely,Unspecified,,,,4541903,Sedan,Pick-up Truck,,,
09/11/2021,9:35,BROOKLYN,11208,40.667202,-73.8665,"(40.667202, -73.8665)",,,1211 LORING AVENUE,0,0,0,0,0,0,0,0,Unspecified,,,,,4456314,Sedan,,,,
Using head to display the first record only
bash > head -n1 all_motor_vehicle_collision_data.csv
CRASH DATE,CRASH TIME,BOROUGH,ZIP CODE,LATITUDE,LONGITUDE,LOCATION,ON STREET NAME,CROSS STREET NAME,OFF STREET NAME,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED,NUMBER OF PEDESTRIANS INJURED,NUMBER OF PEDESTRIANS KILLED,NUMBER OF CYCLIST INJURED,NUMBER OF CYCLIST KILLED,NUMBER OF MOTORIST INJURED,NUMBER OF MOTORIST KILLED,CONTRIBUTING FACTOR VEHICLE 1,CONTRIBUTING FACTOR VEHICLE 2,CONTRIBUTING FACTOR VEHICLE 3,CONTRIBUTING FACTOR VEHICLE 4,CONTRIBUTING FACTOR VEHICLE 5,COLLISION_ID,VEHICLE TYPE CODE 1,VEHICLE TYPE CODE 2,VEHICLE TYPE CODE 3,VEHICLE TYPE CODE 4,VEHICLE TYPE CODE 5
Use Perl to list the column names along with their index numbers
bash > perl -F, -an -E '$. == 1 && say $i++ . "\t$_" for @F' all_motor_vehicle_collision_data.csv
0 CRASH DATE
1 CRASH TIME
2 BOROUGH
3 ZIP CODE
4 LATITUDE
5 LONGITUDE
6 LOCATION
7 ON STREET NAME
8 CROSS STREET NAME
9 OFF STREET NAME
10 NUMBER OF PERSONS INJURED
11 NUMBER OF PERSONS KILLED
12 NUMBER OF PEDESTRIANS INJURED
13 NUMBER OF PEDESTRIANS KILLED
14 NUMBER OF CYCLIST INJURED
15 NUMBER OF CYCLIST KILLED
16 NUMBER OF MOTORIST INJURED
17 NUMBER OF MOTORIST KILLED
18 CONTRIBUTING FACTOR VEHICLE 1
19 CONTRIBUTING FACTOR VEHICLE 2
20 CONTRIBUTING FACTOR VEHICLE 3
21 CONTRIBUTING FACTOR VEHICLE 4
22 CONTRIBUTING FACTOR VEHICLE 5
23 COLLISION_ID
24 VEHICLE TYPE CODE 1
25 VEHICLE TYPE CODE 2
26 VEHICLE TYPE CODE 3
27 VEHICLE TYPE CODE 4
28 VEHICLE TYPE CODE 5
Explanation
- `perl -F, -an -E`
  - The `-a` switch turns on autosplit, splitting each line's column values into the array `@F`.
  - `-F,` specifies a comma as the field separator.
- `$. == 1`
  - The Perl special variable `$.` contains the current line number, so only the first (header) line is printed.
- `say $i++ . "\t$_" for @F`
  - Loops over the column names in `@F`, each stored in the Perl default variable `$_`, and prints a tab-separated counter variable `$i` alongside the corresponding column name.
Create a work file containing the zip code, injury count, and fatality count. Only keep records that include a zip code and at least one injury or fatality. The relevant fields:
3 ZIP CODE
10 NUMBER OF PERSONS INJURED
11 NUMBER OF PERSONS KILLED
Method
- Unfortunately, the previous method for splitting a comma-delimited file has limitations: it cannot handle fields with embedded commas. The street name fields often contain embedded commas, which would throw off our column numbering.
- To get around this we can use Text::CSV, which has both functional and OO interfaces. For one-liners it exports a handy `csv` function. From the Text::CSV documentation, `my $aoa = csv (in => "test.csv") or die Text::CSV_XS->error_diag;` converts a CSV file into an array of arrays.
- I'll modify this example slightly to `csv( in => $ARGV[0], headers => qq/skip/ )`. The `@ARGV` array contains any input arguments, so the first element, `$ARGV[0]`, holds the input CSV file name. We don't need the header row, so it's skipped.
perl -MText::CSV=csv -E '$aofa = csv( in => $ARGV[0], headers => qq/skip/ ); ( $_->[3] =~ /^\S+$/ ) && say qq/$_->[3],$_->[10],$_->[11]/ for @{$aofa}' all_motor_vehicle_collision_data.csv | sort -t, -k 1 -r > sorted_injured_killed_by_zip.csv
Explanation
- Input file: `all_motor_vehicle_collision_data.csv`
- `perl -MText::CSV=csv`
  - Runs the perl command with the `-M` switch to load a Perl module.
  - `Text::CSV=csv` exports the `csv` function from the Text::CSV module.
- `( $_->[3] =~ /^\S+$/ )`
  - A regular expression ensures we only process rows that have non-blank data in the ZIP CODE field.
- `say qq/$_->[3],$_->[10],$_->[11]/ for @{$aofa}`
  - Loops through the array of arrays `$aofa` and prints the contents of columns 3, 10, and 11, followed by a line break.
- The output is piped (`|`) into the Linux sort command, sorted on the first field (ZIP CODE), and redirected (`>`) into a new file, `sorted_injured_killed_by_zip.csv`. See the ss64.com site for more details on the Linux sort command.
- The new file has about 1.36 million lines.
Get a line count with wc. Display the first 10 records using head
bash > wc -l sorted_injured_killed_by_zip.csv
1359291 sorted_injured_killed_by_zip.csv
bash > head -n10 sorted_injured_killed_by_zip.csv | column -t -s, --table-columns=ZipCode,#Injured,#Killed
ZipCode #Injured #Killed
11697 4 0
11697 3 0
11697 2 0
11697 2 0
11697 2 0
11697 1 0
11697 1 0
11697 1 0
11697 1 0
11697 1 0
Explanation
- `wc -l`
  - Counts the number of lines in our new file.
- `head -n10`
  - Prints the first 10 lines of the file.
- `column -t -s, --table-columns=ZipCode,#Injured,#Killed`
  - The `-t` switch tells `column` to print in table format.
  - The `-s,` switch specifies a comma as the input delimiter.
  - `--table-columns` supplies the column header names.
List the 10 worst zip codes for injuries
We can use the output file, sorted_injured_killed_by_zip.csv, from the previous example
perl -n -E '@a=split(q/,/,$_);$h{$a[0]} += $a[1]; END{say qq/$_,$h{$_}/ for keys %h}' sorted_injured_killed_by_zip.csv | sort -nr -t, -k 2 | head -n10 | column -t -s, --table-columns=ZipCode,#Injured
ZipCode #Injured
11207 10089
11236 7472
11203 7426
11212 6676
11226 6103
11208 6027
11234 5505
11434 5403
11233 5159
11385 4440
Explanation
- `@a=split(q/,/,$_);`
  - As there are no embedded commas in this file, the Perl `split` function breaks the three CSV fields of each row into array `@a`.
- `$h{$a[0]} += $a[1];`
  - The first element of each row, ZIP CODE, is used as a key for the hash `%h`.
  - The value accumulates the second element, `$a[1]`, which contains NUMBER OF PERSONS INJURED.
  - We can add to a hash key's value without checking whether the key exists already. This is called autovivification, which is explained nicely by The Perl Maven.
- `END{say qq/$_,$h{$_}/ for keys %h}`
  - The `END{}` block runs after all the rows are processed. The keys (zip codes) are printed along with their accumulated values.
  - We could have used Perl to sort the output by keys or values; I used the Linux sort instead.
- `sort -nr -t, -k 2`
  - Performs a descending numeric sort on the number of people injured.
- `head -n10`
  - Prints the first 10 records.
- `column -t -s, --table-columns=ZipCode,#Injured`
  - The `column` command produces a prettier output: `-t` for table format, `-s,` for comma-separated input, and `--table-columns` to add column header names.
Observation
Zip code 11207, which encompasses East New York, Brooklyn, as well as a small portion of Southern Queens, has a lot of issues with traffic safety.
Display the 10 worst zip codes for traffic fatalities
perl -n -E '@a=split(q/,/,$_);$h{$a[0]} += $a[2]; END{say qq/$_,$h{$_}/ for keys %h}' sorted_injured_killed_by_zip.csv | sort -nr -t, -k 2 | head -n10 | column -t -s, --table-columns=ZipCode,#Killed
ZipCode #Killed
11236 44
11207 34
11234 29
11434 25
11354 25
11229 24
11208 24
11206 23
11233 22
11235 21
Explanation
- With a few minor adjustments, we get the worst zip codes for traffic collision fatalities.
- `$h{$a[0]} += $a[2]`
  - Accumulates the third element, `$a[2]`, which contains NUMBER OF PERSONS KILLED.
Observation
- Zip code 11236, which includes Canarsie, Brooklyn, is the worst for traffic fatalities according to this data.
- Zip code 11207 is also very bad for traffic fatalities, as well as being the worst for collision injuries.
- These stats are not 100 percent accurate: of the 1,972,886 collision records, only 1,359,291 contained zip codes.
- The 613,595 records with no zip code were not included in the calculations.
Some NYC Borough Stats
Similar to how we created `sorted_injured_killed_by_zip.csv`, we can run the following command sequence to create a new file, `sorted_injured_killed_by_borough.csv`:
perl -MText::CSV=csv -E '$aofa = csv( in => $ARGV[0], headers => qq/skip/ ) ; ( $_->[2] =~ /^\S+/ ) && say qq/$_->[2],$_->[10],$_->[11]/ for @{$aofa}' all_motor_vehicle_collision_data.csv | sort -t, -k 3rn -k 2rn -k 1 >| sorted_injured_killed_by_borough.csv
Explanation
- The borough field is the third column, `2 BOROUGH` (counting from 0), in the `all_motor_vehicle_collision_data.csv` file.
- `( $_->[2] =~ /^\S+/ )`
  - Only process rows that have non-blank data in the BOROUGH field.
- `sort -t, -k 3rn -k 2rn -k 1`
  - I added some more precise sorting, which is unnecessary except to satisfy my curiosity.
  - `-k 3rn`: sort by column 3 (counting from 1), the fatality count, numerically in descending order.
  - `-k 2rn`: when fatality counts are equal, sort the injury count numerically, descending.
  - `-k 1`: sort the borough name in ascending order as a tiebreaker.
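The three-key sort is easier to see on a handful of made-up rows (borough, injured, killed):

```shell
# Fatalities (col 3) descending, then injuries (col 2) descending,
# then borough name ascending as a tiebreaker.
printf 'QUEENS,3,5\nBROOKLYN,4,3\nBROOKLYN,3,3\nMANHATTAN,12,8\n' |
  sort -t, -k 3rn -k 2rn -k 1
```

MANHATTAN (8 killed) sorts first, and the injury count breaks the tie between the two BROOKLYN rows.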
Display the first 10 rows of this file.
bash > head -n10 sorted_injured_killed_by_borough.csv | column -t -s, --table-columns=Borough,#Injured,#Killed
Borough #Injured #Killed
MANHATTAN 12 8
QUEENS 3 5
QUEENS 15 4
QUEENS 1 4
STATEN ISLAND 6 3
BROOKLYN 4 3
BROOKLYN 3 3
QUEENS 3 3
BROOKLYN 1 3
QUEENS 1 3
Sanity check that we got all five boroughs
cut -d, -f 1 sorted_injured_killed_by_borough.csv | sort -u
BRONX
BROOKLYN
MANHATTAN
QUEENS
STATEN ISLAND
Explanation
- `cut -d, -f 1`
  - `cut` splits the comma-delimited file records.
  - `-d,` specifies a comma delimiter.
  - `-f 1` gets the first field, which is the borough name.
- `sort -u`
  - Sorts and prints only the unique values to STDOUT.
- We got all 5 New York City boroughs in this file.
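The same `cut | sort -u` combination, on made-up input:

```shell
# Distinct values of the first comma-delimited field.
printf 'BRONX,1\nQUEENS,2\nBRONX,3\n' | cut -d, -f1 | sort -u
```

This prints `BRONX` and `QUEENS`, each once.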
Display collision injuries for each borough
bash > perl -n -E '@a=split(q/,/,$_);$h{$a[0]} += $a[1]; END{say qq/$_,$h{$_}/ for keys %h}' sorted_injured_killed_by_borough.csv | sort -nr -t, -k 2 | column -t -s,
BROOKLYN 137042
QUEENS 105045
BRONX 62880
MANHATTAN 61400
STATEN ISLAND 15659
Observation
- Brooklyn emerges as the Borough with the most traffic injuries.
Display collision fatalities by Borough
bash > perl -n -E '@a=split(q/,/,$_);$h{$a[0]} += $a[2]; END{say qq/$_,$h{$_}/ for keys %h}' sorted_injured_killed_by_borough.csv | sort -nr -t, -k 2 | column -J -s, --table-columns Borough,#Killed
{
"table": [
{
"borough": "BROOKLYN",
"#killed": "564"
},{
"borough": "QUEENS",
"#killed": "482"
},{
"borough": "MANHATTAN",
"#killed": "300"
},{
"borough": "BRONX",
"#killed": "241"
},{
"borough": "STATEN ISLAND",
"#killed": "88"
}
]
}
Explanation
- Similar to the injury count by borough, this counts all fatalities by borough and prints the output in JSON format.
- `column -J -s, --table-columns Borough,#Killed`
  - Uses the `column` command with the `-J` switch, for JSON output, instead of `-t` for a table.
I forgot to mention what date range is involved with this dataset. We can check this with the cut command.
cut -d, -f1 all_motor_vehicle_collision_data.csv | cut -d/ -f3 | sort -u
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
Explanation
- We use the date field, `0 CRASH DATE`, which is in `mm/dd/yyyy` format.
- `cut -d, -f1 all_motor_vehicle_collision_data.csv`
  - Gets the first column/field of data for every row of this CSV file.
  - `-d,` specifies that we are cutting on the comma delimiters.
  - `-f1` selects the first column/field only: the date in `mm/dd/yyyy` format.
- `cut -d/ -f3`
  - Cuts the date using `/` as the delimiter and grabs the third field, the four-digit year.
- `sort -u`
  - The years are then sorted, with duplicates removed.
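The two chained cuts, run against a couple of invented date rows:

```shell
# First cut isolates the mm/dd/yyyy date; second cut isolates the year.
printf '09/11/2021,2:39\n03/26/2022,11:45\n' | cut -d, -f1 | cut -d/ -f3 | sort -u
```

This prints `2021` and `2022`.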
Observation
- The dataset started sometime in 2012 and continues until now, March 2023.
Display the 20 worst days for collisions in NYC
bash > cut -d, -f1 all_motor_vehicle_collision_data.csv | awk -F '/' '{print $3 "-" $1 "-" $2}' | sort | uniq -c | sort -k 1nr | head -n20 | column -t --table-columns=#Collisions,Date
#Collisions Date
1161 2014-01-21
1065 2018-11-15
999 2017-12-15
974 2017-05-19
961 2015-01-18
960 2014-02-03
939 2015-03-06
911 2017-05-18
896 2017-01-07
884 2018-03-02
883 2017-12-14
872 2016-09-30
867 2013-11-26
867 2018-11-09
857 2017-04-28
851 2013-03-08
851 2016-10-21
845 2017-06-22
845 2018-06-29
841 2018-12-14
Explanation
- Get a count of all collisions for each date on record, then display the 20 dates with the highest counts.
- `cut -d, -f1`
  - Gets the first column, the crash date, from the dataset and pipes it into the awk command.
  - AWK is a very useful one-liner tool, as well as being a full scripting language.
- `awk -F '/' '{print $3 "-" $1 "-" $2}'`
  - `-F '/'` splits the date into separate fields using `/` as the delimiter.
  - `$1` contains the month, `$2` the day of the month, and `$3` the four-digit year. These are printed in `yyyy-mm-dd` format.
- The dates are then sorted and piped into the uniq command.
- `uniq -c`
  - The `-c` switch prefixes each unique value with a count of its occurrences.
- The output is piped into another sort command, which sorts by the number of occurrences, descending.
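On a few invented dates, the awk rearrangement and the `uniq -c` count look like this:

```shell
# mm/dd/yyyy -> yyyy-mm-dd, then count how often each date occurs,
# most frequent first.
printf '01/21/2014\n01/21/2014\n11/15/2018\n' |
  awk -F '/' '{print $3 "-" $1 "-" $2}' | sort | uniq -c | sort -k 1nr
```

The duplicated date sorts to the top with a count of 2.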
Observation
- I’m not sure if there is any explanation for why some days have a lot more collisions than others. Weatherwise, January 21 2014 was a cold day, but otherwise uneventful. November 15 2018 had some snow, but not a horrific snowfall. The clocks went back on November 4, so that wouldn’t be a factor.
- 2014-01-21 weather
- 2018-11-15 weather
Display the twenty worst times during the day for collisions
bash > cut -d, -f2 all_motor_vehicle_collision_data.csv | sort | uniq -c | sort -k 1nr | head -n20 |column -t --table-columns=#Collisions,Time
#Collisions Time
27506 16:00
26940 17:00
26879 15:00
24928 18:00
24667 14:00
22914 13:00
20687 9:00
20641 12:00
20636 19:00
19865 16:30
19264 8:00
19107 10:00
19106 14:30
19010 0:00
18691 11:00
18688 17:30
16646 18:30
16602 20:00
16144 8:30
16008 13:30
Explanation
- We use the time field, `1 CRASH TIME`, which is in 24-hour `HH:MM` format.
Observation
- Using the actual time may be a bit too specific. I'll use the hour of day instead, to give a 60-minute time frame.
bash > cut -d, -f2 all_motor_vehicle_collision_data.csv | cut -d : -f1 | sort | uniq -c | sort -k 1nr | head -n10 | column -t --table-columns=#Collisions,Hour
#Collisions Hour
143012 16
139818 17
132443 14
123761 15
122971 18
114555 13
108925 12
108593 8
105206 9
102541 11
Explanation
- Similar to the previous example, except this time the cut command splits the time `HH:MM` on the `:` delimiter.
- `cut -d : -f1`
  - `-d :` sets the cut delimiter to `:`.
  - `-f1` grabs the first field, the `HH` of `HH:MM`.
- Something like the printf command could be used to append `:00` to those hours for display.
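Appending `:00` for display is a one-liner with printf (the hour values here are invented):

```shell
# printf reuses its format string for each remaining argument.
printf '%s:00\n' 16 17 8
```

This prints `16:00`, `17:00`, and `8:00`.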
Observation
As you would expect, most collisions happen during rush hour.
Display the worst years for collisions
bash > cut -d, -f1 all_motor_vehicle_collision_data.csv | cut -d '/' -f3 | sort | uniq -c | sort -k 1nr | head -n10 | column -t --table-columns=#Collisions,Year
#Collisions Year
231564 2018
231007 2017
229831 2016
217694 2015
211486 2019
206033 2014
203734 2013
112915 2020
110546 2021
103745 2022
Explanation
- We use the first column, `0 CRASH DATE`, again.
- `cut -d '/' -f3`
  - Extracts the `yyyy` from the `mm/dd/yyyy`.
Observation
- Some improvement seen in 2020, 2021 and 2022, if you can believe the data.
- One unscientific observation here is that the NYPD may have been much less vigilant in the past few years than they were prior to Commissioner Raymond Kelly and his successors.
- Also, by only printing out the worst 10 years, partial years 2012 and 2023 were excluded.
For comparison, let's see how many people were injured or killed in traffic collisions each year.
First, create a work file, `sorted_injured_killed_by_year.csv`, with three columns: year, injury count, and fatality count.
We need the Text::CSV Perl module here because of the embedded commas in earlier fields. These are the three fields needed:
0 CRASH DATE
10 NUMBER OF PERSONS INJURED
11 NUMBER OF PERSONS KILLED
bash > perl -MText::CSV=csv -E '$aofa = csv( in => $ARGV[0], headers => qq/skip/ ); ($_->[10] || $_->[11]) && say substr($_->[0],6,4) . qq/,$_->[10],$_->[11]/ for @{$aofa}' all_motor_vehicle_collision_data.csv | sort > sorted_injured_killed_by_year.csv
Check out the work file we just created
bash > wc -l sorted_injured_killed_by_year.csv
433081 sorted_injured_killed_by_year.csv
bash > head -n10 sorted_injured_killed_by_year.csv | column -t -s, --table-columns=Year,#Injured,#Killed
Year #Injured #Killed
2012 0 1
2012 0 1
2012 0 1
2012 0 1
2012 0 1
2012 0 1
2012 0 1
2012 0 1
2012 0 1
2012 0 1
Worst years for collision injuries
perl -n -E '@a=split(q/,/,$_);$h{$a[0]} += $a[1]; END{say qq/$_, $h{$_}/ for sort {$h{$b} <=> $h{$a} } keys %h}' sorted_injured_killed_by_year.csv | head -n10 | column -t -s, --table-columns=Year,#Injured
Year #Injured
2018 61941
2019 61389
2017 60656
2016 60317
2013 55124
2022 51883
2021 51780
2015 51358
2014 51223
2020 44615
Explanation
- This is similar to how we got the zip code and borough data previously, except this time the Perl sort is used instead of the Linux sort.
- `END{say qq/$_, $h{$_}/ for sort {$h{$b} <=> $h{$a} } keys %h}`
  - The `for` statement loops through the `%h` hash keys (years), sorted by their corresponding values (injury counts) in descending order.
  - In `sort {$h{$b} <=> $h{$a} }`, `$a` and `$b` are the default Perl sort variables. Rearranging it to `sort {$h{$a} <=> $h{$b} }` would sort the injury counts in ascending order instead.
Observation
While the collision count may have gone down, there isn't any real corresponding downward trend in injuries.
The worst years for collision fatalities
bash > perl -n -E '@a=split(q/,/,$_);$h{$a[0]} += $a[2]; END{say qq/$_, $h{$_}/ for sort {$h{$b} <=> $h{$a} } keys %h}' sorted_injured_killed_by_year.csv | head -n10 | column -t -s, --table-columns=Year,#Killed
Year #Killed
2013 297
2021 294
2022 285
2020 268
2014 262
2017 256
2016 246
2019 244
2015 243
2018 231
Explanation
- Slightly modified version of the injury by year count.
Observation
Same as with the injuries count. There isn't any real corresponding downward trend in traffic collision fatalities.
Conclusion
There’s lots more work that can be done to extract meaningful information from this dataset.
What’s clear to me is that all the political rhetoric and money poured into Vision Zero has yielded little in terms of results.
Most of the solutions are obvious from a logical point of view, but not a political point of view. I walk and cycle these streets and know how dangerous it is to cross at the “designated” crosswalks when cars and trucks are turning in on top of you. Cycling in NYC is even worse.
Some suggested solutions
- Delayed green lights, so cars don't turn in on pedestrians at crosswalks.
- Much higher taxes and registration fees for giant SUVs and pickup trucks. They don't belong in the city.
- Better bike lanes, instead of meaningless lines painted on the road.
- Many bike lanes are used as convenient parking for NYPD and delivery vehicles.
- Basic enforcement of traffic laws, which isn't being done now.
- Drivers ignore red lights, speed limits, noise restrictions etc. when they know they aren't being enforced.
- Driving while texting or yapping on the phone is the norm, not the exception.
- Drastically improve public transit, especially in areas not served by the subway system.
Some Perl CLI Resources
Peteris Krumins has some great e-books
Dave Cross - From one of his older posts on perl.com
Some NYC Street Resources
These Stupid Trucks are Literally Killing Us - YouTube