
MercyMburu


Analyzing criminal incident data from Seattle or San Francisco

(This blog post has been written as one of the assignments of the Data Science MOOC being offered on Coursera by the University of Washington)
This brief essay compares the crime data gathered from Seattle and San Francisco. We track how the number of reported crimes in each city changes over the course of a year, and we map the reported incidents to see which parts of each city are more likely to see criminal activity.

Conclusions

  1. Over the course of a year, Seattle records significantly more reported crimes than San Francisco. June, July, and August are the busiest months for crime in both cities.
  2. In San Francisco, most crime incidents occur in the northeastern part of the city. In Seattle, most crimes happen in the core area of the city, with small urban pockets accounting for the majority of incidents.

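The variation in reported crime over the year in Seattle is as follows:

[Image: calendar heat map of crime cases in Seattle over the year]

The variation in reported crime over the year in San Francisco is as follows:

[Image: calendar heat map of crime cases in San Francisco over the year]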

The two charts above show that crime peaks in June, July, and August in both cities. Seattle has significantly more crime cases than San Francisco, and its case count declines sharply from September to December. San Francisco's data shows no such decline. One possible explanation for this difference is that the San Francisco dataset we used is incomplete.
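The monthly comparison above can also be checked numerically. The following is a minimal sketch in base R, not the code used for the charts: `count_by_month` is a hypothetical helper, and the dates are toy values made up for illustration. With the real data, a parsed date column such as `seattle_dataset$Date.Reported` would be passed in instead.

```r
# Hypothetical helper (not part of the original analysis):
# count incidents per calendar month from a vector of Date values.
count_by_month <- function(dates) {
  # Fix the levels so that months without incidents still appear with a count of 0.
  months <- factor(format(dates, "%m"), levels = sprintf("%02d", 1:12))
  as.data.frame(table(Month = months), responseName = "Count")
}

# Toy dates for illustration only; they are not taken from either city's dataset.
toy_dates <- as.Date(c("2014-06-01", "2014-06-15", "2014-07-04",
                       "2014-08-20", "2014-08-21", "2014-08-22"))
monthly <- count_by_month(toy_dates)

# Show only the months that have at least one incident.
print(monthly[monthly$Count > 0, ])
```

Keeping all twelve months as factor levels makes gaps in the data visible as zero counts, which may help when checking whether a dataset covers the whole year.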

The following diagram illustrates the distribution of crime incidents across the areas of Seattle.

[Image: map of crime incidents across Seattle]

The visualization above shows that the majority of crimes are recorded in Seattle's central metropolitan neighborhoods. In addition, crime cases are concentrated in small clusters separated by areas with lower crime densities. This pattern lends credence to the theory that Seattle's densely populated metropolitan areas have a higher concentration of crime cases.

The following illustration displays the crime cases reported across the city of San Francisco:

[Image: map of crime cases reported across San Francisco]

The visualization indicates that the northeastern area of San Francisco reports the highest number of crime incidents. Although other pockets of the city have moderately high numbers of reported crimes, the map clearly shows that reported crime is more concentrated in the northeastern portion than in the rest of the city.
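One rough way to put numbers behind a spatial pattern like the ones described above is to divide the city into a grid and count the incidents in each cell. The sketch below uses base R only and is an illustration under stated assumptions, not the method used for the maps: `grid_counts` is a hypothetical helper, and the coordinates are toy values. With the real data, the longitude and latitude columns of each dataset would be passed in (their exact column names are not shown in this post).

```r
# Hypothetical helper (not part of the original analysis):
# count incidents in each cell of an n x n grid over the observed coordinates.
grid_counts <- function(lon, lat, n = 2) {
  lon_bin <- cut(lon, breaks = n, labels = FALSE)  # 1 = westernmost column
  lat_bin <- cut(lat, breaks = n, labels = FALSE)  # 1 = southernmost row
  # Order the rows from north to south so the printed table reads like a map.
  table(lat_bin = factor(lat_bin, levels = n:1),
        lon_bin = factor(lon_bin, levels = 1:n))
}

# Toy coordinates for illustration only; they are not real incident locations.
toy_lon <- c(-122.41, -122.40, -122.41, -122.50, -122.49, -122.42)
toy_lat <- c(37.80, 37.79, 37.80, 37.71, 37.80, 37.79)

counts <- grid_counts(toy_lon, toy_lat)
print(counts)
```

In the printed table, the top-right cell corresponds to the northeastern quarter of the area covered by the points. A finer grid (a larger `n`) gives more detail, at the cost of fewer incidents per cell.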

Code for the exercise:
The following is the code that I used to generate these graphs. The code was written in R.
# googleVis provides gvisCalendar() for the calendar heat maps.
library(googleVis)
library(ggmap)

# Load the incident data for both cities.
sanfrancisco_dataset <- read.csv("sanfrancisco.csv")
seattle_dataset <- read.csv("seattle.csv")

# Seattle: parse the report date and count the incidents per day.
seattle_dataset$Date.Reported <- as.Date(seattle_dataset$Date.Reported, "%Y/%m/%d")
seattle_crime_count_date <- aggregate(seattle_dataset$Offense.Type, by = list(seattle_dataset$Date.Reported), FUN = length)
names(seattle_crime_count_date) <- c("Date", "Count")
SeattleHeatMap <- gvisCalendar(seattle_crime_count_date, datevar = "Date", numvar = "Count",
  options = list(
    title = "Calendar heat map of Crime cases in Seattle over the year",
    calendar = "{cellSize:10,yearLabel:{fontSize:20, color:'#444444'},focusedCellColor:{stroke:'red'}}",
    width = 590, height = 320),
  chartid = "SeattleCalendar")
plot(SeattleHeatMap)

# San Francisco: the same steps, with a distinct chart ID so the two charts do not clash.
sanfrancisco_dataset$Date <- as.Date(sanfrancisco_dataset$Date, "%Y/%m/%d")
sanfrancisco_crime_count_date <- aggregate(sanfrancisco_dataset$IncidntNum, by = list(sanfrancisco_dataset$Date), FUN = length)
names(sanfrancisco_crime_count_date) <- c("Date", "Count")
SanFranciscoHeatMap <- gvisCalendar(sanfrancisco_crime_count_date, datevar = "Date", numvar = "Count",
  options = list(
    title = "Calendar heat map of Crime cases in San Francisco over the year",
    calendar = "{cellSize:10,yearLabel:{fontSize:20, color:'#444444'},focusedCellColor:{stroke:'red'}}",
    width = 590, height = 320),
  chartid = "SanFranciscoCalendar")
plot(SanFranciscoHeatMap)
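The daily counts in the code above depend on `as.Date` parsing every row. The following sketch, which uses toy values rather than the real datasets, shows a sanity check that may be worth running after parsing: text that does not match the format string becomes `NA`, and `aggregate` then leaves those rows out of the counts without any warning.

```r
# Toy input for illustration only; the second value uses a different separator.
raw_dates <- c("2014/06/01", "2014-06-02", "2014/06/01")
parsed <- as.Date(raw_dates, "%Y/%m/%d")

# Values that do not match the format string come back as NA.
n_unparsed <- sum(is.na(parsed))
print(n_unparsed)

# aggregate() omits rows whose grouping value is NA,
# so unparsed rows silently disappear from the daily counts.
toy <- data.frame(Date = parsed, Offense = c("A", "B", "C"))
daily <- aggregate(toy$Offense, by = list(toy$Date), FUN = length)
names(daily) <- c("Date", "Count")
print(daily)
```

If the number of unparsed rows is large, the format string probably does not match the file, and the resulting heat map would understate the true counts.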
