As I head towards the end of my time in my computer science boot camp, I have realized I need to see some real-life examples of the kind of work I could do in the future. The easiest way to do that was to look at big projects that have been undertaken in the past.
I wanted to find a research paper from within the last few years, because computer science is ever evolving and recent work holds more weight for me now. The paper I chose was published on December 2nd, 2021, so it has been less than a year since it came out. It is called "How We Determined Crime Prediction Software Disproportionately Targeted Low-Income, Black, and Latino Neighborhoods". The title instantly hooked me because these are things I have always believed happen, but I would have trouble finding concrete evidence to present to other people. The final dataset the authors used for analysis contained more than 5.9 million predictions.
As someone who has a decent amount of knowledge in this field, I was able to fully understand the code and techniques, but how would I explain this to someone who is not as tech-savvy?
To begin, what exactly are the writers of this paper looking at to make their claims, and how do we know it is correct? According to the authors, they obtained PredPol crime prediction data that PredPol had never released before. PredPol was one of the first data analytics tools used by police and is currently the most popular. Their associate Gizmodo found the data exposed on the open web (the portal is now secured) and downloaded more than seven million PredPol crime predictions covering 2018 to 2021. After securing and categorizing the data, they gathered a number of measures of police interactions, such as the number of arrests, uses of force, and police patrols, among many others, and then compared those metrics across different ethnicities and income ranges.
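To explain the first step to a non-technical reader, I find it helps to show how raw predictions can be tallied per neighborhood. This is only a minimal sketch and not the authors' actual code: the record layout, the block group IDs, and the function name `count_predictions` are all made up for illustration.

```python
from collections import Counter

# Hypothetical prediction records as (block_group_id, date) pairs.
# These values are invented for illustration; they are not PredPol data.
predictions = [
    ("bg_A", "2019-01-01"),
    ("bg_A", "2019-01-02"),
    ("bg_B", "2019-01-01"),
    ("bg_A", "2019-01-03"),
    ("bg_C", "2019-01-02"),
    ("bg_B", "2019-01-04"),
]


def count_predictions(records):
    """Return how many predictions landed in each block group."""
    return Counter(block_group for block_group, _date in records)


counts = count_predictions(predictions)

# Rank block groups from most targeted to least targeted.
ranked = counts.most_common()
for block_group, n in ranked:
    print(block_group, n)
```

Once every block group has a count, it can be lined up against census information such as race or income for the same block group.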
One of their final models shows that the blocks PredPol targeted least were overwhelmingly white, while the blocks with the most predictions were disproportionately Black and Latino. It shows in real data what a lot of people in America have known for a long time.
The authors of the article were then able to show that not only were Black and Latino residents the most represented groups in the block groups that were highly targeted, but the proportion of residents who were Black or Latino also rose drastically as block groups became more heavily targeted.
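One way to picture this comparison is to sort block groups by how many predictions they received and compare the demographics at the two ends. The sketch below is my own simplification, not the paper's method: the numbers are invented, the field names are hypothetical, and it uses a plain unweighted average.

```python
# Hypothetical block groups with invented numbers (not real census or PredPol data).
block_groups = [
    {"id": "bg_1", "predictions": 5, "share_black_or_latino": 0.10},
    {"id": "bg_2", "predictions": 10, "share_black_or_latino": 0.20},
    {"id": "bg_3", "predictions": 20, "share_black_or_latino": 0.25},
    {"id": "bg_4", "predictions": 40, "share_black_or_latino": 0.30},
    {"id": "bg_5", "predictions": 80, "share_black_or_latino": 0.40},
    {"id": "bg_6", "predictions": 160, "share_black_or_latino": 0.45},
    {"id": "bg_7", "predictions": 320, "share_black_or_latino": 0.60},
    {"id": "bg_8", "predictions": 640, "share_black_or_latino": 0.70},
]


def mean_share_by_targeting(groups, fraction=0.25):
    """Average demographic share in the least and most targeted block groups.

    `fraction` is the slice taken from each end of the ranking, so 0.25
    compares the bottom quarter with the top quarter.
    """
    if not groups:
        raise ValueError("need at least one block group")
    ranked = sorted(groups, key=lambda g: g["predictions"])
    k = max(1, int(len(ranked) * fraction))

    def mean_share(rows):
        return sum(r["share_black_or_latino"] for r in rows) / len(rows)

    return mean_share(ranked[:k]), mean_share(ranked[-k:])


least, most = mean_share_by_targeting(block_groups)
print(f"least targeted: {least:.2f}, most targeted: {most:.2f}")
```

If the second number is consistently larger than the first across many cities, that is the kind of pattern the article describes.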
To further cement their point, the authors also found data on arrest rates broken down by ethnicity. In most counties in the data set, Black people are extremely overrepresented in arrests. In certain areas of the U.S., they are more than twice as likely to be arrested when officers arrive in the predicted areas the algorithm has told them to patrol.
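A phrase like "twice as likely" comes from comparing rates rather than raw counts. Here is a minimal sketch of that arithmetic, assuming a simple per-capita arrest rate. The counts and populations are invented, and the function names are my own rather than anything from the paper.

```python
def arrest_rate(arrests, population):
    """Arrests per resident for one group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return arrests / population


def rate_ratio(group, reference):
    """How many times higher the group's arrest rate is than the reference group's."""
    return arrest_rate(**group) / arrest_rate(**reference)


# Invented numbers for one hypothetical county.
black_residents = {"arrests": 120, "population": 10_000}
white_residents = {"arrests": 150, "population": 30_000}

ratio = rate_ratio(black_residents, white_residents)
print(f"rate ratio: {ratio:.1f}")
```

In this made-up county the white group has more arrests in total, yet the Black group's rate is higher because its population is smaller, which is why rates matter more than counts.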
Not only were Black and Latino groups targeted by this system, but so were areas with very low average income. There is a drastic difference between the households in the most targeted blocks and those in the least targeted blocks. Blocks with a wide mix of household wealth often see similar numbers of patrols regardless of circumstance, but blocks with a disproportionately large number of poor residents see a big jump in targeting from PredPol.
To further cement this point, they looked into data on public housing and how it factored into which areas the predictions targeted. Some findings of this part of the study were that:
In Jacksonville, 63 percent of public housing was located in the block groups PredPol targeted the most.
In Elgin, 58 percent of public housing was located in the block groups PredPol targeted the most.
In Portage; Livermore, Calif.; Cocoa, Fla.; South Jordan, Utah; Gloucester, N.J.; and Piscataway, every single public housing facility was located in block groups that were targeted the most.
As these findings show, areas where people do not have the funds to buy their own property were highly overrepresented in the PredPol predictions, so they were heavily targeted, just as Black and Latino residents are heavily targeted by this system.
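The public housing percentages above boil down to a simple share: out of all public housing sites in a city, how many sit inside the most targeted block groups? This is a rough sketch with invented site locations and a hypothetical helper name, not the authors' code.

```python
def share_in_most_targeted(housing_block_groups, most_targeted):
    """Fraction of public housing sites located in the most targeted block groups.

    `housing_block_groups` lists the block group of each housing site;
    `most_targeted` is the set of block group IDs with the most predictions.
    """
    if not housing_block_groups:
        raise ValueError("need at least one housing site")
    hits = sum(1 for bg in housing_block_groups if bg in most_targeted)
    return hits / len(housing_block_groups)


# Invented example: five housing sites, four of them in heavily targeted block groups.
housing_sites = ["bg_A", "bg_A", "bg_B", "bg_D", "bg_A"]
most_targeted = {"bg_A", "bg_B"}

share = share_in_most_targeted(housing_sites, most_targeted)
print(f"{share:.0%} of public housing is in the most targeted block groups")
```

A share of 100 percent would correspond to the cities in the list where every public housing facility fell inside the most targeted block groups.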
Obviously, a lot of what is discussed in this paper is seen as common sense by certain groups, but that is only because those groups may have had first-hand experience of these troubles with police. Being able to take data from PredPol, which is a data collection agency itself, shows how much data can reveal. Because the data eventually became public, we can point to a real company's data to show what a lot of people already believed about policing.
This paper was really informative to me because I've always been interested in figuring out very broad problems in our country today, and learning how to take data and present it plainly to others will be a huge boost for the government work I want to do in the future.
If I were explaining this paper in plain words to a business stakeholder with an interest in human resources, or to a politician who wants to see change in our country like I do, I would present it as an extraordinary source of facts to point to when deciding policies to help others.
In conclusion, I would recommend reading the whole paper, as it is too much to explain in a short blog post, but I hope I was able to represent at least a few of its points well.
The paper I referenced: https://themarkup.org/show-your-work/2021/12/02/how-we-determined-crime-prediction-software-disproportionately-targeted-low-income-black-and-latino-neighborhoods#2021-predpol-methodology_race-percentile