In any case, you shouldn't neglect the subtle but important sources of bias those pre-crime models can have. Here's an interesting talk about it:<p><a href="https://www.youtube.com/watch?v=MfThopD7L1Y" rel="nofollow">https://www.youtube.com/watch?v=MfThopD7L1Y</a><p>Basically, one instance of bias is the fact that many crime-prediction models are trained on police data, which means they will predict crime in places that are already targeted more often by the police. The model's predictions then amplify that effect, since the places now policed more heavily generate more training data, and so on.<p>There are lots of resources out there on AI fairness these days. I think everyone who tries stuff like crime prediction should read up on that topic.
Any such system is/would be potentially very dangerous. Crime data is not the same thing as crime. Populations that are over-policed are disproportionately represented in any such data set, leading to higher predictions of crime, leading in turn to more over-policing (a feedback loop). I implore anyone attempting to build such a system to consider the serious issue of machine bias and its implications in the real world.<p>See this tutorial given at this year's NIPS machine learning conference: <a href="http://mrtz.org/nips17/#/" rel="nofollow">http://mrtz.org/nips17/#/</a>
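The feedback loop described in the comments above can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not a model of any real policing system: every district has the same true incident rate, incidents only get recorded in proportion to patrol presence, and the "model" simply ranks districts by their recorded counts. The function name `simulate` and the `hotspot_share` parameter are hypothetical and exist only for this illustration.

```python
def simulate(recorded, true_rate, days, hotspot_share=0.8):
    """Toy feedback loop: patrols follow recorded crime, records follow patrols.

    recorded      -- initial recorded incident counts, one per district (>= 2 districts)
    true_rate     -- underlying incidents per day, ASSUMED identical in every district
    days          -- number of simulated days
    hotspot_share -- ASSUMED fraction of patrol effort sent to the top-ranked district;
                     the remainder is split evenly among the other districts
    """
    recorded = list(recorded)  # copy so the caller's list is not mutated
    n = len(recorded)
    for _ in range(days):
        # The "model": predict the hotspot as the district with the most records.
        hot = max(range(n), key=lambda i: recorded[i])
        for i in range(n):
            if i == hot:
                effort = hotspot_share
            else:
                effort = (1 - hotspot_share) / (n - 1)
            # ASSUMPTION: an incident is only recorded if a patrol is there to see it,
            # so new records scale with patrol effort, not with true crime.
            recorded[i] += true_rate * effort
    return recorded


if __name__ == "__main__":
    start = [11.0, 10.0]  # nearly identical history, identical true crime
    end = simulate(start, true_rate=1.0, days=100)
    print("share of records in district 0 before:", start[0] / sum(start))
    print("share of records in district 0 after: ", end[0] / sum(end))
```

With these made-up numbers, a one-incident head start in the historical records grows into a large gap in recorded crime even though the underlying rates are equal by construction. The data ends up describing where the patrols went, not where the crime was.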
This is an area that was explored some years ago, but ultimately determined to have civil rights pitfalls. Crime reporting is only as good (or biased) as the humans who report and input the crime data. Therefore, crime "training" data for AI systems can be very biased, and using AI might only magnify those biases further: a sort of self-reinforcing feedback loop.<p>Having worked in law enforcement at various levels (state and federal) in a prior professional life, I can attest to the differences in what gets reported, and how, based upon who was working or supervising and where they were assigned. Humans are simply not reliable reporters for this kind of data. No matter how hard we try to make the reports plain and standardized, our biases, one way or another, will always seep in.
Inspired by a Kaggle competition (<a href="https://www.kaggle.com/c/sf-crime" rel="nofollow">https://www.kaggle.com/c/sf-crime</a>), one of my older blog posts involved predicting the type of arrest in San Francisco (given that an arrest occurred) using data such as location and timing and the relatively new LightGBM machine learning algorithm: <a href="http://minimaxir.com/2017/02/predicting-arrests/" rel="nofollow">http://minimaxir.com/2017/02/predicting-arrests/</a><p>The code is open-sourced in an R Notebook: <a href="http://minimaxir.com/notebooks/predicting-arrests/" rel="nofollow">http://minimaxir.com/notebooks/predicting-arrests/</a><p>The model performance isn't great enough to usher in precrime, even in the best case. There are likely better approaches nowadays. (e.g. since the location data is spatial, a convolutional neural network might work better.)
There are much better ways to solve crime than to double down on enforcement that is already happening, which is likely all your model will tell you. “Police the neighbourhoods where people are poor” wow, thanks ML!<p>Palantir already does all this on a massive scale for the US govt. Want to affect future crime in a positive way? Solve the problems that contribute to it.<p>Not that you asked.
I am currently writing my master's thesis on predictive policing using machine learning. Working with local police in Norway. Got a bunch of papers and articles you might find interesting. Hit me up: michaedm@stud.ntnu.no
A lot of good work by Cynthia Rudin <a href="http://online.liebertpub.com/doi/pdf/10.1089/big.2014.0021" rel="nofollow">http://online.liebertpub.com/doi/pdf/10.1089/big.2014.0021</a> and her tools are open sourced (her papers <a href="https://users.cs.duke.edu/~cynthia/papers.html" rel="nofollow">https://users.cs.duke.edu/~cynthia/papers.html</a> and tools <a href="https://users.cs.duke.edu/~cynthia/code.html" rel="nofollow">https://users.cs.duke.edu/~cynthia/code.html</a>)
Do you know about the journalist who spent years obsessing about this and supposedly had some predictive success relating to serial killers?<p>If I recall, it was kind of a lone wolf effort, so I don’t know the rigor of his techniques; however, you never know if he might want to share results or collaborate.<p>Don’t have a link handy, but that should be enough info to google if you’re interested.
There is a project[1] + whitepaper[2] on projecting the likelihood of future white collar crimes written by Sam Lavigne, Francis Tseng, and Brian Clifton.<p>[1] <a href="https://thenewinquiry.com/white-collar-crime-risk-zones/" rel="nofollow">https://thenewinquiry.com/white-collar-crime-risk-zones/</a>
[2] <a href="https://whitecollar.thenewinquiry.com/static/whitepaper.pdf" rel="nofollow">https://whitecollar.thenewinquiry.com/static/whitepaper.pdf</a>
<a href="https://en.wikipedia.org/wiki/Predictive_policing" rel="nofollow">https://en.wikipedia.org/wiki/Predictive_policing</a><p>The British series "The Code" speaks a little bit about it in ep 3:
<a href="https://en.wikipedia.org/wiki/The_Code_(2011_TV_series)#Stage_3:_The_Finale" rel="nofollow">https://en.wikipedia.org/wiki/The_Code_(2011_TV_series)#Stag...</a>
Believe I heard about a project a UW student did predicting crime in San Francisco based on volume of vulgar tweets in a given area. Not sure if it's on github anywhere but you can always start with that idea. Nothing about specifics of the crimes, just where a high volume of them would be located.
There's a British TV presenter and scientist called Hannah Fry who has published in this area, including a talk in Germany (which was received much like many of the comments on this page), some Numberphile videos and BBC documentaries in other areas of data science.
<a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing" rel="nofollow">https://www.propublica.org/article/machine-bias-risk-assessm...</a><p>Food for thought on how incredibly biased these efforts can be.
For a source of data: <a href="https://data.cityofchicago.org/" rel="nofollow">https://data.cityofchicago.org/</a><p>And in the case of crime, Chicago should be a pretty good dataset.
<a href="https://github.com/kandluis/crime-prediction" rel="nofollow">https://github.com/kandluis/crime-prediction</a> is a good place to start
Are you looking to predict future crimes in an area (e.g. a city, neighborhood, or state) or to predict whether an individual will commit future crimes?