Ask HN: Any open source code/materials on predicting future crimes based on data?

62 points by febin, over 7 years ago

25 comments

Asdfbla, over 7 years ago

In any case, you shouldn't neglect the subtle but important sources of bias those pre-crime models can have. Here's an interesting talk about it: https://www.youtube.com/watch?v=MfThopD7L1Y

Basically, one instance of bias is the fact that many crime-prediction models are trained on police data, which means they will predict crime in places more often targeted by the police anyway. The model's predictions then amplify that effect, since more training data is generated from the places that are now policed more often, and so on.

There are lots of resources out there on AI fairness these days. I think everyone who tries something like crime prediction should read up on that topic.
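
The amplification described above can be illustrated with a toy simulation. This is a minimal sketch under deliberately crude assumptions that are not from the thread: two hypothetical districts with identical true crime rates, a single patrol, a greedy rule that sends the patrol to whichever district has more recorded incidents, and incidents that only get recorded where the patrol is present. It does not model any real predictive-policing product.

```python
def simulate_patrols(initial_records, true_rates, days):
    """Toy feedback loop: patrol goes where the records say crime is.

    initial_records: recorded incidents per district before day 1
    true_rates: actual incidents per day in each district (hypothetical)
    Returns (recorded incidents per district, patrol visits per district).
    """
    records = list(initial_records)
    visits = [0] * len(records)
    for _ in range(days):
        # "Prediction": the district with the most recorded incidents so far.
        target = max(range(len(records)), key=lambda i: records[i])
        visits[target] += 1
        # Crime happens everywhere, but is only recorded where police are.
        records[target] += true_rates[target]
    return records, visits


# District 0 starts with one extra recorded incident; true rates are equal.
records, visits = simulate_patrols([11, 10], [1.0, 1.0], days=100)
print(records, visits)
```

With these inputs the one-incident head start is never corrected: district 0 receives all 100 patrol days and its recorded total grows to 111, while district 1 stays at 10 even though both districts have the same underlying rate.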

bayesbiol, over 7 years ago

Any such system is/would be potentially very dangerous. Crime data is not the same thing as crime. Populations that are over-policed are disproportionately represented in any such data set, leading to higher predictions of crime, leading in turn to more over-policing (a feedback loop). I implore anyone attempting to build such a system to consider the serious issue of machine bias and its implications in the real world.

See this tutorial given at this year's NIPS machine learning conference: http://mrtz.org/nips17/#/

USNetizen, over 7 years ago

This is an area that was explored some years ago, but ultimately determined to have civil rights pitfalls. Crime reporting is only as good (or biased) as the humans who report and input the crime data. Therefore, crime "training" data for AI systems can be very biased, and using AI might only magnify those biases further - a sort of self-perpetuating, damaging feedback loop.

Having worked in law enforcement at various levels (state and federal) in a prior professional life, I can attest to the differences in what gets reported, and how, based upon who was working or supervising and where they were assigned. Humans are simply not reliable reporters for this kind of data. No matter how hard we try to make the reports plain and standardized, our biases, one way or another, will always seep in.

minimaxir, over 7 years ago

Inspired by a Kaggle competition (https://www.kaggle.com/c/sf-crime), one of my older blog posts involved predicting the type of arrest in San Francisco (given that an arrest occurred) using data such as location and timing and the relatively new LightGBM machine learning algorithm: http://minimaxir.com/2017/02/predicting-arrests/

The code is open-sourced in an R Notebook: http://minimaxir.com/notebooks/predicting-arrests/

The model performance isn't great enough to usher in precrime, even in the best case. There are likely better approaches nowadays (e.g., since the location data is spatial, a convolutional neural network might work better).

SamReidHughes, over 7 years ago

Careful! Your crime predictor might unfairly conclude that men are more likely to commit crimes than women.

lwansbrough, over 7 years ago

There are much better ways to solve crime than to double down on enforcement that is already happening, which is likely all your model will tell you. "Police the neighbourhoods where people are poor." Wow, thanks ML!

Palantir already does all this on a massive scale for the US govt. Want to affect future crime in a positive way? Solve the problems that contribute to it.

Not that you asked.

michaelmcmillan, over 7 years ago

I am currently writing my master's thesis on predictive policing using machine learning, working with local police in Norway. Got a bunch of papers and articles you might find interesting. Hit me up: michaedm@stud.ntnu.no

thedrake, over 7 years ago

A lot of good work by Cynthia Rudin (http://online.liebertpub.com/doi/pdf/10.1089/big.2014.0021), and her tools are open sourced. Her papers: https://users.cs.duke.edu/~cynthia/papers.html and tools: https://users.cs.duke.edu/~cynthia/code.html

WhitneyLand, over 7 years ago

Do you know about the journalist who spent years obsessing about this and supposedly had some predictive success relating to serial killers?

If I recall, it was kind of a lone wolf effort, so I don't know the rigor of his techniques; however, you never know if he might want to share results or collaborate.

Don't have a link handy, but that should be enough info to google if you're interested.

jjoonathan, over 7 years ago

Ask HN: Any open source code/materials on predicting good fall guys based on data?

ryanmaynard, over 7 years ago

There is a project[1] + whitepaper[2] on projecting the likelihood of future white collar crimes written by Sam Lavigne, Francis Tseng, and Brian Clifton.

[1] https://thenewinquiry.com/white-collar-crime-risk-zones/

[2] https://whitecollar.thenewinquiry.com/static/whitepaper.pdf

partycoder, over 7 years ago

https://en.wikipedia.org/wiki/Predictive_policing

The British series "The Code" touches on it a little in episode 3: https://en.wikipedia.org/wiki/The_Code_(2011_TV_series)#Stage_3:_The_Finale

zebrafish, over 7 years ago

I believe I heard about a project a UW student did predicting crime in San Francisco based on the volume of vulgar tweets in a given area. Not sure if it's on GitHub anywhere, but you can always start with that idea. Nothing about the specifics of the crimes, just where a high volume of them would be located.

tobylane, over 7 years ago

There's a British TV presenter and scientist called Hannah Fry who has published in this area, including a talk in Germany (received just like many comments on this page), some Numberphile videos, and BBC documentaries in other areas of data science.

YurtleTheTurtle, over 7 years ago

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Food for thought on how incredibly biased these efforts can be.

paulie_a, over 7 years ago

For a source of data: https://data.cityofchicago.org/

And in the case of crime, Chicago should be a pretty good dataset.

crabl, over 7 years ago

https://github.com/kandluis/crime-prediction is a good place to start.

jeffmould, over 7 years ago

Are you looking to predict future crimes in an area (e.g., city, neighborhood, state) or to predict whether an individual will commit future crimes?

PaulHoule, over 7 years ago

https://en.wikipedia.org/wiki/CompStat

chiefalchemist, over 7 years ago

Fwiw there's some discussion of this in the book Everybody Lies. Look into that. Perhaps follow up with the author. His name escapes me atm.

thisisit, over 7 years ago

Are you looking for tools or data?

amigoingtodie, over 7 years ago

You need to watch Person of Interest and Minority Report.

netrus, over 7 years ago

Have you checked Kaggle for relevant datasets?

0xdeadbeefbabe, over 7 years ago

Why don't you base it on Law data?

0xdeadbeefbabe, over 7 years ago

The Poisson distribution!
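
For context, the Poisson distribution is a common baseline for modelling counts of events in a fixed window of time or space. A minimal sketch of its probability mass function follows; the weekly rate of 2 incidents is a made-up number for illustration, not taken from any dataset mentioned in this thread.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical average of 2 recorded incidents per week in some area.
weekly_rate = 2.0
for k in range(5):
    print(k, round(poisson_pmf(k, weekly_rate), 4))
```

A single rate parameter like this inherits the caveat raised elsewhere in the thread: if it is estimated from recorded incidents, it describes reporting and policing patterns as much as crime itself.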
