Nice work, I am glad that there is a paragraph talking about exposure. Crash trends based strictly on total number of crashes are easy to predict just based on where there is more traffic. Using crashes per vehicle mile traveled for road segments or crashes per entering vehicle for intersections can help tease out trends. Controlling for severity is also important.<p>When I do a crash analysis for a city, one of the tasks I do regularly for my job, I generate a crash rate and severity index for each intersection. The severity index is basically a weighted average based on severity, non-injury=1, minor injury=3, and severe injury or fatality=8. The crash rate and severity index are divided to create a Severity Rate. While not perfect or statistically valid, it does help identify trends. Also, I am in a rural state so it is rare that there are enough crashes to make any statistically valid conclusions.
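For anyone who wants to try the exposure and severity adjustments described above, here is a minimal sketch. The severity weights (1, 3, 8) come straight from the comment. The "per million entering vehicles" scaling, the function names, and the category keys are assumptions made for illustration. The final Severity Rate step is left out because the comment does not say which value is divided by which.

```python
# Severity weights as stated in the comment: non-injury=1, minor injury=3,
# severe injury or fatality=8. The dictionary keys are hypothetical labels.
SEVERITY_WEIGHTS = {"non_injury": 1, "minor_injury": 3, "severe_or_fatal": 8}


def crash_rate(crash_count, daily_entering_vehicles, years):
    """Crashes per million entering vehicles at one intersection.

    The 'per million' scaling is an assumption; the comment only says
    'crashes per entering vehicle'.
    """
    total_entering = daily_entering_vehicles * 365 * years
    return crash_count * 1_000_000 / total_entering


def severity_index(counts):
    """Weighted average severity, where counts maps category -> crash count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weighted = sum(SEVERITY_WEIGHTS[category] * n for category, n in counts.items())
    return weighted / total


if __name__ == "__main__":
    # Made-up example intersection: 10 crashes over 5 years, 10,000 vehicles/day.
    counts = {"non_injury": 6, "minor_injury": 3, "severe_or_fatal": 1}
    print(crash_rate(sum(counts.values()), 10_000, 5))
    print(severity_index(counts))
```

With only a handful of crashes per intersection, as in the rural case mentioned above, both numbers will swing a lot from one extra crash, so they are better read as a ranking aid than as a statistical estimate.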
I've worked extensively with this dataset on a similar project, <a href="http://crashmapper.org" rel="nofollow">http://crashmapper.org</a>, and through that process found that the data is extremely error-prone. Perhaps 20% of the collisions recorded are not geocoded (i.e. they lack lat/long coordinates) and don't contain other location information such as street, cross street, and zip code that could be used to geocode them. It appears that some NYPD precincts do a better job of recording a crash location than others. Even more of the data lacks values for "contributing factors", so that field seems difficult to use as a metric for analysis. Often there is a mismatch between the total number of persons injured or killed and the number of pedestrians, cyclists, or motorists injured or killed. Furthermore, whoever maintains this dataset will periodically go back and update it seemingly at random, editing existing data or adding new data, potentially months or years in the past. Often the data maintainer appears to be changing values for fields such as the number of pedestrians, cyclists, or motorists injured or killed. Presumably this is because more information surfaced about an incident at a later point and the city must go back and update it. However, this can result in stats from the data not aligning with the NYPD's or DOT's official stats from a previous year. I would advise anyone to keep these facts in mind when using the data for analysis and policy recommendations. Such is open data.
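A quick way to see how much of an export is affected by the problems above is to count rows with missing coordinates and rows where the injury total does not match its parts. This is only a sketch: the column names below are assumptions about the CSV export and should be checked against the actual header, and `audit` and the helper functions are hypothetical names.

```python
# Column names are assumptions about the CSV export; verify against the header.
TOTAL_INJURED = "NUMBER OF PERSONS INJURED"
PART_INJURED = (
    "NUMBER OF PEDESTRIANS INJURED",
    "NUMBER OF CYCLIST INJURED",
    "NUMBER OF MOTORIST INJURED",
)


def to_int(value):
    """Parse a count field; blank or missing values are treated as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def has_coordinates(row):
    """True when both latitude and longitude are present and non-zero."""
    try:
        lat = float(row.get("LATITUDE") or 0)
        lon = float(row.get("LONGITUDE") or 0)
    except ValueError:
        return False
    return lat != 0 and lon != 0


def injury_totals_match(row):
    """True when the injured total equals pedestrians + cyclists + motorists."""
    parts = sum(to_int(row.get(column)) for column in PART_INJURED)
    return to_int(row.get(TOTAL_INJURED)) == parts


def audit(rows):
    """Count rows with missing coordinates and with mismatched injury totals."""
    rows = list(rows)
    return {
        "total": len(rows),
        "missing_coordinates": sum(not has_coordinates(r) for r in rows),
        "injury_mismatch": sum(not injury_totals_match(r) for r in rows),
    }
```

The rows can come from `csv.DictReader` over a downloaded export. Because the dataset is edited retroactively, it is worth saving the download date next to any numbers this produces.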
I did something similar for the Long Beach, CA area in college, and one of the most interesting takeaways was the difference in spatial distribution between fatal and non-fatal accidents.<p>Non-fatal accidents clearly clustered around high-traffic areas, but fatal accidents didn’t show the same clustering. Instead they appeared to be uniformly distributed across the city.<p>I’m sure there is an explanation for this, and this was only 10 years of data for a single city, but it always felt a little spooky that these accidents were equally likely to happen anywhere (though most likely later in the night).
I'm not sure what constitutes a "collision", but in 2015 I lived on Lexington between 121st and 122nd and saw the investigation of a hit-and-run involving a homeless man. I talked to a couple of the witnesses who saw it happen.<p>This incident was at Lexington and 123rd. I do not see it in the data.
The question is whether the highlighted areas are really more dangerous or whether there are just more visitors. Shouldn't one take traffic counts into account?<p>BTW: there is similar (open) data for Germany: <a href="https://unfallatlas.statistikportal.de/" rel="nofollow">https://unfallatlas.statistikportal.de/</a> (it clearly shows the problem I mentioned).<p>Update: sorry, it seems that this issue is already discussed in this thread.
Lots of crashes in Hell's Kitchen. That area is full of people going out to bars and restaurants, tiny sidewalks, and lots of impatient drivers trying to get through Manhattan to New Jersey.
The map of total deaths includes a significant blip on the west side near Pier 40 and the Holland Tunnel, which I think is from the 2017 truck attack.<p><a href="https://en.wikipedia.org/wiki/2017_New_York_City_truck_attack" rel="nofollow">https://en.wikipedia.org/wiki/2017_New_York_City_truck_attac...</a><p>Map: <a href="https://imgur.com/a/jNbOv7W" rel="nofollow">https://imgur.com/a/jNbOv7W</a>
I would bet that the shadow/light patterns on Roosevelt Avenue & 94th Street, Queens cause significant visual distractions to drivers and pedestrians.
Drivers mostly hit other things when there are too many things demanding their attention (poor visibility + difficult left turn + busy traffic + bikes + pedestrians = high risk of accidents), so this is probably just a heat map of the busiest intersections (in terms of things going on, not necessarily throughput).<p>I'd like to see a month-by-month heat map.