Hi HN,<p>I built "Vision Zero Reporting" (<a href="https://visionzeroreporting.com" rel="nofollow">https://visionzeroreporting.com</a>), a tool to detect editorial anti-patterns in local news coverage of car crashes.<p>Maybe you've noticed that local news articles about car crashes, especially those that involve "vulnerable road users" (VRU) such as bicyclists and pedestrians, tend to employ language that seems to blame the victim or only discuss the incident as an isolated event, rather than in context that crashes are preventable and are caused by specific reasons.<p>This tool is meant to help news publishers check their articles and learn the anti-patterns to avoid.<p>Here's a brief explanation of the problems my tool checks for:<p>1. Focus - Readers find the focus/subject of the sentence more in control of the situation, and hence more blameworthy (e.g. "A pedestrian was struck by a driver" VS "A driver struck a pedestrian").<p>2. Agency - Some sentences lack an agent altogether, which places more blame on the recipient (e.g. "A bicyclist was hit." VS "A bicyclist was hit by a driver.")<p>3. Object-based reference - Pedestrians and bicyclists are almost always referred to using people-based language, but drivers are referred to using object-based language 81% of the time [1] (e.g. "The vehicle fled the scene" VS "The driver fled the scene"). This language personifies and gives agency to vehicles rather than their drivers.<p>4. Accident - Accident is the most-used term in articles to describe the incident (47%). This term is being phased out by some news agencies because the word implies a sense of inevitability or that it happened purely by chance, when we know why car crashes happen and can take preventative action.<p>5. Framing - (still in beta) Articles employ an "episodic" frame, meaning they describe crashes as isolated incidents. Only 6% (!) 
of articles use "thematic" framing [1], meaning they contextualize the event by discussing road design and the number of recent crashes in the area, quoting experts, educating readers about road safety initiatives, etc.<p>6. Counterfactual - (still in beta) Counterfactuals are true statements that imply the outcome could have been changed had the victim acted differently. While reporters may see these statements as sticking-to-the-facts, we've discovered in 700+ manually-annotated articles that counterfactuals almost always shift blame toward the victim (e.g. "A bicyclist was struck; he wasn't wearing a helmet. It was dark outside, the biker wasn't wearing reflective clothing, and the driver told police he didn't see the bicyclist until it was too late."). Notice that all of these statements may be true, but they go hand-in-hand with the Framing issue discussed above: the bicyclist was hit, but is that because there is no protected bike lane? It was dark outside, but is road visibility a municipal obligation?<p>I'm looking for constructive feedback to make this tool better!<p>My work is based primarily on the following research papers (and I've already shown the tool to the authors - they loved it!):<p>[1] <a href="https://www.researchgate.net/publication/330975590_Editorial_Patterns_in_Bicyclist_and_Pedestrian_Crash_Reporting" rel="nofollow">https://www.researchgate.net/publication/330975590_Editorial...</a><p>[2] <a href="https://www.researchgate.net/publication/337279845_Does_news_coverage_of_traffic_crashes_affect_perceived_blame_and_preferred_solutions_Evidence_from_an_experiment" rel="nofollow">https://www.researchgate.net/publication/337279845_Does_news...</a>
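For readers who want a concrete picture of checks 1, 2, and 4, here is a minimal sketch of how they could be approximated with plain regular expressions. This is not the tool's actual implementation: the pattern names, the verb list, and the `check_sentence` helper are all hypothetical, and a real system would more likely rely on a dependency parser than on surface patterns.

```python
import re

# Hypothetical patterns for illustration only; not the tool's actual rules.
# Check 4: any form of the word "accident".
ACCIDENT = re.compile(r"\baccident(s|al|ally)?\b", re.IGNORECASE)

# Checks 1 and 2: a passive crash verb, plus the rest of the sentence
# so we can look for a "by <agent>" phrase after it.
PASSIVE = re.compile(
    r"\b(?:was|were|been|being)\s+(?:hit|struck|killed|injured|run over)\b"
    r"(?P<rest>[^.!?]*)",
    re.IGNORECASE,
)


def check_sentence(sentence):
    """Return a list of issue labels found in a single sentence."""
    issues = []
    if ACCIDENT.search(sentence):
        issues.append("accident")
    match = PASSIVE.search(sentence)
    if match:
        if re.search(r"\bby\b", match.group("rest"), re.IGNORECASE):
            # Passive with an agent: the victim is still the subject.
            issues.append("focus")
        else:
            # Passive with no agent at all.
            issues.append("agency")
    return issues


for s in [
    "A pedestrian was struck by a driver.",
    "A bicyclist was hit.",
    "A driver struck a pedestrian.",
    "Police are investigating the accident.",
]:
    print(s, check_sentence(s))
```

A surface pattern like this will misfire on phrases such as "was hit by the side of the road", which is one reason a parser-based approach is probably the better fit for anything beyond a demo.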
This is neat, but something bugs me about the framing of the intended goal versus how you propose to get there.<p>If the intent is to raise public awareness and to put pressure on leadership to make roads generally safer, why do almost all the corrections follow a pattern of shifting blame from the individual pedestrian onto the individual driver? Many of the suggested fixes are just their own form of counterfactual - they're facts, sure, but they don't contribute to public understanding of _why_ the crash happened.<p>The examples of removing counterfactual outright and including thematic framing (focusing on road conditions, frequency of crashes, and aggregate statistics that tell the story of how often crashes occur) seem to be the best set of suggestions for correcting public perception on road safety. Not making sure the public knows this _one specific driver_ hit a pedestrian, rather than their car.
This is cool! I appreciate the effort here to make the roads safer for bicyclists and pedestrians, as I'd love to see more people prioritize this.<p>The homepage confused me because the presentation of the two examples made me think that they were before/after at first. It took me ~30s to realize they were just two separate articles.<p>I'd like to see a bad article juxtaposed with an improved version that someone created using your tool.<p>I'm maybe used to tools like Grammarly, but when I clicked / hovered over the highlighted text, I was surprised to see nothing happen. I found it a little difficult to scroll back and forth between the highlighted text and the context where it appeared in the article. Having an explanation appear next to the cursor on mouse click/hover might resolve this.<p>It would also be cool if the tool allowed me to import by URL (with some suggested real articles to show this is a problem in mainstream news sources) rather than require the user to copy/paste manually.
That's actually pretty amazing work.<p>Being super preachy, as a cliched straight white etc etc... after riding a bike I sort of get complaints about -isms, emotionally, in a way I wouldn't without.<p>Everyone has already concluded that <i>you deserved it</i>. Whatever happened. It does not matter if you were wearing hi-vis, or had a light, or were in a bike lane, or the nearest bike lane is 5 miles away and on the pavement for some reason, or were 3.1 ft from the kerb or 2.6. If you were stopped at a red light and are now sprawled in front of it because a van didn't, you're clearly lying. If a motorist jumps a set of temporary lights and hit you when you had the green, you're clearly lying. <i>You deserved it</i>, always and forever, is the only argument you need to know, the rest is window dressing.<p>Now we wonder why kids don't get exercise.
I don't think that "the cyclist was not wearing a helmet" or "impairment was not an issue" are examples of counterfactuals; the authors should crack open a dictionary and look up what that word means in general use, and additionally how it is used in philosophy and science.<p>If you have word from the police that the driver was not found to have any alcohol or if the cyclist wasn't wearing a helmet, those are simply <i>facts</i>, not <i>counterfactuals</i>.<p>(An article about an accident should report all the hitherto known facts; then it can't be accused of having a bias with regard to cherry-picking the facts.)<p>There is a problem with "alcohol <i>doesn't appear</i> to be a factor" in that it lacks conclusiveness. Appear to whom? For what reasons, and why aren't they sure? The driver was able to touch their nose with their eyes closed, is that it? I think a reasonable rule should be to cull any probabilistic statements, or statements with hedge words introducing uncertainty: remove all such statements from third parties, and under no circumstances invent new probabilistic statements in the process of editorializing.<p>In fact, it's the use of the word <i>accident</i> that may have the counterfactual issue (and good point here). There is a supposition behind it which may be false. Maybe it wasn't an accident? You can't logically call a pedestrian hit an accident until other hypotheses have been ruled out, like the driver had a specific murder motive, or is crazy. It's also hard to call it an accident if the driver was blatantly reckless: acted contrary to the rules of the road which are intended to prevent such occurrences, and which the driver is legally obliged to follow as a matter of licensing. If a walking person falls into a fountain due to texting on their phone, it's difficult to swallow the word <i>accident</i>, since they were practically begging for something to happen by moving through an environment while choosing to block it out.
I gather that the intent is not to make the reporting more neutral or accurate, but to change the framing in a direction that vision zero finds more appealing. E.g. in the first article we should view the woman as a vulnerable road user, bearing no responsibility for being struck by a vehicle even though video evidence shows that she fell into the street.
A few months ago I wrote a short Twitter rant about a Boston Globe article that described a pedestrian being struck by a driver really poorly.<p>Thread: <a href="https://twitter.com/evanjfields/status/1387131251811831812/" rel="nofollow">https://twitter.com/evanjfields/status/1387131251811831812/</a><p>Article: <a href="https://www.bostonglobe.com/2021/04/27/metro/woman-28-seriously-injured-after-being-hit-by-car-cambridge/" rel="nofollow">https://www.bostonglobe.com/2021/04/27/metro/woman-28-seriou...</a><p>To my surprise, this tool finds no problems with the article and gives it a B.<p>(I'm really tickled by the tool, like the idea, but based on a totally not rigorous sample, seems like the tool leans a bit too much into sort of sentence structure analysis and elides some semantics?)
I like the general idea, and the use of NLP to implement it.<p>But I am dubious about some of the principles behind it.<p>- For example, I find it inaccurate to say "the driver hit the pedestrian", which to me suggests a collision between two people, not between a person and a vehicle. (Of course this does not apply to a phrase like "the vehicle fled the scene" - it is clear that it was the driver who fled the scene). While it's obvious the driver is responsible for the trajectory of their vehicle, it's also clear that the injuries and deaths are caused by the fact that one of the elements involved in the collision is a 1+ ton piece of steel, and the other one a 70kg human being.<p>- Regarding the term "accident", I see in the Merriam-Webster that it is defined in this context as: "an unfortunate event resulting in particular from negligence or ignorance". It seems to me that this definition does not exonerate the driver from responsibility (lack of vigilance or competence).
Worth noting the history behind this issue of biased reporting:<p>> In the late 1920s and ’30s, a consortium of automobile manufacturers, insurers, and fuel companies known as the National Automobile Chamber of Commerce funded a wire service that provided free reporting on crashes to short-staffed Depression-era newspapers. Reporters could send in a few basic details about a local collision, and the wire service would craft a narrative that exonerated the driver, blamed any pedestrians who were involved, and — crucially — transformed virtually every “crash” into an understandable or even inevitable “accident.” Newspapers around the country published the industry-approved stories, often without edits.<p>Source: <a href="https://usa.streetsblog.org/2020/03/05/streetsblog-101-how-journalists-help-build-car-culture/" rel="nofollow">https://usa.streetsblog.org/2020/03/05/streetsblog-101-how-j...</a>
This seems to me like it's designed to shift the blame from unilaterally on the VRU (bad), to unilaterally on the driver (also bad).<p>You pay lip service to systemic problems, but this is not a systemic approach to reducing crashes; it just implies that the driver is at fault, which is not always the case in these things. As another comment noted, when there's no clear fault (such as an intoxicated or negligent driver, or a pedestrian running into the road within minimum braking distance), reporters would do well to highlight the context around the crash, such as a lack of bike lanes or blind corners, which would put pressure on authorities to make the roads safer.<p>Not all crashes can be default-blamed on the driver.
I was expecting things like "woman was hit by a car" to be replaced by "A car hit a woman" (prefer active voice).<p>Instead it refers to pedestrians as Vulnerable Road Users. It says the information that "alcohol was not a factor" is a "Distracting counterfactual. Readers place more blame on victims when articles use more counterfactual statements. Counterfactuals also obscure the systemic nature of incidents and place unreasonable burden on individuals."<p>I thought this would be an automated grammar assistant for basic English writing techniques you should have learned in high school. It's not. Instead, it's a tool for injecting bias into reporting.
Here’s why the reframing matters:<p>Accidents are “oops” moments that are to be accepted as a fact of life. There is no reason to change the system. Accidents happen.<p>Crashes are not accidents. They have a cause and the cause can be addressed. Crashes can be prevented.<p>So the reframing shifts from talking about a system we should accept as it is to one that can be improved.
> Show HN: Ensure all car crash articles are biased against the driver and vehicles generally using NLP<p>Seems like a better title for this.<p>In the analysis of the first article, under the "recommendations" about counterfactuals, it's literally telling you to remove relevant context about the incident, to ensure the readers can't possibly come to the "wrong" conclusion about who's responsible.<p>Distrust of news media is at an all-time high[1] and if people can't see how this sort of thing contributes, I really don't know what to say at this point.<p>1. <a href="https://news.gallup.com/poll/321116/americans-remain-distrustful-mass-media.aspx" rel="nofollow">https://news.gallup.com/poll/321116/americans-remain-distrus...</a>
> "The vehicle fled the scene" VS "The driver fled the scene"<p>It is quite common for the driver to flee the scene while leaving behind the disabled vehicle.<p>…one of my old coworker’s vehicle was involved in an accident with a pedestrian who ran full speed in front of her while it was dark outside and the vehicle was traveling at the legal speed limit. Said vehicle was unable to stop in time which resulted in the speeding pedestrian striking the vehicle.
I like it.<p>I like that it tags segments of text as opposed to the document as a whole.<p>Since it is confined to a domain and has a well-chosen problem it seems to be highly correct and thus useful.
I think this is going to be a major application for "attempting" to identify "types" of biases. I'm being super cautious here because the term bias is painfully misunderstood in most circumstances and often misused as a cudgel rather than a context. But I think it could be educational to observe cultural/social/political trends by distilling content this way.
Is the source code available? It would be interesting to apply this to reporting on other topics.<p>As far as suggestions, it would be great to see an explanation of each problem when hovering over it in the text. Also, the colors for the "Object", "Counterfactuals", and "Accident" problems are difficult to distinguish for color-deficient individuals.
I like it! I assume the red/yellow/green are roughly bad/warning/good, but what do the blue highlights mean? I agree with another comment that a tooltip saying which of the categories you listed a given span falls into would help.
This is an interesting tool! It would be cool if you could take say 20 articles from Gothamist, Streetsblog, NYT, NY Post, Pix11, etc. and see how they all ranked in this system. That could be a really interesting blog post.
Which tools are you using for these NLP judgments? Are you using SpaCy at all? Are there any models you've built here, or is this all rules on top of an NLP model? I'm working on NLP models for education, and we use SpaCy a ton. I'm peter@quill.org if you want to learn more about how we're using this.
I'm somewhat concerned about the ethical implications of this beyond the scope of just car crash reporting. While impressive, the nlp model will have to be tuned by someone, someone with biases the model will inevitably inherit. Seems to have obvious potential as a propaganda tool, which might be used by the 'good guys' today, and be used by malicious actors tomorrow.
I tried it out on <a href="https://www.denverpost.com/2021/10/07/medina-alert-hit-run-crash-larimer-county/" rel="nofollow">https://www.denverpost.com/2021/10/07/medina-alert-hit-run-c...</a>, which I would think in general a group like Vision Zero would approve of, and it got a C (-1 points), because of the phrase, "[The] vehicle will have heavy damage". This seems pedantic, at best, since the whole point of that phrase is to help the public identify the vehicle (and presumably the driver) which was involved in the accident.
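One possible way to avoid this kind of false positive, sketched under the assumption that the "Object" check is pattern-based (I don't know how the tool actually works): only flag a vehicle noun when it is immediately followed by a verb that implies agency, so that descriptive sentences about the vehicle pass. The word lists and the `flags_object_reference` name are made up for illustration.

```python
import re

# Hypothetical word lists for illustration; not taken from the tool.
VEHICLE_NOUNS = r"(?:vehicle|car|truck|suv|van|pickup)"
AGENTIVE_VERBS = r"(?:fled|struck|hit|ran|drove|sped|swerved|left|failed)"

# A vehicle noun directly followed by a verb that implies agency.
OBJECT_AS_AGENT = re.compile(
    rf"\b{VEHICLE_NOUNS}\s+{AGENTIVE_VERBS}\b", re.IGNORECASE
)


def flags_object_reference(sentence):
    """True if a vehicle is described as performing a driver's action."""
    return OBJECT_AS_AGENT.search(sentence) is not None


for s in [
    "The vehicle fled the scene.",
    "The vehicle will have heavy damage.",
    "The driver fled the scene.",
]:
    print(s, flags_object_reference(s))
```

With this rule, "The vehicle fled the scene" is flagged while "The vehicle will have heavy damage" is not, since describing damage does not attribute an action to the vehicle.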
Slightly related is the crashes [1], which collects reports of crashes in the local news. They do this because they feel that in the news cycle these are heavily underreported (compared to, for instance, a terrorist attack) and therefore seem like less of a problem worth addressing (for instance, by improving infrastructure to reduce the risk of crashes).<p>[1] <a href="https://us.thecrashes.org/" rel="nofollow">https://us.thecrashes.org/</a>
As feedback, speaking of contextualizing things, it would help if you would contextualize your colored highlighting by bringing some of the explanations of issues onto the page itself where the content is marked up. Or have links (maybe have the highlighted words be links) going to a writeup telling why those words got that color.
For consistency, we should use the term motorist instead of driver. Just like we use cyclist or bicyclist instead of rider or motorcyclist instead of motorcycle rider. The only exception would be for buses or trucks (bus driver or truck driver).
It would be nice if the color-coded key ("Object", etc.) linked to the detailed sections below.<p>When I first saw "Object" in the summary, I didn't know what it meant.
I'm curious how a similar tool built for airplane and helicopter crashes would work. What if it was due to mechanical failure, would you want the story to focus on a mechanic?
Very nice, I like it! And as a frequent cyclist and writing instructor, really appreciate this kind of work on many levels. Good to see NLP towards social good like this.
Super interesting... it's like a linter for car crashes.<p>There are opinions that are not going to be liked (i.e. crash vs accident) but such is the nature of any linter.
I'm afraid I find this somewhere between creepy and ridiculous.<p>On the ridiculous side: the idea that calling a car accident an "accident" is somehow wrong - as if, because I understand that what happened was an accident, I am somehow incapable of thinking "hey, maybe we need better road safety legislation".<p>On the creepy side: the general approach which is, instead of making public arguments about why cars are bad, to manipulate language so as to push people to your side, using results from behavioural science.<p>Of course, if you think the author is on the side of the angels, then this is great! But the techniques can be used by equally self-righteous people, with whom you may disagree.<p>Treating people like sheep is gross. Don't do it.
I hope you can look past the negative responses here. I think this is very well intentioned and well done. It would be nice to see more rationale on the website, like what you've written here.