I accidentally started a movement – Policing the Police by scraping court data

650 points by kristintynski over 2 years ago
Almost 3 years ago, I posted to r/privacy and Hacker News about how a post I wrote on using county-level police data to "police the police" had accidentally started a movement: https://old.reddit.com/r/privacy/comments/gr11aw/i_think_i_accidentally_started_a_movement/

The idea quickly evolved into a real goal: to make good on the promise of free and open policing data. By freeing policing data from antiquated and difficult-to-access county data systems, and compiling that data in a rigorous way, we could create a valuable new tool to level the playing field and help provide community oversight of police behavior and activity.

In the almost 3 years since the first post, something amazing has happened.

The idea turned into something real. Something called The Police Data Accessibility Project. (https://www.pdap.io)

More than 2,000 people joined the initial community, and while those numbers dwindled after the initial excitement, a core group of highly committed and passionate folks remained. In these 3 years, this team has worked incredibly hard to lay the groundwork necessary to enable us to realistically accomplish the monumental data collection task ahead of us.

Let me tell you a bit about what the team has accomplished in these 3 years.

- Established the community and identified volunteer leaders who were willing and able to assume consistent responsibility.
- Gained a pro-bono law firm, Arnold + Porter, to assist us in navigating the legal waters.
- Arnold + Porter helped us establish a legal entity and apply for 501c3 status.
- 501c3 status granted.
- We've carefully defined our goals and set a clear roadmap for the future.
- Hired first full-time staff.
- PDAP was awarded a $250,000 grant by The Heinz Endowments.

So now, I'm asking for help, because scraping, cleaning, and validating data from 18,000 police departments is no easy task. There are two ways to help.

The first is to join us and help the team. Perhaps you joined initially, realized we weren't organized yet, and left? Now is the time to come back. Or maybe you are just hearing of it now. Either way, the more people we have working on this, the faster we can get this done. Those with scraping experience are especially needed. The second is to either donate or help us spread the message. The more donations, the more data we can gather. I want to thank the r/privacy community especially. It was here that things really began.

TL;DR: I accidentally started a movement from a blog post I wrote about policing the police with data. The movement turned into something real because of r/privacy and Hacker News: the Police Data Accessibility Project. 3 years later, the groundwork has been laid, the non-profit established, full-time staff hired, and $250,000 in grant money and donations raised so far!

Scrapers so far, on GitHub: https://github.com/Police-Data-Accessibility-Project/Scrapers

Discord, if you would like to join the efforts: https://discord.com/invite/wMqex8nKZJ

*This is US-centric.

25 comments

curiousllama over 2 years ago
Really love the idea, and the passion behind it. Def could have legs.

Here are the pitfalls I see you falling into:

(1) Seriously, what data are you collecting? “Everything” isn’t a great answer (who’s supposed to use “everything”, anyway? “Anyone”?). “Apples-to-apples police misconduct statistics” is a good one.

(2) It’s important to clarify (1) because you need to know who you’re serving, and why. Different activists need different data. “Have all data” sounds good until you need to decide how to allocate your resources.

(3) More deeply, data is the land of edge cases. Even just with police misconduct, you need to get DEEP to rigorously compare seemingly simple stats like “# of unjustified police killings”. If you don’t start narrow, you’ll never show value. If you don’t show value, nobody will ever care you exist.

When I look at the data you’ve collected, it ranges from annual reports, to municipal contact info, to crime stats. What’s important to collect at scale? To whom? What do they need it for?

Again: great, ambitious idea! But $250k goes fast. Show value before it runs out!
xenadu02 over 2 years ago
You might try defining what the "ideal" department's data would look like: what categories of data, what columns each record has, what the values are for each, etc. Ideally you'd stamp it with a year and give it a spiffy name so it could be the National Police Data Reporting Standard 2022 (NPDRS.2022) or something.

Departments that are trying to be transparent (or who just don't want to deal with figuring it all out from scratch) may be happy to adopt something considered a "standard" for tracking and reporting data. In some cases it means it is a checkbox they can check without having to deal with annoying people and their annoying questions... but that hardly matters so long as the data is made available. It would also give companies developing software for police departments a target to aim for.
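One way to picture a reporting standard like the one described above is a fixed record schema plus a validator. The sketch below is purely illustrative: the `UseOfForceRecord` name, its fields, and the allowed outcome values are hypothetical, and none of them come from a real standard or from PDAP's work. Only the `NPDRS.2022` version stamp echoes the name floated in the comment.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical allowed values for one column; a real standard would define these.
ALLOWED_OUTCOMES = {"no_injury", "injury", "death"}


@dataclass(frozen=True)
class UseOfForceRecord:
    """One row in a hypothetical standardized use-of-force report."""
    agency_id: str        # stable identifier for the reporting department
    incident_date: date   # when the incident occurred
    outcome: str          # must be one of ALLOWED_OUTCOMES
    standard_version: str = "NPDRS.2022"  # version stamp, as the comment suggests


def validate(record: UseOfForceRecord) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    if not record.agency_id.strip():
        problems.append("agency_id is empty")
    if record.incident_date > date.today():
        problems.append("incident_date is in the future")
    if record.outcome not in ALLOWED_OUTCOMES:
        problems.append(f"unknown outcome: {record.outcome!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a department (or a software vendor) see everything wrong with a record in a single pass.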
debacle over 2 years ago
This is important. Locally, we had a sheriff who was being heavily, heavily criticized due to several deaths at the county facility. This was at the height of the protests a few years ago.

It was a lot of work to find data on policing nationwide, because the question really was "Is the sheriff doing a bad job, or do bad things happen sometimes?"

After some hard work trying to identify other cities with similar socioeconomic circumstances and populations, it became clear that our local sheriff was actually better than average, and that much of the outrage was fabricated.

That's also when I learned that many people don't want to listen to statistics unless they agree with their own preconceptions.
Pelerin over 2 years ago
Thank you for your work with this! One question I have:

You say in your FAQ "We aren't a watchdog - our activism is data collection and accessibility, not analysis or research."

Can you note any instances of other people using your data for analysis or research?
Jcowell over 2 years ago
Posts like this are interesting because you would think HN would be the best target audience for the idea. Even if no one here contributes a single character of code, they can provide insight into pitfalls and experiences they’ve run into when doing this sort of thing. I hope the comment section is fruitful in advice.
josh-pdap over 2 years ago
Hello! I'm the executive director. I have a design background, have done product management in the past, and aside from keeping the lights on at PDAP and making sure we're tax-compliant I am in a product role. I talk to people using police data, and figure out where we can add value to make the data more accessible.

TL;DR: If you want to write scrapers: go for it! Run your scraper, share the results in Discord and with your friends, and talk about the process. We'll be listening, and it will help us build tools to support this important work.

A few things to clarify:

a. The source of truth for "what are we doing right now" and "how can I contribute" is https://docs.pdap.io/.

b. Empowering people who write scrapers is a part of our broad mission of "police data accessibility", but we have some foundational work to do first! Right now our primary project is creating a database of police agencies and data sources. This will help people understand what kinds of data are available, at which agencies, with which steps to access it. It will also help us create archives of the primary sources, so that if they get taken offline we can still go back and scrape them.

c. What we have realized in the past few years: there are already a ton of people writing and using web scrapers for their day-to-day work. They are as decentralized as our police system. Our scrapers repo will reflect that. We shouldn't all rely on one library, or even one language. The people who need the data are most motivated to maintain scrapers, and we expect that maintenance will be ad-hoc and as-needed for the immediate future. In most cases, data already published on the internet is useful to local users as-is.

d. If you have a question you'd like to answer about the police, here's the investigation process:

1. Determine whether public data exists to answer your question. Use Google to find the appropriate agency, and see what they're publishing.
2. Determine how it can be accessed; do you need to make a FOIA request? Is there a URL?
3. If there's a URL, determine whether you need to write a scraper to access the records. Often, the records can simply be downloaded.
4. Write and run a scraper, if you need one!
5. If there's not a URL, make a records request for the public information. This is a long and complicated process.
6. Share the data with your friends.

This means that scrapers are helpful and necessary some of the time; but not always, and not as the first step. We're trying to help with steps 1, 2, 3, 5, and 6. The theory is that writing scrapers is something people can easily slot in and help with; and that, depending on what question you're trying to answer, two scrapers for the same data source might look wildly different.

Scrapers are an important part of the ecosystem, but they're one piece of the puzzle.
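The branching in the investigation process above (is there a URL, and if so, is a scraper needed) can be sketched as a small decision function. This is only an illustration of the steps as described in the comment: the `access_plan` name, its parameters, and the wording of the returned steps are made up for the example and are not PDAP tooling.

```python
def access_plan(has_url: bool, needs_scraper: bool = False) -> list[str]:
    """Return the remaining steps once you know how the records are published."""
    if not has_url:
        # Step 5: nothing is online, so the records have to be requested.
        return ["make a public records request", "share the data"]
    if needs_scraper:
        # Step 4: the records are online but not offered as a simple download.
        return ["write and run a scraper", "share the data"]
    # Step 3: often the records can simply be downloaded.
    return ["download the records", "share the data"]
```

Only one of the three branches involves writing a scraper, which matches the point that scrapers are needed some of the time but not as the first step.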
ben174 over 2 years ago
A while back I created www.bartcrimes.com to publish police reports which were intentionally hidden behind a mailing list you must get approved to be a member of. Turns out, the public loves this kind of thing.
vgeek over 2 years ago
Of all news outlets you'd never expect, USA Today did a good amount of FOIA requests and made them searchable at https://www.usatoday.com/in-depth/news/investigations/2019/04/24/biggest-collection-police-accountability-records-ever-assembled/2299127002/

There are other sources regarding Brady lists like https://giglio-bradylist.com/ and http://bradycops.org/, but they are obviously not 100% complete.
celestialcheese over 2 years ago
For folks who do this kind of disparate data-source scraping at scale, what do best practices look like? What kind of tools are used in industry?

Maintaining scrapers for 18k county websites and PDs is no small task, and looking through the docs for PDAP, it seems like this is still a very open question.
ALittleLight over 2 years ago
I like writing web scrapers and this is an interesting project idea. If I understand right you are looking for volunteers to write scrapers that would take a police department, scrape the PD website, and download any PDFs or documents that gather data about the police department. Is that right? If so, I feel that's not super clearly communicated - I had to look at a couple example scrapers before arriving at this guess.

I do have a few questions too:

1. Will this scale? One problem with scrapers is that they break when people update their website. I'm imagining this problem multiplied by 18,000 and compounded by each scraper potentially being written by a different volunteer.

2. Where are the scrapers getting run?

3. How do the documents that the scrapers collect get transformed into usable data?

4. It seems to me like a scalable solution would be a standard to report data, a law to compel police departments to follow that standard, and then a system to collect that data and make it available. Do you work with police departments at all on data reporting?
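For readers unsure what "scrape the PD website and download any PDFs or documents" might involve, here is a minimal sketch of the link-discovery half, using only the Python standard library. It is a guess at the general shape of such a scraper, not code from the PDAP repo: the `DocumentLinkParser` and `find_document_links` names, the suffix list, and the example URLs are all assumptions. It only inspects the end of each `href`, so links with query strings after the file name would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DocumentLinkParser(HTMLParser):
    """Collect absolute URLs of links that point at document files."""

    # Assumed set of interesting file types; adjust per site.
    DOCUMENT_SUFFIXES = (".pdf", ".csv", ".xlsx")

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare case-insensitively so "REPORT.PDF" is also matched.
        if href and href.lower().endswith(self.DOCUMENT_SUFFIXES):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def find_document_links(html: str, base_url: str) -> list[str]:
    """Return document URLs found in an already-downloaded HTML page."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Keeping the parsing separate from the downloading means the brittle part (the page layout) can be tested against saved HTML without touching the network.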
liamtuohyff over 2 years ago
I was an early helper when I saw that on reddit and joined your slack before you had a discord. I was also one of the ones you mentioned that fizzled out after the initial excitement died down. But I didn't stop helping because the excitement died down. I stopped helping because I felt like we weren't "doing" anything. Other than raising money and getting paperwork in order. Have you guys actually "done" anything in the three years since? Other than, you know, collecting data and sitting around talking about "stuff"
elicash over 2 years ago
I'd be interested in helping scrape, but I have no experience. I'd presume every county is different, so there's no simple training you can put folks through? Are there other tasks, like monitoring for things breaking?
account-5 over 2 years ago
Apologies for my ignorance, but how is this going to police the police? I read the original blog post; there were lots of inferences, coulds, and might-bes, but little in the way of proof of anything. What's to stop the police saying it was just circumstance that provided your results?

I'm not here defending the police, or denigrating the project, just playing devil's advocate. What happens if the police just ignore you?
VWWHFSfQ over 2 years ago
Is it possible to see the data the PDAP has scraped? I visited the website but I don't see any actual data.
contingencies over 2 years ago
I wonder if it is legal / possible to record police radio traffic and associate it with the records?
bckr over 2 years ago
I'm curious if there are opportunities to be a force multiplier here. I see that the Readme says "there's no automated scraper farm" yet. Getting that set up seems crucial. Will jump on the Discord :)
KennyBlanken over 2 years ago
Are you also working on pushing standards for data sources, such as a state-level standard? Ideally federal standards?

Maintaining thousands of scrapers for different formats seems like a nightmare, and it won't take long for departments to learn they can slightly tweak the format of their reporting to cause extra work for you.

On the plus side, working with all this data probably makes you all very qualified to advise on developing standards.
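The "slightly tweak the format" problem can at least be detected cheaply. As a rough sketch, assuming a source publishes CSV and the scraper's author has written down the columns it expects, a header check like the one below would flag a change before bad rows are ingested. The `header_drift` name and the column names in the usage example are hypothetical.

```python
import csv
import io


def header_drift(csv_text: str, expected_columns: list[str]) -> dict[str, list[str]]:
    """Compare a CSV's header row with the columns a scraper expects."""
    reader = csv.reader(io.StringIO(csv_text))
    # Normalize case and surrounding whitespace; an empty file yields no header.
    header = [name.strip().lower() for name in next(reader, [])]
    expected = [name.strip().lower() for name in expected_columns]
    return {
        "missing": [c for c in expected if c not in header],
        "unexpected": [c for c in header if c not in expected],
    }
```

For example, `header_drift("Date,Officer ID,Charge\n", ["date", "officer_id", "charge"])` reports `officer_id` as missing and `officer id` as unexpected, which is the kind of small rename that silently breaks a scraper.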
meteor333 over 2 years ago
Thanks for sharing about your project!

Do you mind giving us a brief on what kind of data you are collecting, and highlighting any interesting findings so far?
motohagiography over 2 years ago
On the back end, are you using a graph? Having done some public sector accountability stuff where the org structures themselves were obfuscated, graphs and a clear data model were the decisive tech.
tmaly over 2 years ago
Is there anything like this for regulatory capture in federal and state governments?

I could imagine a revolving door between people working in the regulatory bodies and the industry they regulate.
lazyasciiart over 2 years ago
Very interesting. I have written scrapers for the jail inmate data in the couple of counties nearest me - does that come under the scope of what you're doing, or not quite?
fabk over 2 years ago
Sounds like one of the data bounties from DoltHub.com. Just thought I should drop this link. I am not affiliated with them.
enviclash over 2 years ago
I would like to do research on the data; is it available as a source? (Email in profile)
RobertRoberts over 2 years ago
Are there any data sources we could scrape to stop crimes in our neighborhoods so the police don't have any reason to come around and cause problems?
RickJWagner over 2 years ago
Also relevant:

So far this year, 177 law enforcement officers have died in the line of duty. Our gratitude should go to all.

https://www.odmp.org/search/year/2022