Hey HN,<p>After months of hard work, I am excited to share the first ever semantic map of Australian law.<p>My map represents the first attempt to map Australian laws, cases and regulations across the Commonwealth, States and Territories semantically, that is, by their underlying meaning.<p>Each point on the map is a unique document in the Open Australian Legal Corpus, the largest open database of Australian law (which, full disclosure, I created). The closer any two points are on the map, the more similar they are in underlying meaning.<p>As I cover in my article, there’s a lot you can learn by mapping Australian law. Some of the most interesting insights to come out of this initiative are that:<p>⦁ Migration, family and substantive criminal law are the most isolated branches of case law on the map;<p>⦁ Migration, family and substantive criminal law are the most distant branches of case law from legislation on the map;<p>⦁ Development law is the closest branch of case law to legislation on the map;<p>⦁ Case law is more of a continuum than a rigidly defined structure and the borders between branches of case law can often be quite porous; and<p>⦁ The map does not reveal any noticeable distinctions between Australian state and federal law, whether it be in style, principles of interpretation or general jurisprudence.<p>If you’re interested in learning more about what the map has to teach us about Australian law or if you’d like to find out how you can create semantic maps of your own, check out the full article on my blog, which provides a detailed analysis of my map and also covers the finer details of how I built it, with code examples offered along the way.
This is great. This sentence struck a chord with me in particular:<p><pre><code> Imagine applying these techniques on the Common Crawl
You would be able to produce a ... map of the internet.
</code></pre>
Making maps of things not usually on maps has been my passion for years, and I have made many of them. One of the more popular ones that some of you might know is the Music-Map:<p><a href="https://www.music-map.com" rel="nofollow">https://www.music-map.com</a><p>I have had the urge to make a map of the web for quite a while, and I have already registered the web-map.com domain for it. I did some experiments, building a custom crawler and an algorithm that finds related websites fast. They showed that the project would be feasible.<p>But I have held back on doing it, because I already run multiple experimental maps and have yet to come up with a business model for "making maps of everything".
> "we can also see that Australian case law is a continuum of sorts"<p>It definitely provides a pretty picture, but I just wanted to emphasise the map !== territory adage. The continuum may instead be a function of the projection, the chosen similarity metric and so on.<p>That does not mean we cannot learn from the map, but the actual 'knowledge structure' of the sum of documents may not be a convenient continuum at all.<p>In any case, the way you've documented this project is remarkable, and it does provide a novel view of the Australian legal sphere. Thanks for sharing!
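To make the point about the similarity metric concrete, here is a minimal sketch. The vectors and the names `doc_a` and `doc_b` are made up for illustration and have nothing to do with the actual embeddings behind the map. It only shows that the "nearest" document to a query can change when the metric changes.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: ignores vector length, compares direction only.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, candidates, metric):
    # Name of the candidate with the smallest distance to the query.
    return min(candidates, key=lambda name: metric(query, candidates[name]))

# Hypothetical 2-D "embeddings", chosen so the two metrics disagree.
query = (1.0, 0.0)
candidates = {"doc_a": (10.0, 0.0), "doc_b": (1.0, 1.0)}

print(nearest(query, candidates, euclidean))        # doc_b
print(nearest(query, candidates, cosine_distance))  # doc_a
```

Same data, two different neighbourhoods, so any structure read off a 2-D map inherits choices like this one.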
I think visualizing it like this is very strange. I am not a legal expert but I have read a lot of law textbooks.<p>Normally, I’d expect blackletter law to form a somewhat sparse, tentacle-like structure.<p>Case law (or “cases” or “jurisprudence”) is by its nature largely interstitial: it consists of judges “filling in the holes” that are left by any unclear meaning (requiring interpretation) of blackletter law, or in some cases by the absence of such.<p>Having case law and blackletter law form two distinct clusters makes no sense to me: I really think it’s a domain modelling error. It’s what I would expect to see if one applied a text similarity measure naively to some data set, without regard for the domain models.
Amazing work. As someone doing self-funded web dev, how do you find the time to work on this? Is this a resume booster, a product/prototype, or just a labor of love? To say the least, this is groundbreaking.<p>I love your technical explanations, even though I started skimming there. It appears this is all built on modern embedding algorithms, plus traditional ML clustering magic. Now that you have the basic data, have you thought about using full generative models for semantic analysis? I.e., “write summaries of this subset of cases and tag them with specific situations or intricacies”, and then do clustering on that? I feel like that’s the natural next computational step, and surely (hopefully?) what the many millions/billions of dollars worth of SWEs that were put to work applying LLMs to case law over the past year in America are up to.<p>The very best projects on here are ones where I’m tempted to ask to collaborate, even though I know I’m already booked up with work through the horizon! I’ll have to console myself with a comment and a very prestigious place in my “inspirations” bookmark folder :)
This is very cool, congratulations.<p>When I was in law school, I sometimes visualized the "common law" as a web of interdependencies. This is a similar visualization, although it doesn't quite capture the dependencies, at least as I have always imagined it.<p>For context, the common law refers to law made by (mostly) appellate judges. Sometimes it's built on top of statutory law (e.g., providing meaning, interpretation, or definition to statutory laws) and sometimes it's completely made up, when there is no law "on point." It's made up in the sense that it's constructed on top of a long trail of historical precedent, sometimes going all the way back to Victorian-era England or even older. Really.<p>(Aside: This is why certain individuals sound so silly when they rail against "judge-made law" in the US. Virtually all law in the US is "judge-made law.")<p>Anyway, the common law has always seemed to me to be amenable to representation as a graph-like structure where nodes are cases or precedents and the edges somehow encode the strength of the support for the precedent. I think judges might think twice about breaking from precedent (which can be virtuous or not, depending on your viewpoint) if they could see a visualization of how strong the precedent is.<p>This representation is a step in that direction and I hope your tech can be extended to other common law countries!
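A rough sketch of the graph idea above, with everything in it assumed for illustration: the case names, the treatment labels and the weights are hypothetical, and this is not how any real citator scores precedent. Nodes are cases, edges are citations, and a precedent's "strength" is just the weighted sum of how later cases treated it.

```python
from collections import defaultdict

# Hypothetical citation edges: (citing case, cited precedent, treatment).
CITATIONS = [
    ("case_b", "case_a", "followed"),
    ("case_c", "case_a", "followed"),
    ("case_d", "case_a", "distinguished"),
    ("case_d", "case_b", "followed"),
]

# Assumed weights: following a precedent supports it, distinguishing it
# weakens it somewhat. Real weights would need legal judgment.
WEIGHTS = {"followed": 1.0, "distinguished": -0.5}

def precedent_strength(citations, weights):
    # Sum the edge weights pointing at each cited precedent.
    strength = defaultdict(float)
    for _citing, cited, treatment in citations:
        strength[cited] += weights.get(treatment, 0.0)
    return dict(strength)

strength = precedent_strength(CITATIONS, WEIGHTS)
print(strength)  # {'case_a': 1.5, 'case_b': 1.0}
```

A real version would presumably also weight by court hierarchy and recency, which this sketch leaves out.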
I've been dealing with some matters in the Australian legal system, for a long while self-represented and self-taught but recently with a solicitor. I've read a number of acts for myself, procedural civil and criminal, and have even run into the invisible wall between legislation and case law.<p>This has been shockingly pertinent to my interests and I thank you for compiling it. My only gripe is that you didn't post it several months earlier, when it would have been most helpful to me ;)
Really nice writeup, I appreciate the work you've put into that in both the descriptive analysis of the data and the technical breakdown of the process.
Last year, I had a similar idea to "map out" case law and legislation in the UK — as usual, though, life got in the way and it's ended up joining my vast collection of half-finished projects. Having read your excellent writeup, I'm now feeling rather inspired to give it another try! :)
I’ve noticed that in many Commonwealth countries there is no official codification of case law, administrative law, and statutory law passed by the legislative body and receiving assent from the executive branch.<p>The US, being a hard fork of the Commonwealth, has the official US Code and state codes (attempts to organize the impacts of case law, admin law, passed law, etc.). Canada has pockets of codification (the Criminal Code), but not all acts of Parliament are organized in a single code. The UK, as far as I can tell, has no such thing in England or Wales. Hong Kong has some semblance of codification with the Basic Law and ordinances. Does Australia have codification at a federal or state level?
This is really awesome, thanks for the work and thanks for sharing.<p>This is a really interesting form of mapping - would you consider doing it for the original occupants' languages, as well?<p>Australian law itself is fascinating - those outliers on the edges of some of the trails are very curious - is this indicating that some of this material is authored, possibly by the same people/groups whose ontology is transferred with each new document?<p>I'd love to see this semantic map for the original occupants' languages.<p>It would also be interesting to see Australia's human rights proclamations and related legislation, as well as its military orders and authorizations for involvement in the 5-eyes catastrophe somehow, semantically, in this context.
Would it be correct to say that calling this a semantic map, clustered by meaning, might be pushing it? If the data are word embeddings, then you'd hope that they have distilled the semantics in the raw text, but as you said yourself, they are also heavily influenced by style and who knows what else, to the point that semantically identical but syntactically different texts might end up in different clusters. Think: if half of the texts were in French, would you keep the same semantic map, or would you have a French continent and an English continent?
This is such an interesting use of semantic representation. I wonder if it could be used to map out cases vs outcomes, and determine sentencing outliers.
Mapping the internet as a whole has been a thing for quite a while, going back to Kumar et al. in 2000.
<a href="https://scholar.google.com/citations?view_op=view_citation&hl=en&user=J_XhIsgAAAAJ&citation_for_view=J_XhIsgAAAAJ:ZeXyd9-uunAC" rel="nofollow">https://scholar.google.com/citations?view_op=view_citation&h...</a><p>I recall at least one of those papers characterizing the shape as resembling a bow-tie.<p>This and other early contributions were looking at the link structure of the internet, not textual similarity, though.
This approach could be used to build a global map of AI and/or data privacy legislation and cases that would be potentially very valuable and useful, particularly for startups.
Great job, I intend to reproduce this on a similar dataset I've been collecting!<p>I will say, it would be great to see the color labeling done on domain url alone, to see how much of the topography of the map is driven simply by the different formatting characteristics of the websites you're gathering data from.
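For what it's worth, the labeling by domain could start from something as small as this. It's only a sketch: the URLs are placeholders on example.org, not real sources from the corpus, and I'm assuming each document carries a source URL. The domain string would then be used as the color key in place of the document type.

```python
from collections import Counter
from urllib.parse import urlparse

def source_domain(url):
    # Host part of the URL, lowercased, to use as a color label.
    return urlparse(url).netloc.lower()

# Placeholder URLs standing in for each document's source URL.
urls = [
    "https://legislation.example.org/act/1",
    "https://courts.example.org/case/2",
    "https://Legislation.example.org/act/3",
]

labels = [source_domain(u) for u in urls]
print(Counter(labels))
# Counter({'legislation.example.org': 2, 'courts.example.org': 1})
```

If the clusters line up with these labels as neatly as they do with document type, formatting is probably doing a lot of the work.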
"My map represents the first attempt to map Australian laws, cases and regulations across the Commonwealth, States and Territories semantically, that is, by their underlying meaning."<p>I think Jade.io has had a go at this, IIRC. This isn't to detract upon your amazing work though, great stuff.
The problem with Australian law, and I suspect most law, is that the practical problems of the actual system appear to be less about the theory and more about the absence of enforcement, oversight and due process.
Cool stuff, reminds me of “a Canadian payroll dependency chart” <a href="https://news.ycombinator.com/item?id=38843388">https://news.ycombinator.com/item?id=38843388</a>
Most of your work seems over my head, but doesn't the "mammoth" example indicate that by tweaking numbers you can end up getting just about any visual blob you want?
Seems like quite a project. And very useful.<p>Australia is the perfect example of when too many well-meaning people who think they can solve everything with more government power are given too much capability to see their vision through to its logical conclusion. It ends up making most of the problems it tries to solve far worse, and nobody has the guts to pull the plug on the programs that aren't functioning.