Hey HN,

Last month, I had the honour of seeing my article on how I built the largest open database of Australian law (https://umarbutler.com/how-i-built-the-largest-open-database-of-australian-law/) reach the front page. Buoyed in large part by the outpouring of support and encouragement I received from HN, I became determined to publish the first open LLM for Australian law by training a model on my database. I'm excited to share that I finally achieved that goal today with the release of Open Australian Legal GPT2, a fine-tune of GPT-2 trained on 37,560 laws and regulations (635,482,112 tokens) taken from my database.

Although it may not be as large as I had originally hoped, I'm still quite proud of the model. It was a struggle to wade through mountains of options to find a training setup that worked, and I now have code I can reuse to train any other causal language model on any dataset. The model is thus a small but important step towards maturing the legal AI field here in Australia.

If you're interested in playing around with the model, you can find it on Hugging Face: https://huggingface.co/umarbutler/open-australian-legal-gpt2
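
For anyone who wants a quick start, here's a minimal sketch of how you might query the model with the Hugging Face transformers library. The model ID is the real one from the link above; the prompt and generation settings are just illustrative:

  # Minimal sketch: load the model via the transformers text-generation
  # pipeline and sample a continuation. Prompt and settings are illustrative.
  from transformers import pipeline, set_seed

  set_seed(42)  # make the sampled output reproducible

  generator = pipeline('text-generation', model='umarbutler/open-australian-legal-gpt2')

  prompt = 'Under the Competition and Consumer Act 2010 (Cth), a corporation must not'
  completions = generator(
      prompt,
      max_new_tokens=50,  # length of the generated continuation
      do_sample=True,     # sample instead of greedy decoding
      top_k=50,
  )
  print(completions[0]['generated_text'])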