Hey HN,

Last month, I had the honour of seeing my article on how I built the largest open database of Australian law (https://umarbutler.com/how-i-built-the-largest-open-database-of-australian-law/) reach the front page. Buoyed in large part by the outpouring of support and encouragement I received from HN, I became determined to publish the first open LLM for Australian law by training a model on my database. I'm excited to share that I finally achieved that goal today with the release of Open Australian Legal GPT2, a fine-tune of GPT-2 trained on 37,560 laws and regulations (635,482,112 tokens) taken from my database.

Although it may not be as large as I had originally hoped, I'm still quite proud of the model. It was a struggle to wade through mountains of options to find a training setup that worked, and I now have code I can reuse to train any other causal language model on any dataset. The model is thus a small but important step towards maturing the legal AI field here in Australia.

If you're interested in playing around with the model, you can find it on Hugging Face: https://huggingface.co/umarbutler/open-australian-legal-gpt2
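
For anyone who wants a quick start, here's a minimal sketch of how you might query the model with the Hugging Face transformers library. The model ID is the real one from the link above; the prompt and generation settings are just illustrative:

  # Minimal sketch: load the model via the transformers text-generation
  # pipeline and sample a continuation. Prompt and settings are illustrative.
  from transformers import pipeline, set_seed

  set_seed(42)  # make the sampled output reproducible

  generator = pipeline('text-generation', model='umarbutler/open-australian-legal-gpt2')

  prompt = 'Under the Competition and Consumer Act 2010 (Cth), a corporation must not'
  completions = generator(
      prompt,
      max_new_tokens=50,  # length of the generated continuation
      do_sample=True,     # sample instead of greedy decoding
      top_k=50,
  )
  print(completions[0]['generated_text'])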