Show HN: EmuBert – the first open encoder model for Australian law

2 points by ubutler 12 months ago
Hey HN, I'm excited to share one of my most ambitious projects yet, EmuBert.

EmuBert is the largest and *most accurate* open-source masked language model for Australian law.

Trained on 180,000 laws, regulations and decisions across six Australian jurisdictions, totalling 1.4 billion tokens, taken from the Open Australian Legal Corpus, the largest open-source database of Australian law, EmuBert is well suited for tasks like:

⦁ Text classification;
⦁ Name extraction;
⦁ Question answering;
⦁ Text similarity;
⦁ Semantic search; and
⦁ Text embedding.

Not only that but, despite only being trained to guess missing words, EmuBert seems to know facts such as that Norfolk Island is an Australian territory (try the prompt, 'Norfolk Island is an Australian <mask>.'), that it is Section 51 of the Constitution that grants Parliament the power to make laws for the peace, order, and good government of the Commonwealth ('Section <mask> of the Constitution grants the Australian Parliament the power to make laws for the peace, order, and good government of the Commonwealth.'), and that the representative of the monarch of Australia is the Governor-General ('The representative of the monarch of Australia is the <mask>-General.').

Finally, EmuBert achieves a perplexity of 2.05 on the Open Australian Legal QA, the first open dataset of Australian legal questions and answers, outperforming all known state-of-the-art masked language models, including RoBERTa, BERT and Legal-BERT.

You can check out EmuBert on Hugging Face here: https://huggingface.co/umarbutler/emubert

The code I used to create EmuBert is also openly available on GitHub: https://github.com/umarbutler/emubert-creator
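For anyone who wants to try the prompts from the post, here is a minimal sketch using the `fill-mask` pipeline from the Hugging Face `transformers` library. The model ID is taken from the Hugging Face URL in the post; the helper names (`top_predictions`, `main`) and the choice of three predictions per prompt are illustrative assumptions, not part of the EmuBert release. It also assumes the tokenizer's mask token is `<mask>`, as the prompts in the post suggest.

```python
# Prompts quoted from the post; each contains exactly one <mask> token.
PROMPTS = [
    "Norfolk Island is an Australian <mask>.",
    "Section <mask> of the Constitution grants the Australian Parliament "
    "the power to make laws for the peace, order, and good government of "
    "the Commonwealth.",
    "The representative of the monarch of Australia is the <mask>-General.",
]

# Model ID as it appears in the Hugging Face URL in the post.
MODEL_ID = "umarbutler/emubert"


def top_predictions(fill_mask, prompt, k=3):
    """Return the k most likely fills for the mask as (token, score) pairs.

    `fill_mask` is any callable with the fill-mask pipeline interface:
    it takes a prompt and `top_k`, and returns a list of dicts that
    include "token_str" and "score".
    """
    results = fill_mask(prompt, top_k=k)
    return [(r["token_str"].strip(), round(r["score"], 3)) for r in results]


def main():
    # Imported here so the helper above works without transformers installed.
    from transformers import pipeline

    # Downloads the model weights on first use.
    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    for prompt in PROMPTS:
        print(prompt)
        for token, score in top_predictions(fill_mask, prompt):
            print(f"  {token!r}: {score}")
```

Calling `main()` needs `transformers` and a backend such as PyTorch installed, plus network access for the first download. The predictions it prints depend on the model, so none are shown here.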

No comments yet.