TechEcho

Show HN: An annotation tool for ML and NLP

76 points by neiman1 almost 4 years ago

9 comments

neiman1 almost 4 years ago
Hey HN!

Markup is an open-source annotation tool for transforming unstructured documents into a structured format that can be used for ML, NLP, etc.

Markup learns as you annotate in order to speed up the process by suggesting complex annotations to you.

There are also a few different in-built tools, including:

- A data generator that helps you to produce synthetic data for training the suggestion model
- An annotator diff tool that helps you to compare annotations produced by multiple annotators

It's still very much a work in progress (and the documentation is severely lacking), but the ultimate goal is to make a tool that's as useful as https://prodi.gy/, without the $400 price tag.
hadsed almost 4 years ago
Beautiful. So many annotation tools focus on "text classification" which assumes you've already got segmented samples. In the real world of documents that's a whole challenge in itself.

Another challenge is that sometimes you're working with PDFs and that means not only ingesting but also displaying. The difficulty is in keeping track of annotations and predictions across the PDF<->text string boundary, both ways.

There are understandably even fewer solutions to that problem because it's a harder UI to build.
kwerk almost 4 years ago
This looks incredible! I've been following doccano for a while but they were still working on active learning. Will you be adding an open source license like MIT?
Delk almost 4 years ago
Looks like an interesting project. Would you have some kind of a summary of the methodology you're using for the annotation suggestions? What kind of learning, and which kinds of features?
forgingahead almost 4 years ago
Really nice tool - thanks for making this! What is your plan for this? Is this a side-project that you'll potentially turn into a business, or is this just a hobby on the side of your full-time job?

Just asking because I think many folks would be happy to pay to support a small ISV to ensure its long-term sustainability. Not via donations, but actual pricing.
hbcondo714 almost 4 years ago
> Document to annotate - The document you intend to annotate (must be .txt file)

Any thoughts on supporting additional file formats? I'm actually interested in annotating HTML files / web pages. It would be great if I could browse for a local HTML file or enter in a URL and the HTML content would be rendered for it to be annotated using the entities.
jclos almost 4 years ago
That's fantastic. I was about to start a project in October building something that's almost completely there already, for a specific use case (annotation of therapy sessions).
rubatuga almost 4 years ago
What are some of your competitors, as well as any other open-source alternatives? What makes your tool better?
slava_kiose almost 4 years ago
Amazing! So many tools, it's very useful. Thanks.