Lilac: Analyze, structure, and clean unstructured data with AI

2 points by nsthorat over 1 year ago

2 comments

nsthorat over 1 year ago
Lilac co-creator here :)

Lilac is an open-source tool that enables AI practitioners to see and quantify their datasets.

Lilac allows users to:

- Browse datasets with unstructured data.
- Enrich unstructured fields with structured metadata using Lilac Signals, for instance near-duplicate and personal information detection. Structured metadata allows us to compute statistics, find problematic slices, and eventually measure changes over time.
- Create and refine Lilac Concepts, which are customizable AI models that can be used to find and score text that matches a concept you have in mind.
- Download the results of the enrichment for downstream applications.

Out of the box, Lilac comes with a set of generally useful Signals and Concepts. This list is not exhaustive, and we will keep working with the OSS community to add more useful enrichments.

Check out the demo on HuggingFace: https://lilacai-lilac.hf.space/
Find us on GitHub: https://github.com/lilacai/lilac
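
To make the "enrich, then compute statistics" idea concrete, here is a rough sketch of what a near-duplicate enrichment could look like in plain Python. This is not Lilac's API or its actual algorithm: the function names (`shingles`, `jaccard`, `near_duplicate_pairs`), the word-shingle approach, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def near_duplicate_pairs(rows: list, threshold: float = 0.5) -> list:
    """Return index pairs (i, j), i < j, whose shingle sets meet the threshold."""
    sets = [shingles(row) for row in rows]
    return [
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical review texts, standing in for an unstructured field.
    reviews = [
        "The movie was great and I loved it",
        "The movie was great and I loved it!",
        "Terrible plot, would not recommend",
    ]
    pairs = near_duplicate_pairs(reviews)
    flagged = {i for pair in pairs for i in pair}
    # The structured metadata (pairs) lets us compute a simple dataset statistic.
    print(pairs, f"{len(flagged) / len(reviews):.0%} of rows have a near-duplicate")
```

The pairwise comparison is quadratic in the number of rows, so it only suits small samples; a real tool would likely use something like MinHash or embeddings to scale.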
sammcgrail over 1 year ago
I really like the tooltips when you hover over the text. Exploring the IMDb dataset is a useful example.