At least they link to the data, but of course a lot of the data is copyrighted and graciously made available on the web, so the name "commons" is misleading.<p>People will use the data as if it were "commons" though.
How is this built? What would the approach be if I wanted to achieve similar results against proprietary data?<p>The referenced articles speak of RAG (retrieval-augmented generation) and RIG (retrieval-interleaved generation), but I wonder whether they factor into fine-tuning the models. AFAIK, RAG doesn't play nicely with structured data.
This is used as grounding data by Google's DataGemma model: <a href="https://blog.google/technology/ai/google-datagemma-ai-llm/" rel="nofollow">https://blog.google/technology/ai/google-datagemma-ai-llm/</a>