While playing with one of the "llamafile" LLMs last night, I had it tell me "there's been 3 cases impacting the 3rd amendment" and then list <i>4</i> made-up references.<p>The "law is copyrighted by us" jerks and the "we could use better search" users of that data are in for a fun confrontation very shortly.<p>There's a mass of legal data that isn't copyrighted: the Federal Register and the Congressional Record are available in lovely formats, and the data goes back a ways.<p>I'm waiting for people to start experimenting with the kind of text mining and concordance work that LLM techniques can do on those data sets. There's already a market for that kind of thing, but the established systems are dinosaurs that will crumble with a push.<p>Cf. "Managing Gigabytes" and trigraph indexing: nifty tricks, but decidedly less than optimal.
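<p>For anyone who hasn't run into the trick: a trigram (trigraph) index maps every 3-character substring to the set of documents containing it. A query intersects the posting lists for its own trigrams, then confirms the surviving candidates with a real substring check. What follows is a minimal Python sketch of that general idea, not the specific scheme described in "Managing Gigabytes". The `TrigramIndex` class, the document IDs, and the toy strings are hypothetical, made up for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of overlapping 3-character substrings of the lowercased text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


class TrigramIndex:
    """Hypothetical toy index: trigram -> set of doc ids, plus the raw docs for verification."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of docs containing it
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for g in trigrams(text):
            self.postings[g].add(doc_id)

    def search(self, query):
        grams = trigrams(query)
        if not grams:
            # Query shorter than 3 chars: no trigram filter possible, so scan every doc.
            candidates = set(self.docs)
        else:
            # Only docs containing *every* query trigram can possibly match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Sharing all trigrams doesn't prove the query appears contiguously,
        # so confirm each candidate with an actual substring check.
        q = query.lower()
        return sorted(d for d in candidates if q in self.docs[d].lower())


# Toy usage with made-up documents.
idx = TrigramIndex()
idx.add("doc-a", "No soldier shall, in time of peace, be quartered in any house")
idx.add("doc-b", "Quartering of soldiers")
idx.add("doc-c", "Appropriations for the fiscal year")
print(idx.search("quarter"))   # ['doc-a', 'doc-b']
print(idx.search("soldiers"))  # ['doc-b']
```

The index only prunes the scan; the final substring check is what keeps false positives out, and that two-stage filter-then-verify shape is a big part of why these schemes are clever but leave room for better approaches.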