I did a similar process for my own chat logs. I have about 11M tokens of logs, and it took 2 days to crunch through all of them with ollama and LLaMA 3.1 8B on my MacBook. Slow, but free.

For each snippet I generated a title, a summary, keywords, and hierarchical topics going up to three levels from the original text (first sketch below). My plan for now is to put them in a vector search engine (second sketch), which, incidentally, Sonnet 3.5 wrote with very little iteration. I want to play around and see how I can organize my ideas with LLMs and make something useful out of all that text.

I really don't know what I will discover. One small insight I've already found: summarization works really well. You can prime Claude with summaries instead of full texts, and it works better than expected. Unlimited context? Maybe.

Another direction of research is building a clean taxonomy. There are thousands of topics, so it's a pretty difficult task, but there must be a way to do it with clustering and LLMs (third sketch below). That's why I generated topic, parent-topic, gp-topic (grandparent), and ggp-topic (great-grandparent) for every snippet. I'd probably hand-edit the top two levels of the taxonomy to give it the right focus.

I'm also integrating my HN and reddit feeds. X is too stingy with its API. Maybe Pocket and my local downloads folder too, since I save/bookmark stuff I like. I could also fold all the papers I'm reading into the corpus. From that it could synthesize a ranked feed aligned with my own interests.
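If anyone wants to try the annotation step, here's a rough sketch assuming the ollama Python client; the model tag, prompt, and JSON keys are illustrative, not exactly what I ran:

    import json
    import ollama  # pip install ollama; assumes a local ollama server is running

    PROMPT = (
        "Return strict JSON with keys: title, summary, keywords, "
        "topic, parent_topic, gp_topic, ggp_topic.\n\nText:\n{text}"
    )

    def annotate(snippet: str) -> dict:
        # format="json" constrains the model to emit parseable JSON
        resp = ollama.chat(
            model="llama3.1:8b",
            messages=[{"role": "user", "content": PROMPT.format(text=snippet)}],
            format="json",
        )
        return json.loads(resp["message"]["content"])

Run that over every snippet and you end up with one small JSON record per chunk, which is all the later steps need.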
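The vector search part can be embarrassingly small. A toy version, assuming ollama's embeddings endpoint with nomic-embed-text as the model (any local embedding model would do), and a made-up TinyIndex class:

    import numpy as np
    import ollama

    def embed(text: str) -> np.ndarray:
        # nomic-embed-text is just one choice of local embedding model
        return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

    class TinyIndex:
        def __init__(self):
            self.vecs, self.docs = [], []

        def add(self, summary: str):
            self.vecs.append(embed(summary))
            self.docs.append(summary)

        def search(self, query: str, k: int = 5):
            # brute-force cosine similarity; fine for a few thousand summaries
            q = embed(query)
            m = np.array(self.vecs)
            sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
            return [self.docs[i] for i in np.argsort(sims)[::-1][:k]]

Indexing the summaries instead of the raw logs keeps it small, and per the priming trick above, the summaries are often all you need to feed back to Claude.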
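For the taxonomy, one plausible recipe: embed the topic labels at one level, cluster them, then ask the model to name each cluster as the next level up. A sketch with scikit-learn's KMeans; the cluster count and naming prompt are guesses you'd tune:

    import numpy as np
    import ollama
    from sklearn.cluster import KMeans

    def embed(text):
        return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

    def lift_level(topics: list[str], n_clusters: int) -> dict[str, list[str]]:
        # Cluster topic labels by embedding; each cluster becomes one parent topic.
        x = np.array([embed(t) for t in topics])
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(x)
        groups: dict[int, list[str]] = {}
        for topic, c in zip(topics, km.labels_):
            groups.setdefault(int(c), []).append(topic)
        named = {}
        for members in groups.values():
            # Ask the local model for a short parent-topic name per cluster.
            resp = ollama.chat(
                model="llama3.1:8b",
                messages=[{"role": "user",
                           "content": "Name a 2-4 word parent topic for: " + ", ".join(members)}],
            )
            named[resp["message"]["content"].strip()] = members
        return named

Repeating that once per level would consolidate the per-snippet topic/parent/gp/ggp labels into one shared hierarchy, with the top two levels left for hand-editing as mentioned.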