TechEcho

Trump Transcripts Full RAG Chatbot

4 points by Beefin 12 months ago

3 comments

spdustin 12 months ago
The PDFs didn't have headers/footers/line numbers chopped off, which makes chunking tough and embeddings less effective. In my own pet project I preprocessed the text heavily before even beginning to generate embeddings. Then came metadata extraction (speaker tags, NER over the transcript including dates and GPEs, etc.) to add a rudimentary feature set that a GPT-3.5 call could match against the initial query in order to trim down the vector search area.

I feel like a lot was left on the table here. 360 pages of court transcripts (large font, generous margins and spacing, etc.) isn't massive at all, and it seems like there was too much reliance on the embeddings and the LLM to magically figure it all out.
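The cleanup this comment describes (dropping line numbers and repeated headers/footers, then attaching speaker tags as metadata) can be sketched in Python. This is a minimal illustration under stated assumptions, not the commenter's actual code: it assumes each PDF page has already been extracted to plain text, that every transcript line starts with a one- or two-digit line number, that running headers/footers repeat verbatim across pages, that page numbers sit alone on a line, and that speakers are introduced by upper-case tags such as "THE COURT:". The function names strip_page_furniture and speaker_turns and the sample pages are hypothetical.

```python
import re
from collections import Counter

# Assumption: every transcript line starts with a 1-2 digit line number, e.g. "12   THE COURT: ..."
LINE_NUMBER = re.compile(r"^\s*\d{1,2}\s+")
# Assumption: speakers are introduced by upper-case tags ending in a colon, e.g. "MR. SMITH:"
SPEAKER_TAG = re.compile(r"^([A-Z][A-Z .'\-]+):\s*(.*)$")


def strip_page_furniture(pages: list[str], min_share: float = 0.5) -> list[list[str]]:
    """Remove line numbers, bare page numbers, and lines repeated on many pages."""
    unnumbered = [
        [LINE_NUMBER.sub("", line).strip() for line in page.splitlines()]
        for page in pages
    ]
    # Count each distinct line once per page so running headers/footers stand out.
    counts = Counter(line for page in unnumbered for line in set(page) if line)
    threshold = max(2, int(min_share * len(pages)))
    return [
        [
            line
            for line in page
            # Drop blanks, bare page numbers, and lines that repeat across pages.
            if line and not line.isdigit() and counts[line] < threshold
        ]
        for page in unnumbered
    ]


def speaker_turns(lines: list[str]) -> list[dict[str, str]]:
    """Group cleaned lines into speaker turns, keeping the speaker as metadata."""
    turns: list[dict[str, str]] = []
    for line in lines:
        match = SPEAKER_TAG.match(line)
        if match:
            turns.append({"speaker": match.group(1).strip(), "text": match.group(2)})
        elif turns:
            # Continuation line: append it to the current speaker's turn.
            turns[-1]["text"] = (turns[-1]["text"] + " " + line).strip()
        # Lines before the first speaker tag are ignored in this sketch.
    return turns


# Hypothetical two-page sample: a running header, numbered lines, a page number.
pages = [
    "PROCEEDINGS\n1   THE COURT: Good morning.\n2   MR. SMITH: Good morning,\n3   Your Honor.\n4501",
    "PROCEEDINGS\n1   THE COURT: Please be seated.\n4502",
]
cleaned = strip_page_furniture(pages)
turns = speaker_turns([line for page in cleaned for line in page])
```

Each turn's speaker field could then be stored next to its embedding and used as a metadata filter to narrow the vector search, in the spirit of the comment. Q./A. examination formatting, common in witness testimony, would need an additional pattern.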
depth1stsearch 12 months ago
“Was trump found guilty?”

“President Trump is found not guilty based on the provided documents and evidence presented. Based on the documents and statements presented, President Trump is not guilty of the charged crimes. The evidence and lack of proof support a not guilty verdict.”

I’m having some doubts about this one
Beefin 12 months ago
Trump's court transcripts were just released, and they're massive: 350 MB across 360 pages, packed with information that experts can dive into.

To help make sense of it all, we built a Q&A RAG chatbot using the extracted contents.

Want to learn how we built it? We used MongoDB's Hybrid Search, coupled with Jina AI's embedding models and OpenAI chatgpt-4-turbo. All orchestrated in a pre-built Mixpeek pipeline.

End-to-end tutorial on how we built it: https://learn.mixpeek.com/trump-court-transcripts-chatbot

Happy searching!
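The post mentions hybrid search but not how the keyword and vector result lists are merged. One common way to merge them is reciprocal rank fusion (RRF), sketched here in plain Python as a generic illustration. It is not MongoDB's or Mixpeek's implementation; the function name reciprocal_rank_fusion, the chunk IDs, and the choice of k = 60 are assumptions for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    result_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion (RRF).

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in (ranks start at 1); k = 60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by a full-text query and a vector query.
keyword_hits = ["chunk-12", "chunk-7", "chunk-3"]
vector_hits = ["chunk-7", "chunk-9", "chunk-12"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # → ['chunk-7', 'chunk-12', 'chunk-9', 'chunk-3']
```

Because RRF uses only ranks, it needs no normalization between full-text relevance scores and cosine similarities, which is the usual reason to prefer it over a weighted sum of raw scores.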