Edit: here is a link to a chatbot covering all Huberman Lab podcast episodes. There are about 50 other chatbots on our homepage, for podcasts such as Lex Fridman, All-in Podcast, and Tim Ferriss.<p><a href="https://addcontext.xyz/search/huberman-lab" rel="nofollow">https://addcontext.xyz/search/huberman-lab</a><p>Original text:<p>My friend and I wanted to do a project with all the new LLM tech that has been released, and this is what we came up with.<p>A little bit of background on how it works:<p>1. The user enters a link to the YouTube playlist they want to chat with.
2. Download all videos in the playlist and store them in GCS.
3. Transcribe all videos using Whisper via replicate.com.
4. Break the transcripts into ~300-token snippets and store them in MongoDB.
5. Embed each snippet using OpenAI’s new embeddings and store results in Pinecone.
6. When a user asks the chatbot a question, we embed the query string via OpenAI, query Pinecone for the ids of the most relevant snippets, and then fetch those snippets from Mongo. We pass them as context into a call to GPT-3, along with a prompt to answer the original query using the context provided.<p>It’s working pretty well so far, and we’ve had close to 50k queries run since we launched in December. We get lots of inbound leads from people looking for related tech or curious about how this can be applied to other domains. We launched a self-service version that about 30 people have used, and we are still looking for other monetization strategies. The hardest part of this project was getting the YouTube download to work properly without hitting rate limits. The actual “AI” work was extremely minimal. If I had to build it again, I would use something like langchain.[1]<p>We would love feedback from the HN community! You can reach us directly at (sam | joel)@addcontext.xyz. Thanks!<p>[1] <a href="https://github.com/hwchase17/langchain">https://github.com/hwchase17/langchain</a>
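<p>For anyone curious what steps 4 to 6 above look like in code, here is a rough, self-contained sketch. It is not our production code, and every name in it is hypothetical. To keep it runnable without API keys, it makes these assumptions: whitespace-separated words stand in for tokens, toy_embed (a hashed bag-of-words) stands in for the OpenAI embedding call, ToyVectorIndex (brute-force search) stands in for Pinecone, and a plain dict stands in for MongoDB. The final GPT-3 call is left out, so the sketch stops at building the prompt.

```python
import hashlib
import math
from collections import Counter

SNIPPET_TOKENS = 300  # the post uses ~300-token snippets


def chunk_transcript(text, size=SNIPPET_TOKENS):
    """Split a transcript into snippets of at most `size` words.

    Assumption: whitespace-separated words approximate tokens here.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text, dims=256):
    """Hypothetical stand-in for an embedding API: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorIndex:
    """Hypothetical stand-in for a vector database, using brute-force search."""

    def __init__(self):
        self._rows = []  # list of (snippet_id, vector)

    def upsert(self, snippet_id, vector):
        self._rows.append((snippet_id, vector))

    def query(self, vector, top_k=3):
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(vector, v)), sid)
                  for sid, v in self._rows]
        scored.sort(reverse=True)
        return [sid for _, sid in scored[:top_k]]


def retrieve(question, index, snippet_store, top_k=3):
    """Embed the question, look up the nearest snippet ids, then fetch the text."""
    ids = index.query(toy_embed(question), top_k=top_k)
    return [snippet_store[i] for i in ids]


def build_prompt(question, snippets):
    """Assemble the text that would be sent to the completion model."""
    context = "\n---\n".join(snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    snippet_store = {  # stand-in for the snippet collection in the document store
        0: "sleep and light exposure in the morning",
        1: "caffeine timing after waking",
        2: "cold plunge protocols",
    }
    index = ToyVectorIndex()
    for sid, text in snippet_store.items():
        index.upsert(sid, toy_embed(text))
    question = "caffeine timing"
    print(build_prompt(question, retrieve(question, index, snippet_store, top_k=1)))
```

<p>Storing only ids and vectors in the index and keeping the snippet text in a separate store mirrors the Pinecone plus Mongo split described above. In a real setup, the two toy pieces would be replaced by calls to the actual embedding and vector-search services.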