Hey HN! I built COSSgpt using videos from the Open Source Founder Podcast [1] and livestreams from COSS Office Hours [2][3].<p>I transcribed the VODs with Whisper and vectorized fixed-size segments of the transcripts with MPNet on Replicate GPUs. I made the segments overlap a little so that meaning spanning a segment boundary isn't lost (rough sketch at the bottom of this comment).<p>Then I indexed the vectors with an in-memory HNSWLib vector store [4] and persisted the entire store to Tigris object storage [5], so the multimedia and vectors are cached across all Fly.io regions (sketch of that step below as well).<p>I built the app in Elixir; it's almost entirely server-side rendered, with minimal diffs sent to the client over WebSockets via Phoenix LiveView. I also used Livebook [6] a ton while building the multimedia processing & ML pipeline. I'm super bullish on Elixir for building web apps and/or MLOps!<p>Let me know what you think :) If you're curious, you can find the code at <a href="https://github.com/algora-io/tv">https://github.com/algora-io/tv</a><p>[1]: <a href="https://algora.io/podcast" rel="nofollow">https://algora.io/podcast</a>
[2]: <a href="https://tv.algora.io/peerrich" rel="nofollow">https://tv.algora.io/peerrich</a>
[3]: <a href="https://tv.algora.io/rfc" rel="nofollow">https://tv.algora.io/rfc</a>
[4]: <a href="https://github.com/nmslib/hnswlib">https://github.com/nmslib/hnswlib</a>
[5]: <a href="https://tigrisdata.com" rel="nofollow">https://tigrisdata.com</a>
[6]: <a href="https://github.com/algora-io/tv/blob/2586950/scripts/cossgpt.livemd">https://github.com/algora-io/tv/blob/2586950/scripts/cossgpt...</a>
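Here's roughly what the chunking step looks like (a simplified sketch, not lifted verbatim from the repo; the segment sizes are illustrative):

<pre><code>defmodule Chunker do
  # fixed-size segments with a small overlap, so meaning that straddles a
  # boundary still shows up intact in at least one segment
  @chunk_size 256
  @overlap 32

  def chunk(transcript) when is_binary(transcript) do
    step = @chunk_size - @overlap

    transcript
    |> String.split()
    |> Enum.chunk_every(@chunk_size, step)
    |> Enum.map(&Enum.join(&1, " "))
  end
end
</code></pre>

Each segment then gets embedded with MPNet (768-dim vectors) running on Replicate.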
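And the indexing/search part, again slightly simplified (this sketch uses the Elixir bindings for hnswlib; `embeddings`, `query_embedding` and the file name are placeholders):

<pre><code># MPNet embeddings are 768-dimensional; max_elements is just an upper bound
{:ok, index} = HNSWLib.Index.new(:cosine, 768, 100_000)

# one row per transcript segment
HNSWLib.Index.add_items(index, Nx.tensor(embeddings, type: :f32))

# nearest segments for a query embedding
{:ok, labels, _dists} =
  HNSWLib.Index.knn_query(index, Nx.tensor(query_embedding, type: :f32), k: 5)

# the whole index serializes to a single file, which gets pushed to Tigris
# (S3-compatible) and pulled back on boot in each Fly.io region
HNSWLib.Index.save_index(index, "cossgpt_index.bin")
{:ok, index} = HNSWLib.Index.load_index(:cosine, 768, "cossgpt_index.bin")
</code></pre>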