Ask HN: Recommendation – Make transcriptions searchable for local knowledgebase

1 point by alexdeloy, 6 months ago
Hello HN,

on my NAS I have a script that runs every night and downloads videos from 25 YouTube channels I find interesting. The original idea was to have the content offline in case it gets delisted or the channel ceases to exist. With the arrival of AI-assisted transcription, I added a cronjob that has been running Whisper on all those files over the past months, and I now have around 10k transcriptions, one per video, on the disks.

The next step, to me, feels like building a knowledge base of some sort to access all the knowledge hidden in those videos, which range from history content to GDC talks and livestreams of other devs. Ideally I would like to search for topic X and get the matching parts of the video transcriptions back. Since Whisper also saves the position, jumping into the video at the relevant time would be a bonus. Ideally this search not only works on word matching but can also find relevant content via some similarity measure.

I still have my trusty Information Retrieval Handbook from university days here and don't shy away from writing something on my own, but I was wondering if there is something out there that already offers such functionality, or at least takes a big part of the workload off me.
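As a rough illustration of the "search for topic X, get the matching part back, jump to the time" idea, here is a minimal sketch of the word-matching baseline. It assumes the transcripts were saved in Whisper's JSON output, where each entry in `segments` carries `start`, `end` (in seconds) and `text`. The names `Chunk`, `load_segments`, `chunk_segments`, `format_timestamp` and `keyword_search` are hypothetical helpers made up for this example, not part of Whisper.

```python
import json
from dataclasses import dataclass


@dataclass
class Chunk:
    video: str    # hypothetical identifier, e.g. the video file name
    start: float  # seconds into the video where the chunk begins
    end: float    # seconds into the video where the chunk ends
    text: str


def load_segments(path):
    """Read the 'segments' list from a Whisper JSON file (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["segments"]


def _merge(video, segs):
    text = " ".join(s["text"].strip() for s in segs)
    return Chunk(video, segs[0]["start"], segs[-1]["end"], text)


def chunk_segments(video, segments, max_seconds=30.0):
    """Merge consecutive segments into chunks spanning at most max_seconds."""
    chunks, current = [], []
    for seg in segments:
        # Flush the running chunk if adding this segment would overrun it.
        if current and seg["end"] - current[0]["start"] > max_seconds:
            chunks.append(_merge(video, current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(video, current))
    return chunks


def format_timestamp(seconds):
    """Render seconds as H:MM:SS, e.g. for seeking in a video player."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def keyword_search(chunks, query):
    """Substring matching: keep chunks that contain every query term."""
    terms = query.lower().split()
    return [c for c in chunks if all(t in c.text.lower() for t in terms)]
```

Each hit carries its own `start`, so the result list can be turned directly into "open this file at `format_timestamp(hit.start)`" links. The 30-second window is an arbitrary choice; larger chunks give more context per hit but less precise jump targets.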

1 comment

kmgrassi, 6 months ago
Pretty cool.

You could embed the transcripts and then they'd be "searchable": https://platform.openai.com/docs/guides/embeddings/embedding-models

I've used Supabase with pgvector to do this. You can even store the position next to the embedding in the db so you could jump to the content in the UI. Ping me at my hn handle at gmail if you want more specifics :).
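To make the "store the position next to the embedding" idea concrete, here is a small in-memory sketch. It is not the commenter's actual setup: `TranscriptIndex` is a hypothetical stand-in for a database table with a vector column, and `toy_embed` is a deliberately crude placeholder (hashed bag of words) so that the example runs without any external service. A real embedding model would have to be swapped in to get matches beyond shared words.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Placeholder embedding: hashed bag of words, L2-normalised.

    NOT semantic. It only stands in for a real embedding model so the
    example is runnable offline.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class TranscriptIndex:
    """Hypothetical in-memory store: one row per transcript chunk."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # (embedding, video, start_seconds, text)

    def add(self, video, start_seconds, text):
        # The position is stored right next to the embedding.
        self.rows.append((self.embed(text), video, start_seconds, text))

    def search(self, query, k=3):
        """Return the k rows with the highest cosine similarity to query."""
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, emb)), video, start, text)
            for emb, video, start, text in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]
```

Because the vectors are normalised, the dot product in `search` is the cosine similarity. Every result comes back as `(score, video, start_seconds, text)`, so a UI can seek straight to `start_seconds` in the matching video. A linear scan like this is fine for a sketch; at 10k videos' worth of chunks, an indexed vector store such as the pgvector setup mentioned above is the more practical route.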