"Widjosumarajzer" = video summarizer<p>It's just a hodgepodge of prototype scripts, but one that I actually used on a few occasions already. Most of the work is manual, but does seem easily run as "fire and forget" with maybe some ways to correct afterwards.<p>First, I'm using the pyannote for speech recognition: it converts audio to text, while being able to discern speakers: SPEAKER_01, _02, etc. The diarization provides nice timestamps, with resolution down to parts of words, which I later use in the minimal UI to quickly skip around, when a text is selected.<p>Next, I'm running a LLM prompt to identify speakers; so if SPEAKER_02 said to SPEAKER_05 "Hey Greg", it will identify SPEAKER_05 = Greg. I think it was my first time using the mistral 7b and I went "wow" out loud, once it got correct.<p>After that, I fill in the holes manually in speaker names and move on to grouping a bunch of text - in order to summarize. That doesn't seem interesting at a glance, but removing the filler words, which there are a ton of in any presentation or meeting, is a huge help. I do it chunk by chunk. I'm leaning here for the best LLM available and often pick the dolphin finetune of mixtral.<p>Last, I summarize those summarizations and slap that on the front of the google doc.<p>I also insert some relevant screenshots in between chunks (might go with some ffmpeg automatic scene change detection in the future).<p>aaand that's it. A doc, that is searchable easily. So, previously I had a bunch of 30 min. to 90 min. meeting recordings and any attempt at searching required a linear scan of files. Now, with a lot of additional prompt messaging I was able to:<p>- create meeting notes, with especially worthwile "what did I promise to send later" points<p>- this is huge: TALK with the transcript. I paste the whole transcript into the mistral 7b with 32k context and simply ask questions and follow-ups. 
No more watching or skimming an hour-long video: just ask the transcript whether there was another round of lay-offs or whether the parking space rules changed.<p>- draw a mermaid sequence diagram of a request flowing across services. It wasn't perfect, but it got me super excited about future possibilities to create or update service documentation based on ad-hoc meetings.<p>I guess everybody is actually trying to build the same thing; it seems like a no-brainer given the current tools' capabilities.
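<p>For anyone curious, here is a rough Python sketch of the "rename speakers, chunk, summarize, then summarize the summaries" part. It is not the actual scripts: the function names, the `(start, speaker, text)` segment layout, the prompt wording and the `max_chars` budget are all made up for illustration, and `llm` is assumed to be any callable that takes a prompt string and returns text (wrap whichever model you like).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical segment layout: (start_seconds, speaker_label, text)
Segment = Tuple[float, str, str]


def rename_speakers(segments: List[Segment], names: Dict[str, str]) -> List[Segment]:
    """Swap diarization labels for real names where known; keep the label otherwise."""
    return [(start, names.get(speaker, speaker), text) for start, speaker, text in segments]


def chunk_segments(segments: List[Segment], max_chars: int) -> List[str]:
    """Group consecutive segments into chunks of roughly max_chars characters."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for start, speaker, text in segments:
        line = f"[{int(start)}s] {speaker}: {text}"
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline joining the lines
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_transcript(
    segments: List[Segment],
    names: Dict[str, str],
    llm: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Summarize each chunk, then summarize the summaries and put that on top."""
    chunks = chunk_segments(rename_speakers(segments, names), max_chars)
    partial = [
        llm("Summarize this meeting excerpt, dropping filler words:\n\n" + chunk)
        for chunk in chunks
    ]
    overall = llm("Combine these partial summaries into one overview:\n\n" + "\n\n".join(partial))
    return overall + "\n\n" + "\n\n".join(partial)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM library, so the smaller model for speaker naming and the bigger one for summaries can be swapped freely.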