Hey folks! We just built a cost-effective, lightweight way to generate audiovisual summaries of videos.<p>* Processes videos up to 12x faster than real time
* Costs less than $0.01 per minute of video
* Combines visual and audio components<p>The goal here was not to build a single end-to-end (E2E) model but something that can actually be used in production while keeping quality relatively high.<p>You can try it out yourself here: <a href="https://www.sievedata.com/functions/sieve/describe">https://www.sievedata.com/functions/sieve/describe</a>
How we built it: <a href="https://www.sievedata.com/blog/describe-video-summary-beta-launch">https://www.sievedata.com/blog/describe-video-summary-beta-l...</a>
The code: <a href="https://github.com/sieve-community/describe">https://github.com/sieve-community/describe</a>
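<p>For a rough sense of what the speed and cost figures above mean in practice, here is a back-of-the-envelope sketch. It assumes "12x faster than real time" means a best-case throughput of 12 minutes of video per minute of processing, and it treats $0.01 per minute as a ceiling. The function name `estimate` and the constants are made up for illustration and are not part of the describe API.

```python
# Back-of-the-envelope bounds derived from the two figures quoted in the post.
# Assumption: "up to 12x faster than real time" = at best 12 video-minutes
# processed per wall-clock minute; "<$0.01 / min" = a per-minute cost ceiling.
MAX_SPEEDUP = 12.0
MAX_COST_PER_MIN_USD = 0.01


def estimate(video_minutes: float) -> tuple[float, float]:
    """Return (best-case processing minutes, upper bound on cost in USD)."""
    if video_minutes < 0:
        raise ValueError("video_minutes must be non-negative")
    best_case_minutes = video_minutes / MAX_SPEEDUP
    cost_ceiling_usd = video_minutes * MAX_COST_PER_MIN_USD
    return best_case_minutes, cost_ceiling_usd


if __name__ == "__main__":
    minutes, cost = estimate(60)
    print(f"60-minute video: {minutes:.1f}+ min to process, under ${cost:.2f}")
```

Under those assumptions, a one-hour video would take at least 5 minutes to process and cost less than $0.60. Actual numbers will depend on the video and the settings used.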