Hey HN,<p>We're all NLP & CV engineers, and we just launched what seems to be the first video embedding model: <a href="https://learn.mixpeek.com/vuse-v1-release/" rel="nofollow">https://learn.mixpeek.com/vuse-v1-release/</a><p>Here are some example semantic queries:<p><a href="https://mixpeek.com/video?q=breaking+the+ice" rel="nofollow">https://mixpeek.com/video?q=breaking+the+ice</a>
<a href="https://mixpeek.com/video?q=human+connection" rel="nofollow">https://mixpeek.com/video?q=human+connection</a><p>As you can see, these aren't literal queries; they're colloquialisms, which simple transcription or object detection models, or even CLIP, can't handle.<p>We deploy in our customers' VPC and give you full control of the embeddings.