Show HN: AnyModal – Train Your Own Multimodal LLMs

8 points by ritabratamaiti 6 months ago
I’ve been working on AnyModal, a framework for integrating different data types (like images and audio) with LLMs. Existing tools felt too limited or task-specific, so I wanted something more flexible. AnyModal makes it easy to combine modalities with minimal setup, whether it’s LaTeX OCR, image captioning, or chest X-ray interpretation.

You can plug in models like ViT for image inputs, project them into a token space for your LLM, and handle tasks like visual question answering or audio captioning. It’s still a work in progress, so feedback or contributions would be great.

GitHub: https://github.com/ritabratamaiti/AnyModal
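To illustrate the "project image features into the LLM's token space" pattern the post describes, here is a minimal PyTorch/Transformers sketch. It is not AnyModal's actual API (see the GitHub repo for that); the class name, model choices (ViT encoder, GPT-2 decoder), and single linear projection are illustrative assumptions.

```python
# Generic sketch of the vision-encoder -> projection -> LLM pattern.
# NOT AnyModal's API; an illustration of the same idea with standard libraries.
import torch
import torch.nn as nn
from transformers import ViTModel, AutoTokenizer, AutoModelForCausalLM

class VisionToLLM(nn.Module):
    def __init__(self, vit_name="google/vit-base-patch16-224-in21k", llm_name="gpt2"):
        super().__init__()
        self.vit = ViTModel.from_pretrained(vit_name)               # image encoder
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)   # text decoder
        # Linear projection from ViT feature space into the LLM's embedding space.
        self.proj = nn.Linear(self.vit.config.hidden_size,
                              self.llm.config.hidden_size)

    def forward(self, pixel_values, input_ids):
        # 1. Encode the image into patch features: (B, num_patches + 1, vit_dim)
        img_feats = self.vit(pixel_values=pixel_values).last_hidden_state
        # 2. Project patch features into "soft tokens" the LLM can consume.
        img_tokens = self.proj(img_feats)                           # (B, N, llm_dim)
        # 3. Embed the text prompt and prepend the image tokens to it.
        txt_embeds = self.llm.get_input_embeddings()(input_ids)     # (B, T, llm_dim)
        inputs_embeds = torch.cat([img_tokens, txt_embeds], dim=1)
        # 4. Run the LLM on the combined sequence (e.g. for captioning or VQA).
        return self.llm(inputs_embeds=inputs_embeds)

# Usage sketch (a random tensor stands in for a real preprocessed image):
tok = AutoTokenizer.from_pretrained("gpt2")
model = VisionToLLM()
ids = tok("Describe the image:", return_tensors="pt").input_ids
out = model(pixel_values=torch.randn(1, 3, 224, 224), input_ids=ids)
print(out.logits.shape)  # (1, num_image_tokens + prompt_length, vocab_size)
```

In practice the projection layer is the main trainable piece; the encoder and LLM can stay frozen or be fine-tuned depending on the task.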

no comments