
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


Show HN: Some blind hackers are bridging IRC to LMMs running locally

201 points by blindgeek, over 1 year ago

9 comments

kgeist, over 1 year ago
Interesting. I've also been running an IRC bot with multimodal capabilities for months now. It's not a real LMM; rather, it's a combination of three models. It uses Llava for images and Whisper for audio. The pipeline is simple: if it finds a URL that looks like an image, it feeds it to Llava (same with audio). Llava's response is injected back into the main LLM (a round robin of Solar 10.7B and Llama 13B), which provides the response in the style of the bot's character (persona) and in the context of the conversation. I run it locally on my RTX 3060 using llama.cpp. It's also able to search Wikipedia and the news (provided by Yahoo RSS), and it can open HTML pages (if it sees a URL that is not an image or audio).

Llava is a surprisingly good model for its size. However, I found that it often hallucinates "2 people in the background" for many images.

I made the bot just to explore how far I could go with local off-the-shelf LLMs; I never thought it could be useful for blind people. Interesting. A practical idea I had in mind was to hook it up to a webcam, so that if something interesting happens in front of my house, the bot can notify me. I guess it could also be useful for blind people if the camera is mounted on the body.
Comment #39213214 not loaded
Comment #39216224 not loaded
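The URL-routing pipeline kgeist describes (image URLs to Llava, audio to Whisper, other URLs fetched as HTML pages, with the auxiliary output injected back into the persona LLM) could be sketched roughly like this. The callables `describe_image`, `transcribe_audio`, `fetch_page`, and `main_llm` are hypothetical stand-ins for the actual model calls, which the comment doesn't show:

```python
import mimetypes

def classify_url(url: str) -> str:
    """Guess what kind of content a URL points to from its extension."""
    kind, _ = mimetypes.guess_type(url)
    if kind and kind.startswith("image/"):
        return "image"
    if kind and kind.startswith("audio/"):
        return "audio"
    return "page"

def handle_url(url, describe_image, transcribe_audio, fetch_page, main_llm):
    """Route a URL to the right auxiliary model, then hand the result
    to the main LLM.

    The four callables are hypothetical stand-ins for Llava, Whisper,
    an HTML fetcher, and the persona LLM respectively.
    """
    kind = classify_url(url)
    if kind == "image":
        context = describe_image(url)      # e.g. a Llava caption
    elif kind == "audio":
        context = transcribe_audio(url)    # e.g. a Whisper transcript
    else:
        context = fetch_page(url)          # plain HTML page text
    # The auxiliary model's output is injected back into the main LLM,
    # which rephrases it in the bot's persona and conversation context.
    return main_llm(f"The linked {kind} contains: {context}. Reply in character.")
```

The routing here is a naive extension check; a real bot would also need to handle URLs without extensions, downloads, and errors.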
codeofdusk, over 1 year ago
I'm also totally blind and, somewhat relatedly, I've built Gptcmd, a small console app to ease GPT conversation and experimentation (see the readme for more on what it does, with an inline demo). Version 2.0 will get GPT vision (image) support:

https://github.com/codeofdusk/gptcmd
simonw, over 1 year ago
I had an interesting conversation the other day about how best to make ChatGPT-style "streaming" interfaces, where text updates as it streams in, accessible to screen readers.

It's not easy! https://fedi.simonwillison.net/@simon/111836275974119220
Comment #39213142 not loaded
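One way to make such a streaming UI less noisy for screen readers (an illustration of the general batching idea, not a fix Simon endorses in the linked thread) is to buffer the token stream and surface only completed sentences, so a live region isn't re-announced on every token. A minimal sketch:

```python
import re

class SentenceBuffer:
    """Accumulate streamed tokens and release only completed sentences.

    A screen-reader live region typically re-announces on every update,
    so flushing whole sentences instead of raw tokens keeps the
    announcements coherent.
    """
    def __init__(self):
        self._buf = ""

    def feed(self, token: str) -> list:
        """Add a token; return any sentences it completes."""
        self._buf += token
        # Split after sentence-ending punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", self._buf)
        self._buf = parts.pop()  # keep the unfinished remainder buffered
        return parts

    def flush(self) -> str:
        """Return whatever is left once the stream ends."""
        leftover, self._buf = self._buf.strip(), ""
        return leftover
```

Each batch returned by `feed` would become one update to the accessible output region; real text needs a smarter sentence splitter than this regex (abbreviations, code spans, and so on).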
jpsouth, over 1 year ago
Hey! I don't understand too much about AI/ML/LLMs (and now LMMs!), so I'm hoping someone could explain a little further for me.

What I gather is that this is an IRC bot/plugin/add-on that lets a user prompt an 'LMM', which is essentially an LLM with multiple output capabilities (text, audio, images, etc.), which on the surface sounds awesome.

How does an LMM benefit blind users over an LLM with voice capability? Is the addition of image/video just for accessibility to non-blind people?

What's the difference between this and integrating an LLM with voice/image/video capability?

Is there any reason this has been made over other available uncensored/free/local LLMs (aside from this being an LMM)?

Thanks in advance.
Comment #39210597 not loaded
Comment #39210520 not loaded
th0ma5, over 1 year ago
Since there's no way to truly objectively tell whether LLM output is correct, this seems like it would have its limits, even if it seems subjectively good. But I have that problem with all of the LLM stuff, I guess.
DustinBrett, over 1 year ago
You could run an LLM in the browser with WebLLM and then connect to IRC via WebSockets using something like KiwiIRC: fully client-side AI on IRC.
BMSR, over 1 year ago
Blind hackers really impress me. I also have an AI bot on IRC, but it uses OpenAI. It's fast, almost instant, but less impressive.
nathias, over 1 year ago
I've been waiting 25 years for this.
xpe, over 1 year ago
If you didn't know: LMM = Large Multimodal Model.
Comment #39226048 not loaded
Comment #39211009 not loaded