Interesting. I've also been running an IRC bot with multimodal capabilities for months now. It's not a real LMM but rather a combination of three models: it uses Llava for images and Whisper for audio. The pipeline is simple (rough sketch below): if the bot finds a URL that looks like an image, it feeds it to Llava (same with audio and Whisper). Llava's response is then injected back into the main LLM (a round robin of Solar 10.7B and Llama 13B), which produces the reply in the style of the bot's character (persona) and in the context of the conversation. I run it locally on my RTX 3060 using llama.cpp. Additionally, it can search Wikipedia and the news (provided by Yahoo RSS) and can open HTML pages (if it sees a URL that is neither an image nor audio).

Llava is a surprisingly good model for its size. However, I found that it often hallucinates "2 people in the background" for many images.

I made the bot just to explore how far I could go with local off-the-shelf LLMs; I never thought it could be useful for blind people, so that's interesting. One practical idea I had in mind was to hook it up to a webcam so that, for example, the bot could notify me when something interesting happens in front of my house. I guess it could also be useful for blind people if the camera were mounted on the body.
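For the curious, the dispatch logic is roughly this (a simplified Python sketch, not the actual code; the helper functions are placeholders for the real calls into llama.cpp, Whisper, and the page fetcher):

    import re

    URL_RE = re.compile(r"https?://\S+")
    IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif", ".webp")
    AUDIO_EXT = (".mp3", ".ogg", ".wav", ".flac")

    # Placeholder helpers: in the real bot these wrap llama.cpp (Llava),
    # Whisper, an HTML fetcher, and the persona prompt template.
    def caption_with_llava(url: str) -> str: ...
    def transcribe_with_whisper(url: str) -> str: ...
    def summarize_html(url: str) -> str: ...
    def build_persona_prompt(nick: str, message: str, notes: list[str]) -> str: ...
    def generate_with_main_llm(prompt: str) -> str: ...  # Solar 10.7B / Llama 13B round robin

    def describe_url(url: str) -> str:
        """Route a URL to the right model and return a text description."""
        path = url.lower().split("?", 1)[0]
        if path.endswith(IMAGE_EXT):
            return caption_with_llava(url)       # image -> Llava caption
        if path.endswith(AUDIO_EXT):
            return transcribe_with_whisper(url)  # audio -> Whisper transcript
        return summarize_html(url)               # anything else -> fetch the page

    def reply(nick: str, message: str) -> str:
        notes = [f"{u}: {describe_url(u)}" for u in URL_RE.findall(message)]
        # The descriptions are injected back into the persona prompt so the
        # main LLM answers in character and in the conversation's context.
        prompt = build_persona_prompt(nick, message, notes)
        return generate_with_main_llm(prompt)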
I'm also totally blind and, somewhat relatedly, I've built Gptcmd, a small console app to make GPT conversation and experimentation easier (see the readme for more on what it does, with an inline demo). Version 2.0 will get GPT vision (image) support:

https://github.com/codeofdusk/gptcmd
I had an interesting conversation the other day about how best to make ChatGPT-style "streaming" interfaces, where text updates as it streams in, accessible to screen readers.

It's not easy! https://fedi.simonwillison.net/@simon/111836275974119220
Hey! I don’t understand too much about AI/ML/LLMs (and now LMMs!), so I’m hoping someone could explain a little further for me.

What I gather is that this is an IRC bot/plugin/add-on that lets a user prompt an ‘LMM’, which is essentially an LLM with multiple input/output capabilities (text, audio, images, etc.), which on the surface sounds awesome.

How does an LMM benefit blind users over an LLM with voice capability? Is the addition of image/video just for accessibility to non-blind people?

What’s the difference between this and integrating an LLM with voice/image/video capability?

Is there any reason this has been made over other available uncensored/free/local LLMs (aside from this being an LMM)?

Thanks in advance.
Since there's no way to truly objectively tell whether LLM output is correct, this seems like it would have its limits, even if the output seems subjectively good. But I have that problem with all of the LLM stuff, I guess.