TechEcho
© 2025 TechEcho. All rights reserved.

AMD Unveils Its First Small Language Model AMD-135M

311 points by figomore, 8 months ago

10 comments

diggan, 8 months ago

> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.

Wow, an actual open source language model (maybe even the first of its kind from a larger company?), including all you need to recreate it from scratch. Thanks AMD!

Available under this funky GitHub organization, it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM
n_ary, 8 months ago

Now this is the beginning of real innovation in AI. With AMD coming in (albeit late and slowly) and Meta improving Llama, we will soon see real adoption and development over the next few thousand days. At this moment, I see OpenAI as the Yahoo of the pre-Google era.
highfrequency, 8 months ago

Looks like they are using sixteen $13k GPUs [1] (around $210k of hardware) for 6 days of training.

Anyone know the recommended cloud provider and equivalent rental price?

[1] https://www.wiredzone.com/shop/product/10025451-supermicro-gpu-amdmi250-oam-0029h-graphics-processing-unit-gpu-instinct-mi250-128gb-hbm2e-amd-100-300000029h-10725?srsltid=AfmBOoqud_fxtvuGVSjxigEx4DSMbozywAE5-dI9GfBsBWNDzLE9-wN2
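The comment's numbers make for easy arithmetic: 16 GPUs running for 6 days is 2,304 GPU-hours. A tiny calculator for the rental question, with a placeholder hourly rate (not a real quote from any provider):

```python
# Back-of-the-envelope rental estimate for the 16-GPU, 6-day run above.
# The $/GPU-hour rate is a made-up placeholder; substitute a real quote
# from whichever cloud provider you're considering.
gpus = 16
days = 6
rate_per_gpu_hour = 2.00  # placeholder USD rate, NOT an actual MI250 price

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours} GPU-hours -> ${cost:,.0f} at ${rate_per_gpu_hour:.2f}/GPU-hour")
```

Whatever the real rate, the GPU-hours figure gives a quick way to compare renting against the ~$210k purchase price.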
benterix, 8 months ago

I'm happy to see a truly open source model.

Actually, AMD has excellent reasons to pursue this kind of development, and I hope they continue.
luyu_wu, 8 months ago

The section on speculative execution is interesting: "This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."

Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.

Very interesting though! I'll be playing around with this on the weekend!
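For context, the speculative-decoding idea that quote refers to (a small draft model proposes several tokens, and the large model verifies them in a single forward pass) can be sketched roughly as below. The draft and verification functions are toy stand-ins, not AMD's implementation:

```python
def draft_model(context, k):
    # Toy stand-in: a small, cheap model would propose k likely next tokens.
    return [len(context) + i + 1 for i in range(k)]

def target_accepts(context, token):
    # Toy stand-in for the large model's check; a real implementation
    # compares draft vs. target token probabilities in one forward pass.
    return token % 5 != 0

def speculative_step(context, k=4):
    """One decoding step: emit the longest verified prefix of the draft."""
    proposed = draft_model(context, k)
    accepted = []
    for tok in proposed:
        if target_accepts(context + accepted, tok):
            accepted.append(tok)
        else:
            break
    # On total rejection, the target model supplies one token itself
    # (represented here by the first proposal), so decoding still progresses.
    return accepted or [proposed[0]]
```

The speed-up comes from amortizing one large-model forward pass over several accepted tokens, so the gain is bounded by the draft length and the acceptance rate; that is why "several orders of magnitude" reads as surprising.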
Decabytes, 8 months ago

Since most people can't run these LLMs locally, I wonder what a setup would look like where we have hyper-tuned models for specific purposes, i.e. a model for code, a model for prose, etc. You have a director model that determines which downstream model should be used and then runs it. That way you can run the models locally without needing beefy GPUs. It's a trade-off of using more disk space vs. needing more VRAM.
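The director idea can be sketched as a cheap router in front of a set of specialist checkpoints. Everything below is hypothetical: the keyword matching stands in for a small classifier model, and the checkpoint names are invented:

```python
# Hypothetical "director model" sketch: route each prompt to a specialist.
SPECIALISTS = {
    "code": "local/slm-code-135m",    # invented checkpoint names, not real
    "prose": "local/slm-prose-135m",  # models you can download
}

def classify(prompt: str) -> str:
    """Crude keyword router standing in for a tiny classifier model."""
    code_markers = ("def ", "class ", "compile", "bug", "function")
    return "code" if any(m in prompt.lower() for m in code_markers) else "prose"

def route(prompt: str) -> str:
    """Pick which specialist checkpoint should handle the prompt."""
    return SPECIALISTS[classify(prompt)]

print(route("Why does this function throw a TypeError?"))
```

This is exactly the disk-vs-VRAM trade the comment describes: all specialists live on disk, but only the director and one specialist need to be resident in memory at a time.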
craftkiller, 8 months ago

I see multiple mentions of NPUs on this page, but it's still not clear to me: is this something that can finally use the NPU in my processor?
loufe, 8 months ago
It&#x27;s always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.
bjt12345, 8 months ago

> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well with AMD architectures, and I've read of many people using JAX instead?
rsolva, 8 months ago
Can this model run on ollama?