Founder here!<p>We're still in stealth, but I'll be able to share details and performance figures soon.<p>Our first product is a bet on transformers. If we're right, there's enormous upside - being transformer-specific lets you get an order of magnitude more compute than more flexible accelerators (GPUs, TPUs).<p>We're hiring - if the EV makes sense for you, reach out at gavin @ etched.ai
I am not buying this at all. But I’m not a hardware guy, so maybe someone can help with why this is not true:<p>- Crypto hardware needed SHA-256, which is basically tons of bitwise operations. That’s way simpler than the tons of matrix ops transformers need.<p>- NVidia wasn’t focused on crypto acceleration as a core competency. They are focused on this, and are already years down the path.<p>- One of the biggest bottlenecks is memory bandwidth. That is also not cheap or simple to do.<p>- Say they do have a great design. What process are they going to build it on? There are some big customers out there waiting for TSMC capacity already.<p>Maybe they have IP and it’s more of a patent play.<p>(I mention crypto only as an example of custom hardware competing with a GPU)
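<p>Back-of-envelope on the "SHA-256 is way simpler" point, just to show the scale difference (a rough sketch; the 70B-parameter figure is an assumption for illustration):
<pre><code>
# One Bitcoin hash attempt: double SHA-256 over an 80-byte header,
# roughly 3 compression calls x 64 rounds x a couple dozen 32-bit adds/rotates/xors.
sha256_ops_per_attempt = 3 * 64 * 25             # ~5e3 simple integer ops, almost no memory traffic

# One generated token from a dense transformer: ~2 FLOPs per parameter.
params = 70e9                                    # assume a 70B-parameter model for illustration
flops_per_token = 2 * params                     # ~1.4e11 FLOPs

print(f"hash attempt : ~{sha256_ops_per_attempt:.0e} integer ops")
print(f"one token    : ~{flops_per_token:.1e} FLOPs, plus a full pass over the weights in memory")
</code></pre>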
Title was a bit of a letdown. I was hoping for a discussion of silicon planar transformers (like, the electrical component), which are of increasing interest in RF ICs. :)
There is a lot going on in the LLM / AI chip space. Most of the big players are focusing on general-purpose AI chips, like Cerebras and Untether. This approach, which I understand to be more like an ASIC, is an interesting market. They give up flexibility but presumably can be made more cheaply. There is also Positron AI in this space, mentioned here:
<a href="https://news.ycombinator.com/item?id=38601761">https://news.ycombinator.com/item?id=38601761</a><p>I'm only peripherally aware of ASICs for bitcoin mining; I have no idea about the economics or cycle times. It would be interesting to see a comparison between bitcoin mining chips and AI chips.<p>One thing I wonder about is that all of AI is very forward-looking, i.e. anticipating there will be applications to warrant building more infrastructure. It may be a tougher sell to convince someone they need to buy a transformer inference chip <i>now</i> as opposed to something more flexible they'll use in an imagined future.
Where did this come from? There is absolutely nothing clickable except 'contact us' which just reloads the same page? There's almost zero information here?
My comment is about the general idea (LLM transformers on a chip), not the particular company, as I have no insight into the latter.<p>Such a chip (with support for LoRA finetuning) would likely be the enabler for next-gen robotics.<p>Right now, there is a growing corpus of papers and demos that show what's possible, but these demos are often a talk-to-a-datacenter ordeal, which is not suitable for any serious production use: too much latency, too much dependency on the Internet.<p>With a low-latency, cost- and energy-efficient way to run finetuned LLMs locally (and keep finetuning based on the specific robot's experience), we can actually make something useful in the real world.
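<p>For reference, the LoRA part is tiny in software terms: freeze the base weights and train a low-rank update, so on-device finetuning only touches a few million parameters. A minimal PyTorch-style sketch (shapes, rank, and names here are illustrative, not any particular chip's API):
<pre><code>
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: y = Wx + (BA)x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen (could even live in ROM/flash)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Base path uses the frozen weights; gradients only flow into A and B.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Wrap one projection of a (hypothetical) on-robot model and finetune just the adapter.
layer = LoRALinear(nn.Linear(4096, 4096))
opt = torch.optim.AdamW([layer.A, layer.B], lr=1e-4)
</code></pre>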
This only tells me we are at peak AI hype, given that products like this have to dress up ASICs as 'Transformers on Chips' or a 'Transformer Supercomputer'.<p>As always, there are no technical reports or in-depth benchmarks, other than an unlabelled chart comparing against Nvidia H100s with little context, and marketing jargon aimed at the untrained eye.<p>It seems that this would tie you to a specific neural net implementation (i.e. llama.cpp as an ASIC) and would require a hardware design change to support another.
Isn't this kinda pigeonholing yourself into one neural network architecture? Are we sure that transformers will take us to the promised land? Chip design is a pretty expensive and time-consuming process, so if a new architecture comes out that is <i>sufficiently</i> different from the current transformer model, wouldn't they have to design a completely new chip? The compute unit design is probably similar from architecture to architecture, so maybe I am misunderstanding...
Could probably go even faster burning GPT-4's weights right into the silicon. No need to even load weights into memory.<p>Granted, that eliminates the ability to update the model. But if you already have a model you like that's not a problem.
Yeah, I call BS on this. This does nothing to address the main issue with autoregressive transformer models (memory bandwidth).<p>GPU compute units are mostly sitting idle these days, waiting for chip cache to receive data from VRAM.<p>This does nothing to solve that.
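<p>The batch-1 napkin math, for anyone who wants the numbers (rough, roughly H100-class figures; a 7B fp16 model is assumed for illustration):
<pre><code>
# Batch-1 decoding is bandwidth-bound: every weight byte must be streamed per token.
hbm_bandwidth   = 3.0e12        # ~3 TB/s of HBM bandwidth (rough)
peak_flops      = 1.0e15        # ~1 PFLOP/s of dense fp16/bf16 compute (rough)
model_bytes     = 7e9 * 2       # 7B params at fp16 -> ~14 GB of weights
flops_per_token = 2 * 7e9       # ~14 GFLOPs per generated token

tok_s_bandwidth = hbm_bandwidth / model_bytes        # upper bound set by memory
tok_s_compute   = peak_flops / flops_per_token       # upper bound set by compute
utilization     = tok_s_bandwidth * flops_per_token / peak_flops

print(f"bandwidth-bound: ~{tok_s_bandwidth:.0f} tok/s, compute-bound: ~{tok_s_compute:.0f} tok/s")
print(f"=> compute units busy roughly {utilization:.1%} of the time at batch size 1")
</code></pre>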
Wow. I wish I could get a computer or VM/VPS with this. Or rent part of one. Use it with quantized models and llama.cpp.<p>Seems like a big part of using these systems effectively is thinking of ways to take advantage of batching. I guess the normal thing is just to handle multiple users' requests simultaneously. But maybe another one could be moving from working with single agents to agent swarms.
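<p>To put rough numbers on the batching point: at batch size B the weights are read from memory once but reused for B tokens, so throughput grows with B until you hit the compute limit. A hedged sketch with the same kind of illustrative figures as above (ignores KV-cache traffic, which also matters):
<pre><code>
hbm_bandwidth   = 3.0e12                     # ~3 TB/s (rough)
peak_flops      = 1.0e15                     # ~1 PFLOP/s dense (rough)
model_bytes     = 7e9 * 2                    # ~14 GB of fp16 weights (assumed 7B model)
flops_per_token = 2 * 7e9                    # ~14 GFLOPs per token

for batch in (1, 8, 64, 256):
    # One decoding step streams the weights once and produces `batch` tokens.
    step_time = max(model_bytes / hbm_bandwidth,               # memory cost, shared by the batch
                    batch * flops_per_token / peak_flops)      # compute cost, scales with the batch
    print(f"batch {batch:4d}: ~{batch / step_time:8.0f} tokens/s total")
</code></pre>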
Interesting how MCTS decoding is called out. That seems entirely like a software aspect, which doesn't depend on a particular chip design?<p>And on the topic of MCTS decoding, I've heard lots of smart people suggest it, but I've yet to see any serious implementation of it. It seems like such an obviously good way to select tokens, you'd think it would be standard in vllm, TGI, llama.cpp, etc. But none of them seem to use it. Perhaps people have tried it and it just doesn't work as well as you would think?
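<p>FWIW, the loop itself is easy to sketch; the catch is that MCTS needs a sequence-level score, not just next-token probabilities, and a good scorer is the hard part. A rough Python sketch (the model.topk_next and score functions are placeholders I'm assuming, not any real library's API):
<pre><code>
import math

class Node:
    def __init__(self, tokens, prior=1.0, parent=None):
        self.tokens, self.prior, self.parent = tokens, prior, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

    def puct(self, c=1.5):
        # PUCT: exploit the average value, explore proportionally to the prior.
        u = c * self.prior * math.sqrt(self.parent.visits) / (1 + self.visits)
        q = self.value_sum / self.visits if self.visits else 0.0
        return q + u

def mcts_decode(model, score, prompt, n_sims=200, topk=8, max_new=32):
    root = Node(list(prompt))
    for _ in range(n_sims):
        node = root
        # 1. Selection: walk down by PUCT until we reach a leaf.
        while node.children:
            node = max(node.children, key=Node.puct)
        # 2. Expansion: add children for the model's top-k next tokens.
        if len(node.tokens) - len(prompt) < max_new:
            for tok, p in model.topk_next(node.tokens, k=topk):   # placeholder API
                node.children.append(Node(node.tokens + [tok], prior=p, parent=node))
        # 3. Evaluation: score the leaf with whatever reward you trust
        #    (a verifier, a reward model, plain log-likelihood, ...).
        value = score(node.tokens)                                # placeholder
        # 4. Backpropagation.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Pick the most-visited continuation.
    return max(root.children, key=lambda n: n.visits).tokens
</code></pre>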
How expensive will this be?<p>100T models on one chip with MCTS search.<p>That is some impressive marketing.<p>I’ll believe it when I see it.<p>Great to see so many hardware startups.<p>Future is deffo accelerated neural nets on hardware.