TechEcho


© 2025 TechEcho. All rights reserved.

Ask HN: How to execute a 180B+ LLM on a Turing machine?

1 point by _ktnd about 1 year ago
Or, equivalently, how would you approach running a 180B+ LLM on a von Neumann computer with, say, 1 MB of main memory and virtually limitless secondary storage? Or do you know of an approach (perhaps one you read about somewhere) that might help run a heavy LLM on virtually any Turing-equivalent device?

Picture this: you're stuck with your "potato computer" (small RAM, no external GPU, very large SSD), and your LLM is saved on an external SSD.

Your task: run that LLM on your "potato PC" and try to achieve reasonable response times (e.g., 1 h to 24 h). Response times of a year or more would be impractical for most use cases.

And on a side note, how would you figure out the response times of a language model on low-end devices (e.g., Raspberry Pi, business laptops, MSP430)? Would you just take basic operations such as linear-algebra primitives as a given and estimate the number of steps from there?

I expect the usual suspects to be brought up in this discussion:

— Memory-mapped I/O, i.e., treating an I/O device such as an SSD as if it were actual RAM (mmap). BTW: `mmap` makes our secondary storage somewhat akin to the infinite tape of a Turing machine.

— "LLM in a flash: Efficient Large Language Model Inference with Limited Memory", https://arxiv.org/html/2312.11514v2 (04 Jan 2024)

— SSD wear and tear
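The side-note question about estimating response times can be sketched as a back-of-envelope calculation. With only 1 MB of RAM, essentially all of the weights must be re-read from the SSD for every generated token, so SSD read bandwidth, not arithmetic, dominates. The figures below (8-bit quantization, ~3 GB/s sequential read) are illustrative assumptions, not measurements:

```python
# Back-of-envelope estimate: with a tiny RAM budget, one full pass over the
# weights is needed per generated token, so time ≈ weight bytes / SSD bandwidth.
# All numbers here are hypothetical, for illustration only.

def seconds_per_token(n_params: float, bytes_per_param: float,
                      ssd_gb_per_s: float) -> float:
    """Lower bound: one full streaming read of the weights per token."""
    weight_bytes = n_params * bytes_per_param
    return weight_bytes / (ssd_gb_per_s * 1e9)

def response_time_hours(n_tokens: int, n_params: float = 180e9,
                        bytes_per_param: float = 1.0,   # 8-bit quantized
                        ssd_gb_per_s: float = 3.0) -> float:
    return n_tokens * seconds_per_token(n_params, bytes_per_param,
                                        ssd_gb_per_s) / 3600.0

if __name__ == "__main__":
    # 180 GB of 8-bit weights at ~3 GB/s sequential read => 60 s/token.
    print(f"{seconds_per_token(180e9, 1.0, 3.0):.0f} s/token")
    # A 256-token answer would then take a few hours -- inside the
    # 1 h to 24 h window the post calls "reasonable".
    print(f"{response_time_hours(256):.1f} h")
```

Under these assumptions the bandwidth bound alone already lands in the 1 h–24 h range, which suggests the question is not hopeless; for an MSP430-class device you would additionally have to count the cost of the arithmetic itself, since there is no hardware float unit to take as a given.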
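The mmap bullet above can be demonstrated in a few lines: the OS pages fixed-size chunks of a (potentially enormous) weight file in and out of a small RAM budget on demand, so the file behaves like slow but effectively unbounded memory. A minimal sketch, using a small stand-in file rather than real model weights:

```python
import mmap
import os
import struct
import tempfile

# Create a stand-in "weight shard": 1000 float32 values on disk.
# (Real weight files would be many GB; the mechanism is the same.)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(struct.pack("<1000f", *range(1000)))
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Random access without ever loading the whole file into RAM:
    # the kernel faults in only the 4 KB page containing weight #777.
    offset = 777 * 4  # float32 is 4 bytes
    (w,) = struct.unpack_from("<f", mm, offset)
    print(w)
    mm.close()

os.remove(path)
```

Note that this only hides the capacity problem, not the latency one: every page fault is an SSD read, and repeated full passes over the weights are exactly the "wear and tear" the post's last bullet worries about.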

no comments
