> which attempt to hide latency

It does an exceptionally good job at these attempts. If you think manually managed caches are fun, read [1] for an illustration of how much effort it takes just to sum an array on an architecture where on-chip RAM is managed manually. Another interesting case was the Cell CPU in the PS3; I don't have hands-on experience, but I've read it was equally hard to develop for.

> A low-level language for such processors would have native vector types of arbitrary lengths.

A low-level language would have native vector types of exactly the same lengths as the underlying hardware. "Arbitrary" is overkill unless the CPU supports arbitrary-length vectors.

Although these are not part of the language standard, all modern C and C++ implementations support them. Specifically, when compiling to AMD64 instructions, the compilers implement the native vector types and vector intrinsics defined by Intel. Same with NEON: all modern compilers implement what ARM has specified (see the first sketch at the end of this comment).

> you must be able to compare two structs using a type-oblivious comparison (e.g., memcmp)

Using memcmp on structures is not necessarily a great idea: the values of padding bytes are unspecified, so two structs whose members are all equal can still compare unequal (second sketch at the end).

> with enough high-level parallelism, you can suspend the threads... The problem with such designs is that C programs tend to have few busy threads.

Not just C programs. User input is serial: it can only interact with one application at a time. Display output is serial: it delivers a sequence of frames at 60 Hz. Web browsers tend to have few busy threads because JavaScript is single-threaded, and streaming parsers/decompressors/decryptors are not parallelizable either.

> ARM's SVE (Scalable Vector Extensions)—and similar work from Berkeley—provides another glimpse at a better interface between program and hardware.

Just because it's different does not automatically make it better. The main problem with scalable vectors is that they seem designed for problems CPUs are no longer solving. For massively parallelizable, vertical-only FP32 and FP64 math, GPGPU is the way to go: an order of magnitude faster while also being much more power efficient. CPU SIMD is used for more than vertical-only math. One case is non-vertical operations, i.e. shuffles; a trivial example is transposing a 4x4 matrix held in 4 registers. Another is operations on very small vectors; CPUs even have a dedicated DPPS instruction for FP32 dot products. For both use cases, scalable vectors make little sense (third sketch at the end).

> a garbage collector becomes a very simple state machine that is trivial to implement in hardware

People tried that a few times, first with Lisp machines, then with Java chips. General-purpose CPUs were better.

> Running C code on such a system would be problematic, so, given the large amount of legacy C code in the world, it would not likely be a commercial success.

Nvidia did precisely that: they made a processor designed purely for compute speed. I wouldn't call them a commercial failure.

[1] https://www.nvidia.com/content/GTC-2010/pdfs/2260_GTC2010.pdf
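To make the intrinsics point concrete, here is a minimal sketch of summing an array with Intel's SSE intrinsics. `sum4` is a made-up name, and the code assumes the length is a multiple of 4; it compiles as-is with GCC, Clang, and MSVC targeting AMD64:

    #include <stddef.h>
    #include <immintrin.h>

    /* Sum an array 4 floats at a time; assumes n is a multiple of 4. */
    float sum4(const float *p, size_t n) {
        __m128 acc = _mm_setzero_ps();
        for (size_t i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
        /* Horizontal reduction of the 4 lanes into lane 0. */
        acc = _mm_add_ps(acc, _mm_movehl_ps(acc, acc));
        acc = _mm_add_ss(acc, _mm_shuffle_ps(acc, acc, 1));
        return _mm_cvtss_f32(acc);
    }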
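The memcmp pitfall from above is easy to demonstrate. A sketch, assuming a typical ABI that inserts 3 padding bytes after `c`:

    #include <stdio.h>
    #include <string.h>

    struct S { char c; int i; };  /* padding after 'c' on typical ABIs */

    int main(void) {
        struct S a, b;
        memset(&a, 0x00, sizeof a);
        memset(&b, 0xFF, sizeof b);  /* different garbage in the padding */
        a.c = b.c = 'x'; a.i = b.i = 42;
        /* All members compare equal, yet memcmp may report a difference
           because the padding bytes differ. */
        printf("%d\n", memcmp(&a, &b, sizeof a));
        return 0;
    }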
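Finally, the two SIMD use cases where scalable vectors fall short, expressed with fixed-width SSE intrinsics. `transpose4x4` and `dot4` are made-up names; the dot product requires SSE4.1:

    #include <xmmintrin.h>  /* _MM_TRANSPOSE4_PS */
    #include <smmintrin.h>  /* _mm_dp_ps, needs SSE4.1 */

    /* Non-vertical math: transpose a 4x4 matrix held in 4 registers. */
    void transpose4x4(__m128 *r0, __m128 *r1, __m128 *r2, __m128 *r3) {
        _MM_TRANSPOSE4_PS(*r0, *r1, *r2, *r3);
    }

    /* Tiny vectors: FP32 dot product in a single DPPS instruction.
       Mask 0xF1: multiply all 4 lanes, sum, put the result in lane 0. */
    float dot4(__m128 a, __m128 b) {
        return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xF1));
    }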