Like NULL, EOF is a source of confusion that can be eliminated via algebraic data types.<p>What if, instead of a char, getchar() returned an Option<char>? Then you could pattern match, something like this Rust/C mashup:<p><pre><code>match getchar() {
    Some(c) => putchar(c),
    None => break,
}
</code></pre>
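For what it's worth, you can sketch the same shape in plain C. This is a toy, not a real API; OptionChar and getchar_opt() are made up for illustration:<p><pre><code>#include <stdbool.h>
#include <stdio.h>

/* Option-like type: the EOF signal lives in the tag,
   not crammed into the value range. */
typedef struct {
    bool some;   /* false plays the role of None (end of input) */
    char value;  /* only meaningful when some is true */
} OptionChar;

OptionChar getchar_opt(void) {
    int c = getchar();        /* classic sentinel API underneath */
    if (c == EOF)
        return (OptionChar){ .some = false };
    return (OptionChar){ .some = true, .value = (char)c };
}

int main(void) {
    for (;;) {
        OptionChar oc = getchar_opt();
        if (!oc.some)         /* the None => break arm */
            break;
        putchar(oc.value);    /* the Some(c) => putchar(c) arm */
    }
    return 0;
}
</code></pre>Of course, in C nothing stops you from reading value without checking some; forcing that check is exactly the enforcement a real Option type adds.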
Magical sentinels crammed into return values — like EOF returned by getchar() or -1 returned by ftell() or NULL returned by malloc() — are one of C's drawbacks.
This is very well explained in the classic book <i>The UNIX Programming Environment</i>, by Kernighan and Pike, on page 44:<p><i>Programs retrieve the data in a file by a system call ... called read. Each time read is called, it returns the next part of a file ... read also says how many bytes of the file were returned, so end of file is assumed when a read says "zero bytes are being returned" ... Actually, it makes sense not to represent end of file by a special byte value, because, as we said earlier, the meaning of the bytes depends on the interpretation of the file. But all files must end, and since all files must be accessed through read, returning zero is an interpretation-independent way to represent the end of a file without introducing a new special character.</i><p>Read what follows in the book if you want to understand Ctrl-D down cold.
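In code, that convention is the familiar read loop. A minimal POSIX sketch (copy_fd is a hypothetical helper name):<p><pre><code>#include <unistd.h>

/* Copy in to out until read() reports "zero bytes returned",
   the interpretation-independent end-of-file signal. */
int copy_fd(int in, int out) {
    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n)
            return -1;        /* treat a short write as an error */
    }
    return n == 0 ? 0 : -1;   /* 0 means clean EOF; negative means error */
}

int main(void) {
    return copy_fd(0, 1) == 0 ? 0 : 1;   /* stdin to stdout, like cat */
}
</code></pre>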
In the beginning, there was the int. In K&R C, before function prototypes, all functions returned "int". ("float" and "double" were kludged in, without checking, at some point.)
So the character I/O functions returned a 16-bit signed int. There was no way to return a byte, or a "char". That left room for out-of-band signals such as EOF.<p>It's an artifact of that era, along with "BREAK", which isn't a character either.
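Hence the canonical K&R copy loop, which only works because c is an int:<p><pre><code>#include <stdio.h>

int main(void) {
    int c;   /* int, not char: must hold all 256 byte values plus EOF */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}
</code></pre>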
Seems like the confusion arises because getchar() (or its equivalent in languages other than C) can produce an out-of-band result, EOF, which is not a character.<p>Procedural programmers don't generally have a problem with this -- getchar() returns an int, after all, so of course it can return non-characters, and did you know that IEEE-754 floating point can represent a "negative zero" that you can use for an error code in functions that return float or double?<p>Functional programmers worry about this much more, and I got a bit of an education a couple of years ago when I dabbled in Haskell, where I engaged with the issue of what to do when a nominally-pure function gets an error.<p>I'm not sure I really <i>got</i> it, but I started thinking a lot more clearly about some programming concepts.
CP/M and DOS use ^Z (0x1A) as an EOF indicator. More modern operating systems use the file length (if available). Unix/Linux will treat ^D (0x04) as EOF within a stream, but only if the source is "cooked" and not "raw". (^D is ASCII "End Of Transmission", or EOT, so that seems appropriate, except in the world of Unicode.)
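The cooked/raw difference is easy to see with a small termios experiment (POSIX-only sketch; run it on a real terminal):<p><pre><code>#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
    struct termios saved, raw;
    tcgetattr(STDIN_FILENO, &saved);
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);   /* leave "cooked" (canonical) mode */
    raw.c_cc[VMIN] = 1;                /* deliver input one byte at a time */
    raw.c_cc[VTIME] = 0;
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    int c = getchar();                 /* press Ctrl-D: you get the 0x04 byte */
    printf("read byte 0x%02x\r\n", c);

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);
    return 0;
}
</code></pre>In cooked mode the same keystroke never reaches the program as a byte; the tty driver just makes the pending read() return.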
The kernel returns EOF "if k is the current file position and m is the size of a file, performing a read() when k >= m..."<p>So, is the length of each file stored as an integer, along with the other metadata? This reminds me of how in JavaScript the length of an array is a property, instead of a function that counts it right then, like say in PHP.<p>Apparently it works. I've never heard of a situation where the file size number did not match the actual file size, nor of a time when the JavaScript array length got messed up. But it seems fragile. File operations would need to be ACID-compliant, like database operations (and likewise do JavaScript array operations). It seems like you would have to guard against race conditions.<p>Does anyone have a favorite resource that explains how such things are implemented safely?
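As to the first question: on POSIX systems, yes, the size is kept as metadata in the inode and is read back with stat(2) rather than by counting bytes. A minimal sketch (the file name is made up):<p><pre><code>#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat st;
    if (stat("example.txt", &st) != 0) {
        perror("stat");
        return 1;
    }
    /* st_size is maintained by the kernel as part of the file's metadata */
    printf("size: %lld bytes\n", (long long)st.st_size);
    return 0;
}
</code></pre>Roughly speaking, on local filesystems the kernel updates the data and the size under the same inode locking, which is why the mismatch you're worried about doesn't happen in practice; the gory details live in each filesystem's journaling and consistency machinery.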
Of course it isn't; you couldn't have arbitrary binary files if one of the 256 possible byte values were reserved.
That's why getchar returns int and not char; one char wouldn't be enough for 257 possible values (256 possible char values + EOF).
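Which also means that stuffing the result into a char is a classic bug. A deliberately wrong sketch of the pitfall:<p><pre><code>#include <stdio.h>

int main(void) {
    char c;                          /* WRONG: a char can't hold 257 values */
    while ((c = getchar()) != EOF)   /* if char is unsigned, EOF never
                                        matches and this loops forever;
                                        if signed, a 0xFF byte in the
                                        input ends the loop early */
        putchar(c);
    return 0;
}
</code></pre>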
Recently (though mine was the only comment): <a href="https://news.ycombinator.com/item?id=22461647" rel="nofollow">https://news.ycombinator.com/item?id=22461647</a>
I find it interesting that Rust's `Read` API for `read_to_end` [1] states that it "Read[s] all bytes until EOF in this source, placing them into buf", and stops either on `Ok(0)` or on various `ErrorKind`s, including `UnexpectedEof`, which arguably should never arise when the whole point is to read until EOF.<p>[1]: <a href="https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end" rel="nofollow">https://doc.rust-lang.org/std/io/trait.Read.html#method.read...</a>
Banged my head against the wall once trying to figure out why Ctrl-D seems to generate some character in bash, yet I couldn't send that character through a pipe to simulate termination.
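A little C sketch shows what's going on: in a pipe, 0x04 is just a byte like any other, and EOF is read() returning zero after the write end closes:<p><pre><code>#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return 1;

    char msg[] = { 'h', 'i', 0x04, '!' };
    write(fds[1], msg, sizeof msg);    /* 0x04 goes through untouched */
    close(fds[1]);                     /* THIS is what produces EOF */

    char buf[16];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        printf("got %zd bytes\n", n);
    puts(n == 0 ? "EOF (write end closed)" : "error");
    return 0;
}
</code></pre>The Ctrl-D magic only exists in the tty driver's cooked mode; there is no byte you can put in a pipe that means "end of file".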
Um, no, you can't use Python to infer that "EOF (as seen in C programs) is not a character".<p>The exception even tells you "chr() arg not in range(0x110000)", which has nothing to do with the range of C's character types.
For me, EOF is a boolean state: either I'm at the end of the file (or stream, memory-mapped region, etc.) or I'm not. That's how I was taught when I started programming. It never occurred to me to think of it as a character.
This strikes me as the sort of pedantic, "I'm witty" clickbait that occasionally percolates upward on HN, especially considering that the specifics of "EOF" are very much contingent on operating context.
\r \n (0x0d 0x0a, or just one of them, or the combination of them, depending on your OS) is EOL<p>^D (0x04) is EOT (End of Transmission) and ^C (0x03) is ETX (End of Text): <a href="https://www.systutorials.com/ascii-table-and-ascii-code/" rel="nofollow">https://www.systutorials.com/ascii-table-and-ascii-code/</a><p>So, kinda, but somehow I'm happy it never got turned into weird combinations depending on the OS.