Like NULL, EOF is a source of confusion that can be eliminated via algebraic data types.<p>What if, instead of a char, getchar() returned an Option<char>? Then you could pattern match, something like this Rust/C mashup:<p><pre><code>match getchar() {
    Some(c) => putchar(c),
    None => break,
}
</code></pre>
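For what it's worth, you can sketch the same shape in plain C. This is a toy, not a real API; OptionChar and getchar_opt() are made up for illustration:<p><pre><code>#include <stdbool.h>
#include <stdio.h>

/* Option-like type: the EOF signal lives in the tag,
   not crammed into the value range. */
typedef struct {
    bool some;   /* false plays the role of None (end of input) */
    char value;  /* only meaningful when some is true */
} OptionChar;

OptionChar getchar_opt(void) {
    int c = getchar();        /* classic sentinel API underneath */
    if (c == EOF)
        return (OptionChar){ .some = false };
    return (OptionChar){ .some = true, .value = (char)c };
}

int main(void) {
    for (;;) {
        OptionChar oc = getchar_opt();
        if (!oc.some)         /* the None => break arm */
            break;
        putchar(oc.value);    /* the Some(c) => putchar(c) arm */
    }
    return 0;
}
</code></pre>Of course, in C nothing stops you from reading value without checking some; forcing that check is exactly the enforcement a real Option type adds.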
Magical sentinels crammed into return values — like EOF returned by getchar() or -1 returned by ftell() or NULL returned by malloc() — are one of C's drawbacks.
This is very well explained in the classic book <i>The UNIX Programming Environment</i>, by Kernighan and Pike, on page 44:<p><i>Programs retrieve the data in a file by a system call ... called read. Each time read is called, it returns the next part of a file ... read also says how many bytes of the file were returned, so end of file is assumed when a read says "zero bytes are being returned" ... Actually, it makes sense not to represent end of file by a special byte value, because, as we said earlier, the meaning of the bytes depends on the interpretation of the file. But all files must end, and since all files must be accessed through read, returning zero is an interpretation-independent way to represent the end of a file without introducing a new special character.</i><p>Read what follows in the book if you want to understand Ctrl-D down cold.
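In code, that convention is the familiar read loop. A minimal POSIX sketch (copy_fd is a hypothetical helper name):<p><pre><code>#include <unistd.h>

/* Copy in to out until read() reports "zero bytes returned",
   the interpretation-independent end-of-file signal. */
int copy_fd(int in, int out) {
    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n)
            return -1;        /* treat a short write as an error */
    }
    return n == 0 ? 0 : -1;   /* 0 means clean EOF; negative means error */
}

int main(void) {
    return copy_fd(0, 1) == 0 ? 0 : 1;   /* stdin to stdout, like cat */
}
</code></pre>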
In the beginning, there was the int. In K&R C, before function prototypes, all functions returned "int". ("float" and "double" were kludged in, without checking, at some point.)
So the character I/O functions returned a 16-bit signed int. There was no way to return a byte, or a "char". That left room for out-of-band signals such as EOF.<p>It's an artifact of that era, along with "BREAK", which isn't a character either.
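Hence the canonical K&R copy loop, which only works because c is an int:<p><pre><code>#include <stdio.h>

int main(void) {
    int c;   /* int, not char: must hold all 256 byte values plus EOF */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}
</code></pre>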
Seems like the confusion arises because getchar() (or its equivalent in languages other than C) can produce an out-of-band result, EOF, which is not a character.<p>Procedural programmers don't generally have a problem with this -- getchar() returns an int, after all, so of course it can return non-characters, and did you know that IEEE-754 floating point can represent a "negative zero" that you can use for an error code in functions that return float or double?<p>Functional programmers worry about this much more, and I got a bit of an education a couple of years ago when I dabbled in Haskell, where I engaged with the issue of what to do when a nominally-pure function gets an error.<p>I'm not sure I really <i>got</i> it, but I started thinking a lot more clearly about some programming concepts.
CP/M and DOS use ^Z (0x1A) as an EOF indicator. More modern operating systems use the file length (if available). Unix/Linux will treat ^D (0x04) as EOF within a stream, but only if the source is "cooked" and not "raw". (^D is ASCII "End Of Transmission", or EOT, so that seems appropriate, except in the world of Unicode.)
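The cooked/raw difference is easy to see with a small termios experiment (POSIX-only sketch; run it on a real terminal):<p><pre><code>#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
    struct termios saved, raw;
    tcgetattr(STDIN_FILENO, &saved);
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);   /* leave "cooked" (canonical) mode */
    raw.c_cc[VMIN] = 1;                /* deliver input one byte at a time */
    raw.c_cc[VTIME] = 0;
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    int c = getchar();                 /* press Ctrl-D: you get the 0x04 byte */
    printf("read byte 0x%02x\r\n", c);

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);
    return 0;
}
</code></pre>In cooked mode the same keystroke never reaches the program as a byte; the tty driver just makes the pending read() return.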
The kernel returns EOF "if k is the current file position and m is the size of a file, performing a read() when k >= m..."<p>So, is the length of each file stored as an integer, along with the other metadata? This reminds me of how in JavaScript the length of an array is a property, instead of a function that counts it right then, like say in PHP.<p>Apparently it works. I've never heard of a situation where the file size number did not match the actual file size, nor of a time when the JavaScript array length got messed up. But it seems fragile. File operations would need to be ACID-compliant, like database operations (and likewise do JavaScript array operations). It seems like you would have to guard against race conditions.<p>Does anyone have a favorite resource that explains how such things are implemented safely?
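As to the first question: on POSIX systems, yes, the size is kept as metadata in the inode and is read back with stat(2) rather than by counting bytes. A minimal sketch (the file name is made up):<p><pre><code>#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat st;
    if (stat("example.txt", &st) != 0) {
        perror("stat");
        return 1;
    }
    /* st_size is maintained by the kernel as part of the file's metadata */
    printf("size: %lld bytes\n", (long long)st.st_size);
    return 0;
}
</code></pre>Roughly speaking, on local filesystems the kernel updates the data and the size under the same inode locking, which is why the mismatch you're worried about doesn't happen in practice; the gory details live in each filesystem's journaling and consistency machinery.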
Of course it isn't; you couldn't have arbitrary binary files if one of the 256 possible byte values were reserved.
That's why getchar returns int and not char; one char wouldn't be enough for 257 possible values (256 possible char values + EOF).
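Which also means that stuffing the result into a char is a classic bug. A deliberately wrong sketch of the pitfall:<p><pre><code>#include <stdio.h>

int main(void) {
    char c;                          /* WRONG: a char can't hold 257 values */
    while ((c = getchar()) != EOF)   /* if char is unsigned, EOF never
                                        matches and this loops forever;
                                        if signed, a 0xFF byte in the
                                        input ends the loop early */
        putchar(c);
    return 0;
}
</code></pre>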
Recently (though mine was the only comment): <a href="https://news.ycombinator.com/item?id=22461647" rel="nofollow">https://news.ycombinator.com/item?id=22461647</a>
I find it interesting that Rust's `Read` API for `read_to_end` [1] states that it "Read[s] all bytes until EOF in this source, placing them into buf", and stops either on `Ok(0)` or on various `ErrorKind`s, including `UnexpectedEof`, which arguably should never arise when the whole point is to read until EOF.<p>[1]: <a href="https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end" rel="nofollow">https://doc.rust-lang.org/std/io/trait.Read.html#method.read...</a>
Banged my head against the wall once trying to figure out why Ctrl-D seems to generate some character in bash, yet I couldn't send that character through a pipe to simulate termination.
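A little C sketch shows what's going on: in a pipe, 0x04 is just a byte like any other, and EOF is read() returning zero after the write end closes:<p><pre><code>#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return 1;

    char msg[] = { 'h', 'i', 0x04, '!' };
    write(fds[1], msg, sizeof msg);    /* 0x04 goes through untouched */
    close(fds[1]);                     /* THIS is what produces EOF */

    char buf[16];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        printf("got %zd bytes\n", n);
    puts(n == 0 ? "EOF (write end closed)" : "error");
    return 0;
}
</code></pre>The Ctrl-D magic only exists in the tty driver's cooked mode; there is no byte you can put in a pipe that means "end of file".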
Um, no, you can't use Python to infer that "EOF (as seen in C programs) is not a character".<p>The exception even tells you "chr() arg not in range(0x110000)", which has nothing to do with the range of C's character types.
For me, EOF is a boolean state: either I'm at the end of the file (or stream, memory-mapped region, etc.) or I'm not. That's how I was taught when I started programming. It never occurred to me to think of it as a character.
This strikes me as the sort of pedantic, "I'm witty" clickbait that occasionally percolates upward on HN, especially considering that the specifics of "EOF" are very much contingent on operating context.
\r \n (0x0d 0x0a, or just one of them, or the combination of them, depending on your OS) is EOL<p>^D (0x04) is EOT (End of Transmission) and ^C (0x03) is ETX (End of Text): <a href="https://www.systutorials.com/ascii-table-and-ascii-code/" rel="nofollow">https://www.systutorials.com/ascii-table-and-ascii-code/</a><p>So, kinda, but somehow I'm happy it never got turned into weird combinations depending on the OS.