API design note: Beware of adding an "Other" enum value

237 points by luu, 2 months ago

28 comments

remram, 2 months ago

Rust has the "non_exhaustive" attribute that lets you declare that an enum might get more variants in the future. In practice that means that when you match on an enum value, you have to add a default case. It's like an "other" variant in the enum, except you can't reference it directly; you use a default case.

IIRC a secret 'other' variant (or '__non_exhaustive' or something) is actually how we did things before non_exhaustive was introduced.

zdw, 2 months ago

I wonder how this aligns with the protobuf best practice of having the first value be UNSPECIFIED: https://protobuf.dev/best-practices/dos-donts/#unspecified-enum

NoboruWataya, 2 months ago

> Just document that the enumeration is open-ended, and programs should treat any unrecognized values as if they were “Other”.

Possibly just showing my lack of knowledge here, but are open-ended enumerations a common thing? I always thought the whole point of an enum is that it is closed-ended.

kstenerud, 2 months ago

I use the "other" technique when it's necessary for the user to be able to mix in their own:

<pre><code>enum WidgetFlavor {
    Vanilla,
    Chocolate,
    Strawberry,
    Other=10000,
};
</code></pre>

Now users can add their own (and are also responsible for making sure it works in all APIs):

<pre><code>enum CustomWidgetFlavor {
    RockyRoad=Other,
    GroovyGrape,
    Cola,
};
</code></pre>

And now you can amend the enum without breaking the client:

<pre><code>enum WidgetFlavor {
    Vanilla,
    Chocolate,
    Strawberry,
    Mint,
    Other=10000,
};
</code></pre>

layer8, 2 months ago

Slight counterpoint: Unless there is some guarantee that the respective enum type will never ever be extended with a new value, each and every case distinction on an enum value needs to consider the case of receiving an unexpected value (like Mint in the example). When case distinctions do adhere to that principle, then the problem described doesn’t arise.

On the other hand, if the above principle is adhered to as it should be, then there is also little benefit in having an Other value. One minor conceivable benefit is that intermediate code can map unsupported values to Other in order to simplify logic in lower-level code. But I agree that it’s usually better not to have it.

A somewhat related topic that comes to mind is error codes. There is a common pattern, used for example by the HTTP status codes, where error codes are organized into categories by using different prefixes. For example, in a five-digit error code scheme, the first three digits might indicate the category (e.g. 123 for “authentication errors”), and the remaining two digits represent a more specific error condition in that category. In that setup, the all-zeros code in each category represents a generic error for that category (i.e. 12300 would be “generic authentication error”).

When implementing code that detects a new error situation not covered by the existing specific error codes, the implementer now has the choice of either introducing a new error code (e.g. 12366; this is analogous to adding a new enum value), which has to be documented and maybe have its message text localized, or else using the generic error code of the appropriate category.

In any case, when error-processing code receives an unknown (maybe newly assigned) error code, it can still map it according to the category. For example, if the above 12366 is unknown, it can be handled like 12300 (e.g. for the purpose of mapping it to a corresponding error message). This is quite similar to the case of having an Other enum value, but with a better justification.
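
The category fallback described above can be sketched in C++ roughly as follows. This is only an illustration under assumptions: the function names `genericCodeFor` and `messageFor`, the message strings, and the code 12301 are made up for the example; only the five-digit scheme and the codes 12300 and 12366 come from the comment.

```cpp
#include <cassert>
#include <string>

// Assumed scheme: first three digits = category, last two = specific
// condition, and "xxx00" is the generic code of the category.
constexpr int genericCodeFor(int code) {
    return (code / 100) * 100;
}

// Hypothetical message table; this build only knows a few codes.
inline std::string messageFor(int code) {
    switch (code) {
        case 12300: return "authentication error";
        case 12301: return "password expired";  // made-up specific code
        default:    break;  // unknown, possibly newly assigned
    }
    const int generic = genericCodeFor(code);
    if (generic != code) {
        return messageFor(generic);  // retry once with the category's generic code
    }
    return "unknown error";  // even the category is unknown
}
```

A receiver built before 12366 existed would then report it with the message for 12300 instead of failing on it.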

qbane, 2 months ago

How about putting Other at the top? You can convince yourself that the value zero (or one if you like) is reserved for unknown values.

dataflow, 2 months ago

I think there are multiple concerns here, and they need to be analyzed separately -- they don't converge to the same solution:

- Naming: "Other" should probably be called "Unrecognized" in these situations. Then users understand that members may not be mutually exclusive.
- ABI: If you need ABI compatibility, the constraint you have is "don't change the meanings of values or members", which is somewhat stronger. The practical implication is that if you do need to have an Other value, its value should be something out of range of possible future values.
- Protocol updates: If you can atomically update all the places where the enum is used, then there's no inherent need to avoid Other values. Instead, you can use compile-time techniques (exhaustive switch statements, compiler warnings, temporarily removing the Other member, grep, clang-query, etc.) to find and update the usage sites at compile time. This requires being a little disciplined in how you use the enum during development, but it's doable.
- Distributed code: If you don't have control over all the code that might use your enum, then you must avoid an Other value, unless you can somehow ensure out-of-band that users have updated their code.
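
One of the compile-time techniques mentioned above, the exhaustive switch, might look like this in C++. It is a sketch under assumptions: the `WidgetFlavor` enumerators are borrowed from elsewhere in the thread, and `flavorName` is a hypothetical helper. The switch deliberately has no `default` label, so GCC and Clang with `-Wswitch` (part of `-Wall` on GCC) warn here when an enumerator is added without a matching case.

```cpp
#include <cassert>
#include <string>

enum class WidgetFlavor { Vanilla, Chocolate, Strawberry };

// No default label on purpose: adding an enumerator (say, Mint) without
// adding a case below triggers a -Wswitch warning at this usage site.
inline std::string flavorName(WidgetFlavor f) {
    switch (f) {
        case WidgetFlavor::Vanilla:    return "Vanilla";
        case WidgetFlavor::Chocolate:  return "Chocolate";
        case WidgetFlavor::Strawberry: return "Strawberry";
    }
    // Reached only for values that are not named enumerators, for example
    // a value produced by a newer peer.
    return "Unrecognized";
}
```

Keeping the fallback outside the switch preserves the warning while still handling values the current build does not know about.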

coin, 2 months ago

Just call it "unknown" or "unspecified", or better yet, use an optional to hold the enum.

KPGv2, 2 months ago

> Rust has the "non_exhaustive" attribute that lets you declare that an enum might get more variants in the future.

Is there a reason, aside from documentation, that this is ever desirable? I rarely program in Rust, but why would this ever be useful in practice, outside of documentation? (Seems like code-as-documentation gone awry when your code is doing nothing but making a statement about future code possibilities.)

jffhn, 2 months ago

>"programs should treat any unrecognized values as if they were “Other”"

Having such an "Other" value does not prevent you from treating the enum as open-ended, and it greatly simplifies all the code that has to deal with potentially invalid or unknown values (no need for a validity flag or null).

That's probably why, in the DIS (Distributed Interactive Simulation) standard, which defines many enums, they all start with OTHER, which has the value zero.

In STANAGs (NATO standards), the value zero is used for NO_STATEMENT, which can also be used when the actual value is in the enum but you can't or don't need to indicate it.

I remember an "architecture astronaut" who claimed that NO_STATEMENT was not a domain value and removed it from all the enums in their application. That did not last long.

That also reminds me of Philippe Kahn (Borland) having, in some presentation, the ellipse extend the circle to add a radius. A scientist said he would do it the other way around, and Kahn replied: "This is exactly the difference between research and industry".

sylware, 2 months ago

As another example: Vulkan made the mistake of using enums in its API.

Now they must make sure each enum is a signed 32-bit type on both 32- and 64-bit systems, which means checking compiler behavior. You can check the code: they always add 0x7fffffff as the last enum value to "force" the compiler and to tell developers (those with enough experience) "hey, this is a signed 32-bit value"... whoopsie!

We should bite the bullet: remove the enums from Vulkan and use the appropriate primitive type for each platform ABI (not API...), so the "fix" should be transparent, as it would not break the ABI. But all the "code generators" that use the Khronos XML specifications, and the static source code, would have to be modified in one shot to stay consistent. This is no small feat.

[NOTE: enum is one of those things that should be removed from the "legacy profile" of C (like tons of keywords, integer promotion, implicit casts, etc.).]

esafak, 2 months ago

Just add a free-form text field to hold the other value, and revise your enum as necessary, while migrating the data.

jasonkester, 2 months ago

This got me wondering what I actually do in practice. I think it's this:

<pre><code>const KnownFlavors = {
    Vanilla: "Vanilla",
    Chocolate: "Chocolate",
    Strawberry: "Strawberry"
};
</code></pre>

Then, use a string to hold the actual value.

<pre><code>doug.favoriteFlavor = KnownFlavors.Chocolate;
cindy.favoriteFlavor = "Mint";

switch (doug.favoriteFlavor) {
    case KnownFlavors.Chocolate:
        // handle a known flavor
        break;
}
</code></pre>

Expand your list of known flavors whenever you like; your system will still always hold valid data. You get all the benefits of typo-proofing your code, switching on an enum, etc., without having to pile on any wackiness to fool your compiler or keep the data normalized.

It acknowledges the reality that a non-exhaustive enum isn’t really an enum. It’s just a list of things that people might type into that field.

akamoonknight, 2 months ago

One of the tactics I end up using in Verilog, for better or worse, is to define enums with a '0 value (repeat 0s for the size of the variable) and a '1 value (repeat 1s for the size of the variable).

'0 stays "null"-like (e.g. INVALID), and '1 (which would be 0xFF in an 8-bit byte, for instance) becomes "something, but I'm not sure what" (e.g. UNKNOWN).

It definitely has the same issues as referenced when needing to grow the variable, and the times where it's useful aren't super common, but I do feel like the general concept of an unknown-but-not-invalid value can help with tracking down errors in processing chains.

I definitely do run into the need to "beware" with enums, though.

o11c, 2 months ago

The approach in the link is fine for consumers, but for producers you really do need some way of saying "create a value that's not one of the known values". Still, there's nothing that says this needs to be pretty.

shortrounddev2, 2 months ago

On a similar note: what do you think is best practice for reserving memory in a struct for future usage? For example, if you have a binary file format with a header like this:

<pre><code>struct Header {
    char waterMark[3];
    uint16_t width;
    uint16_t height;
    uint8_t reserved[16];
};
</code></pre>

The idea is to future-proof v1 binaries so they stay compatible with v2, by adding empty padding in "reserved" that lets you add fields in the future. I do this sometimes and always wonder if there are other philosophies on it.
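
One common convention for such reserved fields is that writers zero them and readers treat non-zero bytes as "written by a newer version". The sketch below illustrates that convention only; it is not an answer from the thread. The helpers `makeHeaderV1` and `reservedIsZero` and the watermark bytes "WDG" are hypothetical, and the `static_assert` assumes a typical ABI where `uint16_t` is 2-byte aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

struct Header {
    char          waterMark[3];
    // Typical ABIs insert one padding byte here so width is 2-byte aligned.
    std::uint16_t width;
    std::uint16_t height;
    std::uint8_t  reserved[16];
};

// Assumption: 3 + 1 (padding) + 2 + 2 + 16 bytes on common ABIs.
static_assert(sizeof(Header) == 24, "unexpected layout; check padding on this ABI");

// Hypothetical v1 writer: value-initialization zeroes every member,
// including the reserved bytes.
inline Header makeHeaderV1(std::uint16_t width, std::uint16_t height) {
    Header h{};
    h.waterMark[0] = 'W';  // made-up watermark bytes
    h.waterMark[1] = 'D';
    h.waterMark[2] = 'G';
    h.width = width;
    h.height = height;
    return h;
}

// True when nothing has been written into the reserved area, i.e. the
// file carries no fields that a v1 reader does not know about.
inline bool reservedIsZero(const Header& h) {
    return std::all_of(std::begin(h.reserved), std::end(h.reserved),
                       [](std::uint8_t b) { return b == 0; });
}
```

Note the implicit padding byte after `waterMark`: for an on-disk format it is usually safer to make such bytes an explicit field than to rely on the compiler's layout.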

mkleczek, 2 months ago

I've had a short discussion with Brian Goetz about a similar case (sealed types in Java): https://mail.openjdk.org/pipermail/amber-dev/2020-April/005844.html

I wonder when we are going to rediscover OOP-style dynamic dispatch (or even better: multiple dispatch) to deal with software evolution.

oytis, 2 months ago

Worth noting that in C and C++, an enum-typed variable holding a value not in the enum is UB. Had some funny bugs because of that.
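
A way to sidestep this class of bug is to validate a raw integer before converting it to the enum type. The following is a minimal sketch, with `Flavor` and `toFlavor` as hypothetical names. For context, in C++17 and later the undefined case is specifically a `static_cast` of a value outside the enumeration's value range when the enum has no fixed underlying type; the enum below fixes its underlying type to `int`, so the cast itself is well defined and the check is about rejecting unknown values.

```cpp
#include <cassert>
#include <optional>

// Fixed underlying type: any int can be stored without undefined behavior.
enum class Flavor : int { Vanilla = 0, Chocolate = 1, Strawberry = 2 };

// Hypothetical helper: accepts a raw integer (e.g. read from a file or
// the network) only if it matches a known enumerator.
inline std::optional<Flavor> toFlavor(int raw) {
    switch (raw) {
        case static_cast<int>(Flavor::Vanilla):
        case static_cast<int>(Flavor::Chocolate):
        case static_cast<int>(Flavor::Strawberry):
            return static_cast<Flavor>(raw);
        default:
            return std::nullopt;  // unknown value: the caller decides what to do
    }
}
```

Callers then have to handle the empty case explicitly instead of switching on a value they never checked.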

sgondala_ycapp, 2 months ago

Random tidbit: We use an LLM to identify document types and use an enum to show a list of options.

Initially, we didn’t include an "Other" category, which led the LLM to force-fit documents into existing types even when they didn’t belong. Obviously this wasn't the LLM's fault.

We realized the mistake and added "Other". This significantly improved output accuracy!

spjt, 2 months ago

I guess I just don't see much value in using enums at all. My most memorable experience with them is when our production system went down because a third-party service added a value to an enum.

bob1029, 2 months ago

Making things into enums that shouldn't be enums is a fun trap to fall into. Much of the time what you really want is a complex type so that you can communicate these additional facts. In this case I'd do something like:

<pre><code>class Widget
{
    WidgetFlavor Flavor; // Undefined, Vanilla, Chocolate, Strawberry
    string? OtherFlavor;
}
</code></pre>

This is easy to work with from a consumer standpoint, because if you have a deviant flavor to specify, you don't bother setting the Flavor member to anything at all. You just set OtherFlavor. Fewer moving pieces == less chance for bad times.

The first (default) member in an enum should generally be something approximating "Undefined". This also makes working with serializers and databases easier.

delduca, 2 months ago

SDL does this trick: https://wiki.libsdl.org/SDL2/SDL_EventType

hello12343214, 2 months ago

Good idea. I appreciate that he thought through future compatibility with old versions.

moomin, 2 months ago

Also Microsoft: your enum should have an explicit Unknown entry with value 0.

vadim_phystech, 2 months ago

...since the set of all possible behaviour that is not specified is much greater and denser than one would initially feel and assume, one might cause lots of possible bad outcomes and success-breaking points if one uses an "Other" type in their API. Because "Other" is the first place to look for vulnerabilities, for attack vectors. Because the spirit of UB the Terrible lurks there! The spirit of UB feeds upon the juices of the "Other" omnimorphic (fel) type! A foul, shapeless "ANY" type! Depravity and disharmony! Decay and reducing heteromorphisms! Decomposition, descriptive semantic matrix rank reduction, richness degradation, devolution... impoverishment... scarcity pressure increase...</shutting_the_fuck_up_my_wetware_machine_whispering_kek>

_3u10, 2 months ago

I usually use Unknown / Other as 0.

1oooqooq, 2 months ago

jr: add other option

sr: omit other option

illuminated: add other option in front end only and alert when the backend crashes.

DonHopkins, 2 months ago

https://en.wikipedia.org/wiki/Tony_Hoare#Research_and_career

>Speaking at a software conference in 2009, Tony Hoare apologized for inventing the null reference, his "Billion Dollar Mistake":

>"I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years." -Tony Hoare

Anders Hejlsberg brilliantly points out how JavaScript doubled the cost of that mistake:

>"My favorite is always the Billion-Dollar Mistake of having null in the language. And since JavaScript has both null and undefined, it's the Two-Billion-Dollar Mistake." -Anders Hejlsberg

>"It is by far the most problematic part of language design. And it's a single value that -- ha ha ha ha -- that if only that wasn't there, imagine all the problems we wouldn't have, right? If type systems were designed that way. And some type systems are, and some type systems are getting there, but boy, trying to retrofit that on top of a type system that has null in the first place is quite an undertaking." -Anders Hejlsberg

The JavaScript Equality Table shows how Brendan Eich simply doesn't understand equality for either data types or human beings and their right to freely choose who they love and marry:

https://dorey.github.io/JavaScript-Equality-Table/

Do any languages implement the full Rumsfeld Awareness–Understanding Matrix Agnoiology, quadrupling the cost?

Why stop at null, when you can have both null and undefined? Throw in unknown, and you've got a hat trick, a holy trinity of nihilistic ignorance, nothingness, and void! The Rumsfeld Awareness–Understanding Matrix Agnoiology breaks knowledge down into known knowns, plus the three different types of unknowns:

https://en.wikipedia.org/wiki/There_are_unknown_unknowns

>"Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones." -Donald Rumsfeld

1) Known knowns: These are the things we know that we know. They represent the clear, confirmed knowledge that can be easily communicated and utilized in decision-making.

2) Known unknowns: These are the things we know we do not know. This category acknowledges the presence of uncertainties or gaps in our knowledge that are recognized and can be specifically identified.

3) Unknown knowns: Things we are not aware of but do understand or know implicitly.

4) Unknown unknowns: These are the things we do not know we do not know. This category represents unforeseen challenges and surprises, indicating a deeper level of ignorance where we are unaware of our lack of knowledge.

https://en.wikipedia.org/wiki/Agnoiology

>Agnoiology (from the Greek ἀγνοέω, meaning ignorance) is the theoretical study of the quality and conditions of ignorance, and in particular of what can truly be considered "unknowable" (as distinct from "unknown"). The term was coined by James Frederick Ferrier, in his Institutes of Metaphysic (1854), as a foil to the theory of knowledge, or epistemology.

I don't know if you know, but Microsoft COM hinges on the IUnknown interface. Microsoft COM's IUnknown interface takes the Rumsfeldian principle to heart: it doesn't assume what an object is but provides a structured way to query for knowledge (or interfaces). In a way, it models known unknowns, since a caller knows that an interface might exist but must explicitly ask if it does.

Then there's Schulz's Known Nothing Nesiology, representing the existential conclusion of all this: when knowledge itself is questioned, where does that leave us? Right back at JavaScript's Equality Table, which remains an unfathomable unknown unknown to Brendan Eich and his well known but knowingly ignorant War on Equality.

https://www.youtube.com/watch?v=HblPucwN-m0

Nescience vs. Ignorance (on semantics and moral accountability):

https://cognitive-liberty.online/nescience-vs-ignorance/

>From a psycholinguistic vantage point, the term “ignorance” and the term “nescience” have very different semantic connotations. The term ignorance is more generally more widely colloquially utilized than the term nescience and it is often wrongly used in contexts where the word nescience would be appropriate. “Ignorance” is associated with “the act of ignoring”. Per contrast, “nescience” means “to not know” (viz., Latin prefix ne = not, and the verb scire = “to know”; cf. the etymology of the word “science”/prescience).

>As Mark Passio points out, the important underlying question which can be derived from this semantic distinction pertains to whether our individual and global problems are caused by “ignorance” or “nescience”? That is, “ignoring” or “not knowing”? It seems clear that it is the later. We know about the truth but we actively ignore it for the most part. Currently people have all the necessary information available (literally at their fingertips). Ignoring the facts is a decision, an irrational decision, and people can be held accountable for this decision. Nescience, on the other hand, acquits from accountability (i.e., someone cannot be held accountable when he/she for not knowing something but for ignoring something). Quasi-Freudian suppression plays a pivotal role in this scenario. Suppression is very costly in energetic terms. The energy and effort which is used for suppression lacks elsewhere (cf. prefrontal executive control is based on limited cognitive resources). The suppression of truth through the act of active ignoring thus has negative implications on multiple levels – on the individual and the societal level, the cognitive and the political, the psychological and the physiological.

Brendan: While we can measure the economic consequences of your culpably ignorant mistakes of both bad programming language design and marriage inequality in billions of dollars, the emotional, social, and moral costs of the latter -- like diminished human dignity and the perpetuation of discrimination -- are, by their very nature, priceless.

Ultimately, these deeper impacts underscore that the fight for marriage equality, defending against the offensive uninvited invasion of your War on Equality into other people's marriages, is about much more than economics; it’s about ensuring fairness, respect, and equality for all members of society.