Even worse is the all-decimal MAC problem.<p>Some genius decided that, to make time input convenient, YAML would parse HH:MM:SS as SS + 60×MM + 60×60×HH. So you could enter 1:23:45 and it would give you the correct number of seconds in 1 hour, 23 minutes, and 45 seconds.<p>They neglected to put a maximum on the number of such sexagesimal places, so if you put, say, six numbers separated by colons like this, it would be parsed as a very large integer.<p>Imagine my surprise when, while working at a networking company, we had some devices which failed to configure their MAC addresses in YAML! After this YAML config file had been working for literal years! (I believe this was via netplan? It's been like a decade, I don't remember.)<p>Turns out, if an unquoted MAC address had even a single non-decimal hex digit, it would do what we expected (parse as a string). This is not only by FAR the more common case, but also we had an A in our vendor prefix, so we never ran into this "feature" during initial development.<p>Then one day we ran out of MAC addresses and got a new vendor prefix. This time it didn't have any letters in it. Hilarity ensued.<p>(This behavior has thankfully been removed in more recent YAML standards.)
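A rough sketch of the rule described above, in plain Python. The regular expression only approximates the YAML 1.1 base-60 integer pattern, and the function name `resolve_yaml11_scalar` is made up for illustration; real parsers differ in the details.

```python
import re

# Approximation of the YAML 1.1 base-60 ("sexagesimal") integer pattern:
# a leading number followed by one or more ":NN" groups, with no upper
# bound on how many groups may appear.
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def resolve_yaml11_scalar(text):
    """Return an int if text looks like a base-60 number, else the string."""
    if not SEXAGESIMAL.fullmatch(text):
        return text
    sign = -1 if text.startswith("-") else 1
    value = 0
    for part in text.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(part)
    return sign * value


print(resolve_yaml11_scalar("1:23:45"))            # 5025 seconds
print(resolve_yaml11_scalar("12:34:56:12:34:56"))  # all-decimal MAC becomes an int
print(resolve_yaml11_scalar("1a:34:56:12:34:56"))  # one hex letter keeps it a string
```

A single hex letter anywhere in the address makes the pattern fail, which is why a vendor prefix containing an A hid the problem for years.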
Perl has a <i>Poland Problem</i>. The customary file extension for Perl files is *.pl. This worked well until Apache introduced content negotiation and the convention to add a language code as file extension. It had index.html.en, index.html.de, for example.<p>index.html.pl is where the problem started and the reason why the officially recommended file extension for Perl files used to be (still is?) *.plx.<p>I don't have the Camel book at hand, but Randal Schwartz's <i>Learning Perl</i> 5th edition says:<p><i>"Perl doesn't require any special kind of filename or extension, and it's better not to use an extension at all. But some systems may require an extension like plx (meaning PerL eXecutable); see your system's release notes for more information."</i>
Programming with string templates, in a highly complex and footgun-rich markup language, is one of the things I find most off-putting about the DevOps ecosystem.
"The limits of my keyboard mean the limits of my programming language."<p>If only they had had ⊥ and ⊤ somewhere on their keys to work with Booleans directly while designing the languages. In another branch of history, perchance.[1]<p>[1] <a href="https://en.wikipedia.org/wiki/APL_(programming_language)#/media/File:APL-keybd2.svg" rel="nofollow">https://en.wikipedia.org/wiki/APL_(programming_language)#/me...</a>
Pandas has a Namibia problem, where the country code NA -> NaN.<p>It's not that bad, because you can explicitly turn that behavior off, but ask me how I know =(
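For anyone who hasn't hit this yet, a small sketch of the behaviour and the opt-out, assuming pandas is installed. The CSV data is made up; "NA" is the ISO 3166-1 alpha-2 code for Namibia.

```python
import io

import pandas as pd

# Made-up sample data; "NA" is the ISO 3166-1 alpha-2 code for Namibia.
csv_text = "country,code\nNigeria,NG\nNamibia,NA\n"

# Default behaviour: the string "NA" is treated as a missing value.
default = pd.read_csv(io.StringIO(csv_text))
print(default["code"].isna().tolist())  # [False, True]

# keep_default_na=False disables the built-in list of missing-value markers.
strict = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(strict["code"].tolist())  # ['NG', 'NA']
```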
Always quote all YAML strings. If your YAML file contains an unquoted value that isn't a simple number or boolean, for example a date, time, IP address, MAC address, country code, phone number, server name, configuration name, etc., then you are asking for trouble. Just DON'T DO THAT. It's pretty simple.<p>"Yeah but it's so convenient"<p>"Yeah but the benefit of YAML is that you don't need quotes everywhere so that it's more human readable"<p>DON'T
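A quick illustration of the difference quoting makes, assuming PyYAML (which resolves scalars by YAML 1.1 rules) is installed. The keys and values are made up.

```python
import yaml  # PyYAML, which follows YAML 1.1 resolution rules

# Same two values, first unquoted and then quoted.
unquoted = yaml.safe_load("country: NO\nversion: 1.10\n")
quoted = yaml.safe_load("country: 'NO'\nversion: '1.10'\n")

print(unquoted)  # {'country': False, 'version': 1.1}
print(quoted)    # {'country': 'NO', 'version': '1.10'}
```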
How often do people even encounter this issue?
I have been using YAML for 5+ years and have never run into it.
Further, I use `yamllint`, which flags this as a lint issue: "truthy value should be one of [false, true]".
This has been fixed since 2009 with YAML 1.2. The problem is that everyone uses libyaml (e.g. PyYAML), which is stuck on 1.1 for reasons.<p>The 1.2 spec just treats all scalars as opaque strings, with a configurable mechanism[0] for auto-converting unquoted scalars if you so please.<p>As such, I really don't quite grok why upstream libraries haven't moved to YAML 1.2. Would love to hear details from anyone with more info.<p>[0]: <a href="https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas" rel="nofollow">https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas</a>
IMO the proposed solution of StrictYAML + schema is the right one here, and it is what we use extensively for human-readable configs. StrictYAML (linked to in the post) is essentially a string-type-only restriction of YAML, so you apply your own type coercion to the parsed data structure.
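The "strings first, then coerce with a schema" idea can be sketched in plain Python. This is not StrictYAML's API; `parse_bool`, `SCHEMA`, and `apply_schema` are hypothetical names, and `raw` stands in for whatever a string-only parser returns.

```python
def parse_bool(text):
    """Accept only the two spellings this hypothetical schema allows."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Hypothetical schema: each key names the converter for its value.
SCHEMA = {"country": str, "port": int, "debug": parse_bool}


def apply_schema(raw, schema):
    """Coerce a dict of strings into typed values, rejecting unknown keys."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise KeyError(f"unexpected keys: {sorted(unknown)}")
    return {key: schema[key](value) for key, value in raw.items()}


# What a string-only parser would hand back for a small config.
raw = {"country": "NO", "port": "8080", "debug": "false"}
print(apply_schema(raw, SCHEMA))  # {'country': 'NO', 'port': 8080, 'debug': False}
```

Because the parser never guesses, "NO" can only become a boolean if the schema says that key is a boolean.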
This problem occurs because PyYAML's load() resolves scalars using the full YAML 1.1 schema. There is another loader, BaseLoader, that interprets every scalar as a string, which has the same effect as the workaround the article suggests. It's just another way to achieve it.<p>It’s a bit of a sore spot in the YAML community that PyYAML can’t / won’t support YAML 1.2. It was in maintenance mode for a while, and YAML 1.2 also introduced breaking changes.<p>From a SO comment: “As long as you're okay with the YAML 1.1 standard, PyYAML is still perfectly fine, secure, etc. If you want to support the YAML 1.2 spec (released in 2009), you can use ruamel.yaml, which started out as a fork of PyYAML.” (CrazyChucky, Mar 26, 2023 at 20:51)<p>- <a href="https://stackoverflow.com/q/75850232" rel="nofollow">https://stackoverflow.com/q/75850232</a>
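For reference, a minimal comparison of the two loaders, assuming PyYAML is installed. The sample document is made up.

```python
import yaml

text = "country: no\nport: 8080\n"

# safe_load applies YAML 1.1 type resolution to unquoted scalars.
resolved = yaml.safe_load(text)
print(resolved)  # {'country': False, 'port': 8080}

# BaseLoader leaves every scalar as a string.
as_strings = yaml.load(text, Loader=yaml.BaseLoader)
print(as_strings)  # {'country': 'no', 'port': '8080'}
```

The trade-off is that numbers come back as strings too, so any conversion has to happen in your own code.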
In Lisp, if you want to read text into symbols (e.g. file of words), you just switch to a dedicated package in which those symbols are interned. Then if NIL happens to come up, it will be a symbol named "NIL" in that package, unrelated to the special object.
I reckon if this is really a big concern for anybody, then they are probably writing way too much YAML to begin with. If you're being caught out by things like this and need to debug it, remember that YAML maps very cleanly to types in most high-level languages, so you can generate your YAML from those instead.
I like using explicit tags to avoid any doubt<p>!!bool<p><a href="https://dev.to/kalkwst/a-gentle-introduction-to-the-yaml-format-bi6#:~:text=A%20YAML%20tag%20is%20a,)%2C%20followed%20by%20a%20URI." rel="nofollow">https://dev.to/kalkwst/a-gentle-introduction-to-the-yaml-for...</a>
I’ve been working on <a href="https://conl.dev" rel="nofollow">https://conl.dev</a>, which fixes/removes YAML's problematic features.<p>I'm still trying to find a tagline for it that I like, maybe “markdown for config”?
I do a lot of Ansible that needs to run on multiple versions, and their YAML typing is not consistent: whenever I have a variable in a logic statement, I nearly always need to apply the "| bool" filter.
That edge case sounds like a reasonable tradeoff you would make for such a simple and readable generic data format.<p>Escaped JSON probably hits that sweet spot by being a bit uglier than YAML, but 100 times simpler than XML, though.
The article mentioned people with the last name "null". I never thought about that. It sounds like real fun to have that last name these days.
Related: the YAML exponent problem[0]<p>TLDR: an unquoted hex hash in YAML is fine until it happens to match \d+E\d+, at which point it gets interpreted as a float in scientific notation.<p>[0] <a href="https://www.brautaset.org/posts/yaml-exponent-problem.html" rel="nofollow">https://www.brautaset.org/posts/yaml-exponent-problem.html</a>
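A toy version of that failure mode in plain Python. The pattern is the one quoted above (extended here to accept a lowercase "e" as well, which is an assumption), and `resolve_scalar` is a made-up stand-in for a parser's type resolution; which parsers actually behave this way varies.

```python
import re

# Pattern quoted in the comment above: digits, an "E", then more digits.
EXPONENT_LIKE = re.compile(r"\d+[eE]\d+")


def resolve_scalar(text):
    """Return a float when text looks like scientific notation, else the string."""
    if EXPONENT_LIKE.fullmatch(text):
        return float(text)
    return text


print(resolve_scalar("1234f56"))  # stays the string '1234f56'
print(resolve_scalar("1234e56"))  # becomes the float 1.234e+59
```

Two abbreviated hashes that differ by one character end up with different types, which is what makes the bug so intermittent.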
Google App Engine used to do this to environment variables defined in YAML. IIRC it would convert the string "true" to "Yes", which was a fun surprise when deploying Java and NodeJS apps.
It's not a coincidence that YAML is a perfect acronym for "yet another migraine looming".<p>I mean ok it is technically a coincidence but it definitely feels like the direct result of the "what could possibly go wrong" approach the spec writers apparently took
See also <a href="https://noyaml.com" rel="nofollow">https://noyaml.com</a> (feel free to send in PRs with your gripes/gotchas re: YAML)