I ran a small test comparing different data serialization formats for use with GPT models (and possibly other LLMs). This is obviously very limited, but it was striking how much of a difference switching from JSON to something like YAML could make.<p>I wonder if we might also see LLM-specific data serialization formats in the future, designed to use tokenization as efficiently as possible and enhance the generative capability of the models.
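To give a rough sense of where the difference comes from (a sketch, not my original test: it uses character count as a crude proxy for token count, hand-writes the YAML to stay dependency-free, and the sample records are made up — a real comparison would run both strings through a tokenizer like tiktoken):

```python
import json

# Hypothetical sample records for illustration
records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]

# JSON: every quote, brace, and comma is structural overhead the model
# has to tokenize alongside the actual data
as_json = json.dumps(records, indent=2)

# Equivalent YAML, written by hand to avoid a PyYAML dependency;
# yaml.dump(records) would produce similar output
as_yaml = "\n".join(
    f"- id: {r['id']}\n  name: {r['name']}\n  role: {r['role']}"
    for r in records
)

# The YAML form is noticeably shorter, mostly because it drops the
# quoting and bracket punctuation
print(len(as_json), len(as_yaml))
```

Character count only approximates token count, since tokenizers merge common punctuation sequences, but the punctuation-heavy JSON syntax is a big part of why the token counts diverge too.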