Hi HN. We’re writing and drawing a digital zine on building LLM evals. It’s set in a world where forest creatures learn how to prompt the LLM shoggoth living in the canopy of their home.<p>After talking to a bunch of AI engineers, we found that people were either waist-deep in evals or had kinda heard of them but had no real clue how they work. We wrote this guide for the latter group, to get them up to speed quickly on building their own evals.<p>I personally found some things surprising, such as how much the grading scale matters, and how much care it takes to pick metrics for multiple goals as a proxy for “good”. We put the things we learned into a nicely illustrated package.<p>We took inspiration from the meme that LLMs are a Lovecraftian Shoggoth, an alien intelligence that we put a mask on to make it palatable for us. Juxtaposing it against forest animals seemed amusing, and a way for us to do some world-building and have some fun as well.<p>And yes, the illustrations are all hand-drawn and not generated. The current image generation tools aren’t yet consistent enough.<p>In case you missed it on the landing page, here are some sample pages and the table of contents (subject to minor changes): <a href="https://forestfriends.tech/assets/preview.pdf?v=1" rel="nofollow">https://forestfriends.tech/assets/preview.pdf?v=1</a><p>Are you building LLM apps but haven’t put in evals yet? What sort of challenges are you running into, or what would you like to see addressed?