The Vercel AI SDK abstracts over all the major LLM providers, including locally running models. It even handles file attachments well, which is something people are using more and more.

https://sdk.vercel.ai/docs/introduction

It uses zod for types and validation, and I've loved using it to make my apps swap between models easily.
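For a sense of what the swap looks like in practice, here's a minimal sketch with generateObject and a zod schema; the model IDs and the schema are just placeholder examples, not anything from the SDK docs:

```
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { generateObject } from "ai";
import { z } from "zod";

// Swapping models is a one-line change; the rest of the call stays identical.
const model = process.env.USE_CLAUDE
  ? anthropic("claude-3-5-sonnet-latest")
  : openai("gpt-4o-mini");

const { object } = await generateObject({
  model,
  // zod schema doubles as the output type and runtime validation
  schema: z.object({
    title: z.string(),
    tags: z.array(z.string()),
  }),
  prompt: "Extract a title and a list of tags from this text: ...",
});

console.log(object.title, object.tags);
```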
I've been using BAML (https://github.com/boundaryml/baml) to do this, and it works really well. It lets you define multiple fallback and retry policies, and it returns strongly typed outputs from LLMs.
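Calling it from TypeScript ends up looking roughly like the sketch below. This is hypothetical: it assumes you've defined a BAML function named ExtractResume (with its output schema and fallback/retry client) in your .baml files and generated the client into ./baml_client.

```
// Assumes a BAML-generated TypeScript client; ExtractResume is a made-up
// example function defined in the project's .baml files.
import { b } from "./baml_client";

async function main() {
  // The return value is strongly typed from the BAML-declared output schema,
  // and the configured fallback/retry policy is applied under the hood.
  const resume = await b.ExtractResume("Jane Doe, 5 years at Acme Corp...");
  console.log(resume);
}

main();
```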
If anyone is interested in implementing fallbacks across model providers, I'd recommend looking at OpenRouter. I've been using it in several projects, and the ability to swap models without changing any implementation code or managing multiple API keys has been incredibly nice:

https://openrouter.ai/docs/quickstart
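Since OpenRouter exposes an OpenAI-compatible API, the standard openai client works as-is; only the base URL, key, and model string change. A minimal sketch (model slugs are examples and may drift, so check OpenRouter's model list):

```
import OpenAI from "openai";

// Point the standard OpenAI client at OpenRouter's compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Swapping providers is just a different model string, e.g.
// "openai/gpt-4o-mini" -> "google/gemini-flash-1.5" -> "anthropic/claude-3.5-sonnet".
const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this in one sentence: ..." }],
});

console.log(completion.choices[0].message.content);
```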
I've had very poor JSON mode behavior with the gemini-1.5-flash and 2.0-flash models using their own library, google-generativeai. They either fail to follow the JSON format or emit string fields that never terminate and run until max_tokens. Pretty bad for Gemini, when open models like Qwen handle a basic extract-information-to-JSON task better.
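For reference, this is roughly how JSON mode gets requested. The sketch uses the JS SDK (@google/generative-ai) rather than the Python package above, and the schema is a made-up example; even with responseSchema set I've seen the runaway-string behavior, so capping maxOutputTokens only bounds the damage:

```
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

// Request JSON output and constrain it with a response schema.
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        name: { type: SchemaType.STRING },
        date: { type: SchemaType.STRING },
      },
    },
    // Bounds the failure mode where a string field runs on until max_tokens.
    maxOutputTokens: 512,
  },
});

const result = await model.generateContent("Extract the name and date from: ...");
console.log(result.response.text());
```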
I’ve done something similar using OpenRouter and fallback chains across providers. It’s super helpful when you’re hitting rate limits or need different models for different payload sizes. I would love to see more people share latency data, though, especially when chaining Gemini + OpenAI like this.
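For anyone curious what a provider-level chain can look like, here's a rough sketch using the Vercel AI SDK mentioned upthread (purely illustrative; the parent's OpenRouter setup may differ, and the model IDs are examples). Logging the elapsed time per attempt is also an easy way to start collecting the latency data mentioned above:

```
import { generateText } from "ai";
import { google } from "@ai-sdk/google";
import { openai } from "@ai-sdk/openai";

// Try Gemini first, then fall back to OpenAI if the call throws
// (rate limit, timeout, provider outage, etc.).
const chain = [google("gemini-1.5-flash"), openai("gpt-4o-mini")];

async function generateWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;
  for (const model of chain) {
    const start = Date.now();
    try {
      const { text } = await generateText({ model, prompt });
      console.log(`succeeded in ${Date.now() - start}ms`);
      return text;
    } catch (err) {
      lastError = err;
      console.warn(`failed after ${Date.now() - start}ms, trying next model`);
    }
  }
  throw lastError;
}
```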
TypeScript looks so ugly visually. It gives me PHP vibes. I think it's the large keywords sitting in the first column, right where your eye lands:

export const

function

type

return

etc.

This makes scanning through the code really hard because your eye has to jump horizontally.