Hey HN! Tokencost is a utility library for estimating LLM costs. There are hundreds of different models now, and they all have their own pricing schemes. It’s difficult to keep up with the pricing changes, and it’s even more difficult to estimate how much your prompts and completions will cost until you see the bill.<p>Tokencost works by counting the number of tokens in prompt and completion messages and multiplying that number by the corresponding model cost. Under the hood, it’s really just a simple cost dictionary and some utility functions for getting the prices right. It also accounts for different tokenizers and float precision errors.<p>Surprisingly, most model providers don't actually report how much you spend until your bills arrive. We built Tokencost internally at AgentOps to help users track agent spend, and we decided to open source it to help developers avoid nasty bills.
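For anyone curious what "count tokens, multiply by the model cost" looks like in practice, here is a minimal sketch. It is not Tokencost's actual code: the `PRICES` table, the model name `example-model`, its prices, and the `estimate_cost` helper are all made up for illustration, and token counts are passed in directly instead of coming from a real tokenizer.

```python
from decimal import Decimal

# Hypothetical per-token prices in USD (not real pricing for any model).
PRICES = {
    "example-model": {
        "prompt": Decimal("0.0000005"),
        "completion": Decimal("0.0000015"),
    },
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Multiply token counts by the per-token prices for the given model."""
    price = PRICES[model]
    return prompt_tokens * price["prompt"] + completion_tokens * price["completion"]


cost = estimate_cost("example-model", prompt_tokens=1200, completion_tokens=300)
print(cost)

# Decimal keeps sums of small amounts exact, which binary floats do not.
exact_sum = Decimal("0.1") + Decimal("0.2")
float_sum = 0.1 + 0.2
```

Using `Decimal` built from strings is one way to sidestep float rounding when adding up many tiny per-token amounts.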
I don't understand how the Claude functionality works.<p>As far as I know Anthropic haven't released the tokenizer for Claude - unlike OpenAI's tiktoken - but your tool lists the Claude 3 models as supported. How are you counting tokens for those?
Would anybody be interested in this for Rust? My LLM utils crate [1] already does everything this library does except return the price. I use it only to count tokens so that prompts stay within limits, and it also supports non-OpenAI tokenizers, so adding a price calculator function would be trivial.<p>[1] <a href="https://github.com/ShelbyJenkins/llm_utils">https://github.com/ShelbyJenkins/llm_utils</a>
With all the options, there seems to be an opportunity for a single API that takes a series of prompts, a budget, and a quality hint, then distributes the batches across models for the most bang for the buck.<p>Maybe a small triage AI could judge how well each model handles certain prompts, so spending is preserved for the difficult tasks.<p>Does anything like this exist yet?
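A rough sketch of the routing idea, under heavy assumptions: the `Model` class, the `triage` heuristic (prompt length as a stand-in for a small triage model), the flat per-prompt costs in cents, and the quality scores are all invented for illustration. It only shows the shape of a greedy "cheapest model that is good enough" policy, not an existing service.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_cents: int  # made-up flat cost per prompt, in cents
    quality: float   # made-up quality score in [0, 1]


def triage(prompt: str) -> float:
    """Stand-in for a small triage model: longer prompts count as harder."""
    return min(len(prompt.split()) / 20, 1.0)


def route(prompts, models, budget_cents):
    """Greedily give each prompt the cheapest adequate model within budget."""
    by_cost = sorted(models, key=lambda m: m.cost_cents)
    plan, remaining = [], budget_cents
    for prompt in prompts:
        affordable = [m for m in by_cost if m.cost_cents <= remaining]
        if not affordable:
            break  # budget exhausted; remaining prompts stay unassigned
        need = triage(prompt)
        # Cheapest model that meets the need, else the best one we can afford.
        choice = next(
            (m for m in affordable if m.quality >= need),
            max(affordable, key=lambda m: m.quality),
        )
        plan.append((prompt, choice.name))
        remaining -= choice.cost_cents
    return plan, remaining


models = [Model("small", 1, 0.5), Model("large", 10, 0.9)]
prompts = ["Summarize this.", " ".join(["word"] * 30)]
plan, remaining = route(prompts, models, budget_cents=12)
```

Here the short prompt goes to the cheap model and the long one to the expensive model, leaving 1 cent of budget.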
I dig it! Kind of related, but I made a comparison of LLM API costs vs. their leaderboard performance to gauge which models offer more bang for the buck [0]<p>[0] <a href="https://llmcompare.net" rel="nofollow">https://llmcompare.net</a>
An interesting parameter that I don't read about a lot is vocab size. A larger vocab means you need to generate fewer tokens for the same word on average, and the context window is effectively larger because it fits more text. This means that a model with a large vocab might be more expensive on a per-token basis but would generate fewer tokens for the same sentence, making it cheaper overall. This should be taken into consideration when comparing API prices.
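To make that arithmetic concrete, here is a small sketch comparing two imaginary models. Every number (tokens per word, price per 1M tokens) and the `cost_per_1k_words` helper are invented for the example and do not describe any real model.

```python
from decimal import Decimal


def cost_per_1k_words(price_per_1m_tokens: Decimal, tokens_per_word: Decimal) -> Decimal:
    """Effective USD cost of 1,000 words at a given tokenizer density."""
    tokens = Decimal(1000) * tokens_per_word
    return tokens * price_per_1m_tokens / Decimal(1_000_000)


# Smaller vocab: cheaper per token, but more tokens per word.
small_vocab = cost_per_1k_words(Decimal("10.00"), Decimal("1.5"))

# Larger vocab: pricier per token, but fewer tokens per word.
large_vocab = cost_per_1k_words(Decimal("12.00"), Decimal("1.1"))

print(small_vocab, large_vocab)
```

With these made-up numbers the large-vocab model costs less per 1,000 words despite the higher per-token price, which is why comparing per-token prices alone can mislead.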
Are you also accounting for costs of sending images and function calls? I didn't see that when I looked through the code. I developed this package so that I could count those sorts of calls as well:
<a href="https://github.com/pamelafox/openai-messages-token-helper">https://github.com/pamelafox/openai-messages-token-helper</a>
Very cool! Is this cost directory you're using the best source for historical cost per 1M tokens? <a href="https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json">https://github.com/BerriAI/litellm/blob/main/model_prices_an...</a>
A whole bunch of the costs are listed as zeroes, with multiple decimal places. I noticed y'all used the Decimal library and tried to hold onto precision, so I'm not sure what's going on, but some of the cheaper models certainly just show up as "free".
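One way this kind of thing can happen (a guess, not a diagnosis of Tokencost's actual code) is that the stored value is fine but fixed-place formatting hides it. The price below is made up.

```python
from decimal import Decimal

# Hypothetical per-token price: $0.25 per 1M tokens.
price_per_token = Decimal("0.00000025")

# Formatting to six decimal places drops every significant digit.
shown = f"{price_per_token:.6f}"
print(shown)

# The underlying value is still non-zero; quoting it per 1M tokens makes it readable.
per_million = price_per_token * 1_000_000
print(per_million)
```

If that is the cause, displaying prices per 1M tokens or using more decimal places would keep the cheap models from looking free.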