tl;dr<p>> Overall, we found that few-shot tool assistance, without explicitly training models to use tools, is a difficult and unsolved problem, with significant costs. This is in contrast to their commonly perceived value as an easy and cheap solution to augment LMs with tools, such as retrieval or calculation. Beyond few-shot strategies, training models to use tools seems to be more promising (and a popular paradigm in recent months — such as with Gemini, GPT-4 and Command-R).