The "Strategies" section looks valuable.<p>Here are a few more great resources from my notes (including one from Lilian Weng who leads Applied Research at OpenAI):<p>- <a href="https://lilianweng.github.io/posts/2023-03-15-prompt-engineering" rel="nofollow">https://lilianweng.github.io/posts/2023-03-15-prompt-enginee...</a><p>- <a href="https://www.promptingguide.ai" rel="nofollow">https://www.promptingguide.ai</a> (check the "Techniques" section for several research-vetted approaches)<p>- <a href="https://learnprompting.org/docs/intro" rel="nofollow">https://learnprompting.org/docs/intro</a>
Are there established best practices for "engineering" prompts systematically, rather than through trial-and-error?<p>Editing prompts is like playing whack-a-mole: once you clear an edge case, a new problem pops up elsewhere. I'd really like to be able to say, "this new prompt performs 20% better across all our test cases".<p>Because I haven't found a better way, I am building <a href="https://github.com/typpo/promptfoo">https://github.com/typpo/promptfoo</a>, a CLI that outputs a matrix view for quickly comparing outputs across multiple prompts, variables, and models. Good luck to everyone else out there tuning prompts :)
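To make the "20% better across all our test cases" idea concrete, here's a minimal sketch of matrix-style prompt evaluation. The prompts, test cases, and pass criterion are placeholder examples of mine, not promptfoo's actual API; plug in whatever LLM client you use where `call_model` is stubbed:

```python
# Minimal sketch: score every prompt against every test case, so a claim
# like "20% better across all our test cases" is actually measurable.

def call_model(prompt: str) -> str:
    # Placeholder: wire this up to your LLM API of choice.
    raise NotImplementedError("plug in your model call here")

prompts = {
    "v1": "Classify the sentiment of: {text}",
    "v2": "Is the following text positive or negative? Answer with one word: {text}",
}

test_cases = [
    {"text": "I love this product", "expected": "positive"},
    {"text": "Terrible experience, would not recommend", "expected": "negative"},
]

for name, template in prompts.items():
    passed = 0
    for case in test_cases:
        output = call_model(template.format(text=case["text"]))
        if case["expected"] in output.lower():
            passed += 1
    print(f"{name}: {passed}/{len(test_cases)} test cases passed")
```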
Why are we calling this "engineering"?<p>Isn't engineering the application of science to solve problems? (math, definitive logic, etc.)<p>Maybe one day we'll have instruments that let us reason about the connections between prompts and the exact state of the AI, so that we can understand the mechanics of causation, but until then, I would not think that being good at asking questions is "engineering"<p>Are most 10 year olds veteran "search engineers"?<p>Btw I'm asking this slightly tongue-in-cheek, as a discussion point. For example plenty of computer system hacks are done by way of "social engineering", so clearly that term is malleable even within the tech community.
Is it just me, or is the bot's output in the section "Give a Bot a Fish" incorrect? It states that the most recent receipt is from Mar 5th, 2023, but there are two receipts after that date. This is what worries me about using ChatGPT - the possibility of errors in financial matters, which won't go down well, I fear.
Thanks very much for posting this! I haven't yet finished reading the whole thing, but even just the first section about the history of LLMs, explaining some of the basic concepts, etc., I found to be very well-written and useful, and it was really nice that it linked out to source material. So many times when you go into reading stuff about the latest AI technique or feature, it can feel like you need to do a ton of background reading just to understand what they're talking about (especially as the field moves so quickly), so having a nice simple primer at the beginning of this doc was most appreciated!
The suggestion to use markdown tables was quite interesting. It makes a lot of sense, and I haven't seen it described elsewhere.<p>I have been getting good results by asking GPT to produce semi-structured responses based on other aspects of (GitHub) markdown.<p>In general, I find it very helpful to find an already popular format that suits your problem. The model is probably already fluent in rendering that output format, so you spend less time trying to teach it the output syntax.
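For example (my own illustration, not from the guide), a prompt leaning on a format the model already knows might look like:

```python
# Illustrative prompt: ask for GitHub-flavored markdown rather than
# inventing a custom output syntax the model has never seen.
prompt = """List the expenses below as a GitHub-flavored markdown table
with columns Date, Merchant, and Amount.

2023-03-02 Starbucks $4.50
2023-03-05 AWS $120.00
2023-03-07 Uber $18.25
"""

# The model will typically reply with something like:
#
#   | Date       | Merchant  | Amount  |
#   |------------|-----------|---------|
#   | 2023-03-02 | Starbucks | $4.50   |
#   ...
#
# which is trivial to parse by splitting each line on "|".
```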
Worryingly, I am not sure the people working on this really understand what a Transformer is.<p>Quote from them:<p>“There is still active research in non-transformer based language models though, such as Amazon’s AlexaTM 20B which outperforms GPT-3”<p>Quote from said paper:<p>“For AlexaTM 20B, we used the standard Transformer model architecture”<p>(It's just an encoder-decoder transformer)
This reflects astonishingly poorly on Brex. What customer wants to hear that Brex is using "a non-deterministic model" for "production use cases" like "staying on top of your expenses"? I don't see them acknowledge the downsides of that non-determinism anywhere, let alone hallucination, even though they mention the latter. Hallucinating an extra expense, or missing one, could have serious consequences.<p>This is also potentially terrible from a privacy standpoint. That "staying on top of your expenses" example suggests that you upload "a list of the entire [receipts] inbox" to the model. It _seems_ like they're using OpenAI's API, which doesn’t use customer data for training (unlike ChatGPT), but they should be crystal clear about this. Even if OpenAI doesn't retain/reuse the data, would Brex's customers be happy with this 3rd-party sharing?<p>The expenses example seems like sloppy engineering too—there's no reason to share expense amounts with the model if you just want it to count the number of expenses. Merchant names could be redacted too, replaced with identifiers that Brex would map back to the real data. These suggestions would save on tokens too.<p>Despite Brex saying they're using this in production, I suspect it's mostly a recruiting exercise. It's still a very bad look for their engineering.
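A sketch of the kind of redaction I mean (the identifier-mapping scheme is my own suggestion, not something Brex describes):

```python
# Sketch: pseudonymize merchant names before the data ever leaves your side,
# and keep the mapping locally so names can be restored in the model's answer.
expenses = [
    {"merchant": "Starbucks", "amount": 4.50},
    {"merchant": "United Airlines", "amount": 412.00},
]

id_to_merchant = {}
redacted = []
for i, expense in enumerate(expenses):
    merchant_id = f"MERCHANT_{i}"
    id_to_merchant[merchant_id] = expense["merchant"]
    # If the task is just counting expenses, omit the amounts entirely.
    redacted.append({"merchant": merchant_id})

# Send `redacted` to the model, then substitute IDs in its answer back
# through id_to_merchant before showing anything to the user.
print(redacted, id_to_merchant)
```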
This seems overall well-written and well-explained, but I'm curious about the piece on fine-tuning. The article only recommends it as a last resort. That makes sense for a casual user, but if you're a company seriously using LLMs to provide services to your customers, wouldn't the cost of training data be offset by the potential gains, and by the edge cases you'd automatically cover by fine-tuning instead of trying to whack-a-mole predict every single way the prompt can fail?
YAML is just as effective as JSON at communicating data structure to the model while using ~50% fewer tokens. I now convert all my JSON to YAML before feeding it to the GPT APIs.
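The exact savings depend on your data, so it's worth measuring on your own payloads. A quick sketch, assuming the pyyaml and tiktoken packages are installed:

```python
# Compare token counts for the same data serialized as JSON vs. YAML.
import json

import tiktoken
import yaml

data = {
    "expenses": [
        {"merchant": f"Merchant {i}", "amount": 10.0 + i, "date": "2023-03-02"}
        for i in range(20)
    ]
}

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
as_json = json.dumps(data, indent=2)
as_yaml = yaml.safe_dump(data)

print("JSON tokens:", len(enc.encode(as_json)))
print("YAML tokens:", len(enc.encode(as_yaml)))  # usually noticeably fewer
```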
One thing I haven't heard much discussion about is the fact that ChatGPT is constantly being updated.<p>This means that if you build a prompt for classification and become confident that you've whacked all of the moles so that it is pretty solid with all of the edge cases, it can later start breaking again.<p>Some solutions I can think of: 1) pin a fixed model version to test against, though those get deprecated over time (see the sketch below), or 2) perhaps fine-tuning might help.
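For option 1, the OpenAI API does let you pin a dated snapshot. A sketch using the openai 0.x Python client that's current as I write this ("gpt-3.5-turbo-0301" is a real snapshot name today, but it will eventually be retired):

```python
# Pin a dated model snapshot instead of the auto-updating alias, so the
# model underneath your tested prompts doesn't silently change.
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",  # pinned snapshot, not "gpt-3.5-turbo"
    messages=[{"role": "user", "content": "Classify this expense: Starbucks $4.50"}],
    temperature=0,  # reduces (but doesn't eliminate) run-to-run variance
)
print(response.choices[0].message.content)
```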
I've been playing Gandalf in the last few days, it does a great job at giving an intuition for some of the subtleties of prompt engineering: <a href="https://gandalf.lakera.ai" rel="nofollow">https://gandalf.lakera.ai</a><p>Thanks for putting this together!
I'm working on the idea of features instead of prompts: <a href="https://inventai.xyz" rel="nofollow">https://inventai.xyz</a>