
Sorry, but a new prompt for GPT-4 is not a paper

279 points by georgehill, over 1 year ago

38 comments

H8crilA, over 1 year ago
If you do enough measurements on that new prompt then I don't see why this shouldn't be a paper. People overestimate the value of "grand developments", and underestimate the value of actually knowing - in this case, actually knowing how well something works, even if it is as simple as a prompt.

Compare with drug trials: Adderall only differs from regular amphetamine in the relative concentration of enantiomers, and the entire value of the drug is in the measurements.
wongarsu, over 1 year ago
I feel this has nothing at all to do with LLMs and everything to do with academic incentives in general. Focusing on quality over quantity won't advance your career. Publishing lots of new papers will, as long as they meet the minimum threshold to be accepted into whatever journal or conference you are aiming for. Having one good paper won't increase your h-index; three mediocre papers might.

Doubly so when there's a new breakthrough, where one of your low-effort papers might end up being the first to say something obvious that turns out to be really important. Because then everyone will end up citing your paper in perpetuity.
zitterbewegung, over 1 year ago
Being dismissive about this tweet or agreeing with the author is one thing. What everyone should be aware of is that the absolute minimum bar for a scientific paper can be much lower than a new prompt for GPT-4.
mensetmanusman, over 1 year ago
It is a _paper_, but it's not science, since GPT-4 is closed source and thus not reproducible in a lab.

If OpenAI disappears tomorrow, papers on GPT-4 will likely be of little to no value, which is another tell of a non-scientific exploration.

(Note: not all explorations are scientific, and that is great! Science is just one of many tools for exploring lived reality.)
Der_Einzige, over 1 year ago
While I think the twitter post author is being a bit of an ass, they're sort of right about how much we've overvalued simply better prompts. I wrote an opinionated GitHub gist about this exact issue:

https://gist.github.com/Hellisotherpeople/45c619ee22aac6865ca4bb328eb58faf

I do the whole NLP publishing thing and I've hesitated to "write a paper" about applying techniques already known and used everywhere in the stable diffusion community to NLP models. That said, the AI community loves to pretend it discovered something, such as a recent paper purporting to be the first to do "concept slider" LoRAs, despite these existing on Civit.ai for months before that work was published. The authors of course didn't cite those already existing models.

Everyone is chasing citations and clout hard right now because these professors and researchers realize that they only have 5-10 years before AI eats their jobs and most other white collar jobs. I don't blame them. I want my mortgage paid off before I'm automated away!
glitchc, over 1 year ago
The current scientific research apparatus is more about being first than about being correct or thorough. A paper that gets out early means more citations, and many of the faculty sit on editorial boards and are able to suggest/enforce specific citations during the review process. Academics aren't fully to blame for this; it's just how the incentives are set up in the system. Tenure and promotions are increasingly based on h-index, a measure of impact based largely on the number of citations.
lmeyerov, over 1 year ago
To bring some data to a sour grapes fight: https://paperswithcode.com/sota/code-generation-on-humaneval

For code generation, GPT-4 is getting beaten by the small prompt library LATS wrapped around GPT-3.5. Given the recent release of MagicCoder / Instruct-OSS, that means a small prompt library plus a small 7B model you can self-host beats the much fancier GPT-4.

Similar to when simple NNs destroyed a decade of Bayesian modeling theses and research programs, it's frustrating for folks going down other paths. But it doesn't make the work 'wrong'.
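Leaderboards like the HumanEval one linked above typically score models with the pass@k metric, which is what makes a prompt-library result and a raw-model result comparable. A minimal sketch of the standard unbiased estimator; the example numbers are illustrative, not taken from the leaderboard:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples of which
    c pass the unit tests, estimate the probability that at least one
    of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 200 samples per task, 52 passing.
print(round(pass_at_k(200, 52, 1), 3))   # 0.26 (pass@1 is just c/n)
print(round(pass_at_k(200, 52, 10), 3))
```

Either way, both a clever prompt wrapper and a bigger model reduce to the same number: the chance that a sampled completion passes the task's tests.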
siva7, over 1 year ago
Reminds me of what a real programmer is: https://sac.edu/AcademicProgs/Business/ComputerScience/Pages/Hester_James/Real%20Programmer.htm
jatins, over 1 year ago
Can a person just go upload anything on arxiv, or is there a review process around these things?

What I am really asking is: "what makes something a paper and not a blog post?"
Kelkonosemmel, over 1 year ago
How do you add prompt knowledge into research? By having papers about it.

Shouldn't the tooling around it be good enough that a few prompt papers don't overload the system?
elif, over 1 year ago
Nah, this is just an early example of the many "this is too easy, it doesn't count" defensive human arguments against AI.

Parallel to the "you use copilot so your code quality is terrible and you don't really even understand it so it's not maintainable" human coping we are familiar with.

If there is any shred of truth to these defenses, it is temporary and will be shown false by future, more powerful AI models.

Consider the theoretical prompt that allows one of these models to rapidly improve itself into an AGI. Surely you'd want to read that paper, right?
alphazard, over 1 year ago
Developing prompts for these models isn't a science yet. It does seem to meet most of the criteria for an art, though.

We recognize some outputs as high quality and others as low quality, but often can't articulate the exact reason why. It seems that some people are able to reliably produce high quality results, indicating there is some kind of skill involved. More precisely, the quality of an individual artist's last output is positively correlated with the quality of their next output. A kind of imprecise "shop talk" has emerged, self-describing as "prompt engineering", which resembles the conversations artists in other mediums have.

For people in tech this will seem most similar to graphic designers. They produce much nicer looking interfaces than lay people can. We often can't explain why, but we recognize it to be the case. And graphic designers have their own set of jargon, which is useful to them but is not scientific.

"Prompt artist" is a better term than "prompt engineer".
WhitneyLand, over 1 year ago
Should this be a paper?

https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html

There is supporting analysis and measurement, but the essence is a single type of prompt, and DeepMind is a heavyweight lab, I think it's fair to say.

Moreover, there's evidence people independently reported this result months beforehand on Reddit, based on casual observation.
grepLeigh, over 1 year ago
Studying the way LLMs respond to different prompts (or different ways of fine-tuning for a set of prompts) is valuable science.

Some of the most interesting papers published this year ("Automatic Multi-Step Reasoning and Tool-Use") compare prompt strategies across a variety of tasks. The results are fascinating, the findings are applicable, and they invite further research in the areas of "prompt selection" and "tool selection."
3cats-in-a-coat, over 1 year ago
Attacking the participants in a systemic shift is 100% useless, as it doesn't target the culprit.

In programming we have a similar phenomenon: StackOverflow-driven (and I guess now GPT-driven) juniors have overtaken the industry and displaced serious talent. Sufficient quantity always beats quality, even if the end result is inferior. This is caused by market dynamics, which operate on much cruder parameters than the sophisticated analysis of an individual noticing everything around them becoming "enshittified".

SO-driven juniors are cheap, plentiful, and easily replaceable. And a business that values lower expense and lower risk therefore prefers them, because it has no way to measure the quality of the final product with simple metrics.

The same mechanism that is currently driving AI to replace our jobs is driving the avalanche of garbage papers by academics. This is entropy for you. We see it everywhere in modern society, down to the food we eat. Quality goes away, replaced by cheap production and long shelf life.

If we don't fundamentally alter what the system sees as ACCEPTABLE and VALUABLE, this process will inevitably continue until our world is completely unrecognizable. And to fundamentally alter the system, we need an impulse that aligns us as a society and startles us into action, all together (or at least a significant majority of us). But it seems we're currently in "slowly boiled frog" mode.
mo_42, over 1 year ago
Why not? A paper is not necessarily scientific, nor a breakthrough. In my view, a paper is written and documented communication that's usually approved by peers in the field. Even a blunt observation of nature can be noteworthy. However, we don't see such papers anymore as these fields have matured. Just go back in the history of your field and you will find trivial papers.
henriquez, over 1 year ago
Real science is reserved for those with real expertise! As the self-anointed gatekeeper of real science I decree that other people's work fails to meet the minimum standard I have set for real science! Mind you, not the work other actors in the scientific community publish and accept among their peers - they are not real scientists and their work is trivial. For shame!
carbocation, over 1 year ago
The art and science of building these models is not disputed, but I think that the scientific value of prompts is tightly linked to reproducibility.

If you've developed a new prompt for a model whose weights you can directly access, then this prompt could have scientific value, because its utility will not diminish over time or be erased by a new model update. I'm even generally of the view that a closed API endpoint whose expiration date is years into the future could have some value (but much less so). But simply finding a prompt for something like ChatGPT is not useful for science, because we don't even have certainty about which model it's executing against.

Note that some of the best uses of these models and of prompting have nothing to do with academics; this comment is focused on the idea of writing academic papers about prompts.
skilled, over 1 year ago
I can maybe understand the frustration from a "scientific" perspective, but for a lot of these "one prompt papers" you still need someone to sit down and do the analysis and comparisons. Very few papers focus only on GPT/ChatGPT.

Additionally, it gives people other ideas to try for themselves. And some of this stuff might be useful to someone in a specific scenario.

It's not glamorous research, or even future-proof, seeing as certain prompts can be surgically removed or blocked by the owner of the model, but I don't think it warrants telling people not to do it.
gandalfgeek, over 1 year ago
If a new prompt enables a new task or enhances performance on a task, then it absolutely should be published.

Back in the day, would compiler optimizations not have been worthy of publishing?
snet0, over 1 year ago
It's hard to draw these lines, because while you will certainly filter out a lot of bad (i.e. useless, low-contribution) papers, you might *also* filter out some really important papers. Research being basic, or something anyone could've done, doesn't count against its potential importance, just against the expected value of its importance, I guess.

I'd rather we had a few too many bad papers than a few too few great papers.
yieldcrv, over 1 year ago
Arxiv is like the MENSA of the tech world.

The similarity being that it's ego masquerading as academia.

Most things shared from there should have just been a blog post.

The last year has shown that AI/ML research and use did not need academic gatekeeping by PhDs, and yet many in that scene keep trying self-infatuating things with the lowest utility.
etewiah, over 1 year ago
Behind all this is a valid question: how does one evaluate prompts and LLMs? As gipeties (custom GPTs) become more popular, millions of hours will be wasted by ones that have been built badly. Without some sort of automated quality control, gipeties will become a victim of their own success.
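The automated quality control asked for here can start very small: a fixed eval set replayed against the model on every prompt change, failing when accuracy regresses. A hedged sketch; `call_model`, the eval set, and the threshold are placeholder assumptions, not a real API:

```python
from typing import Callable

# Tiny fixed eval set: (question, substring expected in the answer).
EVAL_SET = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "paris"),
]

def evaluate(call_model: Callable[[str], str], threshold: float = 0.9) -> bool:
    """Replay the eval set through the model and check accuracy
    against a regression threshold."""
    hits = sum(expected in call_model(q).lower() for q, expected in EVAL_SET)
    return hits / len(EVAL_SET) >= threshold

# Stub model for illustration only; swap in a real client.
stub = lambda q: "4" if "2 + 2" in q else "Paris"
print(evaluate(stub))  # True
```

Substring matching is crude, but even this much would catch a badly built custom GPT before users waste hours on it.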
potatoman22, over 1 year ago
What's the difference between a paper on a new prompt and a paper discussing a new domain-specific model, e.g. for heart failure risk? If they analyze the problem and solution equally well, they both seem useful. It's not like most other ML papers share their weights or datasets.
JR1427, over 1 year ago
This reminds me of the boom in half-baked studies around COVID, e.g. modelling this or that aspect of the pandemic, or mask wearing.

I imagine that most of these will simply have had little to no impact, and will only serve to bolster the publication lists of those who wrote them.
u32480932048, over 1 year ago
ChatGPT droppings have to be at least as relevant and newsworthy as these findings [1].

[1] https://www.wbur.org/news/2022/07/27/harvard-shorenstein-research-january-6-insurrection-president
SamBam, over 1 year ago
Tired: asking participants to sign an ethics pledge at the top of a tax return makes them more honest.

Wired: asking an LLM to write out its steps first makes it more accurate.

They seem equally interesting to me, but one is a lot easier to replicate, and the other is easier to lie about.
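The LLM half of that comparison really is cheap to replicate, because the whole manipulation is a prompt-template change. A toy harness sketch; the templates echo the familiar zero-shot chain-of-thought phrasing, and the stub model, dataset, and substring scoring are illustrative assumptions rather than a real evaluation:

```python
# Two prompt templates: direct answering vs. zero-shot chain-of-thought.
DIRECT = "Q: {question}\nA: The answer is"
STEP_BY_STEP = "Q: {question}\nA: Let's think step by step."

def build_prompts(question: str) -> dict[str, str]:
    return {name: tpl.format(question=question)
            for name, tpl in [("direct", DIRECT), ("cot", STEP_BY_STEP)]}

def compare_accuracy(model, dataset) -> dict[str, float]:
    """model(prompt) -> answer string; dataset: list of (question, gold).
    Scores each template by whether the gold answer appears in the output."""
    scores = {"direct": 0, "cot": 0}
    for question, gold in dataset:
        for name, prompt in build_prompts(question).items():
            scores[name] += gold in model(prompt)
    return {name: hits / len(dataset) for name, hits in scores.items()}

# Stub model for illustration: it only "answers correctly" when prompted
# to reason step by step, mimicking the reported effect.
stub = lambda p: "6" if "step by step" in p else "5"
print(compare_accuracy(stub, [("What is 2 + 4?", "6")]))
# → {'direct': 0.0, 'cot': 1.0}
```

Rerunning this against a real model with a real dataset is a single afternoon of work, which is exactly the replication asymmetry the comment points at.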
karxxm, over 1 year ago
It depends, I guess.

If you solve a problem that has been around for a while and LLMs offer a new way of approaching it, then it can definitely become a paper.

Of course, one has to verify in sophisticated experiments that this approach is stable.
jdefr89, over 1 year ago
I am sorry, but what can ChatGPT do that a couple of minutes of googling couldn't solve? Write half-hearted essays that all contain the same phrase?
d4rkp4ttern, over 1 year ago
Yes, these papers are optimizing for social media hype, i.e. what is the quickest and easiest path to make noise on social media?
coldtea, over 1 year ago
Sorry, but if they can get away with it, they'll release it as a paper.

It's not like most papers are much above that anyway...
samlhuillier, over 1 year ago
Times are changing. Human researchers will dedicate more and more time to getting language models to work in desired ways rather than doing the research themselves. Language models will largely be the ones making "research" discoveries. Both should be considered valid research IMO.
Racing0461, over 1 year ago
Doesn't academia incentivise quantity over quality anyway?
darepublic, over 1 year ago
A new prompt is not a paper, but you can prompt it for a paper.
gumballindie, over 1 year ago
Anyone caught doing this should be kicked out of the industry. Period. You're scamming those funding your "research", you are misleading readers, and you are producing low quality content that wastes everyone's time.
selfhoster11, over 1 year ago
Excuse me? Step-by-step wasn't paper-worthy? Hard disagree.

LLM research is currently in its infancy; the field is no more than a few years old. And a research field in its infancy is bound to have a few noteworthy "no sh*t, Sherlock" papers that seem obvious in hindsight.

The fact is, LLMs are a higher-order construct in machine learning, much like a fish is higher-order than a simple cellular colony. Lower-order ML constructs do not demonstrate emergent capabilities like step-by-step, stream-of-consciousness thinking, and so on.

Academics should be less jaded and approach the field with beginner's eyes. Because we are all beginners here.
alickz, over 1 year ago
Still beats most psychology papers
whywhywhywhy, over 1 year ago
Academia needs to get over itself. Can't wait to see how amazing this tech is going to get when the next generation, who decide never to bother with those stuffy and navel-gazing institutions, become the driving force behind it.

Looking forward to "I made this cool thing, here's the code/library you can use" rather than the papers/gatekeeping/ego stroking/"muh PhD".

Think: if Google had built an AI team around the former rather than the latter, they wouldn't have risked the future of their entire company and squandered their decade head start.