We were contacted by a bug hunter once claiming he had access to our database and asking for a bounty for his finding; he even provided a sample of the first 100 users from the users table.<p>After some investigating, I figured out how he had obtained the data.<p>He was one of the first 100 users: he had set one of his fields to an XSS Hunter payload and slept on it.<p>Two years later, a developer had a dump of data to test some things on. He loaded it into a SQL development tool on his Mac, and out of VS Code muscle memory he hit Command+Shift+P to open the command palette. In the SQL tool that shortcut opened "Print Preview" instead, and the software rendered the current table view into a webview to ease printing, where the XSS payload executed and the page content was sent to the researcher.<p>Escape input; you never know where it will be rendered.
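The fix for a stored payload like the one in this story is a single escaping pass at whatever boundary renders the text as HTML. A minimal sketch in Python (the URL and field are made-up stand-ins, not the actual payload):

```python
import html

# A harmless stand-in for the stored XSS Hunter payload from the story.
user_bio = '<script src="https://attacker.example/hook.js"></script>'

# One escaping pass turns the markup into inert text wherever it is rendered,
# whether in a web app today or a desktop tool's print preview two years later.
safe = html.escape(user_bio)
print(safe)
```

The same string is then safe to drop into any HTML context, because the angle brackets and quotes arrive as entities rather than markup.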
This is such an important lesson, but it's a difficult one to convince people of - telling people NOT to sanitize their input goes against so much existing thinking and teaching about web application security.<p>It's worth emphasizing that there's still plenty of scope for sensible input validation. If a field is a number, or one of a known list of items (US states, for example), then obviously you should reject invalid data.<p>But... most web apps end up with some level of free-form text. A comment on Hacker News. A user's bio field. A feedback form.<p>Filtering those is where things go wrong. You don't want to accidentally create a web development discussion forum where people can't talk about HTML because it gets stripped out of their comments!
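The distinction the comment above draws can be sketched in a few lines. This is illustrative Python with hypothetical field names; the point is that closed-domain fields get validated while free-form text passes through untouched:

```python
US_STATES = {"CA", "NY", "TX", "WA"}  # abbreviated for the example

def validate_age(value: str) -> int:
    """A numeric field: reject anything that isn't a plausible number."""
    age = int(value)  # raises ValueError on non-numeric input
    if not 0 <= age <= 150:
        raise ValueError(f"implausible age: {age}")
    return age

def validate_state(value: str) -> str:
    """A known-list field: reject anything outside the allowlist."""
    state = value.strip().upper()
    if state not in US_STATES:
        raise ValueError(f"unknown state: {value!r}")
    return state

def accept_comment(value: str) -> str:
    """Free-form text: store as-is; escaping happens at output time."""
    return value
```

A comment containing `<b>` or `DROP TABLE` sails through `accept_comment` unchanged, which is exactly what lets people discuss HTML and SQL on a forum.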
It's buried a bit in the article, but if you have to sanitize input to allow only some kinds of inputs (e.g., specific tags), you should really parse it fully to an AST and then act on that (or use a library that does the same); otherwise you're going to be subject to all sorts of pain.
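A minimal sketch of the parse-then-act approach, using Python's stdlib `HTMLParser` (the allowlist and the decision to drop all attributes are illustrative choices; a production system would use a maintained library):

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"b", "i", "em", "strong", "a"}  # illustrative allowlist

class Sanitizer(HTMLParser):
    """Parse untrusted HTML into events and re-emit only allowlisted tags.

    Because the input is actually parsed, tricks like nested or malformed
    tags can't smuggle markup past a naive regex filter."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            # Attributes are dropped entirely here, which also kills
            # href="javascript:..." vectors. Disallowed tags are removed.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))  # text content is always escaped

def sanitize(untrusted: str) -> str:
    p = Sanitizer()
    p.feed(untrusted)
    p.close()
    return "".join(p.out)
```

Note that a `<script>` tag's body survives as escaped text here; whether to drop it instead is one of the policy decisions a real library makes for you.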
I still wish that the Unicode folks had set up a bunch of duplicate code points which could have been used exclusively for processing marked-up text and that the folks making markup systems/languages had followed through.<p>Say one was updating TeX to take advantage of this --- all the normal Unicode character points would then have catcodes set to make them appropriate to process as text (or a matching special character), while "processing-marked-up" characters would then be set up so that for example:<p>- \ (processing-marked-up variant) would work to begin TeX commands<p>- # (processing-marked-up variant) would work to enumerate macro command arguments<p>- & (processing-marked-up variant) would work to delineate table columns<p>&c.<p>and the matching "normal" characters when encountered would simply be set.
Why not both? Escaping output should be a requirement, but it doesn't hurt to remove obvious garbage from the input too (including harmless stuff like pointless spaces)
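The "both" position reduces to two small, independent functions; a sketch in Python (function names are illustrative):

```python
import html

def normalize(raw: str) -> str:
    # Input-side cleanup of obvious garbage: collapse runs of
    # whitespace and trim the ends. Nothing security-critical here.
    return " ".join(raw.split())

def render(stored: str) -> str:
    # Output-side escaping stays mandatory regardless of any
    # cleanup done on the way in.
    return html.escape(stored)
```

The key property is that `render` never assumes `normalize` ran, so the security guarantee doesn't depend on the cosmetic step.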
I store the raw input in my database, but run it through bluemonday before rendering it. Simples.<p><a href="https://github.com/microcosm-cc/bluemonday">https://github.com/microcosm-cc/bluemonday</a>
This is another place where 80% of the time one way works but 20% of the time you need to go the other way.<p>Of course once the product is in production you can swim one direction but not fight the current going in the other. You can always move to escaping output, but retroactively sanitizing input is a giant pain in the ass.<p>But the problem comes in with your architecture, and whether you can discern data you generated from data the customers generated. Choose the wrong metaphors and you end up with partially formatted data existing halfway up your call stack instead of only at the view layer. And now you really are fucked.<p>Rails has a cheat for this. It sets a single boolean value on the strings which is meant to indicate the provenance of the string content. If it has already been escaped, it is not escaped again. If you are combining escaped and unescaped data, you have to write your own templating function that is responsible for escaping the unescaped data (or it can lie and create security vulnerabilities. "It's fine! This data will always be clean!" Oh foolish man.)<p>The better solution is to push the formatting down the stack. But this is a rule that Expediency is particularly fond of breaking.
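The Rails "cheat" described above (a provenance flag on strings, as in `html_safe`) can be sketched in a few lines of Python. This is a toy reconstruction of the idea, not Rails' actual implementation:

```python
import html

class SafeStr(str):
    """A str subclass whose type marks the content as already HTML-escaped.

    This is the single boolean of provenance: isinstance(s, SafeStr)."""

def escape_once(value: str) -> SafeStr:
    if isinstance(value, SafeStr):
        return value  # provenance flag says: already escaped, don't double-escape
    return SafeStr(html.escape(str(value)))

def render_greeting(name: str) -> SafeStr:
    # A templating helper combining trusted markup with untrusted data:
    # it is responsible for escaping the unescaped piece.
    return SafeStr("<p>Hello, " + escape_once(name) + "</p>")
```

The failure mode the comment warns about is visible here too: nothing stops a careless caller from wrapping raw user data in `SafeStr` directly ("It's fine! This data will always be clean!"), which is exactly the lie that creates the vulnerability.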
I've always been a big fan of <i>structuring</i> data on input, <i>escaping</i> it on output.<p>I think the big problem with just escaping output is that you can accidentally change what the output will actually be in ways that your users can't predict. If I am explaining some HTML in a field and drop `<i>...</i>` in there today, your escaper may escape this properly. But next month when you decide to change your output to actually allow an `<i>` tag, then all of a sudden my comment looks like some italicized dots, which broke it.<p>Instead if you structure it, and store it in your datastore as a tree of nodes and tags, then next month when you want to support `<i>` you update the input reader to generate the new structure, and the output writer to handle the new tags. You preserve old values while sanitizing or escaping things properly for each platform.
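The structure-on-input idea above can be sketched with a tiny node tree. This is illustrative Python: comments are stored as nested `(tag, children)` tuples rather than raw HTML, and the renderer decides which tags the current deployment supports:

```python
import html

# Which tags this deployment renders. Next month, adding "b" here makes
# old stored comments containing ("b", ...) nodes render correctly --
# no stored string ever changes meaning.
SUPPORTED = {"i"}

def render(node) -> str:
    if isinstance(node, str):
        return html.escape(node)          # leaf text is always escaped
    tag, children = node
    inner = "".join(render(c) for c in children)
    if tag in SUPPORTED:
        return f"<{tag}>{inner}</{tag}>"
    # Unsupported tag: show it literally, so the comment still reads
    # the way its author wrote it instead of silently changing.
    return html.escape(f"<{tag}>") + inner + html.escape(f"</{tag}>")

comment = ["explaining ", ("i", ["some HTML"]), " here"]
print("".join(render(c) for c in comment))
```

This is why the "italicized dots" problem disappears: the stored tree pins down what the author meant, and only the renderer's policy evolves.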
It is a reasonable idea, but there are other things that can be done too.<p>On the SQL side, you could use SQL host parameters (usually denoted by question marks) if the database system you use supports them, which avoids SQL injection problems.<p>If you deliberately allow the user to enter SQL queries, there are better ways to handle this. If you use a database system that allows restricting SQL queries (like the authorizer callback and several other functions in SQLite, which can be used for this purpose), then you might use that; I think it is better than trying to write a database-independent parser for the SQL code and expecting it to work. Another alternative is to allow the database (in CSV or SQLite format) to be downloaded (and if the MIME type is set correctly, a browser or browser extension may allow the user to do so using their own user interface if they wish; otherwise, an external program can be used).<p>Some of the other problems mentioned, and the complexity involved, are due to the messy complexity of HTML and the WWW in general.<p>For validation, you should of course validate on the back end, and you may do so on the front end too (especially if the data needed for validation is small and is intended to be publicly known). However, if JavaScript is disabled, the form should still be sent and the server should reply with an error message if validation fails; if JavaScript is enabled, the client can check for errors before sending it to the server. That way it works either way.
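The host-parameters point is easy to demonstrate with Python's stdlib `sqlite3`, which uses the question-mark (qmark) placeholder style:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# The ? placeholder passes the value out-of-band from the SQL text,
# so the quote and the trailing SQL fragment are stored as data,
# never parsed as part of the statement.
hostile = "Robert'); DROP TABLE users;--"
conn.execute("INSERT INTO users VALUES (?)", (hostile,))

row = conn.execute("SELECT name FROM users").fetchone()
print(row[0])
```

No escaping of the value was needed anywhere, because the value never travels through the SQL grammar at all.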
This has been the way for Drupal since ... 2005 at least. My memory becomes fuzzy before that. Since 2015 it's highly automated too thanks to Twig autoescape.
Of the “six famous bad ideas in computer security”, the first and second are “default permit” and “enumerating badness”.<p><a href="http://www.ranum.com/security/computer_security/editorials/dumb/" rel="nofollow">http://www.ranum.com/security/computer_security/editorials/d...</a>
Of course you should sanitize input, <i>and</i> escape everything properly in the context-specific way.<p>Defining what is valid for an input field and rejecting everything else helps the user catch mistakes. It's not just for security.<p>Some kinds of information are tricky to sanitize. Names, addresses and such. Especially in an application or site that has global users. Do the wrong thing and you end up aggravating users, who are not able to input something legitimate.<p>But maybe don't allow, say, a date field to be "la la la" or even "December 47, 2023".
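The date example above is the easy end of the spectrum, since stdlib parsers already enforce calendar validity. A sketch in Python (the accepted format is an assumption for illustration):

```python
from datetime import datetime

def parse_date(value: str):
    # strptime enforces both the format and calendar validity, so
    # "la la la" and "December 47, 2023" are both rejected with ValueError.
    return datetime.strptime(value.strip(), "%B %d, %Y").date()

print(parse_date("December 4, 2023"))
```

Names and addresses get none of this help: there is no `strptime` for "a legitimate human name", which is why over-validating those fields aggravates real users.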
Ehhh!? I don't get this at all. You <i>obviously</i> do both.<p>1) you get your input data into the form that is meaningful in the database by validating, sanitising and transforming it. Because you know what form that data should be in, and that's the <i>only</i> form that belongs in your database. Data isn't just <i>output</i>, sometimes it is processed, queried, joined upon.<p>2) you correctly format/transform it for output formats. Now you know what the normalised form is in the database, you likely have a simpler job to transform it for output.<p>It's not just lazy to suggest there's a choice here, it's wrong.
Disagree.<p>Escaping/sanitizing on output takes extra cycles/energy that could be spared if the same process were done once upon submission.<p>Think more sustainable.