You might not need to store plaintext email addresses

250 points by danielskogly over 4 years ago

30 comments

In most cases, encrypting sensitive information like e-mail addresses in the application layer with a memory-resident key (e.g. injected using tools like Vault) is a better strategy, at least if you need asynchronous access to that information (e.g. to send out weekly update e-mails). Most of the data leaks in the past were caused by compromised or misconfigured databases, not by compromised application server code.

Also, within the EU I need to be able to proactively reach my users (e.g. to notify them about a data loss), so only storing hashes of e-mail addresses and hoping users will log in so that I can send them an e-mail won't work.

gregoriol over 4 years ago

This would not work for any serious/useful service: e-mails are not only for marketing, there are many good reasons to send one, like (user-requested) notifications, invoicing, ... and also screw-ups! If your service had a problem (security, broken data, invoicing again, long downtime, ...), you had better contact your users before they find out on Hacker News.

inopinatus over 4 years ago

This scheme struggles in the face of email address case folding.

At the protocol level, email addresses are case-folded on the RHS but case-sensitive on the LHS. So it’s crucial that LHS case is preserved by delivery systems. Unfortunately most users then treat them as folded on both. So you can successfully verify one variant, store the downcased hash, and it’ll subsequently match but delivery bounces. Or, hash the exact original input but have many baffled users unable to access their accounts. Neither is a good outcome.

This is not an edge behaviour either, I have tons of users that mix up their email capitalisation from day to day.
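
To make the two options above concrete, here is a minimal Python sketch. The function names are hypothetical and not from the article; the only assumption is that an address is split at its last "@". `delivery_form` keeps the LHS exactly as typed, while `folded_form` lowercases both sides the way most users expect.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address at the last '@' into (local part, domain)."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain


def delivery_form(address: str) -> str:
    """Keep the local part (LHS) exactly as typed; lowercase only the domain (RHS)."""
    local, domain = split_address(address)
    return f"{local}@{domain.lower()}"


def folded_form(address: str) -> str:
    """Lowercase both sides, which is how most users treat their address."""
    local, domain = split_address(address)
    return f"{local.lower()}@{domain.lower()}"
```

Hashing `folded_form` gives stable logins but loses the casing needed for delivery; hashing `delivery_form` keeps delivery safe but treats "John@…" and "john@…" as different accounts.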

simonw over 4 years ago

Some things that become difficult if you don't have a verified email address for your users:

- Most common: a user has a support request because they can't get into their account (e.g. you have sign-in-with-Facebook and they lost their account there, or got banned).
- Your authentication partner (again e.g. Facebook) disables your integration for some reason - someone reports your account as abusive (maybe maliciously) and it gets locked, and your attempts to work through Facebook customer support hit a brick wall. If you have email addresses you can at least get your users back into their accounts via a reset-password style flow.
- You have a data breach, and you need to tell your users what happened and what private data of theirs was leaked to an attacker.
- You get a legal threat - a DMCA takedown message for example - and need to pass it on to your users.
- You sell your service to another company and the lawyers involved in the transaction insist on emailing out a terms of service update.

There are plenty more.

AndrewStephens over 4 years ago

This is a clever idea but limited in applicability. It is probably fine for a low security web app or game, but could still leak personal information if the db got hacked.

The problem is that the salt has to be the same for each record and that emails present a limited search space.

Imagine I stole the database for blackmailable-fetish.com. All the emails are hashed with the same salt so I can brute-force the following restricted space:

[top 200 first names][top 1000 surnames][digits from 0 - 999]@[top 5 email providers]

That would probably get me 75% of the emails - let the extortion games begin!
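
For scale, the pattern above works out to 200 × 1000 × 1000 × 5 = 10^9 candidate addresses. The toy Python sketch below shows why a single shared salt offers no protection against that kind of enumeration. The salt value, the helper names, and the choice of SHA-256 are all assumptions made for illustration, and the word lists are deliberately tiny.

```python
import hashlib
from itertools import product

FIXED_SALT = b"example-fixed-salt"  # hypothetical stand-in for the site's single salt


def fixed_salt_hash(email: str) -> str:
    """Hash an address with the one salt shared by every record."""
    return hashlib.sha256(FIXED_SALT + email.encode("utf-8")).hexdigest()


def search_space_size(first_names: int, surnames: int, numbers: int, providers: int) -> int:
    """Number of addresses matching [first][surname][number]@[provider]."""
    return first_names * surnames * numbers * providers


def candidate_addresses(first_names, surnames, number_count, providers):
    """Enumerate every address in the restricted pattern."""
    for first, last, provider in product(first_names, surnames, providers):
        for n in range(number_count):
            yield f"{first}{last}{n}@{provider}"


def recover(leaked_hashes, candidates):
    """Return the candidates whose fixed-salt hash appears in the leaked set."""
    leaked = set(leaked_hashes)
    return [c for c in candidates if fixed_salt_hash(c) in leaked]
```

Because the salt is the same for every row, each candidate is hashed once and checked against the whole leaked table, so the cost does not grow with the number of users.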

crazygringo over 4 years ago

Sidenote, but I find this post maddening to understand, because the author seems to be using the word "e-mail" to mean both "e-mail address" and "e-mail message", and then uses ambiguous pronouns to boot:

> In conclusion, if you only use emails for transactional emails, you might be able to only store hashed versions of them.

HUH?

The most obvious way to interpret this sentence is as storing hashed versions of transactional e-mail messages. Which makes no sense and isn't what the author means, but wow this is some confusing writing.

markvdb over 4 years ago

> Earlier this year, when I went from having only Facebook-login [...] to allow registrations with email and password, one of my concerns was how to implement this in a way that protects the data and privacy of my users.

Any privacy effort is laudable. Then again, if you're serious about protecting your users' data and privacy, Facebook login is the elephant in the room.

nwsm over 4 years ago

> For every transactional email I need to send out - registration, account recovery, and email change verification - the user always initiates this by submitting their email address, and it will at that time be available to the backend to perform the needed action.

This sounds like terrible UX, not to mention the email use cases that are not initiated by the user. I really think you'd be shooting yourself in the foot by setting up a small site with this philosophy just because you don't need emails right now.

cm2187 over 4 years ago

Good points. Though given how many emails have been leaked already, not sure sha256 with fixed salt achieves much. One can build a rainbow table with that salt fairly quickly. You might as well use bcrypt, scrypt and co.
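
A rough Python sketch of the scrypt suggestion, using the standard library's `hashlib.scrypt`. The salt value, the function name, and the cost parameters are assumptions chosen for illustration, not a recommendation from the article. The salt is still fixed so that the hash stays usable as a lookup key; the only change is that each guess becomes far more expensive than one SHA-256 call.

```python
import hashlib

SITE_SALT = b"example-site-wide-salt"  # hypothetical fixed salt, kept so lookups stay deterministic


def slow_email_hash(email: str) -> str:
    """Derive a lookup key with scrypt instead of a single fast hash."""
    digest = hashlib.scrypt(
        email.encode("utf-8"),
        salt=SITE_SALT,
        n=2**14,  # CPU/memory cost; illustrative value only
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,  # allow the roughly 16 MiB these parameters need
        dklen=32,
    )
    return digest.hex()
```

This slows down enumeration of likely addresses but does not remove it: with a shared salt, one pass over a candidate list still tests every row at once.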

pgt over 4 years ago

When signing up for a service, I always sign up with <name-of-service>@<my-domain.com>, which makes it easy to see who sold my email address and to filter/block by service.

webmobdev over 4 years ago

It's a good idea to protect user privacy. One drawback I can think of with storing a hashed email: what if the user forgets the username / email id and wants to know it? (This is a common use case.) In such a case you have to collect additional unique data to help the user gain access to their account, but that defeats the original purpose - to protect user privacy.

mischanix over 4 years ago

The more important takeaway from this article for me is that sites should be hashing the Facebook user ID, since it's often far more personally-identifiable information than an email address.

RobertoG over 4 years ago

Maybe this is not important in your use case, but what if you have a database breach and you have to warn your users?

nicbou over 4 years ago

This is the sort of content I come to HN for. It introduced me to a possibility I hadn't considered, and it's followed by an interesting debate in the comments. Thank you for sharing.

The most important caveat I can think of is the ability to inform irregular users about something important, either for legal or ethical reasons. For example, my note taking app is shutting down, and users might have important things stored there. I could also message them about a deprecated feature, a change to the ToS, or ironically a data breach.

Nonetheless, it's still a good idea and I'll keep it in mind.

jeromenerf over 4 years ago

Storing email feels like a no-brainer for a system that needs to send messages to its customers. Some prefer phone numbers, which maybe provide stronger guarantees while being maybe not as long lasting.

As an individual, the issue is that "anon" or "throw-away" emails are not that commoditized. I heard that "login with Apple" is meant to provide an email proxy, hiding your real email, but I have not seen it deployed, except on Reddit. As good as it can be, it’s Apple only.

I can always wildcard on a domain I rent and use klingo@domain as a means to compartmentalize identifiers, but it is not low maintenance.

Still, it feels better than "login with faang".

Vanit over 4 years ago

To support: Hey, I closed my Facebook account and would like you to delete my data for me?

Oh...

Mandatum over 4 years ago

Yeah, with data breaches becoming more and more common, I really think it's irresponsible to not have a way of contacting your users. Sure, you could throw a banner up on your website - but the comms should be immediate.

This might be reasonable for a service that doesn't sell anything, or where there's absolutely nothing owed to users and users have no reasonable expectation of privacy. But any commercial or professional organisation that doesn't have a method for contacting end-users is either A. shady as fuck (numbered accounts, darknet-hosted, ignorance by design), or B. irresponsible.

This is a website whose pure purpose is to extract PPC/ad and referral revenue from its users. There's no personal information requested from users, other than "Display Name". This is actually one of the few exceptions where I think the owner of the website is being more responsible with their users' data by not keeping anything.

However, if they are breached and are serving malware to customers for a week before realising, they will have no way to tell their users they may have been affected. Or what if someone decides to install a backdoor and log the user's email and password when logging in? This is nitpicking, and honestly probably 1% of websites hacked in this way actually notify their customers, but it's nevertheless still a hole in the design.

They're also likely capping their earning potential if they do plan to sell the website, as they don't have any delicious user data to sell to marketers. For which I commend Daniel and Bjorn! Well done.

I don't know, I'm thinking this is great, but also pretty bad. Maybe adding an opt-in for breach notifications would be useful, or having a third-party service to subscribe to breach notifications for the website would be the best of both worlds.

Fumtumi over 4 years ago

Sounds unnecessarily complicated for no real benefit, with issues along the road.

And it does feel weird to use Facebook in this example.

If you don't care about an email address, and you are using the login only for maintaining that list, use a permalink. That's probably easier and better.

One permalink for editing, one permalink for viewing.

llimos over 4 years ago

We did this slightly differently. For login we stored a hash of the normalized email address (all lowercase, and handling gmail's dots and plusses). For sending emails we had them encrypted in a separate database, which only the mail-sending servers had access to - not the web-facing servers. That way we didn't need to ask for the email address every time, and it was still fairly well protected.
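
The normalization step described above could look roughly like the Python sketch below. The function names, the list of Gmail domains, and the use of SHA-256 are assumptions for illustration; the comment does not say which rules or hash were actually used.

```python
import hashlib

# Assumption: only these domains get the Gmail-specific dot and plus handling.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(address: str) -> str:
    """Lowercase the address; for Gmail, drop dots and any '+suffix' in the local part."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def login_key(address: str, salt: bytes) -> str:
    """Hash of the normalized address, used only to find the account at login."""
    return hashlib.sha256(salt + normalize_email(address).encode("utf-8")).hexdigest()
```

The address used for sending would be stored separately in its original form (encrypted, as described above), since the normalized form is only a lookup key.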

hinkley over 4 years ago

I wonder what a database that supported a moral equivalent of cgroups would look like.

I can't create a record, I can't delete a record, I can't see the email field, but I can change the subscription plan for this user, or change their avatar.

We tend to do table or row level permissions, matrixed with verb. Column level occurs at the application layer, leaving plenty of room underneath for exfiltration.

stjo over 4 years ago

I admire your dedication to keeping your users' data secure, anonymous and private.

> For Wishy.gift I use SHA512 with a fixed salt

Just an FYI in case you don't know: if you want to allow different accounts with the same email, then in case of a data breach the duplicate hashes would make it obvious that this has occurred. Salting with a different nonce for every row is not much harder and would protect in that case.

gberger over 4 years ago

Maybe this is a dumb question but how do you send an email if you only have the hash of the recipient's address?

scoot over 4 years ago

"You might not need to store user email addresses"

Emails and email addresses are two very different things.

tlarkworthy over 4 years ago

I think you can store emails but process them only in an anonymizing, publicly auditable proxy, ensuring that downstream business services do not have plaintext access whilst still being able to send outbound emails whenever you want. I wrote about it recently: https://futurice.com/blog/trustworthy-services-from-cloud-provider-impartiality

The key is to grant cloudfunctions.functions.sourceCodeGet (or the AWS/Azure equivalent) on the edge so anybody can verify that your proxy is above board. End users just have to trust the cloud provider's access controls, not the service provider's word on implementation.

BlueTemplar over 4 years ago

Huh, considering the case (& Unicode) issue (that I wasn't aware of!), shouldn't using email addresses as logins be considered bad practice?

raziel2p over 4 years ago

You use email+password for login. Does this mean that on every login attempt, you iterate through every row in the database to check for a hash match?

megous over 4 years ago

Don't forget to NOT store email server logs either. ;) Otherwise this exercise is kinda pointless.

miki123211 over 4 years ago

One thing worth noting is that often, you don't even need to store passwords.

If a user wishes to log in, you send them a link/code by email. That increases security dramatically, as most email services already have some more advanced protections built-in. You also don't have to worry about leaks that much, as there are just no passwords to be leaked.

KingOfCoders over 4 years ago

Haven't heard of that one.

I'd like to see an A/B test on conversion and long-term satisfaction.

Kwpolska over 4 years ago

That might be in breach of the GDPR. In the event of a personal data breach, you need to tell the data subjects about the breach [0]. You can’t just put a notice on your page, since someone might not be using your service any more, but you still have their data. And GDPR aside, it is very short-sighted to assume you will never ever need to e-mail users on your own.

[0]: https://gdpr-info.eu/art-34-gdpr/
