In most cases, encrypting sensitive information like e-mail addresses with a memory-resident key (e.g. injected using tools like Vault) in the application layer is a better strategy, at least if you need asynchronous access to that information (e.g. to send out weekly update e-mails). Most of the data leaks in the past were caused by compromised or misconfigured databases, not by compromised application server code.<p>Also, within the EU I need to be able to proactively reach my users (e.g. to notify them about a data loss), so only storing hashes of e-mail addresses and hoping users will log in so that I can send them an e-mail won't work.
This would not work for any serious/useful service: e-mails are not only for marketing; there are many good reasons to send one, like (user-requested) notifications, invoicing, ... and also screw-ups! If your service had a problem (security, broken data, invoicing again, long downtime, ...), you had better contact your users before they find out on Hacker News.
This scheme struggles in the face of email address case folding.<p>At the protocol level, email addresses are case-folded on the RHS but case-sensitive on the LHS. So it’s crucial that LHS case is preserved by delivery systems. Unfortunately most users then treat them as folded on both. So you can successfully verify one variant, store the downcased hash, and it’ll subsequently match but delivery bounces. Or, hash the exact original input but have many baffled users unable to access their accounts. Neither is a good outcome.<p>This is not an edge behaviour either, I have tons of users that mix up their email capitalisation from day to day.
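A minimal sketch of the dilemma described above, assuming a salted SHA-256 lookup hash. The salt value and the function names are made up for illustration and are not taken from the article.

```python
import hashlib

SALT = b"example-fixed-salt"  # hypothetical value, for illustration only


def hash_exact(address: str) -> str:
    """Hash the address exactly as the user typed it."""
    return hashlib.sha256(SALT + address.encode("utf-8")).hexdigest()


def hash_domain_folded(address: str) -> str:
    """Lowercase only the domain (RHS); keep the local part (LHS) as typed."""
    local, _, domain = address.rpartition("@")
    return hash_exact(f"{local}@{domain.lower()}")


def hash_fully_folded(address: str) -> str:
    """Lowercase both sides, which is how most users think of addresses."""
    return hash_exact(address.lower())


signup = "Jane.Doe@Example.COM"  # what the user verified at registration
later = "jane.doe@example.com"   # what the same user types a week later

# Exact hashing and protocol-correct (domain-only) folding both lock the user out.
exact_match = hash_exact(signup) == hash_exact(later)
domain_match = hash_domain_folded(signup) == hash_domain_folded(later)

# Full folding lets the user in, but the stored hash no longer tells you
# which capitalisation of the local part was actually verified as deliverable.
full_match = hash_fully_folded(signup) == hash_fully_folded(later)
```

Only `full_match` is true here, which is exactly the variant where the verified capitalisation of the LHS has been thrown away.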
Some things that become difficult if you don't have a verified email address for your users:<p>- Most common: a user has a support request because they can't get into their account (e.g. you have sign-in-with-Facebook and they lost their account there, or got banned).<p>- Your authentication partner (again e.g. Facebook) disables your integration for some reason - someone reports your account as abusive (maybe maliciously) and it gets locked, and your attempts to work through Facebook customer support hit a brick wall. If you have email addresses you can at least get your users back into their accounts via a reset-password style flow.<p>- You have a data breach, and you need to tell your users what happened and what private data of theirs was leaked to an attacker.<p>- You get a legal threat - a DMCA takedown message for example - and need to pass it on to your users.<p>- You sell your service to another company and the lawyers involved in the transaction insist on emailing out a terms of service update.<p>There are plenty more.
This is a clever idea but limited in applicability. It is probably fine for a low security web app or game, but could still leak personal information if the db got hacked.<p>The problem is that the salt has to be the same for each record and that emails present a limited search space.<p>Imagine I stole the database for blackmailable-fetish.com. All the emails are hashed with the same salt so I can brute-force the following restricted space:<p>[top 200 first names][top 1000 surnames][digits from 0 - 999]@[top 5 email providers]<p>That would probably get me 75% of the emails - let the extortion games begin!
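To make the attack above concrete, here is a rough sketch with tiny stand-in word lists. The salt, the lists, and the function names are all hypothetical. The full space described above is 200 × 1000 × 1000 × 5 = 10^9 candidates, each costing one SHA-256 call.

```python
import hashlib
from itertools import product

SALT = b"site-wide-salt"  # hypothetical; assume the attacker obtained it with the dump


def hash_email(address: str) -> str:
    """Salted SHA-256 with one salt shared by every record."""
    return hashlib.sha256(SALT + address.encode("utf-8")).hexdigest()


# Tiny stand-ins for "top 200 first names", "top 1000 surnames", etc.
FIRST_NAMES = ["john", "mary", "alex"]
SURNAMES = ["smith", "jones", "brown"]
SUFFIXES = [""] + [str(n) for n in range(1000)]
PROVIDERS = ["gmail.com", "yahoo.com", "outlook.com"]


def crack(leaked_hashes):
    """Enumerate the candidate space and return {hash: address} for every hit."""
    recovered = {}
    for first, last, suffix, provider in product(
        FIRST_NAMES, SURNAMES, SUFFIXES, PROVIDERS
    ):
        candidate = f"{first}{last}{suffix}@{provider}"
        digest = hash_email(candidate)
        if digest in leaked_hashes:
            recovered[digest] = candidate
    return recovered


leaked = {
    hash_email("marysmith42@gmail.com"),       # fits the pattern
    hash_email("unusual.person@example.org"),  # does not fit the pattern
}
found = crack(leaked)
```

Because the salt is shared, every candidate is hashed once and checked against the whole dump at the same time, so the cost does not grow with the number of users.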
Sidenote, but I find this post <i>maddening</i> to understand, because the author seems to be using the word "e-mail" to mean both "e-mail address" and "e-mail message", and then uses ambiguous pronouns to boot:<p>> <i>In conclusion, if you only use emails for transactional emails, you might be able to only store hashed versions of them.</i><p>HUH?<p>The most obvious way to interpret this sentence is as storing hashed versions of transaction e-mail messages. Which makes no sense and isn't what the author means, but wow this is some confusing writing.
> Earlier this year, when I went from having only Facebook-login [...] to allow registrations with email and password, one of my concerns was how to implement this is a way that protects the data and privacy of my users.<p>Any privacy effort is laudable. Then again, if you're serious about protecting your users' data and privacy, Facebook login is the elephant in the room.
> For every transactional email I need to send out - registration, account recovery, and email change verification - the user always initiates this by submitting their email address, and it will at that time be available to the backend to perform the needed action.<p>This sounds like terrible UX, not to mention the email use cases that are not initiated by the user. I really think you'd be shooting yourself in the foot by setting up a small site with this philosophy because you don't need emails <i>right now</i>.
Good points. Though given how many emails have been leaked already, not sure sha256 with fixed salt achieves much. One can build a rainbow table with that salt fairly quickly. You might as well use bcrypt, scrypt and co.
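A sketch of what swapping in a slow hash could look like, using `hashlib.scrypt` from the Python standard library. The salt and the cost parameters are assumptions picked for illustration, not recommendations. The salt is still fixed so that the hash can be used for lookups, so this only raises the cost per guess.

```python
import hashlib
import hmac

SALT = b"example-fixed-salt"  # hypothetical; still shared by all rows so lookups work


def slow_email_hash(address: str) -> str:
    """Derive a lookup hash with scrypt instead of a single SHA-256 call."""
    digest = hashlib.scrypt(
        address.encode("utf-8"),
        salt=SALT,
        n=2**14,                  # CPU/memory cost; assumed value
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,  # allow the ~16 MiB these parameters need
        dklen=32,
    )
    return digest.hex()


stored = slow_email_hash("user@example.com")
matches = hmac.compare_digest(stored, slow_email_hash("user@example.com"))
```

Each guess in a dictionary attack now costs a full scrypt evaluation rather than one SHA-256 call, but leaked address lists can still be tested one entry at a time.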
When signing up for a service, I always sign up with <name-of-service>@<my-domain.com>, which makes it easy to see who sold my email address and to filter/block by service.
It's a good idea to protect user privacy. One drawback I can think of with storing a hashed email: what if the user forgets the username / email ID and wants to know it? (This is a common use case.) In such a case you have to collect additional unique data to help the user gain access to their account, but that defeats the original purpose - to protect user privacy.
The more important takeaway from this article for me is that sites should be hashing the Facebook user ID, since it's often far more personally-identifiable information than an email address.
This is the sort of content I come on HN for. It introduced me to a possibility I haven't considered, and it's followed by an interesting debate in the comments. Thank you for sharing.<p>The most important caveat I can think of is the ability to inform irregular users about something important, either for legal or ethical reasons. For example, my note taking app is shutting down, and users might have important things stored there. I could also message them about a deprecated feature, a change to the ToS, or ironically a data breach.<p>Nonetheless, it's still a good idea and I'll keep it in mind.
Storing email feels like a no-brainer for a system that needs to send messages to its customers. Some prefer phone numbers, which maybe provide stronger guarantees while being maybe not as long lasting.<p>As an individual, the issue is that "anon" or "throw-away" emails are not that commoditized.
I heard that "login with Apple" is meant to provide an email proxy, hiding your real email, but I have not seen it deployed, except on Reddit. As good as it can be, it’s Apple only.<p>I can always wildcard on a domain I rent and use klingo@domain as a means to compartmentalize identifiers, but it is not low maintenance.<p>Still, it feels better than "login with faang".
Yeah, with data breaches becoming more and more common, I really think it's irresponsible to not have a way of contacting your users. Sure, you could throw a banner up on your website - but the comms should be immediate.<p>This might be reasonable for a service that doesn't sell anything, or where there's absolutely nothing owed to users and users have no reasonable expectation of privacy. But any commercial or professional organisation that doesn't have a method for contacting end-users is either A. Shady as fuck (numbered accounts, darknet-hosted, ignorance by design), or B. irresponsible.<p>This is a website whose sole purpose is to extract PPC/ad and referral revenue from its users. There's no personal information requested from users, other than "Display Name". This is actually one of the few exceptions where I think the owner of the website is being more responsible with their users' data by not keeping anything.<p>However, if they are breached and are serving malware to customers for a week before realising, they will have no way to tell their users they may have been affected. Or what if someone decides to install a backdoor and log the user's email and password when they log in? This is nitpicking and honestly probably only 1% of websites hacked in this way actually notify their customers, but it's nevertheless still a hole in the design.<p>They're also likely capping their earning potential if they do plan to sell the website, as they don't have any delicious user data to sell to marketers. For which I commend Daniel and Bjorn! Well done.<p>I don't know, I'm thinking this is great, but also pretty bad. Maybe adding an opt-in for breach notifications would be useful, or having a third-party service to subscribe to breach notifications for the website would be the best of both worlds.
Sounds unnecessarily complicated for no real benefit, with issues along the road.<p>And it does feel weird to use Facebook in this example.<p>If you don't care for an email address, and you are using the login only for maintaining that list, use a permalink. That's probably easier and better.<p>One permalink for edit, one permalink for viewing.
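The two-permalink idea could be sketched roughly like this, assuming each list gets two unguessable tokens. `SharedList`, `create_list`, and `can_edit` are hypothetical names, and the in-memory dict stands in for a real datastore.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SharedList:
    """A list addressed by two secret permalinks instead of a user account."""
    title: str
    view_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    edit_token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


_lists = {}  # view_token -> SharedList; stand-in for a database table


def create_list(title: str) -> SharedList:
    """Create a list and register it under its view token."""
    shared = SharedList(title)
    _lists[shared.view_token] = shared
    return shared


def can_edit(view_token: str, presented_edit_token: str) -> bool:
    """Allow edits only when the presented token matches the stored edit token."""
    shared = _lists.get(view_token)
    if shared is None:
        return False
    return secrets.compare_digest(shared.edit_token, presented_edit_token)
```

The owner bookmarks the edit link and shares the view link. No email address or password is stored, but a lost edit link cannot be recovered.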
We did this slightly differently. For login we stored a hash of the normalized email address (all lowercase, and handling gmail's dots and plusses). For sending emails we had them encrypted in a separate database, which only the mail-sending servers had access to - not the web-facing servers. That way we didn't need to ask for the email address every time, and it was still fairly well protected.
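A rough sketch of the normalization step described above (lowercasing plus Gmail's dots and plus-suffixes). The salt and function names are made up, and the Gmail rule is applied only to `gmail.com`, which is an assumption on my part.

```python
import hashlib

SALT = b"example-fixed-salt"  # hypothetical value, for illustration only


def normalize_email(address: str) -> str:
    """Lowercase the address; for Gmail, drop dots and any +suffix in the local part."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain == "gmail.com":
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def login_hash(address: str) -> str:
    """Hash of the normalized address, used only for login lookups."""
    normalized = normalize_email(address)
    return hashlib.sha256(SALT + normalized.encode("utf-8")).hexdigest()


same_account = login_hash("Jane.Doe+news@Gmail.com") == login_hash("janedoe@gmail.com")
```

The encrypted copy used for sending would keep the address as the user entered it. Only the login hash uses the normalized form.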
I wonder what a database that supported a moral equivalent of cgroups would look like.<p>I can't create a record, I can't delete a record, I can't see the email field, but I can change the subscription plan for this user, or change their avatar.<p>We tend to do table or row level permissions, matrixed with verb. Column level occurs at the application layer, leaving plenty of room underneath for exfiltration.
I admire your dedication to keeping your users data secure, anonymous and private.<p>> For Wishy.gift I use SHA512 with a fixed salt
Just an FYI in case you don’t know: if you want to allow different accounts with the same email, then in a data breach the duplicate hashes would make it obvious that this has occurred. Salting with a different nonce for every row is not much harder and would protect against that.
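A small sketch of the difference, assuming SHA-512 as in the quoted setup. The salt value and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

FIXED_SALT = b"example-fixed-salt"  # hypothetical value, for illustration only


def fixed_salt_hash(address: str) -> str:
    """One salt for every row: equal addresses give equal hashes."""
    return hashlib.sha512(FIXED_SALT + address.encode("utf-8")).hexdigest()


def per_row_hash(address: str):
    """Fresh random salt per row; returns (salt_hex, digest) to store together."""
    salt = secrets.token_bytes(16)
    digest = hashlib.sha512(salt + address.encode("utf-8")).hexdigest()
    return salt.hex(), digest


def verify_row(address: str, salt_hex: str, digest: str) -> bool:
    """Recompute the digest with the row's own salt and compare in constant time."""
    salt = bytes.fromhex(salt_hex)
    candidate = hashlib.sha512(salt + address.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)


row_a = per_row_hash("shared@example.com")
row_b = per_row_hash("shared@example.com")
duplicates_visible_fixed = fixed_salt_hash("shared@example.com") == fixed_salt_hash("shared@example.com")
duplicates_visible_per_row = row_a[1] == row_b[1]
```

The trade-off is that a per-row salt rules out finding the row by hashing the submitted address once. You need some other key (a username, say) to locate the row before calling `verify_row`.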
I think you can store emails but process them only in an anonymizing, publicly auditable proxy, ensuring that downstream business services do not have plaintext access whilst still being able to send outbound emails whenever you want. I wrote about it recently: <a href="https://futurice.com/blog/trustworthy-services-from-cloud-provider-impartiality" rel="nofollow">https://futurice.com/blog/trustworthy-services-from-cloud-pr...</a><p>The key is to grant <i>cloudfunctions.functions.sourceCodeGet</i> (or the AWS/Azure equivalent) on the edge so anybody can verify that your proxy is above board. End users just have to trust the cloud provider's access controls, not the service provider's word on implementation.
Huh, considering the case (& Unicode) issue (that I wasn't aware of!), shouldn't using email addresses as logins be considered bad practice?
One thing worth noting is that often, you don't even need to store passwords.<p>If a user wishes to log in, you send them a link/code by email. That increases security dramatically, as most email services already have some more advanced protections built-in. You also don't have to worry about leaks that much, as there are just no passwords to be leaked.
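A minimal sketch of such a login-by-link flow, assuming single-use tokens with a short lifetime. The names, the 15-minute TTL, and the in-memory store are all hypothetical, and actually sending the email is left out.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 15 * 60  # assumed lifetime of a login link

# token hash -> (account id, expiry timestamp); stand-in for a database table
_pending: Dict[str, Tuple[str, float]] = {}


def _digest(token: str) -> str:
    """Store only a hash, so a leaked table does not contain usable tokens."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_login_token(account_id: str, now: Optional[float] = None) -> str:
    """Create a random token to embed in the emailed link."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending[_digest(token)] = (account_id, now + TOKEN_TTL_SECONDS)
    return token


def redeem_login_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the account id for a valid token; each token works at most once."""
    now = time.time() if now is None else now
    entry = _pending.pop(_digest(token), None)
    if entry is None:
        return None
    account_id, expires_at = entry
    return account_id if now <= expires_at else None
```

In use, the server would email a link containing the value returned by `issue_login_token` and call `redeem_login_token` when the link is opened.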
That might be in breach of the GDPR. In the event of a personal data breach, you need to tell the data subjects about the breach [0]. You can’t just put a notice on your page, since someone might not be using your service any more, but you still have their data. And GDPR aside, it is very short-sighted to assume you will never ever need to e-mail users on your own.<p>[0]: <a href="https://gdpr-info.eu/art-34-gdpr/" rel="nofollow">https://gdpr-info.eu/art-34-gdpr/</a>