How do large sites like fb/myspace/amazon/ <insert large email sender> manage to send millions of emails a day? What kind of infrastructure is used? How are the IPs warmed, and how are the messages queued? How do they manage connections?<p>Any interesting reads on this topic?
The guys at MailChimp released a nice guide with an overview of what you need to know when running a high capacity email infrastructure: <a href="http://resources.mailchimp.com/email-delivery-for-it-professionals" rel="nofollow">http://resources.mailchimp.com/email-delivery-for-it-profess...</a>
I know some guys who used to send 100 M emails every weekend with a qmail cluster. It was sorta opt-in, sorta. Let's just say that they're not doing it anymore.<p>I don't think there's a big technical issue in scaling SMTP sends; you can have N machines doing it in parallel and it will work (almost) N times faster than one machine. There IS the question of minimizing your cost per unit email, and the real spammers address that by building SMTP senders that don't comply with the standard.<p>What does bug me about email is deliverability. It's pretty much impossible to send e-mail to AOL members, for instance, if you (1) don't pay ice to AOL, or (2) aren't big enough that AOL is afraid you'll sue them. AOL has shrunk a lot lately, so this isn't as big a concern as it was years ago.<p>It's not hard to get burned by other organizations as well. For instance, I sent an email shot on behalf of a campus organization at a university from off campus... an opt-in list of 2500 subscribers, really just chicken feed, but only 1999 went through -- after 2000 connection attempts, they firewalled my IP address and for all I know that address is still firewalled today. I knew the guy who runs email for that school by name and he refused to talk about the whole affair... That's what you're really up against.
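On the point above about N machines sending in parallel: a minimal sketch of one way to split a recipient list across senders, assuming a stable hash of the address is good enough for balancing. The function names and the choice of SHA-256 are illustrative assumptions, not something described in the thread.

```python
import hashlib


def shard_for(address, n_senders):
    """Map an address to a sender index in [0, n_senders) using a stable hash."""
    digest = hashlib.sha256(address.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_senders


def partition(recipients, n_senders):
    """Split recipients into n_senders lists, one per (hypothetical) sending machine."""
    shards = [[] for _ in range(n_senders)]
    for address in recipients:
        shards[shard_for(address, n_senders)].append(address)
    return shards
```

Each machine would then work through its own shard independently. This says nothing about deliverability, which is the harder problem described above.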
I helped design and set up 2 large SMTP infrastructures for two different European mobile carriers. Both carriers had more than 1 million customers and processed close to 100 million messages a day. It was about 7 years ago but I doubt things have changed that much.<p>We made heavy use of Mirapoint. But we also made use of qmail and postfix in different places. At the high end of scaling, SMTP routing starts becoming similar to routing IP traffic. You're just not going to beat a dedicated device like Mirapoint with some Unix box you make from scratch. You need dedicated and differentiated mail routers for inbound, outbound and mail storage. You monitor their usage and when they get overloaded you just add more. Since everything is load balanced you can scale close to linearly. We used SAN for storage. Local storage just doesn't work.<p>Mail hits an inbound SMTP router and that router does an LDAP lookup to find which storage box actually handles that account. Then it forwards it. That's all it does and it does it fast. Each storage box handles mail for the accounts it stores and IMAP/POP3 access for those accounts as well. The outbound routers just take mail from the storage routers and spool it for outbound.<p>I could go on but I think you get the idea. The main idea is to put everything you can behind a load balancer so that you can scale it as close to linearly as possible.
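A rough sketch of the inbound routing step described above, in Python. The directory here is a plain dict standing in for the LDAP lookup, and the host names, `Message` type and function names are all made up for illustration; a real router would query LDAP and forward over SMTP.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the LDAP directory: account -> storage box.
DIRECTORY = {
    "alice@example.com": "store-01.internal",
    "bob@example.com": "store-02.internal",
}


@dataclass
class Message:
    recipient: str
    body: str


def route_inbound(message, directory=DIRECTORY):
    """Return the storage box for this recipient, or None if the account is unknown."""
    return directory.get(message.recipient.lower())


def dispatch(messages, directory=DIRECTORY):
    """Group messages by storage box; unknown recipients end up under the None key."""
    by_host = {}
    for message in messages:
        by_host.setdefault(route_inbound(message, directory), []).append(message)
    return by_host
```

Because the router keeps no per-account state of its own, any number of identical routers can sit behind the load balancer, which is the scaling property the comment is pointing at.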
I'm starting my new job as Lead Architect at Experian Cheetahmail at the end of the month; I understand their system can send billions of emails in a month. I'm looking forward to finding out how they do that. It sounds very impressive, even before dealing with spam filtering.
Maybe you should ask how <a href="http://sendgrid.com/" rel="nofollow">http://sendgrid.com/</a> do it. Probably like terra_t said, with tons of SMTP senders in parallel.
Adding: Just noticed that some of the email servers are responding back saying they will accept a max of 10 emails per connection.<p>I am sure every receiving mail server has its own darn rules, so how do large cos manage to send email to all such services without getting blocked, while adhering to each receiving server's arbitrary rules?
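One way to cope with a per-connection cap like the one mentioned above is to plan the batches before opening any connections. This is only a sketch: the limits table, the default of 50 and the function names are assumptions for illustration, and real limits would have to be learned from each server's responses.

```python
from collections import defaultdict

# Hypothetical per-domain caps on messages per SMTP connection.
MAX_PER_CONNECTION = {"example.net": 10}
DEFAULT_MAX_PER_CONNECTION = 50  # assumed fallback, not a standard value


def domain_of(address):
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[1].lower()


def plan_connections(recipients, limits=MAX_PER_CONNECTION,
                     default=DEFAULT_MAX_PER_CONNECTION):
    """Return a list of (domain, batch) pairs, one pair per connection to open."""
    by_domain = defaultdict(list)
    for address in recipients:
        by_domain[domain_of(address)].append(address)

    plan = []
    for domain, addresses in by_domain.items():
        cap = limits.get(domain, default)
        # Slice the domain's recipients into chunks no larger than its cap.
        for start in range(0, len(addresses), cap):
            plan.append((domain, addresses[start:start + cap]))
    return plan
```

With 25 recipients at `example.net` and a cap of 10, this plans three connections carrying 10, 10 and 5 messages.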
I, too, would be interested to know whether sites of that size tend to build their own email dbs in-house or find that anything available off-the-shelf is adequate (or make their own modifications to the latter).
I don't know what the specific implementation of each company is, but having a separate program for email delivery makes the most sense compared to using qmail or any other off-the-shelf MTA. There is most likely an MTA that is used to receive mail, but a separate program can be used to deliver mail from the same machine. There is a JavaMail API, for example, which has been open sourced and can be used to deliver email from any Java program. Threading is an important issue here since it is possible to have as many open connections as you want to deliver mail, but you would not want all of your open threads to be attempting to send emails to one domain all at once. Reputation determines how many emails get accepted and how many are bounced.
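To illustrate the threading point above, here is a minimal Python sketch (not JavaMail) that caps how many worker threads may deliver to the same domain at once. The class name, the cap of 2 and the `deliver` callback are all assumptions for illustration; `deliver` stands in for whatever actually speaks SMTP.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_PER_DOMAIN = 2  # assumed cap, tune per receiving domain


class DomainThrottle:
    """Limits the number of simultaneous deliveries to any single domain."""

    def __init__(self, per_domain=MAX_CONCURRENT_PER_DOMAIN):
        self._per_domain = per_domain
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, domain):
        # Create each domain's semaphore lazily, guarded by a lock.
        with self._lock:
            if domain not in self._semaphores:
                self._semaphores[domain] = threading.BoundedSemaphore(self._per_domain)
            return self._semaphores[domain]

    def send(self, address, deliver):
        domain = address.rsplit("@", 1)[1].lower()
        with self._semaphore_for(domain):
            return deliver(address)


def send_all(addresses, deliver, workers=8, per_domain=MAX_CONCURRENT_PER_DOMAIN):
    """Deliver to every address using a thread pool, throttled per domain."""
    throttle = DomainThrottle(per_domain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda a: throttle.send(a, deliver), addresses))
```

A thread waiting on a busy domain's semaphore still occupies a pool slot, so a production sender would more likely keep a separate queue per domain; the sketch only shows the cap itself.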