I'm working on a spam control plugin and I need to test it. Ideally, I need hundreds to thousands of emails, from different senders, delivered to my test email account.<p>Are there any tools or methods available for this? I haven't found anything, I imagine because such a tool would be ripe for abuse.
Spam training set:<p><a href="http://untroubled.org/spam/" rel="nofollow">http://untroubled.org/spam/</a><p>More good ones here<p><a href="http://stackoverflow.com/questions/4743996/publicly-available-spam-filter-training-set" rel="nofollow">http://stackoverflow.com/questions/4743996/publicly-availabl...</a>
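If you go the corpus route, here is a minimal sketch of replaying a downloaded corpus into a test account. It assumes the corpus has been unpacked into a directory with one raw message per file, which may not match how every archive is laid out. The names `replay`, `smtp_sender`, and `FALLBACK_SENDER` are made up for this example. The message bytes are passed through untouched, so the filter still sees the original headers; only the SMTP envelope is set by the script.

```python
import email
import email.utils
import smtplib
from pathlib import Path

# Hypothetical placeholder used when a message has no usable From header.
FALLBACK_SENDER = "unknown@example.invalid"


def envelope_sender(raw):
    """Return the address from the From header, or the placeholder."""
    msg = email.message_from_bytes(raw)
    _, addr = email.utils.parseaddr(str(msg.get("From", "")))
    return addr if "@" in addr else FALLBACK_SENDER


def replay(corpus_dir, recipient, send):
    """Call send(sender, recipient, raw) for every file in corpus_dir.

    Assumes one raw message per file. Returns the number of messages sent.
    """
    count = 0
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        send(envelope_sender(raw), recipient, raw)
        count += 1
    return count


def smtp_sender(host="localhost", port=25):
    """Build a send() callable that delivers through an SMTP server."""
    def send(sender, recipient, raw):
        with smtplib.SMTP(host, port) as smtp:
            smtp.sendmail(sender, [recipient], raw)
    return send
```

Something like `replay("corpus/", "test@example.org", smtp_sender("localhost", 25))` would push the whole directory at a mail server you control; the directory, address, and host here are placeholders. Passing `send` as a callable makes it easy to dry-run with a function that only logs.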
Host a Tor relay. I run two and list my email address as the maintainer contact, and now I consistently get over 200 messages a day (it got so bad that I've since shut down my private mail server).
Run a simulated OpenRelay <a href="https://github.com/schumann2k/SpamTrap" rel="nofollow">https://github.com/schumann2k/SpamTrap</a> but instead of dropping all mail, redirect it to your target account. Might take a few days to weeks to get started though.
I have a <i>lot</i> of spam gathered over many years. It would need washing to remove my addresses from it, but there's a chance I could provide you with a corpus.<p>I would need more details about what you're doing, and some assurances about what you would do with the data. What do you actually need? Do you need all the headers? It would be easier if I only needed to provide the bodies.<p>I would need to look at the data I have, and I might withdraw this offer if it would be too much work. In the meantime, perhaps you could think carefully about exactly what you need.<p>You can contact me via the email address in my profile. It might take a day or two for me to reply.
I did research in this area with the amazing Jeff Huang [1].<p>(Here are the findings: <a href="https://medium.com/@karan/how-do-spammers-harvest-your-e-mail-address-3d30c77a019a" rel="nofollow">https://medium.com/@karan/how-do-spammers-harvest-your-e-mai...</a>)<p>Answer to your question: <a href="https://d262ilb51hltx0.cloudfront.net/max/2000/0*q9f3570SPFftR9cx.png" rel="nofollow">https://d262ilb51hltx0.cloudfront.net/max/2000/0*q9f3570SPFf...</a><p>> Notably, spammy mailing lists send the most spam. These mailing lists include sites that promise you free credit scores, or insurance quotes, or free ipads etc. These sites stink of spam, but people still continue to give them their email addresses.<p>[1] <a href="http://jeffhuang.com/" rel="nofollow">http://jeffhuang.com/</a>
Post that your company just got funded in CrunchBase :)<p>But in all seriousness, look at blackhatworld.com (where the spammers gather) and look at how they scrape and spam email addresses (search for "email method"); there are a few ways that everyone else is copying, and you could get your email in there.
Set up your email server to "swallow all". I did this and after a few years I got around one million spam mails per day.<p>I guess that if the emails never bounce they will keep sending new stuff to you. It's like saying "yes, yes, yes" to a salesperson, they will just keep adding more stuff :P
Put the unmunged address in the From and Reply-To headers of a Usenet client and make a few posts to Usenet.<p>Ditto some email lists.<p>Websearch news.admin.net-abuse.sightings
You're going to want more than one email account. Ideally you would set up one or more catch-all domains where anything@domain.com dumps into your collection. This can be set up with Linux and Postfix. Your configuration should not use any blacklists. Then do what others have suggested to spread these addresses around the net.
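To make the catch-all idea concrete, here is a rough sketch of a mail sink that accepts every recipient and rejects nothing. This is not Postfix and not a full SMTP implementation; it is a small Python stand-in that speaks only the handful of commands a sending client needs, and it keeps messages in memory. All names (`run_session`, `SinkServer`, and so on) are invented for the example.

```python
import socketserver


def address(arg):
    """Extract the address from an argument such as '<user@example.com> SIZE=1'."""
    arg = arg.strip()
    if arg.startswith("<") and ">" in arg:
        return arg[1:arg.index(">")]
    return arg.split(" ", 1)[0]


def read_data(rfile):
    """Read a DATA payload up to the lone '.' line, undoing dot-stuffing."""
    lines = []
    for line in iter(rfile.readline, b""):
        if line.rstrip(b"\r\n") == b".":
            break
        lines.append(line[1:] if line.startswith(b".") else line)
    return b"".join(lines)


def run_session(rfile, wfile, store):
    """Speak just enough SMTP to accept mail for any recipient."""
    def reply(text):
        wfile.write(text.encode("ascii") + b"\r\n")

    reply("220 sink ready")
    sender, recipients = "", []
    for line in iter(rfile.readline, b""):
        command = line.strip().decode("latin-1")
        verb = command.upper()
        if verb.startswith("MAIL FROM:"):
            sender, recipients = address(command[10:]), []
            reply("250 OK")
        elif verb.startswith("RCPT TO:"):
            recipients.append(address(command[8:]))  # never reject a recipient
            reply("250 OK")
        elif verb == "DATA":
            reply("354 End data with <CR><LF>.<CR><LF>")
            store.append((sender, recipients, read_data(rfile)))
            sender, recipients = "", []
            reply("250 OK")
        elif verb == "QUIT":
            reply("221 Bye")
            break
        else:
            reply("250 OK")  # HELO, EHLO, RSET, NOOP and anything unknown


class SinkHandler(socketserver.StreamRequestHandler):
    def handle(self):
        run_session(self.rfile, self.wfile, self.server.messages)


class SinkServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address_pair):
        super().__init__(address_pair, SinkHandler)
        self.messages = []  # (sender, recipients, raw message bytes)
```

Running `SinkServer(("127.0.0.1", 2525)).serve_forever()` would listen on a local test port (2525 is an arbitrary choice). For real collection you would point a domain's MX record at a proper MTA and write messages to disk; this only shows the "say yes to everything" behaviour.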
Tweet out the email perhaps?<p>Sign up for one of those free WalMart gift cards or the like - it's an endless chain of "offers" you fill out. I'd hate to share any links to give them any of HN's PR, but Googling "free walmart gift card" should get a juicy starting point 3 links or so down the page :-)
Search for "free search engine submission" on Google. There are sites that will take an email address as part of the signup procedure and you will get BLITZED with spam and various junk. I used this nasty little trick to get revenge on someone when I was but a youngling.
TREC has a spam corpus, albeit dated at this point<p><a href="http://plg.uwaterloo.ca/~gvcormac/treccorpus/" rel="nofollow">http://plg.uwaterloo.ca/~gvcormac/treccorpus/</a><p>92k messages, 52k of which are labelled as spam messages in the 05 corpus. It totals 300 MB or so.
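A labelled corpus also lets you score the plugin offline instead of only eyeballing an inbox. Below is a small sketch of that, assuming an index file with one `label relative/path` pair per line (`spam` or `ham`); that layout is an assumption here, so check it against whatever archive you download. `is_spam` is a hypothetical hook standing in for your plugin.

```python
from collections import Counter
from pathlib import Path


def read_index(index_path):
    """Yield (label, message_path) pairs from lines of 'label relative/path'."""
    index_path = Path(index_path)
    for line in index_path.read_text().splitlines():
        if not line.strip():
            continue
        label, rel = line.split(maxsplit=1)
        # Paths are assumed to be relative to the index file's directory.
        yield label.lower(), index_path.parent / rel.strip()


def evaluate(pairs, is_spam):
    """Count (true_label, predicted_label) outcomes for a classifier.

    is_spam takes the raw message bytes and returns True or False.
    """
    counts = Counter()
    for label, path in pairs:
        predicted = "spam" if is_spam(path.read_bytes()) else "ham"
        counts[(label, predicted)] += 1
    return counts
```

With the result in `counts`, `counts[("ham", "spam")]` is the number of legitimate messages wrongly flagged and `counts[("spam", "ham")]` is the spam that got through, which are usually the two figures worth tracking between plugin versions.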
Would a pre-assembled corpus of spam be ok for your needs? <a href="https://www.google.ru/search?ie=utf-8&oe=utf-8&q=spam+corpus" rel="nofollow">https://www.google.ru/search?ie=utf-8&oe=utf-8&q=spam+corpus</a>