Some useful patterns, but also quite likely noise.<p>Both Alaska and Delaware are small states, with (relatively) small populations. For categories with uneven numbers of members (e.g., states), odds are high that whatever your outlier is <i>will be a lower-population category</i>. It's simply a matter of variance: smaller samples produce more extreme observed rates.<p>To test for <i>actual</i> significance of those findings, you'd want to run Monte Carlo simulations over your dataset through time to see if there's a <i>consistent</i> trend for these particular indicators, or if the locus shifts among several other regions.<p>Other indicators such as multiple accounts and time/day of activity suggest stronger causal relationships.
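To make the variance point concrete, here is a rough Monte Carlo sketch. Everything in it is invented for illustration: the region names, the order counts, and the 5% fraud rate are assumptions, not figures from the article. Every region gets the same true fraud rate, and the code counts how often each one ends up with the highest observed rate.

```python
import random


def count_top_regions(orders_per_region, fraud_rate, trials, seed=0):
    """Count how often each region shows the highest observed fraud rate
    when all regions share the same true rate (hypothetical setup)."""
    rng = random.Random(seed)
    wins = {region: 0 for region in orders_per_region}
    for _ in range(trials):
        observed = {}
        for region, n_orders in orders_per_region.items():
            # Each order is independently fraudulent with the same probability.
            frauds = sum(rng.random() < fraud_rate for _ in range(n_orders))
            observed[region] = frauds / n_orders
        # Ties go to whichever region comes first in the dict.
        wins[max(observed, key=observed.get)] += 1
    return wins


# Made-up volumes: four low-volume regions and four high-volume ones.
orders = {f"small_{i}": 50 for i in range(4)}
orders.update({f"large_{i}": 2000 for i in range(4)})

wins = count_top_regions(orders, fraud_rate=0.05, trials=300)
for region, count in sorted(wins.items(), key=lambda kv: -kv[1]):
    print(f"{region}: top outlier in {count} of 300 trials")
```

If the low-volume regions dominate the "worst region" slot even though nothing distinguishes them, then a small state topping a real fraud ranking is weak evidence on its own. The same loop could be rerun per time window on real data to see whether the top region stays put or wanders.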
Fraud doesn't just cost merchants lost goods. It makes ordering from stores more difficult when there are extra steps (at least 3-D Secure) just for fraud reduction. This both annoys customers and causes sales to be lost because ordering is less convenient.<p>It causes unnecessary returns when merchants end up sending things to non-existent addresses that someone just invented to see if their stolen card was working or not.<p>It even makes things like A/B testing less reliable, when your numbers are skewed by fraud. Eliminating an extra step in your order process might look like a great A/B testing win, unless the win came only because that "optimization" made fraud easier to commit. At the very least it adds noise.<p>It also makes it more difficult to know where you as a merchant stand financially, as orders can be reversed in the future. Even if you seemingly turned a $1000 profit this month, later you might find out you actually didn't.<p>It's a sad situation that we have to just try to guess who might be committing fraud or not, sometimes denying service to perfectly legitimate users while still missing many cases of actual fraud.
I often buy sub-$20 items and have to try a few credit cards because I forget how much is left on each one (I use one-time load cards for security).<p>So that explains the holds I sometimes get on orders, where I'm like, why hold up a $20 order?<p>Regarding Delaware, I bet there are drop mailers there; I'm surprised there isn't a database of those.
I wonder if the age range identified is because these fraudsters speed through the sign up and select the low value when asked their age. I wonder if they are either 1/1 or 12/31 babies too.
A little off-topic, but today there was a pretty entertaining post on Gawker about ISIS follower accounts who were caught talking about mundane stuff and having run-of-the-mill "Twitter" drama:<p><a href="http://gawker.com/even-isis-guys-have-twitter-drama-1740541455" rel="nofollow">http://gawker.com/even-isis-guys-have-twitter-drama-17405414...</a><p>What was funny was not just the purported content of the tweets -- now apparently removed by Twitter -- but how these guys were identified:<p>> <i>...Abu Yusuf Al-Jabarti is an avid tweeter (his handle, @AlJabarti42, indicates he’s been banned 41 times) and supporter of the Islamic State. Most of his tweets are like this, just trying to expand his brand like everyone else...</i><p>I'm not saying it's easy to write a general algorithm that follows a rule of "If an account gets banned and another account with the same name but a Levenshtein distance of 1 sprouts up from the same IP block and its first tweets contain similar content to the deleted account, then ban that new account, too"...at least, it wouldn't be easier than removing these accounts ad-hoc (i.e. after Gawker discovers them)...but some problematic users don't even make themselves hard to find and yet the prospective computational solution isn't necessarily practical to implement or particularly worth anyone's time (at the moment...).
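For the handle-matching part of that rule only, a minimal sketch might look like the following. The function names and the banned handle "AlJabarti41" are hypothetical (the article only mentions @AlJabarti42), and the IP-block and content-similarity checks are left out entirely, which is where most of the real difficulty would be.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def looks_like_respawn(new_handle, banned_handles, max_distance=1):
    """True if new_handle is within max_distance edits of any banned handle.
    Hypothetical helper: handle similarity only, no IP or content checks."""
    new = new_handle.lower()
    return any(levenshtein(new, old.lower()) <= max_distance
               for old in banned_handles)


banned = {"AlJabarti41"}  # invented example of a previously banned handle
print(looks_like_respawn("AlJabarti42", banned))
print(looks_like_respawn("someone_else", banned))
```

One wrinkle with a strict distance of 1: rolling a counter from "AlJabarti9" to "AlJabarti10" is two edits, so a numbered-suffix scheme can slip past it. Stripping trailing digits before comparing would be one way around that, at the cost of more false positives.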
Systems like Sift are interesting. Machine learning can pick up a lot of trends over time that humans can't. The problem is that they often fail to react quickly to new trends, many of which hit hard quickly. I've talked to a lot of these vendors and most of them are complementing machine learning systems with traditional rule-based systems and human review because ML by itself is too slow in adapting to attackers.
>Fraudsters work at night. 3 a.m. is the fraudiest time of day, regardless of time zone.<p>What does that even mean? It's always 3 a.m. somewhere in the world.
I'm generally opposed to torture, but I make an exception for people who coin neologisms like 'fraudiest.' No thanks for that assault on literacy.