Architecting a Machine Learning System for Risk

173 points by lennysan almost 11 years ago

11 comments

sytelus almost 11 years ago
I'm actually a bit surprised at people using PMML and this architecture. Clearly, the attempt here is to isolate model generation from runtime prediction, but doing this also confines you to the least common denominator. This means you can't generate any model that Openscoring can't handle. If you think about it, there is no real need for Openscoring. You can whip up a REST service very easily that wraps an sklearn predictor, and I would bet it's actually much easier to do than writing PMML exporters. Then you can use all the goodness of top-of-the-line models while your service interface still remains the same. An architecture that forces you to use the lowest common denominator just for abstraction purposes is a poor design, IMO.
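
To make the "REST service that wraps a predictor" idea concrete, here is a minimal sketch using only the Python standard library. Everything named in it is an assumption for illustration: `StubModel` stands in for a trained scikit-learn estimator (anything with a `predict(rows)` method would do), and the `{"rows": [...]}` request shape, the `score` helper, and the port are made up here, not taken from the article.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubModel:
    """Hypothetical stand-in for a trained estimator with a predict() method."""

    def predict(self, rows):
        # Flag a row as risky (1) when its first feature exceeds 0.5.
        return [1 if row[0] > 0.5 else 0 for row in rows]


def score(model, body):
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        rows = json.loads(body)["rows"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'rows' list"})
    # int() also converts NumPy integer types into JSON-serializable ints.
    predictions = [int(p) for p in model.predict(rows)]
    return 200, json.dumps({"predictions": predictions})


def make_handler(model):
    """Build a request handler class bound to one model instance."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, payload = score(model, self.rfile.read(length))
            data = payload.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def serve(model, port=8000):
    """Block forever, answering POST requests on localhost (port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), make_handler(model)).serve_forever()
```

Calling `serve(StubModel())` would start the service. Swapping in a different model only requires an object with a compatible `predict`, which is the point about the interface staying the same while the model changes.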
jacquesm almost 11 years ago
I've just built a system very much like this for a large customer. Extremely interesting and I learned a lot while doing it. Funny to see companies operating at a similar scale running into similar problems and solving them in roughly similar ways.
shoyer almost 11 years ago
Looks like a cool project... but I hope they plan to open source their library for exporting scikit-learn classifiers to PMML! This would be a great way for them to give back to the open source community.
xtacy almost 11 years ago
Nice writeup. It seems like a supervised learning approach to fraud detection. I have a question: where does the is_fraud variable come from? Is it labeled by humans?
Hortinstein almost 11 years ago
Don't mean to hijack the comment thread, but can anyone recommend any good videos or courses that introduce machine learning? I studied computer science, but was not able to take any classes on the subject. I found the Stanford one; does anyone have experience with it?

http://online.stanford.edu/course/machine-learning
ShabbyDoo almost 11 years ago
I was thinking about the sorts of fraud categories AirBnB likely experiences. Most fraudsters want cash or cash equivalents, and the use of lodging on a particular night is nearly as illiquid as stolen fine art. So, those seeking stuff to resell will choose to defraud one of the zillion online marketers who ship stuff to doorsteps. A buyer who actually used the space he reserved could initiate a chargeback later claiming that the service promised via AirBnB wasn't provided -- couldn't access apartment, wasn't as described, etc. However, space providers likely will cooperate with AirBnB and provide evidence in their defense. Better to attempt a chargeback elsewhere if one is short on money. It seems that using AirBnB as a platform for crimes between buyer and space provider is possible, and there certainly has been at least one heavily publicized case, but we would hear a lot more about these events if they were happening much.

So, what's left? Collusion between buyer and space provider -- in all likelihood, they are one and the same, or identities have been stolen. For example, I list my condo on AirBnB for $100/night. Someone books it for the weekend, and then doesn't show up. AirBnB owes me $200 -- after all, I gave up other options to profit from its use. An honest buyer pays up. But, maybe the buyer is dishonest -- he used a stolen credit card, etc. In this case, AirBnB eats the loss and pays me as the space provider. Now, wouldn't it be convenient if I was also the buyer? Cash from stolen credit cards, funneled through AirBnB (much akin to the way online poker sites were used to transfer stolen money via bad heads-up play). This would work until AirBnB noticed that my listing seems to have a suspicious propensity to attract fraudulent buyers. Then, they'll shut me down. So, I'll pop up elsewhere. After all, no need to actually have a space because no one I accept will ever show up!

I bet the usage patterns of the party/parties involved in this fraud are drastically different than those of legitimate market participants. Someone with a fraudulent listing could out himself by rejecting a bunch of legitimate AirBnB buyers, and this behavior would stand out as it's the opposite of the behavior expected of an honest seller. So, he must protect against this risk by making his listing unappealing (high price, bad photos/description, unpopular location, etc.). The behavior of users browsing AirBnB when viewing this property could identify its relative undesirability (few clicks, etc.), and price outliers could be identified by comparing similar offerings by date/location/type. The click stream of the "buyer" likely is most revealing. Someone selecting an unappealing property without doing much comparison shopping likely isn't a legit buyer.

What other stuff might predict fraud? Vague descriptions might indicate a fraudulent listing. Most space providers love to tell buyers what's special about their offering. Could some scoring of a listing's prose prove a strong predictor? I've never listed with AirBnB. What do they do to verify listings? As a buyer, they verified my identity. Could this serve multiple purposes? Certainly, I'd feel better listing my guest room if I know that AirBnB will know the identity of the guy who rented the room and then stabbed me at 3AM. But, in addition, does identifying market participants in strong ways help keep fraudsters from repeating their crimes by setting up multiple accounts? Obviously, newer market participants are more risky than established ones, especially those who have interacted with known legit, long-time users. The social graph comes to the rescue here. Even astroturfing ought to show up as a small, disconnected graph unless legit users' identities are stolen.

Of course, this comment is all just conjecture. Obviously, AirBnB can't tell the public about specific fraud methods or how they identify suspicious activity. However, I like the concreteness of considering actual fraud scenarios, so I decided to put forth some ideas for discussion.
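
One of the ideas above, spotting price outliers among comparable listings, can be sketched in a few lines. This is only an illustration under stated assumptions: the prices are taken to be already grouped by date/location/type, the `price_outliers` name is made up, and the median-absolute-deviation rule with the conventional 0.6745 constant and 3.5 cutoff is one common robust choice, not anything AirBnB is known to use.

```python
from statistics import median


def price_outliers(prices, threshold=3.5):
    """Return indices of prices whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so one extreme listing does not distort the baseline the way it
    would distort a mean and standard deviation.
    """
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # More than half the prices are identical; this rule cannot rank them.
        return []
    return [
        i
        for i, p in enumerate(prices)
        if 0.6745 * abs(p - med) / mad > threshold
    ]


# Nightly prices for comparable listings (made-up numbers).
comparable = [95, 100, 105, 110, 90, 400]
print(price_outliers(comparable))  # -> [5]
```

A flagged index would be just one weak signal to combine with the click-stream and social-graph signals discussed above, not a verdict on its own.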
bayesianhorse almost 11 years ago
I didn't quite understand the need for Openscoring and PMML. If it's just a question of using an sklearn model to predict an outcome, why not just build it into a simple JSON-RPC service with Tornado, Gevent, or whatever the rage is currently?
czbond almost 11 years ago
I was wondering if their solution is a home grown version of SiftScience?
elliott34 almost 11 years ago
Does anyone know if Java can port a gradient boosting model from R?
Mangalor almost 11 years ago
OMG I love that cartoon. "Machine learning" is such a funny phrase if you think about it.
contingencies almost 11 years ago
Pet peeve: the verb is *designing*.

Besides that, let me rephrase here.

*At <west coast startup, essentially a copy of earlier successful European businesses such as HouseTrip, but with access to stupid amounts of US capital and therefore more profitable>, we <make superfluous, keyword-laden, unverifiable claim about ourselves in the future>. We <continue to integrate feel-good community pronouns>. We <here discuss something only tangential to our core business and assert that we have allocated at least two people to this area>. We <have nothing better to do than write it up, because quite frankly, there's nothing more pressing for us to work on in an already automated business of relative simplicity>*.

OK, so that's a bit harsh, but there's some points toward reality in there. Sorry, as someone who used to run a complex travel industry business (3200+ hotel contracts... all of them in Chinese, all business by digital fax (no convenience here!), constant rate changes, in 6 human languages and multiple currencies with a real time call center) and who co-pitched for VC with HouseTrip's management in London in 2009, I just have very little respect for AirBNB.