You don't need ML/AI, you need SQL

286 points by cyberomin almost 7 years ago

23 comments

tfehring almost 7 years ago

> say a person bought a pair of shoe, sunglasses and a book. For their newsletter, we will show include shoes, sunglasses and books. This was a lot more relevant than sending random stuff.

I agree with the general sentiment of the article, but this seems like a poor example, since a more sophisticated approach can add a lot of value to a recommendation system. How do you know whether a customer is likely to want more than one item in any of those categories? If they already purchased sunglasses, wouldn't they be more likely to purchase, say, a sunglasses case and/or sunscreen? If they purchased a book, do you recommend the same book again? And if not, how do you choose which book(s) to include?

Of course, you could technically still handle this in SQL with a bunch of CASE statements, but obviously that doesn't scale well across a wide range of products. The whole point of ML/AI in that use case is to scale that type of nontrivial decision making.
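
For what it's worth, there is a middle ground between hand-written CASE statements and a trained model: let purchase history pick the complementary items. The following is only a rough sketch of that idea, not something the article or the comment proposes. The `purchases` table, its columns, the sample rows, and the `recommend` helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (customer_id INTEGER, product TEXT);
    INSERT INTO purchases VALUES
        (1, 'sunglasses'), (1, 'sunglasses case'),
        (2, 'sunglasses'), (2, 'sunglasses case'), (2, 'sunscreen'),
        (3, 'sunglasses'), (3, 'book'),
        (4, 'sunglasses');
""")

def recommend(customer_id, limit=3):
    """Rank products bought by customers who share at least one purchase
    with this customer, skipping anything the customer already bought."""
    return conn.execute(
        """
        SELECT other.product, COUNT(DISTINCT other.customer_id) AS buyers
        FROM purchases AS mine
        JOIN purchases AS peer
          ON peer.product = mine.product
         AND peer.customer_id <> mine.customer_id
        JOIN purchases AS other
          ON other.customer_id = peer.customer_id
        WHERE mine.customer_id = ?
          AND other.product NOT IN (
              SELECT product FROM purchases WHERE customer_id = ?
          )
        GROUP BY other.product
        ORDER BY buyers DESC, other.product
        LIMIT ?
        """,
        (customer_id, customer_id, limit),
    ).fetchall()

# Customer 4 only bought sunglasses; the case ranks first because two
# other sunglasses buyers also bought one.
print(recommend(4))
```

This is plain co-purchase counting, so it never suggests an item the customer already owns, but it has no notion of repeat purchases or seasonality. That gap is where the commenter's case for a more sophisticated approach comes in.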

joe_the_user almost 7 years ago

Nice post.

Here's a different way to think about the situation with current AI/deep learning: if the current upsurge of methodologies were getting close to general AI, it would be getting closer and closer to a hammer that really did let you treat everything as a nail. I.e., it would be general purpose.

But I think I can say we're not seeing that, even though deep learning seems to be continually expanding the domains that it can operate on. How is that? This OpenAI post is very eye-opening: "We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time (by comparison, Moore’s Law had an 18-month doubling period)." Essentially, with a rather brute-force-y method, we have shown we can expand deep learning's impact to a larger and larger domain, but not at all in the fashion of human learning tricks (where the new trick isn't that much harder than the old one).

Maybe, in this process, a better algorithm that adjusts to new situations without increased costs will surface. But until then it seems new and old methods will need to coexist.

https://blog.openai.com/ai-and-compute/

posix_compliant almost 7 years ago

Good post, but I couldn't disagree more. Regardless of your business size, it will always be valuable to know information such as:

* How does every additional coupon-dollar affect the total amount a customer buys?
* What is the relationship between customer age and retention for my store?
* Does giving a customer more purchase options help or hurt their chances of making a purchase?

My experience is that each of these questions can be solved, in part, using 3 lines of Python code:

<pre><code>from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)  # X: feature matrix, y: target values
</code></pre>

Then look at the beta coefficients of the model, and you have a rough idea of how different features are correlated. Doing something like this in SQL sounds difficult. If you have data to interpret, it makes sense to use similar methods. I can't think of an example where you have data but refuse to look at it until your company is "bigger".
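
To make the "look at the beta coefficients" step concrete, here is a dependency-free sketch of the coupon question using the single-feature closed form of ordinary least squares. It is not the scikit-learn code from the comment, and the data and the `fit_line` helper are made up for illustration. For a one-feature fit, the slope below is the value scikit-learn would expose as `coef_`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented data: coupon value in dollars vs. total basket size in dollars.
coupon_dollars = [0, 5, 10, 15, 20]
basket_total = [40, 52, 58, 71, 79]

slope, intercept = fit_line(coupon_dollars, basket_total)
print(f"each extra coupon dollar goes with about {slope:.2f} more spent")
```

The slope only describes an association in the sample. Whether coupons cause the extra spending is a separate question that a regression alone cannot answer.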

tilt_error almost 7 years ago

The premise for this article is wrong!

The author describes using SQL to pull facts from history: who was the number one customer last week, who abandoned online orders, and so on.

The premise should instead be how to fit a model onto your business data so that you can better guess who will be the number one customer next week, what (s)he will order, and so on.

The problem that ML addresses is how to arrive at that model, under the assumption that you can use historic data to either pick a model or parameterise one.

SQL has its merits, as does the relational database model, but this has nothing to do with creating models (even though we are modelling the data itself). The author gives some examples that are, frankly, trivial.

But he has a good argument around namedropping "hot" technology when your business need does not incorporate distributed trust (blockchain), modelling behaviour (or some such) using ML, and so on.
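
The "facts from history" kind of query described above looks roughly like this. It is a sketch only: the `orders` table, its columns, the dates, and the `top_customer` helper are hypothetical and not taken from the article.

```python
import sqlite3

# Hypothetical order history, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', '2018-06-25', 30.0),
        ('bob',   '2018-06-26', 80.0),
        ('alice', '2018-06-28', 70.0),
        ('carol', '2018-06-10', 500.0);
""")

def top_customer(week_start, week_end):
    """Return (customer, total spent) for the biggest spender in the range.
    ISO-8601 date strings compare correctly as plain text."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS spent
        FROM orders
        WHERE order_date BETWEEN ? AND ?
        GROUP BY customer
        ORDER BY spent DESC
        LIMIT 1
        """,
        (week_start, week_end),
    ).fetchone()

print(top_customer('2018-06-25', '2018-07-01'))
```

The query answers who spent the most last week. It says nothing about who will spend the most next week, which is the modelling question the comment is pointing at.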

gwbas1c almost 7 years ago

Maybe I'm naive, but are there really people who want to hop on the AI bandwagon just to do mundane lookups like this?

When I worked with machine learning many years ago, we learned that it was no better than the heuristics already in place. The thing is, it's much easier to diagnose a well-written and well-understood heuristic than a machine learning model.

reificator almost 7 years ago

I like to think I'm not too nitpicky about fonts, but that st ligature is incredibly distracting.

It's the second article I've seen here that uses it over the last few days, but I'm not sure if it's the same site or not.

gonyea almost 7 years ago

This post is downright bonkers. “We don’t need ML/AI! Proof: list of things you wouldn’t use ML for”

There are so many problems you can solve with a neural network. Should Waymo ETL sensor data and do a WHERE NOT IN for bicyclists?

This blog post is pretty dismissive. Statistics software has been in use since the beginning; see SAS. Financial institutions, actuaries, etc. have been using these methods with SQL data as the input, and it’s the only reason they’re still in business.

If this blog post simply suggested hiring a BI Analyst in your startup, I wouldn’t disagree.

benkarst almost 7 years ago

There's no logical equivalency between SQL and ML/AI.

SQL is a language that helps retrieve the data you're looking for. ML/AI helps you predict the future (using past data).

Maybe this is directed towards product people? But it has SQL in the title so it can't be. I'm confused as to who the audience is here.

oh-kumudo almost 7 years ago

The title is often true, but it doesn't mean much, if anything. And the same argument has been brought up again and again in the past as well.

What OP suggests, the so-called SQL, is basically a heuristic-based system. When done properly and carefully, it could of course work very well, and it is indeed often used as a baseline model to bootstrap an ML system. However, eventually the rule-based system will hit the wall, and ML will be the savior of the day, pushing the metric further by a margin of 20-30%.

So yes, when you are small and have little data, ML is irrelevant. But the same could be said of many things in the software industry; you probably won't need Docker/Big Data/Fancy JS either if you are building a small-scale online store.

Choose your tech stack wisely based on your problem, but the title is needlessly sensationalized.

Nasrudith almost 7 years ago

I think this highlights a long-standing problem that is separate from the question of machine learning, blockchain, and similar versus tried-and-proven technologies: attempting to solve a problem by understanding it versus seeking the simple solution to avoid thinking about it.

It's ironic that machine learning is the 'simple' option, but that seems to be the case at times, especially with the 'throw blockchain or machine learning at it' approach when a proper algorithm could do it far more efficiently. The funny thing is that both approaches have their place. If turning it off and on again fixes a rare issue faster than tracing every instruction down to machine code, you are better off restarting it occasionally, unless it is a critical application where doing so will cost millions of dollars or lives.

panic almost 7 years ago

I like this article's focus on technology as a way of helping skilled people do their job more effectively. Why shouldn't a business owner be able to use Bash and SQL to run their business? Maybe the solution isn't new technology, but training people to use the old stuff.

mrtksn almost 7 years ago

But you don't gain ML/AI know-how by doing SQL, nor do you discover previously unknown potential in your product by sticking to your usual toolset.

Not that I necessarily disagree with the OP, but I find it deeply uninspirational.

What's the difference between using ML/AI for problems traditionally solved by some other tool and using any other tool to solve the same problem unconventionally? Both can be "hacking". I guess my issue with this is the word "need": don't do what you need to do but what you want to do, if you are looking for inspiration. After all, mankind never needed to leave the garden of Eden but left it anyway.

pnathan almost 7 years ago

There aren't very many DBAs practicing in modern shops, and devs don't seem to be too into SQL and delivering excellent SQL queries and schemas. It's its own skillset.

I would also call out the NoSQL hype train here.

NoSQL has its place, and largely its place is when SQL cannot tolerate the intensity of traffic or the size of the dataset. You can look at the Dynamo paper for an example of the engineering rationale.

Postgres can take enormous amounts of data at quite decent rates, without spending too much time on tuning even.

thomasfedb almost 7 years ago

Turns out that people are actually kinda smart - toss in some raw cycles to handle the mind-numbing bits and you can have a solid system that does smart things en masse.

altitudinous almost 7 years ago

I'm not sure about this article, but there is certainly scope for an article named "You don't need Blockchain, you need SQL/a Database"

sacado2 almost 7 years ago

The real problem is that people (including the author of the article, apparently) think ML is necessarily some kind of ultra-complicated technique that needs a PhD and a GPU. But, come on, 80% of the time you can use ML, dead-easy techniques are more than enough.

I mean, the author is talking about how SQL is a good old 40-year-old tech. Meanwhile, one of the simplest ML algorithms, linear regression, is about 200 years old, even older (AFAIK) than Ada's program for Babbage's machine. It's very easy to understand and implement, and even Excel has it as a standard function.

Sure, linear/logistic regression or naive Bayes won't help you tag pictures with text à la Facebook "this is a picture of a young man dancing with a red shirt", but the vast majority of use cases of ML are way easier, anyway. So yes, most of the time, you can easily find "talents" that will solve your ML problems. And if you really want to, you can implement it in SQL.

jaequery almost 7 years ago

SQL is great, but I am still waiting for the successor to SQL. SQL was made for relational data. What I'd really like to see is a relational database with nested data structures, kind of like Postgres with jsonb, but built with that in mind from the ground up.

emersonrsantos almost 7 years ago

Previous post/discussion: https://news.ycombinator.com/item?id=16898827

shrumm almost 7 years ago

Or maybe you can do ML with SQL... Postgres can do basic linear regression; I did this a couple of times for an analysis and found it pretty handy.
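
For reference, PostgreSQL ships the aggregates `regr_slope(Y, X)` and `regr_intercept(Y, X)` for exactly this. The sketch below does the same least-squares fit with ordinary sums so that it runs on SQLite from the Python standard library. The `samples` table and its values are invented, and nothing here is the commenter's actual query.

```python
import sqlite3

# Hypothetical data lying exactly on y = 2x + 1, so the fit is easy to check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (x REAL, y REAL);
    INSERT INTO samples VALUES (1, 3), (2, 5), (3, 7), (4, 9);
""")

slope, intercept = conn.execute(
    """
    WITH s AS (
        SELECT COUNT(*)   AS n,
               SUM(x)     AS sx,
               SUM(y)     AS sy,
               SUM(x * y) AS sxy,
               SUM(x * x) AS sxx
        FROM samples
    )
    SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope,
           (sy - (n * sxy - sx * sy) / (n * sxx - sx * sx) * sx) / n
               AS intercept
    FROM s
    """
).fetchone()

print(slope, intercept)
```

On Postgres the whole CTE would collapse to something like `SELECT regr_slope(y, x), regr_intercept(y, x) FROM samples`. The sum-based form also divides by zero when every `x` is identical, so real use would want a guard for that case.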

swanson almost 7 years ago

What was the Twitter thread/HN thread referenced about using "boring" approaches to solving problems?

visarga almost 7 years ago

Counting items by value is a maximum likelihood estimation method too. It's still ML if you do a count, group by, max or threshold - just a less sophisticated way of doing things. The Naive Bayes algorithm is implemented by counting, at its base.
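
A small sketch of the counting point: for a categorical variable, the maximum likelihood estimate of each category's probability is its count divided by the total, which is a single GROUP BY. The `purchases` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical purchase log, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (category TEXT);
    INSERT INTO purchases VALUES
        ('shoes'), ('shoes'), ('books'), ('shoes'), ('sunglasses'),
        ('books'), ('shoes'), ('books');
""")

rows = conn.execute(
    """
    SELECT category,
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM purchases) AS p_hat
    FROM purchases
    GROUP BY category
    ORDER BY p_hat DESC, category
    """
).fetchall()

# Relative frequencies are the maximum likelihood estimates of the
# category probabilities.
estimates = dict(rows)
print(estimates)
```

Naive Bayes builds on the same kind of counts, taken per class, which is why the comment describes it as counting at its base.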

flatfilefan almost 7 years ago

When you already know the query logic or the logic is easy to derive - use SQL if you can. For more complex stuff ML may work as your rule derivation mechanism.

slifin almost 7 years ago

I thought graph databases were the canonical implementation for recommendation systems, one of the few use cases where I'd not go straight to SQL.