The scikit-learn cargo cults

34 points by duckerude about 4 years ago

5 comments

gleenn about 4 years ago
The author's beef seems to be "people use similar terminology across similar libraries/frameworks/platforms but they don't behave identically and represent subtly different things". Maybe I don't do enough data scienceing, but isn't this super common? Like, if I write a parser... I'd probably call the main function "parse", or if I'm writing a database connector, I'd probably call the function "connect" to do the connecting. I personally wouldn't expect those to work identically or mean the same exact abstraction. I personally love when things are named similarly so I can grok the meaning in a new codebase more quickly, even if things don't transfer identically.
rubatuga about 4 years ago
The author doesn’t really know what a cargo cult is. It means imitating what other groups do and expecting an unrealistically positive result. Not only do you have to prove that other ML libraries were imitating sklearn, but also that copying it wasn’t useful. Like another commenter said, “fit” and “predict” are simply common function names that easily convey meaning. They certainly have the positive effect of letting me know what the functions do. If that’s cargo culting, then so is any program that has a “main” or “init” function with different arguments. Also, to refute their last point, PyTorch lacks a fit function because it is too low level to have one, not because its authors are avoiding cargo culting.
huac about 4 years ago
SKL being first does not afford it a monopoly on ML object design. Nor should other libraries necessarily seek to emulate what came first (or support pickling...)
gyrovagueGeist about 4 years ago
Huh, didn’t think I’d see the writer of The Northern Caves on the top of HN.

Back to this post: I’ve written some nearest neighbor code and definitely felt some pressure to make the API sklearn compatible. But I don’t think it’s as bad as the post claims in practice.

Highly recommend checking out the poster’s other work. It’s a lot of fun.
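
For readers wondering what the sklearn-style naming mentioned in this comment looks like for nearest neighbor code, here is a minimal sketch. It is not the commenter's code: the class name `TinyNearestNeighbor` and the toy data are hypothetical, and it only borrows the `fit`/`predict` names, the trailing-underscore attributes, and the `return self` habit. It assumes Python 3.8+ for `math.dist`.

```python
import math


class TinyNearestNeighbor:
    """Hypothetical 1-nearest-neighbor classifier using fit/predict naming."""

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        # "Fitting" here only stores the training data. The trailing
        # underscores mimic the sklearn convention for fitted attributes.
        self.X_ = [tuple(row) for row in X]
        self.y_ = list(y)
        return self  # returning self allows chaining, as sklearn estimators do

    def predict(self, X):
        # One label per query point, in the same order as the input.
        return [self._predict_one(tuple(row)) for row in X]

    def _predict_one(self, point):
        # Brute-force search: index of the stored point with the smallest
        # Euclidean distance to the query.
        best = min(range(len(self.X_)),
                   key=lambda i: math.dist(point, self.X_[i]))
        return self.y_[best]


# Toy data, made up for illustration.
model = TinyNearestNeighbor().fit(
    [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)],
    ["a", "a", "b"],
)
predictions = model.predict([(0.2, 0.1), (4.0, 4.5)])
print(predictions)
```

Matching the method names is the easy part. Working with sklearn's own tooling also expects things like `get_params` and `set_params`, which this sketch leaves out, so "sklearn compatible" in the stricter sense takes more than this.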
jwilber about 4 years ago
“Sagemaker “Estimators” do not have anything to do with fitting or predicting anything. The SDK is not supplying you with any machine learning code here.”

The author is confusing the SageMaker service with the MXNet deep learning library (which SageMaker provides access to). Basically everything they wrote in that section is flat-out incorrect.