
An MNIST-like fashion product dataset

220 points by kashifr over 7 years ago

8 comments

jph00 over 7 years ago

I don't understand why this seems to be getting so much attention. There are plenty of small image datasets around, and wide recognition of the issues with MNIST.

I see no evidence at all that this particular dataset is better than MNIST. None of the issues they themselves list with MNIST are discussed in relation to their proposed replacement.

The benchmarks they provide are entirely useless - sklearn does not claim to be a platform for computer vision models. A quick WRN model gets 96% on this dataset (h/t @ajmooch on Twitter), suggesting that it doesn't deal with the "too easy" issue.

The images clearly don't deal with the problem of lack of translation invariance.

On the downside, they don't have the same ease of understanding as hand-drawn digits, which is extremely helpful for teaching, debugging, and visualizing.
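For context on how simple the criticized benchmarks are to reproduce, here is a minimal sklearn-style baseline sketch. It is not from the thread or the dataset's authors; it assumes scikit-learn's OpenML fetcher can find the dataset under the name "Fashion-MNIST".

```python
# Minimal baseline sketch; assumes OpenML hosts the dataset under the
# name "Fashion-MNIST" (labels come back as strings).
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale raw pixel values to [0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0
)

clf = LogisticRegression(max_iter=200)  # a deliberately simple linear baseline
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

A linear model like this already scores well above chance on centered 28x28 grayscale images, which is part of the "too easy" complaint above.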
nip over 7 years ago

How would you go about generating such a dataset?

1. Scrape images and store as png

2. Downscale to 28px

3. Convert each image to grayscale

4. Convert to matrices and add label (additional row?)

5. Normalize to have matrices of 1 and 0 for faster computation

6. Vectorize said matrices

7. Concatenate into one big vector

Did I miss something / Am I fooling myself?

I plan on working on my first ML side project and I would love to gain some insights from HN.
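A minimal sketch of the pipeline described above, assuming the PNGs have already been scraped to disk. The directory path and the label value are hypothetical placeholders.

```python
# Sketch of steps 2-7 above; "scraped/tshirts" and label=0 are
# hypothetical placeholders, not real paths or class IDs.
from pathlib import Path

import numpy as np
from PIL import Image

def image_to_row(path: Path, label: int) -> np.ndarray:
    img = Image.open(path).convert("L")          # 3. grayscale
    img = img.resize((28, 28))                   # 2. downscale to 28x28
    pixels = np.asarray(img, dtype=np.float32)   # 4. to matrix
    pixels /= 255.0                              # 5. normalize to [0, 1]
    flat = pixels.reshape(-1)                    # 6. vectorize (784 values)
    return np.concatenate(([label], flat))       # 4. prepend the label

# 7. concatenate the per-image rows into one (n_samples, 785) array
rows = [image_to_row(p, label=0) for p in Path("scraped/tshirts").glob("*.png")]
dataset = np.stack(rows)
```

One caveat on step 5: thresholding to strict 0/1 values discards intensity information; scaling to floats in [0, 1], as above, is the more common choice.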
eggie5 over 7 years ago

Looks like this was sourced in-house at some German online retailer: zalando.de. There is a similar dataset from Amazon, sourced by UCSD: http://jmcauley.ucsd.edu/data/amazon/

And our research on recommenders using it: http://sharknado.eggie5.com

Particularly, the 2D scatter of the CNN features: http://sharknado.eggie5.com/tsne
edshiro over 7 years ago

I'd love to play around with this dataset! It certainly seems richer than MNIST, and would most likely force the network to extract more features.

But just like MNIST, it seems to lack variety in the positioning of the important elements: they are all centered, which means they don't train the network to be translation invariant. I presume this issue can be tackled with data augmentation techniques like applying affine transformations.
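One way to illustrate that augmentation idea, sketched with torchvision; the rotation, shift, scale, and shear ranges below are arbitrary illustrative choices, not values from the thread or the dataset's authors.

```python
# Sketch of affine-transform augmentation for centered 28x28 images;
# all parameter ranges are arbitrary illustrative choices.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomAffine(
        degrees=10,            # small random rotations
        translate=(0.1, 0.1),  # shift up to 10% horizontally and vertically
        scale=(0.9, 1.1),      # mild zoom in/out
        shear=5,               # slight shearing
    ),
    transforms.ToTensor(),     # PIL image -> (1, 28, 28) float tensor
])
```

Applied on the fly during training, this shows each centered image to the network in randomly shifted and warped variants, which is the usual workaround for the missing translation variety.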
stared over 7 years ago

For an MNIST-like dataset, I often use notMNIST (http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html), which is more difficult than the original one (see examples of misclassified digits here: https://docs.neptune.ml/get-started/character-recognition/).

However, I am not sure we need more MNIST-like datasets. At such a small size, many things make much less sense (data augmentation, even convnets, as the images are centered anyway), and working with multiple channels is typical (IRL I rarely work with grayscale images). So I am curious: in what way is this dataset better than CIFAR-10?

See my note on datasets in Learning Deep Learning: http://p.migdal.pl/2017/04/30/teaching-deep-learning.html#datasets
a3864 over 7 years ago

If I am understanding the side-by-side comparison correctly, then the performance is highly correlated with MNIST (at least for high-accuracy methods).

https://i.imgur.com/viV7gFB.png (x-axis: Fashion, y-axis: MNIST)
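As a sketch of how one could quantify the correlation read off that scatter plot. The paired accuracies below are hypothetical placeholders, not the actual benchmark numbers.

```python
# Sketch: measure how strongly paired benchmark accuracies track each
# other. The values below are hypothetical placeholders, not real results.
import numpy as np

fashion_acc = np.array([0.80, 0.85, 0.88, 0.90])  # placeholder x-axis values
mnist_acc = np.array([0.92, 0.95, 0.97, 0.98])    # placeholder y-axis values

r = np.corrcoef(fashion_acc, mnist_acc)[0, 1]  # Pearson correlation coefficient
print(f"Pearson r = {r:.3f}")
```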
ntenenz over 7 years ago

One of the reasons people have shifted away from MNIST is that it's simply too easy: single channel, small image size, few classes, etc. Unfortunately, this does not address any of those concerns.
singularity2001 over 7 years ago

How is this 'better' than CIFAR-10 / CIFAR-100?