<p><pre><code>from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

# Train on 200 samples (20 per class), evaluate on all the rest
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=y,
                                                    random_state=1729,
                                                    test_size=X.shape[0] - (10 * 20))
model = MLPClassifier(random_state=1729)
model.fit(X_train, y_train)
p = model.predict(X_test)
print(accuracy_score(y_test, p))

# Train on 2,000 samples (200 per class), evaluate on all the rest
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=y,
                                                    random_state=1729,
                                                    test_size=X.shape[0] - (10 * 200))
model = MLPClassifier(random_state=1729)
model.fit(X_train, y_train)
p = model.predict(X_test)
print(accuracy_score(y_test, p))
</code></pre>
This gets you accuracy scores of 0.645 and 0.838, respectively (versus 62% and 76% in the paper). Sure, the validation differs: I evaluate on all the remaining data, while they do 20 repetitions of 70/30 splits on the 200 and 2,000 samples, which needlessly lowers the number of training samples (a fairer comparison is 0.819 with 1,400 training samples). Still, the scores seem at least comparable. Cool method though, I can dig this and look beyond benchmarks (though Iris and Wine are really toy datasets by now).
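For reference, here is a minimal sketch of that "fairer" comparison: draw a stratified 2,000-sample subset, then do a single 70/30 split within it, so the classifier trains on 1,400 samples and is scored on the remaining 600. This isn't the exact script behind the 0.819 number (they average 20 repeated splits, and the score will wobble a bit with the seed), just the setup.
<pre><code>from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

# Stratified 2,000-sample subset (200 per class), as in the second run above
X_sub, _, y_sub, _ = train_test_split(X, y,
                                      stratify=y,
                                      random_state=1729,
                                      train_size=10 * 200)

# Single 70/30 split within that subset: 1,400 training / 600 test samples
X_train, X_test, y_train, y_test = train_test_split(X_sub, y_sub,
                                                     stratify=y_sub,
                                                     random_state=1729,
                                                     test_size=0.3)

model = MLPClassifier(random_state=1729)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
</code></pre>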