Transform Data by Example [video]

363 points by gggggggg, about 8 years ago

24 comments

teddyh, about 8 years ago

You know what this reminds me of? Those trained neural-net things which, however many training examples you give them, always seem to find some way to “cheat” and not do what you want while still obeying all your training data correctly.

Something like this: suppose we have a table of strings of digits, some including spaces, and we’d like to remove the spaces. From

  123 456
  234567
  345 678

to

  123456
  234567
  345678

Now, what happens if it encounters, say,

  4567890

Would the result be unchanged (as we would probably want), or would it “cheat” and remove the middle “7” character, giving “456890”?
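
A minimal Python sketch of that ambiguity, assuming nothing about how the actual product works: both functions below are hypothetical candidate programs, and both agree with all three examples, yet they disagree on the new input.

```python
# Two hypothetical candidate programs; neither is taken from the product.
def remove_spaces(s: str) -> str:
    """Intended rule: delete every space."""
    return s.replace(" ", "")


def drop_fourth_of_seven(s: str) -> str:
    """'Cheating' rule: if the string has 7 characters, delete the 4th one."""
    return s[:3] + s[4:] if len(s) == 7 else s


# The three input/output examples from the comment above.
examples = [("123 456", "123456"), ("234567", "234567"), ("345 678", "345678")]


def consistent(program) -> bool:
    """True if the program reproduces every example exactly."""
    return all(program(src) == dst for src, dst in examples)


print(consistent(remove_spaces), consistent(drop_fourth_of_seven))
print(remove_spaces("4567890"), drop_fourth_of_seven("4567890"))
```

The examples alone cannot separate the two rules; a by-example tool has to rank candidates somehow (for instance by preferring simpler or more general programs), or be given an extra example such as "4567890" mapping to itself.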

ktamura, about 8 years ago

This is a great product idea. If you ask any Excel power user, by far the most time-consuming and hard-to-automate task is text and date manipulation.

The beauty of this product is that its adoption strategy is baked into the product itself: I'd share this with all my Excel-using friends because I want the algorithm to get smarter, and I might even learn a bit of C# myself so that I can contribute and scratch my own itch. This in turn makes the product better (because of the larger training data), lending itself to more word of mouth.

One concern I have is security: I'd love to hear from folks who built this, or who are more familiar with it, about how to ensure the security of suggested transformations.

Cieplak, about 8 years ago

I wonder if it uses Z3 under the hood for solving constraints. Very nice of MSFT to MIT license Z3. It's super useful for problems that result in circular dependencies when modeled in Excel, and require iterative solvers (e.g., goal seek). I use the python bindings, but unfortunately it's not as simple as `pip install` and requires a lengthy build/compilation. Well worth the effort, though.

https://github.com/Z3Prover/z3

https://github.com/Z3Prover/z3/issues/288

gergoerdi, about 8 years ago

Check out MagicHaskeller, which figures out list processing functions from examples: http://nautilus.cs.miyazaki-u.ac.jp/~skata/MagicHaskeller.html

For example, given the rule `f "abcde" 2 == "aabbccddee"`, it even figures out the role of the parameter `2`, so `f "zq" 3` gives `"zzzqqq"`.
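
For readers who don't use Haskell, a function consistent with that rule can be written by hand in Python as below. This is only an illustration of the behaviour described, not MagicHaskeller output.

```python
def f(s: str, n: int) -> str:
    # Repeat each character of s exactly n times, preserving order.
    return "".join(ch * n for ch in s)


print(f("abcde", 2))
print(f("zq", 3))
```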

bcherny, about 8 years ago

Wait, Excel has had this built in since 2013!

https://support.office.com/en-us/article/Use-AutoFill-and-Flash-Fill-2e79a709-c814-4b27-8bc2-c4dc84d49464

netvarun, about 8 years ago

Is this related to, or a commercial application of, the 'Deep Learning for Program Synthesis' post [0][1] from Microsoft Research on HN a month ago?

[0] https://www.microsoft.com/en-us/research/blog/deep-learning-program-synthesis/

[1] HN discussion: https://news.ycombinator.com/item?id=14168027

martinthenext, about 8 years ago

Oh man, we did it before Microsoft!

http://comnsense.io/

https://youtu.be/ALF9GY2K-wc

wayneprice, about 8 years ago

I'm playing around with a client-side JS implementation of this at https://www.robosheets.com/

It's not production ready / launched yet, but it's getting there.

I'd be interested to hear whether anyone finds (or really doesn't find) this useful :)

gerhardi, about 8 years ago

This was also included in the query editor of Microsoft's Power BI in a release a month or two ago. First you select the columns to be used as a source, then start writing example values into the new column to be generated. It also shows the generated M/PowerQuery expression.

It can't do miracles, but it saves time in many cases, such as when you want to concatenate values from different columns into a single column in a new format.

fiatjaf, about 8 years ago

See also http://www.transformy.io/#/app

OK, I just realized the site has somehow vanished. Here is an archived version, though it doesn't work: http://web.archive.org/web/20161028231256/https://www.transformy.io/#/

unfamiliar, about 8 years ago

Humans are really good at taking a vague description of a task and using a small number of examples to disambiguate it.

For example, "sort all of the folders, so that Alan goes before Amy, etc". The rule ("sort") is pretty ambiguous, but one simple example in the context gives enough information to realise you probably mean alphabetically by first name.

Is there something like this example that could be combined with NLP to make things like these "intelligent assistants" we have now much more useful for data processing tasks?

It would be great to describe data manipulation to a machine the way that I would describe it to a colleague: give an overview of an algorithm, watch how they interpret it, and correct with a couple of examples in a feedback loop. Currently, describing such things to a machine requires writing the algorithm manually in a programming language.

logicallee, about 8 years ago

It would be nice if it indicated where it was making stuff up (in the zip code example, for the rows that were missing some data, it just makes it up - these rows are not distinguished visually from the rows where it did not add data not in the input).

What I mean is if every row had a date like "12 May 2002" and you wanted it turned into 2002.05.12 then it would be nice if it indicated when it added data. For example if one of the rows just read "15 May" then, since there is no year, it would not be completely absurd if it transformed into 2017.05.15 - or if all of the other data is 2002, then adding that. But I really think silently adding data that was not in the input is going too far. A transform shouldn't ever silently inject plausible data with no indication that this is interpolated. Bad things can result.

Otherwise great demo!
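
One way to picture the suggestion is a transform that returns a flag alongside each value. The sketch below is a hypothetical Python illustration, not how the product behaves: `normalize_date`, `Result`, and the `default_year` parameter are made-up names, and the input format is assumed to be day, English month name, optional year.

```python
from datetime import datetime
from typing import NamedTuple


class Result(NamedTuple):
    value: str
    inferred: bool  # True when part of the output was not in the input


def normalize_date(text: str, default_year: int) -> Result:
    """Convert '12 May 2002' to '2002.05.12', flagging a filled-in year."""
    try:
        parsed = datetime.strptime(text, "%d %B %Y")
        return Result(parsed.strftime("%Y.%m.%d"), False)
    except ValueError:
        # The full format did not match: assume the year is missing,
        # fill it in from default_year, and mark the result as inferred.
        parsed = datetime.strptime(f"{text} {default_year}", "%d %B %Y")
        return Result(parsed.strftime("%Y.%m.%d"), True)


print(normalize_date("12 May 2002", default_year=2002))
print(normalize_date("15 May", default_year=2002))
```

A UI could then shade or annotate every row where `inferred` is true, so interpolated values are never indistinguishable from values taken from the input.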

mballantyne, about 8 years ago

I believe this is the implementation described in this paper published at POPL 2016: https://www.microsoft.com/en-us/research/publication/transforming-spreadsheet-data-types-using-examples/

Though it probably also uses more recent work from the same group: https://www.microsoft.com/en-us/research/people/sumitg/

gshulegaard, about 8 years ago

Excel is a really powerful tool. If you are fine with needing Windows or Mac (i.e. not Linux) and you are OK with its licensing constraints, it's pretty hard to beat.

tdbeteam, about 8 years ago

Relationship to FlashFill feature in Excel: FlashFill is a popular feature in Excel that also uses the example-driven paradigm to automatically produce transformations. While FlashFill supports string-based transformations, Transform Data by Example can leverage sophisticated domain-specific functions to perform semantic transformations beyond string manipulations. For examples, see: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/02/Sample_Reduced-1.xlsx

JoelJacobson, about 8 years ago

I hacked together something similar that learns row/column offsets for different fields in a text file, and converts it into a normal CSV, i.e. a normal table.

https://github.com/trustly/fixed2csv
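
For a rough idea of the conversion step only, here is a Python sketch that slices fixed-width lines at known column offsets and writes CSV. It is not the fixed2csv implementation: the offsets are supplied by hand rather than learned, and `fixed_to_csv` and the sample data are hypothetical.

```python
import csv
import io


def fixed_to_csv(text: str, offsets: list[tuple[int, int]]) -> str:
    """Slice each line at the given (start, end) offsets and emit CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        # Cut one field per offset pair and trim the padding spaces.
        writer.writerow(line[start:end].strip() for start, end in offsets)
    return out.getvalue()


# Build a small fixed-width sample: name (10 chars), age (3), city (3).
rows = [("alice", "30", "NYC"), ("bob", "41", "LA")]
sample = "\n".join(n.ljust(10) + a.ljust(3) + c.ljust(3) for n, a, c in rows)

print(fixed_to_csv(sample, [(0, 10), (10, 13), (13, 16)]))
```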

matt4711, about 8 years ago

There is a paper describing such a method (not sure if that is what was implemented):

"Zhongjun Jin, Michael R. Anderson, Michael J. Cafarella, H. V. Jagadish: Foofah: Transforming Data By Example. SIGMOD Conference 2017: 683-698"

captnswing, about 8 years ago

Seems similar to http://openrefine.org/

copperx, about 8 years ago

That's great, I always loved Auto Fill in Excel, and this brings it to the Mac.

Kiro, about 8 years ago

I would love something similar for Google Spreadsheet.

amelius, about 8 years ago

I want this in Vim :)

This would be great for refactoring code.

tejtm, about 8 years ago

alas it is too late, it transformed our genes to dates, no sequence for Bill

cblte, about 8 years ago

not usable for companies and secured networks. :-( too bad

sjg007, about 8 years ago

There's a huge opportunity in making Excel better.