In Case You Wondered, a Real Human Wrote This Column

129 points by l_adams over 13 years ago

18 comments

jellicle over 13 years ago

What this article doesn't tell you is that the human who wrote it works for Narrative Science.

For those who don't know, if you see a story in your local paper, and it doesn't involve a car crash, crime, weather, or sports, it was probably placed there by a PR representative. Most of the things you read are not the result of random reporters deciding to cover X or Y, but a paid, concerted effort to place story X or Y in the paper by providing the paper with a fully pre-digested story to perhaps rewrite, or perhaps not.

The words "narrative science" appear 14 times in that story, including such clunkers as "To generate story “angles,” explains Mr. Hammond of Narrative Science...." when Mr. Hammond has already been introduced earlier in the story. It even includes pricing: hey readers, this is not only cool and will win the Pulitzer Prize, but it's cheap too! No mention of competitors... It reads like an ad because it is an ad.

This story was provided, probably almost word for word, by a PR person to the NYT reporter.

I'm not sure if computer-generated text will be better or worse than the media system we have now.

6ren over 13 years ago

I love seeing these examples of product development: begin with a very specific niche at the edge (not tackling the mainstream head-on) and "target non-consumption" - that way, you have no competition; and it's not a zero-sum game where you beat someone, but creating value that never existed before. This is possible not because it's good, but because it's cheap (and good enough):

> primarily a low-cost tool ... for local youth sports .... and financial results of local public companies ... “Mostly, we’re doing things that are not being done otherwise,”

Then, once you have some customers - any customers! - you improve it, bit by bit. It doesn't need to be perfect in the first place; it doesn't need to be perfect in the end. It just needs to be good enough to be useful.

> [customer] worked with Narrative Science for months to fine-tune the software

As for the technology itself, we're not told anything of its details, just what it can do. This is a marketing article, not a tech report. It would be interesting to see the models they use for stories, and whether they use grammars for the overall structure. These are very narrow domains, which are the easiest to start with: you could enumerate all the standard cliches, understand when they apply, and tweak the model. That's where the journalist expert domain knowledge of the two founders would come in handy. BTW: "easiest" is only relative - it would still be very difficult (almost impossible), and kudos to these guys for actually doing it - and even better, making an actual business out of it.

It reads like a 50's Asimov story - the future is finally arriving.

But a Pulitzer in 5 years is absurd, either cynical puff or visionary bravado. Theoretically possible, I think, maybe in 50 years - the figure I've long given for strong AI. ;-)

talbina over 13 years ago

Did they really write an entire two-page article while ignoring what is, in my opinion, the real leader in this space? http://statsheet.com/

levy over 13 years ago

My worry here is computers will learn to write articles specific to every individual. The computer will know what other articles we liked and what we didn't like and just try to write to what we want to read. This will make it even less likely we'll hear an opposing view to our own, if the computers are giving us what we want to read.

jgilliam over 13 years ago

If a computer had written this article, maybe it would have mentioned how useful this technology is for spammers.

thalecress over 13 years ago

I'm skeptical of the claim that a program could win a Pulitzer. How does it decide what to write about, who to interview, and what questions to ask?

Reporting a day at the races or the markets is easy because we know which kinds of data are relevant and we have them available.

TorKlingberg over 13 years ago

I wonder if these automatically generated articles will ever become good enough to be worth reading. Currently, they seem to be just good enough to fool Google and convince people to click the link. Do any sports fans bookmark and come back to these sites?

No matter how good the algorithms get, they are still limited by their input, the statistics. If, for example, a player scores a very unusual goal, say a bicycle kick in soccer, then a real writer who actually saw the match would surely mention it. An algorithm could not if there is no field for "unusual goal" in the match statistics.

jawns over 13 years ago

Here's a description of my venture into this territory, in which I generated formulaic lottery result briefs:

"I wrote this article with one mouse click" http://coding.pressbin.com/60/I-wrote-this-article-with-one-mouse-click

I can't imagine the sort of code base that would be needed to make these stories not seem formulaic.

kia over 13 years ago

Single page: http://www.nytimes.com/2011/09/11/business/computer-generated-articles-are-gaining-traction.html?_r=1&pagewanted=all

dredmorbius over 13 years ago

ObXKCD: http://xkcd.com/904/

There are certain topical areas which lend themselves to automated content generation: sports, financial news, weather, astronomy (astrology isn't worth mentioning), earthquakes and other severe events, machine monitoring.

These are domains in which a quantified or measured outcome is tied to a specific point in time or event (final score, market close, daily forecast, etc.). The important data has already been highlighted; all you've got to do is sprinkle some syntactic sugar around it.

Oddly enough, these are areas in which you're already most likely to find existing "AI"-type content generators.

In areas in which you've got to do significant determination of what is salient, the approach isn't nearly as successful.
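To make the "syntactic sugar" point concrete, here is a minimal sketch in Python, assuming a market-close brief as the quantified outcome. The function name `market_close_brief`, the index name, and the sentence wording are all invented for illustration; nothing here is taken from any real generator.

```python
def market_close_brief(index_name, prev_close, close, day):
    """Wrap an already-salient number (the close) in a fixed sentence.

    Hypothetical helper for illustration only.
    """
    change = close - prev_close
    pct = 100.0 * change / prev_close

    # The only "editorial" decision: which verb fits the sign of the change.
    if change > 0:
        direction = "rose"
    elif change < 0:
        direction = "fell"
    else:
        return f"The {index_name} was unchanged at {close:.2f} on {day}."

    return (
        f"The {index_name} {direction} {abs(change):.2f} points "
        f"({abs(pct):.1f}%) to close at {close:.2f} on {day}."
    )


print(market_close_brief("Example Index", 1000.0, 1015.0, "Friday"))
# The Example Index rose 15.00 points (1.5%) to close at 1015.00 on Friday.
```

Everything interesting was decided before the function ran: someone already chose the close as the number that matters. That is the part that does not carry over to domains where salience has to be worked out.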

lexicon over 13 years ago

This is a recent email I got from the Facebook Support team regarding a vanity URL for my business. I could swear this guy is a robot or a script, and I wonder if Facebook is using the technology described in the article:

----------------

We’re sorry, but we’re unable to process your request because another entity has made a previous request concerning this username. If you are still interested in claiming the username, you may contact us in 60 days for an update about its availability.

---

You have reached the right channel for these requests. As mentioned earlier, we have no further information to share with you concerning the username "xxxx" (marked out). We will be unable to assist you further from this alias.

----------------

What human being talks like that?

sjs over 13 years ago

I suppose this may do for articles that just deliver some facts. However, the kind of stuff I enjoy reading doesn't just barf up some facts in the form of sentences; it provides insight into what the implications of those facts may be and also draws from the past to better put things in context.

That's not to say their technology couldn't be improved to search the web and see what past events are relevant, but providing good insights about the implications of the facts will be a whole lot tougher. I don't think journalists need to be shaking in their boots unless they only deliver the quality and depth of results that this algorithm delivers.

kiba over 13 years ago

These technological advances make me shudder about the potential job losses of the future, even though previous technological advances created new jobs.

Sure, there's no way that my profession and the great majority of jobs on the internet would be possible if we relied on human switchboard operators rather than on automation. That doesn't mean it will be true for the next advances in technology, does it?

SwellJoe over 13 years ago

This is pretty fascinating stuff, despite the limitations and obvious bias of this article. Are there any Open Source libraries or papers which cover toy implementations of this sort of thing? (Assuming, of course, that it is not simply a bunch of if/else constructs applied to templates, which would be far less interesting.)
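For reference, the "if/else constructs applied to templates" baseline can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how Narrative Science or anyone else does it: the angle names, the 20- and 3-point thresholds, and the template wording are all hypothetical.

```python
# Hypothetical templates, one per story "angle".
TEMPLATES = {
    "comeback": "{winner} rallied past {loser}, {ws}-{ls}.",
    "blowout": "{winner} routed {loser}, {ws}-{ls}.",
    "nail-biter": "{winner} edged {loser}, {ws}-{ls}.",
    "routine": "{winner} defeated {loser}, {ws}-{ls}.",
}


def pick_angle(margin, trailed_at_half):
    """Choose an angle with plain if/else rules (thresholds are made up)."""
    if trailed_at_half:
        return "comeback"
    if margin >= 20:
        return "blowout"
    if margin <= 3:
        return "nail-biter"
    return "routine"


def write_lede(winner, loser, ws, ls, trailed_at_half=False):
    """Fill the template that matches the chosen angle."""
    angle = pick_angle(ws - ls, trailed_at_half)
    return TEMPLATES[angle].format(winner=winner, loser=loser, ws=ws, ls=ls)


print(write_lede("Hawks", "Owls", 70, 68))
# Hawks edged Owls, 70-68.
```

The sketch can only pick among angles that someone enumerated in advance, and only from fields present in the input, which is why output from this kind of baseline tends to read as formulaic.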

jasonshen over 13 years ago

This reminds me of what MarketBrief is doing for financial documents. Definitely less color / variance in the stories though.

http://techcrunch.com/2011/08/15/yc-funded-marketbrief-makes-obtuse-sec-documents-human-friendly/

nl over 13 years ago

For those interested, the best source of research in this field is the "Special Interest Group on Natural Language Generation": http://www.aclweb.org/anthology/siggen.html

jack7890 over 13 years ago

If this works as advertised, it would have important (bad) consequences for SEO, right?

mkramlich over 13 years ago

Making a note here: add some shiny around my Python template engine and I can land $6m investment.