I'd like to see something that could do this, handling the awfulness of real-world tabular data. "What country has the highest GDP? Okay, which table has GDP? Is it the country_gdp table? No, that's an old one that hasn't been written to in 3 years. Ah, here it is, but you need to join against `geopolitics`, but first dedup the Crimea data, since it's showing up in two places; we can't remember why it got written twice there. Also, you need to exclude June 21 because we had an outage on the Brazil data that day. What do you mean some of the country_id rows are NULL?" And so on. I dream that someday there's a solution for that. That's a looooong ways away, I'd bet.
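For what it's worth, the cleanup the comment describes is mechanical once a human has figured it out; the hard part is the figuring-out. A rough pandas sketch of those steps, with entirely hypothetical table and column names:

```python
import pandas as pd

# Hypothetical tables standing in for the messy warehouse described above.
country_gdp = pd.DataFrame({
    "country_id": [1, 2, 2, 3, None],
    "gdp":        [21.4, 0.2, 0.2, 1.8, 5.0],
    "date": ["2020-06-20", "2020-06-20", "2020-06-20",
             "2020-06-21", "2020-06-20"],
})
geopolitics = pd.DataFrame({
    "country_id": [1, 2, 3],
    "name": ["USA", "Crimea", "Brazil"],
})

clean = (
    country_gdp
    .dropna(subset=["country_id"])                    # "some country_id rows are NULL"
    .astype({"country_id": int})
    .drop_duplicates(subset=["country_id", "date"])   # dedup the double-written rows
    .query("date != '2020-06-21'")                    # exclude the outage day
    .merge(geopolitics, on="country_id")              # join against geopolitics
)
top = clean.loc[clean["gdp"].idxmax(), "name"]
```

The pipeline itself is trivial; encoding the tribal knowledge of *which* rows to drop is what no model gets for free.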
Does anyone know how it relates/compares to Google's TaPaS? [1]
I notice this paper doesn't refer to it.<p>[1] <a href="https://ai.googleblog.com/2020/04/using-neural-networks-to-find-answers.html" rel="nofollow">https://ai.googleblog.com/2020/04/using-neural-networks-to-f...</a>
Git repo or it doesn't exist ;-)<p>Seriously, if this is not available, what are the alternatives?<p>I've seen some NLP + storage projects in the past but I can't recall them. (Even remotely connected, there was something to convert PDFs into machine-readable data.)<p>Is this AwesomeNLP <a href="https://github.com/keon/awesome-nlp" rel="nofollow">https://github.com/keon/awesome-nlp</a> a good starting point there?
Seems similar to this work out of Salesforce a few years ago: <a href="https://www.salesforce.com/blog/2017/08/salesforce-research-ai-talk-to-data.html" rel="nofollow">https://www.salesforce.com/blog/2017/08/salesforce-research-...</a>
Is TaBERT no longer on the Spider leaderboard? - <a href="https://yale-lily.github.io/spider" rel="nofollow">https://yale-lily.github.io/spider</a> . The top is "RATSQL v2 + BERT", testing at 65.6 for exact matches.
NLP has come pretty far: "Released by Symantec in 1985 for MS-DOS computers, Q&A's flat-file database and integrated word processing application is cited as a significant step towards making computers less intimidating and more user friendly. Among its features was a natural language search function based on a 600 word internal vocabulary." <a href="https://en.wikipedia.org/wiki/Q%26A_(Symantec)" rel="nofollow">https://en.wikipedia.org/wiki/Q%26A_(Symantec)</a>
Does the following mean that one can map/train to runtimes that give proper results based on the underlying data _results_?<p>"A representative example is semantic parsing over databases, where a natural language question (e.g., “Which country has the highest GDP?”) is mapped to a program executable over database (DB) tables."<p>Could it be thought of in the same fashion as Resolvers in GraphQL integrated into BERT?
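As I read it, yes: the model is trained to emit a program whose *execution result* over the tables answers the question. A minimal illustration of the target the parser is aiming for (the hardcoded `program` below is a stand-in for the model's output; the schema is made up):

```python
import sqlite3

# Toy database standing in for the DB tables in the paper's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdp (country TEXT, gdp REAL)")
conn.executemany("INSERT INTO gdp VALUES (?, ?)",
                 [("USA", 21.4), ("China", 14.3), ("Brazil", 1.8)])

question = "Which country has the highest GDP?"

# A semantic parser would map `question` to an executable program;
# this literal SQL string is what such a mapping might produce.
program = "SELECT country FROM gdp ORDER BY gdp DESC LIMIT 1"

(answer,) = conn.execute(program).fetchone()
```

In that sense it's loosely like a GraphQL resolver in that the model's output is only an intermediate query and the real answer comes from executing it against the store, though TaBERT itself just produces representations for the parser rather than resolving anything.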
Honest version:<p>> Why it matters:<p>> Improving NLP allows us to create better, more seamless human-to-machine interactions for tasks ranging from identifying dissidents to querying for desperate laid-off software engineers. TaBERT enables business development executives to improve their accuracy in answering questions like “Which hot app should we buy next?” and “Which politicians will take our bribes?” where the answer can be found in different databases or tables.<p>> Someday, TaBERT could also be applied toward identifying illegal immigrants and automated fact checking. Third parties often check claims by relying on statistical data from existing knowledge bases. In the future, TaBERT could be used to map Facebook posts to relevant databases, thus not only verifying whether a claim is true, but also rejecting false, divisive and defamatory information before it's shared.