For the past 6 months I’ve been working on a Natural Language Understanding (NLU) API. Essentially the request would contain a plain English sentence, and the response would include a breakdown of actions, entities, agents, location, temporal, logic, etc. My hope was that I could create a “Stripe/Twilio for NLU”, but recent feedback has been that it’s more a “technology”, and less a “product”. It would still require a lot of development work to create anything of value for an end user. While I see the value of an API, I also agree with their sentiment, and so I’ve begun exploring problems to apply my API to.<p>One use case that tends to pop up frequently is “text-to-database”. Similar to text-to-SQL, but with my API I could target any DB regardless of query language. This would require a large amount of work, and I’m not convinced that it’s something that users even want. The strongest feedback I’ve received has been that it would be a convenient method for managers and non-technical staff to query analytics databases.<p>Is this a path worth exploring? Are there industries or positions that would kill to be able to query a DB with a plain English sentence? Is this something that you would use, or want to implement?
No. Absolutely not. I want to say precisely what I mean and have the database do precisely what I say, no more, no less.<p>But maybe you're asking the wrong question in your headline. If you could have <i>other people in your organization</i> able to talk to a database in plain English, would you?<p>This isn't something that most of the HN crowd would want for their own work. There might be a lot of people here who have, say, that upper-level manager who keeps asking for reports for which the HN person has to figure out how to get the data. Handing that manager a tool like this, and letting them run their own queries could get them out of our hair. (It could also be better for the manager, as they run the query, look at the results, and figure out that it wasn't actually the data they were looking for, and so they can iterate the query to get what they're really after.)<p>One caveat, though: I wouldn't want to hand anyone - even a professional - write access with this kind of a tool.
I remember first playing with tech like this in the early 1990s. Q&A v4 from Symantec supported NLP and I was quite surprised at how potentially useful it looked, although as a developer I preferred more control. After typing your query, the app would display its interpretation of your request in more formal English to confirm it understood you. When there was ambiguity, a few options were presented. You selected the correct one, and got your answers. It worked very well for queries like, "show me all employees hired after 2020-01-01 whose salary is greater than 80,000 sorted by salary descending".<p>Ultimately though, I think the usefulness of these tools breaks down for both complex queries and even simple ones when the data model does not have explicit relationships defined.
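For reference, the example query above maps to SQL roughly as follows. This is only a minimal sketch, assuming a hypothetical `employees` table with `name`, `hire_date`, and `salary` columns (none of these names come from Q&A itself), run against an in-memory SQLite database with Python's built-in `sqlite3` module.

```python
import sqlite3


def employees_hired_after(conn, cutoff_date, min_salary):
    """Employees hired after cutoff_date earning more than min_salary,
    sorted by salary descending. Table and column names are hypothetical."""
    sql = (
        "SELECT name, hire_date, salary FROM employees "
        "WHERE hire_date > ? AND salary > ? "
        "ORDER BY salary DESC"
    )
    return conn.execute(sql, (cutoff_date, min_salary)).fetchall()


def demo():
    # Made-up sample data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, hire_date TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Ann", "2021-03-15", 95000),
            ("Bob", "2019-07-01", 120000),
            ("Cy", "2020-06-30", 82000),
            ("Dee", "2022-01-10", 70000),
        ],
    )
    # ISO-8601 date strings compare correctly as plain text.
    return employees_hired_after(conn, "2020-01-01", 80000)


if __name__ == "__main__":
    print(demo())
```

Every phrase in the English sentence corresponds to exactly one clause here, which is probably why this style of query worked so well.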
No.<p>Trying to map English to formal logic is a fool's errand.<p>From my experience, what keeps non-technical people from writing queries isn't SQL, it's stuff like joins.
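To illustrate the point about joins, here is a minimal sketch assuming two hypothetical tables, `customers` and `orders` (the names and sample data are made up). "How much has each customer spent?" sounds like a simple question, but answering it means knowing the data lives in two tables linked by `customer_id`, and picking a LEFT JOIN so that customers with no orders still show up.

```python
import sqlite3


def spend_per_customer(conn):
    """Total order amount per customer, including customers with no orders."""
    sql = """
        SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


def demo():
    # Hypothetical schema and data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 30), (2, 1, 20)]
    )
    return spend_per_customer(conn)


if __name__ == "__main__":
    print(demo())
```

With an inner join instead, Bob would silently vanish from the result, which is the kind of mistake a plain English front end would also have to avoid.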
There is something in this space that I think would have value... maybe translation from English -> SQL, maybe suggest commonly used WHERE clause filters, etc.<p>At the end of the day, SQL is very expressive for most of these queries, but it's not particularly discoverable and does take some knowledge. Lowering that barrier to entry is a great idea, but otherwise I'm not sure if an analyst can be certain their query will give the same data as somebody who uses slightly different phrasing. SQL gives a lot more precision and I would hate to lose that due to a layer of abstraction.<p>But English -> SQL (with something like GitHub Copilot, built on other analysts' queries) would be very interesting, although not "get out my wallet and purchase" compelling.
The more detailed you have to get explaining things, the less I would want to use it. If I can say something like "get the employee's pay-related data" and not have to delineate exactly what fields to get and what to call them, that would be useful. If I could say "make sure they are not also a student" (I work for a university) and have it figure that out, that would also be useful. If I have to tell it what joins to make, I'd rather just type it in SQL or whatever its underlying language is. Typing is a much more casual activity for me than speaking is, and I can type for hours at a time, whereas I can't talk for that long without my throat running dry. If I can say in a few words what I want and save a lot of typing, then that's great. If I end up saying as much as I would type anyway, then I'd rather just type it.<p>I think this is definitely something that should be looked at, but it's not a product I'm really interested in unless it wows me with its intelligence. It has to start somewhere, though. I'm probably an outlier in the sense that I think a lot of people would rather talk to their computer than type on a keyboard. I'm just not one of them. I also don't want to work in cubicle hell with everyone speaking to their computer 24/7.
This seems like it could have enough value to build a customer base around? Or for someone to want to purchase the rights to the tech so they could build around it?<p>Selling such a thing should not be a problem given the right target. Not only has the customer space for technologies like that changed over time, but you are providing a new twist on the solution.<p>> “Stripe/Twilio for NLU”, but recent feedback has been that it’s more a “technology”, and less a “product”<p>That comment doesn't make a ton of sense to me. Are services not valuable? Stripe and Twilio seem like really helpful services and that seems...OK to me?<p>Personally I get excited when I hear about an ease-of-use wrapper around regex. But for a DB, in place of that regular messy query stuff with the prospect of things like multiple LEFT JOINs? That's a big deal.<p>And even if it doesn't tick every box, I'd guess it would have its unique applications for a given set of customers.<p>Like, let's say, people who would like to prototype to a good-enough level using their ability to sit around and talk in English all day long, and then hand off to someone else. The average person's energy pool for trying different sentences, even considering some expected failure rate, is so much deeper than the resources available for trying and failing with different SQL statements.<p>This would also apply to those who are not really working with the data for its own sake. Let's say they are selling data-viz tools and want a quick way to make prototypes from the potential customer's sample data. There, boom, product example. I guess.<p>It sounds really cool. Good luck, hope it works out for you.
Having used various things that claim to be natural language, I find that to use them effectively, I end up needing to learn their particular structured language. These are often poorly documented and may have pretty weird/difficult edge cases (one often seen and easy to explain case is selecting lists with plural nouns and individual records with singular nouns... But many English nouns have the same spelling and pronunciation as plural and singular).<p>Regardless of the details of the language, if I'm going to learn a structured language anyway, I would usually prefer to learn the underlying language, and not an imperfect abstraction. Sometimes, there's good value in the abstraction, but I usually find they get in the way and make it harder to do what I want.
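A toy sketch of the plural/singular edge case, assuming a hypothetical interface that decides between "return a list" and "return one record" from the noun's spelling alone. The `wants_list` helper and its trailing-"s" rule are invented for illustration and are not taken from any real product.

```python
def wants_list(noun: str) -> bool:
    """Naive rule: a noun ending in 's' is plural, so return a list of records."""
    return noun.lower().endswith("s")


def demo():
    # "sheep", "series" and "aircraft" are spelled the same in singular and
    # plural, so the rule can only guess at what the user meant.
    nouns = ["employees", "employee", "sheep", "series", "aircraft"]
    return {noun: wants_list(noun) for noun in nouns}


if __name__ == "__main__":
    for noun, as_list in demo().items():
        print(f"{noun!r}: {'list' if as_list else 'single record'}")
```

"show me the sheep" always comes back as a single record and "show me the series" always as a list, whichever one the user meant, so the user ends up learning the tool's workaround syntax instead.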
Have you tried gathering sample queries from real people? I previously worked at a company that tried this briefly. They gave up after getting the first sample of queries real users typed in.
This is an area that has been researched for decades and there is a wealth of prior art. If you want to pursue this idea, you should narrow it down a bit. For example, creating a usable natural language interface for databases of GIS data for land surveying would in itself be a massive project. It should also be said that "plain English" is far from "plain" even for native speakers. It's not just about parsing; it is also about making it usable.
Here's a related service that Amazon launched: <a href="https://aws.amazon.com/quicksight/q/" rel="nofollow">https://aws.amazon.com/quicksight/q/</a> . Google Analytics has the same technology in their dashboard. I found it kind of cool, and it works well sometimes. It lets you ask questions about your data and get answers.