GPT3 Get answers to technical questions from your documentation site

93 points by chandan_maruthi over 2 years ago

25 comments

dang over 2 years ago

Waiting lists and signup pages can't be Show HNs. Please see https://news.ycombinator.com/showhn.html.

I've taken Show HN out of the title now.

the_real_sparky over 2 years ago

I think the actual interface with OpenAI’s platform is the easy part. Everybody and their dog will have a version of this. Just look at the comments so far - many of us have already been playing with it.

If you want a real moat, figure out how to parse existing PDF documentation that is really badly formatted. Think diagrams and tables with text floating in various places, etc. Documentation of this style is very common in industries where physical things are being built in the real world. The standards documentation (IEEE, ANSI, NFPA, etc.) doesn’t usually parse cleanly, much less the messier internal documentation within the businesses.

Grobid is the best example of such a documentation parser, but it is so laser focused on academic papers that it fails to properly process industry-style standards and SOP documentation. What the world needs right now is a Grobid that works for other kinds of messy documentation.

isthisthingon99 over 2 years ago

The problem with GPT3 is it creates BS answers. The function "Interval" does not seem to exist.

how would I do a SQL query with questdb to grab every other second of data?

> This is a common use case for us. We have built SQL functions to easily do this task. The function below will return every other second of data:
>
> SELECT * FROM `time_series` WHERE (INTERVAL '2s')
>
> This query can be run directly from the web console or via Postgres wire protocol using Postgres client libraries such as pg4ce, postgres-java and others.
>
> The above query will return all records in the table with timestamp greater than 12

iamflimflam1 over 2 years ago

Was thinking of building something very similar - upload your technical docs - and create a chat bot help system.

One of the tricky things to overcome is the length limits of the prompt you can feed into things like GPT-3. There are some suggestions on the OpenAI website on how to overcome this. The main one seems to be to filter the content using embeddings and then only feed "relevant" sections into the prompt.

Would be interested to know what your approach is.
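
For anyone curious what the embedding filter looks like in practice, here is a minimal sketch, not anyone's actual implementation. It assumes a toy bag-of-words `embed` function as a stand-in for a real embedding model, and the names `select_sections` and `max_chars`, along with the sample docs, are made up for illustration. The idea is to rank doc sections by similarity to the question and keep only the best ones that fit a prompt budget.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sections(question, sections, max_chars=200):
    """Rank sections by similarity to the question, keep those that fit the budget."""
    q = embed(question)
    ranked = sorted(sections, key=lambda s: cosine(q, embed(s)), reverse=True)
    chosen, used = [], 0
    for section in ranked:
        if used + len(section) > max_chars:
            continue  # skip sections that would overflow the prompt budget
        chosen.append(section)
        used += len(section)
    return chosen


# Made-up sample sections, purely for illustration.
docs = [
    "Install the client library with your package manager.",
    "Pricing depends on the instance size you choose.",
    "Use the export command to export query results as CSV files.",
]
relevant = select_sections("how do I export query results", docs, max_chars=100)
print(relevant)
```

A real system would swap `embed` for a learned embedding model and would probably budget in tokens rather than characters, but the ranking and filtering step keeps the same shape.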

immortalloom over 2 years ago

Being able to have conversations with a documentation site is the future. Awesome that you are tackling it!

motohagiography over 2 years ago

While I've been looking for this, the gap in most product documentation I have seen is that it's written by architects who want you to understand the whole framework and demonstrate its myriad parts instead of by users (e.g. product managers) who just need to solve a problem and they will learn from that.

On this GPT chat, I asked, "provide an example oauth2 config for QuestDB" and "how do I configure oauth2 for signaldot?" and it responded like a search engine with paragraphs from documentation that matched the keywords, perhaps with an extra degree of reasoning.

As someone who uses fantastically (farcically?) complex security technologies, I'd tell architects, PMs, documentation writers and developers: Your Cookbook Is Your Product.

LLMs have a GIGO problem, and until you can train it on example configs, it's going to be a search engine over someone's architectural grandstanding.

chandan_maruthi over 2 years ago

Some highlights:
- Twig takes static documentation sites and lets users ask questions about them
- Custom/fine-tuned GPT3 models are trained on your data
- Can be embedded within your app or docs site

Seeking feedback:
- What do you think about it?
- Would you add this to your docs site or within your app when it's available?
- What can I add/change to make this better?

cloudking over 2 years ago

Looks interesting, but need more details about set up and pricing. This is another approach that doesn't require fine tuning: https://github.com/jerryjliu/gpt_index

visarga over 2 years ago

It seems to do open domain question answering without restricting to the topic.

> Is the word cat made of 4 or 5 letters?

>> The word cat is made of 4 letters, 3 of which are in the stem.

ngc248 over 2 years ago

I had this idea a few years back just when ML was heating up. A service which would crawl all of your documentation/data to which you can then ask questions and get answers back. A domain specific search engine/Knowledge Base.

I started reading some of the papers, got to know that even to build a model required a huge number of resources and just dropped it.

monkeydust over 2 years ago

Some great stuff here. Been thinking about doing this for enterprise grade software at work. What I want to do is feed it docs (pdf), text from support queries with answers and videos. What types of approaches should I be considering here? Have just started using Whisper to do ASR on videos.

jinct over 2 years ago

Is there any way a company could run this on prem trained against their internal Confluence instance?

allisdust over 2 years ago

How did you train it? Does OpenAI provide an API for training ChatGPT on specific sites or data?

spaniard89277 over 2 years ago

It seems that technical writers will be even more important, as their job will be the input for models.

Unless GPT can learn directly from code and explain it properly.

With chatGPT I've had some problems trying to deduce from code prompts, but this seems promising.

chandan_maruthi over 2 years ago

Just saw someone tried to hack/denial-of-service the bot by just putting too many words in the prompt. (This hits triggers, as usage is token based.)

I just set a prompt limit of 100 characters, which should be plenty. Bot is still down until OpenAI resets limits. https://twitter.com/chandanmaruthi/status/1612134496035569664?s=20&t=JvOWZPvxNGXwvLqKG_a3KQ
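
A guard like the one described could look roughly like the sketch below. This is an assumption about the shape of such a check, not the bot's actual code; `check_prompt` and `MAX_PROMPT_CHARS` are hypothetical names. It rejects oversized input before any API call is made, so a long prompt never consumes tokens.

```python
MAX_PROMPT_CHARS = 100  # the character limit mentioned in the comment above


def check_prompt(prompt, limit=MAX_PROMPT_CHARS):
    """Return (ok, value): the cleaned prompt if accepted, else an error message."""
    cleaned = prompt.strip()
    if not cleaned:
        return False, "Please enter a question."
    if len(cleaned) > limit:
        return False, f"Question too long ({len(cleaned)} characters); the limit is {limit}."
    return True, cleaned


ok, value = check_prompt("How do I create a table?")
print(ok, value)

ok, value = check_prompt("spam " * 200)
print(ok, value)
```

Since usage is billed in tokens, a character cap is only a rough proxy; counting tokens with the model's tokenizer would be more precise, at the cost of an extra dependency.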

raydiatian over 2 years ago

Nice, I have been thinking the same: that it should be possible to “teach” an AI about your codebase, and then ask it to make PRs for simple features and bug fixes.

chandan_maruthi over 2 years ago

Hi All, We had over 6,000k interactions and have hit OpenAI's hard API usage limits :-). I have requested OpenAI to increase limits. Until then the live chat with the bot may be restricted. Remember to get on the waitlist with "get access" if you want to give it a try when the live chat starts again.

kindofabigdeal over 2 years ago

This is actually interesting, can it parse documentation from my code? Or comment? Or how is this exactly working?

chandan_maruthi over 2 years ago

Update: Still waiting on OpenAI to increase API limits after we hit hard limits after a popular HN post

pruthvishetty over 2 years ago

Isn't the Answers API deprecated?

neophyt3 over 2 years ago

wondering if this can be used on RFCs

recuter over 2 years ago

  QuestDb; Columnar time-series database with high performance ingestion and SQL analytics you know and love from QuestDB open source, now on the cloud.

  Q: How high is the performance?

  GPT3: We have tested QuestDB against other databases using benchmarking frameworks and benchmarking tools. We have also developed our own performance tests to simulate real-world use cases involving large data sets, multiple ingestion streams, and complex SQL queries. The results of these efforts are summarized in the table below:

  QuestDB is the fastest open source time series database available today. It powers real-life applications such as Yahoo! which stores 1 billion records per day in a single instance of QuestDB with

GPT3 says QuestDb is web scale and I for one trust it implicitly.

titzer over 2 years ago

Let's use a language model that is trained on hundreds of gigabytes of god-knows-what, has a tendency to hallucinate, has literally never used your product, is known as a consummate bullshitter, and unleash it unsupervised on your confused users.

Sorry for the snark, but what is your plan for quality assurance that this is meeting users' needs instead of just BS'ing them out of your hair?

nutanc over 2 years ago

Trying something similar. Using a mix of embeddings and generative AI (davinci) to answer questions from scraped data of a website. Scraped data for our website (Ozonetel.com) and created this site.

1. Scraping website. Used default node scraper. 5 mins.
2. Generated huggingface embeddings. 10 mins.
3. Use code AI to generate basic website. 5 mins.
4. Created prompt to limit to answers known. 1 min.

So in 30 mins we are able to create a site search powered by generative AI.

Disclaimer. This is still a work in progress.

http://speech-kws.ozonetel.com/ozosearch
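
Step 4 above (a prompt that limits the model to known answers) can be sketched as follows. This is only an illustration of the general pattern, not the prompt actually used on that site; `build_prompt`, the instruction wording, and the sample passages are all assumptions.

```python
def build_prompt(question, passages):
    """Build a completion prompt that tells the model to answer only from the passages."""
    # Number the retrieved passages so the context is easy to inspect.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        'If the answer is not in the context, reply "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up passages standing in for the sections picked by the embedding search.
prompt = build_prompt(
    "Which plans include call recording?",
    ["Call recording is included in the business plan.", "Support is available by email."],
)
print(prompt)
```

The resulting string would be sent to the completion model as-is. An instruction like this reduces off-topic answers but does not guarantee the model stays within the context.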

awestroke over 2 years ago

Q: can I self-host questdb in my own EKS?

> QuestDB is a popular time series database used by many companies for their application monitoring and real-time analytics. We have built QuestDB as a system to track the progress of our own application, which has evolved from an experiment into production at scale. As we grow, so does our use of QuestDB; it's now the core data store for all things InfluxData.
>
> We are looking to hire talented engineers who share our mission of empowering data producers with tools to

Not super impressed
