Reading the README makes me think it only searches the top 4 most likely docs via the embeddings, and never the full wiki at any point? Or am I misunderstanding how this works? With embeddings being close to just term-vector matching via a dot(?) product?<p>So basically: get all the sub-phrases/sounds -> vector -> check the vector DB for the closest matching documents -> send those to GPT for summarization and for answering the question.<p>If that's true, wouldn't that have severe limitations with scattered information? I guess it would still help you get answers and walk the data better than the "I don't even know the term" problem with Google?
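To make the pipeline described above concrete, here is a minimal sketch of the retrieve-then-ask flow. It is not the repo's code: the toy vectors stand in for real embeddings, and `top_k` and `build_prompt` are hypothetical helper names. It uses cosine similarity, which is the dot product of length-normalized vectors.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the ids of the k documents whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def build_prompt(question, docs, doc_ids):
    """Concatenate only the retrieved documents into the prompt sent to the model."""
    context = "\n\n".join(docs[doc_id] for doc_id in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Only the k retrieved chunks ever reach the model, which is why an answer scattered across many pages can be missed if those pages do not all rank in the top k.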
Nice to have tools like this to wrap up features, it definitely makes these types of solutions more accessible, thanks!<p>It would be nice to know, from your experience, whether there is a rule of thumb for calculating the cost of fine-tuning and running a solution like this against a docs site?
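Not an answer from experience, but the one-off embedding cost can at least be estimated with back-of-envelope arithmetic. The sketch below assumes the commonly quoted rough ratio of about 0.75 English words per token, and it takes the price per 1,000 tokens as a parameter because the real number depends on the model and changes over time. The function name is hypothetical.

```python
def estimate_embedding_cost(word_count, price_per_1k_tokens, words_per_token=0.75):
    """Rough cost of embedding a corpus once.

    words_per_token is an approximation for English text, not an exact figure.
    price_per_1k_tokens must be looked up for the model actually used.
    """
    tokens = word_count / words_per_token
    return tokens / 1000 * price_per_1k_tokens
```

Query-time cost is separate: each question pays for embedding the question plus the completion call on the retrieved context.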
I tried to do something tangentially similar recently: telling ChatGPT that I'd ask it a question but that, rather than a response, I wanted search terms for Wikipedia and Wikidata that would lead to pages containing the answer. The thinking was that I'd then be able to provide those pages to it and get it to synthesize that data, giving answers with decent citations in them.<p>Perhaps it was the example I chose, "flight time from New York to London", but I couldn't really get it to provide sensible search terms for the information it wanted or needed.
Thanks for sharing the code. What happens when existing content gets updated and new content is created? Would it need to create embeddings for all of the content again? That seems like a drawback of the current approach, since creating embeddings costs money. Please see <a href="https://github.com/mpaepper/content-chatbot/blob/main/create_embeddings.py#L49">https://github.com/mpaepper/content-chatbot/blob/main/create...</a>. Would it be possible to progressively update the vector store?<p>Please advise. Thank you.
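One common way to avoid re-embedding everything is to keep a hash of each page's content from the last run and only re-embed pages whose hash changed. The sketch below shows just the bookkeeping. It is not the repo's code, `plan_update` is a hypothetical name, and it assumes the vector store in use can add and delete entries by key, which would need checking.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(pages, stored_hashes):
    """Decide which pages need work.

    pages: dict of url -> current text
    stored_hashes: dict of url -> hash saved after the previous run
    Returns (urls to embed again, urls to remove from the store).
    """
    to_embed = [url for url, text in pages.items()
                if stored_hashes.get(url) != content_hash(text)]
    to_delete = [url for url in stored_hashes if url not in pages]
    return to_embed, to_delete
```

With this, only new or changed pages incur an embedding call, and pages that disappeared from the site can be dropped from the store.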
Awesome work! Thanks for sharing.<p>For anyone interested in an audio version that talks to you, that you can get on your site today, my brother put this together a few weeks ago!
<a href="https://siteguide.ai/" rel="nofollow">https://siteguide.ai/</a>
Awesome!<p>Are you planning on adding agent/tools support?<p>It would be cool to use this with internal data, then allow clients to chat with a bot that is fine-tuned on their data but can also run queries, pull reports for specific dates, or produce charts, all via tools.
Curious to see if it can take my entire site content: <a href="https://taoofmac.com/static/graph" rel="nofollow">https://taoofmac.com/static/graph</a><p>Might be a fun weekend experiment.