Tech Echo

17 comments

pbourke about 2 years ago

> My script read through each of the products we had responses for, called OpenAI's embedding api and loaded it into Pinecone - with a reference to the Supabase response entry.

OpenAI and the Pinecone database are not really needed for this task. A simple SBERT encoding of the product texts, followed by storing the vectors in a dense numpy array or faiss index, would be more than sufficient. Especially if one is operating in batch mode, the locality and simplicity can't be beat, and you can easily scale to 100k-1M texts in your corpus on commodity hardware/VPS (though an NVMe disk will give a nice performance gain over a regular SSD).
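A minimal sketch of the brute-force dense-index idea, assuming the embeddings have already been computed (for example with a sentence-transformers model's `encode` method). `DenseIndex` is a hypothetical helper and the 3-dimensional vectors are made-up stand-ins for real SBERT outputs; a numpy matrix or a faiss flat index would do the same job much faster, but plain Python keeps the sketch dependency-free.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


class DenseIndex:
    """Hypothetical brute-force in-memory index: one normalized row per text, plus an id."""

    def __init__(self) -> None:
        self.ids: List[str] = []
        self.rows: List[List[float]] = []

    def add(self, item_id: str, embedding: Sequence[float]) -> None:
        # item_id could be a reference back to a stored response row.
        self.ids.append(item_id)
        self.rows.append(normalize(embedding))

    def search(self, query: Sequence[float], k: int = 1) -> List[Tuple[str, float]]:
        """Return the k ids with the highest cosine similarity to the query."""
        q = normalize(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(row, q)))
            for item_id, row in zip(self.ids, self.rows)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    # Stand-in vectors; real ones would come from an SBERT model.
    index = DenseIndex()
    index.add("response-1", [0.9, 0.1, 0.0])
    index.add("response-2", [0.0, 0.2, 0.95])
    print(index.search([0.8, 0.2, 0.1], k=1))  # top hit is "response-1"
```

Exhaustive search is linear in corpus size, which is the trade-off the comment accepts for batch jobs of roughly 100k-1M texts.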

swyx about 2 years ago

> And it pretty much worked! Using prompts to find matches is not really ideal, but we want to use GPT's semantic understanding. That's where Embeddings come in.

sounds like you ended up not using GPT3 in the end, which is probably wise.

i'm curious if you might see further savings using other cheaper embeddings that are available on Hugging Face, but it's probably not material at this point.

did you also consider using pgvector instead of Pinecone? https://news.ycombinator.com/item?id=34684593 Any pain points with Pinecone you can recall?

Hyption about 2 years ago

I don't like the unscientific ad for his gf's company. 'which helped launch the movement of those opposed to endocrine disruptors, was retracted and its author found to have committed scientific misconduct'

jamesblonde about 2 years ago

I have seen a lot of people write about how important the interaction between vector DBs and ChatGPT (and GPT3) is. I am still not much wiser after this article. Is it that it makes it easier to go from:

user query -> GPT3 response -> lookup in VectorDB -> send response based on closest embedding in VectorDB?

mdorazio about 2 years ago

Are you saving the match pairs somewhere? I imagine 1) there are a finite number of them, 2) doing an exact lookup in a DB first will be faster and easier than calling GPT3 and Pinecone every time, and 3) eventually GPT3 APIs will get pricey enough to make you think twice unless you're running your own instance on a cluster.
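A sketch of the cache-first lookup suggested here. `make_cached_matcher` and the `expensive_match` callback are hypothetical names: the callback stands in for whatever slow path is used (an embedding call plus a vector-DB query, say), and the in-memory dict stands in for a persistent table of saved match pairs.

```python
from typing import Callable, Dict


def make_cached_matcher(expensive_match: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a slow matcher with an exact-lookup cache keyed on normalized text."""
    cache: Dict[str, str] = {}  # stand-in for a DB table of saved match pairs

    def match(product_text: str) -> str:
        # Normalize case and whitespace so trivial variants hit the same entry.
        key = " ".join(product_text.lower().split())
        if key not in cache:
            cache[key] = expensive_match(product_text)  # only pay for unseen inputs
        return cache[key]

    return match
```

Because the set of distinct product texts is finite, the expensive path is called at most once per normalized text.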

throwthere about 2 years ago

This looks incredible and magical to me. How do you learn to create things like this as a mostly web programmer? Vectorization, etc.: I had no idea it could integrate with GPT, but honestly it looks kind of obvious/effortless to the author.

mattfrommars about 2 years ago

Pardon my ignorance here. I started to play around with text generation today and came across plenty of resources, but found it hard to make any sense of them. I had this working: https://github.com/oobabooga/text-generation-webui and instead of being able to answer questions, it revolves around the concept of generating text.

In your case and with ChatGPT3, does it provide output based on the data you feed it? If that is the case, is there anything involved in training the model to use your data?

I am trying to get a sense of what is going on.

ipv6ipv4 about 2 years ago

How do you know if the output that was sent to customers (who believe they are getting accurate results from a knowledgeable human being, BTW) is correct?

djoldman about 2 years ago

How long did this take? Did you consider something like OpenRefine or fuzzy matching / Levenshtein distance? Seems like a common data-cleaning ask with a small amount of data.
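For comparison, a small fuzzy-matching sketch using plain Levenshtein distance. `closest` and its `max_distance` threshold of 3 are illustrative assumptions, not tuned values; Python's standard library also offers `difflib.get_close_matches` for a similar ratio-based lookup.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance; insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric, so keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free when chars match
            ))
        previous = current
    return previous[-1]


def closest(query: str, candidates: Iterable[str],
            max_distance: int = 3) -> Optional[Tuple[str, int]]:
    """Return the nearest candidate (case-insensitive) or None if nothing is close enough."""
    best: Optional[Tuple[str, int]] = None
    q = query.lower()
    for cand in candidates:
        d = levenshtein(q, cand.lower())
        if d <= max_distance and (best is None or d < best[1]):
            best = (cand, d)
    return best
```

Usage: `closest("Shampo", ["Shampoo", "Conditioner"])` returns `("Shampoo", 1)`.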

espe about 2 years ago

i fail to see how this data cleaning could not be solved with proper tokenization and some distance measure. the amount of power used for those api calls is slightly obscene.

edit: don't want to rant. it's not a bad post and i'm sure there are many far more wasteful examples than this.
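One possible reading of "proper tokenization and some distance measure", sketched under assumptions: lowercase alphanumeric tokens compared with Jaccard distance over token sets. The tokenizer regex is an arbitrary illustrative choice; unlike character-level edit distance, this measure ignores word order and punctuation.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase, then split on any run of characters that is not a letter or digit."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over token sets; 0.0 means identical token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)
```

Usage: `jaccard_distance("Hydrating Shampoo", "shampoo - hydrating")` is `0.0`, since both reduce to the same token set.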

1f60c about 2 years ago

I'm disappointed that the article doesn’t explain what they ended up doing.

EGreg about 2 years ago

Why not just use the GPT-3 or even GPT-2 classifier API? No generative AI needed.

sexangel about 2 years ago

> 100s of human hours saved

wait till it's thousands, millions, billions . . .

fswd about 2 years ago

What is pinecone and is there a link to a website?

pjakubowski about 2 years ago

Awesome to see the integration between Klaviyo automation and GPT-3 AI, and using it to streamline your girlfriend's processes. Keep up the fantastic work!

NotYourLawyer about 2 years ago

This is pure spam.

wyem about 2 years ago

Loved reading it. Will feature this in my newsletter on AI tools and learning resources, AI Brews: https://aibrews.com

Using GPT3, Supabase and Pinecone to automate a personalized marketing campaign