I have been surveying hundreds of GPTs while researching the GPT-Analyst GPT and the prompt protection instructions library (see my previous posts).

What I found is that a huge number of custom GPTs have pirated PDFs and ebooks uploaded as knowledge files. For example: https://x.com/TechWithElias/status/1733448828542722102?s=20

Additionally, I found an unbelievable level of sloppiness in some of the knowledge files. Authors literally save raw HTML pages of 200 KB or more (ugly inline JavaScript and all) and upload them as knowledge files. Even if RAG is employed, you are practically guaranteed terrible retrieval performance.

Hear me out:

This practice reinforces my suspicion that the knowledge search in OpenAI's custom GPTs uses no real RAG at all. Instead, it is just a basic regular-expression search for keywords from your prompt, after which the matching 'pages' are extracted and fed in as context. I doubt OAI uses vector DBs or any proper RAG pipeline.

Finally:

Going forward, this is a new security nightmare: managing all those uploaded files, exploitable files, 'secrets' passed through custom GPTs, drops and whatnot.

Food for thought, and just my two cents.
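To make the suspected keyword-only retrieval concrete, here is a minimal Python sketch of what I think is going on: pull keywords out of the prompt, run one regex over every knowledge "page", and hand the pages with the most hits to the model as context. Everything here is my own assumption (the function names `extract_keywords`, `keyword_retrieve` and `build_context`, the four-letter keyword cutoff, and ranking by raw hit count are all hypothetical). It is not OpenAI's actual implementation, which is not public.

```python
import re
from typing import List


def extract_keywords(prompt: str, min_len: int = 4) -> List[str]:
    # Hypothetical keyword extraction: every alphabetic word of at least
    # min_len characters, lowercased and de-duplicated, in sorted order.
    words = re.findall(r"[A-Za-z]+", prompt)
    return sorted({w.lower() for w in words if len(w) >= min_len})


def keyword_retrieve(prompt: str, pages: List[str], top_k: int = 2) -> List[str]:
    """Return up to top_k pages ranked by how many keyword hits they contain."""
    keywords = extract_keywords(prompt)
    if not keywords:
        return []
    # One case-insensitive alternation regex; \b restricts hits to whole words,
    # so "refunds" does not count as a hit for "refund".
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    scored = []
    for idx, page in enumerate(pages):
        hits = len(pattern.findall(page))
        if hits:
            scored.append((hits, idx))
    # Most hits first; ties keep the original page order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [pages[idx] for _, idx in scored[:top_k]]


def build_context(prompt: str, pages: List[str], top_k: int = 2) -> str:
    # The retrieved pages are simply concatenated and fed to the model as context.
    return "\n\n---\n\n".join(keyword_retrieve(prompt, pages, top_k))


if __name__ == "__main__":
    pages = [
        "Refund policy: refunds are issued within 30 days of purchase.",
        "<script>var refund = 0; function refund(){ return refund; }</script> Shipping info.",
        "Our office hours are 9 to 5 on weekdays.",
    ]
    print(keyword_retrieve("What is your refund policy?", pages, top_k=1))
```

In this toy run, the raw HTML page wins: its inline JavaScript repeats "refund" three times, so it outranks the page that actually answers the question (two hits). Under this kind of retrieval, dumping unstripped HTML into the knowledge files actively crowds out the useful content.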