Wait, how can we verify this is OpenAI's form and not some random form on the internet?<p>Edit: OK, the link can be found in part 4 of: <a href="https://openai.com/policies/privacy-policy" rel="nofollow">https://openai.com/policies/privacy-policy</a>
> we need clear evidence that the model has knowledge of the data subject conditioned on the prompts<p>We have a system that may have information about you and may even distort information about you. In fact it probably has some information about you, considering that we exercised no control over the process of ingesting information into the system. Furthermore, we don't understand or control our system well enough to remove that information or even discover it. However, we still released the system to the world and now we expect you to test it with various prompts and hope that you get lucky before some other person does.
Does anyone have any idea how this is handled from a technical perspective?<p>The data isn't sitting in some database somewhere, it's inside of a large language model. It's not like they can just execute a DELETE statement or do an entirely new training run.<p>Are they intercepting the outputs with something like a moderation server as a go-between? In that case, the data still would technically exist in the model, it just wouldn't be returned.<p>Maybe using fine-tuning?
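For what it's worth, the "go-between" idea could look roughly like the sketch below. This is purely illustrative: `SUPPRESSED_NAMES` and `redact_output` are made-up names, and nothing here reflects how OpenAI actually handles removal requests. It also shows the limitation mentioned above: the filter only hides the text on the way out, the model itself is unchanged.

```python
import re

# Hypothetical suppression list; a real system would presumably build this
# from accepted removal requests.
SUPPRESSED_NAMES = ["Jane Q. Example"]

def redact_output(text, suppressed=SUPPRESSED_NAMES, placeholder="[removed]"):
    """Replace every suppressed name in a model response with a placeholder.

    Matching is case-insensitive and literal (the name is regex-escaped).
    """
    for name in suppressed:
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        text = pattern.sub(placeholder, text)
    return text
```

A filter like this would miss misspellings, nicknames and paraphrases, which is part of why it is hard to call it "removal".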
Putting the onus on the user to find a “relevant prompt” is bullshit. I don’t care how large the training data set is, you can search it and remove data about me or authored by me if you have my personal information, much faster than I can “prove” my data is in there by trying to summon it out of the machine.<p>The legal principle here is very, very simple — no training data without explicit legal consent. Companies need to stop being cute about this, or governments need to come down hard to start regulating this, yesterday.
> a request does not guarantee that information about you will be removed from ChatGPT outputs<p>Oh, I am pretty sure that if you don’t remove all data you’ll pay for it. Looking forward to hefty fines for OpenAI.
"Relevant prompts" should not be a required field. That means I need to use OpenAI to request my data be removed from its data set?<p>Is there a way to remove PII without having to use their service?
Somewhat related, I previously completed the form found in the help section titled "How your data is used to improve model performance" to opt out of providing training data to OpenAI: <a href="https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance" rel="nofollow">https://help.openai.com/en/articles/5722486-how-your-data-is...</a><p>I received a confirmation in February that my data had been excluded from model training. However, recently, after the addition of the new Data Controls feature, I noticed that I was suddenly opted in again in the settings. I've tried contacting them about it via Discord and e-mail so that they can clarify whether the exclusion is still valid, but it seems like I'm getting ignored.
Wouldn't it be easier if they published a list of where they scraped their data from in the first place? Filling out forms, scanning ID and sending it only to learn they didn't capture any of your data seems like such a waste of time.<p>On the other hand, they already know which sites they scraped. So publish the list, maybe with a handy lookup portal where you can enter URLs to see if they got scraped.<p>I prefer an opt-in model, but that's not likely to happen any time soon, so this seems reasonable while this gets legally sorted out. Just because something is transmitted publicly doesn't mean it's without copyright. Otherwise any song broadcast on radio is up for grabs to be resold by anyone receiving it.
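The lookup portal idea is simple enough to sketch. Everything here is an assumption for illustration: `SCRAPED_DOMAINS` is a made-up stand-in for a published source list, and matching on the bare hostname is just one possible design.

```python
from urllib.parse import urlparse

# Hypothetical published list of scraped domains.
SCRAPED_DOMAINS = {"example.com", "blog.example.org"}

def normalize_host(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    if "://" not in url:
        # Allow bare input such as "example.com/page".
        url = "http://" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def was_scraped(url, scraped_domains=SCRAPED_DOMAINS):
    """Check whether the URL's host appears in the published list."""
    return normalize_host(url) in scraped_domains
```

A host-level check like this would only tell you that a site was crawled, not which pages or whether anything about you ended up in the data.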
Reminder that you have no obligation to use their stupid form if you don't like it and all their weird requirements.<p>You can just send them a snailmail or e-mail and they'll have to process that too. You can find templates for that all around the internet.
I'm wondering: How can I be certain that the model contains any personal data about me (or someone else not famous)?<p>For a public figure, of course there is lots of information in the training data, all public data. But when asked about me or my brother, ChatGPT either refuses to answer OR hallucinates the hell out of it. Then nearly everything is wrong and the output resembles the answer to a prompt like: "Create a short bio for a fictional character named xx, living in yy and working as zz." (Okay, often yy and zz are wrong too.)<p>Requesting deletion of these hallucinated facts seems rather pointless and ineffective?
I frankly don’t get this privacy argument at all. If I browse Facebook and look at pictures you uploaded and end up learning something from those pictures, what am I supposed to do? Undergo brain surgery?<p>It feels like anything that you release on the internet publicly is fair game. If, however, you didn’t release it in public, put it behind a password, and then OpenAI somehow got access to it and trained on it, I can see the argument. But if you put up data on your own, I don’t see how you can prevent others from accessing that data. If you don’t want others using it, don’t put it out there.
According to Washington Post's reporting on the contents of the C4 dataset last week I see that ChatGPT has over 100,000 tokens from my personal blogs. Does that qualify as "personal data" in this context? Everything I've written is clearly labeled as CC-BY so it would be nice to at least be acknowledged.<p><a href="https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/" rel="nofollow">https://www.washingtonpost.com/technology/interactive/2023/a...</a>
Here's to hoping this forces their hand into making the mass data-hoovering completely opt-in so they can keep these pesky normies from bothering them again in the future.
You can't finish this form without selecting "country who's laws apply". The US isn't in the list so you can't submit the form. California and Illinois have GDPR-like laws. This is pretty messed up.
So they seem to be confirming they're processing personal data. The UK ICO "at a glance" guidance for that includes:<p>* You must identify valid grounds under the UK GDPR (known as a ‘lawful basis’) for collecting and using personal data.<p>* You must use personal data in a way that is fair. This means you must not process the data in a way that is unduly detrimental, unexpected or misleading to the individuals concerned.<p>* You must be clear, open and honest with people from the start about how you will use their personal data.
You'll want to fact check this, but I find GPT4 is OK at this kinda stuff.<p>If you've been asked to remove data in accordance to a gdpr request, is it best effort or do you have to strictly comply with the request?
ChatGPT
I am not a lawyer, but I can offer some general information about GDPR and data removal requests. The General Data Protection Regulation (GDPR) is a legal framework that sets guidelines for the collection, processing, and storage of personal data of individuals within the European Union (EU). Under GDPR, individuals have the "right to be forgotten," also known as the "right to erasure." This means that they can request the deletion of their personal data under specific circumstances.<p>When a data controller (the organization that determines the purposes and means of processing personal data) receives a valid GDPR request to remove personal data, they are generally required to strictly comply with the request. However, there are some exceptions, and the extent of the obligation may depend on the specific circumstances surrounding the request.<p>The GDPR requires data controllers to respond to data removal requests without undue delay, and in any event within one month of receipt of the request. This period may be extended by two further months where necessary, taking into account the complexity and number of requests.<p>It's important to remember that GDPR compliance is not solely about "best effort." Organizations are expected to take reasonable and appropriate measures to ensure compliance with the regulation. This includes having processes in place to handle requests and promptly remove personal data where required. Failure to comply with GDPR can result in significant fines and penalties.<p>However, it's also essential to consider that there are circumstances under which a data controller may not be required to comply with a data removal request. 
These exceptions can include situations where the processing of personal data is necessary for exercising the right of freedom of expression and information, compliance with a legal obligation, or for reasons of public interest.<p>In any case, it's crucial to consult with legal counsel or a data protection officer to better understand your organization's specific obligations under GDPR and ensure proper compliance.
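The "one month, extendable by two further months" timeline from the answer above is easy to turn into dates. A rough sketch, assuming plain calendar-month arithmetic with end-of-month clamping (that counting rule is my assumption, not something stated in the answer; `add_months` and `response_deadlines` are made-up helper names):

```python
import calendar
from datetime import date

def add_months(d, months):
    """Add calendar months to a date, clamping to the last day of the month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def response_deadlines(received):
    """Return (standard deadline, extended deadline) for a request.

    Standard: one month after receipt.
    Extended: two further months, i.e. three months after receipt.
    """
    return add_months(received, 1), add_months(received, 3)
```

For example, `response_deadlines(date(2023, 1, 31))` gives 28 February 2023 as the standard deadline and 30 April 2023 as the extended one under this counting rule.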