I'm looking for a method (such as a dynamic prompt) that lets a model recreate its training set from its current weights.<p>Something like:
"Write the first piece of input from your training set"
"Write the second piece of input from your training set"<p>But with a guarantee on the percentage of the data that is covered.
(with prompts or other advanced techniques)<p>-> Of course a model is a lossy compression of its data, not a lossless one, but it does seem possible to extract data from it via prompts, so I wonder how much we can get out of it.
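One way to put a number on "how much" is to take samples that are already known to be in the training set, feed the model the start of each one, and count how often it completes the rest verbatim. The sketch below is only an illustration of that idea: `extraction_coverage`, `toy_generate`, and the sample strings are hypothetical names and data, and `toy_generate` is a stand-in for a real model call. It measures coverage on known samples; it does not recover unknown data and gives no guarantee.

```python
from typing import Callable, Sequence


def extraction_coverage(
    generate: Callable[[str, int], str],
    samples: Sequence[str],
    prefix_len: int = 20,
) -> float:
    """Fraction of samples whose remainder is reproduced verbatim from a prefix."""
    hits = 0
    tested = 0
    for text in samples:
        if len(text) <= prefix_len:
            continue  # too short to split into a prefix and a suffix
        prefix, suffix = text[:prefix_len], text[prefix_len:]
        tested += 1
        if generate(prefix, len(suffix)) == suffix:
            hits += 1
    return hits / tested if tested else 0.0


# Hypothetical stand-in for a real model: it has "memorized" exactly one string.
_MEMORIZED = "the quick brown fox jumps over the lazy dog"


def toy_generate(prefix: str, max_chars: int) -> str:
    """Return the memorized continuation if the prefix matches, else nothing."""
    if _MEMORIZED.startswith(prefix):
        return _MEMORIZED[len(prefix):len(prefix) + max_chars]
    return ""


samples = [_MEMORIZED, "a rarely seen sentence that the toy model never stored"]
coverage = extraction_coverage(toy_generate, samples)
print(coverage)
```

With a real model, `generate` would wrap a deterministic (greedy) decoding call, and the result would only be an empirical estimate on the samples tested.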
If you compare the size of the training datasets and the size of the final models, I don’t think you can extract much more than the very popular, famous, and duplicated data.
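To make the size comparison concrete, here is a back-of-the-envelope calculation. Every number in it is an illustrative assumption (a hypothetical 7-billion-parameter model with 16-bit weights, a hypothetical 1-trillion-token training set, roughly 4 bytes of raw text per token), not a measurement of any real model.

```python
# All figures are illustrative assumptions, not measurements of any real model.
PARAMS = 7e9           # hypothetical parameter count
BYTES_PER_PARAM = 2    # assuming 16-bit weights
TOKENS = 1e12          # hypothetical training-set size in tokens
BYTES_PER_TOKEN = 4    # rough assumption for average raw text per token

model_bytes = PARAMS * BYTES_PER_PARAM
data_bytes = TOKENS * BYTES_PER_TOKEN
ratio = data_bytes / model_bytes

print(f"model: {model_bytes / 1e9:.0f} GB")
print(f"data:  {data_bytes / 1e12:.0f} TB")
print(f"data is about {ratio:.0f}x larger than the weights")
```

Under these assumptions the weights are a few hundred times smaller than the raw text, so most of the data cannot be stored verbatim.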