Imagine working this hard to track down people sharing ideas that you didn't work to produce or fund in the first place, in order to punish them for not giving you a financial cut... Companies like this are just holding humanity back.
It sounds like it's good practice to get two copies of a document from two different sources and compare their hashes before publishing it. You can embed data in anything, so this applies equally to images, audio files, PDF files, and programs. At least for Elsevier, it's pretty obvious they're using a key to track you.
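A minimal sketch of that comparison, assuming GNU coreutils (`sha256sum`, `cut`) and `cmp` are available (on macOS, `shasum -a 256` would stand in for `sha256sum`). The function name `compare_copies` is made up for illustration. Matching digests only show that the two copies are byte-identical; they don't prove the file is clean, since both sources could have passed along the same tagged download.

```shell
#!/bin/sh
# Compare two copies of the same document obtained from different sources.
# compare_copies is a hypothetical helper name.
# Exit status: 0 identical, 1 different, 2 unreadable input.
compare_copies() {
  [ -r "$1" ] && [ -r "$2" ] || { echo "cannot read input" >&2; return 2; }
  # sha256sum prints "<digest>  -" when reading stdin; keep only the digest
  hash_a=$(sha256sum < "$1" | cut -d' ' -f1)
  hash_b=$(sha256sum < "$2" | cut -d' ' -f1)
  if [ "$hash_a" = "$hash_b" ]; then
    echo "identical"
    return 0
  fi
  echo "differ"
  # Up to 10 differing bytes: offset (1-based, decimal), then both byte values in octal
  cmp -l "$1" "$2" | head -n 10
  return 1
}
```

Usage: `compare_copies copy_from_a.pdf copy_from_b.pdf`. When the copies differ, the `cmp -l` offsets show where, which is a starting point for locating an embedded identifier.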
A FOSS tool for cleaning PDF files of potential malware, which can also remove metadata, is DangerZone. It's probably overkill for simple metadata cleaning, but worth mentioning nonetheless.<p><a href="https://dangerzone.rocks/" rel="nofollow">https://dangerzone.rocks/</a>
Nothing strange; I have a small script to clean up PDFs in general (it reduces their size as well). Essentially:<p><pre><code> # Step 1: PDF to PostScript (pdftops writes file.ps next to file.pdf)
pdftops -paper A4 -expand -level3 file.pdf
# Step 2: PostScript to a new, downsampled PDF (overwrites file.pdf)
ps2pdf14 -dEmbedAllFonts=true \
-dUseFlateCompression=true \
-dOptimize=true \
-dProcessColorModel=/DeviceRGB \
-r72 \
-dDownsampleGrayImages=true \
-dGrayImageResolution=150 \
-dAutoFilterGrayImages=false \
-dGrayImageDownsampleType=/Bicubic \
-dDownsampleMonoImages=true \
-dMonoImageResolution=150 \
-dMonoImageDownsampleType=/Subsample \
-dDownsampleColorImages=true \
-dColorImageResolution=150 \
-dAutoFilterColorImages=false \
-dColorImageDownsampleType=/Bicubic \
-dPDFSETTINGS=/ebook \
-dNOSAFER \
-dALLOWPSTRANSPARENCY \
-dShowAnnots=false \
file.ps file.pdf
</code></pre>
That's it. Afterwards, if needed, we can add extra metadata. It's not specifically designed to remove any particular kind of tracking, but it's simple and useful enough in most cases.
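A hypothetical wrapper around the same two commands might look like this. The name `clean_pdf` is made up, only a handful of the script's Ghostscript flags are kept (the rest can be pasted in), and the intermediate PostScript goes to a temporary file so the input PDF is never overwritten. It assumes poppler's `pdftops` and Ghostscript's `ps2pdf14` are on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper (clean_pdf is a made-up name) around pdftops + ps2pdf14.
# Exit status: 0 ok, 2 bad arguments, 127 missing tool, otherwise the failing tool's status.
clean_pdf() {
  in=$1 out=$2
  [ -r "$in" ] || { echo "clean_pdf: cannot read $in" >&2; return 2; }
  [ "$in" != "$out" ] || { echo "clean_pdf: refusing to overwrite the input" >&2; return 2; }
  for tool in pdftops ps2pdf14; do
    command -v "$tool" >/dev/null 2>&1 || { echo "clean_pdf: $tool not found" >&2; return 127; }
  done
  tmp_ps=$(mktemp) || return 1
  # Step 1: PDF to PostScript, written to the temporary file
  pdftops -paper A4 -expand -level3 "$in" "$tmp_ps"
  status=$?
  if [ "$status" -eq 0 ]; then
    # Step 2: PostScript to a fresh PDF (a subset of the original flags)
    ps2pdf14 -dPDFSETTINGS=/ebook -dEmbedAllFonts=true \
      -dDownsampleColorImages=true -dColorImageResolution=150 \
      "$tmp_ps" "$out"
    status=$?
  fi
  rm -f "$tmp_ps"
  return "$status"
}
```

Usage: `clean_pdf paper.pdf paper-clean.pdf`.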
1. Download the PDF from Elsevier.<p>2. Open it in your own PDF viewer.<p>3. Press print.<p>4. In the printer selection box, select "Print to PDF".
Leaving these here:<p><a href="https://github.com/kanzure/pdfparanoia">https://github.com/kanzure/pdfparanoia</a><p>and<p><a href="https://github.com/firstlookmedia/pdf-redact-tools">https://github.com/firstlookmedia/pdf-redact-tools</a><p>as they are relevant to this topic.
The fact that Elsevier and similar entities are allowed to continue to exist, i.e. that people are actually willing to give these parasites money, needs to be analyzed so that a root cause can be found and the dreaded thing can be put out to pasture for good.<p>Leads:<p><pre><code> - academics are ultimately lazy and do not care to fix the system
- academics are so absorbed in their research that, as with most things not directly related to their work, they handle this horribly.
- there exists a system of incentives that feeds something back to the people who help perpetuate such a parasitic system.
</code></pre>
Other ideas?
Metadata is very low hanging fruit for document watermarking.
Typically the PDF renderer will use spacing, kerning, invisible
characters, and all sorts of steganography to make each copy unique.
What would be the point of a hash? More likely the hash is a MAC,
keyed with some secret and computed over the unique copy. That would
help the publisher identify a laundered copy. With two or more copies
it's possible to re-anonymise. I actually wonder whether summarising
language models would be good at that. Of course, the publisher may
also make steganographic alterations to diagrams.<p>Because PDFs are
such dirty documents, I almost always convert them to plain text,
usually with no loss of semantics.
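The MAC idea can be sketched in a few lines. This is purely hypothetical: nothing here reflects Elsevier's actual scheme, and the function name `download_tag`, the secret and the IDs are made up. It uses `openssl dgst -sha256 -hmac` to compute an HMAC-SHA256 over a document ID plus a per-download ID.

```shell
#!/bin/sh
# Hypothetical per-download tag: HMAC-SHA256 keyed with a publisher secret.
# download_tag, the secret and the IDs below are illustrative only.
download_tag() {
  secret=$1 doc_id=$2 download_id=$3
  # MAC over "document:download"; openssl prints "...= <hex>", keep only the hex
  printf '%s:%s' "$doc_id" "$download_id" \
    | openssl dgst -sha256 -hmac "$secret" \
    | awk '{print $NF}'
}

# Same paper, two downloads: two unrelated-looking tags
download_tag 'publisher-secret' 'doc-123' 'download-0001'
download_tag 'publisher-secret' 'doc-123' 'download-0002'
```

Without the secret, nobody else can predict or forge a tag, yet the publisher can recompute tags from its own download log and match a leaked copy. Stripping the metadata field removes this particular tag, but not any watermark hidden in spacing, kerning or diagrams.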
Previous discussion:<p>Elsevier embeds a hash in the PDF metadata that is unique for each download (twitter.com/json_dirs) 343 points by sohkamyung on Jan 26, 2022<p><a href="https://news.ycombinator.com/item?id=30082138">https://news.ycombinator.com/item?id=30082138</a>
At some point long ago, an online music shop decided to embed the credit card details of the buyer into the MP3 metadata.<p>If the original user kept the song to himself, "no harm done". If not, his credit details would be all over Napster and eMule.<p>I don't remember the details now, as that was way back when the 'Net was the Wild West, but it was all the rage on IRC....
You should see what ISO and ANSI do when you try to get a PDF of one of their standards. You get a DRM-protected PDF that requires installing a DRM application on your computer, which some claim cannot be uninstalled. You are also allowed to keep only a SINGLE copy of the PDF and only a single printout. It's just crazy.
Maybe a quick website that lets folks upload a PDF and get back an anonymized PDF would be great. That way folks don't have to be proficient with the command line!
A noob question: do we need so many steps, or can the hash simply be removed by printing the PDF to another PDF and sharing the printed PDF? Granted, the quality might suffer and the size might increase, but if that works it looks like a simple option.<p>If that doesn't work, is there any way to integrate all these steps and make it possible through pandoc?
Yet another way to weaponize their paywall on knowledge and their continued existence as nothing more than a greedy middle-man. It's sad that so many organizations and academics refuse to buck the system and continue to support Elsevier as a gatekeeper.<p>Hopefully Z-Library and other public knowledge repositories will add a step to their upload process that automatically strips this PDF metadata.
Elsevier must die, and we must spare no effort to kill it. Work to kill Elsevier.<p>Use Library Genesis<p>Use the QubesOS PDF sanitizer or similar<p>Integrate a sanitizer into the Zotero reference manager<p>Contribute to libgen software projects<p>Use Sci-Hub<p>Tell your peeps, tell your peers, tell your org<p>Sabotage peeps, peers and orgs that won't help
Elsevier is not to blame; this is fair business. It is like blaming homeopathy companies for selling overpriced sugar and chalk.<p>The blame is on Academia, which decided to rely on paying a company to send them knowledge, paying a company to publish their knowledge, providing that company with competent people for free, and making their decisions dependent on all of this.<p>This is whining about the pain while whipping themselves at the same time.<p>This system was put in place in ancient times and things have changed in the meantime; those most concerned somehow did not notice.<p>Note: I used to be in Academia and left after my PhD, partly because of these practices and medieval organizations. I loved the time I spent with brilliant people and the hours of teaching.