Elsevier embeds a hash in the PDF metadata that is unique for each download (2022)

284 points by luu 12 months ago

27 comments

hcrean 12 months ago

Imagine working this hard to track down people sharing ideas that you didn't work to produce or fund in the first place, in order to punish them for not giving you a financial cut... Companies like this are just holding humanity back.

vsuperpower2020 12 months ago

It sounds like it's good practice to get two copies of a document from two different sources, and to compare their hashes before publishing them. You can embed data in anything, so this would include images, audio files, PDF files, or programs. At least for Elsevier it's pretty obvious they're using a key to track you.
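A minimal sketch of that comparison, assuming you already have two local copies of the same document from different sources. The helper names `sha256_of` and `copies_match` are made up for illustration. If the digests differ, at least one copy carries something unique, though the digest alone does not say where.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(first: Path, second: Path) -> bool:
    """True when both files have the same SHA-256 digest."""
    return sha256_of(first) == sha256_of(second)
```

Matching digests only show the two copies are identical to each other. Two downloads that share the same embedded mark would still match.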

squarefoot 12 months ago

DangerZone is a FOSS tool for cleaning potential malware out of .pdf files, and it can also delete metadata. It's probably overkill for simple metadata cleaning, but worth mentioning nonetheless. https://dangerzone.rocks/

kkfx 12 months ago

Nothing strange. I have a small script to clean up PDFs in general (reducing their size as well), essentially
<pre><code>
# Convert to PostScript; with no output name, pdftops writes file.ps.
pdftops -paper A4 -expand -level3 file.pdf
# Convert back to PDF, downsampling images. This overwrites file.pdf.
ps2pdf14 -dEmbedAllFonts=true \
  -dUseFlateCompression=true \
  -dOptimize=true \
  -dProcessColorModel=/DeviceRGB \
  -r72 \
  -dDownsampleGrayImages=true \
  -dGrayImageResolution=150 \
  -dAutoFilterGrayImages=false \
  -dGrayImageDownsampleType=/Bicubic \
  -dDownsampleMonoImages=true \
  -dMonoImageResolution=150 \
  -dMonoImageDownsampleType=/Subsample \
  -dDownsampleColorImages=true \
  -dColorImageResolution=150 \
  -dAutoFilterColorImages=false \
  -dColorImageDownsampleType=/Bicubic \
  -dPDFSETTINGS=/ebook \
  -dNOSAFER \
  -dALLOWPSTRANSPARENCY \
  -dShowAnnots=false \
  file.ps file.pdf
</code></pre>
That's it. Afterwards, if needed, we can add extra metadata. It isn't specifically designed to remove any particular kind of tracking, but it's simple and useful enough in most cases.

hristov 12 months ago

1. Download the PDF from Elsevier.
2. Open it in your own PDF viewer.
3. Press print.
4. In the printer selection box, select "Print to PDF".

jbn 12 months ago

Leaving these here: https://github.com/kanzure/pdfparanoia and https://github.com/firstlookmedia/pdf-redact-tools as they are relevant to this topic.

henriquenunez 12 months ago

We need to create a tool to allow the sharing and unlocking of humanity’s knowledge

ur-whale 12 months ago

The fact that Elsevier and other similar entities are allowed to continue to exist, i.e. that people are actually willing to give these parasites money, needs to be analyzed so a root cause can be found and the dreaded thing can be put to pasture for good.

Leads:

- academics are ultimately lazy and do not care to fix the system
- academics are so self-engrossed with their research that, like much of what they do that isn't directly pertaining to their work, the quality of what they do is horrible.
- there exists a system of incentives that feeds something back to people helping perpetuate such a parasitic system.

Other ideas?

nonrandomstring 12 months ago

Metadata is very low-hanging fruit for document watermarking. Typically the PDF renderer will use spacing, kerning, invisible characters, and all sorts of steganography to make each copy unique. What would be the point of a hash? More likely the hash is a MAC that's been salted with some secret plus the unique copy. That would help the publisher identify a laundered copy. With two or more copies it's possible to re-anonymise. That's actually something I wonder whether summarising language models would be good at. Of course they may also make steganographic alterations to diagrams.

Because PDFs are such dirty documents I almost always convert them to plain text, usually with no loss of semantics.
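To make the MAC speculation concrete, here is a rough sketch of how a publisher could derive a per-download tag from a secret key. Everything in it is an assumption for illustration: the field names (`account_id`, `document_id`, `timestamp`), the functions `download_tag` and `tag_matches`, and the choice of HMAC-SHA256. Nothing in this thread establishes what Elsevier actually does.

```python
import hashlib
import hmac


def download_tag(secret_key: bytes, account_id: str,
                 document_id: str, timestamp: str) -> str:
    """Derive a hex tag that is unique per (account, document, time).

    Hypothetical scheme: without secret_key, the tag cannot be forged
    or linked back to its inputs by guessing and recomputing.
    """
    message = "|".join((account_id, document_id, timestamp)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def tag_matches(secret_key: bytes, account_id: str, document_id: str,
                timestamp: str, found_tag: str) -> bool:
    """Check whether a tag found in a leaked copy belongs to this download."""
    expected = download_tag(secret_key, account_id, document_id, timestamp)
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(expected, found_tag)
```

Under this assumed scheme the publisher keeps a log of downloads and recomputes tags to find the one that matches a leaked copy. Anyone else sees only an opaque 64-character hex string.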

Archelaos 12 months ago

Is this practice legal in the EU?

stefek99 12 months ago

SOP = standard operating procedure.

I would be surprised if it was otherwise.

1vuio0pswjnm7 12 months ago

Previous discussion:

Elsevier embeds a hash in the PDF metadata that is unique for each download (twitter.com/json_dirs), 343 points by sohkamyung on Jan 26, 2022

https://news.ycombinator.com/item?id=30082138

sam_goody 12 months ago

At some point long ago, an online music shop decided to embed the credit card details of the buyer into the MP3 metadata.

If the original user kept the song to himself, "no harm done". If not, his credit details would be all over Napster and eMule.

I don't remember the details now, as that was way back when the 'Net was the Wild West, but it was all the rage on IRC....

beryilma 12 months ago

You should see what ISO and ANSI do when you try to get a PDF of one of their standards. PDF with DRM, installing a DRM application on your computer that some claim cannot be uninstalled from your machine. You are also allowed to keep only a SINGLE copy of the PDF and only a single printout. It's just crazy.

BSDobelix 12 months ago

Proudly licensed from Red Star OS ;)

vendiddy 12 months ago

Maybe a quick website that allowed folks to upload a PDF and get back an anonymous PDF would be great. That way folks don't have to be proficient with the command line!

hobofan 12 months ago

[2022]

bayindirh 12 months ago

It's good for them. Their blindness makes their grave digging process more efficient.

ptman 12 months ago

Watermarking is much preferable to DRM

wanderingmind 12 months ago

A noob question. Do we need so many steps, or can the hash simply be removed by printing the PDF to another PDF and sharing the printed PDF? Granted, the quality might suffer and the size might increase, but if that works it looks like a simple option.

If that doesn't work, is there any way to integrate all these steps and make it possible through pandoc?

huhtenberg 12 months ago

With all possible steganographic and watermarking options this looks ... I don't know ... eye-poppingly dumb?

3lit3krew 12 months ago

Doesn't everybody publish their articles online now too?

robblbobbl 12 months ago

RIP ripped ebooks

bastard_op 12 months ago

Yet another way to weaponize their paywall on knowledge and their continued existence as nothing more than a greedy middle-man. Sad that more organizations and academics refuse to buck the system and continue to support Elsevier as a gatekeeper.

Hopefully zlib and other public knowledge repositories will add a stripper for this PDF metadata to their upload process, so it happens automatically.

barfbagginus 12 months ago

Elsevier must die, and we must spare no effort to kill it. Work to kill Elsevier.

- Use Library Genesis
- Use the QubesOS PDF sanitizer or similar
- Integrate a sanitizer into the Zotero ref manager
- Contribute to libgen software projects
- Use Sci-Hub
- Tell your peeps, tell your peers, tell your org
- Sabotage peeps, peers and orgs that won't help

henriquenunez 12 months ago

I knew it!

BrandoElFollito 12 months ago

Elsevier is not to blame; this is fair business. It is as if you were blaming homeopathy companies for selling overpriced sugar and chalk.

The blame is on Academia, which decided to build itself on paying a company to send it knowledge, paying a company to get knowledge from it, providing that company with competent people for free, and making its decisions dependent on all of this.

This is whining about their pain while whipping themselves at the same time.

This system was put in place in ancient times and has changed in the meantime; those most directly concerned somehow did not notice.

Note: I used to be in Academia and left after my PhD, among other reasons because of these practices and medieval organizations. I loved the time I spent with brilliant people and the hours of teaching.
