Diff-pdf: tool to visually compare two PDFs

589 points by Olshansky 11 months ago

27 comments

simonw 11 months ago
This inspired me to have Claude 3.5 Sonnet knock out a quick web page prototype for me, using PDF.js to load and render the PDFs to canvas elements and then display visual diffs between their pages.

Two prompts:

    Build a tool where I can drag and drop on two PDF files and it uses PDF.js to turn each of their pages into canvas elements and then displays those pages side by side with a third image that highlights any differences between them, if any differences exist

    rewrite that code to not use React at all

Here's the result: https://tools.simonwillison.net/compare-pdfs

It actually works quite well! Screenshot here: https://gist.github.com/simonw/9d7cbe02d448812f48070e7de13a5ae5?permalink_comment_id=5109044#gistcomment-5109044
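A minimal sketch of the per-page comparison step described above, written in Python rather than the JavaScript the prototype uses. It assumes each page has already been rendered to a grid of RGB pixels of identical size; `diff_pages`, `HIGHLIGHT`, and the `tolerance` parameter are hypothetical names for illustration and are not taken from the linked tool.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels, one grid per rendered page

HIGHLIGHT: Pixel = (255, 0, 0)  # assumed highlight colour for changed pixels


def diff_pages(a: Image, b: Image, tolerance: int = 0) -> Tuple[Image, int]:
    """Return (diff image, count of differing pixels) for two same-sized pages."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages must have identical dimensions")

    diff: Image = []
    changed = 0
    for row_a, row_b in zip(a, b):
        out_row: List[Pixel] = []
        for pa, pb in zip(row_a, row_b):
            # A pixel counts as changed if any channel differs by more than tolerance.
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                out_row.append(HIGHLIGHT)
                changed += 1
            else:
                # Fade unchanged pixels toward white so highlights stand out.
                out_row.append(((pa[0] + 255) // 2, (pa[1] + 255) // 2, (pa[2] + 255) // 2))
        diff.append(out_row)
    return diff, changed
```

A nonzero count for a page is the signal to show the third "differences" image; a small positive `tolerance` can absorb antialiasing noise between two renders.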
tomwheeler 11 months ago
In a previous job, I had to validate the output of an unreliable production publishing system, so I tested dozens of PDF comparison tools available at the time. The best I found was called Delta Walker. It was proprietary commercial Mac-only software, but reasonably inexpensive, accurate, and could handle long PDFs with lots of graphics well.

I remember evaluating this diff-pdf tool and finding that it fell short in some way, although it's been so long that I don't recall the specifics. Most of them failed to identify changes or reported false positives. I also remember being disappointed since this one was open source and could easily be scripted.
ydant 11 months ago
Related - this might be helpful to someone.

ImageMagick can do a visual PDF compare:

    magick compare -density "$DENSITY" -background white "$1[0]" "$2[0]" "$TMP"

(density = 100, $1 and $2 are the filenames to compare, $TMP the output file)

You need to do some work to support multiple pages, so I use this script:

https://gist.github.com/mbafford/7e6f3bef20fc220f68e467589bb6a8aa

This also uses `imgcat` to show the difference directly in the terminal.

You can also use ImageMagick to get a perceptual hash difference using something like:

    convert -metric phash "$1" null: "$2" -compose Difference -layers composite -format '%[fx:mean]\n' info:

Git can be configured to use custom diff tools, and I take advantage of this with the following in my .gitconfig:

    [diff "pdf"]
        command = ~/bin/git-diff-pdf

And in my .gitattributes I enable the above with:

    *.pdf binary diff=pdf

~/bin/git-diff-pdf does a diff of the output of `pdftotext -layout` (from poppler) and also runs pdf-compare-phash.

To use this custom diff with `git show`, you need to add an extra argument (`git show --ext-diff`), but it is used automatically when running `git diff`.
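A rough Python sketch of what a text-only `~/bin/git-diff-pdf` could look like. The commenter's actual script is not shown in the thread, so everything here is an assumption except the use of `pdftotext -layout`. Git calls an external diff command with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode); `pdf_text`, `text_diff`, and `main` are hypothetical helper names, and the perceptual-hash step is left out.

```python
import difflib
import subprocess
import sys
from typing import Callable, List


def pdf_text(path: str) -> str:
    """Extract layout-preserving text with poppler's pdftotext ("-" means stdout)."""
    if path == "/dev/null":  # git passes /dev/null for added or deleted files
        return ""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def text_diff(old_path: str, new_path: str,
              extract: Callable[[str], str] = pdf_text) -> List[str]:
    """Return unified-diff lines between the extracted text of two PDFs."""
    old_lines = extract(old_path).splitlines()
    new_lines = extract(new_path).splitlines()
    return list(difflib.unified_diff(
        old_lines, new_lines, fromfile=old_path, tofile=new_path, lineterm=""))


def main(argv: List[str]) -> int:
    # argv: prog path old-file old-hex old-mode new-file new-hex new-mode
    if len(argv) != 8:
        print("expected the seven arguments git passes to an external diff",
              file=sys.stderr)
        return 2
    _prog, _path, old_file, _old_hex, _old_mode, new_file, _new_hex, _new_mode = argv
    for line in text_diff(old_file, new_file):
        print(line)
    return 0  # a nonzero status would make git report the external diff as failed
```

Saved as an executable script, it would finish with `sys.exit(main(sys.argv))`. The `extract` parameter exists only so the diff logic can be exercised without poppler installed.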
thibaut_barrere 11 months ago
I have been using this in a CI pipeline to maintain a business-critical PDF generation (healthcare) app (started circa 2010 I think). Here are the RSpec helpers I'm using:

https://gist.github.com/thbar/d1ce2afef68bf6089aeae8d9ddc05ddf

The code contains git-stored reference PDFs, and the test suite re-generates them and asserts that nothing has changed.

Helped a lot to audit visual changes, or PDF library upgrades!
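A sketch of the same idea as a plain Python helper, not the RSpec helpers from the gist. It relies on diff-pdf's documented behavior of exiting with 0 when the two files match and 1 when they differ, and on its `--output-diff` option. The function name `pdfs_match`, the injectable `run` parameter, and treating any other exit status as an error are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List, Optional


def pdfs_match(reference: str, candidate: str,
               diff_output: Optional[str] = None,
               run: Callable[[List[str]], "subprocess.CompletedProcess"] = subprocess.run,
               ) -> bool:
    """Return True if diff-pdf reports no visual difference between two PDFs."""
    cmd = ["diff-pdf"]
    if diff_output is not None:
        # Ask diff-pdf to write a PDF that highlights the differences.
        cmd.append(f"--output-diff={diff_output}")
    cmd += [reference, candidate]

    result = run(cmd)
    if result.returncode not in (0, 1):
        # Assumption: anything other than 0 (same) or 1 (different) is a failure.
        raise RuntimeError(f"diff-pdf failed with exit status {result.returncode}")
    return result.returncode == 0
```

In a test suite this would be called with the git-stored reference PDF and the freshly generated one, for example `assert pdfs_match("spec/fixtures/invoice.pdf", "tmp/invoice.pdf", diff_output="tmp/invoice-diff.pdf")`, where the paths are placeholders.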
poidos 11 months ago
Reminds me of the tool Bob Nystrom wrote to help himself out when working on the physical edition of Crafting Interpreters: https://journal.stuffwithstuff.com/2020/04/05/crafting-crafting-interpreters/

Whole article is worth reading, but if you want the relevant bits search for “I wrote a Dart script that would take a PDF of the book”.
jaustin 11 months ago
We've been using this in the Micro:bit Educational Foundation (microbit.org) to fill a gap in hardware design tooling, and get visual diffs of our schematics and gerbers during PCB design iterations. It's kinda wild that's what we ended up doing, but if you want to be sure your radio layout didn't change at all when you're making a minor revision to a different part of the board, visual diffs are perfect.

That said, next project we want to try something more integrated with EDA tools. If anyone else has followed this path, we'd love to know.
mikeyinternews 11 months ago
You can do this with Beyond Compare (it's not free, but not very expensive either): https://www.scootersoftware.com/
smartmic 11 months ago
I like this tool better: https://www.qtrac.eu/diffpdf.html

It shows the differences in the GUI side by side instead of overlaid.
rawbert 11 months ago
We use this tool in our team regularly to compare PDFs we obtain from third-party services that might have changed after code changes on our side. Big thanks to the author <3
canistel 11 months ago
Interestingly, GitHub thinks the project is 46% shell, due to the fairly huge wxwin.m4.
deckar01 11 months ago
I wrote a pixel-based visual diffing algorithm long ago that was intended for a CI tool that finds all of the UI changes in a PR. I broke the layout of a page I didn’t even know existed as an intern at Inkling and have had this idea in my head ever since.

https://github.com/deckar01/narcis
crocal 11 months ago
I will just chime in to mention Draftable (https://www.draftable.com/compare). It really works well. It’s not so easy to have a visually comfortable diff of two PDFs.
ck_one 11 months ago
Can anyone recommend a method to deduplicate PDFs? The hash is often different but the content and metadata are 99.99% the same.
strangus 11 months ago
https://10052.ai has a tool that will visually compare documents (PDFs, docs, images, etc.) and cluster them together. It works amazingly well.
sva_ 11 months ago
Coincidentally I downloaded and tried using this just a while ago. I was trying to see if it can identify an Elsevier fingerprint between two PDFs. It can't; it only compares visible things.

I used vbindiff instead.
akasakahakada 11 months ago
Use this to compare editions 8 and 9 of a university textbook before buying.
redman25 11 months ago
I created a similar in-browser version a while back with Mozilla's PDF.js. The diff rendering is all run client side.

https://www.parepdf.com

The diff-pdf project was my inspiration but I wanted to create a version that was distributable to non-programmers.
TacticalCoder 11 months ago
This reminds me of a book author who posted here IIRC. He had a little tool allowing him to quickly compare two revisions of his book, for example to make sure that fixing typos didn't wreak havoc. I remember his tool would show in red what had changed on page thumbnails.
atum47 11 months ago
Back when I was writing my final paper I faced a similar issue: I needed to de-duplicate a bunch of PDFs, so I came up with a simple solution.

https://github.com/victorqribeiro/dtf
fwn 11 months ago
I really like the overlay view and that it is not cloud based. Will try to test it at work.

I rely heavily on PDF comparison via PDF-XChange Editor, which is accurate for text, but often has trouble highlighting visual changes correctly.
riedel 11 months ago
I always used DiffPDF, only to read on their website:

> in the view of the EU’s Cyber Resilience Act and an abundance of caution, we have withdrawn all our free software [1]

Good to see post-cyberresilience alternatives :)

PDF diffs are really great for versioning/comparing PCB designs. (The only real use case I had 15 yrs back)

[1] http://www.qtrac.eu/diffpdf-foss.html
mycall 10 months ago
Of course, Adobe Compare does this too.

https://www.adobe.com/acrobat/features/compare-pdfs.html
npack 11 months ago
https://onlinetextcompare.com/pdf lets you compare text between two PDF files locally within the browser.
jgalt212 11 months ago
Thanks. I'll give this a shot to see if any counterparties try to sneak in any last-second changes to the executable version of the doc.
asah 11 months ago
Crazy, I'd have thought that modern multi-modal LLMs can do this, but when I tried Gemini, ChatGPT-4o and Claude they all pooped out:

- Gemini at first only diff'd the text, and then when pushed it identified the items in the images and then hallucinated the differences between the versions. It could not produce an image output.

- Claude only diff'd the text and refused to believe that there were images in the PDFs.

- ChatGPT attempted to write and execute Python code for this, which errored out.
downboots 11 months ago
Maybe this could be used to generate PDFs using LaTeX and use the diff as a distance metric to optimize.
Levitating 11 months ago
No screenshots?