Borb – A Python library to read, write, and edit PDF files

262 points by dsr12 over 3 years ago

12 comments

Syzygies over 3 years ago
I am a math professor with a scanned exam grading workflow that I hacked together as Bash scripts using various open source command line tools. I feed all the exams through a sheet-fed scanner, decode bar codes to identify problems and students, add radio buttons for entering and tracking scores (0-6 per problem), and create PDF "books" per problem for grading and annotating.

Having grad students help grade paper is a consistency nightmare: it's look once, never look back. Instead, after each of several provisional passes I recreate the PDF "book" for that problem, with a chapter for each score and students randomized within each chapter. In the same spirit as "checking your work lets you work three times faster", this is actually both more consistent and faster than a single pass over paper. Almost all of my attention is on the math, which I'm good at, rather than on locating problems and finding again the ones I know I misgraded, which I'm not good at.

Then each student's exam needs to be extracted from these problem PDFs, scores recorded, and annotations frozen.

There are cloud services for grading. They're hopelessly primitive, with cloud lag. Like a gamer, I used to reject wireless mice because of the lag. I reject these services. With the right local tools, I can grade everything myself faster than I could with a team of grad students.

The PDF format is a morass. My hat's off to anyone who will work with it. There are many evolutionary layers and no formal specification or verification; one tests a PDF by seeing if most programs accept it.

It's time for me to rewrite my grading system in a modern scripting language so that others can use it. I prefer Ruby, but that's mainly to stave off boredom when I'm not using Haskell. I can use Python. This would permit a more robust workflow, such as adding late exams in mid-grading without losing grading in progress.

I can't find documentation for Borb to check off the list of features I'd need. Since this is a one-person project, I suspect I might need to continue patching together external tools.
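The reordering step described above (one chapter per score, students shuffled within each chapter) fits in a few lines of Python. This is a minimal sketch under assumptions that are not in the comment: scores for one problem live in a dict mapping a hypothetical student ID to a provisional score 0-6, or `None` for an exam not yet graded (such as a late exam added mid-grading), and ungraded exams get their own first chapter. `build_problem_book` is a hypothetical helper; it only computes the page order, and assembling the actual PDF would still need a PDF library or the existing command-line tools.

```python
import random
from collections import defaultdict
from typing import Optional

MAX_SCORE = 6  # scores run 0-6 per problem, as in the comment


def build_problem_book(
    scores: dict[str, Optional[int]], seed: int = 0
) -> list[tuple[str, list[str]]]:
    """Return the chapters of one problem's "book" as (label, student IDs).

    Hypothetical helper: one chapter per provisional score, students shuffled
    within each chapter, and exams with no score yet (None) in a first
    "ungraded" chapter.
    """
    rng = random.Random(seed)  # seeded so a rebuilt book is reproducible
    chapters: dict[Optional[int], list[str]] = defaultdict(list)
    for student, score in scores.items():
        if score is not None and not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score out of range for {student}: {score}")
        chapters[score].append(student)

    order: list[Optional[int]] = [None] + list(range(MAX_SCORE + 1))
    book = []
    for score in order:
        if score not in chapters:
            continue  # skip empty chapters
        students = sorted(chapters[score])  # fixed base order, then shuffle
        rng.shuffle(students)
        label = "ungraded" if score is None else f"score {score}"
        book.append((label, students))
    return book
```

Rerunning it after each pass with the updated scores regroups the exams. A late exam is just a new key with the value `None`, so scores already entered for everyone else are untouched.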
spapas82 over 3 years ago
Haven't tested this lib, but be careful before including it in your project because of its license: it is dual-licensed AGPL/commercial. This means that you can use it only if your project is GPL, or else you need a commercial license.

On the other hand, the ReportLab PDF generation library (which is what I actually use) offers a permissive license for its open source version (and there is a commercial ReportLab PLUS version), so it can be included in all kinds of projects.
wodenokoto over 3 years ago
Some books start counting pages after the table of contents, so the PDF page number and the book page number are not in sync.

I’ve seen some PDFs have the first few pages numbered in Roman numerals and then “normal” numbers for the main content.

How do you edit an existing PDF to do that?
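In PDF this is handled by page labels: the document catalog can carry a `/PageLabels` number tree that maps a starting page index to a numbering style (`/S`, e.g. `/r` for lowercase Roman, `/D` for decimal), an optional prefix (`/P`) and a starting value (`/St`). Viewers show those labels instead of raw page numbers, so the page content itself does not change. The sketch below uses hypothetical helpers, not any particular library's API. It computes the label each page would get and renders the `/PageLabels` dictionary as PDF object text. It handles only the decimal and Roman styles and assumes prefixes contain no characters that need escaping. Writing the entry into an existing file's catalog would still need a low-level PDF library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelRange:
    first_page: int               # 0-based index of the first page in this range
    style: Optional[str] = "D"    # "D" decimal, "r"/"R" Roman; None = prefix only
    prefix: str = ""
    start: int = 1                # numeric value of the first label in the range


_ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
          (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
          (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Convert a positive integer to uppercase Roman numerals."""
    out = []
    for value, numeral in _ROMAN:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def page_label(page_index: int, ranges: list[LabelRange]) -> str:
    """Label a viewer would show for a 0-based page index."""
    current = None
    for r in sorted(ranges, key=lambda r: r.first_page):
        if r.first_page <= page_index:
            current = r  # keep the last range that starts at or before the page
    if current is None:
        raise ValueError("page labels must define a range starting at page 0")
    n = current.start + (page_index - current.first_page)
    if current.style == "D":
        number = str(n)
    elif current.style == "R":
        number = to_roman(n)
    elif current.style == "r":
        number = to_roman(n).lower()
    elif current.style is None:
        number = ""
    else:
        raise ValueError(f"style not handled in this sketch: {current.style}")
    return current.prefix + number


def page_labels_entry(ranges: list[LabelRange]) -> str:
    """Render the /PageLabels number tree as PDF object syntax (flat /Nums form)."""
    parts = []
    for r in sorted(ranges, key=lambda r: r.first_page):
        entries = []
        if r.style is not None:
            entries.append(f"/S /{r.style}")
        if r.prefix:
            entries.append(f"/P ({r.prefix})")  # assumes no chars needing escapes
        if r.start != 1:
            entries.append(f"/St {r.start}")
        parts.append(f"{r.first_page} " + " ".join(["<<", *entries, ">>"]))
    return f"<< /Nums [{' '.join(parts)}] >>"
```

For a book with four front-matter pages, `[LabelRange(0, "r"), LabelRange(4, "D")]` gives the labels i, ii, iii, iv, 1, 2, … and the entry `<< /Nums [0 << /S /r >> 4 << /S /D >>] >>`.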
alephu5 over 3 years ago
Amazing, I've been yearning for something like this for years but have always been told it's impossible. Can't wait to try it.
rstuart4133 over 3 years ago
There are a few PDF Python libraries and open source programs out there, but as far as I can tell they all lack one feature: signing. If anyone knows of an open source toolkit or library that can sign, I'd be most appreciative.
jl6 over 3 years ago
Always good to see more open source tools for the PDF ecosystem.

I couldn’t see any support for PDF/A (the good version of PDF) in borb, though.
pixelmonkey over 3 years ago
The README in the GitHub repo for borb is a bit of a better explainer than this landing page (especially for Python programmers).

https://github.com/jorisschellekens/borb/blob/master/README.md
nickspain over 3 years ago
This is awesome! I've been looking for something that could be a link between PDFs and Instapaper[0] for a while. This looks like it'll be perfect to build such a tool with.

[0] https://www.instapaper.com
chrismorgan over 3 years ago
I have a PDF of a hymn book that I want to convert to 2-up so I can use it that way on my reMarkable, which only supports single-page display and not two-page spreads; but I *also* want all the internal hyperlinks (to and from a table of contents) to keep working. I haven’t found *any* software that seems capable of doing this (though I’ve only looked at FOSS; it wouldn’t surprise me if something from Adobe could do it). The closest I seem to have found is qpdf, which *might* be able to do it with some programming effort.

Is that sort of thing going to be in scope for this library’s editing capabilities? (“Editing PDFs” is such a broad, open-ended thing.)
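The geometry behind this request is simple even if the tooling is not. With 2-up, original page `p` lands on sheet `p // 2`, in the left or right half. Each link rectangle on that page, and each destination pointing at it, then only needs a horizontal shift by one page width. This is a minimal sketch under stated assumptions: all pages share the same size, there is no scaling, rotation or CropBox offset, and coordinates use PDF's usual lower-left origin. The `cover_alone` option, which puts the first page by itself on the right like a printed spread, is a hypothetical extra. `place`, `map_link_rect` and `map_destination` are hypothetical names, and actually rewriting the annotations and page contents (for example with qpdf, as the comment suggests) is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float


def place(page: int, cover_alone: bool = False) -> tuple[int, int]:
    """Return (sheet index, side) for a 0-based page; side 0 = left, 1 = right."""
    slot = page + 1 if cover_alone else page  # shift by one so page 0 sits on the right
    return slot // 2, slot % 2


def map_link_rect(page: int, rect: Rect, page_width: float,
                  cover_alone: bool = False) -> tuple[int, Rect]:
    """Move a link annotation's rectangle from its page onto the 2-up sheet."""
    sheet, side = place(page, cover_alone)
    dx = side * page_width  # right-hand pages are offset by one page width
    return sheet, Rect(rect.x0 + dx, rect.y0, rect.x1 + dx, rect.y1)


def map_destination(target_page: int, left: float, top: float, page_width: float,
                    cover_alone: bool = False) -> tuple[int, float, float]:
    """Retarget a link destination (page plus left/top position) to its sheet."""
    sheet, side = place(target_page, cover_alone)
    return sheet, left + side * page_width, top
```

If the sheets are then scaled to fit the reMarkable's screen, both the shifted x coordinates and the y coordinates need to be multiplied by the same scale factor.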
cycomanic over 3 years ago
So how does this compare to the Python bindings of MuPDF? IMO that is the most featureful module for manipulating PDFs in Python (I'm a bit baffled by all the comments saying that something like this didn't exist before).
einpoklum over 3 years ago
So, are there decent lower-level (e.g. C++ or even C) libraries for doing this, which this library wraps? Or does this actually do the nitty-gritty PDF innards itself?

As for myself, I've not had to automate work on PDFs, luckily; for manual manipulation and annotation I've found Xournal++ sort of useful (https://xournalpp.github.io/). Inkscape can also be used with some questionable PDFs.
sneak over 3 years ago
I was going to be upset if the project logo were not a fat birb. I was not disappointed. :)