Utilities to operate on lots of PDF files

pdfdir

Turns a directory tree of PDFs into a single bookmarked PDF. Automatically handles the table of contents.

Tested on Linux and Mac.

Usage

If you arrange your PDF files in folders like this:

book/01-Table of Contents.pdf
book/02-First Generation/01-Mary Cunningham.pdf
book/02-First Generation/02-Peter Cunningham.pdf
book/02-First Generation/02-:more-notes.pdf
book/03-Second Generation/01-John Mendell Cunningham.pdf
book/99-Index.pdf

and run:

$ pdfdir-join book

you will find the result in "book.pdf".
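Here is a minimal Ruby sketch of the joining step. It assumes the files are visited depth-first with each directory's entries sorted by name, and that Ghostscript's pdfwrite device does the concatenation. `ordered_pdfs` and `join_command` are hypothetical names, not pdfdir's code, and the sketch leaves out bookmarks.

```ruby
# Hypothetical sketch: list PDFs depth-first, sorting each directory's
# entries by name so the numeric prefixes decide the page order.
def ordered_pdfs(dir)
  Dir.children(dir).sort.flat_map do |name|
    path = File.join(dir, name)
    if File.directory?(path)
      ordered_pdfs(path)
    elsif name.downcase.end_with?(".pdf")
      [path]
    else
      [] # ignore non-PDF files
    end
  end
end

# Assumed approach: Ghostscript's pdfwrite device concatenates the inputs
# into "<dir>.pdf". No bookmarks are added here.
def join_command(dir)
  output = "#{dir.chomp('/')}.pdf"
  ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
   "-sOutputFile=#{output}", *ordered_pdfs(dir)]
end
```

Running `system(*join_command("book"))` would then write book.pdf alongside the book directory.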

The PDF's table of contents will be automatically generated from the filenames:

Table of Contents
First Generation
  Mary Cunningham
  Peter Cunningham
Second Generation
  John Mendell Cunningham
Index

The 01-, 02- prefixes determine the order of the chapters in the final book and don't appear in the bookmarks.
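The rule can be sketched in a few lines of Ruby. This assumes the tool sorts entries by their raw names, then strips a leading run of digits plus a hyphen (and the .pdf extension) to get the bookmark title. `bookmark_title` and `ordered_titles` are hypothetical helpers, not pdfdir's API.

```ruby
# Hypothetical helper: drop a leading numeric prefix such as "01-"
# and a ".pdf" extension to get the bookmark title.
def bookmark_title(name)
  File.basename(name, ".pdf").sub(/\A\d+-/, "")
end

# Sort by the raw names (prefixes included), then strip the prefixes.
def ordered_titles(names)
  names.sort.map { |name| bookmark_title(name) }
end
```

If the sort really is a plain string sort, as assumed here, 10- would come before 9-, so zero-padded prefixes (09-, 10-) are the safe choice.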

To keep a file out of the TOC, start its name with a : right after the numeric prefix (02-:more-notes.pdf above). Its pages still go into the book; only the bookmark is suppressed.
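The TOC above could come from a depth-first walk like this minimal Ruby sketch. It assumes directories become chapter bookmarks, files become entries nested under them, and any title that starts with : once the prefix is removed is skipped. `outline` is a hypothetical name, not pdfdir's code.

```ruby
# Hypothetical sketch: return [depth, title] pairs for the bookmarks.
# Directories become chapters; files inside them are nested one level deeper.
# Files whose title (after the numeric prefix) starts with ":" get no entry.
def outline(dir, depth = 0)
  Dir.children(dir).sort.flat_map do |name|
    path = File.join(dir, name)
    title = File.basename(name, ".pdf").sub(/\A\d+-/, "")
    if File.directory?(path)
      [[depth, title], *outline(path, depth + 1)]
    elsif title.start_with?(":") || !name.end_with?(".pdf")
      []
    else
      [[depth, title]]
    end
  end
end
```

For the example tree, `outline("book").each { |depth, title| puts "#{'  ' * depth}#{title}" }` prints the indented listing shown earlier. Turning these pairs into real PDF bookmarks (for example with Ghostscript pdfmarks) is not shown.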

Prerequisites

macOS: brew install ghostscript
Linux (Debian/Ubuntu): apt-get install ghostscript

You also need Ruby. Hopefully this dependency is temporary.

Verify PDFs

This package also includes tools to help prepare the input files. To find corrupt PDFs, run:

$ pdfdir-verify book

It uses Ghostscript to process every page of every PDF file, which is very slow. Pass --quick for a 10x speedup, at the risk of missing some obscure kinds of corruption.
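This minimal Ruby sketch shows one way to run this kind of check, assuming one Ghostscript run per file. `-dNODISPLAY` interprets every page without producing output, and `-dPDFSTOPONERROR` makes Ghostscript fail instead of quietly repairing a damaged file. The helper names are hypothetical, and the sketch does not model what --quick skips.

```ruby
# Hypothetical sketch, not pdfdir-verify's actual code.
# -dNODISPLAY: interpret every page with output discarded.
# -dPDFSTOPONERROR: exit with an error instead of repairing the file.
def verify_command(pdf)
  ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-dNODISPLAY", "-dPDFSTOPONERROR", pdf]
end

# Return the PDFs under dir that Ghostscript failed to process.
# The default runner uses Kernel#system, which returns true only on exit
# status 0 (nil if gs could not be started, which also counts as a failure).
def corrupt_pdfs(dir, runner: ->(cmd) { system(*cmd, out: File::NULL, err: File::NULL) })
  Dir.glob(File.join(dir, "**", "*.pdf")).sort.reject do |pdf|
    runner.call(verify_command(pdf))
  end
end
```

`corrupt_pdfs("book")` returns an empty array when every file passes.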

Re-encode PDFs

If you're having trouble with encrypted or corrupt PDFs, try using pdfdir-copy to duplicate your entire directory tree. It takes a while, but because it re-encodes each PDF, the result is sure to be valid.

$ pdfdir-copy book /tmp/book-fixed
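Here is a minimal Ruby sketch of the same idea. It assumes every PDF is rewritten through Ghostscript's pdfwrite device into a mirrored path under the destination. This is not pdfdir-copy's actual code: it only handles .pdf files, and `copy_plan`, `reencode_command` and `copy_tree` are hypothetical names.

```ruby
require "fileutils"

# Pair each source PDF with its mirrored path under dest.
def copy_plan(src, dest)
  prefix = File.join(src, "") # "book" -> "book/"
  Dir.glob(File.join(src, "**", "*.pdf")).sort.map do |pdf|
    [pdf, File.join(dest, pdf.delete_prefix(prefix))]
  end
end

# pdfwrite parses the input and writes a brand-new PDF;
# -o sets the output file and implies -dBATCH -dNOPAUSE.
def reencode_command(input, output)
  ["gs", "-q", "-sDEVICE=pdfwrite", "-o", output, input]
end

# Re-encode every PDF and return the inputs Ghostscript failed on.
def copy_tree(src, dest, runner: ->(cmd) { system(*cmd) })
  copy_plan(src, dest).reject do |input, output|
    FileUtils.mkdir_p(File.dirname(output))
    runner.call(reencode_command(input, output))
  end.map(&:first)
end
```

`copy_tree("book", "/tmp/book-fixed")` returns an empty array when every file was re-encoded.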