anthropics / pdf

PDF Processing Guide

Process PDF files using Python libraries and command-line tools. Supported operations include reading, extracting text and tables, merging, splitting, rotating pages, adding watermarks, creating new PDFs, filling forms, encrypting and decrypting, extracting images, and performing OCR.

agent: claude-code · model: claude-sonnet-4-6 · snapshot: python312-uv · eval: programmatic · 8 steps · v1.0.0

The shape of the run

8 steps · start to finish.

  1. Environment Setup

    Install all Python dependencies and verify CLI tools are available.

    echo "=== Installing Python dependencies ==="
    pip install pypdf pdfplumber reportlab
    
    # Install optional dependencies based on operation
    OPERATION="${OPERATION:-extract-text}"
    if [[ "$OPERATION" == "ocr" ]]; then
      pip install pytesseract pdf2image
    fi
    if [[ "$OPERATION" == "extract-tables" ]]; then
      pip install pandas openpyxl
    fi
    
    echo "=== Checking CLI tools ==="
    command -v pdftotext >/dev/null 2>&1 && echo "pdftotext: OK" || echo "pdftotext: not found (install poppler-utils)"
    command -v qpdf      >/dev/null 2>&1 && echo "qpdf: OK"      || echo "qpdf: not found"
    command -v pdftk     >/dev/null 2>&1 && echo "pdftk: OK"     || echo "pdftk: not found (optional)"
    
    echo "=== Creating output directory ==="
    mkdir -p /app/results
    

    Verify Python imports succeed before proceeding:

    from pypdf import PdfReader, PdfWriter
    import pdfplumber
    from reportlab.lib.pagesizes import letter
    print("All core dependencies imported successfully")
    

  2. Validate Inputs

    Verify that the input PDF(s) exist and are readable before running any operation.
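
The check described in this step could be sketched as follows. This is a minimal, standard-library-only illustration, not the runbook's own code: the helper names `validate_pdf_input` and `validate_inputs` are hypothetical, and looking for a `%PDF-` header in the first 1024 bytes is an assumed heuristic. Opening the file with `pypdf.PdfReader` would be a stricter test.

```python
from pathlib import Path


def validate_pdf_input(path):
    """Return a list of problems found with one input path (empty list means OK)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p}: not found or not a regular file"]
    if p.stat().st_size == 0:
        return [f"{p}: file is empty"]
    try:
        with p.open("rb") as fh:
            head = fh.read(1024)
    except OSError as exc:
        return [f"{p}: cannot be read ({exc})"]
    # Assumption: a usable PDF carries its "%PDF-" header within the first KiB.
    if b"%PDF-" not in head:
        return [f"{p}: no %PDF- header in the first 1024 bytes"]
    return []


def validate_inputs(paths):
    """Collect the problems for every input path; an empty result means all passed."""
    problems = []
    for path in paths:
        problems.extend(validate_pdf_input(path))
    return problems
```

A caller would stop before Step 3 if `validate_inputs([...])` returns a non-empty list, and report the messages as they are.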

  3. Execute PDF Operation

    Choose the appropriate code block for the requested operation. Run the relevant section only.
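
The per-operation code blocks are not reproduced here. As a hedged illustration of "run the relevant section only", the sketch below maps an operation name to a single function. The `extract-text` name comes from the setup script's `OPERATION` default; the `merge` name, the function signatures, and the `select_operation` helper are assumptions made for this example. It relies on pypdf's `PdfReader`, `page.extract_text()`, and `PdfWriter.append`.

```python
from pathlib import Path


def extract_text(inputs, out_path):
    """Write the text of every page of the first input PDF to out_path."""
    from pypdf import PdfReader  # imported lazily so other operations do not need it

    reader = PdfReader(inputs[0])
    # extract_text() can yield an empty string for image-only pages.
    pages = [(page.extract_text() or "") for page in reader.pages]
    Path(out_path).write_text("\n".join(pages), encoding="utf-8")
    return out_path


def merge(inputs, out_path):
    """Concatenate all input PDFs, in order, into out_path."""
    from pypdf import PdfWriter

    writer = PdfWriter()
    for pdf in inputs:
        writer.append(pdf)
    with open(out_path, "wb") as fh:
        writer.write(fh)
    return out_path


# Hypothetical registry: only the selected entry is ever executed.
OPERATIONS = {"extract-text": extract_text, "merge": merge}


def select_operation(name):
    """Return the function for an operation name, or raise ValueError if unknown."""
    try:
        return OPERATIONS[name]
    except KeyError:
        known = ", ".join(sorted(OPERATIONS))
        raise ValueError(f"unsupported operation {name!r}; expected one of: {known}") from None
```

For example, `select_operation(os.environ.get("OPERATION", "extract-text"))(["input.pdf"], "/app/results/output.txt")` would run only the text extraction path; both file names here are placeholders.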

  4. Iterate on Errors (max 3 rounds)

    If Step 3 raised an exception or produced an empty/corrupt output file:
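
The recovery actions for this step are not listed here. Purely as a sketch of the three-round limit, the loop below tries up to three alternative strategies and stops at the first one that leaves a non-empty output file. It assumes that a "round" means one attempt with a different strategy (for example, a second extraction library); `run_with_fallbacks` and its parameters are hypothetical names.

```python
from pathlib import Path


def run_with_fallbacks(strategies, out_path, max_rounds=3):
    """Try zero-argument callables in order, at most max_rounds of them.

    Returns (succeeded, errors); errors holds one message per failed round.
    """
    errors = []
    out = Path(out_path)
    for round_no, strategy in enumerate(strategies[:max_rounds], start=1):
        try:
            strategy()
        except Exception as exc:  # record the failure and move to the next round
            errors.append(f"round {round_no}: {type(exc).__name__}: {exc}")
            continue
        if out.is_file() and out.stat().st_size > 0:
            return True, errors
        errors.append(f"round {round_no}: output missing or empty")
    return False, errors
```

The collected error messages can be carried forward into the summary and validation report, so a failed run still documents what was tried.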

  5. Validate Outputs

    Verify that all expected output files exist and are non-empty.
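
One possible shape for this check, using only the standard library, is shown below. The `check_outputs` helper is hypothetical; the `/app/results` default matches the directory created during environment setup, and the list of expected file names is left to the caller.

```python
from pathlib import Path


def check_outputs(expected, results_dir="/app/results"):
    """Map each expected file name to 'ok', 'missing', or 'empty'."""
    status = {}
    for name in expected:
        path = Path(results_dir) / name
        if not path.is_file():
            status[name] = "missing"
        elif path.stat().st_size == 0:
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status
```

Any value other than `"ok"` in the returned mapping would count as a failed validation.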

  6. Write Executive Summary

    Write `/app/results/summary.md` with a concise record of the run.
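
The required contents of the summary are not spelled out here, so the following is only an assumed layout: operation, completion time, inputs, outputs, status, and any errors. `write_summary` and its field names are illustrative, not part of the runbook.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_summary(operation, inputs, outputs, errors, results_dir="/app/results"):
    """Write summary.md into results_dir and return its path."""
    finished = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        "# PDF Processing Summary",
        "",
        f"- Operation: {operation}",
        f"- Finished (UTC): {finished}",
        f"- Inputs: {', '.join(inputs) or 'none'}",
        f"- Outputs: {', '.join(outputs) or 'none'}",
        f"- Status: {'failed' if errors else 'succeeded'}",
    ]
    if errors:
        # List each recorded error under its own heading.
        lines += ["", "## Errors", ""] + [f"- {err}" for err in errors]
    path = Path(results_dir) / "summary.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```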

  7. Write Validation Report

    Write `/app/results/validation_report.json`.
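
The report schema is not given here. As a sketch under that caveat, the helper below records one pass/fail entry per named check plus an overall flag; the keys `checks`, `name`, `passed`, and `overall_passed`, and the function name itself, are assumptions for illustration.

```python
import json
from pathlib import Path


def write_validation_report(checks, results_dir="/app/results"):
    """Write validation_report.json from a mapping of check name to True/False."""
    report = {
        "checks": [{"name": name, "passed": bool(ok)} for name, ok in checks.items()],
        "overall_passed": all(checks.values()),
    }
    path = Path(results_dir) / "validation_report.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2) + "\n", encoding="utf-8")
    return report
```

A call such as `write_validation_report({"inputs_valid": True, "outputs_non_empty": False})` would produce a report whose `overall_passed` is `false`; both check names are placeholders.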

  8. Final Checklist (MANDATORY — do not skip)

    echo "=== FINAL OUTPUT VERIFICATION ==="
    RESULTS_DIR="/app/results"