Thanks to Stack Exchange, I have my PDF solution

Posted On March 15, 2021

A helpful user on Stack Exchange solved the problem from my previous post for me by creating a small bash script, which reads values from a CSV file and then replaces the matching values in the PDF with the new ones. Check it out:

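# Build one sed rule per FitH value in the last column of marks.csv, adding 50 to the integer part
awk '{print $NF}' marks.csv | awk -F. '{print "s/"$1"\\."$2"/"($1+50)"."$2"/g"}' >replace.sed
# Apply the rules to every PDF in the current directory, writing NAME_adjust.pdf
for f in *.[pP][dD][fF]; do
  [ -e "$f" ] || continue
  sed -f replace.sed "$f" >"${f%.*}_adjust.pdf"
done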

This script takes an input file, marks.csv, whose final column contains the FitH values from a PDF. These are the co-ordinates that position links and bookmarks; in the PDF format, a FitH destination stores the vertical position shown at the top of the window. The script adds 50 units to the integer part of each value, re-attaches the decimal part, and writes one substitution rule per value to replace.sed. Using the replace.sed file, the sed command scans the raw contents of each PDF for the old values (x) and replaces them with the new values (y), following the replacement syntax ‘s/x/y/g’. The result is a PDF with the correct link co-ordinates, which solves my issue of unaligned bookmarks produced by pdfroff!
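To see what ends up in replace.sed, here is a minimal sketch of the rule-generation step run on made-up data. The file names sample_marks.csv and sample_replace.sed and the row layout are assumptions for illustration only; all that matters is that the last whitespace-separated field is the FitH value. This version also escapes the decimal point so that sed matches it literally.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample rows: only the last whitespace-separated field is used.
printf '%s\n' 'Chapter1 1 FitH 742.25' 'Chapter2 3 FitH 310.5' >sample_marks.csv

# Take the last field, split it on the decimal point, add 50 to the integer part.
awk '{print $NF}' sample_marks.csv |
  awk -F. '{print "s/"$1"\\."$2"/"($1+50)"."$2"/g"}' >sample_replace.sed

cat sample_replace.sed
# s/742\.25/792.25/g
# s/310\.5/360.5/g

# Apply the rules to a line shaped like a PDF destination array.
printf '[5 0 R /FitH 742.25]\n' | sed -f sample_replace.sed
# [5 0 R /FitH 792.25]
```

Each line of the generated file is an ordinary sed substitution, so the whole file can be handed to sed with -f, as the script does.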

Compare the original PDF sample with the fixed PDF to see how effective the script is!

Original PDF: ‘bookmarks.pdf’

Fixed PDF: ‘bookmarks_adjust.pdf’

I may make some improvements to the script, such as having it scan the original PDF for the necessary FitH values itself, rather than requiring a .csv generated by jpdftweak. That should be easy to do, as I’ve experimented with regular expressions to extract those values before. I should also edit the script to let the user specify the name of the input .csv, if they already have one generated.
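A rough sketch of how those two improvements might look is below. The function names extract_fith, make_rules and adjust_pdf are hypothetical, and the sketch assumes the FitH values appear as plain text of the form "/FitH 123.45" in the raw PDF (which the sed approach already relies on). Rules are emitted from the largest value down, so that a value shifted by 50 is not shifted again by a later rule.

```shell
# Hypothetical helper: list the decimal FitH values found in a PDF's raw bytes.
extract_fith() {
  # -a treats the binary file as text; -o prints only the matched part.
  LC_ALL=C grep -ao '/FitH [0-9][0-9]*\.[0-9][0-9]*' "$1" | awk '{print $2}'
}

# Hypothetical helper: turn values on stdin into sed rules adding an offset
# (default 50). Values are de-duplicated and sorted in descending order.
make_rules() {
  LC_ALL=C sort -rnu |
    awk -F. -v off="${1:-50}" '{print "s/"$1"\\."$2"/"($1+off)"."$2"/g"}'
}

# Usage sketch: adjust_pdf input.pdf [marks.csv]
# With a .csv, values come from its last field; without one, from the PDF.
adjust_pdf() {
  pdf=$1
  csv=${2:-}
  rules=$(mktemp)
  if [ -n "$csv" ]; then
    awk '{print $NF}' "$csv" | make_rules >"$rules"
  else
    extract_fith "$pdf" | make_rules >"$rules"
  fi
  LC_ALL=C sed -f "$rules" "$pdf" >"${pdf%.*}_adjust.pdf"
  rm -f "$rules"
}
```

Calling adjust_pdf bookmarks.pdf would then write bookmarks_adjust.pdf without needing jpdftweak, while adjust_pdf bookmarks.pdf marks.csv would keep the current behaviour with a user-chosen .csv. The rules still match bare numbers, so a value that also occurs elsewhere in the file would be changed too.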
