Fixing PDF bookmark locations

It’s been a while since I’ve attempted to code anything. After hours of fiddling with a PDF bookmark problem, I finally found a somewhat-working solution. Unfortunately, it’s not a one-stop-shop solution, because one step is still manual. Let me explain.

The issue

An efficient way to write documents is Pandoc, which lets you write in Markdown and then convert it to a variety of markup languages, such as LaTeX, HTML, or even (g/t)roff. As I’m not very adept at LaTeX or groff, I decided to try making a PDF from Markdown with pdfroff as my engine. The PDF rendered as expected, but I noticed that links and bookmarks were all positioned lower than they should be.

As I pointed out in a previous post, many PDFs can be viewed in a text editor, such as Vim, as long as they are uncompressed. It turns out that the PDFs from this toolchain aren’t compressed, so I was able to view the source code of the PDF directly. Playing around with the coordinates of the bookmarks, I deduced that moving them 38px North (up the page) is all I would need to position them properly. Strictly speaking, PDF coordinates are in points (1/72 inch) rather than pixels, but the number is what matters. So, I set off to figure out how to do that.

The solution

After many hours of Googling how to properly use the sed, awk, cut, and paste commands in the terminal, I finally got a solution I was happy with. The only step that I have to do manually is extract the bookmarks of the PDF in CSV format. The script then adds 38 to the third value in each bookmark’s coordinate field, overwrites the CSV file, and then I just have to merge the CSV back into the PDF. It actually worked! Here’s the source, since I don’t feel like actually linking a file at the moment:

#!/bin/sh
# Companion to the pandoc -> PDF workflow with pdfroff as the PDF engine.
# For some reason, it keeps screwing up the bookmarks, so this script is a fix.
# You MUST first extract the bookmarks as a CSV. I use jPDF Tweak to do this
# and also to put the bookmarks back.
incsv=$1
tmpcsv=$incsv.tmp
# Field 4 holds space-separated values; add 38 to the third and keep fields 1-3.
awk -F ';' -v OFS=';' '{ split($4, c, " "); print $1, $2, $3, c[1] " " c[2] " " (c[3] + 38) }' "$incsv" > "$tmpcsv" &&
  mv "$tmpcsv" "$incsv" && echo "Done!" || { echo "Script failed."; exit 1; }
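
To see what the script does to a single row, here is a minimal sketch. The sample row is hypothetical: its layout (three leading fields, then a fourth field of space-separated values) is inferred from how the script treats field 4, not taken from jPDF Tweak’s documentation, so check it against a real export. It does the same arithmetic in a single awk call.

```shell
#!/bin/sh
# Hypothetical bookmark row; the field layout is an assumption for illustration.
row='1;O;Introduction;3 72 650'

# Split field 4 on spaces and add 38 to its third value, leaving fields 1-3 alone.
shifted=$(printf '%s\n' "$row" |
  awk -F ';' -v OFS=';' '{ split($4, c, " "); print $1, $2, $3, c[1] " " c[2] " " (c[3] + 38) }')

echo "$shifted"   # prints 1;O;Introduction;3 72 688
```

If the fourth field in a real export carries more than three values, anything after the third is dropped by this approach, so the awk expression would need extending.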

I know that it probably looks bloated and n00bish, so if anyone has any ideas as to how I could optimize the code, please let me know. I also want to figure out a way to automate extracting the CSV file and merging it back into the new PDF. I’ve seen Stack Exchange discussions of using the CLI of jPDF Tweak to do this, so I suppose that’s an option, if I don’t mind using bloated Java every time I want to do this. Another fellow on GitHub seems to have developed some code for bookmarks, too, so I may try to integrate that.

If I feel brave enough, I may bring this up to the Pandoc team to see if anyone else has an issue with this.
