Pandoc Filter

01/17/2021, Sun
Categories: #JavaScript #markdown

Output File Manipulation

When converting a document from one format to another, such as Markdown to LaTeX, there will be occasions where Pandoc cannot fully infer the output you want and you need to manipulate it yourself.

Fortunately, Pandoc offers a way to do this: the Pandoc filter.

A filter lets you manipulate the document structure that Pandoc builds from your input (its abstract syntax tree, or AST), element by element, in the filter language of your choice.

To use the JavaScript version of the Pandoc filter, create a new Node package project by creating a new folder and a filter file.

mkdir pandoc-filter
cd pandoc-filter
touch "filter.js"

Install the npm module for Pandoc manipulation, "pandoc-filter":

npm install pandoc-filter

Place the following into the "filter.js" file:

// "filter.js"

// File operation
const fs = require("fs");

// Pandoc CLI filter
const pandoc = require("pandoc-filter");

// Define a logger which outputs to a file for easier debugging
const outputLogger = new console.Console(fs.createWriteStream("./output.log"));

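// "pandoc.stdio" calls this function once for every element (block or
// inline) of the document AST. Each element has a type "t" and, usually,
// content "c". Returning nothing leaves the element unchanged.
function action(key) {
  // Log to "output.log" for debugging
  // outputLogger.log(key);
  switch (key.t) {
    case "Header":
      return header(key);
    case "Str":
      return str(key);
    case "Code":
      return code(key);
    case "RawBlock":
      return rawBlock(...key.c);
    case "CodeBlock":
      return codeblock(...key.c);
    case "Para":
      return para(key.c);
  }
}

/* Other action functions go here */

// Placeholder handlers so the filter runs as-is. Each returns nothing,
// so the element passes through unchanged. Replace the bodies with your
// own transformations.
function str(content) {}
function code(content) {}
function rawBlock(format, text) {}
function codeblock(attr, text) {}
function para(inlines) {}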

function header(content) {
  // < Put your transformation code here >
  // Return a RawBlock or RawInline for control over the output
  // return pandoc.RawBlock("tex", newContent);
  // return pandoc.RawInline("tex", `\\kode{${transformedContent}}`);
}

pandoc.stdio(action);

Execute the above with:

pandoc /path/to/my_markdown.md --filter /path/to/pandoc-filter/filter.js -o my_converted.tex --verbose

Explanation

When you run the command, Pandoc hands the parsed document to the filter as a JSON AST. The "pandoc.stdio" function reads it, walks through every element calling the "action" function on each one, and passes the result back to Pandoc. The "action" function is a switch statement that checks the structure type of the current element, which is either a block or an inline piece of the document. The cases in the switch cover the more common structure types Pandoc will typically encounter in a Markdown file. The complete list of "block" and "inline" structures that can be matched is given by the Block and Inline types in Pandoc's type definitions (Text.Pandoc.Definition).

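You provide the functions that each case calls to modify the final output of the Pandoc document.

To take full control of the output, a manipulation function should return either a "pandoc.RawInline" (for inline elements) or a "pandoc.RawBlock" (for block elements) containing the new content. A function that returns nothing leaves the element unchanged.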
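
As an illustration of such a return value, here is a minimal sketch of a handler for inline code. It assumes the usual Pandoc JSON shape for a "Code" element (attributes first, then the text) and builds the raw node by hand so that it runs without any dependency; inside "filter.js" you would most likely call pandoc.RawInline("tex", ...) instead. The names "rawInline" and "codeToKode" are made up for this example, and "\kode" is assumed to be a macro defined in your own LaTeX preamble.

```javascript
// Build the JSON node for a raw inline by hand.
// Assumption: this mirrors what pandoc.RawInline(format, text) produces.
function rawInline(format, text) {
  return { t: "RawInline", c: [format, text] };
}

// Hypothetical handler: wrap inline code in a \kode{...} LaTeX macro.
// A "Code" element carries [attributes, text] in its "c" field.
function codeToKode(element) {
  const [, text] = element.c;
  // Note: no LaTeX escaping is done here; special characters such as
  // "_" or "%" in the text would need escaping in a real filter.
  return rawInline("tex", `\\kode{${text}}`);
}

// Example input, shaped like the element Pandoc emits for inline code
const sample = { t: "Code", c: [["", [], []], "npm install"] };
console.log(JSON.stringify(codeToKode(sample)));
```

In the "action" switch, the "Code" case could then return codeToKode(key).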

If you uncomment the logging line in the "action" function, the logged values of the Pandoc AST will go into "output.log" in the folder where you executed the Pandoc command. The log will look like the following:

{
  t: 'Para',
  c: [
    { t: 'Str', c: 'some' },
    { t: 'Space' },
    { t: 'Str', c: 'cool' },
    { t: 'Space' },
    { t: 'Str', c: 'text' },
    { t: 'Space' },
    { t: 'Str', c: 'for' },
    { t: 'Space' },
    { t: 'Str', c: 'you' },
    { t: 'Space' },
    { t: 'Str', c: 'reader' }
  ]
}
{ t: 'Str', c: 'some' }
{ t: 'Space' }
{ t: 'Str', c: 'cool' }
{ t: 'Space' }
{ t: 'Str', c: 'text' }
{ t: 'Space' }
{ t: 'Str', c: 'for' }
{ t: 'Space' }
{ t: 'Str', c: 'you' }
{ t: 'Space' }
{ t: 'Str', c: 'reader' }
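
The order of the log entries (the "Para" first, then each of its children) follows from the way the tree is traversed. The sketch below is a simplified depth-first walk written for illustration only; it is not the source of the "pandoc-filter" module. It assumes that a node is any object with a string "t" field, and unlike a full implementation it does not descend into replacement nodes.

```javascript
// Simplified sketch of a depth-first AST walk (illustration only).
// The action is called on a node before its children are visited.
function walk(node, action) {
  if (Array.isArray(node)) {
    const out = [];
    for (const item of node) {
      const isElement =
        item !== null && typeof item === "object" && typeof item.t === "string";
      if (!isElement) {
        out.push(walk(item, action));
        continue;
      }
      const result = action(item);
      if (result === undefined || result === null) {
        out.push(walk(item, action)); // unchanged: keep descending
      } else if (Array.isArray(result)) {
        out.push(...result); // splice several replacement nodes in
      } else {
        out.push(result); // single replacement node
      }
    }
    return out;
  }
  if (node !== null && typeof node === "object") {
    const copy = {};
    for (const [key, value] of Object.entries(node)) {
      copy[key] = walk(value, action);
    }
    return copy;
  }
  return node; // strings, numbers and other leaves
}

// A shortened version of the paragraph from the log above
const blocks = [
  { t: "Para", c: [{ t: "Str", c: "some" }, { t: "Space" }, { t: "Str", c: "cool" }] },
];

// Record the type of every node the action sees, in order
const visited = [];
walk(blocks, (element) => {
  visited.push(element.t);
});
console.log(visited.join(" "));
```

Because the action returns nothing here, every node is kept and the walk continues into its children, which is why the paragraph is logged before the words it contains.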