Markdown + Pandoc + Statement

A markdown file is just a plain text file with some human-readable markup like *this* for italics. Markdown basics can be learned in 60 seconds. Markdown files are created with a text editor and saved with the .md, .mkd or .markdown extension.

(Note: macOS’s TextEdit is a poor choice of text editor because, by default, it doesn’t save files as plain text. Search for macOS text editors online. On Windows you can use Notepad and on Linux gedit, though you may want to switch to a more powerful editor for sustained projects.)

Pandoc is a command line tool that converts to and from a wide range of document formats. We’ll use it to convert a markdown source to various output formats, including PDF (via LaTeX), HTML (web page) and MS Word docx. It uses an extended markdown syntax that is useful for writing academic texts.

Statement is a filter that Pandoc can insert in its conversion. It extends the markdown syntax further to handle theorems and other statements.

Write statements: two syntaxes

Start with the simple document below. It begins with an optional preamble between --- and --- lines that allows us to specify some document properties, here its title and author. Later we will use the document preamble to specify its language and to customize our statements.

---
title: My statements
author: Jane E. Doe
---

Theorem. 
: For all numbers $a$ and $b$ we have: $|a + b| \leq |a| + |b|$.

Definition.
: A function is a rule which assigns, to each of certain real numbers,
  some other real number.

This illustrates the Definition List syntax for statements. The syntax is normally used for definition lists, but the filter repurposes it for writing theorems. (You can still use definition lists, provided that the expression defined isn’t ‘theorem’, ‘lemma’ or some other statement kind.) The first line is the theorem label, followed by one or more paragraphs starting with :.

You can also enter theorems as follows:

::: thm
For all numbers $a$ and $b$ we have: $|a + b| \leq |a| + |b|$.
:::

::: def
A function is a rule which assigns, to each of certain real numbers,
some other real number.
:::

This illustrates the fenced Div syntax for statements. Fenced Divs are all-purpose divisions in Pandoc. They are normally invisible in output, but they can be given attributes (here, the thm and def classes, respectively) that allow filters to format them.

I’ve used full labels in definition lists (Theorem, Definition) and short aliases in fenced Divs (thm, def) but I could have done the opposite, or used the kind names theorem and definition:

thm
: For all numbers $a$ and $b$ we have: $|a + b| \leq |a| + |b|$.

definition
: A function is a rule which assigns...

::: theorem
For all numbers $a$ and $b$ we have...
:::

::: Definition
A function is a rule which assigns...
:::

These will all generate the same output.

WORK IN PROGRESS

Generate outputs

Installation

  1. Make sure Pandoc is installed.
  2. Get the statement.lua file from the repository’s releases page. For a simple test, place it in the same folder as your markdown source file. For a more permanent solution you could place it in Pandoc’s user data directory (you can see what it is by running pandoc -v). You can also use an arbitrary location and pass it to Pandoc via the command line.
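As a sketch of the more permanent option on Linux or macOS: according to the Pandoc manual, Lua filters given by name are also looked up in the filters subfolder of the user data directory. The path below is an assumption (the default location); check the output of pandoc -v for the directory your installation actually uses.

```shell
# Assumption: default user data directory on Linux/macOS.
# Run `pandoc -v` to see the one your installation actually uses.
DATADIR="${XDG_DATA_HOME:-$HOME/.local/share}/pandoc"

# Pandoc looks for Lua filters in the "filters" subfolder of that directory.
mkdir -p "$DATADIR/filters"

# Copy the filter there if it has been downloaded to the current folder.
if [ -f statement.lua ]; then
  cp statement.lua "$DATADIR/filters/"
fi
```

Once the filter is there, -L statement.lua should work from any folder without a path.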

Create a markdown source file (such as the sample document above) and save it, say as source.md. Open a terminal and navigate to its folder. Apply the Statement filter to it by running Pandoc with the -L (alias --lua-filter) flag:

pandoc source.md -L statement.lua -o output.pdf

This converts your source into a PDF file, output.pdf. Change the extension to get other output formats.
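Put together, the steps look roughly like this. It is a sketch assuming a POSIX shell: it writes a one-theorem version of the sample document above, then converts it to two other formats. The output file names are arbitrary, and the conversion lines are skipped when Pandoc or statement.lua is missing from the current folder.

```shell
# Write a minimal source file (the theorem from the sample document above).
cat > source.md <<'EOF'
---
title: My statements
author: Jane E. Doe
---

Theorem.
: For all numbers $a$ and $b$ we have: $|a + b| \leq |a| + |b|$.
EOF

# Convert only if Pandoc and the filter are available in this folder.
if command -v pandoc >/dev/null 2>&1 && [ -f statement.lua ]; then
  pandoc source.md -L statement.lua -s -o output.html   # web page
  pandoc source.md -L statement.lua -o output.docx      # MS Word
fi
```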

If statement.lua is neither in the current folder nor in Pandoc’s user data directory, you need to specify its absolute or relative path on the command line:

pandoc source.md -L /path/to/statement.lua -o output.pdf

A few tips:

  • add -s (alias --standalone) to produce a standalone document, complete with header and footer rather than a fragment. This is implied for PDF and MS Word/OpenOffice outputs but not for HTML.
  • add -N (alias --number-sections) to number the sections of your document.
  • by and large the order in which you specify parameters doesn’t matter:

     pandoc -L statement.lua  -s -N -o page.html source.md
    

    Though if you’re applying several filters, they are applied in the order in which they appear on the command line.
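To illustrate filter order, here is a sketch with a second filter. noop.lua is a hypothetical placeholder created on the spot (a Lua file that defines no filter functions changes nothing); replace it with whatever other filter you actually use. The conversion is skipped if Pandoc, statement.lua or source.md is missing.

```shell
# Hypothetical second filter, for illustration only: it defines no
# filter functions, so it leaves the document unchanged.
printf '%s\n' '-- no-op Lua filter' > noop.lua

# statement.lua is applied first, then noop.lua.
if command -v pandoc >/dev/null 2>&1 && [ -f statement.lua ] && [ -f source.md ]; then
  pandoc source.md -L statement.lua -L noop.lua -s -o page.html
fi
```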

See the Pandoc manual for more detail on command line options, in particular the lua filter and user data dir options.

Localization

If you specify a document’s language in the preamble.

Crossreferencing

Aliases

Each statement kind has:

  • An internal name: theorem, lemma, definition, proof, statement
  • A label that appears in output (Theorem, Lemma, Definition, etc.), and possibly some aliases (thm, lem, def, etc.).

When we only want to specify one class on a fenced Div, we can write it without the . and curly brackets:

::: theorem
For all numbers $a$ and $b$ we have: $|a + b| \leq |a| + |b|$.
:::

More on fenced Divs

A fenced Div starts with an opening fence of three or more consecutive colons (:::) and ends with a closing fence of at least three colons. It should be separated from previous or subsequent text by a blank line.

The opening fence can carry attributes, which are normally specified between curly brackets and can be classes (starting with .), an identifier (starting with #) and key-value pairs (key=value):

::: { .theorem #mythm source="Spivak 1967" }
For all numbers $a$ and $b$ we have: $|a + b| \leq |a| + |b|$.
:::

Attributes aren’t visible in output, but they may be used by filters (among other things). Statement uses classes to specify a statement kind and identifiers to refer to a specific statement. (An identifier is supposed to be unique; classes can be shared by several Divs.)

When there’s just one class specified, we can do without the . and curly brackets. Thus the following two fences are equivalent:

::: { .theorem }

::: theorem

A fenced Div with class thm, theorem or Theorem will be treated as a theorem; lem, lemma or Lemma as a lemma, etc.

A note on LaTeX

In case you’ve been wondering: in our first theorem, the bits between $ signs are mathematical formulas. They aren’t markdown but LaTeX code: \leq is LaTeX code for the less-than-or-equal symbol (≤). Even the $a$ and $b$ are LaTeX formulas: not only will they be output as italic a and b, but in PDF output their typesetting will be subtly different (different font and spacing than an italic a), and in ‘semantic’ documents (HTML, JATS XML) they will be marked up as equations.

You could write an entire document in LaTeX. But LaTeX is harder to learn and much less readable than markdown. Here is a bit of text with emphasis (italics) and strong emphasis (bold), a footnote and a citation, in LaTeX:

My \textit{first} \textbf{point}.\footnote{See \cite{smith2022}.}

And in markdown:

My *first* **point**.^[See @smith2022]

LaTeX is also too detailed. It is a typesetting language designed for fine-grained control of PDF output. While Pandoc can usually do a good job of converting LaTeX files to other formats, LaTeX documents too easily end up with a clutter of design code that doesn’t translate well into other formats.

Statements are a case in point: theorems can be written in LaTeX (notably with the AMS theorem package) but Pandoc doesn’t fully convert them.

Markdown is a better authoring syntax. For most projects, it gives you just what you need to write your document, leaving detailed design issues to later stages. Pandoc readily converts it to the main output formats without loss.

To write math formulas in markdown, though, you’ll need to know the bit of LaTeX needed to encode them. The LaTeX Wikibook has a good overview of maths in LaTeX, and there are plenty of tutorials online.
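A few common constructs, as a sketch: in Pandoc’s markdown, single $ signs delimit inline formulas and double $$ signs delimit displayed ones. The formulas below are illustrative and not taken from this tutorial’s examples.

```latex
Inline: the fraction $\frac{a}{b}$, the root $\sqrt{x}$, the power $x^2$, the index $a_i$.

Displayed:

$$ \sum_{i=1}^{n} i = \frac{n(n+1)}{2} $$
```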

If that’s daunting, you could start by using LyX, an MS Word-like visual editor for LaTeX that lets you enter formulas by clicking on symbols or typing their LaTeX code. It displays a symbol’s LaTeX code if you hover over it, so you can easily find symbols and the corresponding LaTeX code. You’ll quickly get used to typing the codes directly instead.