Labelled lists in Pandoc and Quarto

Filter to create labelled lists in Pandoc and Quarto.

Introduction

This filter provides custom labelled lists in Pandoc’s markdown for outputs in LaTeX/PDF, HTML and JATS XML. Instead of bullets or numbers, list items are given custom text labels. The text labels can include markdown formatting.

View the filter source on GitHub.

Installation

Plain pandoc

Get labelled-lists.lua from the Releases page and save it somewhere Pandoc can find (see the Pandoc manual for details).

Pass the filter to Pandoc via the --lua-filter (or -L) command line option.

pandoc --lua-filter labelled-lists.lua ...

Quarto

Install this filter as a Quarto extension with

quarto install extension dialoa/labelled-lists

and use it by adding labelled-lists to the filters entry in your document’s YAML header:

---
filters:
- labelled-lists
---

See Quarto’s Extensions guide for more details on updating and version-controlling filters.

R Markdown

Use pandoc_args to invoke the filter. See the R Markdown Cookbook for details.

---
output:
  html_document:
    pandoc_args: ['--lua-filter=labelled-lists.lua']
---

Usage

Markdown syntax

A simple illustration of the custom label syntax:

* [Premise 1]{} This is the first claim.
* [Premise 2]{} This is the second claim.
* [Conclusion]{} This is the conclusion.

This generates the following list (process this file with the filter to see the result):

(Premise 1) This is the first claim.

(Premise 2) This is the second claim.

(Conclusion) This is the conclusion.

In general, the filter will turn a bullet list into a custom label list provided that every item starts with a Span element.

Customizing the label delimiters

By default the custom label is put between parentheses. You can change this globally by setting a delimiter key within a labelled-lists key in your document’s metadata.

labelled-lists:
  delimiter: )

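Possible values:

TwoParens, ( or (): the label is put between parentheses, as in (label). This is the default.
OneParen or ): the label is followed by a closing parenthesis, as in label).
Period or .: the label is followed by a period, as in label.
none or an empty string: the label has no delimiter.
A pattern containing %1, such as [%1] or **%1**: the text before %1 is placed before the label and the text after %1 is placed after it.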

This can be set for a specific list by using a delimiter attribute on the first span element of your list (same possible values as above):

* [Premise 1]{delimiter='**%1**'} This is the first claim.
* [Premise 2]{} This is the second claim.
* [Conclusion]{} This is the conclusion.

**Premise 1** This is the first claim.

**Premise 2** This is the second claim.

**Conclusion** This is the conclusion.
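As a rough illustration of how a delimiter value becomes a pair of strings placed around the label, here is a standalone Lua sketch modelled on the filter's read_delimiter function (see Code below). The names parse_delimiter and wrap are hypothetical, and the sketch works on plain strings rather than Pandoc elements.

```lua
-- Standalone sketch, no pandoc library needed.
-- parse_delimiter (hypothetical name) maps a delimiter option
-- to a {left, right} pair of strings.
local function parse_delimiter(delim)
    if delim == '' or delim:lower() == 'none' then
        return {'', ''}
    elseif delim == 'Period' or delim == '.' then
        return {'', '.'}
    elseif delim == 'OneParen' or delim == ')' then
        return {'', ')'}
    elseif delim == 'TwoParens' or delim == '(' or delim == '()' then
        return {'(', ')'}
    elseif delim:find('%%1') then
        -- text before '%1' is the left delimiter, text after it the right one
        local left, right = delim:match('^(.-)%%1(.*)$')
        return {left, right}
    end
    return nil -- unrecognised value: the caller keeps its default
end

-- wrap (hypothetical name) applies the delimiters to a plain-text label,
-- falling back to parentheses when the value is not recognised.
local function wrap(label, delim)
    local pair = parse_delimiter(delim) or {'(', ')'}
    return pair[1] .. label .. pair[2]
end

print(wrap('Premise 1', '**%1**'))  -- **Premise 1**
print(wrap('Premise 1', 'Period'))  -- Premise 1.
```

In the filter itself the delimiters are inserted as Pandoc Str elements around the label's inlines, so any markdown formatting inside the label is kept.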

Cross-referencing custom-label items

Custom labels can be given internal identifiers. The syntax is [label]{#identifier}. In the list below, A1ref, A2ref and Cref identify the items:

* [**A1**]{#A1ref} This is the first claim.
* [A2]{#A2ref} This is the second claim.
* [*C*]{#Cref} This is the conclusion.

Note that # is not part of the identifier. Identifiers should start with a letter and contain only letters, digits, colons :, dots ., dashes - and underscores _.
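The identifier rule can be checked with a single Lua pattern. This is only a sketch: is_valid_identifier is a hypothetical helper that is not part of the filter, and it assumes ASCII letters and digits only.

```lua
-- Hypothetical helper, not part of the filter.
-- Accepts identifiers that start with a letter and then contain only
-- letters, digits, colons, dots, dashes and underscores.
-- Assumption: only ASCII letters and digits are considered.
local function is_valid_identifier(id)
    return id:match('^%a[%w:%.%-_]*$') ~= nil
end

print(is_valid_identifier('A1ref'))   -- true
print(is_valid_identifier('1ref'))    -- false (starts with a digit)
print(is_valid_identifier('#A1ref'))  -- false (# is not part of the identifier)
```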

Labels with identifiers can be crossreferenced using Pandoc’s citations or internal links.

Cross-referencing with citations

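The basic syntax is [@A1ref], which prints the label in parentheses, as in (A1). An in-text citation, @A1ref, prints the label without parentheses, as in A1. A year-only citation such as [-@A1ref] is treated like a normal one.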

You can crossrefer to several custom labels at a time: [@A1ref; @A2ref]. But mixing references to a custom label with bibliographic ones in the same citation won’t work: if Smith2003 is a key in your bibliography, [@A1ref; @Smith2003] triggers a warning and is left unprocessed by the filter, so A1ref is not replaced with its label.

Because this syntax overlaps with Pandoc’s citation syntax, conflicts should be avoided: do not give a custom label item the same identifier as an entry in your bibliography.

Alternatively, the citation syntax for crossreferencing custom label items can be deactivated. See Customization below.
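The way a citation is sorted into "labels only", "bibliography only" or "mixed" can be pictured with a standalone sketch. It follows the logic of the filter's filter_citations function (see Code below), but uses plain strings; the classify function and the sample table are hypothetical.

```lua
-- Hypothetical sample data: identifier -> label text.
local labels_by_id = { A1ref = 'A1', A2ref = 'A2', Cref = 'C' }

-- classify (hypothetical name) inspects the keys of one citation group.
-- 'labels'       : every key is a custom label identifier
-- 'bibliography' : no key is a custom label identifier
-- 'mixed'        : both kinds appear; the filter only warns in this case
local function classify(ids)
    local has_label, has_biblio = false, false
    for _, id in ipairs(ids) do
        if labels_by_id[id] then
            has_label = true
        else
            has_biblio = true
        end
    end
    if has_label and has_biblio then
        return 'mixed'
    elseif has_label then
        return 'labels'
    else
        return 'bibliography'
    end
end

print(classify({'A1ref', 'A2ref'}))      -- labels
print(classify({'A1ref', 'Smith2003'}))  -- mixed
print(classify({'Smith2003'}))           -- bibliography
```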

Cross-referencing with internal links

In Pandoc markdown internal links are created with the syntax [link text](#target_identifier). (Note the round brackets instead of the curly ones used for Span element identifiers.) You can use internal links to cross-refer to custom label items that have an identifier. If your link has no text, the label is printed with its formatting; otherwise the text you give for the link is used. For instance, given the custom label list above, the following:

The claim [](#A1ref) together with [the next claim](#A2ref) 
entail ([](#Cref)).

will output:

The claim A1 together with the next claim entail (C).

where the links point to the corresponding items in the list.
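The rule for choosing the link text can be sketched in a few lines of standalone Lua. This mirrors the condition used in the filter's filter_links function (see Code below), but on plain strings; resolve_link_text and the sample table are hypothetical.

```lua
-- Hypothetical sample data: identifier -> label text.
local labels_by_id = { A1ref = 'A1', A2ref = 'A2', Cref = 'C' }

-- resolve_link_text (hypothetical name) returns the text to display:
-- the label if the link text is empty and the target is a known
-- '#identifier', otherwise the link's own text.
local function resolve_link_text(text, target)
    local id = target:sub(1, 1) == '#' and target:sub(2) or nil
    if text == '' and id and labels_by_id[id] then
        return labels_by_id[id]
    end
    return text
end

print(resolve_link_text('', '#A1ref'))                -- A1
print(resolve_link_text('the next claim', '#A2ref'))  -- the next claim
```

Links with an empty text that point to an unknown identifier or to an external URL are left as they are.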

Customization

Filter options can be specified in the document’s metadata (YAML block) as follows:

---
title: My document
author: John Doe
labelled-lists:
  disable-citations: true
  delimiter: Period
---

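That is, the metadata field labelled-lists contains the filter options as a map. Presently the filter has two options: disable-citations (when true, citations are not used to crossreference custom label items) and delimiter (the default label delimiter, see Customizing the label delimiters above).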

Examples and tests

math formulas

(p1) This list uses

(p2) math formulas as labels.

LaTeX code

() This list uses

() latex code as labels.

Ignored: these are not treated as labels.

Small caps

(All) This list uses

(Some) small caps as labels.

List with Para items

(A1) F(x) > G(x)

(A2) G(x) > H(x)

items with several blocks

(B1) This list’s items

consist of several blocks

$$\sum_i Fi > \sum_i Gi$$

(B2) Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec et massa ut eros volutpat gravida ut vel lacus. Proin turpis eros, imperdiet sed quam eget, bibendum aliquam massa. Phasellus pellentesque egestas dapibus. Proin porta tellus id orci consectetur bibendum. Nam eu cursus quam. Etiam vehicula in mi sed interdum. Duis rutrum eleifend consectetur. Phasellus ullamcorper, urna at vestibulum venenatis, tellus erat luctus nibh, eget hendrerit justo enim nec magna. Duis mollis ac felis ac tristique.

Pellentesque malesuada arcu ac orci scelerisque vulputate. Aenean at ex suscipit, ultricies tellus sit amet, luctus lectus. Duis ut viverra sapien. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Cras consequat nisi at ex finibus, in condimentum erat auctor. In at nulla at est iaculis pulvinar sed id diam. Cras malesuada sit amet tellus id molestie.

cross-reference with citation syntax

(B1) This is the first claim.

(B2) This is the second claim.

(D) This is the conclusion.

The claim B1 together with the claim B2 entail (D).

cross-reference with internal link syntax

(A1) This is the first claim.

(A2) This is the second claim.

(C) This is the conclusion.

The claim A1 together with the claim A2 entail (C).

Details

LaTeX output

\begin{itemize}
\tightlist

\item[(Premise 1)] This is the first claim.

\item[(Premise 2)] This is the second claim.

\item[(Conclusion)] This is the conclusion.

\end{itemize}

HTML output

HTML output is placed in a <div>.

Currently, the list is recreated as a set of paragraphs. Each item is a <p> if it’s one block long, a <div> if longer. The label itself is contained in a <span>.

<div class="labelled-lists-list">
  <p class="labelled-lists-item"><span class="labelled-lists-label">(Premise 1)</span> This is the first claim.</p>
  <p class="labelled-lists-item"><span class="labelled-lists-label">(Premise 2)</span> This is the second claim.</p>
  <div class="labelled-lists-item">
    <p><span class="labelled-lists-label">(<strong>Conclusion</strong>)</span> This third item consists of</p>
    <p>two blocks.</p>
  </div>
</div>
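The choice between a <p> and a <div> for each item can be sketched as follows. This is a simplified standalone illustration that builds HTML strings directly, whereas the filter builds Pandoc elements; render_item is a hypothetical name, and the sketch assumes every block is a paragraph given as plain text and the default parentheses delimiter.

```lua
-- render_item (hypothetical name): label is a string, blocks is a list
-- of paragraph texts. One block gives a <p>, several give a <div>.
local function render_item(label, blocks)
    local span = '<span class="labelled-lists-label">(' .. label .. ')</span>'
    if #blocks == 1 then
        return '<p class="labelled-lists-item">' .. span .. ' '
            .. blocks[1] .. '</p>'
    end
    local out = { '<div class="labelled-lists-item">' }
    for i, text in ipairs(blocks) do
        -- the label goes at the start of the first paragraph only
        local prefix = (i == 1) and (span .. ' ') or ''
        out[#out + 1] = '  <p>' .. prefix .. text .. '</p>'
    end
    out[#out + 1] = '</div>'
    return table.concat(out, '\n')
end

print(render_item('Premise 1', { 'This is the first claim.' }))
print(render_item('Conclusion',
    { 'This third item consists of', 'two blocks.' }))
```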

In the future, we’ll output a <ul> list within a Div:

<div class="labelled-lists-list">
  <ul>
  <li><span class="labelled-lists-label">(Premise 1) </span>This is the first claim.</li>
  <li><span class="labelled-lists-label">(Premise 2) </span>This is the second claim.</li>
  <li>
    <p><span class="labelled-lists-label">(<strong>Conclusion</strong>)</span> This third item consists of</p>
    <p>two blocks.</p>
  </li>
  </ul>
</div>

And style it via the CSS (see “Css” global variable in the code).

List structures

Example

example.md

---
title: Labelled lists examples
labelled-lists:
  delimiter: )
  disable-citations: false
---

# List labels and delimiters

Default delimiter format set to  "...)".

Labelled list

* [Premise 1]{} This is the first claim.
* [Premise 2]{} This is the second claim.
* [Conclusion]{} This is the conclusion.

Setting the delimiter for an individual list

* [Label 1]{delimiter='**%1**'} This is the first item.
* [Label 2]{} This is the second item.

Empty list

* []{delimiter=''} This is the first item.
* []{} This is the second item.

# Cross-referencing

Assigning identifiers to list items. Arbitrary markdown formatting on
labels will be preserved in crossreferencing.

* [**A1**]{#A1ref} This is the first claim.
* [A2]{#A2ref} This is the second claim.
* [*C*]{#Cref} This is the conclusion.

Crossreferencing items with the citation syntax

Normal citation [@A1ref]. In-text reference, see @A2ref. Year-only
citations are treated as normal ones [-@Cref]. Referencing several
items [@A1ref; @A2ref].

Crossreferencing items with links

The claim [](#A1ref) together with [the next claim](#A2ref) 
entail ([](#Cref)).

# More examples and tests

## Math formulas

* [$p_1$]{} This list uses
* [$p_2$]{} math formulas as labels.

## LaTeX code

* [\textbf{a}]{} This list uses
* [\textbf{b}]{} latex code as labels.

Ignored: these are not treated as labels.

## Small caps

* [[All]{.smallcaps}]{} This list uses
* [[Some]{.smallcaps}]{} small caps as labels.

## List with Para items

* [A1]{} $$F(x) > G(x)$$
* [A2]{} $$G(x) > H(x)$$

## items with several blocks

* [**B1**]{} This list's items

    consist of several blocks

    $$\sum_i Fi > \sum_i Gi$$

* [**B2**]{} Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec et
  massa ut eros volutpat gravida ut vel lacus. Proin turpis eros, imperdiet sed
  quam eget, bibendum aliquam massa. Phasellus pellentesque egestas dapibus.
  Proin porta tellus id orci consectetur bibendum. Nam eu cursus quam. Etiam
  vehicula in mi sed interdum. Duis rutrum eleifend consectetur. Phasellus
  ullamcorper, urna at vestibulum venenatis, tellus erat luctus nibh, eget
  hendrerit justo enim nec magna. Duis mollis ac felis ac tristique.

  Pellentesque malesuada arcu ac orci scelerisque vulputate. Aenean at ex
  suscipit, ultricies tellus sit amet, luctus lectus. Duis ut viverra sapien.
  Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac
  turpis egestas. Cras consequat nisi at ex finibus, in condimentum erat auctor.
  In at nulla at est iaculis pulvinar sed id diam. Cras malesuada sit amet tellus id molestie.

## cross-reference with citation syntax

* [**B1**]{#B1ref} This is the first claim.
* [B2]{#B2ref} This is the second claim.
* [*D*]{#Dref} This is the conclusion.

The claim @B1ref together with the claim @B2ref 
entail [@Dref].

## cross-reference with internal link syntax

* [**C1**]{#C1ref} This is the first claim.
* [C2]{#C2ref} This is the second claim.
* [*E*]{#Eref} This is the conclusion.

The claim [](#C1ref) together with the claim [](#C2ref) 
entail ([](#Eref)).

output.html

Code

labelled-lists.lua

--[[-- # Labelled-lists - Pandoc / Quarto filter for labelled lists

@author Julien Dutant <julien.dutant@kcl.ac.uk>
@copyright 2021-2024 Julien Dutant
@license MIT - see LICENSE file for details.
@release 0.3

@TODO style the HTML output
@TODO in HTML, leave the BulletList element as is. 
      simply turn the Spans into labels, and wrap in a Div. 
@TODO style the label in all outputs
@TODO Possible solution: first style all labels, leaving 
    the pandoc.BulletList as is. Then send it to a formatter
    that wraps it in a Div (html) and adds local CSS style 
    block if needed, or flattens it in Raw for LaTeX.  
@TODO use a Div to declare as custom-label list
]]

-- # Internal settings

--- Options map, including defaults.
-- @disable_citations boolean whether to disable the citation syntax
--      for crossreferences to custom label items
-- @delimiter list label delimiters (as a list)
local options = {
    disable_citations = false,
    delimiter = {'(',')'},
}

-- target_formats  filter is triggered when those formats are targeted
local target_formats = {
  "html.*",
  "latex",
  "jats",
}

-- html classes
local html_classes = {
    item = 'labelled-lists-item',
    label = 'labelled-lists-label',
    list = 'labelled-lists-list',
}

-- Css to be used later
local Css = [[
div.labelled-list > ul {
  list-style-type: none;
}
div.labelled-list > ul li {
/*  border: 1px solid dimgray;*/
  padding-left: 1em;
}
div.labelled-list > ul > li > label:first-child{
/*  border: 1px solid blue;*/
  display: inline-block;
  min-width: 3em; /* 2em + padding on li */
  margin-left: -3.5em; /* -(2.5em + padding on li) */
  margin-right: .5em;
  color: blue;
}
div.labelled-list > ul > li > *:first-child > label:first-child{
/*  border: 1px solid red;*/
  display: inline-block;
  min-width: 3em; /* 2em + padding on li */
  margin-left: -3.5em; /* -(2.5em + padding on li) */
  margin-right: .5em;
  color: red;
}
]]

-- # Global variable

-- table of identified labels
local labels_by_id = {}

-- # Helper functions

--- type: pandoc-friendly type function
-- pandoc.utils.type is only defined in Pandoc >= 2.17
-- if it isn't, we extend Lua's type function to give the same values
-- as pandoc.utils.type on Meta objects: Inlines, Inline, Blocks, Block,
-- string and booleans
-- Caution: not to be used on non-Meta Pandoc elements, the 
-- results will differ (only 'Block', 'Blocks', 'Inline', 'Inlines' in
-- >=2.17, the .t string in <2.17).
local type = pandoc.utils.type or function (obj)
        local tag = type(obj) == 'table' and obj.t and obj.t:gsub('^Meta', '')
        return tag and tag ~= 'Map' and tag or type(obj)
    end

--- format_matches: Test whether the target format is in a given list.
-- @param formats list of formats to be matched
-- @return true if match, false otherwise
function format_matches(formats)
  for _,format in pairs(formats) do
    if FORMAT:match(format) then
      return true
    end
  end
  return false
end

--- message: send message to std_error
-- @param type string INFO, WARNING, ERROR
-- @param text string text of the message
function message(type, text)
    local level = {INFO = 0, WARNING = 1, ERROR = 2}
    if level[type] == nil then type = 'ERROR' end
    if level[PANDOC_STATE.verbosity] <= level[type] then
        io.stderr:write('[' .. type .. '] Labelled-lists lua filter: ' 
            .. text .. '\n')
    end
end

-- # Filter functions

--- filter_citations: process citations to labelled lists
-- Check whether the Cite element only contains references to custom 
-- label items, and if it does, convert them to crossreferences.
-- @param cite pandoc AST Cite element
function filter_citations(cite)

    -- warn if the citations mix cross-label references with 
    -- standard ones
    local has_cl_ref = false
    local has_biblio_ref = false

    for _,citation in ipairs(cite.citations) do
        if labels_by_id[citation.id] then
            has_cl_ref = true
        else
            has_biblio_ref = true
        end
    end

    if has_cl_ref and has_biblio_ref then
        message('WARNING', 'A citation mixes bibliographic references '
            .. 'with custom label references: '
            .. pandoc.utils.stringify(cite.content) )
        return
    end

    if has_cl_ref and not has_biblio_ref then

        -- get style from the first citation
        local bracketed = true 
        if cite.citations[1].mode == 'AuthorInText' then
            bracketed = false
        end

        local inlines = pandoc.List:new()

        -- create link(s)

        for i = 1, #cite.citations do
           inlines:insert(pandoc.Link(
                labels_by_id[cite.citations[i].id],
                '#' .. cite.citations[i].id
            ))
            -- add separator if needed
            if #cite.citations > 1 and i < #cite.citations then
                inlines:insert(pandoc.Str('; '))
            end
        end


        if bracketed then
            inlines:insert(1, pandoc.Str('('))
            inlines:insert(pandoc.Str(')'))
        end

        return inlines

    end

end

--- filter_links: process internal links to labelled lists
-- Empty links to a custom label are filled with the custom
-- label text. 
-- @param link pandoc AST Link element
-- @TODO in LaTeX output you need \ref and \label
function filter_links (link)

    if pandoc.utils.stringify(link.content) == '' 
        and link.target:sub(1,1) == '#' 
        and labels_by_id[link.target:sub(2,-1)] then

        link.content = labels_by_id[link.target:sub(2,-1)]
        return link

    end

end

-- style_label: style the label
-- returns a styled label. Default: round brackets
-- @param label Inlines an item's label as list of inlines
-- @param delim (optional) a pair of delimiters (list of two strings)
-- @return pandoc.Inlines label
function style_label(label, delim)
    if not delim then
        delim = options.delimiter
    end
    -- keep the styled copy local so it does not leak into the global scope
    local styled_label = label:clone()
    styled_label:insert(1, pandoc.Str(delim[1]))
    styled_label:insert(pandoc.Str(delim[2]))
    return styled_label
end

--- build_list: processes a custom label list
-- returns a list of blocks containing Raw output format code
-- @param element BulletList the original Bullet List element
function build_list(element)

    -- build a list of blocks
    local list = pandoc.List:new()

    -- start

    if FORMAT:match('latex') then
        list:insert(pandoc.RawBlock('latex',
            '\\begin{itemize}\n\\tightlist'
            ))
    elseif FORMAT:match('html') then
        list:insert(pandoc.RawBlock('html',
            '<div class="' .. html_classes['list'] .. '">'
            ))
    end

    -- does the first span have a delimiter attribute?
    -- element.c[1] is the first item in the list, type blocks
    -- .. [1].c is the first block's content, type inlines
    -- .. [1] the first inline in that block, our span
    local span = element.c[1][1].c[1]
    local delim = nil
    if span.attributes and span.attributes.delimiter then
        delim = read_delimiter(span.attributes.delimiter)
    end

    -- process each item

    for _,blocks in ipairs(element.c) do

        -- get the span, remove it from the tree, store its content
        local span = blocks[1].c[1]
        blocks[1].c:remove(1)
        local label = pandoc.List(span.content)
        local id = ''

        -- get identifier if not duplicate, store a copy in global table
        if not (span.identifier == '') then
            if labels_by_id[span.identifier] then
                message('WARNING', 'duplicate item identifier ' 
                    .. span.identifier .. '. The second is ignored.')
            else
                labels_by_id[span.identifier] = label
                id = span.identifier
            end
        end

        if FORMAT:match('latex') then

            local inlines = pandoc.List:new()
            inlines:insert(pandoc.RawInline('latex','\\item['))
            inlines:extend(style_label(label, delim))
            inlines:insert(pandoc.RawInline('latex',']'))
            -- create link target if needed
            if not(id == '') then 
                inlines:insert(pandoc.Span('', {id = id}))
            end            

            -- if the first block is Plain or Para, we insert
            -- the label code at the beginning
            -- otherwise we add a Plain block for the label
            if blocks[1].t == 'Plain' or blocks[1].t == 'Para' then
                inlines:extend(blocks[1].c)
                blocks[1].c = inlines
                list:extend(blocks)
            else
                list:insert(pandoc.Plain(inlines))
                list:extend(blocks)
            end

        elseif FORMAT:match('html') then

            local label_span = pandoc.Span(style_label(label, delim))
            label_span.classes = { html_classes['label'] }
            -- id is '' when the item has no identifier ('' is truthy in Lua)
            if id ~= '' then label_span.identifier = id end

            -- if there is only one block and it's Plain or Para,
            -- we create the item as <p>, otherwise as <div>
            if #blocks == 1 and 
                (blocks[1].t == 'Plain' or blocks[1].t == 'Para') then
                    local inlines = pandoc.List:new()
                    inlines:insert(1, pandoc.RawInline('html', 
                        '<p class="' .. html_classes['item'] .. '">'))
                    inlines:insert(label_span)
                    inlines:extend(blocks[1].c)
                    inlines:insert(pandoc.RawInline('html', '</p>'))
                    list:insert(pandoc.Plain(inlines))
            else
                -- if the first block is Plain or Para we insert the
                -- label in it, otherwise the label is its own paragraph
                if (blocks[1].t == 'Plain' or blocks[1].t == 'Para') then
                    local inlines = pandoc.List:new()
                    inlines:insert(label_span)
                    inlines:extend(blocks[1].c)
                    blocks[1].c = inlines
                else
                    blocks:insert(1, pandoc.Para(label_span))
                end

                list:insert(pandoc.Div(blocks,  
                    { class = html_classes['item'] } ))        

            end
 
        end

    end

    if FORMAT:match('latex') then
        list:insert(pandoc.RawBlock('latex',
            '\\end{itemize}\n'
            ))
    elseif FORMAT:match('html') then
        list:insert(pandoc.RawBlock('html','</div>'))        
    end

    return list
end

--- is_custom_labelled_list: Look for custom labels markup
-- Custom label markup requires each item starting with a span
-- containing the label
-- @param element pandoc BulletList element
function is_custom_labelled_list (element)

    local is_cl_list = true

    -- the content of BulletList is a List of List of Blocks
    for _,blocks in ipairs(element.c) do

        -- check that the first element of the first block is Span
        -- (empty spans are allowed); guard against empty items and
        -- blocks without content so that indexing never fails
        local first = blocks[1]
        local inline = first and first.c and first.c[1]
        if not (inline and inline.t == 'Span') then
            is_cl_list = false
            break 
        end
    end

    return is_cl_list
end

--- read_delimiter: process a delimiter option
-- @delim: string, e.g. `TwoParens` or `[%1]`
-- @return: a pair of delimiter strings
function read_delimiter(delim) 
    delim = pandoc.utils.stringify(delim)

    --- process standard Pandoc attributes and their equivalent
    if delim == '' or delim:lower() == 'none' then
        return {'',''}
    elseif delim == 'Period' or delim == '.' then
        return {'', '.'}
    elseif delim == 'OneParen' or delim == ')' then
        return {'', ')'}
    elseif delim == 'TwoParens' or delim == '(' or delim == '()' then
        return {'(',')'}
    --- if it contains '%1' assume it's a substitution string:
    -- the left delimiter is before '%1' and the right after
    elseif string.find(delim, '%%1') then
        return {delim:match('^(.-)%%1') or '', 
                delim:match('%%1(.*)$') or ''}
    end

end

--- Read options from metadata block.
--  Get options from the `labelled-lists` field in a metadata block.
-- @todo read kinds settings
-- @param meta the document's metadata block.
-- @return nothing, values set in the `options` map.
-- @see options
function get_options(meta)
  if meta['labelled-lists'] then

    if meta['labelled-lists']['disable-citations'] then
        options.disable_citations = true
    end


    -- default-delimiter: string
    if meta['labelled-lists'].delimiter and 
            type(meta['labelled-lists'].delimiter) == 'Inlines' then
        local delim = read_delimiter(pandoc.utils.stringify(
                        meta['labelled-lists'].delimiter))
        if delim then options.delimiter = delim end
    end

  end
end

-- # Filter

--- Main filters: read options, process lists, process crossreferences
read_options_filter = {
    Meta = get_options
}
process_lists_filter = {
    BulletList = function(element)
        if is_custom_labelled_list(element) then
            return build_list(element)
        end
    end,
}
crossreferences_filter = {
    Link = filter_links,
    Cite = function(element) 
        if not options.disable_citations then 
            return filter_citations(element)
        end
    end
}


--- Main code
-- return the filters in the desired order
if format_matches(target_formats) then
    return { read_options_filter, 
        process_lists_filter, 
        crossreferences_filter
    }
end