Pandoc/Quarto filter for self-citing BibTeX bibliographies.
BibTeX allows self-citing bibliographies: entries that cite other entries, e.g. in their note, abstract or title fields. Citeproc, the internal bibliography engine of Pandoc and Quarto, doesn’t handle them. This filter acts as a drop-in replacement for Citeproc that handles self-citing bibliographies.
The filter still runs Citeproc in the background: CSL bibliography style files are applied as expected.
BibTeX bibliographies can self-cite: one bibliography entry
may cite another entry. This is done in two ways: with the
crossref field, to cite a collection from which an entry is
extracted (see BibTeX’s documentation), or with citation
commands, e.g. in a note field:
@incollection{Doe:2000,
  author = {Jane Doe},
  title = {What are Fish Even Doing Down There},
  crossref = {Snow:2000},
}
@book{Snow:2010,
  editor = {Jane Snow},
  title = {Fishy Works},
  note = {Reprint of~\citet{Snow:2000}},
}
@collection{Snow:2000,
  editor = {Jane Snow},
  title = {Fishy Works},
}
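To check by hand which entries of a .bib file self-cite through citation commands, a rough diagnostic is to scan field values for them. The Lua sketch below is hypothetical and is not part of the filter. The name citedKeys is illustrative, and the pattern only assumes the common \cite...[...]{key1, key2} shape.

```lua
-- Hypothetical diagnostic, not part of recursive-citeproc: list the keys cited
-- by LaTeX citation commands (\cite, \citet, \citep[...]{...}, ...) in a field.
local function citedKeys(fieldValue)
  local keys = {}
  -- "\\cite%a*" matches the command name, "[^{]*" skips optional [...]
  -- arguments, and "%b{}" captures the balanced {key1, key2} group
  for group in fieldValue:gmatch("\\cite%a*[^{]*(%b{})") do
    -- drop the outer braces and split the keys on commas and whitespace
    for key in group:sub(2, -2):gmatch("[^,%s]+") do
      table.insert(keys, key)
    end
  end
  return keys
end

-- usage on the note field of the Snow:2010 entry
print(table.concat(citedKeys("Reprint of~\\citet{Snow:2000}"), ", "))
--> Snow:2000
```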
LaTeX’s bibliography engines (natbib, biblatex) handle self-citations of both kinds.
Pandoc and Quarto can use those engines, but only for PDF output. They also come with their own engine, Citeproc, which conveniently uses CSL citation style files and covers all output formats.
However, Citeproc only handles crossref self-citations.
It fails to process citation commands in bibliographies.
This filter enables Citeproc to process cite commands in the bibliography. It ensures that the self-cited entries are displayed in the document’s bibliography.
Are self-citing bibliographies a good idea? They ensure consistency by avoiding multiple copies of the same data, but they create dependencies between entries. The Citation Style Language doesn’t seem to permit them. Be that as it may, many of us have legacy self-citing bibliographies, so we may as well handle them.
Requirements: Pandoc 2.17+ or Quarto 1.4+.
Note. Version 1 of this filter does not work with Pandoc
3.1.10+ and Quarto 1.4.530+. If switching from version 1 to the current
version, make sure you do not call -C or --citeproc in Pandoc,
and that you set citeproc: false in Quarto. See below for details.
This filter replaces Citeproc.
The filter modifies the internal document representation; it can be used with many publishing systems that are based on Pandoc.
Pass the filter to pandoc via the --lua-filter (or -L) command line option:
pandoc --lua-filter recursive-citeproc.lua ...
Or via a defaults file:
filters:
- recursive-citeproc.lua
Copy the file into the filters subdirectory of your Pandoc user data directory to make it available
to Pandoc anywhere. Run pandoc -v to see where your Pandoc
user data directory is.
Do not use Citeproc: do not use the --citeproc or -C option in
combination with this filter. If applied before the filter, it is
redundant; if applied after, it generates a duplicate bibliography.
Users of Quarto can install this filter as an extension with
quarto install extension dialoa/recursive-citeproc.git
and use it by adding recursive-citeproc to the filters entry in their
YAML header. You should also deactivate Citeproc:
---
citeproc: false
filters:
- recursive-citeproc
---
If you use other filters and specify their order relative to Quarto, it is safer to run this filter after Quarto’s own:
---
citeproc: false
filters:
- ...
- quarto
- recursive-citeproc
---
With R Markdown, use pandoc_args to invoke the filter. See the R
Markdown Cookbook for details.
---
output:
word_document:
pandoc_args: ['--lua-filter=recursive-citeproc.lua']
---
Do not use Citeproc. Before this filter, it is redundant; after, it duplicates the bibliography.
You can specify the filter’s maximum recursion depth in the document’s metadata. Use 0 for no limit (the default is 10):
recursive-citeproc:
max-depth: 5
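Judging from the filter's Options module, the options block can also be named rciteproc or recursiveciteproc, maxdepth is accepted as an alias of max-depth, and a single value given instead of a map is read as the maximum depth. The Lua sketch below roughly mirrors that logic. The name readMaxDepth is hypothetical, and plain Lua values stand in for Pandoc metadata so that it runs outside Pandoc.

```lua
-- Sketch, not the filter's code: plain Lua values stand in for pandoc.Meta
-- (an assumption so the example runs outside Pandoc).
local DEFAULT_MAX_DEPTH = 10

local function readMaxDepth(meta)
  -- the options may sit under any of these three keys
  local opts = meta['rciteproc'] or meta['recursive-citeproc']
    or meta['recursiveciteproc'] or {}
  -- a single value instead of a map is read as the maximum depth
  if type(opts) ~= 'table' then
    opts = { ['max-depth'] = opts }
  end
  -- 'maxdepth' is accepted as an alias of 'max-depth'
  local value = opts['max-depth']
  if value == nil then value = opts['maxdepth'] end
  local depth = value ~= nil and tonumber(value) or nil
  -- 0 means no limit; missing or negative values fall back to the default
  if depth and depth >= 0 then
    return depth
  end
  return DEFAULT_MAX_DEPTH
end

print(readMaxDepth({ ['recursive-citeproc'] = { ['max-depth'] = 5 } })) --> 5
```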
A max-depth
of 2, for instance, means that the filter
inserts references that are only cited by references cited in the
document’s body, but not references that are only cited by references
that are themselves only cited by references cited in the document.
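One way to picture max-depth is to give each entry a level. Entries cited in the body are level 0, entries first reached through a level-0 entry are level 1, and so on. A max-depth of n then keeps levels 0 to n-1, which agrees with the example of a max-depth of 2. The Lua sketch below is a hypothetical illustration of this reading. Its input, citesOf (a map from each key to the keys it cites), is an assumption, and the filter itself reruns Citeproc rather than walking such a map.

```lua
-- Hypothetical illustration of max-depth: a breadth-first walk assigns each
-- entry a level (0 = cited in the body, 1 = first reached from level 0, ...).
-- citesOf maps an entry key to the keys it cites (assumed input format).
local function entriesWithinDepth(bodyCites, citesOf, maxDepth)
  local level, frontier = {}, {}
  for _, key in ipairs(bodyCites) do
    if level[key] == nil then
      level[key] = 0
      table.insert(frontier, key)
    end
  end
  local current = 0
  while #frontier > 0 do
    local nextFrontier = {}
    for _, key in ipairs(frontier) do
      for _, cited in ipairs(citesOf[key] or {}) do
        if level[cited] == nil then -- first visit only, so cycles terminate
          level[cited] = current + 1
          table.insert(nextFrontier, cited)
        end
      end
    end
    frontier, current = nextFrontier, current + 1
  end
  -- keep entries whose level is below maxDepth; 0 means no limit
  local kept = {}
  for key, l in pairs(level) do
    if maxDepth == 0 or l < maxDepth then kept[key] = true end
  end
  return kept
end

-- A is cited in the body, A cites B, B cites C: with max-depth 2,
-- A and B are kept but C is not
local kept = entriesWithinDepth({ 'A' }, { A = { 'B' }, B = { 'C' } }, 2)
```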
If the max depth is reached before all self-recursive citations are processed, PDF output may generate an error.
To try the filter with Pandoc or Quarto, clone the repository.
Generate Pandoc outputs with make generate. Change the
output format with make generate FORMAT=docx. Use
FORMAT=latex for LaTeX outputs. You can list multiple
formats: make generate FORMAT="docx pdf". The outputs will
be in the test folder, named expected.<format>.
Requires Pandoc.
As above, replacing generate with qgenerate.
Requires Quarto.
With Quarto installed, you can also
use the Pandoc engine embedded in Quarto: add the argument
PANDOC="quarto pandoc" to the Pandoc commands above,
e.g. make generate FORMAT=docx PANDOC="quarto pandoc".
Version 2 is meant to replace Citeproc. It returns the document
with a refs Div containing Citeproc’s bibliography output.
The filter runs Citeproc on the document and checks whether the generated bibliography contains citations. If not, it simply returns the document with its bibliography.
If the bibliography contains citations, the filter adds them to the
document’s nocite metadata field, runs Citeproc again, checks whether
the new bibliography contains further citations, and so on
recursively until all needed citations are identified.
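The loop can be sketched in plain Lua with Citeproc replaced by a mock. Everything here is an assumption for illustration. The names recursiveBibliography and citesOf are hypothetical, and the mock bibliography simply lists every key cited in the body or in nocite. As in the filter, each round's newly found keys replace the previous nocite additions. The loop stops when a round finds nothing new or when the depth limit is reached.

```lua
-- Plain-Lua sketch of the recursion; Citeproc is mocked (assumption).
-- citesOf maps each entry key to the keys cited in its fields.
local function recursiveBibliography(bodyCites, citesOf, maxDepth)
  -- mock Citeproc: the bibliography lists every key cited in the body or nocite
  local function bibliography(nocite)
    local bib = {}
    for _, key in ipairs(bodyCites) do bib[key] = true end
    for key in pairs(nocite) do bib[key] = true end
    return bib
  end
  -- keys cited by the entries of a bibliography
  local function citationsIn(bib)
    local found = {}
    for entry in pairs(bib) do
      for _, key in ipairs(citesOf[entry] or {}) do found[key] = true end
    end
    return found
  end
  -- whether every key of b is already in a
  local function includes(a, b)
    for key in pairs(b) do
      if not a[key] then return false end
    end
    return true
  end

  local nocite, depth = {}, 1
  while true do
    local bib = bibliography(nocite)
    local found = citationsIn(bib)
    if includes(nocite, found) then
      return bib, depth -- no new self-citations: done
    elseif maxDepth ~= 0 and depth + 1 > maxDepth then
      return bib, depth -- depth limit reached: some cited keys are left out
    end
    nocite, depth = found, depth + 1 -- put the new keys in nocite and rerun
  end
end

-- A is cited in the body, A's note cites B, and B's note cites C;
-- the bibliography ends up with A, B and C after 3 rounds
local bib, rounds = recursiveBibliography({ 'A' }, { A = { 'B' }, B = { 'C' } }, 10)
```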
Citeproc is then run on the document, which typesets Cite elements in the document body and adds a bibliography with all needed entries to cover self-citations. However, Cite elements in the bibliography may still contain LaTeX cite commands that aren’t typeset yet. To ensure these are typeset, we run Citeproc on the bibliography itself, and update the document’s bibliography with the result.
The last step of the process generates a duplicate bibliography, which
we discard. There is no way around it since Pandoc 3.1.10: if we ran
Citeproc on the bibliography with suppress-bibliography, the
citations couldn’t be converted to links. For link-references
to add links to citations even in the bibliography, we must leave
suppress-bibliography set to false.
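The filter ends with a store, discard and restore sequence. It renames the #refs Div, runs Citeproc again, removes the Div that run generates, and renames the stored Div back. This can be mimicked on plain tables. In this hypothetical sketch each table stands for a block with an id. mockCiteproc only reproduces two behaviours: typesetting citations everywhere, and appending a new #refs Div when none exists. This is an illustration under those assumptions, not the filter's refsdiv module.

```lua
-- Sketch with plain tables standing in for Pandoc blocks (assumption).
local function renameDiv(blocks, newId, oldId)
  for _, block in ipairs(blocks) do
    if block.id == oldId then block.id = newId end
  end
  return blocks
end

local function removeDiv(blocks, id)
  local kept = {}
  for _, block in ipairs(blocks) do
    if block.id ~= id then table.insert(kept, block) end
  end
  return kept
end

-- mock Citeproc: marks every block as typeset, then appends a fresh #refs
-- Div if the document has none (hypothetical, reduced to what matters here)
local function mockCiteproc(blocks)
  local hasRefs = false
  for _, block in ipairs(blocks) do
    block.typeset = true
    if block.id == 'refs' then hasRefs = true end
  end
  if not hasRefs then table.insert(blocks, { id = 'refs', fresh = true }) end
  return blocks
end

local doc = { { id = 'intro' }, { id = 'refs', entries = 3 } }
doc = renameDiv(doc, 'recursive-citeproc-stored', 'refs') -- set the bibliography aside
doc = mockCiteproc(doc)                                    -- typesets, adds a duplicate #refs
doc = removeDiv(doc, 'refs')                               -- discard the duplicate
doc = renameDiv(doc, 'refs', 'recursive-citeproc-stored')  -- restore the original
```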
Version 1 of this filter was meant to be run in combination with Citeproc, before it.
It added a Citeproc-generated bibliography to the document, which
could contain Cite elements whose content could contain LaTeX
citation commands, and exited with the document’s metadata key
suppress-bibliography set to true. Citeproc running after this
would then process the content of Cite elements in the bibliography
and, if the document’s metadata key link-references was true, turn
the content of Cite elements into links.
The filter’s main task was to ensure that the Citeproc-generated
bibliography contained all entries cited in bibliography entries,
entries cited in those entries, and so on. That was done by
generating the bibliography a first time, checking whether it
added citations, adding them to the metadata nocite key and trying
again until no new citations were added or the maximum depth was
reached.
Since Pandoc 3.1.10, suppress-bibliography deactivates
link-references. The filter would still handle self-citing
bibliographies, but link-references would have no effect:
citations would not be linked to bibliography entries. To let Citeproc link
references, we would need to remove suppress-bibliography,
but we would then get a duplicate bibliography.
The solution in version 2 was to incorporate the last Citeproc step
within the filter: we run it without suppress-bibliography,
so that references are linked if link-references is set,
and we take out the duplicate bibliography it outputs.
Based on an idea given by John MacFarlane on the pandoc-discuss mailing list.
This pandoc Lua filter is published under the MIT license; see file
LICENSE for details.
---
title: 'Self-citing bibliography example'
author: Julien Dutant
recursive-citeproc:
max-depth: 50 # optional, specify max recursive depth
debug: true
bibliography: references.bib
nocite: # checks that nocite references are preserved
- '@Smith2001'
- |
@Smith2003
abstract: |
This document illustrates self-citing bibliographies.
The abstract is used to check that citations in
metadata are printed correctly [@Smith2005].
---
# Tests
## Simple recursion
@Doe2020 illustrates a simple recursion.
## Citations in the `nocite` metadata fields
Two more citations from Smith, namely 2001 and 2003, appear only in the
`nocite` metadata field. This checks that they are preserved in the
final document.
## Suffixes
@Jones2022feb and @Jones2022may check that automatic suffixes are
handled correctly. In the first recursion round these citations
get suffixes a and b. Ultimately they should settle on suffixes b
and e as the cross-referred Jones 2022 entries are added
chronologically and their suffixes recalculated.
## Placement and preamble of the bibliography
We can place the bibliography at an arbitrary point of the
document, and even add a preamble to it. The bibliography
should appear below this paragraph and start with a
brief user preamble.
::: {#refs}
This is the user preamble for the bibliography.
:::
## Debug
Setting the filter's `debug` option to `true` in the metadata
shows the bibliography self-citations found at each recursion
round, provided Pandoc/Quarto is in verbose output mode.
---------------------------------------------------------
----------------Auto generated code block----------------
---------------------------------------------------------
do
local searchers = package.searchers or package.loaders
local origin_seacher = searchers[2]
searchers[2] = function(path)
local files =
{
------------------------
-- Modules part begin --
------------------------
["CitationIdList"] = function()
--------------------
-- Module: 'CitationIdList'
--------------------
--[[ CitationIdList class
Hold and manipulate lists of citations Ids.
]]
local type = pandoc.utils.type
--- # Helper functions
---Concatenate a List of lists
---@param list pandoc.List[] list of pandoc.Lists
---@return pandoc.List result concatenated List
local function listConcat(list)
local result = pandoc.List:new()
for _,sublist in ipairs(list) do
result:extend(sublist)
end
return result
end
---Flatten a meta value to Inlines
---in pandoc < 2.17 we only return a pandoc.List of Inline elements
---@param elem pandoc.Inlines|string|number|pandoc.Blocks|pandoc.List
---@return pandoc.Inlines|pandoc.List result possibly empty Inlines
local function flattenToInlines(elem)
local elemType = type(elem)
return elemType == 'Inlines' and elem
or elemType == 'string'
and pandoc.Inlines(pandoc.Str(elem))
or elemType == 'number'
and pandoc.Inlines(pandoc.Str(tostring(elem)))
or elemType == 'Blocks' and pandoc.utils.blocks_to_inlines(elem)
or elemType == 'List' and listConcat(
elem:map(flattenToInlines)
)
or pandoc.Inlines{}
end
-- # CitationIdList object
---@alias CitationId string Citation Identifier
---@class CitationIdList
---@field data CitationId[] list of citation ids
---@field new fun(self: CitationIdList, source?:pandoc.Pandoc|pandoc.Meta|pandoc.Blocks|pandoc.Block|CitationId[]):CitationIdList
---@field toStr fun(self: CitationIdList):string
---@field isEmpty fun(self: CitationIdList): boolean
---@field find fun(self: CitationIdList, citationId: CitationId):boolean
---@field includes fun(self: CitationIdList, citationIdList: CitationIdList):boolean
---@field insert fun(self: CitationIdList, citationId: CitationId):nil
---@field remove fun(self: CitationIdList, citationId: CitationId):nil
---@field clone fun(self: CitationIdList):CitationIdList
---@field minus fun(self: CitationIdList, citationIdList: CitationIdList):CitationIdList
---@field plus fun(self: CitationIdList, citationIdList: CitationIdList):CitationIdList
---@field addFromCitationIds fun(self: CitationIdList, list: CitationId[]):nil
---@field addFromCite fun(self: CitationIdList, cite: pandoc.Cite):nil
---@field addFromWalkable fun(self: CitationIdList, container: pandoc.Pandoc|pandoc.Meta|pandoc.Blocks|pandoc.Inlines):nil
---@field addFromBlock fun(self: CitationIdList, inlines: pandoc.Block):nil
---@field addFromReferences fun(self: CitationIdList, doc: pandoc.Pandoc):nil
---@field insertInNocite fun(self: CitationIdList, meta: pandoc.Meta):pandoc.Meta
local CitationIdList = {}
---Create an CitationIdList object
---@param source? pandoc.Pandoc|pandoc.Meta|pandoc.Blocks|pandoc.Block|CitationId[]
---@return CitationIdList
function CitationIdList:new(source)
local o = {}
setmetatable(o,self)
self.__index = self
o.data = {}
if source then
local srcType = type(source)
if srcType == 'Pandoc'
or srcType == 'Meta'
or srcType == 'Blocks'
or srcType == 'Inlines' then
o:addFromWalkable(source)
elseif srcType == 'Block' then
o:addFromBlock(source)
elseif srcType == 'table' then
o:addFromCitationIds(source)
end
end
return o
end
---convert to string
---@param separator string|nil
---@return string
function CitationIdList:toStr(separator)
local separator = separator or ', '
return table.concat(self.data, separator)
end
---Whether the list of citations is empty
---@return boolean
function CitationIdList:isEmpty()
return #self.data == 0
end
---Whether citationId is in the list
---@param citationId CitationId
---@return boolean
function CitationIdList:find(citationId)
for _,id in ipairs(self.data) do
if citationId == id then
return true
end
end
return false
end
---Whether the list includes all items from citationIdList
---@param citationIdList CitationIdList
function CitationIdList:includes(citationIdList)
local result = true
for _,id in ipairs(citationIdList.data) do
if not self:find(id) then
result = false
break
end
end
return result
end
---Insert citation in the list if not already present
---@param citationId CitationId
function CitationIdList:insert(citationId)
if not self:find(citationId) then
table.insert(self.data, citationId)
end
end
---Get a copy of the list
---@return CitationIdList
function CitationIdList:clone()
local result = CitationIdList:new(self.data)
return result
end
---Get a new list of citations minus those already in citationIdList
---@param citationIdList CitationIdList list of citations to remove
---@return CitationIdList result new CitationIdList
function CitationIdList:minus(citationIdList)
local result = CitationIdList:new()
for _,id in ipairs(self.data) do
if not citationIdList:find(id) then
result:insert(id)
end
end
return result
end
---Get a new list of citations plus those in citationIdList
---@param citationIdList CitationIdList list of citations to add
---@return CitationIdList result new CitationIdList
function CitationIdList:plus(citationIdList)
local result = CitationIdList:new()
result:addFromCitationIds(self.data)
result:addFromCitationIds(citationIdList.data)
return result
end
---Add from a list of citation Ids
---@param list CitationId[]
function CitationIdList:addFromCitationIds(list)
for _,item in ipairs(list) do
if item and type(item) == 'string' then
self:insert(item)
end
end
end
---Add from a Cite element
function CitationIdList:addFromCite(cite)
for _,citation in ipairs(cite.citations) do
self:insert(citation.id)
end
end
---Add citation ids found in walkable container
---@param container pandoc.Meta|pandoc.Pandoc|pandoc.Blocks
function CitationIdList:addFromWalkable(container)
container:walk{
Cite = function(cite)
self:addFromCite(cite)
end
}
end
---Add citation ids found in block
---@param block pandoc.Block
function CitationIdList:addFromBlock(block)
if block.content then
block.content:walk{
Cite = function(cite)
self:addFromCite(cite)
end
}
end
end
---Add citation Ids from a Pandoc document
---@param doc pandoc.Pandoc
function CitationIdList:addFromPandoc(doc)
doc:walk{
Cite = function(cite)
self:addFromCite(cite)
end
}
end
---Add citation Ids from a Pandoc document using pandoc.utils.references
---Differences between addFromReferences and addFromPandoc:
---addFromReferences only adds citations present in the bibliography database
---addFromPandoc adds any citations
---both list citations in a pre-existing Citeproc bibliography, if present
function CitationIdList:addFromReferences(doc)
for _,item in ipairs(pandoc.utils.references(doc)) do
self:insert(item.id)
end
end
---Insert citations in the nocite metadata field
---@param meta pandoc.Meta metadata block to modify
---@return pandoc.Meta
function CitationIdList:insertInNocite(meta)
local inlines = meta.nocite and flattenToInlines(meta.nocite)
or pandoc.Inlines{}
for _,id in ipairs(self.data) do
inlines:insert(pandoc.Space())
inlines:insert(pandoc.Cite(
pandoc.Str('@'..id),
pandoc.List{
pandoc.Citation(id, 'AuthorInText')
}
))
end
meta.nocite = pandoc.MetaInlines(inlines)
return meta
end
--- Use this to run command line tests with pandoc lua
-- if arg and arg[0] == debug.getinfo(1, "S").source:sub(2) then
-- else
return CitationIdList
-- end
end,
["Options"] = function()
--------------------
-- Module: 'Options'
--------------------
--[[ Options class
Parse and hold filter options
]]
local stringify = pandoc.utils.stringify
local metatype = pandoc.utils.type
--- # Options object
---@class Options
---@field new fun(meta: pandoc.Meta):Options create Options object
---@field allowDepth fun(depth: number):boolean depth is allowed
---@field getDepth fun():number returns max depth (error messages)
---@field debug boolean debug mode
---@field suppressBiblio boolean suppress-bibliography mode
local Options = {}
---create an Options object
---@param meta pandoc.Meta
---@param default_max_depth number
---@return object Options
function Options:new(meta, default_max_depth)
local o = {}
setmetatable(o,self)
self.__index = self
o:read(meta, default_max_depth)
return o
end
--- normalize: normalize user options. No value check, we just
--- handle aliases and return options as a map.
---@param meta metaObject
---@return pandoc.MetaMap
function Options:normalize(meta)
--- look for 'rciteproc' or its aliases
local opts = (meta.rciteproc and meta.rciteproc)
or (meta['recursive-citeproc'] and meta['recursive-citeproc'])
or (meta.recursiveciteproc and meta.recursiveciteproc)
or pandoc.MetaMap{}
--- ensure opts a map; single value assumed to be max-depth
opts = (metatype(opts) == 'table' and opts)
or ((metatype(opts) == 'Inlines' or metatype(opts) == 'string')
and pandoc.MetaMap({ ['max-depth'] = stringify(opts)}))
or pandoc.MetaMap{}
--- provide alias(es)
local aliases = { ['max-depth'] = 'maxdepth' }
for key,alias in pairs(aliases) do
opts[key] = opts[key] == nil and opts[alias] ~= nil and opts[alias]
or opts[key]
end
return opts
end
---read: read options from doc's meta
---@param meta pandoc.Meta
---@param default_max_depth number
function Options:read(meta, default_max_depth)
local opts = Options:normalize(meta)
-- Option: max-depth
local userMaxDepth = opts['max-depth'] and (
metatype(opts['max-depth']) == 'Inlines' and tonumber(stringify(opts['max-depth']))
or tonumber(opts['max-depth'])
)
local maxDepth = userMaxDepth and userMaxDepth >=0 and userMaxDepth
or default_max_depth
self.getDepth = function()
return maxDepth
end
-- whether a depth is allowed; returns true when depth = 1
self.allowDepth = function (depth)
return maxDepth == 0 or maxDepth >= depth
end
-- Option: debug
self.debug = opts.debug and opts.debug == true or false
-- Option: suppress-bibliography
self.suppressBiblio = meta['suppress-bibliography'] and
meta['suppress-bibliography'] == true or false
end
return Options
end,
["log"] = function()
--------------------
-- Module: 'log'
--------------------
local FILTER_NAME = 'Recursive-Citeproc'
---log: send message to std_error
---@param type 'INFO'|'WARNING'|'ERROR'
---@param text string error message
local function log(type, text)
local level = {INFO = 0, WARNING = 1, ERROR = 2}
if level[type] == nil then type = 'ERROR' end
if level[PANDOC_STATE.verbosity] <= level[type] then
local message = '[' .. type .. '] '..FILTER_NAME..': '.. text .. '\n'
if quarto then
quarto.log.output(message)
else
io.stderr:write(message)
end
end
end
return log
end,
["refsdiv"] = function()
--------------------
-- Module: 'refsdiv'
--------------------
--[[ refsdiv.lua
Manipulate Citeproc's #refs Div in a document
Citeproc adds bibliography at the end of a #refs Div. If not
found, it creates one at the end of the document. Users can
otherwise place it anywhere and add some content to it.
This module handles manipulating bibliography entries within
the #refs Div without moving it or losing user's content.
The structure of a #refs Div is as follows (Pandoc 2.17 - 3.6+)
Div
( "refs"
, [ "references" , "csl-bib-body" , "hanging-indent" ]
, [ ( "entry-spacing" , "0" ) ]
)
[ Para
[ Str "Preamble: users can add a #refs Div, citeproc adds the entries after." ]
, ... (more user blocks) ...
, Div
( "ref-Allen2020" , [ "csl-entry" ] , [] )
[ Para
[ Str "Entry text" ]
]
, Div
( "ref-Black2022" , [ "csl-entry" ] , [] )
[ Para
[ Str "Entry text" ]
]
]
]]
---@alias pandoc.Walkable pandoc.Pandoc|pandoc.Meta|pandoc.Blocks
-- # Settings
---Pandoc's default bibliography identifier
local REFSDIV_ID = 'refs'
---@class refsdiv
---@field get fun(container: pandoc.Walkable, refsId: string|nil): pandoc.Div|nil get the #refs Div
---@field getEntries fun(container: pandoc.Walkable, refsId: string|nil): pandoc.Blocks get its entries
---@field removeEntries fun(container: pandoc.Walkable, refsId: string|nil): pandoc.Blocks remove its entries
---@field extractEntries fun(container: pandoc.Walkable, refsId: string|nil): pandoc.Blocks, pandoc.Blocks extract its entries
---@field rename fun(container: pandoc.Walkable, newId: string, refsId: string|nil): pandoc.Walkable rename the Div
---@field remove fun(container: pandoc.Walkable, refsId: string|nil): pandoc.Walkable remove the full Div
local refsdiv = {}
---Get references Div from a walkable container
---@param container pandoc.Walkable
---@param refsId string|nil identifier for the Refs Div (default REFSDIV_ID)
---@return pandoc.Div|nil
function refsdiv.get(container, refsId)
local identifier = refsId and refsId ~= '' and refsId
or REFSDIV_ID
local result = nil
container:walk{
Div = function(div)
if div.identifier and div.identifier == identifier then
result = div
end
end
}
return result
end
---Get CSL entries from a Div in a walkable container
---@param container pandoc.Walkable walkable element containing the Refs Div
---@param refsId string|nil identifier for the Refs Div (default REFSDIV_ID)
---@return pandoc.Blocks
function refsdiv.getEntries(container, refsId)
local identifier = refsId and refsId ~= '' and refsId
or REFSDIV_ID
local refsDiv = refsdiv.get(container, identifier)
local result = pandoc.Blocks{}
if refsDiv then
refsDiv.content:walk{
Div = function(div)
if div.classes:includes('csl-entry') then
result:insert(div)
end
end
}
end
return result
end
---Extract CSL entries from a container
---@param container pandoc.Walkable walkable element containing the Refs Div
---@param refsId string|nil identifier for the Refs Div (default REFSDIV_ID)
---@return pandoc.Walkable container without any CSL entries found
---@return pandoc.Blocks result extracted CSL entries
function refsdiv.extractEntries(container, refsId)
local identifier = refsId and refsId ~= '' and refsId
or REFSDIV_ID
local result = pandoc.Blocks{}
local refsDivFilter = {
Div = function(div)
if div.classes:includes('csl-entry') then
result:insert(div)
return {} -- this erases the entry
end
end
}
local containerFilter = {
Div = function(div)
if div.identifier and div.identifier == identifier then
div.content = div.content:walk( refsDivFilter )
return div
end
end
}
return container:walk(containerFilter), result
end
---Rename the Div. Example use: rename(doc, 'stored') to store it and
---rename(doc, 'refs', 'stored') to restore. To erase you must rename '';
---in Pandoc filters `div.identifier = nil` leaves the id unchanged.
---@param container pandoc.Walkable
---@param newId string new Id ('' erases it; nil leaves it unchanged)
---@param refsId string|nil
---@return pandoc.Walkable container with Div renamed
function refsdiv.rename(container, newId, refsId)
local identifier = refsId and refsId ~= '' and refsId
or REFSDIV_ID
return container:walk{
Div = function(div)
if div.identifier and div.identifier == identifier then
div.identifier = newId
return div
end
end
}
end
---Remove references Div from a walkable container
---@param container pandoc.Walkable
---@param refsId string|nil identifier for the Refs Div (default REFSDIV_ID)
---@return pandoc.Walkable container without the Div
function refsdiv.remove(container, refsId)
local identifier = refsId and refsId ~= '' and refsId
or REFSDIV_ID
return container:walk{
Div = function(div)
if div.identifier and div.identifier == identifier then
return {}
end
end
}
end
return refsdiv
end,
----------------------
-- Modules part end --
----------------------
}
if files[path] then
return files[path]
else
return origin_seacher(path)
end
end
end
---------------------------------------------------------
----------------Auto generated code block----------------
---------------------------------------------------------
--[[-- # Recursive-citeproc - Self-citing BibTeX
bibliographies in Pandoc and Quarto
@author Julien Dutant <julien.dutant@kcl.ac.uk>
@copyright 2021-2025 Philosophie.ch
@license MIT - see LICENSE file for details.
@release 2.1.0
]]
-- Pandoc 2.17 for relying on `elem:walk()`, `pandoc.Inlines`, pandoc.utils.type
PANDOC_VERSION:must_be_at_least '2.17'
local log = require('log')
local refsdiv = require('refsdiv')
local Options = require('Options')
local CitationIdList = require('CitationIdList')
local stringify = pandoc.utils.stringify
--- # Settings
-- Default depth. 10 covers most uses and ends fast if entries are missing.
local DEFAULT_MAX_DEPTH = 10
-- Error messages
local ERROR_MESSAGES = {
REFS_FOUND = "It looks like you are running Citeproc before this filter."
.." No need to do that: this filter replaces Citeproc.",
MAX_DEPTH = function (depth) return 'Reached maximum depth of self-citations '
..'('.. tostring(depth) ..'). '
..'Check whether your bibliography contains circular self-citations.'
end,
NOTHING_TO_DO = 'No self-citations found.'
}
--- # Helper functions
---Run citeproc on a document
---@param doc pandoc.Pandoc
---@return pandoc.Pandoc
local function citeproc(doc)
if PANDOC_VERSION >= '2.19.1' then
return pandoc.utils.citeproc(doc)
else
local args = {'--from=json', '--to=json', '--citeproc'}
local result = pandoc.utils.run_json_filter(doc, 'pandoc', args)
return result and result
or pandoc.Pandoc({})
end
end
---Avoid crash with empty but non-nil bibliography key
---@param meta pandoc.Meta
---@return pandoc.Meta meta
local function fixEmptyBiblio(meta)
if meta.bibliography and stringify(meta.bibliography) == '' then
meta.bibliography = nil
return meta
else
return meta
end
end
--- # Filter classes and functions
---Diagnosis of the document's pre-existing state
---@class Diagnosis
---@field hasBib boolean whether the document has a pre-existing bibliography
---@field citesInBib CitationIdList citations in the pre-existing bibliography
---@field hasCitesInBib boolean whether it has Cite elements in its bibliography
---@field nothingToDo boolean whether recursive citeproc is needed at all
---@field biblioCanBeUsed boolean whether the pre-existing biblio can be used in the first pass
local Diagnosis = {}
---Create a diagnosis
---@param doc pandoc.Pandoc document to be diagnosed
function Diagnosis:new(doc)
local o = {}
setmetatable(o,self)
self.__index = self
--is there a previous bibliography (#refs Div with CSL entries)?
--if yes, does it contain Cite elements?
local entries = refsdiv.getEntries(doc)
if #entries > 0 then
o.hasBib = true
o.citesInBib = CitationIdList:new(entries)
o.hasCitesInBib = not o.citesInBib:isEmpty()
else
o.hasBib, o.hasCitesInBib = false, false
o.citesInBib = CitationIdList:new()
end
--nothing to do if there are bib entries but not Cite elements in them
o.nothingToDo = o.hasBib and not o.hasCitesInBib
or false
return o
end
-- # Main filter
---Main filter process
---@param doc pandoc.Pandoc
---@return pandoc.Pandoc|nil
local function recursiveCiteproc(doc)
local options = Options:new(doc.meta, DEFAULT_MAX_DEPTH)
local diag = Diagnosis:new(doc)
-- if biblio, warn Pandoc users that it's unnecessary
if diag.hasBib and not quarto then
log('WARNING', ERROR_MESSAGES.REFS_FOUND)
end
-- quick exit if nothing needs to be done
if diag.nothingToDo then
log('INFO', ERROR_MESSAGES.NOTHING_TO_DO)
return
end
-- prepare document
doc.meta = fixEmptyBiblio(doc.meta) -- fix empty string bug
-- NB: for simplicity, we wipe out any pre-existing bibliography
doc, _ = refsdiv.extractEntries(doc)
-- prepare recursion
local originalCitations = CitationIdList:new(doc)
local originalNoCite = doc.meta.nocite and doc.meta.nocite -- store
local newCitations = diag.citesInBib -- we already know we need to add those
local depth = 1
if options.debug then
log('INFO', 'pass 0: '..newCitations:toStr())
end
while true do
-- run Citeproc with the new citations added to the original nocite
doc.meta.nocite = originalNoCite
doc.meta = newCitations:insertInNocite(doc.meta)
doc = citeproc(doc)
-- does the generated bib contain even more citations?
-- if not, break; otherwise consider another run.
local citationsInBib = CitationIdList:new(refsdiv.getEntries(doc))
if options.debug then
log('INFO', 'pass '..tostring(depth)..': '..citationsInBib:toStr())
end
if newCitations:includes(citationsInBib) then
break
elseif not options.allowDepth(depth + 1) then
log('WARNING', "reached maximum recursion depth. I could not process the"
.." following citation key(s): "
..citationsInBib:minus(newCitations):toStr())
break
else
newCitations = citationsInBib
depth = depth + 1
doc,_ = refsdiv.extractEntries(doc, nil)
end
end
-- Typeset citations in the bibliography
-- TODO: try 'suppress-bibliography'
doc = refsdiv.rename(doc, 'recursive-citeproc-stored') -- store
doc = citeproc(doc) -- typesets citations but adds a new biblio
doc = refsdiv.remove(doc) -- remove the generated biblio
doc = refsdiv.rename(doc, 'refs', 'recursive-citeproc-stored') -- restore
return doc
end
--- # return filter
return {
{
Pandoc = recursiveCiteproc
}
}