8  Metadata

Your markdown document should start with a ‘metadata’ block. This contains information about the document that isn’t part of the article’s body. Some of it is information about the article itself: title, author name and affiliation, DOI, etc. The rest is information needed to turn the manuscript into published output: name of a .bib file containing its references, special LaTeX packages to be used for its PDF output, parameters used by your template engine (e.g. ‘imagify’ parameters).

In this chapter, we explain how metadata entries are formatted in general. Subsequent chapters deal with particular aspects of the metadata such as the title or author’s information.

8.1 The metadata block

The metadata block is at the beginning of your markdown file, and it starts and ends with a line consisting of three hyphens (---). Here is a template that you can paste at the beginning of your markdown file:

---
title: The Solution to the Problem that has Plagued Many People in the Past
shorttitle: The Solution to the Problem
author: Max Mustermann
affiliation: University of Wisemen
bibliography: references.bib
---

In RStudio, the metadata block is displayed both in Source and Visual modes.

The metadata block consists of a series of fields and their values. In the above example:

  • the fields are title, shorttitle, author, affiliation and bibliography.
  • the title field’s value is a line of text, namely “The Solution to the Problem that has Plagued Many People in the Past”.

The metadata block follows what’s called the YAML syntax. This dictates, for instance, that a field name should be immediately followed by a colon, with no space in between (title: and not title :). All you need to know about this syntax is explained in this chapter.

8.2 One-line field-value

In most cases you enter a field and its value in one line:

title: The Solution to the Problem that has Plagued Many People in the Past

The field’s name must be immediately followed by a colon, and the colon must be followed by a space before the value:

BAD

title:An interesting fact

GOOD

title: An interesting fact

8.2.1 One-line text values

The value is usually a bit of text (including a filename like references.bib). In most cases, it can be entered with or without quotes. The following are equivalent:

GOOD

title: An interesting fact

title: 'An interesting fact'

title: "An interesting fact"

The text is assumed to be in markdown: you can surround words with *...* for emphasis (italics), use codes for special characters, etc.
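
As a small illustration of markdown inside a value, the sketch below italicizes two words of a title. The title text is invented for this example, and the quotation marks are only a precaution.

```yaml
# Illustrative title (not from a real article): *...* marks emphasis
title: "An *a fortiori* argument for pleasure"
```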

You must use quotes if the text value contains a colon, an apostrophe or quotation marks:

BAD

title: Pleasure: friend or foe?

title: Lars' revenge

title: A "good" friend

title: A 'good' friend

title: My Life: a Long Story

GOOD

title: 'Pleasure: friend or foe?'

title: "Lars' revenge"

title: 'A "good" friend'

title: "A 'good' friend"

title: "My Life: a Long Story"

As you can see, if a text value contains single quotation marks, it must be put within double quotation marks, and conversely. The apostrophe is entered using a single quotation mark, so a title containing one should be wrapped in double quotation marks.

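If your value happens to contain both kinds of quotation marks, you can enter it as a text block value (Section 8.3), or wrap it within double quotation marks and ‘escape’ each double quotation mark inside the title with a backslash. Backslash escapes only work within double quotation marks; within single quotation marks, a single quotation mark is instead escaped by doubling it (''):

title: "A 'quote' and \"another\" in this title"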
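
One way to sidestep escaping altogether is the text block syntax described in Section 8.3. A minimal sketch, reusing the title above:

```yaml
# Text block: both kinds of quotation marks are taken literally
title: |
  A 'quote' and "another" in this title
```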

8.2.2 Number and boolean values

In some cases a value may be a number (12) or a Boolean value (true or false). Enter them without quotation marks.

volume: 75
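
For illustration, here is how number and Boolean values might sit next to a quoted value. Only volume comes from the example above; the draft and label fields are hypothetical names chosen for this sketch, and your template may not use them.

```yaml
volume: 75     # a number
draft: false   # hypothetical field: a Boolean
label: "75"    # hypothetical field: quoted, so this is the text "75", not a number
```

The last line shows why the quotation marks matter: with them, the value is read as text rather than as a number.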

8.3 Text block values

Another way of entering text values is to use the text block syntax. This is useful when the value is a longer text, e.g. an abstract, or when the value contains special characters. The basic syntax is:

title: |
  The Solution to the Problem that Plagued Many People in the Past

  • One line contains the field name followed by a colon, space and the pipe symbol |.
  • One or more subsequent lines contain the text. Each line must be indented by 2 to 4 spaces relative to the field’s name.
  • No quotation marks are used. Any added quotation marks will be treated as part of the text itself.

BAD

title: | A longish
    phrase

title: Another longish
    phrase

title: |
    Value spread on several lines
but I forgot indentation

title: |
    "This wasn't meant to have
    quotation marks!"

GOOD

title: |
  A longish phrase

title: |
  Another longish phrase

title: |
  Value spread on several lines
  indented properly

title: |
  With "quotation" marks and colons:
  no problem!
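
Since abstracts are the typical use case mentioned above, here is a sketch of one entered as a text block. The abstract field name follows that mention, and the text itself is invented for illustration.

```yaml
# Hypothetical abstract, entered as a text block
abstract: |
  This paper argues that pleasure is neither a friend nor a foe.
  It contains a colon: and "quotation marks", which need no
  special treatment inside a text block.
```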

Even though text block values can occupy multiple lines, they are not a way to specify line breaks. For instance, the following:

title: |
  A long title that would without a doubt
  end up occupying more than one line

is equivalent to:

title: A long title that would without a doubt end up occupying more than one line

Using the former is not a way to force the template engine to break the title at that line.

In general, you shouldn’t enter forced line breaks in titles or abstracts anyway. Your journal’s template may allow you to enter manual linebreaks for a better look on the cover page or chapter page. These are not entered in the title field but in separate title fields. (For instance, Dialectica uses title-cover.) See the instructions specific to your journal.

8.4 How to spot syntax errors in your YAML metadata block

Syntax errors in the metadata block typically show up in one of three ways:

  • Smart editors like VSCode or RStudio color the metadata block to distinguish keys and values, and sometimes underline in red the places where the editor thinks you made a mistake.
  • You get a “YAML Parse error” when creating outputs. The message states the line where an error was encountered (the mistake will be on that line or one of the previous ones).
  • The metadata block is printed out in your output formats instead of being processed. This means it’s not even recognized as a metadata block at all: check that it begins and ends with --- lines.

8.5 Linebreaks: genuine, merely visual and explicit

Can you spot the error in this metadata block?

title: The Formalization of arguments with various
applications
author: Robert Michels

The problem is that the title is spread over two lines: it should either be all on one line or use the text block syntax (|).
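
Either fix can be sketched as follows. First, the whole title on one (genuine) line:

```yaml
title: The Formalization of arguments with various applications
author: Robert Michels
```

Second, the text block syntax of Section 8.3, which lets the title span two lines in the file:

```yaml
title: |
  The Formalization of arguments with various
  applications
author: Robert Michels
```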

To avoid such errors it’s important to distinguish:

  1. Genuine line breaks in your markdown file,
  2. Merely visual line wrapping, and
  3. Explicit line breaks to be printed out in your output

And, to distinguish these, you need a text editor that displays line numbers—smart editors like VSCode, RStudio, Sublime etc. all do.

A markdown file, like any text file, is simply a long string of characters. Most of them are letters, numbers and symbols. But some are linebreak characters: they denote the end of a line rather than a letter, number or symbol. Using ‘↩’ to represent this special character, a simple text file could be:

now then, let's go out↩to enjoy the snow... until↩I slip and fall!

This is how the computer ‘sees’ it. For us, though, it’s hard to read. So, your text editor displays these linebreak characters by starting a new line instead, which it indicates with a new line number:

 1 now then, let's go out
 2 to enjoy the snow... until
 3 I slip and fall!

These are genuine line breaks, in the sense that they reflect what’s actually in the file.

There is no limit to the length of (genuine) lines in a file. For instance, a two-line file may be:

this file starts with a very very very very very very very very very very very very long line followed by↩a short one

The first line of this file is likely to be longer than the width of your text editor. If so, your editor faces a dilemma: should it show it to you as one line, in which case you’ll need to scroll to the right to read it? Or should it “wrap” the line, i.e. split it when it reaches the right end of the editor’s window, in which case what you see won’t exactly correspond to what’s in the file? Editors normally offer both options—usually in a menu called “View”: Wrapping, or No Wrapping.

But editors with line numbers give you the best compromise: they only show line numbers where there are genuine linebreaks. Without wrapping, they show you the file above like this:

1 this file starts with a very very very very very very very very very very very very long line followed by
2 a short one

So you see that there are only two lines; the first is long and probably overflows your editor’s window to the right. In Wrapping mode, depending on window size they might show the file as follows:

1 this file starts with a very very very very very very very very 
  very very very very long line followed by
2 a short one

You can see that the second displayed line doesn’t follow a genuine line break in the file, because it doesn’t have a line number. It’s really the rest of the first line, merely displayed below it so you don’t have to scroll right to see it. By contrast, “a short one” is a genuine new line.

Back to metadata blocks. In an editor with line numbers, you can see the difference between:

1 ---
2 title: The Formalization of arguments with various
  applications
3 author: Robert Michels
4 ---

and:

1 ---
2 title: The Formalization of arguments with various
3 applications
4 author: Robert Michels
5 ---

Both try to provide the title value on one line (they don’t use the | of text blocks). The first one is correct. As the line numbers show, the title is indeed given on one line (line 2). The editor merely displays “applications” on a subsequent line for our convenience. What’s really in the file, what the computer ‘sees’, is just one line, so there’s no error. In the second, however, we have mistakenly entered a genuine linebreak between “various” and “applications”, as shown by the new line number 3. So, what the computer reads is:

  • the title is “The Formalization of arguments with various”
  • then there’s a line “applications”. It’s not a field name, because it’s not followed by a colon (it’s not: applications: ...). So, what is it?

And you get a “YAML Parse error” message, which is to say that the computer doesn’t understand the structure of your metadata block.

Genuine line breaks in your markdown files are not explicit line breaks, however. Recall the file containing Bashō’s haiku:

now then, let's go out↩to enjoy the snow... until↩I slip and fall!

It has two genuine line breaks, and hence three lines, which your text editor will show thus:

 1 now then, let's go out
 2 to enjoy the snow... until
 3 I slip and fall!

In markdown, however, linebreaks are just treated as spaces. So, if you create an output from this file, you’ll get a single paragraph:

now then, let’s go out to enjoy the snow… until I slip and fall!

When turning markdown into PDF or HTML output, there’s no attempt to match the lines in your markdown file. That’s good because you normally don’t want to manually decide where to break the lines. The only things you need to enter manually are the paragraph divisions, and these are entered as two successive linebreaks, i.e. an empty line:

1 This is a paragraph containing one or more
2 sentences. Line breaks in the markdown file are
3 treated as spaces, so the final output may
4 break lines elsewhere than shown here.
5
6 However, it will treat this as a second paragraph
7 because it is separated from the previous
8 by a (genuine) empty line, i.e. two successive
9 linebreaks. 

In some cases though, like poetry verses, you want to specify explicit line breaks that aren’t paragraphs. These are entered by ‘escaping’ the linebreak character, i.e. having a backslash followed by the linebreak character. In the text editor:

 1 now then, let's go out\
 2 to enjoy the snow... until\
 3 I slip and fall!

Which corresponds to the following file:

now then, let's go out\↩to enjoy the snow... until\↩I slip and fall!

In markdown, escaped characters are reproduced as they are. Thus an escaped linebreak is reproduced as an explicit linebreak in your output. Beware that the \ should be immediately followed by the end of the line, not a space: otherwise it’s just an explicit space and all you’ll see in your output is an extra space.

In summary:

  1. Merely visual linebreaks are displayed lines without a line number in your editor. They correspond to nothing real in the file; they’re just a convenient way of displaying a long line.
  2. Genuine linebreaks are linebreak characters in the file, corresponding to new numbered lines in your editor. They matter in the metadata block. In the main text, they are treated as spaces, so you can break your lines wherever you like. The exception is two linebreaks in a row, i.e. an empty line, which means a new paragraph.
  3. Explicit linebreaks are entered as a backslash \ at the very end of a line. They represent forced linebreaks that must be reproduced in the output.

Terminological note: hard vs soft line wrapping. You may encounter the terms “hard” and “soft” line wrapping. Here’s what they mean.

  • Hard line wrapping is entering genuine linebreaks. For instance, your editor may offer the option of reflowing your text with linebreaks at a certain fixed length. This will normally be in the “Edit” menu, as it changes the file itself.
  • Soft line wrapping is merely displaying long lines on multiple lines. Text editors offer a soft wrapping option, which allows you to view long lines without having to scroll horizontally. This will normally be in the “View” menu, as it doesn’t modify the file itself but only how you view it.

General comment. The beauty of working with text files such as markdown files is that there is nothing hidden: what you see in the editor is exactly what there is in the file. The only thing that’s not immediately obvious is the genuine linebreak characters. Now that you know how to use an editor’s line numbers to identify them, markdown files hold no secrets for you.

To see what we mean by ‘hidden’, compare with Word files. When you see an italicized word fortiori in a MS Word document, what there actually is in your .docx file is a complicated bit of code such as this:

<w:r><w:rPr><w:i/><w:iCs/><w:lang w:val="en-US"/></w:rPr><w:t>fortiori</w:t></w:r>

Because you can’t see what’s going on ‘under the hood’ when you modify a file in MS Word, it’s virtually impossible to do proper typesetting. For that you’d need to see the file as it is; but as the example above shows, such files are virtually impossible to read when seen that way. Markdown gives us the benefit of working with a humanly readable file and at the same time working with the file exactly as the computer ‘sees’ it.

8.6 List and map values

In addition to simple values like a piece of text or a number, a metadata field’s value can be a list or a map.

8.6.1 Lists

A list is a sequence of values. For instance, the keywords field is a list of text values:

keywords:
- pleasure
- pain
- life
- death

Each item in the list is on a separate line, starting with a dash (-). The dashes are aligned with the field’s name (keywords here). Quotation marks can be used, as with one-line text values (see Section 8.2.1).

An alternative, shorter way to enter a list of one-line values is to use square brackets and commas, as in:

keywords: [pleasure, pain, life, death]

That is equivalent to the above; again quotation marks can be used. It cannot be used for more complex list values such as text blocks or maps.
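
Quotation marks matter more in the bracket form, because a comma inside an item would otherwise be read as a separator between items. A minimal sketch, with keywords invented for illustration:

```yaml
# Hypothetical keywords: items containing a comma or a colon are quoted
keywords: ["pleasure, bodily", pain, "life: a puzzle", death]
```

This list has four items; without the quotation marks, “pleasure” and “bodily” would be read as two separate keywords.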

List items can be text blocks too, and you can mix text blocks and one-line values:

keywords:
- |
  A we"ird key'word with:symbols
- |
  A keyword that I would like
  to enter on multiple lines
- this item is just a line

A list item can also be a map. Let’s explain what these are first.

8.6.2 Maps

A map or dictionary is a set of field-value pairs. The metadata block itself is a map:

title: The Meaning of Life
author: Jane E. Doe
bibliography: references.bib

But a field’s value can itself be a map. We indicate this by indenting the map under the field’s name. This is typically used to specify options for a template extension like Imagify (used to turn LaTeX into images):

title: Yet Another Mistaken Theory
author: Jane E. Doe
imagify:
  scope: all
  output_folder: _imagify_outputs
bibliography: references.bib

Here imagify’s value is a map with two fields: scope and output_folder. Note how these entries are indented and below imagify. By contrast, bibliography is aligned completely to the left, with title, author and imagify, so it’s a main field, not a sub-field of imagify.

Submaps can contain sub-submaps and so on:

field1:
  subfield1:
    subsubfield1: value
    subsubfield2: value
  subfield2: value
field2: value

Any of the values can be a list. If it is, align the list’s dashes with the field’s name. For instance, we can make the value of subsubfield1 a list of two values as follows:

field1:
  subfield1:
    subsubfield1:
    - first value
    - second value
    subsubfield2: value
  subfield2: value
field2: value

A map cannot have two fields with the same name:

BAD

author: Jane Smith
author: John Doe

However, the same field name may appear in distinct maps, including a submap and the main map, or distinct maps in a list of maps.
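
If the intent of the BAD example was to credit two authors, one way to fix it is a single author field whose value is a list (Section 8.6.1). Whether your template expects plain names, as assumed in this sketch, or maps with further details (Section 8.6.3) depends on the template.

```yaml
# One author field whose value is a list of two names
author:
- Jane Smith
- John Doe
```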

8.6.3 Maps in lists

A list item can be a map. The author field is a case in point:

author: 
- name: Vincent Conitzer
  email: conitzer@cs.duke.edu
  correspondence: true  
  institute: Duke University
- name: Nadine Elzein
  email: nadine.elzein@warwick.ac.uk
  institute: University of Warwick

Let’s break this down. Our author field is a list of two values:

author:
- first value
- second value

Each of these values is itself a map. The second value is the following map, for instance:

name: Nadine Elzein
email: nadine.elzein@warwick.ac.uk
institute: University of Warwick

When we insert it in the list, we make sure we indent the map so that each of its fields is aligned to the right of the -:

author:
- first value
- name: Nadine Elzein
  email: nadine.elzein@warwick.ac.uk
  institute: University of Warwick

Note how name, email and institute are all aligned, and to the right of the second dash. Doing the same with the map corresponding to the first value we obtain the desired result:

author: 
- name: Vincent Conitzer
  email: conitzer@cs.duke.edu
  correspondence: true  
  institute: Duke University
- name: Nadine Elzein
  email: nadine.elzein@warwick.ac.uk
  institute: University of Warwick

Note how each of the fields name, email and institute appears twice, but in distinct maps.

Values in maps can be lists too, as we said. The institute field is a case in point:

author:
- name: Susan Stebbing
  institute: 
  - "King's College London"
  - Bedford College, London
- name: Someone Else
  institute: Some University
- name: Third Person
  institute: Some College

Here author is a list with three values, corresponding to the left-most dashes. Each is a map with two fields, name and institute. Note how the dashes in front of “King’s College” and “Bedford College” are aligned with the institute field name.

8.6.4 Lists in lists

Though we haven’t found a use for them in copyediting metadata, lists whose values are lists are also possible:

mylistoflists:
- - item1
  - item2
- - item3
  - item4

Here mylistoflists is a list with two values; the first value is itself a list with two values (item1, item2), and the second is another list with two values (item3, item4).
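
To recap the chapter, the sketch below combines the value types covered above in a single metadata block. The values are recycled from earlier examples or invented as placeholders, and the abstract field is an assumption about what your template supports.

```yaml
---
# One-line text value, quoted because it contains a colon
title: "My Life: A Long Story"
# List of maps; the first author's institute is itself a list
author:
- name: Jane E. Doe
  correspondence: true
  institute:
  - Some University
  - Some College
- name: John Doe
  institute: Some University
# List in bracket form
keywords: [pleasure, pain, life, death]
# Text block value (placeholder text)
abstract: |
  A placeholder abstract with "quotation marks" and a colon:
  neither needs special treatment here.
# Number value
volume: 75
# Map value
imagify:
  scope: all
  output_folder: _imagify_outputs
bibliography: references.bib
---
```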