Refname Matching in Content Sources

At the heart of Antora is the gathering of content stored in references across various git repositories. The playbook provides several keys to instruct Antora which reference names (i.e., refnames) to consider, including branches, tags, and worktrees, and which start paths within those references to scan. One way to configure this filter is to list each refname individually. However, since content often moves very quickly, that approach can be burdensome and static. That’s why Antora provides a facility to include and exclude refnames in bulk using pattern matching. The pattern matching approach has the benefit of simplifying the configuration and automatically discovering new refnames as they become available.

This page describes the value types and syntax you can use for matching refnames in a content source.

Value types

The keys covered on this page accept two kinds of values:

  • a string (i.e., a character sequence)

  • an array of strings

A string value will be split a commas, preferably followed by space, if present. For example, the string v1.0.x, v2.0.x will become an array of two strings, v1.0.x and v2.0.x.

It’s usually best to enclose the string values in single quotes to avoid conflicts between characters in the value and special syntax in YAML (or your chosen configuration language).

If you’re matching refnames using patterns, we strongly encourage you to use the array syntax. The single or comma-separated string syntax is intended to be used for exact matching only. Otherwise, you might get unexpected behavior.

Exact match

The simplest approach to matching refnames is to specify them as exact names. For example, you can match a single branch named main as follows:

branches: main

If you want to also add branch names for older release lines, you can separate them by commas:

branches: v1.0.x, v2.0.x, main

You can also express the value as an array to make it explicit:

branches: [v1.0.x, v2.0.x, main]

When we say “exact match”, we’re referring to a match against the shortname of the reference. You aren’t matching remotes/origin/v1.0.x or heads/v1.0.x, for instance, but rather v1.0.x. The reason for this is that Antora looks for both remote and local references, preferring the local references, and only selects one per unique shortname.

While it appears that the exact match is a literal value, it is, in fact, a pattern. And a pattern always matches from the beginning to the end of the refname. However, most refnames don’t contain any characters which have meaning in the pattern matching syntax (not even .). So it acts like a literal value that looks for a refname that matches exactly. But keep the fact that it’s a pattern in the back of your mind if you find that it is not matching the refname you expect it to match.

Wildcards (basic globbing)

Having to maintain a list of exact refnames in your playbook can be tedious and noisy. That’s why Antora allows you to match refnames in bulk using pattern matching.

The most basic pattern matching tool is the wildcard (*). The use of this pattern is often referred to as globbing, since you use it to capture a glob of items.

You can glob all refnames that exist using a lone wildcard:

branches: '*'

But it’s rare that you’ll want to do that. Let’s say, instead, that we want to match all refnames that start with v. We can do that by placing an asterisk after the letter.

branches: v*

The * means “any number of characters”. In this case, it matches any number of characters the follow a v at the start of the rename.

You can also use the wildcard between two parts of a string to match any number of characters between the two parts. Let’s use it to match version numbers more precisely.

branches: v*.*.x

This pattern will match v1.0.x, v2.0.x, and even v20.10.x. However, it will only match numbers if the refnames themselves only have number. That means it could also match very.last.x. While we’ll be able to address that problem later when we get into more advanced patterns, it does bring us to the problem of overmatching and the need for exclusions.

Exclusions

So far we’ve talked about refnames we want to match, or inclusions. When you start using patterns, you can run into the problem of matching too many items. You can take away from previously matched items using one or more exclusion. An exclusion entry always begins with !.

Let’s say you want to match all version-like refnames, but you want to exclude the version before you started using Antora.

branches: [v*.*.x, '!v1.0.x']

Notice that we’ve switched the value to the array syntax, as recommended. It’s also necessary to enclose an exclusion entry in single quotes so it does not confuse the YAML parser. When in doubt, enclose string values in single quotes. It never hurts to do it.

Let’s throw away that non-version refname that we inadvertently matched:

branches: [v*.*.x, '!v1.0.x', '!very.last.x']

You can also use wildcards in the exclusion pattern to bulk match what you’ve already matched and remove those matches.

branches: [v*.*.x, '!v1.*.x', '!very.last.x']

The wildcard is a useful tool, but it’s very loose with what characters it matches. You may find that you need to get more precise. That’s where braces come in.

Braces

Braces, also known as brace expressions, are patterns enclosed in curly braces ({}). There are three kinds of brace expressions supported in Antora:

  • a comma-separated list of alternate character sequences

  • an alpha or numeric range

  • a numeric range with a step

Alternation

A comma-separated list of alternate characters, such as {this,that} should be read as “this or that”. This can be useful for matching specific numbers in a version. Let’s assume that we only want to match a very limited number of major version refnames. We can identify them using an alternation brace expression:

branches: v{5,6}.*.x

This expression will match v5.0.x, v5.1.x, and v6.0.x.

You can use an alternation expression to match the absense of a character or segment using an empty entry. Let’s consider of making the v prefix optional.

branches: '{,v}{5,6}.*.x'

You can also use wildcards in the alternation entry. For example, we might want to match prereleases this way:

branches: '{,v}{5,6}.*.x{,-*}'

Let’s consider another case of matching specific minor versions for a given major version refname:

branches: v5.{7,8}.x

If the refname does not start with a v, then we need to escape the first period by enclosing it in round brackets.

branches: 5(.){7,8}.x
If the pattern begins with a number followed by a period, and the pattern contains at least one brace expression, the period must be enclosed in round brackets (()) (e.g., 5(.){7,8}.x). This requirement is imposed as a result of a parsing bug in the pattern matching library.

As you can imagine, if you’re specifying a bunch of numbers, they may start to form a range. You can consolidate the alternation using the range syntax.

Range

If the characters you are matching are members of a range, you can specify them using the start and end values only and separating them with two periods (..). A range is another kind of alternation in which each item is considered.

Let’s revisit our version pattern so it only considers matches that have numbers.

branches: v{1..9}.{0..9}.x

Now we will no longer match very.last.x. However, it will no longer match v20.10.x either. We can fix that by expanding the range, which is not limited to single-digit numbers.

branches: v{1..99}.{0..99}.x

However, there are more efficient ways of writing this, which we’ll get into in the extended globbing section.

Returning to our other example, let’s say that 5.9.x was just released and we want to add it to our Antora-era version number pattern. We can switch from a basic alternation to a range.

branches: 5(.){7..9}.x

Alternately, we could use an exclusion to express this match the opposite way.

branches: ['5.*.x', '!5.{0..6}.x']

By default, a range considers each item. You can skip over items using steps.

Steps

You can adjust the step size of a range by appending a third parameter, offset by two periods. The step size tells the pattern matcher how far to go when moving to the next item in the range. You can match all even major versions from 2 to 8 using the following pattern:

branches: v{2..8..2}.*.x

By changing the beginning value, you can match odd major versions instead:

branches: v{1..9..2}.*.x

On their own, braces have limited ability to express complex patterns. To make brace expressions truly powerful, you need to combine them with extended globbing.

Extended globbing and repetition

So far we’ve been matching single occurrences of segments, whether they’re character sequences, alternations, or ranges. We can take our patterns further by specifying how many times these segments must occur, if at all. An extended glob allows you to enclose a pattern or pattern list inside a pair of round brackets, then assign a repetition operator to it.

The pattern matching in Antora supports the following operators:

  • * - zero or more times

  • + - one or more times (i.e., at least once)

  • ? - zero or one time (i.e., optional)

  • @ - exactly once (implied if no operator is specified)

  • ! - must not be present

The following extended glob is a more formal way of writing {0..9}:

@({0..9})

Here’s an extended glob that matches any sequence of numbers.

*({0..9})

This use of repetition is far more efficient than the following range, which you should avoid using:

{0..100}
While you can put * after the pattern list, we don’t recommend it. Repetition operations should always be placed before the pattern list.

We can use extended globbing make our version matcher precise in matching all minor versions refnames, beginning with v1.0.0:

branches: v@({1..9})*({0..9}).+({0..9}).x

We’re saying that the major version must not start with 0. Then, it can be followed by any number of digits (e.g., 1, 10, etc.). The minor version can have one or more digits (e.g., 0, 99, 101, etc.).

We can use the negation operator to exclude while we’re including instead of using a separate exclusion entry:

branches: 5(.)!({0..5}).x

Brace expressions can be nested in order to specify two branches of matching. Here’s how we can exclude those early minor versions while still matching any still to come.

branches: '5(.)({{6..9},{1..9}+({0..9})}).x'

We want to caution against making your pattern too complex. While it may be possible to craft a pattern that can do all the matching you need to do, it becomes increasingly harder to read and maintain. That’s why we encourage you to use as many inclusion and exclusion patterns as you need to comfortably match the refnames for your site.

Let’s wrap up with a full example of matching a very specific range of version numbers with all milestone versions removed.

tags:
- '{5,6}.+({0..9}).+({0..9}){,-*}'
- '!(5).{0..5}.*'
- '!*-M+({0..9})'