Configure an Instance of Collector

A Collector instance is created by defining collector configuration in a component version descriptor (antora.yml). The configuration is defined using the collector key under the ext key in that file.

The ext key is the designated area in the component version descriptor for extensions to use to define additional configuration.

A Collector instance is associated with the content source root, aka origin, in which it is defined. The instance is then invoked within the worktree allocated for that origin. In other words, the actions that the Collector instance performs are carried out in a worktree that corresponds to the git reference for that origin.

collector key

The collector key in the component version descriptor defines the configuration for the steps that are carried out on the origin (a git reference) in which the component version descriptor is found. Each step consists of any number of clean, run, and scan actions. The first step may also include the worktree configuration.

Here’s an example of a component version descriptor that configures the collector extension.

antora.yml
name: colorado
title: Colorado
version: '5.6.0'
nav:
- modules/ROOT/nav.adoc
ext:
  collector: (1)
  - clean: (2)
      dir: build/generated (3)
    run: (4)
      command: ./gradlew --console rich generateContent (5)
    scan: (6)
    - dir: build/generated (7)
      files: '**/*.adoc' (8)
  - scan:
      dir: src/test/java/org/example
      into: modules/ROOT/examples
1 collector key
2 clean key
3 dir key that specifies the directory to clean
4 run key
5 command key that specifies the command to run
6 scan key
7 dir key that specifies the directory to scan
8 files key that provides a micromatch pattern of files to scan

The value of the collector key can be a map or an array. If the value is an array (i.e., a list of entries), each entry must be a map consisting of built-in key-value pairs. If the value is a map (i.e., the leading - marker is dropped), it’s assumed to be a single-entry array.
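The map-or-array rule can be pictured as a small normalization step. The following is a minimal sketch of that idea, not Collector's actual implementation; the helper name toEntries and the sample commands are hypothetical.

```javascript
// Hypothetical helper: reads a collector value as a list of entries.
function toEntries (collector) {
  // A falsy value means there is nothing to configure.
  if (!collector) return []
  // An array is already a list of entries; a map is treated as a single-entry array.
  return Array.isArray(collector) ? collector : [collector]
}

// Map form (the leading - marker is dropped in YAML)
const asMap = { run: { command: './gradlew generateContent' }, scan: { dir: 'build/generated' } }
// Array form containing the same single entry
const asArray = [{ run: { command: './gradlew generateContent' }, scan: { dir: 'build/generated' } }]

console.log(JSON.stringify(toEntries(asMap)) === JSON.stringify(toEntries(asArray))) // prints true
```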

Acceptable map keys are worktree, clean, run, and scan. The worktree key is only permitted in the first entry. All entries may contain any combination of the clean, run, and scan keys.
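To make the key rules concrete, here is a minimal sketch of a check that flags keys placed where they aren't permitted. The function findInvalidKeys is hypothetical and only illustrates the rules stated above; how Collector itself reacts to a misplaced key is not shown here, and the worktree value is a placeholder.

```javascript
// Keys that any entry may contain
const STEP_KEYS = ['clean', 'run', 'scan']

// Hypothetical validator: returns a description of each key that breaks the rules.
function findInvalidKeys (entries) {
  const problems = []
  entries.forEach((entry, index) => {
    for (const key of Object.keys(entry)) {
      if (STEP_KEYS.includes(key)) continue
      // worktree is only acceptable in the first entry
      if (key === 'worktree' && index === 0) continue
      problems.push(`entry ${index + 1}: ${key}`)
    }
  })
  return problems
}

const entries = [
  { worktree: true, run: { command: './gradlew generateContent' } },
  { worktree: true, scan: { dir: 'build/generated' } }, // worktree outside the first entry
]

console.log(findInvalidKeys(entries)) // prints [ 'entry 2: worktree' ]
```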

Each key also accepts a primitive value (i.e., a String or Boolean), which is a shorthand form intended for quick configuration.
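One way to think about the shorthand is as an expansion into the longer map form. The sketch below assumes that a String value stands for the dir key of clean and scan and for the command key of run; that mapping is an assumption made for illustration, so check the reference for each key before relying on it. The helper expandShorthand is hypothetical.

```javascript
// Assumption: the key that a String shorthand value is taken to represent.
const PRIMARY_KEY = { clean: 'dir', run: 'command', scan: 'dir' }

// Hypothetical helper, for illustration only: expands String shorthand into map form.
function expandShorthand (entry) {
  const expanded = {}
  for (const [key, value] of Object.entries(entry)) {
    const primaryKey = PRIMARY_KEY[key]
    // Only String values for known action keys are expanded; everything else is kept as is.
    expanded[key] = primaryKey && typeof value === 'string' ? { [primaryKey]: value } : value
  }
  return expanded
}

const shorthand = {
  clean: 'build/generated',
  run: './gradlew generateContent',
  scan: 'build/generated',
}

console.log(expandShorthand(shorthand))
```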

The worktree configuration is always processed first. Then, for each step, the extension runs the actions in the following order: clean, run, scan. Therefore, it’s customary to define the keys in this order as well.
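The processing order can be sketched as follows. This is an illustration of the ordering described above rather than the extension's own code; the function planActions is hypothetical, and the keys in the sample entries are deliberately written out of order to show that the order of the keys doesn't affect the order of the actions.

```javascript
// Order in which the actions of a single step are carried out
const ACTION_ORDER = ['clean', 'run', 'scan']

// Hypothetical helper: lists what happens, and in which order, for a list of entries.
function planActions (entries) {
  const plan = []
  // The worktree configuration, if present on the first entry, comes before any step.
  if (entries[0] && 'worktree' in entries[0]) plan.push('worktree')
  entries.forEach((entry, index) => {
    for (const action of ACTION_ORDER) {
      if (action in entry) plan.push(`step ${index + 1}: ${action}`)
    }
  })
  return plan
}

const entries = [
  {
    scan: { dir: 'build/generated' },
    run: { command: './gradlew generateContent' },
    worktree: true,
    clean: { dir: 'build/generated' },
  },
  { scan: { dir: 'src/test/java/org/example', into: 'modules/ROOT/examples' } },
]

console.log(planActions(entries).join(' -> '))
// prints worktree -> step 1: clean -> step 1: run -> step 1: scan -> step 2: scan
```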

If the collector key isn’t set in a component version descriptor, or its value is falsy, the extension won’t run on that origin.

Programmatic configuration

Instead of putting the configuration for your Collector instances directly into antora.yml files, it’s possible to configure Collector programmatically. To do so, you’d use another Antora extension that listens for the contentAggregated event and injects the configuration into the runtime component version descriptor for a given origin. The runtime component version descriptor is accessible via the descriptor property on each origin of a component version bucket in the content aggregate.

Let’s consider a hypothetical Node.js script named generate-files.js that generates files into the build/generated-files directory, which Collector can then import into the bucket. We’ll only run this command on branches, though you could apply any filter, even one configured by the user.

module.exports.register = function () {
  this.once('contentAggregated', ({ contentAggregate }) => {
    // Each entry in the content aggregate is a component version bucket
    for (const { origins } of contentAggregate) {
      for (const origin of origins) {
        // Only configure Collector on branches
        if (origin.reftype !== 'branch') continue
        const collector = {
          run: {
            command: 'generate-files.js',
            dir: '.',
          },
          scan: './build/generated-files',
        }
        // Set the collector key under ext, creating ext if it doesn't exist yet
        Object.assign((origin.descriptor.ext ??= {}), { collector })
      }
    }
  })
}

The value of the collector property is exactly the JavaScript equivalent of the YAML-based configuration described in this section.

The benefit of the programmatic approach is that you can modify the configuration across all origins from a central location. This strategy avoids having to update antora.yml files across numerous origins (i.e., git references and start paths). The two approaches can also be combined. For example, you could use an extension to alter the configuration defined in an antora.yml file when that file has fallen out of date or needs environment-specific settings.