Core Concepts

The Antora Collector extension discovers, allocates, and invokes Collector instances. A Collector instance performs designated actions (such as clean, run, and scan) in the context of a worktree. The run action allows external commands, local or system-wide, to be invoked. The scan action finds and imports files, generated or otherwise, from the local filesystem.

This page explores these concepts to give you a broad view of how Collector works and what you can accomplish with it.

Scope

The Antora Collector extension provides a configuration-based approach to contributing additional files to a component version bucket (which is destined to become a component version) using external commands.

If more advanced capability is needed, a custom Antora extension could be used to add, update, or delete files associated with a component version. Thus, the Antora Collector extension doesn’t do anything that another extension couldn’t do itself. It’s just an Antora extension that provides an approachable way to extend the capabilities of Antora.

When does this extension run?

The Antora Collector extension orchestrates Collector instances during the contentAggregated event. At this stage in Antora’s pipeline, Antora has aggregated the files it has discovered into buckets by component version. However, Antora has yet to classify these component versions and files into the content catalog. Thus, the Antora Collector extension acts on component version buckets (collectively the content aggregate) for the purpose of augmenting the buckets with additional files or metadata.

What is a Collector instance?

A Collector instance consists of a sequence of steps that perform designated actions (such as clean, run, and scan) in the context of a worktree. Each step must declare a clean, run, or scan action, or any combination of these. The actions for a step are performed in the following set order: clean, run, scan. A Collector instance with a single step can be abbreviated in the configuration as a map.

The first step can configure how the worktree for the Collector instance is allocated. The worktree is always a directory on the local filesystem. If the origin already has a worktree, Collector will reuse that worktree by default. If the origin does not have a worktree, Collector will check out the git reference into a temporary directory by default, thus allocating a worktree automatically. The preparation of the worktree is configurable.

Since a Collector instance is run in the worktree for an origin, it’s able to see all files in that worktree, even those that Antora did not scan and import into the component version bucket. This allows the Collector instance to reorganize and import files stored elsewhere in the worktree (i.e., git repository branch), even those that are not in the Antora content structure.

What is the purpose of a Collector instance?

Typically, the goal of a Collector instance is to generate files and import those files into the component version bucket. The bucket for the component version from which the Collector instance is run is the default target, though the target can be any component version bucket, including a new one. A Collector instance also has the ability to alter the identity of the current component version bucket by generating an antora.yml file.

Where is a Collector instance defined?

A Collector instance is defined per content source root in an antora.yml file. A content source root is the location of the antora.yml file, also referred to as an origin.

A Collector instance is defined and configured using the collector key, which must itself be defined under the ext key.
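
As a minimal sketch, an antora.yml file that defines a single-step Collector instance might look like the following. The component name, version, script, and directory are hypothetical placeholders. The command and dir keys reflect our understanding of how the run and scan actions are configured; verify them against the reference for your Collector version.

```yaml
# antora.yml at the content source root (the origin)
name: my-project                     # hypothetical component name
version: '1.0'                       # hypothetical version
ext:
  collector:                         # single step, abbreviated as a map
    run:
      command: ./generate-docs.sh    # hypothetical script stored in the worktree
    scan:
      dir: build/generated-docs      # hypothetical directory the script writes to
```

Because the collector key lives in antora.yml rather than in the playbook, the configuration travels with the content source root it applies to.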

There can only be a single Collector instance per origin, though that instance may perform multiple steps. While this often equates to a single Collector instance per component version, this is not always the case. A distributed component version is able to define multiple Collector instances (since a distributed component version has multiple origins).
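
To illustrate the distributed case, the sketch below shows two antora.yml files, kept in two separate repositories, that contribute to the same component version. Each origin defines its own Collector instance. The repository layout, names, scripts, and directories are all hypothetical, and the two YAML documents are shown together only for comparison.

```yaml
# Repository A: antora.yml (first origin, hypothetical)
name: my-project
version: '1.0'
ext:
  collector:
    run:
      command: ./generate-api-docs.sh    # hypothetical script in repository A
    scan:
      dir: build/api-docs
---
# Repository B: antora.yml (second origin, hypothetical)
# Same name and version, so both origins feed the same component version.
name: my-project
version: '1.0'
ext:
  collector:
    run:
      command: ./generate-cli-docs.sh    # hypothetical script in repository B
    scan:
      dir: build/cli-docs
```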

How is a Collector instance configured?

A Collector instance is configured by specifying groups of actions to perform. Each group becomes a step, and the steps are carried out in the order they are defined. Within a step, the actions are always performed in a set order: clean, run, scan.
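
The ordering rules can be sketched with a two-step Collector instance written as a list. The scripts and directories are hypothetical, and the dir key on the clean action is an assumption to confirm in the clean action reference.

```yaml
ext:
  collector:
  # Step 1: actions run as clean, then run, then scan,
  # regardless of the order in which the keys are written.
  - scan:
      dir: build/api-docs                # performed last within this step
    clean:
      dir: build/api-docs                # performed first within this step
    run:
      command: ./generate-api-docs.sh    # hypothetical script, performed second
  # Step 2: starts only after step 1 has finished.
  - run:
      command: ./generate-examples.sh    # hypothetical script
    scan:
      dir: build/examples
```

Splitting work into separate steps is useful when a later command depends on the output of an earlier one, since the order within a single step cannot be changed.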

The worktree in which the Collector instance runs can be configured using the worktree key. This key controls the predefined worktree action, which always runs at the beginning of the first step.

What happens after a Collector instance has run?

Once all Collector instances have run, the imported files are classified exactly as they would have been had they been discovered in the content source root. A Collector instance is essentially adding a dynamic component to Antora’s content aggregator, allowing files to be generated at runtime instead of stored in a git repository.

If a worktree was created for the Collector instance, it’s removed automatically unless configured otherwise.

Portability considerations

When defining a Collector instance, it’s important to consider the portability of the steps. Ideally, a Collector instance should not depend on system-wide commands or files, preferring instead to use files available within the worktree or retrieved from the network. The one exception is the use of Node.js. A Collector instance is able to discover and run the Node.js binary that was used to launch Antora.
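
The difference can be sketched by comparing commands. The example below assumes a hypothetical project that ships a Gradle wrapper and a Node.js script in its repository; the task name, script name, and directories are placeholders. It also assumes that a command beginning with node is resolved to the Node.js binary that launched Antora, so confirm the exact syntax in the run action reference.

```yaml
ext:
  collector:
  # Less portable: depends on Gradle being installed system-wide.
  # - run:
  #     command: gradle generateDocs
  # More portable: uses the wrapper script committed to the worktree.
  - run:
      command: ./gradlew generateDocs    # hypothetical task name
    scan:
      dir: build/generated-docs
  # Also portable: a Node.js script stored in the worktree.
  - run:
      command: node generate-docs.js     # hypothetical script name
    scan:
      dir: build/node-docs
```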

You may also want to ensure that the site can still be built successfully when the Collector extension is not in use. This will make it easier for authors to preview their work locally.