Configure a Scan Action

The scan action is the most important part of Collector because it’s what imports additional files or metadata into the content aggregate. (Those files, in turn, get added to the content catalog). The scan action looks for matching files under the specified directory and imports them into a component version bucket. The scan action can also be used to update the metadata in antora.yml, or even change its identity.

You can configure one or more scan actions using the scan key. This page explains how to use the scan key and the keys it accepts.

File import

The files discovered by a scan operation are added to the component version bucket in the content aggregate before the content is classified. If a file being imported already matches the identity of a file in the bucket, the contents and stat of the existing file are updated.

If antora.yml is discovered at the root of the scanned directory (often generated), the contents of that file are parsed. By default, the parsed data is overlaid onto the current component version bucket. That means a generated antora.yml file can be used to update the name, version, and prerelease of the bucket. If the bucket has the prerelease property set, but the imported file is missing this key, the prerelease property is removed from the bucket. To learn how to configure Collector to target a different bucket, creating it if necessary, instead of updating the current one, refer to the section Target a different component version bucket.
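To make the default overlay behavior concrete, here is a sketch of a generated antora.yml file. The file location, component name, and version are hypothetical values chosen for illustration.

```yaml
# build/generated/antora.yml (hypothetical file written by a build step)
name: colorado
version: '5.7.0'
prerelease: true
```

If a scan action discovers this file at the root of the scanned directory, the version of the current bucket would be updated to 5.7.0 and the bucket would be marked as a prerelease. If the prerelease line were omitted, the prerelease property would be removed from the bucket.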

The imported files are indistinguishable from ones discovered in a static Antora content root. It’s as though the discovered files are in the repository branch that Antora scans, only they are added after the fact.

scan key

A scan action is configured using the scan configuration key for the Collector instance. The scan key must be nested under the collector key.

antora.yml

ext:
  collector:
    scan:
      dir: build-foo

If the value of the collector key is an array, the scan key must be specified as a key on an array entry. The scan key may be used in more than one entry in that array.

antora.yml

ext:
  collector:
  - scan:
    - dir: build-foo
    - dir: build-bar
  - scan: build-baz

Here’s a real-world example that shows how to configure a scan action:

antora.yml

name: colorado
title: Colorado
version: '5.6.0'
ext:
  collector:
    scan:
    - dir: build/generated
      files: '**/*.adoc'
      clean: true
    - dir: build/log
      clean: true

The value of the scan key can be a map, an array, or a string. If the value is a string, the value is assumed to be the value of the dir key of a map. If the value is an array (i.e., a list of entries), each entry must be a string or a map consisting of built-in key-value pairs. Each scan entry is invoked sequentially, in the order specified in the array. If the value is a map (i.e., the leading - marker is dropped), it’s assumed to be a single-entry array.
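The three value types can be compared side by side. The sketch below shows three alternative antora.yml fragments, separated by --- for display only. Going by the rules above, they should be equivalent. The directory name is a placeholder.

```yaml
# Form 1: string shorthand (the string is taken as the value of dir)
ext:
  collector:
    scan: build/generated
---
# Form 2: map (treated as a single-entry array)
ext:
  collector:
    scan:
      dir: build/generated
---
# Form 3: array with one map entry
ext:
  collector:
    scan:
    - dir: build/generated
```

Each form configures a single scan of build/generated. Only the array form extends naturally to multiple scan entries.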

Acceptable keys for the map value are listed in the table below.

Key	Default	Type	Description
`dir^*`	undefined	String (absolute path, path relative to start path if starts with `./`, or path relative to content source root)	The directory to scan.
`files`	**/*	String (micromatch pattern)	The pattern to use for filtering files.
`into`	empty	String (a base path) or Map	The base path to prepend to imported files or a destination component version.
`clean`	false	Boolean	Whether to clean the scan directory before running commands.

* required

dir key

The dir key specifies a directory to scan. This is the only required key for a scan action. The scan typically looks for files generated by a command that was run previously, though it can be used to discover any file in the worktree. The extension then imports any files discovered into the current bucket in the content aggregate.

The value of the dir key is a string path for a single directory. If the path is absolute, that value is used as is. If the path is . or starts with ./, it’s resolved starting from the start path (i.e., the root of the current origin). Otherwise, the path is resolved relative to the worktree directory.
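The resolution rules above can be illustrated with one scan entry per case. This is only a sketch, and all three paths are hypothetical.

```yaml
scan:
# absolute path: used as is
- dir: /tmp/generated-docs
# starts with ./ so it is resolved from the start path of the current origin
- dir: ./build/generated
# any other relative path: resolved relative to the worktree directory
- dir: build/generated
```

The second and third entries point to the same directory only when the start path is the root of the worktree.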

files key

The files key specifies which files to import from the scan directory. The files key accepts a micromatch pattern (i.e., glob), supporting the same syntax as the worktrees key on a content source in Antora.

scan:
  dir: build
  files: '**/*.adoc'

By default, all files found in the scan directory are imported (though not necessarily classified by Antora).
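To import more than one file type without importing everything, the pattern can use brace expansion, which micromatch supports. The following is a sketch with a hypothetical choice of extensions.

```yaml
scan:
  dir: build
  # import AsciiDoc files and SVG images at any depth; skip all other files
  files: '**/*.{adoc,svg}'
```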

into key

For each file discovered by a scan (other than antora.yml), a virtual file is added to the bucket in the content aggregate for the current component and version (as specified in antora.yml) by default. The file is added using the path relative to the scan dir. Thus, there’s an assumption that the scanned files are organized according to the standard Antora structure (e.g., modules/ROOT/pages/generated.adoc). The into key allows this assumption to be broken.

If the files are not organized as an Antora content root, they can be remapped using the scan configuration. The into key specifies a base directory path to prepend to the relative virtual path of every file discovered by the current scan operation. This path should be a relative directory (e.g., modules/ROOT/pages) and should use / to separate path segments, since it represents a virtual path. This key is not set by default.

For example:

scan:
  dir: build/pages
  into: modules/ROOT/pages

The into key accepts a base path as a string or a map. When the value is a map, it can also be used to specify the target component name and/or version.

By using the into key, the scanned files do not have to be organized according to the standard Antora structure. Instead, the relative path is synthetic.

Target a different component version bucket

The into key also provides a way to import the files into a component version bucket that’s different from the one in which Collector is running. To do so, the value of the into key must be a map.

Acceptable keys when the value of the into key is a map are listed in the table below.

Key	Default	Type	Description
`name`	undefined	String	The name of the target component version.
`version`	undefined	String	The version of the target component version.
`dir`	empty	String (a base path)	The base path to prepend to imported files.

The name and/or version of the target component version bucket can be specified using the name and version keys, respectively. (If only one of the keys is specified, the value of the other key is inherited from the current component version descriptor.)

For example:

scan:
  dir: build/apidocs
  into:
    name: apidocs
    version: ~

The same can be achieved by specifying the target name and version in a scanned antora.yml file and also setting the create: true key. In this case, the into key is not needed on the scan action. For example:

antora.yml

name: apidocs
version: ~
create: true

Setting the create: true key in the generated antora.yml file tells Collector to use the target bucket, creating it if necessary, instead of updating the identity of the current bucket.

If you want to prepend a base directory path to the files while also specifying a target component version bucket, the path should be specified in the dir key. For example:

scan:
  dir: build/apidocs
  into:
    name: apidocs
    version: ~
    dir: modules/ROOT/attachments

The dir key is effectively the same as an into key with a string value, except that a target component version bucket is also specified.

clean key

If the clean key is set to true on the entry, that implicitly creates a separate clean entry using the same dir. The clean action is scoped to the current step (the same step that defines the scan key).
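The shorthand can be pictured as expanding to an explicit clean entry. The sketch below assumes that Collector's clean action is configured with a clean key that accepts a dir, which this page does not document, so treat the second fragment as an approximation. The two fragments are alternatives, separated by --- for display only.

```yaml
# shorthand: clean declared on the scan entry
ext:
  collector:
    scan:
      dir: build/generated
      clean: true
---
# roughly equivalent, assuming an explicit clean entry with a dir key
ext:
  collector:
    clean:
      dir: build/generated
    scan:
      dir: build/generated
```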

The clean action is only needed in the following two cases:

The Collector instance is running on an existing worktree. The worktree may have build files generated from a separate process or from a Collector instance from a previous Antora run. If you want to ensure that no residual files are discovered, the scan directory (and possibly other directories) should be cleaned.
The temporary worktree for the Collector instance is kept indefinitely. In this case, the Collector instance will be recycling a worktree it has used on a previous Antora run and thus may require cleaning.

If Collector creates a new worktree in which to run the Collector instance, there’s usually no need to clean. However, if the Collector instance has multiple steps, and those steps generate files into the same scan directory, it may be necessary to clean the scan directory at the start of each step. You’ll need to put some thought into when you want the scan directory to be cleaned.
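As a sketch of the multi-step case, the configuration below cleans a shared scan directory at the start of each step. The run entries, their command key, and the command names are assumptions added for illustration. Only the scan entries use keys described on this page.

```yaml
ext:
  collector:
  # step 1: a hypothetical command generates files into build/generated
  - run:
      command: ./generate-api-docs.sh
    scan:
      dir: build/generated
      clean: true
  # step 2: a second hypothetical command writes into the same directory;
  # cleaning first keeps the files from step 1 from being scanned again
  - run:
      command: ./generate-cli-docs.sh
    scan:
      dir: build/generated
      clean: true
```

Files imported by step 1 remain in the content aggregate after step 2 cleans the directory, since the import happens during the scan in step 1.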