Core Concepts

Antora encourages the use of discrete, topic-oriented pages to make content in the site easy to consume and search. This information architecture can be at odds with creating manuals, books, or other compilations to download and consume using a dedicated reading device (e.g., a PDF or EPUB reader). That’s where Assembler comes in.

Assembler provides a configuration and navigation-based approach to combining and exporting content, typically relying on an external command (e.g., Asciidoctor PDF) to handle the AsciiDoc conversion.

This page explores these concepts to give you a broad view of how Assembler works and what you can accomplish with it.

What’s the purpose of Assembler?

The main purpose of Assembler is to export content in an Antora site. Since individual pages can already be printed, distilled to PDF, or otherwise saved, Assembler is geared towards combining multiple pages.

Using the navigation as a blueprint, Assembler combines pages into assembly files that can then be converted to different export formats such as PDF to produce export files. This process is done per component version, either creating a single export file from the whole navigation or one export file per top-level navigation entry.

Assembler handles all the details of merging the AsciiDoc content so the result can be converted using any AsciiDoc converter, even one that has no knowledge of Antora. For example, when merging pages, it rewrites IDs and references to ensure there are no ID conflicts and organizes sections into a valid hierarchy. For more details, see How are pages merged?

When does Assembler run?

Assembler runs during the navigationBuilt event raised by the Antora site generator. Since Assembler makes use of the navigation, this transition is the earliest point in the site generation that Assembler can run. The work is done early enough for the exports to be available to the page composer so references to those exports can be added to the UI.

Since Assembler runs during the navigationBuilt event, the main navigation, which Assembler uses by default, has already been built from the nav files. That means that any preprocessor conditionals in those files have already been evaluated and are thus not aware of Assembler. If you want to use preprocessor conditionals in the navigation to select pages based on the exporter (e.g., pdf), you must define alternate navigation using an Assembler profile.

What’s an assembly?

An assembly refers to the reduced AsciiDoc source merged from one or more pages. This source is stored in the contents property of a transient assembly file. An assembly file provides additional metadata, such as the AsciiDoc attributes for use during conversion and the resource ID template for creating the export file. The assembly file is what the converter accepts as input to generate the export file (or contents).

Assembly files are created per component version, though they may pull in content from other component versions by way of the navigation.

How are pages merged?

Assembler is designed so the AsciiDoc converter does not have to know anything about Antora or resolve any references outside the document. In order to achieve that, Assembler has to “reduce” the AsciiDoc source of each page. This step is handled by Asciidoctor Reducer. Asciidoctor Reducer is a tool that reduces an AsciiDoc document by resolving all include directives and preprocessor conditionals to produce a single, self-contained AsciiDoc document. With that step complete, Assembler can move on to merging those source documents.

Assembler decides how to organize and merge pages using the navigation model for a specific component version. By default, Assembler uses the same navigation used by the pages in the site. An alternate navigation can be specified using an assembly profile defined in the component version descriptor. That profile can even be set per export format (e.g., pdf or epub). The root level, defined globally or by that profile, determines whether Assembler creates a single AsciiDoc document or multiple documents.

When writing an AsciiDoc document, there’s an assumption that IDs only have to be unique within that document. This assumption is broken when merging AsciiDoc documents. Thus, Assembler must ensure that there are no ID conflicts. Assembler also rewrites all references to pages included in the same assembly as internal xrefs.

Assembler must also ensure that sections are properly nested (honoring the doctype of the page). The page title is always inserted as a formal section in the merged source document, matching the level of the nav entry. (In other words, the page title becomes a section title in the assembly.) Any roles and attribute entries in the document header are preserved, though the ID is generated. How the sections on that page are arranged is controlled by the section merge strategy in Assembler.

Image references are rewritten to resolve to the image resource within the site.

The outcome is a single, self-contained AsciiDoc document that can be converted using any AsciiDoc converter, such as Asciidoctor PDF.

Do I need to use unique IDs across pages?

No, Assembler does not require that you use unique IDs across your site. IDs only have to be unique within a page, which is already a requirement when using Antora. Assembler automatically scopes IDs by rewriting them when it merges pages.

At the moment, only IDs in the shorthand form are supported (e.g., [#install]). The long-hand (e.g., id=install) and block anchor (e.g., [[install]]) forms are not supported.
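
As an author-side aid, the three ID forms named above can be told apart with a small check. This is only a sketch and not part of Assembler: the function name and the regular expressions are hypothetical, and it only inspects block attribute and block anchor lines that start at the beginning of a line.

```python
import re

# Hypothetical patterns for the three ID forms mentioned above.
BLOCK_ANCHOR = re.compile(r'^\[\[([^\],]+)')               # [[install]]
LONGHAND_ID = re.compile(r'^\[(?:.*,\s*)?id="?([^",\]]+)')  # [id=install]
SHORTHAND_ID = re.compile(r'^\[[^\[\]#]*#([^.%,\]]+)')      # [#install]


def classify_id_line(line):
    """Return (form, id) for a line that declares an ID, else None."""
    text = line.strip()
    for form, pattern in (
        ("block-anchor", BLOCK_ANCHOR),
        ("long-hand", LONGHAND_ID),
        ("shorthand", SHORTHAND_ID),
    ):
        match = pattern.match(text)
        if match:
            return form, match.group(1)
    return None


for sample in ("[#install]", "[id=install]", "[[install]]", "[source,ruby]"):
    print(sample, "->", classify_id_line(sample))
```

Running a check like this over your pages lists the IDs that would need to be converted to the shorthand form; only the "shorthand" results are in the form that is currently supported.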

How does Assembler scope IDs?

Every ID on a page is prefixed with the ID of that page. The ID of a page is generated from the docname of the page, which is the relative path in the pages family without the file extension. That value is then scrubbed so it only contains valid ID characters. Finally, the page ID is added to the beginning of the original ID, separated by three separator characters (e.g., page-id:::original-id). The separator character is : by default and can be customized.

If the page is located in a module other than the ROOT module, the ID is further qualified by prefixing it with the module name followed by the separator (e.g., module-name:page-id:::original-id). If the page is located in a different component, regardless of module name, the ID is further qualified by prefixing it with the component name followed by the separator followed by the module name followed by the separator again (e.g., component-name:module-name:page-id:::original-id). If the module name is ROOT in this case, it is replaced by an empty string.
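
The scoping rules above can be sketched in a few lines of Python. This is an illustration only, not Assembler's implementation: the function names are hypothetical, and because the exact scrubbing rules are not spelled out here, the sketch assumes that every run of characters other than letters, digits, underscores, and hyphens becomes a single hyphen.

```python
import re


def scrub_docname(docname):
    """Reduce a docname to ID-safe characters (assumed scrubbing rule)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", docname).strip("-")


def scope_id(original_id, docname, module="ROOT", component=None, sep=":"):
    """Build a scoped ID from the rules described above.

    component is None when the page belongs to the component being
    assembled; otherwise it names the other component.
    """
    scoped = scrub_docname(docname) + sep * 3 + original_id
    if component is not None:
        # Different component: always qualify with component and module,
        # replacing the ROOT module name with an empty string.
        module_part = "" if module == "ROOT" else module
        return component + sep + module_part + sep + scoped
    if module != "ROOT":
        return module + sep + scoped
    return scoped


print(scope_id("install", "install"))                        # install:::install
print(scope_id("install", "setup/install", module="admin"))  # admin:setup-install:::install
print(scope_id("install", "install", component="cli"))       # cli::install:::install
```

The last call shows the ROOT case for a page in another component: the empty module name leaves two adjacent separators after the component name.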

What happens to references that point outside the assembly?

All references to pages that point outside the assembly, or to any attachment, are rewritten as links to the published site using the site URL as a prefix. If the site URL is not set, the target of these references will remain unresolved.
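
A minimal sketch of that rule, assuming the reference has already been resolved to a root-relative URL in the published site. The function and parameter names are hypothetical, and returning None is just this sketch's way of modeling a target that stays unresolved.

```python
def resolve_external_target(pub_url, site_url=None):
    """Return an absolute link for a reference that leaves the assembly.

    pub_url is the root-relative URL of the page or attachment in the
    published site (for example "/docs/1.0/install.html").
    """
    if not site_url:
        # No site URL configured: the target cannot be made absolute.
        return None
    return site_url.rstrip("/") + "/" + pub_url.lstrip("/")


print(resolve_external_target("/docs/1.0/install.html", "https://docs.example.org"))
print(resolve_external_target("/docs/1.0/install.html"))
```

In practice this means you should set the site URL in the playbook if the exports contain references to pages or attachments outside the assembly.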

How do Asciidoctor extensions work in Assembler?

Antora is written in JavaScript and uses Asciidoctor.js to convert pages to HTML. Thus, Asciidoctor extensions used by the site generator must also be written in JavaScript for Asciidoctor.js. Assembler, on the other hand, can use any AsciiDoc converter, which may not be written in JavaScript. Thus, when converting assembly files, extensions must work with whatever converter is being used. If you’re using Asciidoctor PDF to convert AsciiDoc to PDF, you must use Ruby-based Asciidoctor extensions. This may require using different Asciidoctor extensions in Assembler than you may be using in the site (for example, you can use Asciidoctor Diagram with Asciidoctor PDF instead of Asciidoctor Kroki).

What’s the result of running Assembler?

Assembler produces exports. The files Assembler produces are called exports because they can be downloaded and viewed offline, either in the browser or in a reader application.

By delegating to an external converter, Assembler presents an opportunity to customize the style and appearance of the exported content (e.g., PDF theme).

These exports are stored in the content catalog under the export family of the ROOT module for the component version from which the export was created. The relative path of the export depends on the root level setting.

If the root level is 0, Assembler produces a single export. The relative path of the export is index followed by the file extension for the export format (e.g., index.pdf).

If the root level is 1, Assembler produces an export per top-level navigation entry. The relative path of the export is generated from the navtitle of the nav entry from which it was created (e.g., configure.pdf).

Assembler can be configured to qualify the relative path of the export using the component and version.
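
The effect of the root level on export paths can be sketched as follows. This is a simplified model, not Assembler's code: the function names are hypothetical, the rule for turning a navtitle into a filename (lowercase, with non-alphanumeric runs replaced by hyphens) is an assumption, and the optional component and version qualifier is not modeled.

```python
import re


def slugify(navtitle):
    """Turn a navtitle into a filename stem (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", navtitle.lower()).strip("-")


def export_paths(top_level_navtitles, root_level=0, extname=".pdf"):
    """Return the relative export paths for one component version."""
    if root_level == 0:
        # The whole navigation becomes a single export.
        return ["index" + extname]
    if root_level == 1:
        # One export per top-level navigation entry.
        return [slugify(title) + extname for title in top_level_navtitles]
    raise ValueError("this sketch only models root levels 0 and 1")


nav = ["Configure", "Install Guide"]
print(export_paths(nav, root_level=0))  # ['index.pdf']
print(export_paths(nav, root_level=1))  # ['configure.pdf', 'install-guide.pdf']
```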

How are exports referenced?

Assembler stores the exports in the content catalog. While useful for some things, this does not make the export easily accessible from the page. (For example, how do you know what export is relevant for a given page?)

To alleviate that challenge, Assembler creates a direct reference from the page to each export that includes that page. These references can be accessed using the assembler.exports property on the virtual file. The first export of each format is also conveniently referenced via the assembler.<extname> property (e.g., assembler.pdf). That same reference also provides access to the fragment of the page within the export.

These direct references provide a convenient way to create a link in the UI to the export (e.g., the PDF).

Portability considerations

Ideally, the Assembler build command should not depend on system-wide commands or files, preferring instead to use files available within the build and playbook directory, or retrieved from the network.

You also may want to ensure that the site can still be built successfully when Assembler is not in use. This will make it easier for authors to preview their work locally.

Extending the scope

If more advanced capability is required, you have the option of writing your own exporter extension. That extension can use the Assembler API to configure Assembler and export the assembly files (i.e., the content) with a greater degree of control.

Assembler—by way of the exporter extension—isn’t doing anything covert that another extension could not itself do. At the end of the day, it’s just an Antora extension that provides an approachable way to extend the capabilities of Antora.