How to Organize Your Content Files

Antora employs both convention and configuration to aggregate content and generate your site. Before setting up or migrating your repositories, let’s review some key concepts that could impact how you organize your documentation projects and content files to work with Antora.

Storing your content source files

Antora can retrieve content source files from numerous git repositories by searching for files or their symlinks under a start path or multiple start paths in branches, tags, or a local worktree. The repositories Antora uses don’t have to be reserved exclusively for storing documentation (hence the start path). Antora can retrieve files from repositories that also host application code, tests, and other materials in sibling hierarchies. Antora relies on both convention and configuration to identify the documentation content.

To fetch source files from multiple and multi-use repositories, Antora requires that the documentation files be associated with a component name and version, and stored in a standard set of directories.

Although not required, we strongly recommend that you always use lowercase for filenames. Some filesystems are case sensitive, while others are not. By always using lowercase, you avoid any problems that occur if the filesystem, webserver, or transfer tool does not preserve the casing.
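The lowercase recommendation is easy to check mechanically. The following is a minimal sketch, not part of Antora, that flags filenames containing uppercase characters. The function name and the sample paths are hypothetical, and only the filename is inspected, not the names of parent directories.

```python
from pathlib import PurePosixPath


def find_mixed_case_names(paths):
    """Return the paths whose filename (last segment) is not all lowercase."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name
        if name != name.lower():
            flagged.append(path)
    return flagged


# Hypothetical sample paths used only for illustration.
sample_paths = [
    "modules/ROOT/pages/index.adoc",
    "modules/ROOT/pages/Install-Guide.adoc",
    "modules/ROOT/images/Logo.PNG",
]

for path in find_mixed_case_names(sample_paths):
    print(f"consider renaming: {path}")
```

A check like this could run as a pre-commit hook or a CI step so that casing problems surface before the files reach a case-insensitive filesystem or web server.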

Classifying your content source files

Once Antora collects the source files from all content source roots, it classifies each file by assigning metadata to it, which is used to uniquely identify the file within the site. The file’s identifier, called a resource ID, is used for creating references from pages, other resources, and the configuration. This step also implicitly partitions the source files into component versions.

Antora’s virtual filesystem

Antora decouples source files from their storage locations after it collects them. For all intents and purposes, the origin of each file is irrelevant. In other words, Antora never goes back to the filesystem or git repository to read the file once it’s discovered and loaded. Antora bases all of its file operations on the virtual filesystem (VFS) it creates after it collects the files.

The only aspect of a file that maps back to the location on the filesystem is the family-relative path. And even this association is maintained merely as a convenience for the author. Aside from the family-relative path, all other parts of the file’s identity are based on associative metadata, such as the component name, version, module name, and family.

File metadata

So how does a file get this metadata? All files in the same content source root inherit the component name and version from the component version descriptor file, named antora.yml. These descriptor files help Antora sort and organize all of the collected source files into component versions. You can think of a component version as all of the documentation for a version of a project. For example, you’re reading a page in the Antora 3.1 component version right now.

Antora uses these antora.yml files to identify content that belongs to the same version of a project. They are also how component versions are implicitly defined and populated.

Inside a content source root, files are further grouped into module and family folders, which provide two more facets of a source file’s identity. Finally, the family-relative path is captured to uniquely identify a source file within a family, even across multiple repositories or git references.
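The facets described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Antora's implementation. It assumes the standard layout of modules/<module>/<family>/<path> beneath the content source root, and it assumes a resource ID of the form version@component:module:family$relative-path. The descriptor values, the SourceFile class, and the classify function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Family folder names mapped to the singular family name (assumed convention).
FAMILIES = {
    "pages": "page",
    "partials": "partial",
    "examples": "example",
    "images": "image",
    "attachments": "attachment",
}


@dataclass(frozen=True)
class SourceFile:
    component: str  # inherited from the descriptor
    version: str    # inherited from the descriptor
    module: str     # taken from the module folder
    family: str     # taken from the family folder
    relative: str   # family-relative path

    @property
    def resource_id(self) -> str:
        return f"{self.version}@{self.component}:{self.module}:{self.family}${self.relative}"


def classify(path, descriptor):
    """Classify a path relative to the content source root, or return None."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[0] != "modules" or parts[2] not in FAMILIES:
        return None
    return SourceFile(
        component=descriptor["name"],
        version=descriptor["version"],
        module=parts[1],
        family=FAMILIES[parts[2]],
        relative="/".join(parts[3:]),
    )


# Hypothetical values standing in for the contents of an antora.yml file.
descriptor = {"name": "antora", "version": "3.1"}
source_file = classify("modules/ROOT/pages/install/linux.adoc", descriptor)
print(source_file.resource_id)
```

Notice that nothing in SourceFile records which repository, branch, or tag the file came from. Two files with the same five facets would have the same identity regardless of origin.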

File locations and URLs

The location of a source file doesn’t dictate the location of the published file. Once a source file is loaded into Antora’s VFS, Antora processes the file’s metadata further, which includes computing the file’s output location and URL. Each family of files has different rules for how these values are computed. In other words, there is no hardwired association among where the source file is found, where the published file is placed in the site, and how that file is accessed.
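To make the idea of a computed URL concrete, here is a deliberately simplified sketch for the page family only. It assumes that a page URL is assembled from the component name, version, module name, and family-relative path, that the ROOT module is left out of the URL, and that the .adoc extension becomes .html. The actual rules depend on Antora's configuration and differ per family, so treat the page_url function as a hypothetical illustration rather than Antora's algorithm.

```python
def page_url(component, version, module, relative):
    """Compute a site-relative URL for a page from its metadata (simplified)."""
    segments = [component, version]
    # Assumption: pages in the ROOT module are published without a module segment.
    if module != "ROOT":
        segments.append(module)
    # Assumption: the source extension is swapped for .html.
    if relative.endswith(".adoc"):
        relative = relative[: -len(".adoc")]
    segments.append(relative + ".html")
    return "/" + "/".join(segments)


print(page_url("antora", "3.1", "ROOT", "install/linux.adoc"))
print(page_url("antora", "3.1", "cli", "index.adoc"))
```

The inputs are all metadata. The repository URL, the start path, and the git reference never enter the calculation, which is why moving a source file between repositories does not have to change its published URL.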

See What’s a component version? and What’s antora.yml? to learn how to assign a component name, version and other optional information to groups of content source files.

git refnames

git refnames, which include the names of branches and tags, should only contain characters that do not need to be URL encoded. This rule is not enforced by git, but violating it can cause subtle problems in Antora.

While the refname does not appear in the URL of published resources, it is used in references back to the file’s origin. Specifically, using characters in the refname which have to be URL encoded complicates the assembly of the edit URL for pages.

As an example, the character #, which would have to be encoded as %23, violates this rule. Although Antora will happily build the edit URL to include this character, the browser will not interpret the URL as expected. That’s because # marks the boundary between the part of the URL that’s sent to the server and the fragment that’s only seen by the browser. The result will be an incomplete URL and thus a 404 page.
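The effect of an unencoded # can be reproduced with Python's standard URL parser. The host, repository path, and refname below are made up for illustration, and the URL layout is an assumption about what an edit URL might look like on a git hosting service.

```python
from urllib.parse import quote, urlsplit

# Hypothetical values used only for illustration.
base = "https://git.example.org/acme/docs/edit"
refname = "feature#1"
source_path = "docs/modules/ROOT/pages/index.adoc"

# Build the URL with the refname inserted as-is.
raw_url = f"{base}/{refname}/{source_path}"
parts = urlsplit(raw_url)
print("sent to server:", parts.path)
print("kept by browser:", parts.fragment)

# Percent-encoding the refname keeps the whole path intact.
encoded_url = f"{base}/{quote(refname, safe='/')}/{source_path}"
print("encoded:", encoded_url)
print("fragment after encoding:", repr(urlsplit(encoded_url).fragment))
```

In the first URL, everything after feature is treated as a fragment, so the server only sees a request for /acme/docs/edit/feature. Choosing refnames that need no encoding avoids the problem altogether.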

It’s acceptable to use the universal directory separator, /, in refnames. Doing so effectively organizes the refs into a folder structure (e.g., r/3.0.x). However, this strategy can impact refname matching. The * character in a refname pattern does not match / (it does not cross the boundary of a folder). Therefore, to match a refname such as r/3.0.x, you must use the pattern r/* instead of r*.
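The matching rule can be modeled in a few lines. This sketch covers only the * wildcard as described above and does not attempt to reproduce the rest of Antora's pattern syntax. The function names are hypothetical. Python's fnmatch module is not used here because its * does match the / character.

```python
import re


def refname_pattern_to_regex(pattern):
    """Translate a pattern so that * matches any run of characters except /."""
    pieces = [r"[^/]*" if ch == "*" else re.escape(ch) for ch in pattern]
    return re.compile("".join(pieces))


def matches(pattern, refname):
    """Return True if the whole refname matches the pattern."""
    return refname_pattern_to_regex(pattern).fullmatch(refname) is not None


for pattern in ("r*", "r/*"):
    print(pattern, "matches r/3.0.x:", matches(pattern, "r/3.0.x"))
```

With this model, r* matches a flat refname such as r3.0.x but stops at the folder boundary, while r/* matches r/3.0.x because the separator is written out in the pattern.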