The Hexatomic project

The Hexatomic software documentation is hosted at
hexatomic.github.io/hexatomic.

"A minimal infrastructure for the sustainable provision of extensible multi-layer annotation software for linguistic corpora" (Hexatomic) was a joint research project at Friedrich-Schiller-Universität Jena and Humboldt-Universität zu Berlin. It was funded under the call "Research Software Sustainability" issued by Deutsche Forschungsgemeinschaft under grant number 391160252 and ran from October 2018 until September 2021.

The project aimed to implement, test, and document a minimal infrastructure for the sustainable development, provision, and maintenance of research software. The complete project proposal is available as a PDF.

This infrastructure will be used to provide the linguistic community with software for deep multi-layer annotation of linguistic corpora.

The project is located at the Department of English Studies (Jena) and the Department for German Studies and Linguistics (Berlin).


About this project documentation

Our project is funded as part of the call "Research Software Sustainability" by Deutsche Forschungsgemeinschaft (DFG).

The call description read:

The objective of this call for proposals is the building and testing of infrastructures in order to make research software available and provide it in a sustainable manner to a larger audience. As best-practice examples, projects should have a positive impact on research-software development and on infrastructure facilities.

To fulfill this requirement, we will establish a minimal infrastructure for the sustainable development and provision of research software, cf. also the original project proposal.

Not only will the resulting software, Hexatomic, have to be sustainable; the infrastructure that is used to develop and provide it will need to be sustainable as well.

One of the explicit aims of the call is to provide "best practice examples" for future projects. In order to make the extraction of best practices from our project possible, it is necessary to document not only the actual software and infrastructure, but also the processes through which decisions were reached in the course of the project. This concerns decisions at all levels, including software engineering, infrastructure components, architecture, workflows, tooling, etc.

The right place to document processes and decisions is the documentation of the project itself, which is provided here. Among other things, the project documentation will therefore serve as a decision log.

The project documentation is a "living document", i.e., it is constantly added to and changed, and may change in its structure. It does not refer to a specific version of Hexatomic, and is not versioned like the user, developer and maintainer documentation for the software. Instead, it is a place where the practices of our project are iteratively developed as a base for discussion and extraction of good practices in sustainable research software development.

Implementing documentation

This section describes how we approach and implement documentation.

Documentation is a core aspect of our project. We argue that in order to sustain a research software project, it must be documented on different levels, including for users, developers, and maintainers. Additionally, we document decisions we make during the project runtime, in order to enable traceability of decisions, and potentially the extraction of best practices for the sustainable development and provision of research software.

Documentation within the context of the project takes four different forms, or rather, addresses four different target groups:

  • Users of the Hexatomic software
  • Developers of/contributors to the Hexatomic software
  • Maintainers of the Hexatomic software, and of the infrastructure that is developed and implemented to develop and provide it
  • The research software community, i.e., the larger group of people, projects, funders, and other stakeholders interested in research software engineering, research software development, the sustainability of research software, research software infrastructure, etc.

Documentation sustainability

The sustainability of software documentation is directly related to the sustainability of the software it pertains to. Without documentation that is correct and complete¹ at all times, a piece of software cannot be expected to be usable at any point in the future.

"At all times" here describes the synchronic aspect of documentation sustainability. In practice, this is related to engineering, development and maintenance practices of a qualitative nature, as all involved parties must take care to keep documentation up-to-date, while only parts of this process may be easily automatable or measurable. Some methods for documenting source code in situ, e.g., literate programming, aid efforts to achieve synchronic sustainability, but they are not applicable in all projects, and usually additional documentation, e.g., for end users, must be created. Contribution and maintenance workflows can support the fulfillment of the completeness and correctness requirements by implementing methods such as code review, static code analysis, etc.

There is also a diachronic aspect to documentation sustainability, which is of a more technical nature, and is equivalent to technical sustainability of software. It pertains to the sustainability of the documentation tooling, and to the sustainability of the documentation artifacts, i.e., the documentation "products", such as rendered files (PDF, HTML, etc.) and sources, but also to compatibility with rendering systems (e.g., browsers), and the availability and findability of sources and artifacts.

In the context of software projects, this yields the following concrete requirements:

  1. Synchronic sustainability must be ensured by applying software engineering methods to the documentation workflow. This includes code and documentation review, leveraging IDE support for in situ documentation of source code, implementation of code styles such as naming conventions, runnable documentation, etc.
  2. Diachronic sustainability must be ensured by choosing sustainable tooling, and implementing documentation sources and artifacts so that they remain findable, available, and accessible.
¹ It is important to note that complete may mean different things in different situations. Private/internal functions, e.g., private methods in Java, may not have to be documented, while public/external functions that may use them must be.
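
A minimal Java sketch of this distinction (the class and method names are hypothetical and not taken from the Hexatomic code base): the public method carries a Javadoc comment, while the private helper it delegates to does not have to.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical example class; not part of Hexatomic. */
public class AnnotationReader {

  /**
   * Reads all annotation layer names from the given corpus file.
   *
   * @param corpusFile the path to the corpus file
   * @return the names of the annotation layers found in the file
   */
  public List<String> readLayerNames(Path corpusFile) {
    return parseLayerNames(corpusFile);
  }

  // Private helper: not part of the public API, so documenting it is optional.
  private List<String> parseLayerNames(Path corpusFile) {
    return List.of(); // Implementation omitted in this sketch.
  }
}
```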

Documentation tooling

This section provides details on how documentation is produced for the research project, for the Hexatomic software, and for the infrastructure used to develop and provide it.

Requirements

Some of the requirements for documentation tooling are specific to our research project (e.g., those pertaining to Javadoc), but all others apply to generic (research) software project setups, and the project-specific ones can probably be transferred to other use cases relatively easily.

Requirements summary

The requirements for documentation tools for sustainable software development are

  • (1) Sustainability
  • (2) Single tool toolchain
  • (3) Usability
    • (3a) Human-readable source format
    • (3b) Javadoc integration
    • (3c) Continuous integration capabilities
    • (3d) Maintainability
    • (3e) Maintainability (dependencies)
    • (3f) Usability of representations
  • (4) Different representation forms

Sustainability

The documentation sustainability section establishes that

[documentation] sustainability must be ensured by choosing sustainable tooling [...].

Software sustainability covers many aspects, including community parameters such as size, number and frequency of contributions; whether the code is open source; the programming language it is implemented in; ease of installation; its dependency tree, etc.

Accordingly, there is currently no canonical definition of software sustainability, as discussed in [1]. One of the definitions given in [1] seems operationalizable enough to use in the context of documentation tooling:

Sustainable software is software which is:

  • Easy to evolve and maintain
  • Fulfils its intent over time
  • Survives uncertainty
  • Supports relevant concerns (Political, Economic, Social, Technical, Legal, Environmental)

[1] D. S. Katz, 'Defining Software Sustainability', 2016. [Online]. Available: https://web.archive.org/web/20181122140500/https://danielskatzblog.wordpress.com/2016/09/13/defining-software-sustainability/. [Accessed: 22-Nov-2018]

Single tool "toolchain"

One of the key motivations behind the choice of tooling is to have one single software which we use for all of our (textual) documentation. This makes maintainership transitions easier as it requires less learning effort from new maintainers, as well as contributors.

Usability

Human-readable sources

In the event of a failure in the documentation software, the documentation sources must be readable and well-structured enough to function as a fallback in place of, e.g., HTML- or PDF-rendered documentation.

We use a combination of hierarchical directory structures and human-readable source file formats to achieve this.

Javadoc integration

As Hexatomic is written in Java, we use Javadoc to document source code in situ. This way, we can generate API documentation in HTML directly from the source code via the javadoc tool. This API documentation format is the de facto standard across Java software projects¹ and is something that code contributors will expect to find.

In order to boost findability of the API documentation, it would be helpful to be able to integrate the Javadoc HTML in the text documentation for developers as easily as possible, ideally through native support for this by the documentation software.
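
For illustration, a Javadoc comment of the kind the javadoc tool renders into HTML API documentation might look as follows; the class and method are hypothetical and not taken from the Hexatomic code base:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example class; not part of Hexatomic. */
public class CorpusLoader {

  /**
   * Reads the raw content of a corpus document.
   *
   * <p>The {@code javadoc} tool turns this comment, including the tags below,
   * into a section of the generated HTML API documentation.
   *
   * @param corpusFile the file to read the document from
   * @return the document content as a string, never {@code null}
   * @throws IOException if the file cannot be read
   */
  public String readDocument(Path corpusFile) throws IOException {
    return Files.readString(corpusFile);
  }
}
```

Running the javadoc tool over such sources produces the hyperlinked HTML pages that the text documentation for developers can then reference.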

¹ For example, all release artifacts of open source projects that are hosted on "Maven Central", the main repository for the dominant build system for Java (cf. [2]), are uploaded through the Open Source Software Repository Hosting (OSSRH) system, which requires artifacts to include a bundled version of the Javadoc API documentation.

[2] S. Maple and A. Binstock, 'The Largest Survey Ever of Java Developers', Java Magazine, November/December, p. 20, 2018. Available: http://www.javamagazine.mozaicreader.com/NovemberDecember2018#&pageSet=20&page=0. [Accessed: 30-Nov-2018]

Continuous integration capabilities

In order to embed documentation deeply in the project as well as the development workflow, editing documentation must be as easy as possible and should ideally not require more than making the actual change to the documentation, without having to care about building the representation, deployment, etc.

The default way to achieve this for any kind of code, including documentation sources, is to employ a continuous integration (CI) system which polls the version control system where the code is held for changes, and reacts to these changes by starting the appropriate action, e.g., by building the code and deploying the artifacts.

The documentation software should therefore enable continuous integration of its builds and automated deployment of the documentation representations, either natively, or via the continuous integration system used in the project, or by simply not preventing the application of a CI system, to trigger builds and deploy artifacts through, e.g., a custom script run on the CI system.

In addition to providing an easy way to produce documentation, automated deployment will also ensure that the user-facing representation of the documentation can be up-to-date at all times.

Maintainability

The documentation software should be very easily maintainable.

This includes factors like easy updates to new versions of the software; no installation required or very simple to install; no or very few dependencies on other software, or ready-made packages that include all dependencies.

Usability of representations

The documentation representations produced by the documentation software should have a high level of usability.

While some of the features that make documentation "usable" for a reader may depend on a specific reader's approach to documentation as well as her own preferences, some factors of usability are more easily quantifiable, e.g., a representation's ability to display well on different devices - which is usually a feature of responsive or reactive design paradigms. Further factors include the existence and behaviour of a table of contents or menu, font choices, colour schemes, intuitiveness of interfaces, simplicity, and a consistent style that users can "learn".

Different representation forms

The documentation software should ideally be able to produce representations in different formats, e.g., produce a HTML representation of the documentation, a PDF file, an EPUB file, etc.

Different representations are required to serve the purposes of different parts of the documentation. While API documentation may be read mainly during development work and should therefore be provided as hyperlinked documents in a website for quick accessibility and browsing, user documentation may be read in larger portions at a time, e.g., during preparation or evaluation phases, and should therefore also be available as a portable and potentially printable format such as PDF or EPUB.

Available tools

To find a tool that fulfills as many of the requirements as well as possible, and that is suitable for our project context, we have surveyed different documentation software.

We only considered tools that would be feasible to use as the single tool for text-based documentation (as opposed to Javadoc) in our project (requirement (2)), and that use a human-readable source format as input to create documentation representations (requirement (3a)). This limited the survey to tools that use an easily human-readable, text-based markup language, i.e., a Markdown dialect, reStructuredText, or AsciiDoc.

This also excluded other commonly used tools that are not mainly targeted at creating software documentation, e.g., Pandoc, which does not support out-of-the-box conversion to HTML beyond single pages.

Additionally, we have excluded tools which seemed too little known, i.e., which failed the "list test": the hypothesis being that a tool which is not included in a sample of list websites ("Top 10 software documentation tools", "15+ Software Documentation Tools That Will Simplify Your Life", etc.) does not have enough market share, which may imply a small user base and therefore not enough community incentive to keep the tool alive over the next few years.

From the larger group of generic static site generators (of which most documentation tools are a subset), we have chosen Jekyll because it is natively supported by GitHub Pages, our chosen documentation hosting solution. The reasons why we use GitHub Pages are explained in another section (forthcoming).

We ended up with a shortlist of six tool options: Sphinx (with reStructuredText), Sphinx (with CommonMark), Asciidoctor, mkDocs, mdBook, and Jekyll.

In our subsequent evaluation of the tools, we have not taken into account the specifics of the different Markdown dialects.

Documentation tooling - evaluation and implementation

Disclaimer

Although we have attempted to make our evaluation as objective as possible, subjective biases and preferences will usually influence choice of tooling (cf. the editor war). Additionally, our evaluation methods are informal to some extent, not strictly empirical, and non-reproducible. Nevertheless we feel that publishing them here may be useful to record the criteria we have taken into account.


Requirements for single tool documentation software (cf. documentation tooling section):

  • (1) Sustainability
  • (2) Single tool toolchain
  • (3) Usability
    • (3a) Human-readable source format
    • (3b) Javadoc integration
    • (3c) Continuous integration capabilities
    • (3d) Maintainability
    • (3e) Maintainability (dependencies)
    • (3f) Usability of representations
  • (4) Different representation forms

To evaluate which documentation tool may be the most suitable for our needs, we have marked each tool with a value from 1 to 5 for each requirement, with 1 being the lowest ("worst") mark and 5 the highest ("best").¹

| Tool | (1) | (3a) | (3b) | (3c) | (3d) | (3e) | (3f) | x̄(3) | (4) |
|---|---|---|---|---|---|---|---|---|---|
| Sphinx (rST) | 5 | 3 | 1 | 3 | 4 | 3 | 4 | 3.0 | 4 |
| Sphinx (CM) | 3 | 5 | 1 | 3 | 2 | 3 | 4 | 3.0 | 4 |
| Asciidoctor | 3 | 4 | 1 | 4 | 3 | 3 | 3 | 3.0 | 3 |
| mkDocs | 3 | 5 | 1 | 3 | 4 | 3 | 2 | 3.0 | 1 |
| mdBook | 4 | 5 | 1 | 3 | 4 | 4 | 5 | 3.67 | 3 |
| Jekyll | 4 | 5 | 1 | 3 | 4 | 2 | 2 | 2.83 | 2 |

Table: Scores for the different requirements, cf. list above. x̄(3) is the mean of all Usability sub-requirements.

Sustainability

As of now, reliable measures for predicting the sustainability of software do not exist [1], and inherently, the actual sustainability of a piece of software can only be determined in hindsight. Therefore, assessing the sustainability potential for a software product is a qualitative process, partly driven by the requirements of the project for which the software is assessed.

Despite this constraint, assumptions over the sustainability potential of a documentation software project can be based on some assessable factors, e.g., age of the software, approximated user base, development status, frequency of contributions, architecture, pervasiveness of a technological community, level of documentation, maturity of its dependencies, etc.

In the evaluation process, we have tried to approximate each candidate's potential for sustainability, which we present in the following. Additionally, we give the SourceRank metric that libraries.io uses to identify high-quality packages, based on the actual products we would employ.

[1] S. Druskat, 'A proposal for the measurement and documentation of research software sustainability in interactive metadata repositories', in Proceedings of the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4), University of Manchester, Manchester, UK, 2016 [Online]. Available: http://ceur-ws.org/Vol-1686/

Sphinx

Sphinx is a documentation generator written in Python. It is used to generate the documentation for the Python programming language itself as well as many large Python projects. The documentation platform Read the Docs uses Sphinx to automate the creation and deployment of software documentation.

We have rated the sustainability of Sphinx with rST as very high (5).
We have rated the sustainability of Sphinx with CM as medium (3) due to additionally required extensions and the non-standard use case which is not as well supported as the standard one using rST.
SourceRank metric for the Sphinx project: 23

Python, Sphinx's implementation language, is a highly used programming language, with an estimated 41% market share as of 2019, and increasing interest as of June 2019. Sphinx has a high criticality due to its use as the Python language documentation platform, and due to its pervasiveness in the Python community, where it must be regarded as the default documentation tool. As of June 2019, Sphinx is used by over 57,000 projects on GitHub alone. The Sphinx hosting service Read the Docs has hosted around 100,000 projects in 2018 (including Sphinx and mkDocs projects). Sphinx is a mature project, with 126 releases, the first one from Mar 2008. It is also actively developed, with an average of around 3 commits per day, with the last 100 commits within the last 17 days at the time of writing, a contributor base of 422, of which 51 (12%) have actively contributed over the last year, and in this period, 45 contributors (11%) have made more than 1 commit. Sphinx has averages of 0.9 issues per day and 0.5 pull requests per day. Its 5 dependencies seem to be mature, based on the fact that they all have been released in major versions.

mdBook

mdBook is a documentation generator written in Rust. It is used to generate the main documentation (both the reference documentation, and "The Book") for the Rust programming language itself.

We have rated the sustainability of mdBook as high (4).
SourceRank metric: 15

Although Rust, mdBook's implementation language, is relatively young - development started in 2006 - it is growing in popularity and sees increasing interest. Major software projects are written in Rust, such as the Quantum and Servo browser engines, developed by Mozilla and used in the Firefox web browser, Facebook's Libra cryptocurrency, Dropbox's file system, and security features in the Tor project. mdBook has a high criticality as the main documentation tool for the Rust language itself, and is pervasive as the documentation tool for many Rust projects. mdBook is a relatively mature project, with 44 releases, the first one from Aug 2015. It is also actively developed, with an average of around 0.86 commits per day, with the last 100 commits within the last 96 days at the time of writing, a contributor base of 124, of which 43 (35%) have contributed over the last year, and in this period, 22 contributors (18%) have made more than 1 commit. mdBook has averages of 0.34 issues per day and 0.33 pull requests per day. Its 25 dependencies seem somewhat mature, with 13 of them having been released in major versions.

Jekyll

Jekyll is a static site generator written in Ruby. It is the tool used to automatically generate Github Pages from Markdown files.

We have rated the sustainability of Jekyll as high (4).
SourceRank metric: 27

The popularity of Jekyll's implementation language, Ruby, seems to stagnate at around 9% market share, and it sees decreasing interest. Nevertheless, its market share has remained largely unchanged in the developer community over the last 6 years, and several large software projects are written in Ruby, including the GitHub development platform, Airbnb, Kickstarter, and SlideShare. Jekyll has a high criticality as the main tool for generating GitHub Pages. It is a mature project, with 136 releases, the first one from Dec 2008. Jekyll is also actively developed, with an average of around 2.7 commits per day, with the last 100 commits within the last 76 days at the time of writing. It has a contributor base of 852, of which 12 (1%) have contributed over the last year, and in this period, 7 contributors (<1%) have made more than 1 commit. Jekyll has averages of 1 issue per day and 0.9 pull requests per day. As of June 2019, Jekyll is used by over 296,000 projects on GitHub alone. Its 13 dependencies seem to be mostly mature, based on the fact that all but 2 have been released in major versions.

Asciidoctor

Asciidoctor is a processor and publishing toolchain for the AsciiDoc markup language. It is written in Ruby. Asciidoctor is used to build documentation for a number of larger software projects, including Grails, the Gradle build automation tool, Red Hat documentation, Solr, and many others.

We have rated the sustainability of Asciidoctor as medium (3).
SourceRank metric of the AsciiDoctor Maven plugin: 10

The popularity of Asciidoctor's implementation language, Ruby, seems to stagnate at around 9% market share, and it sees decreasing interest. Nevertheless, its market share has remained largely unchanged in the developer community over the last 6 years, and several large software projects are written in Ruby, including the GitHub development platform, Airbnb, Kickstarter, and SlideShare. Asciidoctor is a mature project, with 60 releases, the first one from Feb 2014. It is actively developed, with an average of around 1.7 commits per day, with the last 100 commits within the last 82 days at the time of writing. It has a contributor base of 115, of which 21 (18%) have contributed over the last year, and in this period, 6 contributors (5%) have made more than 1 commit. Asciidoctor has averages of 0.8 issues per day and 0.5 pull requests per day. Its 3 dependencies seem to be immature, as none of them have a major release version.

mkDocs

mkDocs is a static site generator for project documentation. It is written in Python.

We have rated the sustainability of mkDocs as medium (3).
SourceRank metric: 9

Python, mkDocs' implementation language, is a highly used programming language, with an estimated 41% market share, as of 2019, and increasing interest as of June 2019. Its criticality seems to be low, as we could not find any large software projects documented with it. It is a mature project, with 41 releases, the first one from Jan 2014. mkDocs is actively developed, with an average of around 0.7 commits per day, with the last 100 commits within the last 444 days at the time of writing. It has a contributor base of 126, of which 27 (21%) have contributed over the last year, and in this period, 6 contributors (5%) have made more than 1 commit. mkDocs has averages of 0.6 issues per day and 0.4 pull requests per day. Its 7 dependencies seem to be mature, as 6 out of 7 have a major release version.

Usability

Assessing the usability of software is an opinionated process which takes into account encountered and predicted use in a specific context. In our evaluation, we have taken into account the readability of the source format, Javadoc integration, continuous integration capabilities, maintainability, maintainability of dependencies, and the usability of representations.

Human-readable source format

Sphinx (rST)
We have evaluated reStructuredText to be less-than-perfectly human-readable and human-usable due to

  • its syntax for hyperlinks, which is not in situ but instead uses an in-text underscore syntax (Text `link text`_ text) in combination with a text-external target syntax (.. _link text: https://hyperlink) for named links, which compares unfavourably with the Markdown format;
  • the four-space indentation convention for code blocks, which disallows easy copy and paste of valid source code snippets.

Sphinx (CM), mkDocs, mdBook, Jekyll
We have evaluated Markdown to be very human-readable and human-usable, independent of implementation dialects. Markdown does not exhibit the same syntax-specific issues as reStructuredText.

Asciidoctor
We have evaluated AsciiDoc, the input format for Asciidoctor, to be human-readable, despite some idiosyncrasies in the syntax, such as <whitespace>+ for single line breaks, and the less graphic headline syntax.

Javadoc integration

We have eventually decided to disregard the integration of Javadoc API documentation as a deciding factor for the best-suited documentation tool for the project. Javadoc-based HTML generation has been a standard feature of Java and Maven for years, and linking from any running text documentation to the respective API documentation is trivial, and can be automated in continuous integration. Nevertheless we provide a quick overview of the evaluations here.

At the time of the evaluation, only Sphinx with reStructuredText offered a clear way to integrate Javadoc, via the javasphinx extension. This extension has since been deprecated, and therefore, the requirement is of no consequence for the evaluation anymore. This is why all tools have been evaluated as not providing Javadoc integration with a score of 1.

Continuous integration capabilities

All evaluated tools can be automated via a continuous integration solution such as Travis CI (used in our project). However, bespoke scripts have to be created (and maintained) to produce and deploy the respective HTML representations, as none of the tools provide this functionality out-of-the-box. Therefore, all tools have been evaluated with medium capabilities (3), with the exception of Asciidoctor, for which a Maven plugin exists, which you can read about on the Asciidoctor website. As Hexatomic uses Maven in the build process, this is helpful, and Asciidoctor has therefore been evaluated with a score of 4.

Maintainability

We have looked at the maintainability of the tools, and have mainly evaluated ease-of-installation and updates, and dependencies.

Sphinx with rST and mkDocs are installable and updatable via standard Python technologies, i.e., an installation of Python and pip. mdBook is provided as a binary, which can be downloaded and used as is; alternatively, the standard way to install Rust software also works, i.e., via a Rust and cargo installation. All three tools are more or less self-contained and provide required functionality such as tables of contents, search, etc., out-of-the-box, with existing modules for further functionality, and have therefore been scored with 4.

Asciidoctor requires an implementation of Ruby, and can be installed from an OS package manager or as a Ruby gem. It also requires extensions for processing documentation sources, which has led to a lower evaluation at 3.

Finally, Sphinx with CM has been evaluated at 2, as it requires hacks to include directives in Markdown which are not natively supported, such as the ones needed to create a table of contents. This leads to manual work which in turn needs to be standardized within the project, documented, and followed, and therefore to a loss of maintainability for Sphinx with CM.

Maintainability of dependencies

Dependencies in our evaluation are additional software which has to be installed in order to use the respective tool in our project. mdBook has fared best, as it did not need additional software that needed to be actively installed, although a large number of its runtime dependencies do not have major releases. Jekyll has achieved the lowest score, as specific functionality required the inclusion of many plugins in the configuration file. The other tools have received scores of 3, as they needed a modicum of configuration via extensions and/or had a larger number of runtime dependencies with non-major releases.

Usability of representations

mkDocs and Jekyll have both received low scores in assessing the usability of their HTML outputs. mkDocs can handle multiple pages, and contains global search, but it generally does not provide many features. This includes a maximum navigation depth of 2, which is not sufficient for our use case. Jekyll does not provide search or tables of contents out-of-the-box, and has to be customized to achieve the required level of functionality. Asciidoctor provides no client-side search, and per default produces single pages only. Sphinx provides access to different versions of the documentation, multi-page functionality, and tables of contents, but its search does not return good results. mdBook provides good search, different themes per default, menu fold functionality, multiple pages and a table of contents, as well as forward and backward buttons, code copy, and single-page prints which can also be used to generate PDFs.

Different representations

Only Sphinx can comfortably handle a large number of different representations of the documentation, such as LaTeX, EPUB 3, man pages, and more. The common PDF format has to be produced using an extension. mdBook uses a generic approach by providing a print button which compiles a single-page view of the documentation. This can then be used to produce PDFs or other outputs via the browser's print dialogue. Asciidoctor has an external PDF converter, which, however, requires Ruby. It can produce man pages natively. Producing different representations from Jekyll is cumbersome. PDFs can be produced via HTML-to-PDF conversion software, and a wrapper plugin for Jekyll exists. mkDocs does not support PDF conversion natively, although a plugin exists, which, however, relies on various other software packages which have to be installed separately.

Conclusion

To calculate a final score for each tool, we have calculated the mean of all usability sub-requirements in (3), x̄(3), and then the mean across the three values for (1), x̄(3), and (4). The results are presented in the table below, and are also available as a Jupyter notebook.
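
For illustration, using the scores from the table below, mdBook's usability mean and final score are computed as:

```latex
\bar{x}(3)_{\text{mdBook}} = \frac{5 + 1 + 3 + 4 + 4 + 5}{6} \approx 3.67
\qquad
\text{final score}_{\text{mdBook}} = \frac{4 + 3.67 + 3}{3} \approx 3.56
```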

| Tool | (1) | (3a) | (3b) | (3c) | (3d) | (3e) | (3f) | x̄(3) | (4) | Final score |
|---|---|---|---|---|---|---|---|---|---|---|
| Sphinx (rST) | 5 | 3 | 1 | 3 | 4 | 3 | 4 | 3.0 | 4 | 4.0 |
| mdBook | 4 | 5 | 1 | 3 | 4 | 4 | 5 | 3.67 | 3 | 3.56 |
| Sphinx (CM) | 3 | 5 | 1 | 3 | 2 | 3 | 4 | 3.0 | 4 | 3.33 |
| Asciidoctor | 3 | 4 | 1 | 4 | 3 | 3 | 3 | 3.0 | 3 | 3.0 |
| Jekyll | 4 | 5 | 1 | 3 | 4 | 2 | 2 | 2.83 | 2 | 2.94 |
| mkDocs | 3 | 5 | 1 | 3 | 4 | 3 | 2 | 3.0 | 1 | 2.33 |

Table: Scores for the different requirements, and final scores, cf. list above. x̄(3) is the mean of all Usability sub-requirements.

Due to the restrictions in usability (and slightly decreased human-readability) that reStructuredText represents (cf. section Human-readable source format), as well as a personal preference for Markdown, we have decided to use mdBook for the text-based documentation of Hexatomic.

Implementation

We use local installations of mdBook on development machines to write the user, the developer & maintainer, and this project documentation. We use the Travis CI continuous integration platform to produce the documentation representations, and deploy them to GitHub Pages.

The sources for the project website reside in a dedicated repository, github.com/hexatomic/hexatomic.github.io. The sources for the Hexatomic software documentation are held in the development repository for Hexatomic, github.com/hexatomic/hexatomic.


¹ The evaluation was carried out by S. Druskat and T. Krause.

Code Review

Code review is an essential part of our project: Code and documentation changes come in as pull requests, and all pull requests will undergo review. We are using the tools of the GitHub platform for performing code reviews in public and do not distinguish between "internal" contributions from core team members and "external" ones. Having a platform like GitHub allows us to discuss changes, make these discussions transparent to others, and also use the public discussions as an additional history of how and why the code has become part of Hexatomic. Having a central place for discussions is especially important for distributed teams, where team members might be split over several institutions and cannot meet regularly in the same room. GitHub was chosen as the general platform for hosting the code, issues, and documentation, and GitHub code reviews are an integral part of the platform. While we are currently using GitHub code reviews for convenience, the same principles and standards for code review could be applied for other platforms/tools like Gerrit or git-appraise.

Objectives

Developer and maintainer resources are scarce and every code review is an additional overhead to the actual implementation. An objective is, therefore, to keep the effort for code review proportional to the development effort.

Code reviews can request changes, and it is important that this feedback loop does not obstruct the development process. It must be ensured that there is both enough time for the developer to address the requested changes and that these changes are reviewed quickly. Also, the longer a pull request remains unprocessed, the more conflicts with other developments can occur, which may lead to features not being deployed in a timely manner.

Responsibilities

It must be clear who is responsible for performing code reviews, and these responsibilities have to be documented in the maintainer documentation. In research projects like Hexatomic, we propose that this should be the general responsibility of the maintainer(s). They can delegate the code reviews to others if appropriate or necessary. For example, if the maintainer submits a pull request themself, another person should be responsible for the code review. Delegating code reviews can help distribute the workload over several people.

In Hexatomic we have three areas where contributions require code review.

Project documentation

One of the purposes of the project documentation is to report progress and outcomes of our research, similar to a formal project report. In fact, it is our approach to generate the project report based on the project documentation. We therefore argue that ideally, the documentation should be reviewed by the principal investigators (PIs). To avoid a "chicken and egg" problem, project maintainers who are also the researchers in the project can review each other's pull requests to bootstrap the initial documentation, e.g., about how code review is performed.

Once the project is completed, the project documentation will probably not need any updates. If an update is still necessary, the maintainers can decide who should review it.

User, developer and maintainer documentation

The maintainer should suggest a suitable person for a review of changes to documentation. This can include other project members and PIs who are not developers, but also external users. We argue that especially user documentation can benefit from a user perspective in the review process. This means that informed users must be enabled to update and extend the documentation, and to use the review user interface. In order to achieve this, the code review process and the technology involved are documented in the maintainer documentation, to share with reviewers.

Software code

For changes to the actual Hexatomic software, a maintainer is responsible for code review. They may also enlist additional reviews from others.

Review schedules

To find the right balance between a quick code review feedback loop, and justifiable overhead for maintainers, we propose to have weekly dedicated time slots where the maintainers perform the code review. How much time is allocated, and when exactly these time slots are, depends on the work schedule of the actual maintainer. The time slots should be documented in the CONTRIBUTING.md file to make them transparent to external contributors. An example would be to have two time slots of one hour each per maintainer, on two different days in the week, with one day in between the review slots. This would allow the contributor to update the pull request with the requested changes and still get a new review in the same week. Ideally, a pull request with minor remarks could be merged in the same week it is created.

We think it is important to limit the amount of time per week a maintainer has to invest in code reviews. This allows the calculation of the person-hours required to maintain a software project, and makes it possible to plan funding for maintenance work. If too many code reviews are requested at the same time, we propose to triage them first and give them a visible priority.

Periodic unreviewed code triage

The maintainer has a very central position in our proposal for a minimal sustainable research software infrastructure. While a maintainer is responsible for integrating external contributions, and for releasing new versions of the software, we did not envision the person in this position as someone who actively extends the software with new features. Having a maintainer for a project makes sure that there is always a point of contact for other developers who want to contribute to the project. This can help to keep the project alive, and ensures that certain software engineering standards such as code review can be followed.

Issues with the maintainer role

Keeping a project alive can mean different things depending on the software and the expectations of the community. We tried to minimize the work that needs to be done by the maintainer and moved responsibility to the community that uses the software to help keep it alive, e.g., by providing funding for a limited time to add a specific feature or fix a bug which is of particular interest to a specific user.

However, there are issues that threaten the fundamental availability of the software itself if they are not fixed. For example, if the execution environment changes in a backward-incompatible way, sometimes only small adjustments to startup scripts or build configuration can fix these issues. Or there may be a security issue in one of the dependencies that can easily be fixed by just updating to a fully backward compatible patched version. If a piece of software can't be run on current systems, but only on out-of-date virtual machines or containers with potentially vulnerable operating systems, its actual usability and “aliveness” are under serious threat. Some issues can be mitigated to a certain degree by carefully choosing the software a project depends on, but some issues just cannot be predicted. Thus, maintaining a software project may also mean fixing these kinds of issues, and therefore the maintainer may actually have to change code.

This leads to a dilemma: if the maintainer themself is changing the code, who is reviewing it for correctness?

Pool of reviewers

One possible solution to this problem is to have a larger developer community with at least two maintainers. For projects with a small user base, this may not be achievable even if the software is essential for their research. In research areas like the humanities, funding for developers is typically limited to an active project phase, and there is often only funding for a single developer. We initially thought that student assistants may be able to fill the gap as second maintainers and reviewers, but hiring a student assistant may not be possible for all projects.

An alternative solution is to have Research Software Engineering (RSE) teams at an institution, who can contribute as code reviewers on demand. These RSE groups are essentially professionalized and paid communities of maintainers for a whole institution. Unfortunately, very few organizations have such RSE teams as a pool of reviewers yet. It is also not an easy task to develop the permanent funding at each institution that is needed for the establishment of an RSE group, and RSE groups therefore do not provide a short-term solution. If a pool of code reviewers could be provided on a larger scale and on a volunteer basis, for example as some kind of "Stack Overflow for open source research software code reviewers", this could improve the situation for smaller projects. Still, some way of scheduling this resource is needed, and it is not clear who should organize and fund the platform itself.

Postponing the review with unreviewed code triages

This idea is based on the current funding situation for smaller projects, especially in the humanities, where the presence of only a single maintainer can be guaranteed at most for the originally funded lifetime of a software project, and where there is no institutional pool of reviewers. It is inspired by the "weekly performance triage" of the Rust compiler,¹ where performance regressions for new code in the compiler are not measured and detected for each pull request, but are triaged every week, and the offending pull request is only identified if a regression has been found in any of the changes since the week before.

For critical bug fixes that hinder the execution of the software (but not for new features or non-urgent bug fixes), the single maintainer can author the fix and add a regular pull request, but decide not to request a code review. In this case, all required and all optional checks of the continuous integration pipeline must execute successfully. This includes successful test execution, but specifically also static code analysis, where the goal should be not to introduce any new issues and to stay within the acceptable limits of metrics such as test coverage for new code, or maximum line duplication. Having an advanced static code analysis in the continuous integration pipeline is a strict requirement for this approach. Once all checks have passed, the maintainer marks the pull request with a special label for unreviewed pull requests and merges it, thus producing a new hotfix release of the software.

To ensure that all fixes are still reviewed at some point in the future, the project should introduce periodic reviews of previously unreviewed changes. For Hexatomic, we use a quarterly approach. During triage, a reviewer will look into a special triage log file in the repository to determine which source revision was triaged last. Next, all pull requests with the "unreviewed" label are merged into a new version control branch which is based on the last triaged commit. The changes are reviewed, and if any issues are found, they are added to the issue tracker of the project, so that they can be resolved later. A triage report is added to the log file, and the "unreviewed" label is removed from all triaged pull requests, so that the next reviewer can start from the latest commit at this point in time. The process itself is documented in our developer/maintainer documentation, so anyone can perform the triage.

An advantage of this approach is that these code audits can bundle several pull requests, and if there is short-term funding for another developer, or the possibility to contract an external company or freelancer to perform these audits, someone who did not write the code is reviewing it. But even if no external developer is available, and instead the original maintainer performs the reviews after some time, we expect that the maintainer will be able to find issues in the code which were overlooked the first time, when the code was still actively present in their mind.

People

| Role | Person |
|---|---|
| Principal Investigator (Friedrich-Schiller-Universität Jena) | Volker Gast |
| Principal Investigator (Humboldt-Universität zu Berlin) | Anke Lüdeling |
| Researcher (Friedrich-Schiller-Universität Jena) | Stephan Druskat |
| Researcher (Humboldt-Universität zu Berlin) | Thomas Krause |
| Research Assistant (Friedrich-Schiller-Universität Jena) | Clara Lachenmaier |
| Research Assistant (Friedrich-Schiller-Universität Jena) | Bastian Bunzeck |