Software Development Practices#

Across our various code bases there are a number of common practices that are strongly encouraged to ensure the quality of software within the POLARIS framework.

Broadly, these principles break down to:

  1. Setting baseline standards and automating their enforcement

  2. Requiring peer review of changes before they are merged

  3. Automated testing of proposed changes

In the following sections we go into detail about how we implement these principles within the POLARIS development team.

CI Architecture#

A common element in our approach to software development is to automate everything that it is possible to automate. To do this we utilize GitLab’s CI/CD features to provide a framework in which all our required automations can take place. Our division within Argonne provides a self-hosted GitLab instance which we augment with 3 additional “CI Runners” that serve different testing functions:

  1. A general purpose Linux runner (24 physical cores / 128GB RAM)

  2. A general purpose Windows runner (24 physical cores / 128GB RAM)

  3. A “model running” specific runner (20 physical cores / 256GB RAM)

The first two are designed to run multiple CI jobs in parallel and are thus quite well provisioned from a hardware perspective, allowing us to maximize our throughput of CI jobs and minimize the elapsed wall-clock time between when a software change is proposed and when it can be merged back into our main branch. The third runner has a larger memory allocation to allow us to run our full current suite of models, which require between 80GB and 200GB of available memory depending on the model.

Short Loops#

The above is a practical example of a powerful concept, “iteration shortening”, which is embraced throughout our practices. This is a principle that is rarely given a name or even discussed explicitly, but which is enormously powerful and underpins the successful execution of many commonly adopted software development practices such as Agile, TDD and CI/CD.

The principle states that the longer a process goes on without a clear resolution, the more likely it is to fail. In one post on the topic, Adam Ruka describes this as:

constantly asking the question “does this work as intended?”. Successful projects are generally those where the answer can be quickly known.

Or, in other words, the longer it takes to get an answer to this question, the more likely it is that mistakes will be made; anything we can do to shorten the feedback loop will therefore be repaid with higher quality outcomes.

This manifests in our day-to-day practices in many ways. Some examples are given here:

  1. writing unit tests when implementing new features so that we can quickly see that our implementation matches expectations (a minimal sketch is given at the end of this section)

  2. utilizing stripped down toy models for fast integration testing environments

  3. setting up our IDE to automatically save and format code so that our linting requirements are always satisfied.

The above is very much a non-exhaustive list and the reader is encouraged to think of other ways in which the principle can be applied to their day-to-day practices.
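To make the first item concrete, the sketch below shows the kind of small, fast unit test we have in mind, written with pytest against a hypothetical `trip_duration_minutes` helper (the function and its rules are purely illustrative and not part of the POLARIS codebase):

```python
import pytest


# Hypothetical helper, used only to illustrate a fast feedback loop.
def trip_duration_minutes(distance_km: float, speed_kmh: float) -> float:
    """Return the travel time in minutes for a trip at a constant speed."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    return distance_km / speed_kmh * 60.0


def test_trip_duration_basic():
    # 30 km at 60 km/h should take exactly 30 minutes.
    assert trip_duration_minutes(30.0, 60.0) == pytest.approx(30.0)


def test_trip_duration_rejects_invalid_speed():
    # Invalid input should fail loudly rather than silently returning nonsense.
    with pytest.raises(ValueError):
        trip_duration_minutes(10.0, 0.0)
```

A test file like this runs in well under a second, so the gap between making a change and knowing whether it still works as intended stays as short as possible.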

Baseline Standards#

Most long-running open source software projects will have a “Code Style Guide” which lays out (often in great detail) the exact rules about how code is written - “there should be one blank line between functions”, “variables should be named in CamelCase format”, etc. When followed religiously, these rules lead to a code base with consistent conventions which can make reading and modifying the code significantly easier for someone familiar with the conventions.

However, like anything religiously applied, it can also lead to the possibility of Holy War - whereby the righteous “Indenting should be done with SPACES” faction within the team attempts to defeat the evil “Indenting should be done with TABS” group.

To avoid this we, wherever possible, externalize this type of decision making on such immaterial topics to industry standards which are mostly acceptable to all parties within the team. This is exemplified by the black Python code formatter, which characterizes itself as “an opinionated and uncompromising formatter with less than three user configurable options”. By adopting this standard for all our Python code we have an objective and impartial judge that can not only provide a ruling on any style dispute, but can also automatically apply a fix directly to the offending section.
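As a small illustration (the function here is hypothetical, not taken from our codebase), black takes inconsistently formatted code and rewrites it into its single canonical style:

```python
# Before formatting - inconsistent spacing, quoting and argument layout:
#
#     def load_zones( path,crs = 'EPSG:4326' ):
#         return { 'path':path,"crs": crs }
#
# After running `black`, the same function is rewritten as:
def load_zones(path, crs="EPSG:4326"):
    return {"path": path, "crs": crs}
```

Whether the original author preferred single quotes or extra padding no longer matters; the formatter's output is the only style that reaches the repository.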

C++#

  • Formatting applied by clang-format with a minimally modified version of the standard LLVM ruleset.

Python#

  • Code formatting applied via black

  • PEP8 style guide enforced via ruff (or flake8)

  • Static Type Checking via mypy
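As a brief, hypothetical illustration of what the static type checking step adds, mypy flags the mismatched call below at check time, before any test or model run:

```python
def average_speed(distance_km: float, duration_hours: float) -> float:
    """Compute an average speed in km/h from typed inputs."""
    return distance_km / duration_hours


# mypy reports an error on the next line without ever executing the code,
# along the lines of:
#     error: Argument 2 to "average_speed" has incompatible type "str"; expected "float"
speed = average_speed(120.0, "2")
```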

Peer Review#

All changes to our codebase start life as a “proposal to change” in the form of a Merge Request (MR) in the corresponding GitLab repository. There are two standard tags which are applied to a Merge Request during its lifecycle, “WIP - Do Not Merge” and “Please Review and Merge”; contributors are encouraged to create an MR (tagged with the former) as soon as work begins on a feature and to use that MR to elicit feedback during the development cycle. This feedback can come either from pair-programming sessions with other team members to jointly write code (highly effective when sharing screens over Teams) or from asynchronous review of the code through the GitLab interface.

Before an MR is merged, at least one other team member should provide “approval” that the proposed change is fit for purpose. Depending on the type of change being proposed, different team members should be consulted for feedback: if a change is syntactical or architectural, then feedback from the core dev team is sufficient; if key modelling logic is being modified, a subject matter expert should ideally also be consulted.

Things that contributors should consider when creating merge requests:

  1. Make it as small as it can be - small MRs are significantly easier to provide meaningful feedback on; if large-scale changes are required, consider breaking the change into multiple MRs.

  2. Annotate the MR with the information the reviewer will need - if manual testing is required, include the results of that testing in the MR description; if the MR is a WIP, include a TODO list of items that are planned but not yet complete; use comments to draw attention to the more impactful parts of the change that warrant extra scrutiny; and so on.

Things that reviewers should consider when providing feedback:

  1. Does this change have impacts on other modelling components that the author may have missed?

  2. Will this change result in significant runtime or memory usage increases?

  3. Does this change have appropriate documentation updates?

  4. Does this change have appropriate levels of automated tests?

  5. Is your feedback constructive and clearly actionable?

  6. Prioritize quick resolution - it is better to merge an MR that makes the codebase better now than to wait 3 months to make it perfect.

Automated Testing#

Automated testing is the accelerator that allows contributors and reviewers to focus on important issues, rather than getting bogged down in “does this do what I think it does?” or “won’t this change the way that X behaves?” questions. By writing a test that demonstrates our new functionality we achieve two key things:

  1. Proof that the changes achieve the desired outcome now, and

  2. A way to prove that they continue to achieve the same outcome in the future, despite other changes that might be made later

There are also many schools of thought that suggest that by writing tests at the same time as (or preferably before) writing the actual code, the structure and architecture of the final solution will be more modular and maintainable. By focusing on how a new feature will be used in practice, the contributor is forced to think more about the interfaces that feature will have to other parts of the system. This leads to better architectural outcomes as a whole. It also has significant benefits for productivity when contributors can be more certain that their changes are not breaking other features that they are less familiar with - a very real problem in complex, interconnected systems such as POLARIS.

Where possible, it is preferred that all proposed changes come with an accompanying test demonstrating that they achieve the desired objective of the MR.
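A sketch of what such an accompanying test might look like is given below. The `apply_toll_discount` function and its discount rule are hypothetical, but the pattern - pinning the intended behaviour in a test that ships in the same MR - is the one we aim for:

```python
import pytest


# Hypothetical feature introduced by an MR: carpool trips receive a 50% toll discount.
def apply_toll_discount(toll: float, occupants: int) -> float:
    """Return the toll to charge, halving it for vehicles with two or more occupants."""
    if occupants < 1:
        raise ValueError("a vehicle must have at least one occupant")
    return toll * 0.5 if occupants >= 2 else toll


@pytest.mark.parametrize(
    "toll, occupants, expected",
    [
        (4.0, 1, 4.0),  # single-occupant vehicles pay the full toll
        (4.0, 2, 2.0),  # carpools get the discount
        (4.0, 5, 2.0),  # the discount does not keep growing with occupancy
    ],
)
def test_toll_discount(toll, occupants, expected):
    # Demonstrates the new behaviour now, and guards it against future regressions.
    assert apply_toll_discount(toll, occupants) == expected
```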

Documentation as Code#

Another aspect of shortening loops is that all our documentation lives side by side with the code that it describes. This document, for example, is stored as a Markdown file within the polaris-linux codebase, but it is most likely being read now as an HTML webpage on the internet. This means that updates to the code can (and should) be shipped with corresponding changes to the documentation, and in some cases where we have documentation with code samples included, those code samples can be run against the changed code to automatically detect incompatibilities.
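One lightweight way to achieve this, shown here purely as an illustration rather than a description of our actual documentation tooling, is Python's built-in doctest module, which runs the examples embedded in a docstring and fails if they no longer match what the code returns (the `free_flow_time` function is hypothetical):

```python
def free_flow_time(length_m: float, speed_ms: float) -> float:
    """Return the free-flow travel time in seconds for a link.

    The example below is both documentation and an executable check:

    >>> free_flow_time(1000.0, 20.0)
    50.0
    """
    return length_m / speed_ms


if __name__ == "__main__":
    import doctest
    import sys

    # Fail (non-zero exit code) if the documented example no longer matches
    # what the code actually returns - suitable for wiring into a CI job.
    results = doctest.testmod()
    sys.exit(1 if results.failed else 0)
```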

This practice combines well with the “automate everything” principle - an automated step in our CI/CD system builds web-content versions of the documentation for each commit in a Merge Request; if there are any syntactical or compatibility problems with the documentation, the MR cannot be merged. Additionally, when a documentation change is approved and merged back into the main branch, the CI/CD system takes the produced web content and publishes it to our web server, ensuring that our online documentation is always consistent with our codebase.