Published

Ascender: Accelerator of Scientific Development and Research

Authors
sweaty-hands

"With an enthusiastic team you can achieve almost anything." - Tahir Shah

TL;DR

  • We have released Ascender, a GitHub repository template for research projects using Python.
  • It especially focuses on features that are useful in projects where team research and development.

Table of Contents

Background

Hello, this is @gatheluck. This post is also the sixth post of "研究コミュニティ cvpaper.challenge 〜CV 分野の今を映し,トレンドを創り出す〜 Advent Calendar 2022".

Because cvpaper.challnege, where I belong to, conducts research activities in groups, there are frequent opportunities for group members to develop experimental code in different environments for a single project. However, since there has been no common template or coding style guideline for our group, sharing and reusing of code or agile style team development had not been successful. Therefore, as part of the meta-survey activities (meta-survey report slide) of our XCCV group which is a collaborative research group between the MIRU Mentorship Program and cvpaper.challenge, a project template Ascender was developed and released as a public OSS in July 2022.

Since its release, we have received positive comments from several people, including Kaggle Grandmaster @mamas-san, and it has been used in several research projects, including in the cvpaper.challenge. Most recently, NeDDF which is accepted by ECCV 2022 also uses Ascender in its implementation.

The latest version is v0.1.2 at the time of this blog post publication and has earned 140 stars (As of December 5th, 2022). We are continuously developing, so if you have any requests for new features or improvements, please feel free to submit an issue or pull request.

This article provides a brief introduction to Ascender's features and usage. If this is your first time to know about it, we hope you will consider using it in your next project.

What is Ascender

Ascender is a GitHub repository template for research projects using Python. The main features are as follows:

  • Container-based virtualization: Docker is used to reduce development environment dependencies and improve portability.
  • Package management: Package management using Poetry improves reproducibility of the same environment.
  • Coding style: Automatically formats coding style using Black, Flake8, and isort.
  • Static type check: Static type checking with Mypy improves code reliability.
  • Test: Easy to integrate test code with pytest.
  • GitHub related features: Basic workflow which is kicked when pull request is created, issue template, etc. have been implemented.

There are already widely used templates for ML projects in Python, such as Cookiecutter Data Science and Lightning-Hydra-Template, but Ascender have focused on team development and have tried not to be restricted by specific libraries.

Prerequisites

Ascender recommends that you develop in a Docker container to minimize the influence of the development environment. Therefore, we will explain how to install Docker, Docker Compose and NVIDIA Container Toolkit (nvidia-docker2) in preparation for starting development with Ascender. On the other hand, it is also possible to develop with Ascender without Docker, although it is not recommended. Therefore, if you have already installed Docker or if you do not use Docker, please skip this section.

The following installation commands are for Ubuntu. If you are developing on a different OS, please refer to the official documentation of each OS and install accordingly.

Install Docker and Docker Compose.

% sudo apt update
% sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin

When you run sudo docker run --rm hello-world and see a message starting with "Hello from Docker!", you have successfully installed Docker.

If you develop using a GPU, you must also install the NVIDIA Container Toolkit.

% distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

% sudo apt update
% sudo apt install -y nvidia-docker2
% sudo systemctl restart docker

When you run sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base nvidia-smi and see the result, you have successfully installed the NVIDIA Container Toolkit.

Start development

First, create a new Github repository using Ascender as a template and clone it locally. You can go to the repository creation page from this link.

% git clone git@github.com:<YOUR_USER_NAME>/<YOUR_REPO_NAME>.git
% cd <YOUR_REPO_NAME>

Build the Docker image and launch the container. Use docker-compose.yaml in environments/gpu for a GPU environment or environments/cpu for a CPU only environment.

% cd environments/gpu  # For CPU only environments, please use `cd environments/cpu`.
% sudo docker compose up -d

Enter the container with sudo docker compose exec core bash and use Poetry to create a virtual environment and install packages. You only need to run poetry install the first time.

% sudo docker compose exec core bash
$ poetry install

Now you are ready for development. sudo docker compose up -d have mounted a volume of your host PC's directory in the container /home/challenger/ascender, so that any code edits made on the host PC will be reflected in the container and vice versa.

If you do not use Docker for development, clone the repository locally, install Poetry directly on your local PC and run poetry install.

% git clone git@github.com:<YOUR_USER_NAME>/<YOUR_REPO_NAME>.git
% cd <YOUR_REPO_NAME>
% python3 -m pip install poetry  # Install Poetry on local PC.
% poetry install

However, since the workflow of Github Actions (described below) works with Dockerfile, the workflow may raise an error when creating PRs, etc. if you proceed with development without using Dockerfile. If you encounter such an error, you need to modify the Dockerfile or delete the workflow.

During development

Ascender introduces Poetry, Black, Flake8, isort, Mypy, and pytest to support development, and uses Make as a task runner that can easily execute them with a few commands.

Poetry

Ascender uses Poetry to manage Python packages. Compared to pip, Poetry has a certain learning cost, but using Poetry allows for a more reproducible development environment and a lower cost when releasing packages to PyPI. For a detailed explanation of how to use Poetry, please refer to this slide or its official documentation.

You can add packages using the poetry add command.

# Installing packages with Poetry
$ poetry add <PACKAGE_NAME>

One of the most common errors that occur when you run poetry add is the SolverProblemError. For the solution of SolverProblemError, please refer to the "Frequently faced error" page of the above slide.

To run commands using the virtual environment created by Poetry, use the poetry run command. If you have installed a package with Poetry, but face error which can't find it, you may have forgotten to add poetry run.

# Run hoge.py using the virtual environment created by Poetry
$ poetry run python hoge.py

Black, Flake8, isort

Ascender uses Black, Flake8, and isort to standardize the code style. You can apply these tools at once by using the commands defined in the Makefile (make format and make lint). The command names are based on PFN's pysen.

# Format the code style using Black and isort
$ make format
# Check code style and static type using Black, Flake8, isort, mypy (does not format code)
$ make lint

If you want to adjust the style settings, please edit pyproject.toml for Black and isort, and .flake8 for Flake8, as necessary. Please refer to the following links for the available configurations for each tool:

Mypy

Ascender recommends the use of type annotations whenever possible to improve code readability and reliability, and introduces static type checking with Mypy. If you want to see how to annotate types in Python, please refer to the Mypy cheatsheet or the typing package page of the Python documentation. Static type checking with Mypy can be done with the already introduced command make lint which also run other checks at same time.

# Check code style and static type using Black, Flake8, isort, mypy (does not format code)
$ make lint

Mypy configuration is described in pyproject.toml. Please refer to the official Mypy documentation for more information on the available configurations.

pytest

Ascender uses pytest as a tool for testing. If you want to know how to use pytest, please refer to the official documentation or Komoto-san's article on the M3 Tech Blog.

There is of course the question of whether it is really necessary to write test codes for research projects at the expense of development speed. For example, if test coverage itself is used as a KPI, it may be a overdone in a research project.

However, we believe that writing tests for important functions and classes that are central to the method has advantages worth devoting certain resources to, such as making it easier to find fatal bugs and helping other team members understand the use of the functions and classes.

You can run the tests placed under the tests directory by running the make test command. You can also use the make test-all command to perform all the checks introduced so far, using Black, Flake8, isort, Mypy, and pytest at once.

# Running tests with pytest
$ make test
# Run checks by Black, Flake8, isort, Mypy, pytest at once
$ make test-all

Create pull request

Ascender recommends that features be developed in a dedicated branch, and pull requests are created as needed to incorporate features into the main branch. If you want to learn more about developing with GitHub and its features, please refer to the official courses in GitHub Skills.

The default pull request template is defined in .github/PULL_REQUEST_TEMPLATE.md. In addition, a workflow that executes style and static type checks and test code is run at the time of a pull request. This is defined in .github/workflows/lint-and-test.yaml and runs for Python 3.8 and 3.9 by default.

Stop development

When development is finished, stop the container if necessary.

% cd environments/gpu  # `cd environments/cpu` for CPU only environments
% sudo dokcer compose stop

If you want to remove the container, use sudo dokcer compose down instead.

Summary

This time, we introduced Ascender, a GitHub repository template for research projects using Python. We will continue to develop and maintain Ascender, so if you have any questions or requests for additional features, please feel free to contact us via GitHub issue. Some questions are already answered in the FAQ section of the README, so please refer to that section as well.