Developer notes#

Contribution Guidelines#

Commit messages#

  • To automatically generate the changelog and the releases, we use conventional commits. Use the prefix feat for new features, chore for maintenance tasks that do not change production code, fix for bug fixes, and docs for changes to the documentation. Use feat!:, fix!:, refactor!:, etc., to mark a breaking change (indicated by the !); this results in a bump of the SemVer major version number. A few example messages are shown below.
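
For illustration, commit messages following this convention could look like the following (these are made-up examples, not messages from the repository):

feat: add support for an additional surrogate model
fix: handle an empty list of sampled points
docs: clarify the installation instructions
feat!: change the default value of epsilon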

Python code#

Please install the pre-commit hooks using

pre-commit install

to automatically

  • format the code with black

  • sort the imports with isort

  • lint the code with pylint

We use type hints, which we feel are a good form of documentation and help us find bugs using mypy.
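
As a small illustration of what the type hints buy us (the function below is a made-up example and not part of the code base):

from typing import List


def scale(values: List[float], factor: float) -> List[float]:
    """Multiply every value in values by factor."""
    return [value * factor for value in values]


# With these annotations, a call such as scale([1.0, 2.0], "3") is rejected by
# mypy (roughly: Argument 2 to "scale" has incompatible type "str"; expected
# "float") instead of only failing at runtime.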

Some of the pre-commit hooks modify the files, e.g., they trim whitespace or format the code. If they modify your files, you will have to run git add and git commit again. To skip the pre-commit checks (not recommended) you can use git commit --no-verify.

New features#

Please create a new branch for the development of new features, rebase it on the upstream master, and include a test for your new feature. (The CI checks for a drop in code coverage.)

Releases#

Releases are automated using a GitHub Actions workflow that is triggered based on the commit messages. Maintainers manually upload the release to PyPI.

Implementing a new PAL class#

If you want to use PyePAL with a model that we do not support yet, i.e., one that is not a GPy or sklearn Gaussian process regression model, it is easy to write your own class. For this, you will need to inherit from PALBase and implement the _train() and _predict() methods (and maybe also the pyepal.pal.pal_base.PALBase._set_hyperparameters and pyepal.pal.pal_base.PALBase._should_optimize_hyperparameters methods), using the design_space and y attributes of the class.

For instance, if we develop a multioutput model that has a train() and a predict() method, we could simply use the following design pattern:

from pyepal import PALBase

class PALMyModel(PALBase):
    def _train(self):
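        # Fit the (single) multioutput model on the points sampled so far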
        self.models[0].train(self.design_space[self.sampled], self.y[self.sampled])

    def _predict(self):
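        # Predict means and standard deviations for all points in the design space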
        self.mu, self.std = self.models[0].predict(self.design_space)

Note that we typically provide the models in a list, even if there is only one, to keep the API consistent.
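
If your model also needs its hyperparameters to be optimized from time to time, you can additionally override the two hyperparameter hooks mentioned above. The following is only a sketch: it assumes a hypothetical model object that exposes an optimize() method for re-fitting its hyperparameters, and it keeps its own counter to re-optimize every tenth training round.

from pyepal import PALBase

class PALMyOptimizedModel(PALBase):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._train_calls = 0  # counter kept by this sketch

    def _should_optimize_hyperparameters(self) -> bool:
        # Re-optimize on the first round and then every tenth one
        return self._train_calls % 10 == 0

    def _set_hyperparameters(self):
        # Hypothetical model API: re-fit the kernel hyperparameters
        self.models[0].optimize()

    def _train(self):
        self._train_calls += 1
        self.models[0].train(self.design_space[self.sampled], self.y[self.sampled])

    def _predict(self):
        self.mu, self.std = self.models[0].predict(self.design_space)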

In some instances, you may want to perform an operation in parallel, e.g., train the models for different objectives in parallel. One convenient way to do this in Python is to use concurrent.futures. The only caveat is that this approach requires the function to be picklable. To ensure this, you may want to implement the function that you want to parallelize outside the class. For example, you could use the following design pattern:

import concurrent.futures
from functools import partial

from pyepal import PALBase

# Input-validation helpers used below; in PyePAL they live in
# pyepal.pal.validate_inputs (adjust the import if this differs in your version)
from pyepal.pal.validate_inputs import validate_njobs, validate_number_models

def _train_model_picklable(i, models, design_space, objectives, sampled):
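    # Defined at module level (not as a method) so that it can be pickled and
    # shipped to worker processes; fits the i-th model on the points sampled
    # for the i-th objective.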
    model = models[i]
    model.fit(
        design_space[sampled[:, i]],
        objectives[sampled[:, i], i].reshape(-1, 1),
    )
    return model

class MyPal(PALBase):
    def __init__(self, *args, **kwargs):
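        # Accept an extra n_jobs keyword (number of worker processes) and
        # remove it before the remaining arguments are passed to PALBase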
        n_jobs = kwargs.pop("n_jobs", 1)
        validate_njobs(n_jobs)
        self.n_jobs = n_jobs
        super().__init__(*args, **kwargs)

        validate_number_models(self.models, self.ndim)

    def _train(self):
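        # Fix all arguments except the model index so that executor.map only
        # has to iterate over the objective indices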
        train_single_partial = partial(
            _train_model_picklable,
            models=self.models,
            design_space=self.design_space,
            objectives=self.y,
            sampled=self.sampled,
        )
        models = []
        with concurrent.futures.ProcessPoolExecutor(
            max_workers=self.n_jobs
        ) as executor:
            for model in executor.map(train_single_partial, range(self.ndim)):
                models.append(model)
        self.models = models