Developer notes

Contribution Guidelines

Commit messages
To automatically generate the changelog and releases, we use conventional commits. Use the prefix `feat:` for new features, `chore:` for maintenance tasks with no production code change, `fix:` for bug fixes, and `docs:` for changes to the documentation. Use `feat!:`, `fix!:`, `refactor!:`, etc., to indicate a breaking change (marked by the `!`); this bumps the SemVer major version number.
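For example, commit messages following this convention might look like (hypothetical messages for illustration):

```text
feat: add support for a new surrogate model class
fix: correct variance handling in the prediction step
docs: expand the hyperparameter optimization section
chore: bump pre-commit hook versions
feat!: change the return type of the sampling step
```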
Python code
Please install the pre-commit hooks using `pre-commit install` to automatically run the checks before every commit.
We use type hints, which we consider a good form of documentation and which help us find bugs using mypy.
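As a sketch of what we mean (a hypothetical function, not taken from the code base), type hints document the expected inputs and outputs and let mypy catch mismatches statically:

```python
import numpy as np


def scale_objectives(objectives: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Scale an (n_points, n_objectives) array by a scalar weight."""
    return objectives * weight


# mypy would flag, e.g., scale_objectives(objectives, weight="2") as an error
scaled = scale_objectives(np.array([[1.0, 2.0], [3.0, 4.0]]), weight=0.5)
```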
Some of the pre-commit hooks modify the files, e.g., they trim whitespace or format the code. If they modify your file, you will have to run `git add` and `git commit` again. To skip the pre-commit checks (not recommended) you can use `git commit --no-verify`.
New features
Please make a new branch for the development of new features. Rebase on the upstream master and include a test for your new feature. (The CI checks for a drop in code coverage.)
Releases

Releases are automated using a GitHub Actions workflow triggered by the commit messages. Maintainers manually upload the release to PyPI.
Implementing a new PAL class
If you want to use PyePAL with a model that we do not support yet, i.e., neither a GPy nor a sklearn Gaussian process regression model, it is easy to write your own class. For this, you will need to inherit from `PALBase` and implement the `_train()` and `_predict()` functions (and maybe also the `pyepal.pal.pal_base.PALBase._set_hyperparameters` and `pyepal.pal.pal_base.PALBase._should_optimize_hyperparameters` functions) using the `design_space` and `y` attributes of the class.
For instance, if we develop some multioutput model that has a `train()` and a `predict()` method, we could simply use the following design pattern:
```python
from pyepal import PALBase


class PALMyModel(PALBase):
    def _train(self):
        self.models[0].train(self.design_space[self.sampled], self.y[self.sampled])

    def _predict(self):
        self.mu, self.std = self.models[0].predict(self.design_space)
```
Note that we typically provide the models, even if it is only one, in a list to keep the API consistent.
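For illustration, a toy stand-in for such a multioutput model (hypothetical, not part of PyePAL) that exposes the `train()`/`predict()` interface assumed above could look like this:

```python
import numpy as np


class ToyMultioutputModel:
    """Minimal stand-in exposing the train()/predict() interface assumed above."""

    def __init__(self):
        self._y_mean = None
        self._y_std = None

    def train(self, X: np.ndarray, y: np.ndarray) -> None:
        # "Fit" by memorizing the per-objective mean and spread of the labels
        self._y_mean = y.mean(axis=0)
        self._y_std = y.std(axis=0) + 1e-8

    def predict(self, X: np.ndarray):
        # Return a mean and an uncertainty estimate for every design point
        n_points = len(X)
        mu = np.tile(self._y_mean, (n_points, 1))
        std = np.tile(self._y_std, (n_points, 1))
        return mu, std


model = ToyMultioutputModel()
model.train(np.zeros((4, 2)), np.array([[0.0, 1.0]] * 4))
mu, std = model.predict(np.zeros((3, 2)))
```

A real model would of course condition its predictions on `X`; the point here is only the shape of the interface that `_train()` and `_predict()` rely on.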
In some instances, you may want to perform an operation in parallel, e.g., train the models for different objectives in parallel. One convenient way to do this in Python is to use concurrent.futures. The only caveat is that this approach requires the function to be picklable. To ensure this, you may want to implement the function that you want to parallelize outside the class. For example, you could use the following design pattern:
```python
import concurrent.futures
from functools import partial

from pyepal import PALBase

# input validation helpers used below
from pyepal.pal.validate_inputs import validate_njobs, validate_number_models


def _train_model_picklable(i, models, design_space, objectives, sampled):
    model = models[i]
    model.fit(
        design_space[sampled[:, i]],
        objectives[sampled[:, i], i].reshape(-1, 1),
    )
    return model


class MyPal(PALBase):
    def __init__(self, *args, **kwargs):
        n_jobs = kwargs.pop("n_jobs", 1)
        validate_njobs(n_jobs)
        self.n_jobs = n_jobs

        super().__init__(*args, **kwargs)

        validate_number_models(self.models, self.ndim)

    def _train(self):
        train_single_partial = partial(
            _train_model_picklable,
            models=self.models,
            design_space=self.design_space,
            objectives=self.y,
            sampled=self.sampled,
        )
        models = []
        with concurrent.futures.ProcessPoolExecutor(
            max_workers=self.n_jobs
        ) as executor:
            for model in executor.map(train_single_partial, range(self.ndim)):
                models.append(model)
        self.models = models
```
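The picklability requirement mentioned above is easy to check directly: module-level functions pickle fine, while lambdas and nested functions generally do not. A minimal sketch:

```python
import pickle


def module_level(x):
    # Defined at module scope, so pickle can locate it by its qualified name,
    # which is what ProcessPoolExecutor needs to ship it to worker processes.
    return x + 1


payload = pickle.dumps(module_level)
restored = pickle.loads(payload)

# A lambda has no importable qualified name, so pickling it fails
try:
    pickle.dumps(lambda x: x + 1)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
```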