The PyePAL API reference#

The PAL package#

Core functions#

Core functions for PAL

Base class#

Base class for PAL

class pyepal.pal.pal_base.PALBase(X_design, models, ndim, epsilon=0.01, delta=0.05, beta_scale=0.1111111111111111, goals=None, coef_var_threshold=3, ranges=None)[source]#

Bases: object

PAL base class

__init__(X_design, models, ndim, epsilon=0.01, delta=0.05, beta_scale=0.1111111111111111, goals=None, coef_var_threshold=3, ranges=None)[source]#

Initialize the PAL instance

Parameters
  • X_design (np.array) – Design space (feature matrix)

  • models (list) – Machine learning models

  • ndim (int) – Number of objectives

  • epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.

  • delta (float, optional) – Delta hyperparameter. Defaults to 0.05.

  • beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.

  • goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.

  • coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.

  • ranges (np.ndarray, optional) – Numpy array of length ndim, where each element contains the value range of the given objective. If this is provided, we will use \(\epsilon \cdot ranges\) to compute the uncertainties of the hyperrectangles instead of the default behavior \(\epsilon \cdot |\mu|\)

__repr__()[source]#

Return repr(self).

__weakref__#

list of weak references to the object (if defined)

augment_design_space(X_design, classify=False, clean_classify=True)[source]#

Add new design points to PAL instance

Parameters
  • X_design (np.ndarray) – Design matrix. Two-dimensional array containing measurements in the rows and the features as the columns.

  • classify (bool) – Reclassifies the new design space using the old model. That is, it runs inference, calculates the hyperrectangles, and runs the classification. Does not increase the iteration count. Note, though, that points that have already been classified as Pareto-optimal will not be re-classified (e.g., discarded), even if the new design points dominate the existing “Pareto-optimal” points. Defaults to False.

  • clean_classify (bool) – Reclassifies the new design space using the old model. That is, it runs inference, calculates the hyperrectangles, and runs the classification. Does not increase the iteration count. In contrast to classify, it erases all previous classifications before running the new classification. Hence, if a new design point dominates a previously Pareto-efficient point, the previous point will no longer be classified as Pareto-efficient. This flag is incompatible with classify: if you choose clean_classify, PyePAL will erase all previous classifications, independent of what you choose for classify. Defaults to True.

Return type

None
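
For illustration, a minimal sketch of augmenting the design space. It assumes that palinstance is an already initialized PAL object (see the subclasses below) whose design space has three features; the new candidate points are toy data.

import numpy as np

# Hypothetical new candidates; the number of columns must match the
# number of features of the original design space (assumed to be 3 here).
X_new = np.random.uniform(size=(50, 3))
palinstance.augment_design_space(X_new, clean_classify=True)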

property discarded_indices#

Return the indices of the discarded points

property discarded_points#

Return the discarded points

property hyperrectangle_sizes#

Return the sizes of the hyperrectangles

property means#

Return the means predicted by the model

property number_design_points#

Return the number of points in the design space

property number_discarded_points#

Return the number of discarded points

property number_pareto_optimal_points#

Return the number of Pareto optimal points

property number_sampled_points#

Return the number of sampled points

property number_unclassified_points#

Return the number of unclassified points

property pareto_optimal_indices#

Return the indices of the Pareto optimal points

property pareto_optimal_points#

Return the Pareto optimal points

run_one_step(batch_size=1, pooling_method='fro', sample_discarded=False, use_coef_var=True, replace_mean=True, replace_std=True)[source]#

Run one iteration of the PAL algorithm and return the indices of the next design points to measure.

Parameters
  • batch_size (int, optional) – Number of indices that will be returned. Defaults to 1.

  • pooling_method (str) – Method that is used to aggregate the uncertainty in different objectives into one scalar. Available options are: “fro” (Frobenius/Euclidean norm), “mean”, “median”. Defaults to “fro”.

  • sample_discarded (bool) – If True, it will sample from all points and not only from the unclassified and Pareto optimal ones

  • use_coef_var (bool) – If True, uses the coefficient of variation instead of the unscaled rectangle sizes

  • replace_mean (bool) – If True, uses the measured means for the sampled points

  • replace_std (bool) – If True, uses the measured standard deviations for the sampled points

Raises

ValueError – In case the PAL instance was not initialized with measurements.

Returns

Array of indices if there are unclassified points left that can be sampled; None otherwise.

Return type

Union[np.array, None]
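
A hedged sketch of the loop this method is typically embedded in. It assumes that palinstance is an initialized PAL object that already received some initial measurements, and measure is a hypothetical user-supplied function that returns one row of objective values per requested index.

# Sample, measure, update; repeat until everything is classified.
while palinstance.number_unclassified_points > 0:
    idx = palinstance.run_one_step(batch_size=1)
    if idx is None:  # no unclassified points left that can be sampled
        break
    palinstance.update_train_set(idx, measure(idx))  # measure() is hypothetical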

sample(exclude_idx=None, pooling_method='fro', sample_discarded=False, use_coef_var=True)[source]#

Run the sampling step based on the size of the hyperrectangles, i.e., favoring exploration.

Parameters
  • exclude_idx (Union[np.array, None], optional) – Points in design space to exclude from sampling. Defaults to None.

  • pooling_method (str) – Method that is used to aggregate the uncertainty in different objectives into one scalar. Available options are: “fro” (Frobenius/Euclidean norm), “mean”, “median”. Defaults to “fro”.

  • sample_discarded (bool) – If True, it will sample from all points and not only from the unclassified and Pareto optimal ones

  • use_coef_var (bool) – If True, uses the coefficient of variation instead of the unscaled rectangle sizes

Raises

ValueError – In case there are no uncertainty rectangles, i.e., when _predict has not yet been successfully called.

Returns

Index of next point to evaluate in design space

Return type

int

property sampled_indices#

Return the indices of the sampled points

property sampled_mask#

Create a mask for the sampled points. We count a point as sampled if at least one objective has been measured; i.e., self.sampled is an N × (number of objectives) array in which some entries can be False if a measurement has not been performed.

property sampled_points#

Return the sampled points

should_cross_validate()[source]#

Override for more complex cross validation schedules

property unclassified_indices#

Return the indices of the unclassified points

property unclassified_points#

Return the unclassified points

update_train_set(indices, measurements, measurement_uncertainty=None)[source]#

Update training set following a measurement

Parameters
  • indices (np.ndarray) – Indices of design space at which the measurements were taken

  • measurements (np.ndarray) – Measured values, 2D array. The length must equal the length of the indices array; the second dimension must equal the number of objectives. If an objective is missing, provide np.nan. For example, np.array([1, 1, np.nan])

  • measurement_uncertainty (np.ndarray) – Uncertainty in the measurements. If not provided (None), it will be zero. If it is not None, it must be an array with the same shape as the measurements. If an objective is missing, provide np.nan. For example, np.array([1, 1, np.nan])
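
A small sketch of reporting a partially missing measurement, assuming two objectives and an initialized PAL object palinstance:

import numpy as np

# Objective 1 could not be measured for design point 7; mark it with np.nan.
palinstance.update_train_set(np.array([7]), np.array([[0.3, np.nan]]))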

property uses_fixed_epsilon#

True if it uses the fixed epsilon \(\epsilon \cdot ranges\)

For GPy models#

PAL using GPy GPR models

class pyepal.pal.pal_gpy.PALGPy(*args, **kwargs)[source]#

Bases: pyepal.pal.pal_base.PALBase

PAL class for a list of GPy GPR models, with one model per objective

__init__(*args, **kwargs)[source]#

Construct the PALGPy instance

Parameters
  • X_design (np.array) – Design space (feature matrix)

  • models (list) – Machine learning models

  • ndim (int) – Number of objectives

  • epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.

  • delta (float, optional) – Delta hyperparameter. Defaults to 0.05.

  • beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.

  • goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.

  • coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.

  • restarts (int) – Number of random restarts that are used for hyperparameter optimization. Defaults to 20.

  • n_jobs (int) – Number of parallel processes that are used to fit the GPR models. Defaults to 1.
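
A minimal usage sketch with toy random data and two objectives; build_model is the GPy helper documented in the models package below.

import numpy as np
from pyepal.models.gpr import build_model
from pyepal.pal.pal_gpy import PALGPy

X_design = np.random.uniform(size=(100, 3))  # toy design space
y_initial = np.random.uniform(size=(10, 2))  # toy measurements for the first 10 points

# One single-output GPR model per objective
models = [build_model(X_design[:10], y_initial, index=i) for i in range(2)]

palgpy = PALGPy(X_design, models=models, ndim=2, epsilon=0.01, restarts=3)
palgpy.update_train_set(np.arange(10), y_initial)
next_idx = palgpy.run_one_step()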

For coregionalized GPy models#

PAL for coregionalized GPR models

class pyepal.pal.pal_coregionalized.PALCoregionalized(*args, **kwargs)[source]#

Bases: pyepal.pal.pal_base.PALBase

PAL class for a coregionalized GPR model

__init__(*args, **kwargs)[source]#

Construct the PALCoregionalized instance

Parameters
  • X_design (np.array) – Design space (feature matrix)

  • models (list) – Machine learning models

  • ndim (int) – Number of objectives

  • epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.

  • delta (float, optional) – Delta hyperparameter. Defaults to 0.05.

  • beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.

  • goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.

  • coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.

  • restarts (int) – Number of random restarts that are used for hyperparameter optimization. Defaults to 20.

  • parallel (bool) – If true, model hyperparameters are optimized in parallel, using the GPy implementation. Defaults to False.
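
A minimal sketch, analogous to the PALGPy example above, but with a single coregionalized model; wrapping it in a one-element list is an assumption consistent with the models argument being a list.

import numpy as np
from pyepal.models.gpr import build_coregionalized_model
from pyepal.pal.pal_coregionalized import PALCoregionalized

X_design = np.random.uniform(size=(100, 3))
y_initial = np.random.uniform(size=(10, 2))

# One coregionalized model covers all objectives
model = build_coregionalized_model(X_design[:10], y_initial)
pal_coreg = PALCoregionalized(X_design, models=[model], ndim=2)
pal_coreg.update_train_set(np.arange(10), y_initial)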

For sklearn GPR models#

PAL using Sklearn GPR models

class pyepal.pal.pal_sklearn.PALSklearn(*args, **kwargs)[source]#

Bases: pyepal.pal.pal_base.PALBase

PAL class for a list of Sklearn (GPR) models, with one model per objective

__init__(*args, **kwargs)[source]#

Construct the PALSklearn instance

Parameters
  • X_design (np.array) – Design space (feature matrix)

  • models (list) – Machine learning models. You can provide a list of GaussianProcessRegressor instances or a list of fitted RandomizedSearchCV/GridSearchCV instances with GaussianProcessRegressor models

  • ndim (int) – Number of objectives

  • epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.

  • delta (float, optional) – Delta hyperparameter. Defaults to 0.05.

  • beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.

  • goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.

  • coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.

  • n_jobs (int) – Number of parallel processes that are used to fit the GPR models. Defaults to 1.
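
A minimal, self-contained sketch with unfitted sklearn GaussianProcessRegressor instances (the kernel choice is an arbitrary assumption):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from pyepal.pal.pal_sklearn import PALSklearn

X_design = np.random.uniform(size=(100, 3))
y_initial = np.random.uniform(size=(10, 2))

# One GaussianProcessRegressor per objective
models = [
    GaussianProcessRegressor(kernel=Matern(), n_restarts_optimizer=3) for _ in range(2)
]
palsklearn = PALSklearn(X_design, models, 2)
palsklearn.update_train_set(np.arange(10), y_initial)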

For quantile regression with LightGBM#

Implements a PAL class for GBDT models which can predict uncertainty intervals when used with quantile loss. For an example of GBDT with quantile loss, see Jablonka, Kevin Maik; Moosavi, Seyed Mohamad; Asgari, Mehrdad; Ireland, Christopher; Patiny, Luc; Smit, Berend (2020): A Data-Driven Perspective on the Colours of Metal-Organic Frameworks. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.13033217.v1

For general information about quantile regression see https://en.wikipedia.org/wiki/Quantile_regression

Note that the scaling of the hyperrectangles has been derived for GPR models (with RBF kernels).

class pyepal.pal.pal_gbdt.PALGBDT(*args, **kwargs)[source]#

Bases: pyepal.pal.pal_base.PALBase

PAL class for a list of LightGBM GBDT models

__init__(*args, **kwargs)[source]#

Construct the PALGBDT instance

Parameters
  • X_design (np.array) – Design space (feature matrix)

  • models (List[Iterable[LGBMRegressor, LGBMRegressor, LGBMRegressor]]) – Machine learning models. You need to provide a list of iterables, one iterable per objective, where every iterable contains three LGBMRegressors: the first one for the lower uncertainty limit, the middle one for the median, and the last one for the upper limit. To create appropriate models, you need to use the quantile loss. If you want to parallelize training, we recommend that you use the LightGBM parallelization and fit the models for the different objectives in serial fashion.

  • ndim (int) – Number of objectives

  • epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.

  • delta (float, optional) – Delta hyperparameter. Defaults to 0.05.

  • beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.

  • goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.

  • coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.

  • interquartile_scaler (float, optional) – Used to convert the difference between the upper and lower quantile into a standard deviation, i.e., std = (up - low) / interquartile_scaler. Defaults to 1.35, following Wan, X.; Wang, W.; Liu, J. et al. Estimating the Sample Mean and Standard Deviation from the Sample Size, Median, Range and/or Interquartile Range. BMC Med Res Methodol 14, 135 (2014). https://doi.org/10.1186/1471-2288-14-135
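
A minimal sketch of constructing the (lower quantile, median, upper quantile) model triples with LightGBM's quantile objective. Choosing the 0.25 and 0.75 quantiles is an assumption made to match the default interquartile_scaler of 1.35.

import numpy as np
from lightgbm import LGBMRegressor
from pyepal.pal.pal_gbdt import PALGBDT

X_design = np.random.uniform(size=(100, 3))

def quantile_triple():
    # Lower quartile, median, upper quartile; all use the quantile loss
    return (
        LGBMRegressor(objective="quantile", alpha=0.25),
        LGBMRegressor(objective="quantile", alpha=0.5),
        LGBMRegressor(objective="quantile", alpha=0.75),
    )

# One triple per objective (two objectives assumed here)
palgbdt = PALGBDT(X_design, [quantile_triple(), quantile_triple()], 2)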

For GPR with GPFlow#

PAL using GPflow GPR models

class pyepal.pal.pal_gpflowgpr.PALGPflowGPR(*args, **kwargs)[source]#

Bases: pyepal.pal.pal_base.PALBase

PAL class for a list of GPFlow GPR models, with one model per objective. Please consider that there are specific multioutput models (https://gpflow.readthedocs.io/en/master/notebooks/advanced/multioutput.html) for which the train and prediction function would need to be adjusted. You might also consider using streaming GPRs (https://github.com/thangbui/streaming_sparse_gp). In future releases we might support this case automatically (i.e., handle the case in which only one model is provided).

__init__(*args, **kwargs)[source]#

Construct the PALGPflowGPR instance

Parameters
  • X_design (np.array) – Design space (feature matrix)

  • models (list) – Machine learning models

  • ndim (int) – Number of objectives

  • epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.

  • delta (float, optional) – Delta hyperparameter. Defaults to 0.05.

  • beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.

  • goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.

  • coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.

  • opt (function, optional) – Optimizer function for the GPR parameters. If None (default), then we will use gpflow.optimizers.Scipy()

  • opt_kwargs (dict, optional) – Keyword arguments passed to the optimizer. If None, PyePAL will pass {“maxiter”: 100}

  • n_jobs (int) – Number of parallel threads that are used to fit the GPR models. Defaults to 1.
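
A minimal sketch with GPflow 2.x, one single-output GPR per objective; the toy data and the kernel choice are assumptions.

import numpy as np
import gpflow
from pyepal.pal.pal_gpflowgpr import PALGPflowGPR

X_design = np.random.uniform(size=(100, 3))
y_initial = np.random.uniform(size=(10, 2))

# One single-output GPflow GPR per objective (GPflow expects 2D targets)
models = [
    gpflow.models.GPR(
        data=(X_design[:10], y_initial[:, [i]]),
        kernel=gpflow.kernels.Matern52(),
    )
    for i in range(2)
]
pal_gpflow = PALGPflowGPR(X_design, models, 2)
pal_gpflow.update_train_set(np.arange(10), y_initial)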

Schedules for hyperparameter optimization#

Provides some scheduling functions that can be used to implement the _should_optimize_hyperparameters function

pyepal.pal.schedules.exp_decay(iteration, base=10)[source]#

Optimize hyperparameters at logarithmically spaced intervals

Parameters
  • iteration (int) – current iteration

  • base (int, optional) – Base of the logarithm. Defaults to 10.

Returns

True if iteration is on the log scaled grid

Return type

bool

pyepal.pal.schedules.linear(iteration, frequency=10)[source]#

Optimize hyperparameters at equally spaced intervals

Parameters
  • iteration (int) – current iteration

  • frequency (int, optional) – Spacing between the True outputs. Defaults to 10.

Returns

True if iteration can be divided by frequency without remainder

Return type

bool
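
A sketch of how such a schedule might be wired into a PAL subclass via the _should_optimize_hyperparameters hook mentioned above; that the instance exposes an iteration counter as self.iteration is an assumption.

from pyepal.pal.pal_gpy import PALGPy
from pyepal.pal.schedules import linear

class ScheduledPALGPy(PALGPy):
    """Re-optimize the GPR hyperparameters only every 5th iteration."""

    def _should_optimize_hyperparameters(self):
        # self.iteration is assumed to be the current PAL iteration counter
        return linear(self.iteration, frequency=5)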

Utilities for multiobjective optimization#

Utilities for dealing with Pareto fronts in general

pyepal.pal.utils.dominance_check(point1, point2)[source]#

One point dominates another if it is not worse in all objectives and strictly better in at least one. This implementation assumes that we want to maximize all objectives.

Return type

bool
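
A toy check, assuming maximization as stated in the docstring:

import numpy as np
from pyepal.pal.utils import dominance_check

# True: [2, 2] is not worse in any objective and strictly better in the first
dominance_check(np.array([2.0, 2.0]), np.array([1.0, 2.0]))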

pyepal.pal.utils.dominance_check_jitted(point, array)[source]#

Check if point dominates any point in array

Return type

bool

pyepal.pal.utils.dominance_check_jitted_2(array, point)[source]#

Check if any point in array dominates point

Return type

bool

pyepal.pal.utils.dominance_check_jitted_3(array, point, ignore_me)[source]#

Check if any point in array dominates point. The ignore_me argument exists because numba does not understand masked arrays.

Return type

bool

pyepal.pal.utils.exhaust_loop(palinstance, y, batch_size=1)[source]#

Helper function that takes an initialized PAL instance and loops the sampling until there is no unclassified point left. This is useful if all measurements have already been taken and one wants to test the algorithm with different hyperparameters.

Parameters
  • palinstance (PALBase) – An initialized instance of a class that inherits from PALBase and implements the ._train() and ._predict() functions

  • y (np.array) – Measurements. The number of measurements must equal the number of points in the design space.

  • batch_size (int, optional) – Number of indices that will be returned. Defaults to 1.

Returns

None. The PAL instance is updated in place
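
A self-contained sketch of this benchmarking pattern, with toy "measurements" and a PALSklearn instance (the model and kernel choices are assumptions):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from pyepal.pal.pal_sklearn import PALSklearn
from pyepal.pal.utils import exhaust_loop

X_design = np.random.uniform(size=(100, 2))
# Toy measurements, one row per design point
y = np.column_stack([np.sin(3 * X_design[:, 0]), np.cos(3 * X_design[:, 1])])

models = [GaussianProcessRegressor(kernel=Matern()) for _ in range(2)]
palinstance = PALSklearn(X_design, models, 2)
palinstance.update_train_set(np.arange(5), y[:5])

exhaust_loop(palinstance, y, batch_size=10)
print(palinstance.number_pareto_optimal_points)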

pyepal.pal.utils.get_hypervolume(pareto_front, reference_vector, prefactor=- 1)[source]#

Compute the hypervolume indicator of a Pareto front. We multiply it by minus one because we assume that all objectives are to be maximized, whereas the code we use for the hypervolume indicator assumes that the reference vector is larger than all the points in the Pareto front. For this reason, we flip all the signs using prefactor.

This indicator is not needed for the epsilon-PAL algorithm itself but only to allow tracking a metric that might help the user to see if the algorithm converges.

Return type

float
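
A toy sketch, assuming maximization (which matches the default prefactor of -1) and therefore a reference vector that is worse than every point of the front:

import numpy as np
from pyepal.pal.utils import get_hypervolume

pareto_front = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
reference = np.array([0.0, 0.0])  # worse than all points when maximizing
hv = get_hypervolume(pareto_front, reference)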

pyepal.pal.utils.get_kmeans_samples(X, n_samples, **kwargs)[source]#

Get the samples that are closest to the k=n_samples centroids

Parameters
  • X (np.array) – Feature array, on which the KMeans clustering is run

  • n_samples (int) – number of samples that should be selected

  • **kwargs – Passed to the KMeans

Returns

selected_indices

Return type

np.array

pyepal.pal.utils.get_maxmin_samples(X, n_samples, metric='euclidean', init='mean', seed=None, **kwargs)[source]#

Greedy maxmin sampling, also known as Kennard-Stone sampling (1). Note that a greedy sampling is not guaranteed to give the ideal solution and the output will depend on the random initialization (if this is chosen).

If you need a good solution, you can restart this algorithm multiple times with random initialization and different random seeds and use a coverage metric to quantify how well the space is covered. Some metrics are described in (2). In contrast to the code provided with (2) and (3) we do not consider the feature importance for the selection as this is typically not known beforehand.

You might want to standardize your data before applying this sampling function.

Some more sampling options are provided in our structure_comp (4) Python package. Also, note that this implementation is quite memory-hungry.

References: (1) Kennard, R. W.; Stone, L. A. Computer Aided Design of Experiments. Technometrics 1969, 11 (1), 137–148. https://doi.org/10.1080/00401706.1969.10490666. (2) Moosavi, S. M.; Nandy, A.; Jablonka, K. M.; Ongari, D.; Janet, J. P.; Boyd, P. G.; Lee, Y.; Smit, B.; Kulik, H. J. Understanding the Diversity of the Metal-Organic Framework Ecosystem. Nature Communications 2020, 11 (1), 4068. https://doi.org/10.1038/s41467-020-17755-8. (3) Moosavi, S. M.; Chidambaram, A.; Talirz, L.; Haranczyk, M.; Stylianou, K. C.; Smit, B. Capturing Chemical Intuition in Synthesis of Metal-Organic Frameworks. Nat Commun 2019, 10 (1), 539. https://doi.org/10.1038/s41467-019-08483-9. (4) https://github.com/kjappelbaum/structure_comp

Parameters
  • X (np.array) – Feature array, this is the array that is used to perform the sampling

  • n_samples (int) – number of points that will be selected, needs to be lower than the length of X

  • metric (str, optional) – Distance metric to use for the maxmin calculation. Must be a valid option of scipy.spatial.distance.cdist (‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’). Defaults to ‘euclidean’

  • init (str, optional) – either ‘mean’, ‘median’, or ‘random’. Determines how the initial point is chosen. Defaults to ‘mean’.

  • seed (int, optional) – seed for the random number generator. Defaults to None.

  • **kwargs – Passed to the cdist

Returns

selected_indices

Return type

np.array
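
Both greedy selectors are typically used to pick the initial training points before the first call to update_train_set; a small sketch with toy data:

import numpy as np
from pyepal.pal.utils import get_kmeans_samples, get_maxmin_samples

X = np.random.uniform(size=(200, 4))  # toy feature matrix

idx_kmeans = get_kmeans_samples(X, n_samples=10)
idx_maxmin = get_maxmin_samples(X, n_samples=10, metric="euclidean", init="mean", seed=42)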

pyepal.pal.utils.is_pareto_efficient(costs, return_mask=True)[source]#

Find the Pareto efficient points. Based on https://stackoverflow.com/questions/32791911/fast-calculation-of-pareto-front-in-python

Parameters
  • costs (np.array) – An (n_points, n_costs) array

  • return_mask (bool, optional) – If True, return a boolean mask; otherwise, return a (n_efficient_points, ) integer array of indices. Defaults to True.

Returns

Boolean mask of the Pareto efficient points if return_mask is True, otherwise an integer array of their indices.

Return type

np.array
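
A toy sketch; the argument is named costs, which suggests that lower values are considered better, so maximization objectives would need to be negated first (an assumption):

import numpy as np
from pyepal.pal.utils import is_pareto_efficient

costs = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = is_pareto_efficient(costs)  # [3., 3.] is dominated by [2., 2.]
indices = is_pareto_efficient(costs, return_mask=False)  # integer indices instead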

Utilities for plotting#

Plotting utilities

pyepal.plotting.plot_bar_iterations(pareto_optimal, non_pareto_points, unclassified_points, ax=None)[source]#

Plot stacked barplots for every step of the iteration.

Parameters
  • pareto_optimal (np.ndarray) – Number of Pareto optimal points for every iteration.

  • non_pareto_points (np.ndarray) – Number of discarded points for every iteration

  • unclassified_points (np.ndarray) – Number of unclassified points for every iteration

Returns

matplotlib axis (the same that was provided as input, or one from a new figure if no axis was provided)

Return type

axis

pyepal.plotting.plot_histogram(y, palinstance, ax=None)[source]#

Plot histograms, with maxima scaled to one and different categories indicated in color for one objective

Parameters
  • y (np.ndarray) – objective (measurement)

  • palinstance (PALBase) – instance of a PAL class

  • ax (ax) – Matplotlib figure axis

Returns

matplotlib axis (the same that was provided as input, or one from a new figure if no axis was provided)

Return type

ax

pyepal.plotting.plot_jointplot(y, palinstance, labels=None, figsize=(8.0, 6.0))[source]#

Plot a jointplot of the objective space with histograms on the diagonal and 2D-Pareto plots on the off-diagonal.

Parameters
  • y (np.array) – Two-dimensional array with the objectives (measurements)

  • palinstance (PALBase) – “trained” PAL instance

  • labels (Union[List[str], None], optional) – Labels for each objective. Defaults to “objective [index]”.

  • figsize (tuple, optional) – Figure size for joint plot. Defaults to (8.0, 6.0).

Returns

matplotlib Figure object.

Return type

fig
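
A short sketch, reusing y and palinstance from the exhaust_loop example above:

from pyepal.plotting import plot_jointplot

fig = plot_jointplot(y, palinstance, labels=["objective 0", "objective 1"])
fig.savefig("jointplot.png")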

pyepal.plotting.plot_pareto_front_2d(y_0, y_1, std_0, std_1, palinstance, ax=None)[source]#

Plot a 2D Pareto front, with the different categories indicated in color.

Parameters
  • y_0 (np.ndarray) – objective 0

  • y_1 (np.ndarray) – objective 1

  • std_0 (np.ndarray) – standard deviation objective 0

  • std_1 (np.ndarray) – standard deviation objective 1

  • palinstance (PALBase) – PAL instance

  • ax (axis, optional) – Matplotlib figure axis. Defaults to None.

Returns

matplotlib axis (the same that was provided as input, or one from a new figure if no axis was provided)

Return type

ax

pyepal.plotting.plot_residuals(y, palinstance, labels=None, figsize=(6.0, 4.0))[source]#

Plot signed residuals (on the y-axis) vs. fitted values (on the x-axis) for the sampled points. Will create subplots for y.ndim > 1.

Parameters
  • y (np.array) – Two-dimensional array with the objectives (measurements)

  • palinstance (PALBase) – “trained” PAL instance

  • labels (Union[List[str], None], optional) – Labels for each objective. Defaults to “objective [index]”.

  • figsize (tuple, optional) – Figure size for each individual residual vs fitted objective plot. Defaults to (6.0, 4.0).

Returns

matplotlib Figure object

Return type

fig

Input validation#

Methods to validate inputs for the PAL classes

pyepal.pal.validate_inputs.base_validate_models(models)[source]#

Currently no validation, as the predict and train functions are implemented independently of the base class

Return type

list

pyepal.pal.validate_inputs.validate_beta_scale(beta_scale)[source]#
Parameters

beta_scale (Any) – scaling factor for beta

Raises

ValueError – If beta_scale is smaller than 0

Returns

scaling factor for beta

Return type

float

pyepal.pal.validate_inputs.validate_coef_var(coef_var)[source]#

Make sure that the coef_var makes sense

pyepal.pal.validate_inputs.validate_coregionalized_gpy(models)[source]#

Make sure that model is a coregionalized GPR model

pyepal.pal.validate_inputs.validate_delta(delta)[source]#

Make sure that delta is in a reasonable range

Parameters

delta (Any) – Delta hyperparameter

Raises

ValueError – Delta must be in [0,1].

Returns

delta

Return type

float

pyepal.pal.validate_inputs.validate_epsilon(epsilon, ndim)[source]#

Validate epsilon and return a np.array

Parameters
  • epsilon (Any) – Epsilon hyperparameter

  • ndim (int) – Number of dimensions/objectives

Raises
  • ValueError – If epsilon is a list there must be one float per dimension

  • ValueError – Epsilon must be in [0,1]

  • ValueError – If epsilon is an array there must be one float per dimension

Returns

Array of one epsilon per objective

Return type

np.ndarray

pyepal.pal.validate_inputs.validate_gbdt_models(models, ndim)[source]#

Make sure that the number of iterables is equal to the number of objectives and that every iterable contains three LGBMRegressors. Also, we check that at least the first and last models use quantile loss

Return type

List[Iterable]

pyepal.pal.validate_inputs.validate_goals(goals, ndim)[source]#
Create a valid array of goals: 1 for objectives that are to be maximized and -1 for objectives that are to be minimized.

Parameters
  • goals (Any) – List of goals, typically provided as strings: ‘max’ for maximization and ‘min’ for minimization

  • ndim (int) – number of dimensions

Raises
  • ValueError – If goals is a list and the length is not equal to ndim

  • ValueError – If goals is a list and the elements are not the strings ‘min’ and ‘max’ or the integers -1 and 1

Returns

Array of -1 and 1

Return type

np.ndarray

pyepal.pal.validate_inputs.validate_gpy_model(models)[source]#

Make sure that all elements of the list are GPRegression models

pyepal.pal.validate_inputs.validate_interquartile_scaler(interquartile_scaler)[source]#

Make sure that the interquartile_scaler makes sense

Return type

float

pyepal.pal.validate_inputs.validate_ndim(ndim)[source]#

Make sure that the number of dimensions makes sense

Parameters

ndim (Any) – number of dimensions

Raises
  • ValueError – If the number of dimensions is not an integer

  • ValueError – If the number of dimensions is not greater than 0

Returns

the number of dimensions

Return type

int

pyepal.pal.validate_inputs.validate_njobs(njobs)[source]#

Make sure that njobs is an int >= 1

Return type

int

pyepal.pal.validate_inputs.validate_nt_models(models, ndim)[source]#

Make sure that we can work with a sequence of pyepal.pal.models.nt.NTModel()

Return type

Sequence

pyepal.pal.validate_inputs.validate_number_models(models, ndim)[source]#

Make sure that there are as many models as objectives

Parameters
  • models (Any) – List of models

  • ndim (int) – Number of objectives

Raises

ValueError – If the number of models does not equal the number of objectives

pyepal.pal.validate_inputs.validate_optimizers(optimizers, ndim)[source]#

Make sure that we can work with a Sequence of JaxOptimizer

Return type

Sequence

pyepal.pal.validate_inputs.validate_positive_integer_list(seq, ndim, parameter_name='Parameter')[source]#

Can be used, e.g., to validate and standardize the ensemble size and epochs input

Return type

Sequence[int]

pyepal.pal.validate_inputs.validate_sklearn_gpr_models(models, ndim)[source]#

Make sure that there is a list of GPR models, one model per objective

Return type

List[GaussianProcessRegressor]

The models package#

Helper functions for GPR with GPy#

Wrappers for Gaussian Process Regression models.

We typically use the GPy package as it offers most flexibility for Gaussian processes in Python. Typically, we use automatic relevance determination (ARD), where one lengthscale parameter per input dimension is used.

If your task requires training on larger training sets, you might consider replacing the models with their sparse versions, but for the epsilon-PAL algorithm this typically shouldn’t be needed.

For kernel selection, you can have a look at https://www.cs.toronto.edu/~duvenaud/cookbook/. Matérn, RBF, and RationalQuadratic are good quick-and-dirty solutions but have their caveats.

pyepal.models.gpr.build_coregionalized_model(X_train, y_train, kernel=None, w_rank=1, **kwargs)[source]#

Wrapper for building a coregionalized GPR; it will have as many outputs as y_train.shape[1]. Each output will have its own noise term.

Return type

GPCoregionalizedRegression

pyepal.models.gpr.build_model(X_train, y_train, index=0, kernel=None, **kwargs)[source]#

Build a single-output GPR model

Return type

GPRegression

pyepal.models.gpr.get_matern_32_kernel(NFEAT, ARD=True, **kwargs)[source]#

Matern-3/2 kernel (ARD enabled by default)

Return type

Matern32

pyepal.models.gpr.get_matern_52_kernel(NFEAT, ARD=True, **kwargs)[source]#

Matern-5/2 kernel (ARD enabled by default)

Return type

Matern52

pyepal.models.gpr.get_ratquad_kernel(NFEAT, ARD=True, **kwargs)[source]#

Rational quadratic kernel (ARD enabled by default)

Return type

RatQuad

pyepal.models.gpr.predict(model, X)[source]#

Wrapper function for the prediction method of a GPy regression model. It returns the standard deviation instead of the variance.

Return type

Tuple[array, array]

pyepal.models.gpr.predict_coregionalized(model, X, index=0)[source]#

Wrapper function for the prediction method of a coregionalized GPy regression model. It returns the standard deviation instead of the variance.

Return type

Tuple[array, array]
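
A small sketch of these wrappers with toy data; hyperparameter optimization is skipped for brevity:

import numpy as np
from pyepal.models.gpr import (
    build_coregionalized_model,
    build_model,
    predict,
    predict_coregionalized,
)

X_train = np.random.uniform(size=(20, 3))
y_train = np.random.uniform(size=(20, 2))

single = build_model(X_train, y_train, index=0)  # single-output model for objective 0
mu, std = predict(single, X_train)  # standard deviations, not variances

coreg = build_coregionalized_model(X_train, y_train)  # one model, two outputs
mu0, std0 = predict_coregionalized(coreg, X_train, index=0)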

pyepal.models.gpr.set_xy_coregionalized(model, X, y, mask=None)[source]#

Wrapper to update a coregionalized model with new data