The PyePAL API reference
The PAL package
Core functions
Core functions for PAL
Base class
Base class for PAL
- class pyepal.pal.pal_base.PALBase(X_design, models, ndim, epsilon=0.01, delta=0.05, beta_scale=0.1111111111111111, goals=None, coef_var_threshold=3, ranges=None)
Bases: object
PAL base class
- __init__(X_design, models, ndim, epsilon=0.01, delta=0.05, beta_scale=0.1111111111111111, goals=None, coef_var_threshold=3, ranges=None)
Initialize the PAL instance
- Parameters
X_design (np.array) – Design space (feature matrix)
models (list) – Machine learning models
ndim (int) – Number of objectives
epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.
delta (float, optional) – Delta hyperparameter. Defaults to 0.05.
beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.
goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.
coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.
ranges (np.ndarray, optional) – NumPy array of length ndim, where each element contains the value range of the corresponding objective. If this is provided, we use \(\epsilon \cdot ranges\) to compute the uncertainties of the hyperrectangles instead of the default behavior \(\epsilon \cdot |\mu|\).
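To make the roles of epsilon, ranges, and coef_var_threshold more concrete, here is a minimal plain-Python sketch. It is an illustration under stated assumptions, not PyePAL's implementation: the helper names are hypothetical, and the coefficient of variation is assumed to be std / |mean| per objective, with a point kept for classification only if it is below the threshold in every objective.

```python
from typing import List, Optional, Sequence


def epsilon_tolerance(
    means: Sequence[float],
    epsilon: Sequence[float],
    ranges: Optional[Sequence[float]] = None,
) -> List[float]:
    """Per-objective tolerance for one point (hypothetical helper).

    Default: epsilon * |mu|, i.e., relative to the predicted mean.
    With ranges: epsilon * ranges, i.e., a fixed tolerance per objective.
    """
    if ranges is None:
        return [e * abs(m) for e, m in zip(epsilon, means)]
    return [e * r for e, r in zip(epsilon, ranges)]


def passes_coef_var(
    means: Sequence[float], stds: Sequence[float], threshold: float = 3.0
) -> bool:
    """Assumed rule: keep a point for classification only if std / |mean|
    is below threshold in every objective (undefined for a zero mean)."""
    for m, s in zip(means, stds):
        if m == 0 or s / abs(m) >= threshold:
            return False
    return True


mu = [2.0, -50.0]
print(epsilon_tolerance(mu, [0.01, 0.01]))                       # scales with |mu|
print(epsilon_tolerance(mu, [0.01, 0.01], ranges=[10.0, 10.0]))  # fixed
print(passes_coef_var(mu, [1.0, 20.0]))  # True: cv = 0.5 and 0.4
print(passes_coef_var(mu, [8.0, 20.0]))  # False: cv = 4.0 for objective 0
```

A fixed tolerance can help when some predicted means are close to zero, because \(\epsilon \cdot |\mu|\) then shrinks to almost nothing.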
- augment_design_space(X_design, classify=False, clean_classify=True)
Add new design points to the PAL instance
- Parameters
X_design (np.ndarray) – Design matrix. Two-dimensional array containing the samples in the rows and the features in the columns.
classify (bool) – Reclassifies the new design space, using the old model. That is, it runs inference, calculates the hyperrectangles, and runs the classification. Does not increase the iteration count. Note, though, that points that have already been classified as Pareto-optimal will not be reclassified (i.e., discarded), even if the new design points dominate the existing “Pareto optimal” points. Defaults to False.
clean_classify (bool) – Reclassifies the new design space, using the old model. That is, it runs inference, calculates the hyperrectangles, and runs the classification. Does not increase the iteration count. But, in contrast to classify, it erases all previous classifications before running the new classification. Hence, if some new design point dominates a previously Pareto-efficient point, the previous Pareto-optimal point will no longer be classified as Pareto efficient. This flag is incompatible with classify. If you choose clean_classify, PyePAL will erase all previous classifications, independent of what you choose for classify. Defaults to True.
- Return type
None
- property discarded_indices
Return the indices of the discarded points
- property discarded_points
Return the discarded points
- property hyperrectangle_sizes
Return the sizes of the hyperrectangles
- property means
Return the means predicted by the model
- property number_design_points
Return the number of points in the design space
- property number_discarded_points
Return the number of discarded points
- property number_pareto_optimal_points
Return the number of Pareto optimal points
- property number_sampled_points
Return the number of sampled points
- property number_unclassified_points
Return the number of unclassified points
- property pareto_optimal_indices
Return the indices of the Pareto optimal points
- property pareto_optimal_points
Return the Pareto optimal points
- run_one_step(batch_size=1, pooling_method='fro', sample_discarded=False, use_coef_var=True, replace_mean=True, replace_std=True)
Run one iteration of the epsilon-PAL algorithm: predict with the model, update the hyperrectangles, classify the design points, and select the points to sample next.
- Parameters
batch_size (int, optional) – Number of indices that will be returned. Defaults to 1.
pooling_method (str) – Method that is used to aggregate the uncertainty in different objectives into one scalar. Available options are: “fro” (Frobenius/Euclidean norm), “mean”, “median”. Defaults to “fro”.
sample_discarded (bool) – If True, sample from all points, not only from the unclassified and Pareto-optimal ones. Defaults to False.
use_coef_var (bool) – If True, use the coefficient of variation instead of the unscaled rectangle sizes. Defaults to True.
replace_mean (bool) – If True, use the measured means for the sampled points. Defaults to True.
replace_std (bool) – If True, use the measured standard deviations for the sampled points. Defaults to True.
- Raises
ValueError – In case the PAL instance was not initialized with measurements.
- Returns
Array of indices of the points to sample next if there are unclassified points left that can be sampled, otherwise None.
- Return type
Union[np.array, None]
- sample(exclude_idx=None, pooling_method='fro', sample_discarded=False, use_coef_var=True)
Run the sampling step based on the size of the hyperrectangles, i.e., favoring exploration.
- Parameters
exclude_idx (Union[np.array, None], optional) – Points in design space to exclude from sampling. Defaults to None.
pooling_method (str) – Method that is used to aggregate the uncertainty in different objectives into one scalar. Available options are: “fro” (Frobenius/Euclidean norm), “mean”, “median”. Defaults to “fro”.
sample_discarded (bool) – If True, sample from all points, not only from the unclassified and Pareto-optimal ones. Defaults to False.
use_coef_var (bool) – If True, use the coefficient of variation instead of the unscaled rectangle sizes. Defaults to True.
- Raises
ValueError – In case there are no uncertainty rectangles, i.e., when _predict has not been called successfully.
- Returns
Index of next point to evaluate in design space
- Return type
int
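The pooling and selection that sample performs can be sketched in plain Python as below. This is an illustrative sketch, not PyePAL's code: the helper names are hypothetical, "fro" is taken to be the Euclidean norm of a point's per-objective sizes, use_coef_var is assumed to divide each size by |mean|, and a batch is assumed to be built by repeatedly picking the largest pooled uncertainty while excluding the indices already chosen (in the spirit of exclude_idx and the batch_size argument of run_one_step).

```python
import math
import statistics
from typing import List, Optional, Sequence


def pool(sizes: Sequence[float], method: str = "fro") -> float:
    """Aggregate the per-objective hyperrectangle sizes of one point."""
    if method == "fro":
        return math.sqrt(sum(s * s for s in sizes))  # Euclidean norm
    if method == "mean":
        return statistics.mean(sizes)
    if method == "median":
        return statistics.median(sizes)
    raise ValueError(f"Unknown pooling method: {method}")


def sample_one(
    sizes: Sequence[Sequence[float]],
    means: Sequence[Sequence[float]],
    candidates: Sequence[int],
    exclude_idx: Optional[Sequence[int]] = None,
    pooling_method: str = "fro",
    use_coef_var: bool = True,
) -> int:
    """Return the candidate index with the largest pooled uncertainty."""
    excluded = set(exclude_idx or ())
    best_idx, best_score = -1, -math.inf
    for i in candidates:
        if i in excluded:
            continue
        row = sizes[i]
        if use_coef_var:
            # Assumption: scale each size by |mean| (means must be nonzero)
            row = [s / abs(m) for s, m in zip(row, means[i])]
        score = pool(row, pooling_method)
        if score > best_score:
            best_idx, best_score = i, score
    if best_idx < 0:
        raise ValueError("No candidate points left to sample")
    return best_idx


def sample_batch(sizes, means, candidates, batch_size=1, **kwargs) -> List[int]:
    """Pick batch_size distinct indices by excluding earlier picks."""
    chosen: List[int] = []
    for _ in range(batch_size):
        chosen.append(sample_one(sizes, means, candidates, exclude_idx=chosen, **kwargs))
    return chosen


sizes = [[0.1, 0.2], [1.0, 0.5], [0.3, 0.9], [2.0, 2.0]]
means = [[1.0, 1.0], [2.0, 1.0], [0.5, 1.0], [10.0, 10.0]]
candidates = [0, 1, 2]  # e.g., only unclassified and Pareto-optimal points
print(sample_batch(sizes, means, candidates, batch_size=2))                      # [2, 1]
print(sample_batch(sizes, means, candidates, batch_size=2, use_coef_var=False))  # [1, 2]
```

The two calls pick the same points in a different order: scaling by |mean| promotes point 2, whose uncertainty is large relative to its small mean.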
- property sampled_indices
Return the indices of the sampled points
- property sampled_mask
Return a mask for the sampled points. A point counts as sampled if at least one objective has been measured, i.e., self.sampled is an array of shape (N, number of objectives) in which some columns can be False if a measurement has not been performed.
- property sampled_points
Return the sampled points
- property unclassified_indices
Return the indices of the unclassified points
- property unclassified_points
Return the unclassified points
- update_train_set(indices, measurements, measurement_uncertainty=None)
Update the training set following a measurement
- Parameters
indices (np.ndarray) – Indices of the design space at which the measurements were taken
measurements (np.ndarray) – Measured values as a 2D array. The length of the first dimension must equal the length of the indices array, and the length of the second dimension must equal the number of objectives. If an objective is missing, provide np.nan, for example, np.array([1, 1, np.nan]).
measurement_uncertainty (np.ndarray) – Uncertainty in the measurements. If not provided (None), it is assumed to be zero. If it is not None, it must be an array with the same shape as measurements. If an objective is missing, provide np.nan, for example, np.array([1, 1, np.nan]).
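The sketch below illustrates the missing-objective convention of update_train_set together with the counting rule described for sampled_mask: NaN marks an objective that was not measured, and a point counts as sampled once at least one of its objectives has a value. It uses plain Python lists instead of NumPy arrays, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def measured_objectives(measurements: Sequence[Sequence[float]]) -> List[List[bool]]:
    """Per-point, per-objective flags: True where a value (not NaN) was given."""
    return [[not math.isnan(v) for v in row] for row in measurements]


def sampled_mask(flags: Sequence[Sequence[bool]]) -> List[bool]:
    """A point counts as sampled if at least one objective was measured."""
    return [any(row) for row in flags]


nan = float("nan")
# Three points with three objectives each: the first point lacks its third
# objective, and no objective of the last point has been measured yet.
measurements = [
    [1.0, 1.0, nan],
    [0.5, 2.0, 3.0],
    [nan, nan, nan],
]
flags = measured_objectives(measurements)
print(flags)                # [[True, True, False], [True, True, True], [False, False, False]]
print(sampled_mask(flags))  # [True, True, False]
```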
- property uses_fixed_epsilon
True if it uses the fixed epsilon \(\epsilon \cdot ranges\)
For GPy models
PAL using GPy GPR models
- class pyepal.pal.pal_gpy.PALGPy(*args, **kwargs)
Bases: pyepal.pal.pal_base.PALBase
PAL class for a list of GPy GPR models, with one model per objective
- __init__(*args, **kwargs)
Construct the PALGPy instance
- Parameters
X_design (np.array) – Design space (feature matrix)
models (list) – Machine learning models
ndim (int) – Number of objectives
epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.
delta (float, optional) – Delta hyperparameter. Defaults to 0.05.
beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.
goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.
coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.
restarts (int) – Number of random restarts that are used for hyperparameter optimization. Defaults to 20.
n_jobs (int) – Number of parallel processes that are used to fit the GPR models. Defaults to 1.
For coregionalized GPy models
PAL for coregionalized GPR models
- class pyepal.pal.pal_coregionalized.PALCoregionalized(*args, **kwargs)
Bases: pyepal.pal.pal_base.PALBase
PAL class for a coregionalized GPR model
- __init__(*args, **kwargs)
Construct the PALCoregionalized instance
- Parameters
X_design (np.array) – Design space (feature matrix)
models (list) – Machine learning models
ndim (int) – Number of objectives
epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.
delta (float, optional) – Delta hyperparameter. Defaults to 0.05.
beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.
goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.
coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.
restarts (int) – Number of random restarts that are used for hyperparameter optimization. Defaults to 20.
parallel (bool) – If True, model hyperparameters are optimized in parallel, using the GPy implementation. Defaults to False.
For sklearn GPR models
PAL using Sklearn GPR models
- class pyepal.pal.pal_sklearn.PALSklearn(*args, **kwargs)
Bases: pyepal.pal.pal_base.PALBase
PAL class for a list of Sklearn (GPR) models, with one model per objective
- __init__(*args, **kwargs)
Construct the PALSklearn instance
- Parameters
X_design (np.array) – Design space (feature matrix)
models (list) – Machine learning models. You can provide a list of GaussianProcessRegressor instances or a list of fitted RandomizedSearchCV/GridSearchCV instances with GaussianProcessRegressor models
ndim (int) – Number of objectives
epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.
delta (float, optional) – Delta hyperparameter. Defaults to 0.05.
beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.
goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.
coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.
n_jobs (int) – Number of parallel processes that are used to fit the GPR models. Defaults to 1.
For quantile regression with LightGBM
Implements a PAL class for GBDT models, which can predict uncertainty intervals when used with quantile loss. For an example of GBDT with quantile loss, see Jablonka, Kevin Maik; Moosavi, Seyed Mohamad; Asgari, Mehrdad; Ireland, Christopher; Patiny, Luc; Smit, Berend (2020): A Data-Driven Perspective on the Colours of Metal-Organic Frameworks. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.13033217.v1
For general information about quantile regression see https://en.wikipedia.org/wiki/Quantile_regression
Note that the scaling of the hyperrectangles has been derived for GPR models (with RBF kernels).
- class pyepal.pal.pal_gbdt.PALGBDT(*args, **kwargs)
Bases: pyepal.pal.pal_base.PALBase
PAL class for a list of LightGBM GBDT models
- __init__(*args, **kwargs)
Construct the PALGBDT instance
- Parameters
X_design (np.array) – Design space (feature matrix)
models (List[Iterable[LGBMRegressor, LGBMRegressor, LGBMRegressor]]) – Machine learning models. You need to provide a list of iterables: one iterable per objective, where every iterable contains three LGBMRegressors. The first one is for the lower uncertainty limit, the middle one for the median, and the last one for the upper limit. To create appropriate models, you need to use the quantile loss. If you want to parallelize training, we recommend that you use the LightGBM parallelization and fit the models for the different objectives serially.
ndim (int) – Number of objectives
epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.
delta (float, optional) – Delta hyperparameter. Defaults to 0.05.
beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.
goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.
coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.
interquartile_scaler (float, optional) – Used to convert the difference between the upper and lower quantile into a standard deviation. That is, std = (up - low)/interquartile_scaler. Defaults to 1.35, following Wan, X., Wang, W., Liu, J. et al. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol 14, 135 (2014). https://doi.org/10.1186/1471-2288-14-135
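The interquartile_scaler conversion can be checked numerically with the standard library. The constant 1.35 is approximately the width of the interquartile range of a normal distribution in units of its standard deviation (about 1.349), so the conversion is most consistent if the lower and upper LGBMRegressors predict the 25% and 75% quantiles; that pairing is an assumption of this sketch, not something stated above. The helper name is hypothetical.

```python
from statistics import NormalDist


def quantiles_to_mean_std(lower, median, upper, interquartile_scaler=1.35):
    """Turn quantile predictions into (point estimate, std).

    std = (upper - lower) / interquartile_scaler; the median prediction is
    used as the point estimate.
    """
    return median, (upper - lower) / interquartile_scaler


# Width of the interquartile range of a standard normal distribution
iqr_width = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)
print(round(iqr_width, 3))  # 1.349

# Quantiles of a normal distribution with mean 2 and standard deviation 3
dist = NormalDist(mu=2.0, sigma=3.0)
mean, std = quantiles_to_mean_std(
    dist.inv_cdf(0.25), dist.inv_cdf(0.5), dist.inv_cdf(0.75)
)
print(mean, round(std, 2))  # recovers the mean and (approximately) sigma = 3
```

If other quantiles are used (for example 5% and 95%), the scaler would have to change accordingly for the resulting value to approximate a standard deviation.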
For GPR with GPflow
PAL using GPflow GPR models
- class pyepal.pal.pal_gpflowgpr.PALGPflowGPR(*args, **kwargs)
Bases: pyepal.pal.pal_base.PALBase
PAL class for a list of GPflow GPR models, with one model per objective. Please consider that there are specific multioutput models (https://gpflow.readthedocs.io/en/master/notebooks/advanced/multioutput.html) for which the train and prediction functions would need to be adjusted. You might also consider using streaming GPRs (https://github.com/thangbui/streaming_sparse_gp). In future releases, we might support this case automatically (i.e., handle the case in which only one model is provided).
- __init__(*args, **kwargs)
Construct the PALGPflowGPR instance
- Parameters
X_design (np.array) – Design space (feature matrix)
models (list) – Machine learning models
ndim (int) – Number of objectives
epsilon (Union[list, float], optional) – Epsilon hyperparameter. Defaults to 0.01.
delta (float, optional) – Delta hyperparameter. Defaults to 0.05.
beta_scale (float, optional) – Scaling parameter for beta. If not equal to 1, the theoretical guarantees do not necessarily hold. Also note that the parametrization depends on the kernel type. Defaults to 1/9.
goals (List[str], optional) – If a list, provide “min” for every objective that shall be minimized and “max” for every objective that shall be maximized. Defaults to None, which means that the code maximizes all objectives.
coef_var_threshold (float, optional) – Use only points with a coefficient of variation below this threshold in the classification step. Defaults to 3.
opt (function, optional) – Optimizer function for the GPR parameters. If None (default), we use gpflow.optimizers.Scipy().
opt_kwargs (dict, optional) – Keyword arguments passed to the optimizer. If None, PyePAL will pass {“maxiter”: 100}
n_jobs (int) – Number of parallel threads that are used to fit the GPR models. Defaults to 1.
Schedules for hyperparameter optimization
Provides some scheduling functions that can be used to implement the _should_optimize_hyperparameters function
- pyepal.pal.schedules.exp_decay(iteration, base=10)
Optimize hyperparameters at logarithmically spaced intervals
- Parameters
iteration (int) – current iteration
base (int, optional) – Base of the logarithm. Defaults to 10.
- Returns
True if iteration is on the log scaled grid
- Return type
bool
- pyepal.pal.schedules.linear(iteration, frequency=10)
Optimize hyperparameters at equally spaced intervals
- Parameters
iteration (int) – current iteration
frequency (int, optional) – Spacing between the True outputs. Defaults to 10.
- Returns
True if iteration can be divided by frequency without remainder
- Return type
bool
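Both schedules can be sketched as predicates on the iteration counter. This is a plain-Python illustration with hypothetical function names, not the library source; in particular, exp_decay is interpreted here as returning True exactly when iteration is a power of base (1, 10, 100, ... for base 10), which is one reading of "on the log scaled grid". Integer arithmetic avoids the floating-point rounding that a logarithm-based test can suffer from.

```python
def on_log_grid(iteration: int, base: int = 10) -> bool:
    """True if iteration is a power of base (1, base, base**2, ...)."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if iteration < 1:
        return False
    while iteration % base == 0:
        iteration //= base
    return iteration == 1


def on_linear_grid(iteration: int, frequency: int = 10) -> bool:
    """True if iteration is divisible by frequency without remainder."""
    return iteration % frequency == 0


print([i for i in range(1, 1001) if on_log_grid(i)])   # [1, 10, 100, 1000]
print([i for i in range(1, 51) if on_linear_grid(i)])  # [10, 20, 30, 40, 50]
```

The logarithmic schedule refits hyperparameters often at the start, when every new measurement changes the model a lot, and rarely later on.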
Utilities for multiobjective optimization
Utilities for dealing with Pareto fronts in general
- pyepal.pal.utils.dominance_check(point1, point2)
One point dominates another if it is not worse in any objective and strictly better in at least one. This function assumes that all objectives are to be maximized.
- Return type
bool
- pyepal.pal.utils.dominance_check_jitted(point, array)
Check if point dominates any point in array
- Return type
bool
- pyepal.pal.utils.dominance_check_jitted_2(array, point)
Check if any point in array dominates point
- Return type
bool
- pyepal.pal.utils.dominance_check_jitted_3(array, point, ignore_me)
Check if any point in array dominates point. The ignore_me argument is needed because numba does not understand masked arrays.
- Return type
bool
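For reference, the dominance relation used by these helpers (all objectives maximized) can be written in a few lines of plain Python. The function names are hypothetical; unlike the jitted versions, this sketch makes no attempt at speed, and ignore_me is assumed here to be the index of one row of the array to skip.

```python
from typing import Optional, Sequence


def dominates(point1: Sequence[float], point2: Sequence[float]) -> bool:
    """point1 dominates point2 (maximization): no worse everywhere, better somewhere."""
    no_worse = all(a >= b for a, b in zip(point1, point2))
    strictly_better = any(a > b for a, b in zip(point1, point2))
    return no_worse and strictly_better


def dominates_any(point, array) -> bool:
    """Analogous to dominance_check_jitted: does point dominate any row of array?"""
    return any(dominates(point, other) for other in array)


def dominated_by_any(array, point, ignore_me: Optional[int] = None) -> bool:
    """Analogous to dominance_check_jitted_2/_3: does any row dominate point?

    ignore_me (assumption) skips one row index, e.g., the point itself.
    """
    return any(
        dominates(other, point) for i, other in enumerate(array) if i != ignore_me
    )


front = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
print(dominates([2.0, 2.0], [1.0, 2.0]))               # True
print(dominates([2.0, 2.0], [2.0, 2.0]))               # False: equal points do not dominate
print(dominated_by_any(front, [1.5, 1.5]))             # True, e.g., by [2.0, 2.0]
print(dominated_by_any(front, front[1], ignore_me=1))  # False
```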
- pyepal.pal.utils.exhaust_loop(palinstance, y, batch_size=1)
Helper function that takes an initialized PAL instance and loops the sampling until there is no unclassified point left. This is useful if all measurements are already taken and one wants to test the algorithm with different hyperparameters.
- Parameters
palinstance (PALBase) – An initialized instance of a class that inherits from PALBase and implements the ._train() and ._predict() functions
y (np.array) – Measurements. The number of measurements must equal the number of points in the design space.
batch_size (int, optional) – Number of indices that will be returned. Defaults to 1.
- Returns
None. The PAL instance is updated in place
- pyepal.pal.utils.get_hypervolume(pareto_front, reference_vector, prefactor=-1)
Compute the hypervolume indicator of a Pareto front. We assume that all objectives are to be maximized and want the area (for two objectives, f1 and f2) dominated by the Pareto front.
However, the code we use for the hypervolume indicator assumes that the reference vector is larger than all the points in the Pareto front (i.e., minimization). For this reason, we flip all the signs by multiplying with prefactor, which defaults to -1.
This indicator is not needed for the epsilon-PAL algorithm itself but only to allow tracking a metric that might help the user to see if the algorithm converges.
- Return type
float
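To illustrate why the signs are flipped, here is a two-objective sketch in plain Python (hypothetical helper names; neither PyePAL's implementation nor the hypervolume code it calls). The maximization version measures the area dominated by the front above a reference point that is smaller than all points; the minimization version expects a reference point that is larger than all points. In this sketch, multiplying both the front and the reference point by prefactor = -1 turns one convention into the other and gives the same area; how get_hypervolume treats reference_vector internally is not specified above.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d_max(front: Sequence[Point], reference: Point) -> float:
    """Area dominated by front (both objectives maximized) above reference."""
    # Only points that are better than the reference in both objectives count
    pts = sorted(
        (p for p in front if p[0] > reference[0] and p[1] > reference[1]),
        key=lambda p: p[0],
        reverse=True,  # sweep from the largest f1 to the smallest
    )
    area, best_f2 = 0.0, reference[1]
    for f1, f2 in pts:
        if f2 > best_f2:  # dominated points add nothing
            area += (f1 - reference[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


def hypervolume_2d_min(front: Sequence[Point], reference: Point) -> float:
    """Area dominated by front (both objectives minimized) below reference."""
    pts = sorted(
        (p for p in front if p[0] < reference[0] and p[1] < reference[1]),
        key=lambda p: p[0],  # sweep from the smallest f1 to the largest
    )
    area, best_f2 = 0.0, reference[1]
    for f1, f2 in pts:
        if f2 < best_f2:
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
reference = (0.0, 0.0)
prefactor = -1
flipped_front = [(prefactor * a, prefactor * b) for a, b in front]
flipped_reference = (prefactor * reference[0], prefactor * reference[1])
print(hypervolume_2d_max(front, reference))                  # 6.0
print(hypervolume_2d_min(flipped_front, flipped_reference))  # 6.0
```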
- pyepal.pal.utils.get_kmeans_samples(X, n_samples, **kwargs)
Get the samples that are closest to the k=n_samples centroids
- Parameters
X (np.array) – Feature array, on which the KMeans clustering is run
n_samples (int) – Number of samples that should be selected
**kwargs – Passed to KMeans
- Returns
selected_indices
- Return type
np.array
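The selection step of get_kmeans_samples, picking the sample closest to each centroid, can be sketched as follows. The sketch assumes the centroids are already available (presumably from a fitted scikit-learn KMeans, whose cluster_centers_ attribute holds them; this is an assumption about the internals), uses Euclidean distance, and uses a hypothetical helper name. Two centroids can share the same nearest sample, so without extra handling the result may contain fewer unique indices than n_samples.

```python
import math
from typing import List, Sequence


def closest_to_centroids(
    X: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]
) -> List[int]:
    """For every centroid, return the index of the nearest row of X (Euclidean)."""
    return [
        min(range(len(X)), key=lambda i: math.dist(X[i], c)) for c in centroids
    ]


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 4.9], [10.0, 0.0]]
centroids = [[0.04, 0.0], [5.05, 4.95], [10.0, 0.0]]
print(closest_to_centroids(X, centroids))  # [0, 2, 4]
```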
- pyepal.pal.utils.get_maxmin_samples(X, n_samples, metric='euclidean', init='mean', seed=None, **kwargs)
Greedy maxmin sampling, also known as Kennard-Stone sampling (1). Note that a greedy sampling is not guaranteed to give the ideal solution and the output will depend on the random initialization (if this is chosen).
If you need a good solution, you can restart this algorithm multiple times with random initialization and different random seeds and use a coverage metric to quantify how well the space is covered. Some metrics are described in (2). In contrast to the code provided with (2) and (3) we do not consider the feature importance for the selection as this is typically not known beforehand.
You might want to standardize your data before applying this sampling function.
Some more sampling options are provided in our structure_comp (4) Python package. Note that this implementation is quite memory hungry.
References:
(1) Kennard, R. W.; Stone, L. A. Computer Aided Design of Experiments. Technometrics 1969, 11 (1), 137–148. https://doi.org/10.1080/00401706.1969.10490666.
(2) Moosavi, S. M.; Nandy, A.; Jablonka, K. M.; Ongari, D.; Janet, J. P.; Boyd, P. G.; Lee, Y.; Smit, B.; Kulik, H. J. Understanding the Diversity of the Metal-Organic Framework Ecosystem. Nat. Commun. 2020, 11 (1), 4068. https://doi.org/10.1038/s41467-020-17755-8.
(3) Moosavi, S. M.; Chidambaram, A.; Talirz, L.; Haranczyk, M.; Stylianou, K. C.; Smit, B. Capturing Chemical Intuition in Synthesis of Metal-Organic Frameworks. Nat. Commun. 2019, 10 (1), 539. https://doi.org/10.1038/s41467-019-08483-9.
(4) https://github.com/kjappelbaum/structure_comp
- Parameters
X (np.array) – Feature array, this is the array that is used to perform the sampling
n_samples (int) – Number of points that will be selected; must be smaller than the length of X
metric (str, optional) – Distance metric to use for the maxmin calculation. Must be a valid option of scipy.spatial.distance.cdist (‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’). Defaults to ‘euclidean’
init (str, optional) – Either ‘mean’, ‘median’, or ‘random’. Determines how the initial point is chosen. Defaults to ‘mean’
seed (int, optional) – seed for the random number generator. Defaults to None.
**kwargs – Passed to scipy.spatial.distance.cdist
- Returns
selected_indices
- Return type
np.array
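The greedy maxmin idea can be sketched in plain Python for the Euclidean metric. This is an illustration, not PyePAL's implementation: the function name is hypothetical, init='mean' is interpreted as starting from the point closest to the feature-wise mean, and the rows of X are assumed to be distinct.

```python
import math
from typing import List, Sequence


def greedy_maxmin(X: Sequence[Sequence[float]], n_samples: int) -> List[int]:
    """Greedy maxmin (Kennard-Stone style) selection with Euclidean distance.

    Start from the point closest to the feature-wise mean, then repeatedly add
    the point whose distance to its nearest already-selected point is largest.
    Assumes distinct rows in X.
    """
    if not 0 < n_samples < len(X):
        raise ValueError("n_samples must be positive and smaller than len(X)")
    n_features = len(X[0])
    mean = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    first = min(range(len(X)), key=lambda i: math.dist(X[i], mean))
    selected = [first]
    # Distance of every point to its closest selected point
    min_dist = [math.dist(x, X[first]) for x in X]
    while len(selected) < n_samples:
        nxt = max(range(len(X)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, math.dist(x, X[nxt])) for d, x in zip(min_dist, X)]
    return selected


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(greedy_maxmin(X, 3))  # [4, 0, 1]
```

Ties are resolved by the first index encountered, which is one reason why the greedy result depends on the ordering and the initialization, as noted above.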
- pyepal.pal.utils.is_pareto_efficient(costs, return_mask=True)
Find the Pareto-efficient points. Based on https://stackoverflow.com/questions/32791911/fast-calculation-of-pareto-front-in-python
- Parameters
costs (np.array) – An (n_points, n_costs) array
return_mask (bool, optional) – True to return a mask; otherwise, a (n_efficient_points, ) integer array of indices is returned. Defaults to True.
- Returns
Boolean mask of the Pareto-efficient points if return_mask is True, otherwise the indices of the Pareto-efficient points
- Return type
np.array
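A straightforward quadratic-time version of the same idea is sketched below. It is not the vectorized StackOverflow-based implementation referenced above, and the direction of optimization is an assumption: like dominance_check, the sketch treats larger values as better (negate the inputs to treat them as costs to be minimized). The function name is hypothetical.

```python
from typing import List, Sequence, Union


def pareto_efficient(
    points: Sequence[Sequence[float]], return_mask: bool = True
) -> Union[List[bool], List[int]]:
    """Mark points not dominated by any other point (all objectives maximized)."""

    def dominates(p, q):
        return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

    mask = [
        not any(dominates(other, p) for j, other in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    if return_mask:
        return mask
    return [i for i, efficient in enumerate(mask) if efficient]


points = [[1.0, 3.0], [2.0, 2.0], [1.0, 1.0], [3.0, 1.0], [2.0, 2.0]]
print(pareto_efficient(points))                     # [True, True, False, True, True]
print(pareto_efficient(points, return_mask=False))  # [0, 1, 3, 4]
```

Because equal points do not dominate each other, both copies of a duplicated point are kept as efficient in this sketch.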
Utilities for plotting
Plotting utilities
- pyepal.plotting.plot_bar_iterations(pareto_optimal, non_pareto_points, unclassified_points, ax=None)
Plot stacked barplots for every step of the iteration.
- Parameters
pareto_optimal (np.ndarray) – Number of Pareto-optimal points for every iteration
non_pareto_points (np.ndarray) – Number of discarded points for every iteration
unclassified_points (np.ndarray) – Number of unclassified points for every iteration
ax (axis, optional) – Matplotlib figure axis. Defaults to None.
- Returns
Matplotlib axis (the one that was provided as input, or one from a new figure if no axis was provided)
- Return type
axis
- pyepal.plotting.plot_histogram(y, palinstance, ax=None)
Plot a histogram for one objective, with maxima scaled to one and the different categories indicated in color.
- Parameters
y (np.ndarray) – Objective (measurement)
palinstance (PALBase) – Instance of a PAL class
ax (axis, optional) – Matplotlib figure axis. Defaults to None.
- Returns
Matplotlib axis (the one that was provided as input, or one from a new figure if no axis was provided)
- Return type
ax
- pyepal.plotting.plot_jointplot(y, palinstance, labels=None, figsize=(8.0, 6.0))
Plot a jointplot of the objective space with histograms on the diagonal and 2D-Pareto plots on the off-diagonal.
- Parameters
y (np.array) – Two-dimensional array with the objectives (measurements)
palinstance (PALBase) – “trained” PAL instance
labels (Union[List[str], None], optional) – Labels for each objective. Defaults to “objective [index]”.
figsize (tuple, optional) – Figure size for joint plot. Defaults to (8.0, 6.0).
- Returns
matplotlib Figure object.
- Return type
fig
- pyepal.plotting.plot_pareto_front_2d(y_0, y_1, std_0, std_1, palinstance, ax=None)
Plot a 2D Pareto front, with the different categories indicated in color.
- Parameters
y_0 (np.ndarray) – Objective 0
y_1 (np.ndarray) – Objective 1
std_0 (np.ndarray) – Standard deviation of objective 0
std_1 (np.ndarray) – Standard deviation of objective 1
palinstance (PALBase) – PAL instance
ax (axis, optional) – Matplotlib figure axis. Defaults to None.
- Returns
Matplotlib axis (the one that was provided as input, or one from a new figure if no axis was provided)
- Return type
ax
- pyepal.plotting.plot_residuals(y, palinstance, labels=None, figsize=(6.0, 4.0))
Plot the signed residuals (y axis) versus the fitted values (x axis) for the sampled points. Creates subplots if y.ndim > 1.
- Parameters
y (np.array) – Two-dimensional array with the objectives (measurements)
palinstance (PALBase) – “trained” PAL instance
labels (Union[List[str], None], optional) – Labels for each objective. Defaults to “objective [index]”.
figsize (tuple, optional) – Figure size for each individual residual vs fitted objective plot. Defaults to (6.0, 4.0).
- Returns
matplotlib Figure object
- Return type
fig
Input validation
Methods to validate inputs for the PAL classes
- pyepal.pal.validate_inputs.base_validate_models(models)
Currently no validation, as the predict and train functions are implemented independently of the base class
- Return type
list
- pyepal.pal.validate_inputs.validate_beta_scale(beta_scale)
- Parameters
beta_scale (Any) – scaling factor for beta
- Raises
ValueError – If beta_scale is smaller than 0
- Returns
scaling factor for beta
- Return type
float
- pyepal.pal.validate_inputs.validate_coef_var(coef_var)
Make sure that the coef_var makes sense
- pyepal.pal.validate_inputs.validate_coregionalized_gpy(models)
Make sure that model is a coregionalized GPR model
- pyepal.pal.validate_inputs.validate_delta(delta)
Make sure that delta is in a reasonable range
- Parameters
delta (Any) – Delta hyperparameter
- Raises
ValueError – Delta must be in [0,1].
- Returns
delta
- Return type
float
- pyepal.pal.validate_inputs.validate_epsilon(epsilon, ndim)
Validate epsilon and return a np.array
- Parameters
epsilon (Any) – Epsilon hyperparameter
ndim (int) – Number of dimensions/objectives
- Raises
ValueError – If epsilon is a list there must be one float per dimension
ValueError – Epsilon must be in [0,1]
ValueError – If epsilon is an array there must be one float per dimension
- Returns
Array of one epsilon per objective
- Return type
np.ndarray
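The validation rules listed above can be mirrored in a small plain-Python sketch. The function name is hypothetical and it returns a list rather than an np.ndarray. It accepts a single number (applied to every objective) or one value per objective and rejects values outside [0, 1]; treating the interval endpoints as valid is an assumption of the sketch.

```python
from typing import List, Sequence, Union


def check_epsilon(epsilon: Union[float, Sequence[float]], ndim: int) -> List[float]:
    """Return one epsilon per objective, mirroring the rules listed above."""
    if isinstance(epsilon, (int, float)):
        values = [float(epsilon)] * ndim  # one shared value for all objectives
    else:
        values = [float(e) for e in epsilon]
        if len(values) != ndim:
            raise ValueError("There must be one epsilon per objective")
    for e in values:
        if not 0.0 <= e <= 1.0:
            raise ValueError("Epsilon must be in [0, 1]")
    return values


print(check_epsilon(0.01, 3))          # [0.01, 0.01, 0.01]
print(check_epsilon([0.01, 0.05], 2))  # [0.01, 0.05]
```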
- pyepal.pal.validate_inputs.validate_gbdt_models(models, ndim)
Make sure that the number of iterables is equal to the number of objectives and that every iterable contains three LGBMRegressors. Also, we check that at least the first and last models use quantile loss.
- Return type
List[Iterable]
- pyepal.pal.validate_inputs.validate_goals(goals, ndim)
Create a valid array of goals: 1 for objectives that are to be maximized and -1 for objectives that are to be minimized.
- Parameters
goals (Any) – List of goals, typically provided as strings ‘max’ for maximization and ‘min’ for minimization
ndim (int) – number of dimensions
- Raises
ValueError – If goals is a list and the length is not equal to ndim
ValueError – If goals is a list and the elements are not strings ‘min’, ‘max’ or -1 and 1
- Returns
Array of -1 and 1
- Return type
np.ndarray
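Since the code maximizes all objectives, a goal only decides the sign of an objective; presumably, an objective marked 'min' can be handled by multiplying it by -1. The plain-Python sketch below shows that mapping with hypothetical names. It mirrors the rules listed above (None means maximize all; strings 'min'/'max' or the numbers -1/1; one entry per objective) but is not the library's implementation.

```python
from typing import List, Optional, Sequence, Union


def goals_to_signs(goals: Optional[Sequence[Union[str, int]]], ndim: int) -> List[int]:
    """Map goals to +1 (maximize) / -1 (minimize); None maximizes everything."""
    if goals is None:
        return [1] * ndim
    if len(goals) != ndim:
        raise ValueError("There must be one goal per objective")
    mapping = {"max": 1, "min": -1, 1: 1, -1: -1}
    signs = []
    for goal in goals:
        if goal not in mapping:
            raise ValueError(f"Invalid goal: {goal!r}")
        signs.append(mapping[goal])
    return signs


signs = goals_to_signs(["max", "min"], ndim=2)
y = [3.0, 0.2]  # measured objectives of one point
print(signs, [s * v for s, v in zip(signs, y)])  # [1, -1] [3.0, -0.2]
```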
- pyepal.pal.validate_inputs.validate_gpy_model(models)
Make sure that all elements of the list are GPRegression models
- pyepal.pal.validate_inputs.validate_interquartile_scaler(interquartile_scaler)
Make sure that the interquartile_scaler makes sense
- Return type
float
- pyepal.pal.validate_inputs.validate_ndim(ndim)
Make sure that the number of dimensions makes sense
- Parameters
ndim (Any) – number of dimensions
- Raises
ValueError – If the number of dimensions is not an integer
ValueError – If the number of dimensions is not greater than 0
- Returns
the number of dimensions
- Return type
int
- pyepal.pal.validate_inputs.validate_njobs(njobs)
Make sure that njobs is an int >= 1
- Return type
int
- pyepal.pal.validate_inputs.validate_nt_models(models, ndim)
Make sure that we can work with a sequence of pyepal.pal.models.nt.NTModel() objects
- Return type
Sequence
- pyepal.pal.validate_inputs.validate_number_models(models, ndim)
Make sure that there are as many models as objectives
- Parameters
models (Any) – List of models
ndim (int) – Number of objectives
- Raises
ValueError – If the number of models does not equal the number of objectives
- pyepal.pal.validate_inputs.validate_optimizers(optimizers, ndim)
Make sure that we can work with a sequence of JaxOptimizer objects
- Return type
Sequence
The models package
Helper functions for GPR with GPy
Wrappers for Gaussian Process Regression models.
We typically use the GPy package as it offers the most flexibility for Gaussian processes in Python. Typically, we use automatic relevance determination (ARD), where one lengthscale parameter per input dimension is used.
If your task requires training on larger training sets, you might consider replacing the models with their sparse versions, but for the epsilon-PAL algorithm this typically shouldn't be needed.
For kernel selection, you can have a look at https://www.cs.toronto.edu/~duvenaud/cookbook/. Matérn, RBF, and rational quadratic kernels are good quick-and-dirty solutions but have their caveats.
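Automatic relevance determination means that every input dimension gets its own lengthscale. The plain-Python sketch below writes out an RBF kernel with ARD to show the effect: a large lengthscale makes the kernel almost insensitive to that dimension. In practice you would use the GPy kernels (for example GPy.kern.RBF with ARD=True); the function here is a hypothetical illustration, not part of PyePAL.

```python
import math
from typing import Sequence


def rbf_ard(
    x: Sequence[float],
    y: Sequence[float],
    lengthscales: Sequence[float],
    variance: float = 1.0,
) -> float:
    """k(x, y) = variance * exp(-0.5 * sum_d ((x_d - y_d) / l_d) ** 2)."""
    scaled_sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * scaled_sq)


lengthscales = [1.0, 100.0]  # the second dimension is nearly irrelevant
x = [0.0, 0.0]
print(rbf_ard(x, [1.0, 0.0], lengthscales))  # change in relevant dim: exp(-0.5), about 0.61
print(rbf_ard(x, [0.0, 1.0], lengthscales))  # change in irrelevant dim: close to 1.0
```

After fitting, the learned lengthscales can therefore hint at which features matter for a given objective.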
- pyepal.models.gpr.build_coregionalized_model(X_train, y_train, kernel=None, w_rank=1, **kwargs)
Wrapper for building a coregionalized GPR. It will have as many outputs as y_train.shape[1], and each output will have its own noise term.
- Return type
GPCoregionalizedRegression
- pyepal.models.gpr.build_model(X_train, y_train, index=0, kernel=None, **kwargs)
Build a single-output GPR model
- Return type
GPRegression
- pyepal.models.gpr.get_matern_32_kernel(NFEAT, ARD=True, **kwargs)
Matern-3/2 kernel (with ARD unless ARD=False)
- Return type
Matern32
- pyepal.models.gpr.get_matern_52_kernel(NFEAT, ARD=True, **kwargs)
Matern-5/2 kernel (with ARD unless ARD=False)
- Return type
Matern52
- pyepal.models.gpr.get_ratquad_kernel(NFEAT, ARD=True, **kwargs)
Rational quadratic kernel (with ARD unless ARD=False)
- Return type
RatQuad
- pyepal.models.gpr.predict(model, X)
Wrapper function for the prediction method of a GPy regression model. It returns the standard deviation instead of the variance.
- Return type
Tuple[array, array]