ConstantKernel #2263


Closed
wants to merge 1 commit

Conversation

SebastianAment (Contributor)

Summary:

# `ConstantKernel` vs `ConstantMean`

**Summary**: `ConstantKernel` promises to infer the constant that the default GP converges to, as it moves away from the training data, more stably than the current `ConstantMean` approach does.

**By default**, GPyTorch, and BoTorch by extension, currently use a `ConstantMean` whose constant value is optimized during maximization of the marginal likelihood.

**This notebook** compares the default approach with using a `ConstantKernel` instead, which represents the prior variance of an unknown constant value. In this case, the prior variance of the constant is optimized during the hyper-parameter optimization stage, but the actual value of the constant is computed afterward using the standard linear-algebraic approach for computing Gaussian process posteriors. The inference of the constant value is therefore likely more stable with the kernel approach, and it additionally comes with uncertainty quantification of the constant.
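To spell out the closed-form step, here is a short sketch in standard GP-posterior notation; the kernel decomposition $k(x, x') = c^2 + k_{\text{data}}(x, x')$ with a decaying data kernel (e.g. an RBF), noise variance $\sigma^2$, zero prior mean, and training data $(X, y)$ are assumptions for illustration, not part of this PR:

$$
\mu_*(x) \;=\; k(x, X)\,\bigl(K_{XX} + \sigma^2 I\bigr)^{-1} y
\;\longrightarrow\;
c^2\,\mathbf{1}^\top \bigl(K_{XX} + \sigma^2 I\bigr)^{-1} y
\quad \text{as } x \text{ moves far from } X,
$$

since $k(x, X) \to c^2\,\mathbf{1}^\top$ away from the training inputs (here $K_{XX}$ includes the $c^2$ contribution). The constant that the posterior reverts to is thus obtained from a single linear solve, not from gradient-based optimization of a mean parameter.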

**Pros and Cons**: While the number of free parameters is the same for the `ConstantMean` and `ConstantKernel` approaches, the latter infers more information, more stably. That is, the `ConstantKernel` approach infers the prior variance of our belief about the constant, and the inference of the posterior mean is done in closed form through the standard linear-algebraic approach. `ConstantMean`, by contrast, infers the value of the constant using numerical optimization, which can be fickle, at times giving rise to seemingly nonsensical constants outside the range of the observed `Y` values.
A current limitation of `ConstantKernel` is that it allocates another intermediate Tensor for the constant kernel matrix, which is wasteful. A more efficient implementation would use a low-rank or, better yet, a lazily evaluated constant linear operator.
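As a minimal sketch (not part of this PR's diff) of how the two variants could be set up, assuming a GPyTorch version in which `ConstantKernel` is available (it was merged upstream in cornellius-gp/gpytorch#2511) and using made-up training data:

```python
import torch
import gpytorch
from gpytorch.kernels import ConstantKernel, RBFKernel, ScaleKernel
from gpytorch.means import ZeroMean


class ConstantKernelGP(gpytorch.models.ExactGP):
    """GP whose constant offset lives in the covariance rather than the mean."""

    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Zero mean: the constant is captured by the ConstantKernel component.
        self.mean_module = ZeroMean()
        # Data-fitting RBF component plus a constant component whose prior
        # variance is the only additional hyper-parameter.
        self.covar_module = ScaleKernel(RBFKernel()) + ConstantKernel()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Made-up data with a non-zero offset, purely for illustration.
train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(6 * train_x) + 2.0

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ConstantKernelGP(train_x, train_y, likelihood)

# The default alternative keeps the offset in the mean instead:
#   mean_module = gpytorch.means.ConstantMean()
#   covar_module = ScaleKernel(RBFKernel())
# and the constant itself then becomes a parameter of the
# marginal-likelihood optimization.
```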

It is also notable that the `LinearKernel` currently does not support a constant offset (though the `PolynomialKernel` of degree 1 does), so the `ConstantKernel` can also be used to introduce an offset into linear GP models, as sketched below.
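A minimal sketch of that use, under the same `ConstantKernel` availability assumption as above:

```python
from gpytorch.kernels import ConstantKernel, LinearKernel

# LinearKernel alone has no constant offset; adding a ConstantKernel
# supplies the intercept term of a linear GP model.
linear_covar_with_offset = LinearKernel() + ConstantKernel()
```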

Differential Revision: D53666027

facebook-github-bot added the CLA Signed label Mar 25, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D53666027

SebastianAment added a commit to SebastianAment/botorch that referenced this pull request Mar 25, 2024


codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.97%. Comparing base (a39a3cd) to head (0f124b6).

❗ Current head 0f124b6 differs from pull request most recent head d3851ab. Consider uploading reports for the commit d3851ab to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2263   +/-   ##
=======================================
  Coverage   99.97%   99.97%           
=======================================
  Files         197      198    +1     
  Lines       17148    17187   +39     
=======================================
+ Hits        17144    17183   +39     
  Misses          4        4           



@saitcakmak (Contributor)

Merged upstream in cornellius-gp/gpytorch#2511.

saitcakmak closed this Apr 29, 2024