ConstantKernel #2263


Closed
wants to merge 1 commit

Conversation

SebastianAment (Contributor)

Summary:

# `ConstantKernel` vs `ConstantMean`

**Summary**: `ConstantKernel` promises to infer the constant that the default GP converges to, as it moves away from the training data, more stably than the current `ConstantMean` approach does.

**By default**, GPyTorch, and BoTorch by extension, currently use a `ConstantMean` whose constant value is optimized during maximization of the marginal likelihood.

**This notebook** compares the default approach with using a `ConstantKernel` instead, which represents the prior variance of an unknown constant value. In this case, the prior variance of the constant is optimized during the hyper-parameter optimization stage, but the actual value of the constant is computed afterward using the standard linear-algebraic approach for computing Gaussian process posteriors. The inference of the constant value is therefore likely more stable with the kernel approach, and it additionally comes with uncertainty quantification of the constant.
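To spell out the closed-form step, here is a short sketch in standard GP-posterior notation; the kernel decomposition $k(x, x') = c^2 + k_{\text{data}}(x, x')$ with a decaying data kernel (e.g. an RBF), noise variance $\sigma^2$, zero prior mean, and training data $(X, y)$ are assumptions for illustration, not part of this PR:

$$
\mu_*(x) \;=\; k(x, X)\,\bigl(K_{XX} + \sigma^2 I\bigr)^{-1} y
\;\longrightarrow\;
c^2\,\mathbf{1}^\top \bigl(K_{XX} + \sigma^2 I\bigr)^{-1} y
\quad \text{as } x \text{ moves far from } X,
$$

since $k(x, X) \to c^2\,\mathbf{1}^\top$ away from the training inputs (here $K_{XX}$ includes the $c^2$ contribution). The constant that the posterior reverts to is thus obtained from a single linear solve, not from gradient-based optimization of a mean parameter.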

**Pros and Cons**: While the number of free parameters is the same for the `ConstantMean` and `ConstantKernel` approaches, the latter infers more information, more stably. That is, the `ConstantKernel` approach infers the prior variance of our belief about the constant, and the inference of the posterior mean is done in closed form through the standard linear-algebraic approach. `ConstantMean`, by contrast, infers the value of the constant using numerical optimization, which can be fickle, at times giving rise to seemingly nonsensical constants outside the range of the observed `Y` values.
A current limitation of `ConstantKernel` is that it allocates another intermediate Tensor for the constant kernel matrix, which is wasteful. A more efficient implementation would use a low-rank or, better yet, a lazily evaluated constant linear operator.
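As a minimal sketch (not part of this PR's diff) of how the two variants could be set up, assuming a GPyTorch version in which `ConstantKernel` is available (it was merged upstream in cornellius-gp/gpytorch#2511) and using made-up training data:

```python
import torch
import gpytorch
from gpytorch.kernels import ConstantKernel, RBFKernel, ScaleKernel
from gpytorch.means import ZeroMean


class ConstantKernelGP(gpytorch.models.ExactGP):
    """GP whose constant offset lives in the covariance rather than the mean."""

    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Zero mean: the constant is captured by the ConstantKernel component.
        self.mean_module = ZeroMean()
        # Data-fitting RBF component plus a constant component whose prior
        # variance is the only additional hyper-parameter.
        self.covar_module = ScaleKernel(RBFKernel()) + ConstantKernel()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Made-up data with a non-zero offset, purely for illustration.
train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(6 * train_x) + 2.0

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ConstantKernelGP(train_x, train_y, likelihood)

# The default alternative keeps the offset in the mean instead:
#   mean_module = gpytorch.means.ConstantMean()
#   covar_module = ScaleKernel(RBFKernel())
# and the constant itself then becomes a parameter of the
# marginal-likelihood optimization.
```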

It is also notable that the `LinearKernel` currently does not support a constant offset (though the `PolynomialKernel` of degree 1 does), so the `ConstantKernel` can also be used to introduce an offset into linear GP models, as sketched below.
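A minimal sketch of that use, under the same `ConstantKernel` availability assumption as above:

```python
from gpytorch.kernels import ConstantKernel, LinearKernel

# LinearKernel alone has no constant offset; adding a ConstantKernel
# supplies the intercept term of a linear GP model.
linear_covar_with_offset = LinearKernel() + ConstantKernel()
```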

Differential Revision: D53666027

facebook-github-bot added the CLA Signed label Mar 25, 2024
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D53666027

SebastianAment added a commit to SebastianAment/botorch that referenced this pull request Mar 25, 2024


codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.97%. Comparing base (a39a3cd) to head (0f124b6).

❗ Current head 0f124b6 differs from pull request most recent head d3851ab. Consider uploading reports for the commit d3851ab to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2263   +/-   ##
=======================================
  Coverage   99.97%   99.97%           
=======================================
  Files         197      198    +1     
  Lines       17148    17187   +39     
=======================================
+ Hits        17144    17183   +39     
  Misses          4        4           



@saitcakmak (Contributor)

Merged upstream in cornellius-gp/gpytorch#2511.

saitcakmak closed this Apr 29, 2024