ConstantKernel #2263
Conversation
This pull request was exported from Phabricator. Differential Revision: D53666027
Summary:

# `ConstantKernel` vs `ConstantMean`

**Summary**: `ConstantKernel` promises to infer the constant that the default GP converges to as it moves away from the training data more stably than the current `ConstantMean` approach.

**By default**, GPyTorch (and BoTorch by extension) currently uses a `ConstantMean` whose constant value is optimized during the optimization of the marginal likelihood.

**[This notebook](N5131594)** compares the default approach with the approach of using a `ConstantKernel` instead, which represents the prior variance of an unknown constant value. In this case, the prior variance of the constant is optimized during the hyper-parameter optimization stage, but the actual value of the constant is computed afterward using the standard linear-algebraic approach for the computation of Gaussian process posteriors. Therefore, the inference of the constant value is likely more stable with the kernel approach, and it additionally provides uncertainty quantification for the constant.

**Pros and Cons**: While the number of free parameters is the same for the `ConstantMean` and `ConstantKernel` approaches, the latter infers more information, more stably. That is, the `ConstantKernel` approach infers the prior variance of our belief about the constant, and the inference of the posterior mean is done in closed form through the standard linear-algebraic approach. `ConstantMean`, by contrast, infers the value of the constant using numerical optimization, which can be fickle, at times giving rise to seemingly nonsensical constants outside the range of the observed `Y` values.

A current limitation of `ConstantKernel` is that it allocates another intermediate Tensor for the constant kernel matrix, which is wasteful. A more efficient implementation would use a low-rank or, better yet, a lazily evaluated constant linear operator.

It is also notable that the `LinearKernel` currently does not support a constant offset (though the `PolynomialKernel` of degree 1 does), so the `ConstantKernel` can also be used to introduce an offset to linear GP models.

Differential Revision: D53666027
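For concreteness, here is a minimal sketch of the kernel-based parameterization in GPyTorch, assuming the new kernel is exposed as `gpytorch.kernels.ConstantKernel` (the exact import path is an assumption and may differ by version); this is illustrative, not the PR's own code:

```python
# Minimal sketch: a GP whose constant is modeled via an additive ConstantKernel
# rather than a ConstantMean. Assumes gpytorch.kernels.ConstantKernel exists;
# adjust the import if your version exposes it elsewhere.
import torch
import gpytorch
from gpytorch.kernels import ConstantKernel, RBFKernel, ScaleKernel
from gpytorch.means import ZeroMean


class ConstantKernelGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Zero mean: the constant's prior variance is a kernel hyperparameter,
        # and its posterior mean falls out of the usual closed-form GP update.
        self.mean_module = ZeroMean()
        self.covar_module = ScaleKernel(RBFKernel()) + ConstantKernel()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Toy data with a nonzero offset, to exercise the constant inference.
train_x = torch.linspace(0, 1, 10).unsqueeze(-1)
train_y = torch.sin(6 * train_x).squeeze(-1) + 0.3
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ConstantKernelGP(train_x, train_y, likelihood)
```

The default `ConstantMean` baseline would instead pair `ConstantMean()` with `ScaleKernel(RBFKernel())` and fit the constant's value directly during marginal-likelihood optimization.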
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@

|          | main   | #2263  | +/- |
|----------|--------|--------|-----|
| Coverage | 99.97% | 99.97% |     |
| Files    | 197    | 198    | +1  |
| Lines    | 17148  | 17187  | +39 |
| Hits     | 17144  | 17183  | +39 |
| Misses   | 4      | 4      |     |

☔ View full report in Codecov by Sentry.
merged upstream cornellius-gp/gpytorch#2511
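With the kernel available upstream, the linear-offset use case mentioned in the summary could be sketched as follows (again assuming a `gpytorch.kernels.ConstantKernel` import; illustrative only):

```python
# Sketch: supplying the constant offset that LinearKernel lacks by summing it
# with a ConstantKernel. The import path is an assumption.
from gpytorch.kernels import ConstantKernel, LinearKernel

# k(x, x') = v * <x, x'> + c, where v is the linear kernel's variance and c is
# the prior variance of the unknown constant offset.
offset_linear_kernel = LinearKernel() + ConstantKernel()
```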