-
Notifications
You must be signed in to change notification settings - Fork 62
Mixed Data ML #668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Mixed Data ML #668
Conversation
- Weight lp Distances - Added weight to traditional lp_distance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick review, without looking deep into the code.
Important:
- This PR mixes different features (style fixes, some different distances, means and stds, centering and standardization). Please try to separate each of them in its own PR if possible. This way it is possible for me to easily review and merge some parts as soon as they are ready while you improve the others in the meantime.
- Add docstrings (with formulas) to clarify things.
|
||
|
||
def duoble_mean(X: FData) -> FData: | ||
"""Compute the double mean of a FData object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move that to the next line.
"""Compute the double mean of a FData object. | |
""" | |
Compute the double mean of a FData object. |
@@ -361,3 +366,91 @@ def trim_mean( | |||
trimmed_curves = X[indices_descending_depth[:n_samples_to_keep]] | |||
|
|||
return trimmed_curves.mean() | |||
|
|||
|
|||
def duoble_mean(X: FData) -> FData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo.
def duoble_mean(X: FData) -> FData: | |
def double_mean(X: FData) -> FData: |
@@ -361,3 +366,91 @@ def trim_mean( | |||
trimmed_curves = X[indices_descending_depth[:n_samples_to_keep]] | |||
|
|||
return trimmed_curves.mean() | |||
|
|||
|
|||
def duoble_mean(X: FData) -> FData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to offer the double mean at all? I have only seen it used in the context of distance covariance/correlation, so I do not know how useful would that be.
individual_observation_means = average_function_value(X) | ||
if isinstance(X, FDataBasis): | ||
mean_function = X.mean() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say something is missing here, otherwise it will raise an exception because double_fdata
has no value.
return double_fdata | ||
|
||
def individual_observation_mean(X: FData) -> NDArrayFloat: | ||
"""Compute the grand mean of a FData object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong docstring?
return res # type: ignore[no-any-return] | ||
|
||
|
||
def weighted_lp_norm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functional version not needed.
Callable[[GridPointsLike], NDArrayFloat] | float | None | ||
) = None, | ||
) -> None: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring.
) = None, | ||
) -> None: | ||
|
||
# Checks that the lp normed is well defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love if the code of Lp and this one could be merged together or simplified in some way.
return res[0] if len(res) == 1 else res | ||
|
||
|
||
def same_structure_and_data(df1: pd.DataFrame, df2: pd.DataFrame) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be documented and probably moved to the validation module.
metrics = metric.metrics if metric.metrics else [l2_distance] * n_cols | ||
weights = metric.weights if metric.weights else 1.0 | ||
|
||
if isinstance(metrics, Metric): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe all these checks are cleaner with a match sentence?
Describe the proposed changes
Metrics, representation and machine learning functionality related to Mixed Data
Checklist before requesting a review