Mixed Data ML #668


Draft: wants to merge 17 commits into develop

Conversation

@luisheb (Contributor) commented on Apr 17, 2025

Describe the proposed changes

Metrics, representation and machine learning functionality related to Mixed Data
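For context, a rough sketch of the kind of mixed-data distance this is about: a weighted combination of one metric per DataFrame column, e.g. an L2 distance for a functional column and an absolute difference for a scalar column. The layout, names, and combination rule below are only illustrative assumptions, not the API added here.

import numpy as np
import pandas as pd
from skfda import FDataGrid
from skfda.misc.metrics import l2_distance

# Two observations, each with one functional feature and one scalar feature.
grid = np.linspace(0, 1, 50)
curves = FDataGrid(np.sin(np.outer([1.0, 2.0], np.pi * grid)), grid)
df = pd.DataFrame({"curve": [curves[0], curves[1]], "age": [30.0, 45.0]})

# One distance per column, then a weighted sum across columns.
functional_part = l2_distance(df["curve"].iloc[0], df["curve"].iloc[1])[0]
scalar_part = abs(df["age"].iloc[0] - df["age"].iloc[1])
weights = np.array([1.0, 0.5])
mixed_distance = weights[0] * functional_part + weights[1] * scalar_part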

Checklist before requesting a review

  • I have performed a self-review of my code
  • The code conforms to the style used in this package
  • The code is fully documented and typed (type-checked with Mypy)
  • I have added thorough tests for the new/changed functionality

@luisheb changed the title from "Mixed data ml" to "Mixed Data ML" on Apr 17, 2025
@vnmabus (Member) left a comment:

Just a quick review, without looking deeply into the code.

Important:

  • This PR mixes different features (style fixes, some different distances, means and stds, centering and standardization). Please try to separate each of them into its own PR if possible, so that I can review and merge the parts that are ready while you improve the others in the meantime.
  • Add docstrings (with formulas) to clarify things.



def duoble_mean(X: FData) -> FData:
"""Compute the double mean of a FData object.
vnmabus (Member):

Move that to the next line.

Suggested change:
-    """Compute the double mean of a FData object.
+    """
+    Compute the double mean of a FData object.

@@ -361,3 +366,91 @@ def trim_mean(
trimmed_curves = X[indices_descending_depth[:n_samples_to_keep]]

return trimmed_curves.mean()


def duoble_mean(X: FData) -> FData:
vnmabus (Member):

Typo.

Suggested change:
-def duoble_mean(X: FData) -> FData:
+def double_mean(X: FData) -> FData:

vnmabus (Member):

Do we want to offer the double mean at all? I have only seen it used in the context of distance covariance/correlation, so I do not know how useful would that be.

individual_observation_means = average_function_value(X)
if isinstance(X, FDataBasis):
mean_function = X.mean()

vnmabus (Member):

I would say something is missing here, otherwise it will raise an exception because double_fdata has no value.
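For reference, the quantities in the quoted snippet (the per-observation domain averages and the pointwise mean function) are the ones that appear in the double-centering step of distance covariance/correlation: X_i(t) - mean function x(t) - per-observation average m_i + grand mean. A minimal grid-data sketch under that reading; the name and semantics here are assumptions, not necessarily what this PR implements.

import numpy as np
from skfda import FDataGrid


def double_center_sketch(X: FDataGrid) -> FDataGrid:
    """Doubly center grid data over samples and grid points.

    Computes X_i(t) - x(t) - m_i + grand mean, using plain averages over
    grid points as a stand-in for proper domain integration.
    """
    values = X.data_matrix  # shape: (n_samples, n_points, dim_codomain)
    per_sample = values.mean(axis=1, keepdims=True)  # m_i, one per observation
    per_point = values.mean(axis=0, keepdims=True)   # pointwise mean function
    grand = values.mean()                            # grand (double) mean
    return X.copy(data_matrix=values - per_sample - per_point + grand)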

return double_fdata

def individual_observation_mean(X: FData) -> NDArrayFloat:
"""Compute the grand mean of a FData object.
vnmabus (Member):

Wrong docstring?

return res # type: ignore[no-any-return]


def weighted_lp_norm(
vnmabus (Member):

Functional version not needed.

Callable[[GridPointsLike], NDArrayFloat] | float | None
) = None,
) -> None:

vnmabus (Member):

Missing docstring.

) = None,
) -> None:

# Checks that the lp normed is well defined
vnmabus (Member):

I would love if the code of Lp and this one could be merged together or simplified in some way.
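One possible shape for that merge, assuming the weight can simply be absorbed into the data (the plain Lp norm of w(t)^(1/p) * x(t) equals the weighted Lp norm of x(t)); the function name and parameters below are illustrative, not the PR's API:

from __future__ import annotations

import numpy as np
from skfda import FDataGrid
from skfda.misc.metrics import lp_norm


def weighted_lp_norm_sketch(
    X: FDataGrid,
    *,
    p: float = 2,
    weight: np.ndarray | float | None = None,
) -> np.ndarray:
    """Weighted Lp norm, (integral of w(t) |x(t)|^p dt)^(1/p), for grid data."""
    if weight is None:
        # No weight given: fall back to the ordinary Lp norm.
        return lp_norm(X, p=p)
    w = np.asarray(weight, dtype=float)
    if w.ndim == 1:
        w = w[:, np.newaxis]  # align weights with the codomain axis
    # Absorb the (non-negative) weight into the data and reuse lp_norm:
    # integral of w |x|^p dt == integral of |w^(1/p) x|^p dt.
    scaled = X.copy(data_matrix=X.data_matrix * w ** (1 / p))
    return lp_norm(scaled, p=p)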

return res[0] if len(res) == 1 else res


def same_structure_and_data(df1: pd.DataFrame, df2: pd.DataFrame) -> bool:
vnmabus (Member):

This should be documented and probably moved to the validation module.
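A documented, validation-style sketch of what it could look like after the move; the exact notion of "same structure" checked here (columns, index and dtypes) is an assumption:

import pandas as pd


def check_same_structure_and_data(df1: pd.DataFrame, df2: pd.DataFrame) -> bool:
    """Check whether two DataFrames share structure and contents.

    Args:
        df1: First DataFrame.
        df2: Second DataFrame.

    Returns:
        True if both DataFrames have the same columns, index, dtypes and
        element values, False otherwise.
    """
    return (
        df1.columns.equals(df2.columns)
        and df1.index.equals(df2.index)
        and bool((df1.dtypes == df2.dtypes).all())
        and df1.equals(df2)
    )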

metrics = metric.metrics if metric.metrics else [l2_distance] * n_cols
weights = metric.weights if metric.weights else 1.0

if isinstance(metrics, Metric):
vnmabus (Member):

Maybe all these checks are cleaner with a match sentence?
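A rough sketch of how those checks might read with structural pattern matching (Python >= 3.10); the attribute handling mirrors the quoted lines, but the normalisation rules and the helper name are assumptions:

from __future__ import annotations

import numpy as np
from skfda.misc.metrics import l2_distance


def _normalise_metrics_and_weights(metrics, weights, n_cols: int):
    """Return one metric per column and one weight per column."""
    match metrics:
        case None:
            metrics = [l2_distance] * n_cols
        case list() if len(metrics) != n_cols:
            raise ValueError("Expected one metric per DataFrame column.")
        case list():
            pass
        case _:
            # A single metric given: apply it to every column.
            metrics = [metrics] * n_cols

    match weights:
        case None:
            weights = np.ones(n_cols)
        case float() | int():
            weights = np.full(n_cols, float(weights))
        case _:
            weights = np.asarray(weights, dtype=float)

    return metrics, weights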
