
Update mean and sum functions #643


Open: wants to merge 14 commits into base: develop


@aleexarias (Contributor) commented Jan 13, 2025

Update mean and sum functions for FData, FDataGrid, FDataIrregular and FDataBasis to correctly handle NaN values in coefficients.

Fixes #642

Describe the proposed changes

Edit the mean method of FData so that it only performs parameter checks, leaving the existing checks as they are.
Add an auxiliary function in FDataGrid, shared by mean, sum and var, that simply calls np.sum/np.nansum, np.mean/np.nanmean or np.var/np.nanvar depending on the skipna parameter, and make the mean and sum methods use it.
Add a mean method to FDataBasis that averages the coefficients of the functions that have no NaN coefficients; functions with NaN coefficients are excluded from the calculation.
Add a mean method to FDataIrregular that calculates the mean based on the mean_counts parameter, skipping NaN values or not depending on skipna.
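A minimal sketch of the kind of auxiliary reduction described for FDataGrid: pick the plain or NaN-aware NumPy reduction depending on skipna. It assumes samples lie along axis 0 of the data matrix (as in FDataGrid.data_matrix); the name reduce_data_matrix and its signature are hypothetical, not the PR's actual helper.

```python
import numpy as np

# Hypothetical dispatch table: (plain reduction, NaN-skipping reduction).
_REDUCTIONS = {
    "sum": (np.sum, np.nansum),
    "mean": (np.mean, np.nanmean),
    "var": (np.var, np.nanvar),
}


def reduce_data_matrix(data_matrix, operation, *, skipna=False, **kwargs):
    """Reduce over the sample axis (axis 0), optionally ignoring NaN values.

    Extra keyword arguments (e.g. ddof for var) are forwarded to NumPy.
    """
    plain, nan_aware = _REDUCTIONS[operation]
    func = nan_aware if skipna else plain
    return func(data_matrix, axis=0, **kwargs)


# Two samples evaluated at two points; the second sample is missing one value.
data = np.array([[1.0, 2.0], [3.0, np.nan]])
plain_mean = reduce_data_matrix(data, "mean")
skipped_mean = reduce_data_matrix(data, "mean", skipna=True)
```

With skipna=False the NaN propagates into the second point's mean; with skipna=True that point is averaged over the single observed value.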

  • I have performed a self-review of my code
  • The code conforms to the style used in this package
  • The code is fully documented and typed (type-checked with Mypy)
  • I have added thorough tests for the new/changed functionality

@vnmabus changed the title from "Update mean an sum functions" to "Update mean and sum functions" on Feb 14, 2025
if skipna:
    count_values = np.sum(~np.isnan(common_values), axis=0)
else:
    count_values = np.full(sum_values.shape, self.n_samples)
Member: Isn't this just self.n_samples?

Contributor (author): It is needed in array form to operate with sum_values, so that it fits seamlessly with the flow of the case where skipna is specified.

Member: Not really? I think NumPy's broadcasting would handle it just fine. Or am I wrong in that?
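The reviewer's point can be checked with a small sketch: dividing an array by a plain integer broadcasts exactly like dividing by an array filled with that integer, so only the skipna branch actually needs a per-point count array. The helper name mean_from_sum is hypothetical, and it assumes common_values holds samples along axis 0, as the reviewed snippet does.

```python
import numpy as np


def mean_from_sum(common_values, skipna=False):
    """Hypothetical helper: mean over samples (axis 0) computed as sum / count."""
    if skipna:
        sum_values = np.nansum(common_values, axis=0)
        # Per-point count of non-NaN values: here an array is genuinely needed.
        count_values = np.sum(~np.isnan(common_values), axis=0)
    else:
        sum_values = np.sum(common_values, axis=0)
        # A plain int is enough: NumPy broadcasts it against sum_values.
        count_values = common_values.shape[0]
    return sum_values / count_values


values = np.array([[1.0, 2.0], [3.0, np.nan]])
skipped = mean_from_sum(values, skipna=True)
propagated = mean_from_sum(values, skipna=False)
```

Both branches end in the same sum_values / count_values expression, which is the "seamless flow" the author wanted, without materializing np.full(...).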

out: None = None,
keepdims: bool = False,
skipna: bool = False,
min_count: int = 0,
Member: It seems to me that min_count is not being used here. Why is that?

Contributor (author): It is kept for compatibility with the mean functions of FDataIrregular and FDataGrid, but using it does not make sense here: there are no individual measurements for each observation, only the observations approximated by basis functions.

Member: It would at least make sense at a global level (if some curves are NaN because they were not measured). Of course, if you only have an FDataBasis that does not make much sense, but it does if the FDataBasis is just one column among many in a DataFrame and it was not measured in some cases.
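A minimal sketch of how min_count could apply at that global level for basis coefficients. It assumes pandas-like semantics (return NaN when fewer than min_count curves are observed) and treats a curve as missing when any of its coefficients is NaN. Averaging coefficients is valid because, for functions in a common basis, the coefficients of the mean function are the mean of the coefficients. The function basis_coefficient_mean is hypothetical, not the PR's code.

```python
import numpy as np


def basis_coefficient_mean(coefficients, *, skipna=False, min_count=0):
    """Mean of basis coefficients across samples (rows).

    Hypothetical helper: a sample counts as missing (unmeasured curve)
    when any of its coefficients is NaN.
    """
    coefficients = np.asarray(coefficients, dtype=float)
    missing = np.isnan(coefficients).any(axis=1)
    n_valid = int(np.sum(~missing))
    n_coefs = coefficients.shape[1]

    if n_valid < min_count:
        # Too few observed curves: mirror pandas and return NaN.
        return np.full(n_coefs, np.nan)
    if not skipna:
        # Plain mean: any missing curve propagates NaN to the result.
        return coefficients.mean(axis=0)
    if n_valid == 0:
        return np.full(n_coefs, np.nan)
    # Average only the curves that were actually measured.
    return coefficients[~missing].mean(axis=0)


# Three curves in a 2-element basis; the third was never measured.
coefs = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, np.nan]])
```

Here min_count has a clear meaning even though FDataBasis stores no per-point measurements: it bounds how many curves must be present, as in a DataFrame column with unmeasured rows.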

aleexarias and others added 7 commits March 1, 2025 14:06
…correctly handle NaN values in coefficients

irreg

Updated mean an sum functions for FData, FDataGrid, FDataBasis and FDataIrregular to correctly handle NaN values in coefficients
@@ -595,15 +595,6 @@
"contributions": [
"doc"
]
},
Member: Why are you removing a contributor??

CONTRIBUTORS.md Outdated
@@ -1,6 +1,6 @@

Member: Changes to contributors should be made using the bot; please remove these files from the PR.

skipna=skipna,
min_count=min_count,
)
/ np.sum(~self.isna()),
Member: Remove the trailing comma. Otherwise, you are returning a tuple and the tests fail:

  • This is a number: (5)
  • This is a tuple: (5,)
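A short sketch of the pitfall the reviewer describes: in Python it is the comma, not the parentheses, that creates a tuple, so a trailing comma after a multi-line division silently wraps the result. The variable names are illustrative only.

```python
import numpy as np

total = np.array([4.0, 6.0])
n_valid = 2

# Trailing comma: the parenthesized expression becomes a one-element tuple.
result_with_comma = (
    total
    / n_valid,
)

# Without the comma, the parentheses only group the expression.
result_without_comma = (
    total
    / n_valid
)
```

The first result is a tuple containing the array, which is why comparisons against the expected array fail in the tests; the second is the array itself.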

Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Error in calculating the mean values in the FData object
2 participants