Addition of a subroutine to compute the median of array elements #426
Untested suggestion: I wonder if the code duplication for the integer/real cases can be reduced by including the output type So here 'o1' is the type of The calculations seem to rely on constants (1. and 0.5) that have the right precision, but it looks like they could be defined as local parameters of type |
Another suggestion: From the sorting review, I understood that Considering that for median calculation both cases should be pretty common, I wonder if Perhaps the routine could have an optional argument |
Thank you @gareth-nx for your suggestions.
Great idea. I did it a bit differently than what you proposed, but it reduced the code a lot.
Indeed, I also think that
The API of |
In the case with mask as an argument, consider checking whether the size of the mask is equal to the size of x. Maybe this is not desired (to avoid throwing errors etc.). But results like the following could easily be a user error.
program median_local
use stdlib_stats, only : median
use iso_c_binding, only : dp => C_DOUBLE
implicit none
real(dp) :: x(10), y(11)
integer :: i
x = (/(i*1.0_dp, i = 1, size(x))/)
y = (/(i*1.0_dp, i = 1, size(y))/)
! Should this be an error, because size(mask) != size(y)?
print*, median(y, mask= (x > 5.5_dp))
end program
Good suggestion. I wonder what the intrinsic sum reports (and what the standard says) in such a case.
With GFortran, it seems that no checks are done with sum in release mode. In debug mode, a runtime error reports that a mismatch was found. I am in favor of keeping the same behavior as the intrinsic sum implemented in gfortran (which is currently the case). However, I am open to implementing a check if the community wants one.
I would say such a check is lightweight and potentially saves a lot of problems that would otherwise not be easily noticed. For the runtime library of a compiler the situation may be different: the compiler knows that extra checks are called for (debug, checks on array bounds and such), but our code does not have that luxury. In debug builds you would see an array bounds problem reported, not a mismatch in array arguments.
My preference would be to add the check ;), but others may disagree.
@gareth-nx @arjenmarkus I added a check on the shapes of |
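For readers following along, here is a minimal sketch of the kind of lightweight check discussed above. It is not the code added in this PR: the module name `mask_check_sketch` and the function `same_shape` are hypothetical, and only the rank-1 case is covered.

```fortran
module mask_check_sketch
  implicit none
  private
  public :: same_shape
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Hypothetical helper (not part of stdlib): .true. when a rank-1
  !> array and its mask have the same number of elements.
  pure logical function same_shape(x, mask)
    real(dp), intent(in) :: x(:)
    logical, intent(in)  :: mask(:)
    ! For rank-1 arguments, comparing shapes reduces to comparing sizes.
    same_shape = all(shape(x) == shape(mask))
  end function same_shape
end module mask_check_sketch
```

With the arrays from the example above, `same_shape(y, x > 5.5_dp)` would be false because `y` has 11 elements and the mask has 10, so a caller could stop with an error or return NaN before touching the data.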
removed an error stop that was inappropriate.
Hi @jvdp1 Is this ready for review? I will be happy to do that once it is (but I note the very top of the thread suggests you still need to update some hyperlinks). |
Hi @gareth-nx,
Thank you. You may start the review. I will try to update the hyperlinks for FORD later.
Very nice -- a couple of minor comments for your consideration, but they should be easy to address.
Thank you @gareth-nx @leonfoks @ivan-pi @milancurcic for your review and comments. This whole discussion about selection algorithms led me to add some rules for cases where |
@ivan-pi can this PR be merged? |
n = size(x, kind=int64)
c = floor( (n + 1) / 2._${o1}$, kind=int64 )

x_tmp = reshape(x, [n])
Is the reshape necessary here?
Okay, I see this is to flatten the array. I guess pack could also be used in this case?
The subroutine sort only accepts rank-1 arrays, while median supports arrays of all ranks. Do you have another suggestion to avoid the reshape?
Indeed, x_tmp = pack(x, .true.) should work too. Do you think that pack is more efficient than reshape in this case?
I imagine that a good compiler would do the same thing. So it can remain as is.
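To make the two options in this exchange concrete, here is a small sketch that flattens a rank-2 array both ways. The module and function names (`flatten_sketch`, `flatten_reshape`, `flatten_pack`) are made up for illustration and are not part of the PR; both variants return the elements in array element (column-major) order.

```fortran
module flatten_sketch
  implicit none
  private
  public :: flatten_reshape, flatten_pack
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Flatten with reshape, as done in the PR snippet above.
  function flatten_reshape(x) result(v)
    real(dp), intent(in)  :: x(:, :)
    real(dp), allocatable :: v(:)
    v = reshape(x, [size(x)])
  end function flatten_reshape

  !> Flatten with pack and a scalar .true. mask (the suggested alternative).
  function flatten_pack(x) result(v)
    real(dp), intent(in)  :: x(:, :)
    real(dp), allocatable :: v(:)
    v = pack(x, .true.)
  end function flatten_pack
end module flatten_sketch
```

Both functions should return the same rank-1 array for any input; whether one is faster is left to the compiler, as noted above.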
call check( any(ieee_is_nan(median(d3, 3, .false.))), '${k1}$ median(d3, 3, .false.)' )

call check( abs(median(d1, 1) - 1.5_${k1}$) < ${k1}$tol, '${k1}$ median(d1, 1), even')
call check( sum(abs(median(d2, 1) - [2._${k1}$, -4._${k1}$, 7._${k1}$, 1._${k1}$])) < ${k1}$tol, &
Would using the array kind specifier help reduce the preprocessor noise?
[real(${k1}$) :: 2.0, -4.0, 7.0, 1.0]
With gfortran, this implies a conversion from real(4) to real(8), with a warning triggered by -Wconversion-extra. For this reason, I am inclined to keep it as it is now.
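As an illustration of the trade-off in this exchange, the sketch below builds the same array both ways. The names are hypothetical, and it assumes the default real kind is single precision (that is, no flags such as -fdefault-real-8). For the values used in the test the two constructors agree exactly, because 2, -4, 7 and 1 are representable in single precision; a value like 0.1 would not survive the conversion unchanged.

```fortran
module constructor_kinds_sketch
  implicit none
  private
  public :: with_type_spec, with_kind_suffix, tenth_differs
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
contains
  !> Default-real literals converted to real(dp) by the type specifier.
  pure function with_type_spec() result(v)
    real(dp) :: v(4)
    v = [real(dp) :: 2.0, -4.0, 7.0, 1.0]
  end function with_type_spec

  !> Literals written directly with the target kind (no conversion).
  pure function with_kind_suffix() result(v)
    real(dp) :: v(4)
    v = [2._dp, -4._dp, 7._dp, 1._dp]
  end function with_kind_suffix

  !> .true. when converting single-precision 0.1 to double changes the value
  !> relative to the double-precision literal (assumes sp and dp differ).
  pure logical function tenth_differs()
    tenth_differs = real(0.1_sp, dp) /= 0.1_dp
  end function tenth_differs
end module constructor_kinds_sketch
```

So the type-specifier form is shorter in the fypp source, but it relies on an implicit kind conversion, which is what the warning points at.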
Co-authored-by: Ivan Pribec <[email protected]>
Following the links you provided the implementation looks good to me. The preprocessing work necessary to manage the different ranks is admirable.
The only comment I have left is: how does the behavior compare to other languages when NaN values are present? The check at the beginning, any(ieee_is_nan(x)), seems quite expensive, assuming that in the majority of cases NaNs won't be there. But perhaps I am wrong and this is not a big issue.
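For context, the behaviour under discussion can be sketched as follows. This is a simplified rank-1 illustration, not the PR's implementation: the names `median_sketch` and `median_nan_aware` are hypothetical, a plain insertion sort stands in for the stdlib sorting routine, and returning NaN for an empty array is an assumption of this sketch. The index c follows the floor((n + 1)/2) rule from the snippet reviewed earlier.

```fortran
module median_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: median_nan_aware
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Median of a rank-1 array; NaN if any element is NaN.
  function median_nan_aware(x) result(res)
    real(dp), intent(in)  :: x(:)
    real(dp)              :: res
    real(dp), allocatable :: tmp(:)
    real(dp) :: key
    integer  :: n, c, i, j

    n = size(x)
    ! Up-front NaN scan (the check whose cost is questioned above).
    ! Assumption of this sketch: an empty array also yields NaN.
    if (n == 0 .or. any(ieee_is_nan(x))) then
      res = ieee_value(1.0_dp, ieee_quiet_nan)
      return
    end if

    ! Insertion sort on a copy; stands in for the stdlib sort.
    tmp = x
    do i = 2, n
      key = tmp(i)
      j = i - 1
      do while (j >= 1)
        if (tmp(j) <= key) exit
        tmp(j + 1) = tmp(j)
        j = j - 1
      end do
      tmp(j + 1) = key
    end do

    ! Integer division gives floor((n + 1)/2) for positive n.
    c = (n + 1) / 2
    if (mod(n, 2) == 1) then
      res = tmp(c)                            ! odd: middle element
    else
      res = 0.5_dp * (tmp(c) + tmp(c + 1))    ! even: mean of the two middle elements
    end if
  end function median_nan_aware
end module median_sketch
```

The NaN scan is one extra pass over the data, which is small next to the sort, but it is paid on every call even when no NaN is present.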
The R interpreter gives NA if there are NA or NaN values present in the input vector:
|
Julia's median returns NaN when NaNs are present in the array (and, as mentioned in @gareth-nx's comment, this seems to be the case for R too). My main issue was that the result of
@ivan-pi I answered all your questions. However, a couple of comments remain open. |
The open comments don't affect the behaviour, so with three approvals this can be merged. Thanks for your work.🙏 |
Thank you all, I'll merge. |
The API can already be reviewed.
Still to do:
sort to ord_sort, once "Issue with stdlib_sorting #428" is merged