Feature: loadtxt skiprows and max_rows #652
src/stdlib_io.fypp
Outdated
    nrow = number_of_rows(s)
    nrow = number_of_rows(s) - skiprows_

    if ( nrow < 0 ) call error_stop("loadtxt: skipping more rows than present.")
Should this really be a fatal error?
We could also cut off `nrow` at 0, which would result in an empty array.
But numpy also throws an error for skiprows > nrow.
gfortran also crashes when trying to read after EOF. So having a meaningful error message is preferable IMO.
Why not use `iostat` and `iomsg` instead of this check, and return an appropriate message or a zero-size array when appropriate?
Unfortunately, I am not familiar with `iostat` and `iomsg`. This would only affect the following loop, right?

    do i = 1, skiprows_
      read(s, *)
    end do
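For readers who, like the author here, have not used `iostat` and `iomsg`: the following is a minimal sketch of how the skip loop could report running past the end of the file instead of crashing. It is an illustration under assumptions, not the code that was merged; the file name `skip_demo.txt` and the program name are hypothetical.

```fortran
program skip_rows_iostat
  implicit none
  integer :: u, i, ios, skiprows_
  character(len=256) :: msg

  ! Write a two-line demo file (hypothetical name), then try to skip three lines.
  open(newunit=u, file="skip_demo.txt", status="replace", action="write")
  write(u, "(a)") "0.1 0.2"
  write(u, "(a)") "0.3 0.4"
  close(u)

  skiprows_ = 3
  open(newunit=u, file="skip_demo.txt", status="old", action="read")
  do i = 1, skiprows_
    ! With iostat present, a failed read sets ios instead of aborting the program.
    read(u, *, iostat=ios, iomsg=msg)
    if (is_iostat_end(ios)) then
      print "(a,i0)", "end of file reached while skipping line ", i
      exit
    else if (ios /= 0) then
      print "(a,a)", "read error: ", trim(msg)
      exit
    end if
  end do
  close(u, status="delete")
end program skip_rows_iostat
```

The text of `msg` is processor-dependent, so the sketch only prints it for errors other than end-of-file.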
Looks good to me. 👍🏻 I'd invite other reviewers to voice their opinions on whether skipping more rows than present should be an error or not.
Co-authored-by: Ivan Pribec <[email protected]>
@ivan-pi thanks for the review. I just added |
Question from my side: Do we want to mimic the numpy behavior and signature?
Just a comment about error handling: there is still an open issue about it, #224 (it's worth reading). Since this function deals with the system (reading a file), it may fail (as both noted), and the best approach so far IMHO is how [...]. Indeed, imagine a user building a GUI application where a fatal error would close the entire program (with loss of progress). Of course, we could add some boilerplate to [...]. I hope it helps.
@14NGiestas then the suggestion with cutting of |
Yep, at first glance your suggestion is ok (just make sure it is mentioned in the docs and examples)
I believe the specs should also be updated at specs/stdlib_io
Oh. Why are there two files for the same docs?!
Co-authored-by: Ian Giestas Pauli <[email protected]>
@MuellerSeb the stdlib project has this little quirk where it mediates, in a manner, future standardized Fortran features.
It will end up being the "same", but it serves a different purpose. This PR is kind of undefined behavior AFAIK (since it updates a spec), and maybe we should discuss it more. I'm reviewing in good faith that it is useful to have such options available, but the workflow doesn't seem to be ready for this specific situation.
I would say that it is labelled as |
Changes look good to me. Returning a zero-array is fine IMO. Thank you.
doc/specs/stdlib_io.md
Outdated
### Arguments

`filename`: Shall be a character expression containing the file name from which to load the rank-2 `array`.

`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer`.

`skiprows` (optional): Skip the first `skiprows` lines. If skipping more rows than present, a 0-sized array will be returned. The default is 0.

`max_rows` (optional): Read `max_rows` lines of content after `skiprows` lines. The default is -1, to read all the lines.
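As an illustration of the arguments described in this spec excerpt, here is a short usage sketch. It assumes a build of stdlib that includes this PR; the file name `loadtxt_demo.txt` and the program name are hypothetical, and the comments restate what the spec text says rather than verified output.

```fortran
program loadtxt_skip_demo
  use stdlib_io, only: loadtxt
  implicit none
  real, allocatable :: x(:,:)
  integer :: u

  ! Create a small demo file (hypothetical name): four rows, two columns.
  open(newunit=u, file="loadtxt_demo.txt", status="replace", action="write")
  write(u, "(a)") "0.0 0.0"
  write(u, "(a)") "0.1 0.2"
  write(u, "(a)") "0.3 0.4"
  write(u, "(a)") "0.5 0.6"
  close(u)

  ! Skip the first line, then read at most two of the remaining lines.
  call loadtxt("loadtxt_demo.txt", x, skiprows=1, max_rows=2)
  print *, "shape of x:", shape(x)

  ! Per the spec text, skipping more rows than the file holds should give
  ! a 0-sized array rather than an error.
  call loadtxt("loadtxt_demo.txt", x, skiprows=10)
  print *, "size of x:", size(x)
end program loadtxt_skip_demo
```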
What about a negative number (other than -1), or a 0 value provided for `max_rows`?
Every negative number will be interpreted as: read all lines.
This should be mentioned in the specs (instead of only the value `-1`).
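The rules agreed on so far (over-skipping gives an empty array, any negative `max_rows` means "read all lines") can be sketched as a small helper. This is an illustration only: `rows_to_read` is a hypothetical function, not part of stdlib, and treating a negative `skiprows` as 0 is an assumption of this sketch.

```fortran
module row_count_sketch
  implicit none
contains
  ! Hypothetical helper: how many rows to read given the file length and options.
  pure integer function rows_to_read(total_rows, skiprows, max_rows) result(n)
    integer, intent(in) :: total_rows, skiprows, max_rows
    ! Rows left after skipping; clamp at 0 so over-skipping yields an empty array.
    ! Assumption: a negative skiprows is treated as 0.
    n = max(0, total_rows - max(0, skiprows))
    ! Any negative max_rows means "read everything that is left".
    if (max_rows >= 0) n = min(n, max_rows)
  end function rows_to_read
end module row_count_sketch

program row_count_demo
  use row_count_sketch, only: rows_to_read
  implicit none
  print "(i0)", rows_to_read(5, 1, -1)  ! prints 4: all rows after the skipped one
  print "(i0)", rows_to_read(5, 7, -1)  ! prints 0: skipping more rows than present
  print "(i0)", rows_to_read(5, 1, 2)   ! prints 2: limited by max_rows
  print "(i0)", rows_to_read(5, 1, 0)   ! prints 0: max_rows of zero reads nothing
end program row_count_demo
```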
@jvdp1 another option for negative values for |
In this case, I would suggest to define a

    select case(max_rows_)
      case( 0 )
        ....
      case( : -1 )
        .... = real_all
      case default
        ....
    end select

A |
Thanks @MuellerSeb and reviewers. I made suggestions to the spec and docstrings. Anything else needed here before we merge?
Co-authored-by: Milan Curcic <[email protected]>
Co-authored-by: Milan Curcic <[email protected]>
Hey @milancurcic, thanks for your cleanup. There is still the comment from @jvdp1 about the Beside that I just noticed, that if you use nrow = number_of_rows(s) I think that is bad. |
Do you mean the number of rows from the first line in the file? And if yes, I don't understand why that is not what we want. I'm not familiar with the npy format, maybe this is something specific to that.
Sorry, I meant columns:

    ! determine number of columns
    ncol = number_of_columns(s)

Just copied the wrong line. If you have a file like this:

    comment
    0.1 0.2
    0.3 0.4

and read it with

    call loadtxt("log.txt", data, skiprows=1)

it will still think that there is only one column, since the first line has only one (
I see, that makes total sense. I agree that won't work. What do you think about this approach:
|
Thought of almost the same. We could also just add a

    do i = 1, skiprows_
      read(s, *)
    end do
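To make the column-count problem concrete, here is a self-contained sketch that skips the header before counting columns, using the `log.txt`-style file from the comment above. `count_tokens` is a hypothetical stand-in for stdlib's internal `number_of_columns` and only splits on blanks; the file name `log_demo.txt` is also an assumption.

```fortran
program columns_after_skip
  implicit none
  integer :: u, i, ios, ncol, skiprows_
  character(len=256) :: line

  ! Recreate the example file: one comment line, then two data rows.
  open(newunit=u, file="log_demo.txt", status="replace", action="write")
  write(u, "(a)") "comment"
  write(u, "(a)") "0.1 0.2"
  write(u, "(a)") "0.3 0.4"
  close(u)

  skiprows_ = 1
  open(newunit=u, file="log_demo.txt", status="old", action="read")
  ! Skip the header lines first, so the column count uses a data line.
  do i = 1, skiprows_
    read(u, *)
  end do
  read(u, "(a)", iostat=ios) line
  ncol = 0
  if (ios == 0) ncol = count_tokens(trim(line))
  print "(a,i0)", "columns: ", ncol  ! prints "columns: 2"
  close(u, status="delete")

contains

  ! Hypothetical stand-in for number_of_columns: counts blank-separated tokens.
  pure integer function count_tokens(s) result(n)
    character(len=*), intent(in) :: s
    integer :: k
    logical :: in_token
    n = 0
    in_token = .false.
    do k = 1, len(s)
      if (s(k:k) /= " ") then
        if (.not. in_token) n = n + 1
        in_token = .true.
      else
        in_token = .false.
      end if
    end do
  end function count_tokens
end program columns_after_skip
```

Without the skip loop, the same count would be taken from the line `comment` and give one column, which is the failure described above.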
@MuellerSeb nice, I like that even better.
If everyone is fine with the mechanism for |
LGTM. Nice catch and solution for `number_of_columns`!
Thanks all, will merge.
Hey there,

I needed to skip rows with `loadtxt`, so I added this feature, similar to the numpy routine. In addition, I also added `max_rows`.

Added optional arguments:

- `skiprows`: number of rows to skip at the beginning of the file (default 0)
- `max_rows`: maximum number of lines to read (default -1 for all lines)

Argument names are taken from `numpy.loadtxt`. All negative values for `max_rows` are interpreted as "all lines".