Feature: loadtxt skiprows and max_rows #652

MuellerSeb · 2022-04-19T11:15:49Z

Hey there,

I needed to skip rows with loadtxt, so I added this feature similar to the numpy routine. In addition, I also added max_rows.

Added optional arguments:

skiprows: skipping rows at the beginning of the file (default 0)
max_rows: maximum for read lines (default -1 for all lines)

Argument names are taken from numpy.loadtxt. All negative values for max_rows are interpreted as "all lines".

ivan-pi · 2022-04-19T19:47:10Z

src/stdlib_io.fypp

-      nrow = number_of_rows(s)
+      nrow = number_of_rows(s) - skiprows_
+
+      if ( nrow < 0 ) call error_stop("loadtxt: skipping more rows than present.")


Should this really be a fatal error?

We could also cut off nrow at 0, which would result in an empty array.

But numpy also throws an error for skiprows > nrow.

gfortran also crashes when trying to read after EOF. So having a meaningful error message is preferable IMO.

Why not using iostat and iomsg instead of this check, and returning an appropriate message or an zero-size array when appropriate?

Unfortunately, I am not familiar with iostat and iomsg. This would only affect the following loop, right?

do i = 1, skiprows_ read(s, *) end do

src/stdlib_io.fypp

ivan-pi

Looks good to me. 👍🏻 I'd invite other reviewers to voice their opinions if the skipping more rows than present should be an error or not.

Co-authored-by: Ivan Pribec <[email protected]>

MuellerSeb · 2022-04-19T20:38:27Z

@ivan-pi thanks for the review. I just added max_rows since it would make sense to directly add it. Hope this is ok.

MuellerSeb · 2022-04-19T20:47:40Z

Question from my side: Do we want to mimic the numpy behavior and signature?

src/stdlib_io.fypp

14NGiestas · 2022-04-27T18:57:21Z

Just a comment about error handling: there is still an open issue about it #224 (it's worth reading)

Since this function is dealing with the system (reading a file) this may fail (as both noted) and the best approach so far IMHO it is how stdlib_bitsets by wclodius2 module deals with errors: have an optional status and let the user decides if an error should terminate the application or not.

Indeed imagine a user building a GUI application where a fatal error would close the entire program (and loss of progress). Of course, we could add some boilerplate to inquire about the file's existence first (on the end-user side) but it defeats the purpose of using/having a helper function such as loadtxt in the first place.

I hope it helps.

MuellerSeb · 2022-04-28T13:49:50Z

@14NGiestas then the suggestion with cutting of skiprows_ should be fine for this pull request, since it doesn't introduce any new error behavior. If everybody is fine with it, I would commit my suggestion for this.

14NGiestas · 2022-04-28T20:25:03Z

@14NGiestas then the suggestion with cutting of skiprows_ should be fine for this pull request, since it doesn't introduce any new error behavior. If everybody is fine with it, I would commit my suggestion for this.

Yep, at first glance your suggestion is ok (just make sure it is mentioned in the docs and examples)
If I recall correctly returning a zero-sized array is a behavior used in stdlib_math linspace and logspace.

src/stdlib_io.fypp

14NGiestas · 2022-05-05T12:06:59Z

I believe the specs should also be updated at specs/stdlib_io

src/tests/io/test_loadtxt.f90

MuellerSeb · 2022-05-05T13:59:47Z

I believe the specs should also be updated at specs/stdlib_io

Oh. Why are there two files for the same docs?!

Co-authored-by: Ian Giestas Pauli <[email protected]>

14NGiestas · 2022-05-05T15:26:20Z

@MuellerSeb the stdlib project has this little quirk where it mediates, in a manner, future standardized Fortran features.

The specification is a document that describes the API and the functionality, so that anyone can use it to create an implementation from scratch without looking at stdlib. The stdlib library then provides the reference implementation.

It will end up being the "same" but it serves a different purpose. This PR is a kinda undefined behavior AFAIK (since it updates a spec) and maybe we should discuss more. I'm reviewing in a good faith that is useful to have such options available but the workflow doesn't seem to be ready for this specific situation.

jvdp1 · 2022-05-08T19:25:17Z

It will end up being the "same" but it serves a different purpose. This PR is a kinda undefined behavior AFAIK (since it updates a spec) and maybe we should discuss more. I'm reviewing in a good faith that is useful to have such options available but the workflow doesn't seem to be ready for this specific situation.

I would say that it is labelled as experimental. So changing the API is ok at this stage, especially that it is also backward-compatible.

jvdp1

Changes look good to me. Returning a zero-array is fine IMO. Thank you.

src/stdlib_io.fypp

jvdp1 · 2022-05-08T19:21:13Z

doc/specs/stdlib_io.md


 ### Arguments

 `filename`: Shall be  a character expression containing the file name from which to load the rank-2 `array`.

 `array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer`.

+`skiprows` (optional): Skip the first `skiprows` lines. If skipping more rows than present, a 0-sized array will be returned. The default is 0.
+
+`max_rows` (optional): Read `max_rows` lines of content after `skiprows` lines. The default is -1, to read all the lines.


What about with a negative number (other than 0), or a 0 value provided for max_rows?

Every negative number will be interpreted as: read all lines.

This should be mentioned in the specs (instead of only the value -1).

MuellerSeb · 2022-05-09T08:31:55Z

@jvdp1 another option for negative values for max_rows could be to to interpret them as wrap-around values. Meaning, -1 will be the full file, -2 will leave out the last line and so on. We would just need to find a solution for too large negative numbers. It should again be cut-off at 0 read lines.

jvdp1 · 2022-05-09T10:20:37Z

@jvdp1 another option for negative values for max_rows could be to to interpret them as wrap-around values. Meaning, -1 will be the full file, -2 will leave out the last line and so on. We would just need to find a solution for to big negative numbers. It should again be cut-off at 0 read lines.

In this case, I would suggest to define a parameter integer, e.g., read_all, which would be available to the user, and then to add a procedure for checking the provided value, e.g.

select case(max_rows_)
 case( 0 )
  ....
 case( : -1 )
   .... = real_all
 case default
  ....
end select

A parameter integer would provide the flexibility for the API (as the user doesn't need to rely on a specific number)

milancurcic

Thanks @MuellerSeb and reviewers. I made suggestions to the spec and docstrings. Anything else needed here before we merge?

doc/specs/stdlib_io.md

src/stdlib_io.fypp

Co-authored-by: Milan Curcic <[email protected]>

MuellerSeb · 2022-06-13T12:39:15Z

Hey @milancurcic, thanks for your cleanup. There is still the comment from @jvdp1 about the -1 value for max_rows and the discussion, if other negative values should skip lines at the end.

Beside that I just noticed, that if you use skiprows to skip comments in the beginning for example, the following line will still try to get the number of columns from the first line in the file:

nrow = number_of_rows(s)

I think that is bad.

milancurcic · 2022-06-13T13:07:22Z

Beside that I just noticed, that if you use skiprows to skip comments in the beginning for example, the following line will still try to get the number of columns from the first line in the file:
nrow = number_of_rows(s)
I think that is bad.

Do you mean number of rows from the first line in the file? And if yes, I don't understand why is that not what we want. I'm not familiar with the npy format, maybe this is something specific to that.

MuellerSeb · 2022-06-13T13:18:30Z

Beside that I just noticed, that if you use skiprows to skip comments in the beginning for example, the following line will still try to get the number of columns from the first line in the file:
nrow = number_of_rows(s)
I think that is bad.
Do you mean number of rows from the first line in the file? And if yes, I don't understand why is that not what we want. I'm not familiar with the npy format, maybe this is something specific to that.

Sorry I meant columns:

      ! determine number of columns
      ncol = number_of_columns(s)

Just copied the wrong line.

If you have a file like this:

comment
0.1 0.2
0.3 0.4

And read it with

call loadtxt("log.txt", data, skiprows=1)

It will still think, that there is only one column, since the first line has only one ("comment").

milancurcic · 2022-06-13T13:28:49Z

I see, that makes total sense. I agree that won't work. What do you think about this approach:

Add an optional argument row to number_of_columns which defaults to 1 (current behavior). row indicates the line number at which we'll determine the number of columns.
In loadtxt, if skiprows is present, move the call to number_of_columns to go after the skip-rows logic, and call number_of_columns(s, skiprows_+1).

MuellerSeb · 2022-06-13T13:35:57Z

I see, that makes total sense. I agree that won't work. What do you think about this approach:

* Add an optional argument `row` to `number_of_columns` which defaults to 1 (current behavior). `row` indicates the line number at which we'll determine the number of columns.

* In `loadtxt`, if `skiprows` is present, move the call to `number_of_columns` to go after the skip-rows logic, and call `number_of_columns(s, skiprows_+1)`.

Thought of almost the same. We could also just add a skiprows to number_of_columns to pass skiprows_ there directly (would also be more consistent). I would move the call of number_of_columns also to have the correct value for skiprows_. I would just add these lines from loadtxt to number_of_columns:

      do i = 1, skiprows_
        read(s, *)
      end do

milancurcic · 2022-06-13T13:41:35Z

@MuellerSeb nice, I like that even better.

MuellerSeb · 2022-06-13T14:09:55Z

If everyone is fine with the mechanism for max_rows set with negative numbers, I think this is ready for merging.

jvdp1

LGTM. Nice catch and solution for number_of_columns!

milancurcic · 2022-06-16T19:55:00Z

Thanks all, will merge.

MuellerSeb added 3 commits April 19, 2022 13:06

stdlib_io: add skiprows to loadtxt

d980c64

Add test for skiprows in loadtxt

de74cd8

stdlib_io: add check for skiprows >= 0

ea80574

MuellerSeb mentioned this pull request Apr 19, 2022

Optional arguments for loadtxt #512

Closed

ivan-pi reviewed Apr 19, 2022

View reviewed changes

awvwgk linked an issue Apr 19, 2022 that may be closed by this pull request

Optional arguments for loadtxt #512

Closed

ivan-pi reviewed Apr 19, 2022

View reviewed changes

src/stdlib_io.fypp Outdated Show resolved Hide resolved

ivan-pi requested changes Apr 19, 2022

View reviewed changes

awvwgk added reviewers needed This patch requires extra eyes topic: IO Common input/output related features labels Apr 19, 2022

stdlib_io: add max_rows to loadtxt

bcc3f36

MuellerSeb changed the title ~~Feature: loadtxt skiprows~~ Feature: loadtxt skiprows and max_rows Apr 19, 2022

MuellerSeb and others added 2 commits April 19, 2022 22:23

Update src/stdlib_io.fypp

3c66fcc

Co-authored-by: Ivan Pribec <[email protected]>

loadtxt: Add test for max_rows

938ba8c

MuellerSeb requested a review from ivan-pi April 19, 2022 20:40

MuellerSeb commented Apr 19, 2022

View reviewed changes

src/stdlib_io.fypp Outdated Show resolved Hide resolved

loadtxt: treat negative skiprows as 0 (numpy like)

5af87c6

jvdp1 reviewed Apr 24, 2022

View reviewed changes

src/stdlib_io.fypp Show resolved Hide resolved

MuellerSeb commented Apr 27, 2022

View reviewed changes

src/stdlib_io.fypp Outdated Show resolved Hide resolved

MuellerSeb requested a review from jvdp1 April 27, 2022 13:39

MuellerSeb commented May 5, 2022

View reviewed changes

src/stdlib_io.fypp Outdated Show resolved Hide resolved

MuellerSeb added 2 commits May 5, 2022 12:56

stdlib_io: cut off skiprows

ad2faa9

stdlib_io: docs for cut off skiprows

6c01940

14NGiestas reviewed May 5, 2022

View reviewed changes

src/tests/io/test_loadtxt.f90 Outdated Show resolved Hide resolved

rename test in src/tests/io/test_loadtxt.f90

93689c9

Co-authored-by: Ian Giestas Pauli <[email protected]>

update specs docs of loadtxt

ec049ea

jvdp1 reviewed May 8, 2022

View reviewed changes

milancurcic approved these changes Jun 13, 2022

View reviewed changes

doc/specs/stdlib_io.md Outdated Show resolved Hide resolved

src/stdlib_io.fypp Outdated Show resolved Hide resolved

MuellerSeb and others added 2 commits June 13, 2022 14:35

Update doc/specs/stdlib_io.md

c046e7e

Co-authored-by: Milan Curcic <[email protected]>

Update src/stdlib_io.fypp

764322a

Co-authored-by: Milan Curcic <[email protected]>

MuellerSeb added 2 commits June 13, 2022 15:50

loadtxt: determine number of columns from first line to be read

0f356db

number_of_columns: bugfix for missing int

b428213

jvdp1 approved these changes Jun 13, 2022

View reviewed changes

milancurcic merged commit 3028a36 into fortran-lang:master Jun 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: loadtxt skiprows and max_rows #652

Feature: loadtxt skiprows and max_rows #652

MuellerSeb commented Apr 19, 2022 •

edited

Loading

ivan-pi Apr 19, 2022

MuellerSeb Apr 19, 2022 •

edited

Loading

MuellerSeb Apr 19, 2022 •

edited

Loading

MuellerSeb Apr 20, 2022 •

edited

Loading

jvdp1 Apr 23, 2022

MuellerSeb Apr 24, 2022

ivan-pi left a comment

MuellerSeb commented Apr 19, 2022

MuellerSeb commented Apr 19, 2022

14NGiestas commented Apr 27, 2022

MuellerSeb commented Apr 28, 2022

14NGiestas commented Apr 28, 2022

14NGiestas commented May 5, 2022

MuellerSeb commented May 5, 2022

14NGiestas commented May 5, 2022

jvdp1 commented May 8, 2022

jvdp1 left a comment

jvdp1 May 8, 2022

MuellerSeb May 9, 2022

jvdp1 May 9, 2022

MuellerSeb commented May 9, 2022 •

edited

Loading

jvdp1 commented May 9, 2022

milancurcic left a comment

MuellerSeb commented Jun 13, 2022

milancurcic commented Jun 13, 2022

MuellerSeb commented Jun 13, 2022 •

edited

Loading

milancurcic commented Jun 13, 2022

MuellerSeb commented Jun 13, 2022

milancurcic commented Jun 13, 2022

MuellerSeb commented Jun 13, 2022

jvdp1 left a comment

milancurcic commented Jun 16, 2022

Feature: loadtxt skiprows and max_rows #652

Feature: loadtxt skiprows and max_rows #652

Conversation

MuellerSeb commented Apr 19, 2022 • edited Loading

ivan-pi Apr 19, 2022

Choose a reason for hiding this comment

MuellerSeb Apr 19, 2022 • edited Loading

Choose a reason for hiding this comment

MuellerSeb Apr 19, 2022 • edited Loading

Choose a reason for hiding this comment

MuellerSeb Apr 20, 2022 • edited Loading

Choose a reason for hiding this comment

jvdp1 Apr 23, 2022

Choose a reason for hiding this comment

MuellerSeb Apr 24, 2022

Choose a reason for hiding this comment

ivan-pi left a comment

Choose a reason for hiding this comment

MuellerSeb commented Apr 19, 2022

MuellerSeb commented Apr 19, 2022

14NGiestas commented Apr 27, 2022

MuellerSeb commented Apr 28, 2022

14NGiestas commented Apr 28, 2022

14NGiestas commented May 5, 2022

MuellerSeb commented May 5, 2022

14NGiestas commented May 5, 2022

jvdp1 commented May 8, 2022

jvdp1 left a comment

Choose a reason for hiding this comment

jvdp1 May 8, 2022

Choose a reason for hiding this comment

MuellerSeb May 9, 2022

Choose a reason for hiding this comment

jvdp1 May 9, 2022

Choose a reason for hiding this comment

MuellerSeb commented May 9, 2022 • edited Loading

jvdp1 commented May 9, 2022

milancurcic left a comment

Choose a reason for hiding this comment

MuellerSeb commented Jun 13, 2022

milancurcic commented Jun 13, 2022

MuellerSeb commented Jun 13, 2022 • edited Loading

milancurcic commented Jun 13, 2022

MuellerSeb commented Jun 13, 2022

milancurcic commented Jun 13, 2022

MuellerSeb commented Jun 13, 2022

jvdp1 left a comment

Choose a reason for hiding this comment

milancurcic commented Jun 16, 2022

MuellerSeb commented Apr 19, 2022 •

edited

Loading

MuellerSeb Apr 19, 2022 •

edited

Loading

MuellerSeb Apr 19, 2022 •

edited

Loading

MuellerSeb Apr 20, 2022 •

edited

Loading

MuellerSeb commented May 9, 2022 •

edited

Loading

MuellerSeb commented Jun 13, 2022 •

edited

Loading