Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement open(filename, mode) and use it #71

Merged
merged 17 commits into from
Jan 4, 2020
Merged

Conversation

certik
Copy link
Member

@certik certik commented Jan 3, 2020

Motivated by #14.

Prior art:

Python: open

In future PRs (or here) we can support the Python's modes 'a', 'x', 't'. The mode 'b' is binary in Python, which in Fortran would be access=stream. We can also support a Fortran specific mode 'u' which would mean form="unformatted".

The other Python's options do not seem to have an equivalent in Fortran (I think), so they would not be supported.

We also need to add more tests for this.

@certik
Copy link
Member Author

certik commented Jan 3, 2020

@everythingfunctional, I was wondering if you can review this, I thought you might like it. The whole io module so far is modeled by Python/NumPy.

@everythingfunctional
Copy link
Member

Sure thing.

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If open is publicly exposed and imported, I think that prohibits the use of the open statement in the same scope.

Plus, it could be easily confused with the statement. Should we call this something else, perhaps open_file or fopen?

@certik
Copy link
Member Author

certik commented Jan 3, 2020

If open is publicly exposed and imported, I think that prohibits the use of the open statement in the same scope.

That does not seem to be the case. As you can see, I am using the open statement in the function itself, and you can also use it outside of the function (I tested it). So it looks like we are free to use open.

But if there is some issue with the name, then my next favorite name would be fopen.

@milancurcic
Copy link
Member

Nice! If there's no chance of conflict with the statement, I recommend keeping the name open, and likewise for future close, read, write, etc.

@certik
Copy link
Member Author

certik commented Jan 3, 2020

I cannot guarantee yet that there is not a chance of conflict, only that so far I didn't notice any issues in gfortran. But we can start with using open and if there are issues that come up, we can rename it (one can also rename it at import time in some corner case if needed).

@everythingfunctional
Copy link
Member

I'm still a bit torn between creating a new derived type to return with an OO interface vs just returning the unit number. But for now this is still experimental, and without all the rest of the OO interface wouldn't be of much use anyway.

@certik
Copy link
Member Author

certik commented Jan 3, 2020

@everythingfunctional as I proposed in #14 (comment), we should have both an OO and a low level interface. This PR is part of the low level interface. The high level interface needs to be designed, but the idea would be that it would simply be a thin wrapper on top of the low level interface.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. I have mainly minor suggestions.
Maybe it would be better to rename the function from open to opentxt, because this function is only for sequential formatted files. opentxt would also be in line with savetxt and loadtxt

open(newunit=u, file=filename, position="append", status="old", &
action="write")
else
call error_stop("Unsupported mode")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there some rules for error messages? Usually, I would mention something a bit more verbose to help the user, e.g.

     call error_stop("ERROR, open function: Unsupported mode")

Maybe worthwhile to open an issue about that?

@certik
Copy link
Member Author

certik commented Jan 3, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2020

I would like to support binary stream and unformatted files also, see the issue description. Do you have same case that you think would be hard to support?

OK. It is fine for me. The name open is ok for me.
Then I would suggest to already introduce the t for text mode, and add if ( mode_ == 'a' .or. mode_ == 'at') then to show that we want to support more modes.

The Python x mode could be also arleady introduced (action = 'write', status = 'new' ).

I can add some suggestions if you want/agree

@certik
Copy link
Member Author

certik commented Jan 3, 2020 via email

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my propositions for the text modes (assuming the default is "text").

For form = "unformatted", access = "sequential", I would suggest u
For form = "unformatted", access = "stream", I would suggest s
What about form = "unformatted", access = "direct" and form = "unformatted", access = "append"?

Also some compilers support form = "binary" which (I think) is not standard. So I would not use the letter 'b' to avoid confusion.

@certik
Copy link
Member Author

certik commented Jan 3, 2020

form = "unformatted", access = "append" would be au. You can combine any of rwa with any of ust.

What does form = "unformatted", access = "direct" do?

Also, what exactly does the non-standard extension form = "binary" do? Does it do anything that the mode s or u cannot do?

The goal of open as I had it in mind was to cover 99% of use cases that people have, and follow a familiar Python like interface. People know about the mode "b", and I think it very nicely maps to form = "unformatted", access = "stream", and if the non-standard form = "binary" does something different, then I would not encourage using it and rather use "b" for the standard "stream" as in Python.

I started gathering all the "open" statements from projects here:

https://github.com/fortran-lang/stdlib/wiki/List-of-popular-open-source-Fortran-projects

at:

https://github.com/fortran-lang/stdlib/wiki/Usage-of-%22open%22

It turns out the current list there already pretty much covers most projects. As long as we can cover all those use cases, we should be fine.

One missing piece is how errors are handled --- sometimes the code wants to handle errors itself. So we should have an optional iostat argument, that if it is present, we will return the error status there.

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2020

What does form = "unformatted", access = "direct" do?

It allows to read/write at a specific record using the rec statement in read/write, and it provides a form of random access to a file.

Also, what exactly does the non-standard extension form = "binary" do? Does it do anything that the mode s or u cannot do?

The goal of open as I had it in mind was to cover 99% of use cases that people have, and follow a familiar Python like interface. People know about the mode "b", and I think it very nicely maps to form = "unformatted", access = "stream", and if the non-standard form = "binary" does something different, then I would not encourage using it and rather use "b" for the standard "stream" as in Python.

The binary form should be similar to stream access (in the sense that it writes a sequence of bytes). Maybe could we use both 's' and 'b' for stream files?

I should gather all the "open" statements from projects here:

https://github.com/fortran-lang/stdlib/wiki/List-of-popular-open-source-Fortran-projects

and see if we can cover them all.

Possible combinations:

  • form = "formatted" | "unformatted" (| "binary")

  • access = "direct" | "sequential" | "stream" | "append"

Edit: access = append is considered as sequential but it is positioned at the end of the file. So it seems useless if position= is used.

@everythingfunctional
Copy link
Member

The options to Python's open are listed here. I would suggest starting with that specification as far as it makes sense in Fortran, and only adding other options if there is some use case that it doesn't cover.

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2020

The options to Python's open are listed here. I would suggest starting with that specification as far as it makes sense in Fortran, and only adding other options if there is some use case that it doesn't cover.

Currently, among all characters used for Python 's open, only b and + are missing. As discussed, b could be covered by stream files. I am not sure what does the +.

Python does not seem to cover direct access, while it seems do be used in some Fortran libraries (I never use it myself; https://github.com/fortran-lang/stdlib/wiki/Usage-of-%22open%22 ). It may be simpler to ignore the direct access for now.

@everythingfunctional
Copy link
Member

Modes 'w+' and 'w+b' open and truncate the file. Modes 'r+' and 'r+b' open the file with no truncation.

So, basically + means action = "READWRITE", with no other differences, (I think).

@jvdp1
Copy link
Member

jvdp1 commented Jan 4, 2020

@certik: I created a PR to your branch open from my branch opencertik. It was easier for me to make some suggestions.
Shortly, I added a function to parse the optional mode, and I extended it to the Python characters "+" for readwrite and "b"/"s" for stream files.
Currently, unformated files are not supported (but it would not difficult to implement with my suggestion), as well as access=direct(but I think we should discuss it broadly before implementing it).

certik added 3 commits January 4, 2020 13:26
stdlib_experimental_io: addition of a parser and characters "+" and "s/b" (for stream)
@certik
Copy link
Member Author

certik commented Jan 4, 2020

@jvdp1 thank you! You were reading my mind, I wanted to implement it just like this.

@certik
Copy link
Member Author

certik commented Jan 4, 2020

This is ready to go in as far as I am concerned. The implementation could be improved and all corner cases tested more, but for a first iteration I think it is good enough.

@jvdp1
Copy link
Member

jvdp1 commented Jan 4, 2020

@jvdp1 thank you! You were reading my mind, I wanted to implement it just like this.

Happy to help!

Question/comment: I expected that the other would be always "r/w/a/x", followed (possibly) by '+' and then by "t/b". If we don't want to expect that order (e.g., to allow something like "+br") then the function parse_mode should be modified with a loop going through trim(adjustl(mode)).
It could be also more efficient than the current implementation.

@certik
Copy link
Member Author

certik commented Jan 4, 2020

Question/comment: I expected that the other would be always "r/w/a/x", followed (possibly) by '+' and then by "t/b". If we don't want to expect that order (e.g., to allow something like "+br") then the function parse_mode should be modified with a loop going through trim(adjustl(mode)).
It could be also more efficient than the current implementation.

I tested Python and it seems it can be in any order. And we should indeed modify parse_mode, to return it in "canonical" order.

I won't have time to do that, but if you want, go ahead and work on it. Or we can merge first and submit a PR later, which would save me from constantly having to rebase and fix conflicts.

@jvdp1
Copy link
Member

jvdp1 commented Jan 4, 2020

I tested Python and it seems it can be in any order. And we should indeed modify parse_mode, to return it in "canonical" order.

I won't have time to do that, but if you want, go ahead and work on it. Or we can merge first and submit a PR later, which would save me from constantly having to rebase and fix conflicts.

I just modify parse_mode for that in my branch opencertik. But as you suggested, it is probably better to merge first and than to submit another PR with the new implementation of parse_mode. I will be then able to add some tests for parse_mode with different orders.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.
Another PR will be needed to modify parse_mode. (see #77 )

public :: loadtxt, savetxt, open

! Private API that is exposed so that we can test it in tests
public :: parse_mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is possible with CMake, but would it be possible to use some preprocessing (cpp?) for setting parse_mode public only for the tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. See also #75.

@certik
Copy link
Member Author

certik commented Jan 4, 2020

Ok, it seems that nobody is against this and everybody who commented likes it and @jvdp1 approved it, so I am going to go ahead and merge this. Then let's improve upon this with further PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants