Suggested change in directory structure #216

Closed
wclodius2 opened this issue Jul 5, 2020 · 35 comments

Comments

@wclodius2
Contributor

I find the current directory/source file structure awkward. Rather than a src directory with lots of files with the term experimental in their names and module names, I suggest two directories: src with the validated source files, and experimental_src, with the source code under development. Only a few people should have commit privileges in the src directory, everyone in the experimental_src. None of the files/modules should have the term experimental in their names. That way when they have matured the developers can notify the src maintainers, and the maintainers can simply copy the files unmodified from experimental_src to src to commit. FWIW I suggest that the more mature src files be committed to src once they have been renamed and had their use statements modified to deal with the renaming.

@milancurcic
Member

Hi William, I don't find the structure awkward, but I find the "experimental" in file names and modules unnecessary. We discussed this some time ago but now I can't find the thread. I think there was majority agreement to go with the "experimental" naming scheme, and it wasn't a big deal for me. However, I don't know why it's needed.

Actually, I personally find your proposal awkward as well. My preferred solution is to:

  • Drop the "experimental" from file and module names
  • Keep the source files where they are
  • Start tracking the semantic version, e.g. 0.x.x is always "experimental", 1.x.x becomes stable.
  • Begin making tagged and versioned releases

The agreed upon Workflow remains the same.

@certik
Member

certik commented Jul 5, 2020 via email

@jvdp1
Member

jvdp1 commented Jul 5, 2020

I think the experimental namespace is important. I also like having the word "experimental" in the name of the module. It reminds the user that changes might happen in the API and implementations of the procedures.
However, having the word "experimental" in the name of the module might be an issue when we reach version 1.0.0 (for example, how will we add a new procedure (in the experimental namespace) to a module released in version 1.0.0?).

@certik
Member

certik commented Jul 5, 2020 via email

@jvdp1
Member

jvdp1 commented Jul 5, 2020

You do it by having both stable io and experimental_io modules.

I agree with that. However, if there are experimental and released procedures on the same topic, we could end up with something like:

use stdlib_stats, only: mean
use stdlib_experimental_stats, only: var

This situation could be awkward.
Furthermore, if var calls mean, we will have something like:

module stdlib_experimental_stats
 use stdlib_stats, only: mean
...
end module stdlib_experimental_stats

Note that I prefer this situation (and again I believe we need an experimental namespace) over a situation where we would add "experimental" to the name of the procedures.

@milancurcic
Member

We need an experimental namespace, where we land all new features and then we graduate them into main.

Couldn't this be done with feature branches? (and as far as I can tell, it's how it's most commonly done)

then when we merge a PR, it immediately becomes part of stable, so we have to support it forever.

No, only until 2.x.x. I don't think the version number or release mode should drive development, but the other way around.

Is perhaps the experimental namespace intended to be used alongside stable (as Jeremie did in his example), and that's why you need it to be a separate namespace?

If we do maintain 2 namespaces side by side, I don't think any single module should exist in both at the same time. For example, once stats is moved to stable, it's immediately removed from experimental.

@jvdp1
Member

jvdp1 commented Jul 5, 2020

If we do maintain 2 namespaces side by side, I don't think any single module should exist in both at the same time. For example, once stats is moved to stable, it's immediately removed from experimental.

Hmmm... how would it be possible to add a new procedure in this situation? Feature branches would probably be more appropriate for such a situation than the current approach.

@certik
Member

certik commented Jul 5, 2020 via email

@milancurcic
Member

milancurcic commented Jul 6, 2020

I'm re-reading this thread and I think I'm starting to understand better what we're trying to do. But as I do, I have growing concerns / confusion about why it's done the way it is, especially with explicit namespacing of the modules with their version number.

Please, let's take a step back. Rather than stating how we're going to do something, let's set our requirements. I don't think we've done this in the past, but again, I can't find the thread so please correct me if I'm wrong. Once we agree on the requirements, we can discuss the simplest way to accomplish what we need.

Requirements

Based on this thread and what I think we need, these are our requirements: (for brevity, I'll refer to experimental as v0 and stable as v1)

  1. v0 and v1 coexist in the same codebase / repo. In other words, part of stdlib is v0, and some other part is v1.
  2. We want to clearly communicate to the user which procedures are v0 and which are v1.
  3. There are no overlapping procedures between v0 and v1. In other words, a procedure promoted to v1 is removed from v0.
  4. A module can exist in both v0 and v1, but it must not have overlapping procedures between the two.
  5. A procedure from v0 can depend on a procedure from v1 but not the other way around.
  6. A procedure can only go up in major version. For example, a procedure in v1 can't go back to v0, but it can go up to v2.

Questions:

  • Do you agree with these requirements?
  • Are there any that I missed?

Proposed solution

Given these requirements, can't we simply put a comment header in each procedure indicating which is experimental and which is stable? For example:

module stdlib_stats
  ...
contains
  ...
  function mean(x)
    ! version 1.0.3 (stable)
    ...
  end function mean
  
  function var(x)
    ! version 0.1.2 (experimental)
    ...
  end function var
...

Of course, the version is not only documented in the source comment, but also in the API docs by FORD. Then the users can clearly see what's in stable and what's in experimental.

Problems (that I see) with the current approach

  • Once a procedure moves from experimental to stable, users need to change their code.
  • More than one user has said that it seems or feels awkward
  • It works now because we have only experimental, but how will it work when we have stable+experimental (issue raised by Jeremie above, pretty awkward IMO)

Problem with terminology

I also think we should revisit the choice of "experimental" over some other words. @jvdp1 wrote:

I also like having the word "experimental" in the name of the module. It reminds the user that changes might happen in the API and implementations of the procedures.

To me personally, "experimental" means "crazy ideas we're trying out". But we really want to communicate clearly and precisely that "this API may or may not change in the future". We know that an unstable API is less attractive to most users. Coupled with a strong word (experimental), we may be effectively discouraging users from trying it out, and having them just wait and watch the project from the side until something matures. But we can't mature without users.

I'd like to propose a gentler word that we use to signal unstable API. My favorite is to simply use semantic versioning (0.x.x vs 1.x.x). But if that's not acceptable, perhaps we can use "dev" as in, API in development?

I'm sorry to badger you with all this but I think it's important and I don't think we have asked enough whys before making a decision.

@leonfoks

leonfoks commented Jul 6, 2020

Having explicit name changes in module names, and even file names, seems less maintainable long term than using branches of the repo. Why not have a master branch with limited write permissions, a develop branch for everything experimental with more permissive permissions, and feature branches off the develop branch?

Sure, features can only be worked on if they are independent of each other, but that might force a more logical order of development of the required functions/subs. For example, quick select would come before a median function, and both could be feature branches during dev.

With a robust module naming convention, we could eliminate temporary file names, and simply have a "maths" module that persists throughout the branches.

Does it make sense to explicitly keep track of individual function versions? This seems like a lot of work! Is this standard practice for other standard libraries?

@milancurcic
Member

@leonfoks I like that too.

Does it make sense to explicitly keep track of individual function versions? This seems like a lot of work! Is this standard practice for other standard libraries?

I don't think it's a common practice. It's a consequence of requiring both stable and experimental procedures in the same codebase, and that they can't repeat in the two release tracks. (now as I write it out, I think this is our most problematic requirement). You wouldn't need to keep track of minor and bug-fix version numbers, but you need to keep track of the major one.

If instead the stable track was a subset of experimental track (I think a much more common development model), then it'd be much more practical to use git branches to manage the code.

@leonfoks

leonfoks commented Jul 6, 2020

I see about versioning. It would help with reverting any functions if the need comes up.

The master could be the stable track, which by definition of the tree structure is a subset of the develop branch (which could also be called the experimental branch). If any bugs are identified in a "stable" master function, we would still want to branch off develop, PR and merge with develop, and then release to master once a release is ready on develop.

@certik
Member

certik commented Jul 6, 2020

The way we talked about stdlib in the past is that it is to become a de-facto unofficial extension of the Fortran Standard, so once we "standardize" something using https://github.com/fortran-lang/stdlib/blob/0cd354ae87a61dcbfee432657f3b1bc3bd0bb335/WORKFLOW.md, we do not change it and keep supporting it pretty much forever, just like the Fortran Standard is very strictly backwards compatible for decades. We can discuss this of course and not do things this way, but so far that has been my understanding.

This has really good advantages.

From this it follows that there is v0 and v1, but there is never a v2 or any other version. Once a function is in, it's in. We can add things in a backwards compatible manner (such as more optional arguments), but if we want to change the API, we have to rename the function, or introduce a new module (Python does this all the time, such as the old optparse and a new argparse modules).
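
A minimal sketch of the "add things in a backwards compatible manner" case, assuming a hypothetical module demo_optional_arg and a made-up optional mask argument (neither is taken from stdlib). The point is only that a call written against the original interface stays source-compatible after the optional argument is added:

```fortran
! Hypothetical module: names and arguments are illustrative, not stdlib's API.
module demo_optional_arg
  implicit none
  private
  public :: mean
contains
  ! Arithmetic mean of x. The optional mask stands in for an argument
  ! that was added after the function was declared stable.
  pure function mean(x, mask) result(res)
    real, intent(in) :: x(:)
    logical, intent(in), optional :: mask(:)
    real :: res
    if (present(mask)) then
      ! Newer behaviour: average only the selected elements.
      res = sum(x, mask=mask) / real(count(mask))
    else
      ! Original behaviour: average all elements.
      res = sum(x) / real(size(x))
    end if
  end function mean
end module demo_optional_arg

program demo
  use demo_optional_arg, only: mean
  implicit none
  real :: x(4), m_all, m_masked
  x = [1.0, 2.0, 3.0, 4.0]
  m_all = mean(x)                    ! call written before the change
  m_masked = mean(x, mask=x > 2.0)   ! call that uses the added argument
  print *, m_all, m_masked
end program demo
```

A change that cannot be expressed this way (for example, a different return type) would need a new function or module name under the policy described above.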

I personally like experimental, just like the C++ experimental stdlib, but we can use other names for v0.

v1 is called stable, again, we can discuss the name.

Yes, you need to change your code if you use something from experimental --- by removing the name "experimental" in most (but not all!) cases. When a function is v0, we can change the API a bit after we gain experience from actually using it, so users who depend on v0 functions have to be ready to change / update their code anyway. As such, having the word "experimental" in the name of the module reminds them of this. Once the word "experimental" is removed by graduating a function from v0 to v1, then we commit to never change the API, and thus user's code will forever continue to work. There is no v2 that would somehow break code.

Your proposal of having semantic versioning on a function level I think is even more confusing than the experimental / stable naming of modules.

Do you want to do a video call to discuss these issues? It might be faster to get on the same page.

@wclodius2
Contributor Author

wclodius2 commented Jul 6, 2020 via email

@milancurcic
Member

milancurcic commented Jul 6, 2020

@certik Okay, yes, I am not opposed to freezing the API at v1 (no v2) but I forgot about it in this thread. This probably only means that it will take much longer to stabilize a procedure. I don't know if this is a positive or a negative. There's definitely a positive in maintaining backward compatibility and we can ensure that by allowing only optional arguments to be added, as you said.

I think a call would be helpful. I can't do this week and next week it's already time for a monthly call. So I suggest we dedicate 15 minutes to this issue, or have a separate call the week after (July 20-24).

@milancurcic
Member

I think that requirements 3 and 4 are just different ways of saying the same thing. I also think that what they are describing is a requirement for failure.

No, requirement 4 has a statement about module overlap. Indeed, the 2nd part of requirement 4 is an affirmation of requirement 3. I also don't like this solution, and think that we should have overlapping release tracks (stable a subset of experimental).

Nothing would piss off users more than having them use non-experimental code, and then having that usage break because procedures they depend on are removed from the module.

I agree and this doesn't happen with the current requirements. What do you see that I don't see?

@milancurcic
Member

In other words, if a user is working with the experimental release, they have full access to stable. A user shouldn't have to work with two builds of the library in order to work with experimental features.

@jvdp1
Member

jvdp1 commented Jul 6, 2020

+1 to discuss this issue during the next call. I am still a bit confused how we will add new/experimental/v0 procedures in a v1 module.

@certik
Member

certik commented Jul 6, 2020

@milancurcic :

A user shouldn't have to work with two builds of the library in order to work with experimental features.

Exactly. Which I think excludes feature branches, as those would not be present in the build of the latest master.

@jvdp1 :

I am still a bit confused how we will add new/experimental/v0 procedures in a v1 module.

An experimental/v0 module is stdlib_experimental_io and a stable/v1 module is stdlib_io. Say the function open graduates from experimental to stable: then it gets moved from stdlib_experimental_io.f90 to stdlib_io.f90. Let's say we propose a new function loadtxt: then it first goes into experimental, so it will go into stdlib_experimental_io.f90. The module stdlib_experimental_io can use stdlib_io, but not the other way round. Users will use stdlib_io once we have it, and that's stable. If they want to use an experimental feature such as loadtxt, then they will import it from stdlib_experimental_io. Once loadtxt graduates to stable, then we'll move it to stdlib_io.
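
As a rough sketch of that layout, assuming hypothetical modules demo_io (stable) and demo_experimental_io (experimental) with placeholder procedures parse_real and parse_pair (none of these names come from stdlib), the one-way dependency and the user-side imports could look like this:

```fortran
! Stable (v1) module: frozen API, must not use the experimental module.
module demo_io
  implicit none
  private
  public :: parse_real
contains
  ! Placeholder for a graduated procedure: read one real from a string.
  function parse_real(text) result(val)
    character(len=*), intent(in) :: text
    real :: val
    read(text, *) val
  end function parse_real
end module demo_io

! Experimental (v0) module: may use the stable module; its API may change.
module demo_experimental_io
  use demo_io, only: parse_real
  implicit none
  private
  public :: parse_pair
contains
  ! Placeholder for a newly proposed procedure built on the stable one.
  function parse_pair(a, b) result(vals)
    character(len=*), intent(in) :: a, b
    real :: vals(2)
    vals(1) = parse_real(a)
    vals(2) = parse_real(b)
  end function parse_pair
end module demo_experimental_io

program demo
  ! Both modules ship in the same build; the module name shows the status.
  use demo_io, only: parse_real
  use demo_experimental_io, only: parse_pair
  implicit none
  real :: one, two(2)
  one = parse_real("1.5")
  two = parse_pair("2.0", "3.0")
  print *, one, two
end program demo
```

If parse_pair later graduated, it would move into demo_io and the user would only need to change the second use statement.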

@wclodius2
Contributor Author

wclodius2 commented Jul 6, 2020 via email

@milancurcic
Member

A user shouldn't have to work with two builds of the library in order to work with experimental features.

Exactly. Which I think excludes feature branches, as those would not be present in the build of the latest master.

If we organize this as stable ⊆ experimental ⊆ feature as @leonfoks suggested, then we can do it.

@certik
Member

certik commented Jul 6, 2020

If we organize this as stable ⊆ experimental ⊆ feature

So we would release the "feature" branch, that way users can try things out, and perhaps compiler vendors can only ship the "stable" one? Yes, that would be fine with me, but still more work than having everything in one branch, just speaking from my experience doing this in the past.

@milancurcic
Member

Okay, I didn't consider that we would literally tag and release every feature branch, beyond what git checkout feature-branch-name gives you.

If you intend to "ship" all experimental features to end-users as a tarball, then all code in one branch is easier. I am not convinced we really need this, not until we actually hear from users asking for it.

@certik
Member

certik commented Jul 7, 2020

Let's discuss at our monthly call, and if more time is needed to discuss this, let's do a dedicated call just for stdlib.

@leonfoks

leonfoks commented Jul 7, 2020

If there’s a dedicated stdlib call I would get on it too if that’s okay???

@certik
Member

certik commented Jul 7, 2020

@leonfoks please join us. We will discuss this at our regular Fortran monthly call. If you sign up for our mailing list or follow our Discourse, the call will be announced there. Links to both are at https://fortran-lang.org/

@jvdp1
Member

jvdp1 commented Jul 7, 2020

An experimental/v0 module is stdlib_experimental_io and a stable/v1 module is stdlib_io. Say the function open graduates from experimental to stable: then it gets moved from stdlib_experimental_io.f90 to stdlib_io.f90. Let's say we propose a new function loadtxt: then it first goes into experimental, so it will go into stdlib_experimental_io.f90. The module stdlib_experimental_io can use stdlib_io, but not the other way round. Users will use stdlib_io once we have it, and that's stable. If they want to use an experimental feature such as loadtxt, then they will import it from stdlib_experimental_io. Once loadtxt graduates to stable, then we'll move it to stdlib_io.

That is the way I understood it (and as I commented here), but this thread has confused me.
This approach has 2 advantages for me: 1) the end-user can use all available procedures, and 2) they are aware of the risks when using an experimental procedure (including having to change their code).

If you intend to "ship" all experimental features to end-users as a tarball, then all code in one branch is easier. I am not convinced we really need this, not until we actually hear from users asking for it.

If the vendors/we want to ship only the stable version, they/we could just ignore the experimental modules (since stable modules don't depend on experimental modules), and provide stable tarballs.

@milancurcic
Member

@wclodius2 it'd be great if you can join the call too. We'll set up a schedule this week.

@wclodius2
Contributor Author

wclodius2 commented Jul 14, 2020 via email

@certik
Member

certik commented Jul 14, 2020 via email

@milancurcic
Member

@wclodius2 We discussed this yesterday on the call and came to a tentative agreement. It can still change as we go but this is what we currently agreed to:

  • Adopt semantic versioning for stdlib (major.minor.bugfix). We'd start at 0.1.0.
  • Both experimental (API may or may not change) and stable (API won't change) procedures reside together in modules
  • Experimental vs. stable status of a procedure is documented in the code and documentation, not in the module name. So the first practical step would be to have a PR that removes experimental from module names and marks each function as such in docstrings.
  • We continue working with PR's directly into master branch.
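
A minimal sketch of what the third point could look like in source, assuming FORD-style `!!` doc comments with a `version:` line. The module name demo_status, the procedure bodies, and the exact wording of the status line are assumptions for illustration, not the convention agreed on the call:

```fortran
! Hypothetical module: one module holds both stable and experimental
! procedures, and the status lives in the doc comments, not in the name.
module demo_status
  implicit none
  private
  public :: mean, var
contains
  pure function mean(x) result(res)
    !! version: stable
    !!
    !! Arithmetic mean of the elements of x.
    real, intent(in) :: x(:)
    real :: res
    res = sum(x) / real(size(x))
  end function mean

  pure function var(x) result(res)
    !! version: experimental
    !!
    !! Sample variance of x; the interface may still change.
    real, intent(in) :: x(:)
    real :: res
    res = sum((x - mean(x))**2) / real(size(x) - 1)
  end function var
end module demo_status

program demo
  ! The use statement is the same whatever the status of each procedure,
  ! so graduating var to stable would not require a change here.
  use demo_status, only: mean, var
  implicit none
  real :: x(4), m, v
  x = [1.0, 2.0, 3.0, 4.0]
  m = mean(x)
  v = var(x)
  print *, m, v
end program demo
```

The trade-off relative to separate module names is that the status is visible only in the documentation and source, not at the call site.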

This largely follows the Rust development model, I put some links here.

What I think is still unclear is when does stdlib as a whole move in major version from 0.x.x to 1.x.x. We can look how others (like Rust) did this and decide what makes most sense for us. Perhaps when some large fraction of procedures are stabilized?

What do you think? Does it seem like a good development model to you?

@certik
Member

certik commented Jul 17, 2020 via email

@jvdp1
Member

jvdp1 commented Jul 21, 2020

  • Adopt semantic versioning for stdlib (major.minor.bugfix). We'd start at 0.1.0.
  • Both experimental (API may or may not change) and stable (API won't change) procedures reside together in modules
  • Experimental vs. stable status of a procedure is documented in the code and documentation, not in the module name. So the first practical step would be to have a PR that removes experimental from module names and marks each function as such in docstrings.
  • We continue working with PR's directly into master branch.

This largely follows the Rust development model, I put some links here.

What I think is still unclear is when does stdlib as a whole move in major version from 0.x.x to 1.x.x. We can look how others (like Rust) did this and decide what makes most sense for us. Perhaps when some large fraction of procedures are stabilized?

It is indeed unclear when stdlib will move to major version 1.0.0. However, I think this is conditional on users' feedback. And if we want users to start using stdlib, I think we should first apply the following discussed change:

  • Experimental vs. stable status of a procedure is documented in the code and documentation, not in the module name. So the first practical step would be to have a PR that removes experimental from module names and marks each function as such in docstrings.

I am currently trying to finish the implementation of a skewness function (with an API similar to mean, corr, ...) for stdlib_experimental_stats. But I could instead first submit this PR to remove experimental from the module names and mark each procedure as experimental in the specs, if the community agrees on these changes.

@certik
Member

certik commented Jul 21, 2020

@jvdp1 that's fine with me. (Just don't lump unrelated changes into the same PR, so if you have to move other functions from experimental, please do it in a dedicated PR.)

Regarding stdlib moving to 1.0.0, I think that's quite simple: we move it when it is ready and the community agrees it's time. Right now that is distant future, so I don't think we have to worry about it.

jvdp1 added a commit that referenced this issue Jul 28, 2020
Changes directory structure of stdlib as discussed in #216
@awvwgk
Member

awvwgk commented Sep 18, 2021

Seems to be resolved.

@awvwgk awvwgk closed this as completed Sep 18, 2021
6 participants