Skip to content

Add 'sourceSortOrder' to decide how source files are iterated #719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 22, 2017

Conversation

Paul-Vincent
Copy link
Contributor

@Paul-Vincent Paul-Vincent commented Apr 11, 2017

…es' to process source files before directories.

We have noticed that when running the code generation on Windows and Linux we can inconsistencies of which packages included sub-classes were being created. Given the source files:

src/main/resources/virtualization-capabilities-api/schema/json/includes/MessageProperties.jsd
src/main/resources/virtualization-capabilities-api/schema/json/includes/VirtualMachineQuery.jsd
src/main/resources/virtualization-capabilities-api/schema/json/VMQueryRequestMessage.jsd

On our Linux box it generates:

target/generated-sources/virtualization-capabilities-api/com/dell/cpsd/vcenter/capabilities/api/MessageProperties.java
target/generated-sources/virtualization-capabilities-api/com/dell/cpsd/vcenter/capabilities/api/Query.java
target/generated-sources/virtualization-capabilities-api/com/dell/cpsd/vcenter/capabilities/api/VMQueryRequestMessage.java

And on Windows it generates:
target/generated-sources/virtualization-capabilities-api/com/dell/cpsd/vcenter/capabilities/api/includes/MessageProperties.java
target/generated-sources/virtualization-capabilities-api/com/dell/cpsd/vcenter/capabilities/api/includes/VirtualMachineQuery.java
target/generated-sources/virtualization-capabilities-api/com/dell/cpsd/vcenter/capabilities/api/VMQueryRequestMessage.java

We're submitting this new configuration option to ensure that within a directory, files are processed first before any sub-directories to ensure consistency between environments. The default false maintains the existing behaviour to ensure backward compatibility.

We could only provide an integration test for when the option is set to true, as in this case we can verify the behaviour, the false configuration option assertions would depend on the underlying OS.

…es' to process source files before directories
@joelittlejohn
Copy link
Owner

Can you help me understand why the existing code which uses Collections.sort(schemaFiles); is not enough for you to get predictable results between OSs?

I assume that you're using $refs without specifying javaType in your schemas, this isn't recommended as there are a variety of other ways in which your types can be read in a different order and hence generate different names.

@Paul-Vincent
Copy link
Contributor Author

Paul-Vincent commented Apr 11, 2017

Hi @joelittlejohn Thanks for the fast response. When investigating the issue we were seeing we did a small test using java.lang.File list files and then using Collections.sort(schemaFiles) and found the following:

On Linux:
Collections.sort(schemaFiles) list order:
./src/main/resources/virtualization-capabilities-api/schema/json/VMQueryRequestMessage.jsd ./src/main/resources/virtualization-capabilities-api/schema/json/includes

New Comparator sorted list order:
./src/main/resources/virtualization-capabilities-api/schema/json/VMQueryRequestMessage.jsd ./src/main/resources/virtualization-capabilities-api/schema/json/includes

On Windows:
Collections.sort(schemaFiles) list order:
.\src\main\resources\virtualization-capabilities-api\schema\json\includes .\src\main\resources\virtualization-capabilities-api\schema\json\VMQueryRequestMessage.jsd

New Comparator sorted list order:
.\src\main\resources\virtualization-capabilities-api\schema\json\VMQueryRequestMessage.jsd .\src\main\resources\virtualization-capabilities-api\schema\json\includes

Thanks for highlighting around the javaType, we're exploring the possibility of submitting a further feature pull request to enable this to be relative to the target package.

@ctrimble
Copy link
Collaborator

Is there a reason we cannot just sort these files by their canonical paths? It seems like that would be a much simpler change and should stabilize the sort order, since natural order is clearly defined for strings in Java.

@Paul-Vincent
Copy link
Contributor Author

Hi @ctrimble Thanks for the comment. I think we'd still see inconsistencies between file generation if we're exclusively sorting on the canonical path, we would see files sorted equally with sub directories, as in example above with the includes sub-directory as a peer to VMQueryRequestMessage.jsd.

Given the scenario where we process the files as:
/schemas/includes/CommonAttribute.jsd
/schemas/VMQueryRequestMessage.jsd

We'll create:
/target/includes/CommonAttribute.java
/target/VMQueryRequestMessage.java

Given a subsequent addition of a new schema AddNodes.jsd which also references the CommonAttribute the files will be processed as:
/schemas/AddNodes.jsd
/schemas/includes/CommonAttribute.jsd
/schemas/VMQueryRequestMessage.jsd

This time creating:
/target/AddNodes.jsd
/target/CommonAttribute.java
/target/VMQueryRequestMessage.java

Note the package change. The feature we're proposing will ensure that this scenario is covered, in that the first scenario will be processed as:
/schemas/VMQueryRequestMessage.jsd
/schemas/includes/CommonAttribute.jsd

Creating:
/target/CommonAttribute.java
/target/VMQueryRequestMessage.java

And then the second scenario will be processed as:
/schemas/AddNodes.jsd
/schemas/VMQueryRequestMessage.jsd
/schemas/includes/CommonAttribute.jsd

Creating:
/target/AddNodes.jsd
/target/CommonAttribute.java
/target/VMQueryRequestMessage.java

This goes some way toward helping out with workaround around:
#683
#164

@ctrimble
Copy link
Collaborator

@Paul-Vincent The sort order here is still going to be unstable. The root cause is that files are visited in a different order on Unix systems and Windows systems, because Windows does not take case into account. https://docs.oracle.com/javase/7/docs/api/java/io/File.html#compareTo(java.io.File)

This fix does not address root cause, as you are still going to have an unstable sort order. For instance, the files alpha.json, Beta.json, gamma.json will not be visited in the same order on Unix and Windows.

There are issues with the mapping between source file locations and generated file locations in this project, but that problem is not being directly addressed by this PR.

@Paul-Vincent
Copy link
Contributor Author

Hi @ctrimble, The issue we're trying to resolve with this PR is to resolve the files and directories issue that we're seeing at the moment without the PR. We can add to the PR to do a case insensitive sort within files and directories if you feel it would be use?

@ctrimble
Copy link
Collaborator

ctrimble commented Apr 12, 2017

@Paul-Vincent I don't know what Joe wants to do with this, but I would not take this approach to solving the issue. You are encountering an incompatibility between the sort order of files on Windows and Linux, per the PR description. That problem will still exist, in part, after the PR is applied. Subdirectories with mixed case will still be visited in different order. Files in the same directory with mixed case will still be visited in a different order.

Based of the examples offered and issues sited, it seems like the issues of inconsistent sort order and generated file location are being conflated. In my opinion, each should be addressed individually, with the former being a bug in the comparator and the latter being a more systemic problem with the overall visit order of the schemas.

@Paul-Vincent
Copy link
Contributor Author

Just in my opinion, the case used within files and directories is within the control of the user so shouldn't be much of a factor, even though it would be nice to be consistent. By ensuring that directories are processed after files it means that we, as clients, can control the generated packages, we have means through the javaName/javaType to ensure the class name is as we expect.

The biggest problem we're trying to address here is the package that classes are being generated within, our CI environment using Linux but some of our developers use Windows.

@Paul-Vincent
Copy link
Contributor Author

@ctrimble @joelittlejohn Hi guys, please let me know what we can do to help progress this discussion. We're coming up against some internal milestone dates, and we may need to work around this. Thanks.

@ctrimble
Copy link
Collaborator

@Paul-Vincent I would feel comfortable merging a PR that was based on sorting files by File. getCanonicalPath(), along with a flag to turn back on sorting by File.compareTo(). That would fix the issue with different naming on Linux and Windows, but allow people to get the current ordering if needed.
I don't see this PR as getting the project any closer to the principle of least surprise, with regard to where files end up.

@Paul-Vincent
Copy link
Contributor Author

thanks @ctrimble So if we change the comparator to do a case insensitive sort first based on the canonical path, and ensure files are before directories we'll be good? As an example resulting in:

/schemas/alpha.jsd
/schemas/Beta.jsd
/schemas/gamma.jsd
/schemas/Lambda.jsd
/schemas/include/common_atttributes.jsd
/schemas/include/Common_Properties.jsd

@ctrimble
Copy link
Collaborator

ctrimble commented Apr 14, 2017

Without regard to the type of node (file or directory), but just on the canonical path. The comparison should look something like this:

if( legacySortOrder ) {
  return fileA.compareTo(fileB);
} else {
  return fileA.getCanonicalPath().compareTo(fileB.getCanonicalPath());
}

That will give you the same set of generated files on Linux and Windows. The flag will need to be a configuration option and will let those doing Windows only work keep the current functionality.

note: I am not sure what that flag should be called.

@joelittlejohn
Copy link
Owner

I think this is a good compromise and way forward to do something useful here that solves your problem. Without using javaType in your schemas you'll likely still have occasional surprises when editing your input, however if the result of this PR is at least that output can be made predictable when the same exact project is built on different OSs then that will be a positive change. @ctrimble has identified that the current changes do not solve the problem of OS-specific ordering and I agree that solving this should be the goal of this PR.

There seems to be one final question here about how to treat subdirectories. Sorting by canonical path is simple, and solves the problem of OS-specific output, but I don't think it it produces behaviour that is particularly useful or expected. I think dipping in and out of subdirectories and mixing files in the current directory with files in subdirectories as you iterate is a strange thing to do.

So I'd be happy with iteration order that uses: case-sensitive-sorted files followed by case-sensitive-sorted directories (or vice-versa).

The only question that remains for me is, which makes more sense, directories first or files first? I actually think directories first makes more sense, since I imagine that subdirectories contain shared things that schemas alongside those subdirectories will use. In this case, it's nicer for the naming of those items in subdirectories to be driven by their own filenames and not any $refs to them. Do you think the opposite is true? People generally put shared schemas in the root directory and then have subdirectories with schemas inside that use e.g. "$ref": "../MyThing.json"?

I'd actually be happy to call any configuration option we introduce here simply sortInputFiles, because I think that the current sorting (being OS-specific) isn't really a definitive sort. It will be obvious to any user of this config option that they are turning on true, definitive input sorting.

@ctrimble
Copy link
Collaborator

Joe, this is your call, but the naming problem with $refed schemas is expressed by the visit order, not caused by it. We would be well served by fixing the sort order by aligning with the order on one of the OSes and then fixing the $ref issue by directly addressing that problem.

@joelittlejohn
Copy link
Owner

joelittlejohn commented Apr 14, 2017

@ctrimble Yes, I agree. I only mention issues with $ref to reiterate that there are still further ways in which naming could produce unexpected results, even if we create a 'predictable' visit order across OSs. I'd like to make whatever change is most useful here with the limitations of $ref'd schemas in mind.

The more I think about this, the more I think that whatever strategy we choose here we'll see another issue or PR raised soon with an alternative requirement. I think my preferred solution would be:

<sourceSortOrder>OS<sourceSortOrder> <-- this is the default

<sourceSortOrder>FILES_FIRST<sourceSortOrder> <-- case-sensitive sort, visit files first

<sourceSortOrder>SUBDIRS_FIRST<sourceSortOrder> <-- case-sensitive sort, visit subdirs before files

We don't need to decide to align with any new sort order in particular because the current behaviour is going to become the default. If anyone wants any other strategy (and that strategy is sensible) then we can add it. I use the word 'source' here rather than 'input' because it fits with the naming convention for the other config arguments.

I think it's fair to say that the fact that this option needs to exist at all is proof of bad decisions on my part back when this project started. It's a sticking-plaster for a much deeper problem but I'm happy to see this added as I think people would find it useful and it's fairly easy and simple to implement.

@Paul-Vincent
Copy link
Contributor Author

Thanks for the feedback guys, it's a nice idea to cater for various sorting options. I'll update the pull request aa soon as I can, hopefully this weekend.

@Paul-Vincent
Copy link
Contributor Author

Hi, thanks for the feedback @joelittlejohn and @ctrimble. I've updated the configuration option, removing the previously added processSourceFiledBeforeDirectories with the suggestion of sourceSortOrder configuration option with the options of OS, SUBDIRS_FIRST or FILES_FIRST.

Please note that this is a case sensitive sort as per @joelittlejohn last comment.

@joelittlejohn
Copy link
Owner

joelittlejohn commented Apr 18, 2017 via email

@joelittlejohn
Copy link
Owner

joelittlejohn commented Apr 18, 2017 via email

@Paul-Vincent
Copy link
Contributor Author

Great thanks @joelittlejohn. I fully understand about avoiding the ping-pong edits, I've updated the formatting and the imports as requested.
Enjoy your vacation.

@joelittlejohn joelittlejohn changed the title Introduce new configuration option 'ProcessSourceFilesBeforeDirectori… Add 'sourceSortOrder' to decide how source files are iterated Apr 22, 2017
@joelittlejohn joelittlejohn added this to the 0.4.34 milestone Apr 22, 2017
@joelittlejohn joelittlejohn merged commit 609b5db into joelittlejohn:master Apr 22, 2017
@Paul-Vincent Paul-Vincent deleted the processFilesFirst branch April 22, 2017 20:44
@joelittlejohn
Copy link
Owner

@Paul-Vincent Just in case you didn't see it, 0.4.34 is released with your changes. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants