Unable to retrieve resources from a namespace package #68
Comments
In GitLab by @jaraco on Nov 2, 2018, 02:56 This also relates to #60 - where having support for resolving a resource relative to a module is concrete, even if that module is in a namespace package. So I would frame this issue as one of two options:
|
In GitLab by @warsaw on Nov 2, 2018, 14:28 The problem as I see it is that because there is no single location for the PEP 420 namespace package, there's no logical place for the resources. The two cases you cite aren't exactly analogous. In the first case, you have a The second case, where the The problem comes about because I don't know how you reconcile a fake namespace package with a real namespace package, so that you can consistently determine a location from which to find resources. It would have to work for both file system and zip files, and it would have to be robust in the case where you have multiple portions. For example, if you actually had another These all seem like very dicey semantics, but if we can come up with clear, definable, rules, I'm not necessarily against it. |
In GitLab by @jaraco on Nov 2, 2018, 16:34
I thought it was a consideration. I know in my mind, it was. I seem to think the tests even captured this condition. I don't recall there ever being a constraint that namespace packages can only contain other packages... that's just a common pattern. I create "fake namespaces" all the time. Many of the jaraco.* distributions have just one module, like jaraco.collections. My reading of PEP 420 was that the spec is primarily about the container and not its contents, and that like any other package, it can contain modules. In fact the specification section says:
You said:
I agree - perhaps I should put together some more complex examples that include more than one instance of the namespace package. I should maybe also demonstrate what happens with pkgutil-based packages. In my mind, there are two ways to have clear semantics. One is to disallow loading resources from namespace packages, but allow resources to be loaded as siblings of modules in a namespace package (which do have a concrete location), which is what pkg_resources does. The other is to allow loading of a resource from the namespace package following the same logic that importlib uses to load modules. I don't see how it's any more dicey to resolve resources from a namespace package than it is to resolve modules or subpackages from a namespace package. |
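A minimal sketch of the first option (anchoring a resource on a sibling module, which has a concrete location) might look like the following. The helper `sibling_resource`, the package name `nsdemo_a`, and the file names are hypothetical and exist only for illustration; the sketch also assumes the module is loaded from the file system rather than from a zip file.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def sibling_resource(module_name: str, resource: str) -> Path:
    """Return the path of `resource` stored next to the module `module_name`.

    Hypothetical helper: assumes the module was loaded from the file system.
    """
    module = importlib.import_module(module_name)
    # A module loaded from disk has a concrete __file__, even when its
    # parent is a namespace package with no single location.
    return Path(module.__file__).parent / resource


# Build a throwaway PEP 420 namespace package: nsdemo_a/ has no __init__.py.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "nsdemo_a"
pkg_dir.mkdir()
(pkg_dir / "mod.py").write_text("VALUE = 1\n")
(pkg_dir / "phrases.txt").write_text("hello\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

text = sibling_resource("nsdemo_a.mod", "phrases.txt").read_text()
print(text)
```

The lookup never asks the namespace package itself for a location; it only relies on the module that lives inside one of its portions.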
In GitLab by @warsaw on Nov 2, 2018, 18:30
Here's the problem as I see it. If there's a solution, I'm all for adding this support. It doesn't appear that it's possible to extract a namespace package's location from any public API. The closest I can come is to use
Alternatively, you could try to find it based on the sibling's name, e.g.:
I'm not sure what would then happen for real namespace packages containing just multiple portions. You'd likely have to fail, but what would be the failure criteria? |
In GitLab by @jaraco on Nov 2, 2018, 19:49 I did just check the behavior on pkgutil-style namespaces, and the behavior is as expected:
Switching the path order gives precedence to the earlier package on the path.
|
In GitLab by @jaraco on Nov 2, 2018, 19:56 Thinking about the PEP 420 packages, here's a poor replication of what I had in mind:
You don't have to access the private member of NamespacePath because it is iterable. |
In GitLab by @jaraco on Nov 2, 2018, 20:43 I thought I'd take a stab at the implementation and see if I could patch in a quick proof of concept, but it quickly became clear that there's not a single interface that does file discovery. What I did discover is this:
I'm not yet sure about the story for zip files... but for simple packages, it seems the submodule_search_locations already presents the list of paths that need to be searched. |
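As a rough illustration of that observation, the sketch below walks `submodule_search_locations` in order and returns the first portion that contains the requested file. It is not the implementation from this project; `find_in_namespace`, the package name `nsdemo_b`, and the file names are hypothetical, and only plain directories are handled (no zip files).

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Optional


def find_in_namespace(package: str, resource: str) -> Optional[Path]:
    """Search every portion of `package` for `resource`; first match wins."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        raise ValueError(f"{package!r} is not an importable package")
    # The search locations are iterable, so no private attribute is needed.
    for location in spec.submodule_search_locations:
        candidate = Path(location) / resource
        if candidate.is_file():
            return candidate
    return None


# Two portions of the same namespace package; only the second has the file.
first = Path(tempfile.mkdtemp())
second = Path(tempfile.mkdtemp())
for portion in (first, second):
    (portion / "nsdemo_b").mkdir()
(first / "nsdemo_b" / "one.py").write_text("")
(second / "nsdemo_b" / "two.py").write_text("")
(second / "nsdemo_b" / "resource.txt").write_text("from second portion\n")

sys.path[:0] = [str(first), str(second)]
importlib.invalidate_caches()

found = find_in_namespace("nsdemo_b", "resource.txt")
print(found)
```

Because the portions are visited in path order, an earlier entry on `sys.path` would take precedence if two portions shipped the same file name.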
In GitLab by @brettcannon on Nov 3, 2018, 16:31 I think what this really comes down to is how easily or often will people mess up when using namespace packages with So the real question becomes whether knowingly searching across multiple directories for a resource's location is acceptable because it is pragmatic enough to overcome the debugging headache of trying to figure out why a file wasn't discovered that you expected to be there. But there is also the complication that this will be new semantics for namespace packages overall, e.g. Basically the options sound like:
None of these are great. I also don't think adding some flag is going to help make this easier somehow either. I think if it's not going to be (1) then it should probably be (3) and the semantics get redefined as a search on IOW I don't see an obvious answer. |
In GitLab by @jaraco on Nov 3, 2018, 17:12 The more I think about it, (2) anchoring on submodules, is probably not preferable. I was considering it because that's what pkg_resources did and that would help address the limitations discovered in #60.
What is a scenario where that's not the case? Is that not how the loader locates modules? What sort of breakage are you imagining?
Hmm. Maybe that's something the loader(s) should supply. It's probably too late to think about the abstraction layers provided by finders/loaders, but I always imagined that anything that could load a module would also be responsible for providing data. |
In GitLab by @warsaw on Nov 5, 2018, 16:38 It might indeed make sense for
+1 |
In GitLab by @jaraco on Nov 15, 2018, 12:45
Does that preclude the possibility that |
In GitLab by @jaraco on Dec 27, 2018, 13:38 In the feature/traversable branch, I've started to draft the idea. It adds a new module As you can see, relying on these abstractions makes the code in I did have to make one modification to the test suite, to add Next I'd like to add support to these Traversable objects to provide a context manager supplying a path on the file system (similar to what And I haven't yet talked about the elephant in this issue - the namespace packages. This branch does not yet do anything for namespace packages. In fact, it retains the behavior that namespace packages are disallowed. But my feeling is that it should be possible in The biggest concern I have about this approach is it is lower level and more sophisticated than the Such a change would probably require a transitional approach, such as:
Following this approach, the code for importlib_resources becomes diminishingly small, similar to what we find in I'll continue to hack on this, but for now, I think this branch is ready for an initial review. I'd be happy to answer any questions or field any concerns you may have. |
In GitLab by @jaraco on May 6, 2019, 15:27
This work was done in jaraco/zipp#4, released as zipp 0.4. |
In GitLab by @jaraco on May 6, 2019, 15:55 mentioned in merge request !76 |
In GitLab by @jaraco on Feb 29, 2020, 17:53 I'm going to defer this work for a subsequent milestone. |
In GitLab by @jaraco on Feb 29, 2020, 17:53 removed milestone |
Seems to be fixed in Python 3.10.0. With Python 3.9 I'm using this workaround, just in case somebody finds it useful:

    import importlib.resources

    # `namespace` holds the dotted name of the namespace package to resolve
    trailing = []
    while True:
        try:
            root = importlib.resources.files(namespace)
            break
        except TypeError:  # raised by pathlib.Path(None) for a namespace package
            if '.' not in namespace:
                raise
            # walk up to the parent package
            namespace, name = namespace.rsplit('.', 1)
            trailing.append(name)
    # walk down from package to the namespace
    for step in trailing[::-1]:
        root = root.joinpath(step)
    # root is now a Traversable pointing to the namespace
|
I wonder if this is a use case that |
Yes, that's when #196 landed.
I don't see why not. After all, if Python can load code from the file, why shouldn't we be able to read its contents? |
Right. The problem was, let's say you have packages foo.bar and foo.baz and you can install them independently. Where would you put So now, let's say you have |
Currently, we find all versions of |
I'm not sure what you mean by "all cases of resource.txt". There should only be one such file relevant to this discussion, specifically |
I mean, if multiple packages write conflicting a But, I think namespace packages are fairly compelling for data storage. Let's say I have some big data files that I need to support portions of the use cases handled by my package. I can just put them in a namespace package, instead of having to use a namespace with other packages inside just to load my data. I can just handle things directly if the resource lookup fails, and I won't have to change my code if I decide to split one of those namespace packages. |
Here's the thing though. There won't ever be anything other than one single Personally, I think a much better solution for namespace package shared resources would be to have a "resources" subpackage, say I still think it doesn't make sense to put anything in a top level namespace package. |
The My points are Anyway, this is just my opinion 😛 |
I don't mind a fun conversation on a closed issue if you don't! 😄 |
Yes, the same problem as importlib has when finding It's my understanding (though I couldn't confirm it in the PEP) that two packages supplying the same file or directory are unsupported and the behavior is undefined. That is, if both Agreed, if both |
My recommendation would be to depend on |
That's my understanding as well, and it jibes with the semantics motivated by the Linux distro packaging example.
Thanks @jaraco, I see what you're advocating for. What makes these cases different than the I'd still question whether it's best practice to do so, but at least it makes sense semantically. Does the importlib.resources documentation (or maybe the packaging guides) discuss any of these use cases and best practices? |
I suspect not specifically, though it does honor the basic expectation that any package can contain resources and that any two namespace packages that implement the same file (resource or module) are going to cause problems. I wasn't inclined to specifically document the consistency, because in my mind, it aligned with what a person might expect intuitively, but your surprise reveals that's not the case. I just reviewed the docs, and they are pretty sparse about the purpose of the API. It does link to the "using" guide, and that does seem like it might be an appropriate place to publish that information. |
This moves `VERSION` to be in a subdirectory of the main `pants` package. This is because loading resources from top-level names of namespace packages is extremely hand-wavy and ambiguous (Who "owns" `pants`? The `pants` package or the `pants.testutil` package?) So now we have it unambiguously in `pants/_version/`. This pain is seen in #17563 which tries to convert `pants` to be an _explicit_ namespace package, which truly messes up resource loading via `pkgutil` and `importlib.resources` ([this long ticket](python/importlib_resources#68) has some context, too). This PR makes `_version/VERSION` a symlink to the existing `VERSION` so that https://github.com/pantsbuild/setup isn't broken. This "works" because Pants is symlink oblivious in source-tree traversal, and therefore sees `pants/_version/VERSION` as `pants/VERSION`.
In GitLab by @jaraco on Nov 2, 2018, 02:48
Attempting to retrieve resources from a namespace package fails.
I see an obvious problem here - that a namespace package can have more than one base path, so it has no single location. But it does have a location... and pkg_resources lets one load resources from a namespace package:
pkg_resources doesn't succeed with a PEP 420 namespace package:
But even in that situation, it does allow for loading resources for a module within a PEP 420 namespace package:
This issue sort-of relates to #60, but is more serious because it seems there's no input that importlib_resources can accept to load resources from a namespace package.
The issue emerged in pmxbot when I tried to convert the package from a (deprecated) pkg_resources-style namespace package to a PEP 420 namespace package. The code failed at this line when it tried to load the phrases.
It seems to me (without looking at the code) it should be straightforward to support loading resources from namespace packages, following the same logic that importlib follows to import a module from that package.
The only other option I see is to force maintainers to restructure their packages so that resources live only in non-namespace packages, which is a bit of an imposition and an unintuitive constraint.