Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to retrieve resources from a namespace package #68

Closed
jaraco opened this issue Oct 21, 2020 · 32 comments · Fixed by #196
Closed

Unable to retrieve resources from a namespace package #68

jaraco opened this issue Oct 21, 2020 · 32 comments · Fixed by #196
Labels
api enhancement New feature or request

Comments

@jaraco
Copy link
Member

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 02:48

Attempting to retrieve resources from a namespace package fails.

draft $ mkdir foo
draft $ touch foo/bar.txt
draft $ rwt importlib_resources
Collecting importlib_resources
  Using cached https://files.pythonhosted.org/packages/2f/f7/b4aa02cdd3ee7ebba375969d77c00826aa15c5db84247d23c89522dccbfa/importlib_resources-1.0.2-py2.py3-none-any.whl
Installing collected packages: importlib-resources
Successfully installed importlib-resources-1.0.2
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib_resources
>>> importlib_resources.read_text(__import__('foo'), 'bar.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/resources.py", line 169, in read_text
    with open_text(package, resource, encoding, errors) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/resources.py", line 126, in open_text
    _check_location(package)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/resources.py", line 82, in _check_location
    raise FileNotFoundError(f'Package has no location {package!r}')
FileNotFoundError: Package has no location <module 'foo' (namespace)>

I see an obvious problem here - that a namespace package can have more than one base path, so it has no single location. But it does have a location... and pkg_resources lets one load resources from namespace package:

draft $ cat > foo/__init__.py
import pkg_resources; pkg_resources.declare_namespace('foo')
draft $ python
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg_resources.resource_stream('foo', 'bar.txt')
<_io.BufferedReader name='/Users/jaraco/draft/foo/bar.txt'>

pkg_resources doesn't succeed with a PEP 420 namespace package:

$ rm foo/__init__.py
>>> pkg_resources.resource_stream('foo', 'bar.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1145, in resource_stream
    return get_provider(package_or_requirement).get_resource_stream(
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
    return _find_adapter(_provider_factories, loader)(module)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1387, in __init__
    self.module_path = os.path.dirname(getattr(module, '__file__', ''))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/posixpath.py", line 156, in dirname
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

But even in that situation, it does allow for loading resources for a module within a PEP 420 namespace package:

draft $ touch foo/mod.py
draft $ tree
.
└── foo
    ├── bar.txt
    └── mod.py
draft $ python
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg_resources.resource_stream('foo.mod', 'bar.txt')
<_io.BufferedReader name='/Users/jaraco/draft/foo/bar.txt'>

This issue sort-of relates to #60, but is more serious because it seems there's no input that importlib_resources can accept to load resources from a namespace package.

The issue emerged in pmxbot when I tried to convert the package from a (deprecated) pkg_resources-style namespace package to a PEP 420 namespace package. The code failed at this line when it tried to load the phrases.

It seems to me (without looking at the code) it should be straightforward to support loading resources from namespace packages, following the same logic that importlib follows to import a module from that package.

The only other option I see is to force packages to rewrite their packages to only put resources in non-namespace packages, which is a bit of an imposition and unintuitive constraint.

@jaraco jaraco added api enhancement New feature or request labels Oct 21, 2020
@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 02:50

This issue is a revisiting of #20.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 02:56

This also relates to #60 - where having support for resolving a resource relative to a module is concrete, even if that module is in a namespace package.

So I would frame this issue as one of two options:

  • support resolving resources relative to modules, or
  • support resolving resources in a namespace package

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Nov 2, 2018, 14:28

The problem as I see it is that because there is no single location for the PEP 420 namespace package, there's no logical place for the resources. The two cases you cite aren't exactly analogous.

In the first case, you have a foo/__init__.py so foo is not a PEP 420 namespace package. I don't actually know what the effect of pkg_resources.declare_namespace('foo') is, but it's not turning a concrete package into a PEP 420 namespace package. In that case, I would expect that importlib_resources should work on that example too (which you don't show, but it's what to try before removing foo/__init__.py).

The second case, where the foo directory has bar.txt and mod.py but no __init__.py file, falls into a category I call "a fake namespace" package. It's kind of cheating by providing only a single portion but just omitting the __init__.py. That's not a use case we ever considered valid while designing PEP 420, but I agree it's possible, just not a good idea. ;). @brettcannon what do you think?

The problem comes about because I don't know how you reconcile a fake namespace package with a real namespace package, so that you can consistently determine a location from which to find resources. It would have to work for both file system and zip files, and it would have to be robust in the case where you have multiple portions. For example, if you actually had another foo portion somewhere, what happens to the importability of foo.mod, foo itself, and the location(s) you search for the resource.

These all seem like very dicey semantics, but if we can come up with clear, definable, rules, I'm not necessarily against it.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 16:34

That's not a use case we ever considered valid while designing PEP 420.

I thought it was a consideration. I know in my mind, it was. I seem to think the tests even captured this condition. I don't recall ever there being a constraint that namespace packages can only contain other packages... that's just a common pattern.

I create "fake namespaces" all the time. Many of the jaraco.* distributions have just one module, like jaraco.collections.

My reading of PEP 420 was that the spec is primarily about the container and not its contents, and that like any other package, it can contain modules. In fact the specification section says:

A namespace package is not fundamentally different from a regular package. It is just a different way of creating packages. Once a namespace package is created, there is no functional difference between it and a regular package.

You said:

if you actually had another foo portion somewhere, what happens

I agree - perhaps I should put together some more complex examples that include more than one instance of the namespace package. I should maybe also demonstrate what happens with pkg_util-based packages.

In my mind, there are two ways to have clear semantics. One is to disallow loading resources from namespace packages, but allow resources to be loaded as siblings of modules in a namespace package (which do have a concrete location), which is what pkg_resources does. The other is to allow loading of a resource from the namespace package following the same logic that importlib uses to load modules. I don't see how it's any more dicey to resolve resources from a namespace package than it is to resolve modules or subpackages from a namespace package.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 16:34

assigned to @jaraco

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Nov 2, 2018, 18:30

One is to disallow loading resources from namespace packages, but allow resources to be loaded as siblings of modules in a namespace package (which do have a concrete location), which is what pkg_resources does. The other is to allow loading of a resource from the namespace package following the same logic that importlib uses to load modules. I don't see how it's any more dicey to resolve resources from a namespace package than it is to resolve modules or subpackages from a namespace package.

Here's the problem as I see it. If there's a solution, I'm all for adding this support. It doesn't appear that it's possible to extract a namespace package's location from any public API. The closest I can come is to use foo.__path__._path and even there it's technically a list of directories. So how do you definitively know where the resource is? Or do you propose using the non-public API for NamespacePath and searching all the directories?

% tree 
.
└── foo
    ├── __pycache__
    │   └── mod.cpython-37.pyc
    ├── bar.txt
    └── mod.py
% python3
>>> import foo
>>> foo
<module 'foo' (namespace)>
>>> foo.__path__
_NamespacePath(['/private/tmp/68/foo'])
>>> foo.__path__._path
['/private/tmp/68/foo']

Alternatively, you could try to find it based on the sibling's name, e.g.:

>>> import foo.mod
>>> foo.mod.__file__
'/private/tmp/68/foo/mod.py'

I'm not sure what would then happen for real namespace packages containing just multiple portions. You'd likely have to fail, but what would be the failure criteria?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 19:49

I did just check the behavior on pkgutil-style namespaces, and the behavior is as expected:

draft $ mkdir path1
draft $ mkdir path2
draft $ $PYTHONPATH='path1:path2'
draft $ mkdir path1/pkg
draft $ mkdir path2/pkg
draft $ cat > path1/pkg/__init__.py
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
draft $
draft $ cat > path2/pkg/__init__.py
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
draft $ cat > path2/pkg/file.txt
path2 text
draft $ cat > path1/pkg/file.txt
path 1 text
draft $ tree
.
├── path1
│   └── pkg
│       ├── __init__.py
│       └── file.txt
└── path2
    └── pkg
        ├── __init__.py
        └── file.txt
draft $ python
>>> import importlib.resources
>>> importlib.resources.read_text('pkg', 'file.txt')
'path 1 text\n'

Switching the path order gives precedence to the earlier package on the path.

draft $ env PYTHONPATH=path2:path1 python3.7
>>> import importlib.resources
>>> importlib.resources.read_text('pkg', 'file.txt')
'path2 text\n'

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 19:56

Thinking about the PEP 420 packages, here's a poor replication of what I had in mind:

draft $ tree
.
├── path1
│   └── pkg
│       └── file.txt
└── path2
    └── pkg
        └── file.txt

4 directories, 2 files
draft $ python3.7 -q
>>> import pkg
>>> for path in pkg.__path__:
...   try:
...     text = open(os.path.join(path, 'file.txt')).read()
...     break
...   except Exception:
...     pass
...
>>> text
'path 1 text\n'

You don't have to access the private member of NamespacePath because it is iterable.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 2, 2018, 20:43

I thought I'd take a stab at the implementation and see if I could patch in a quick proof of concept, but it quickly became clear that there's not a single interface that does file discovery.

What I did discover is this:

>>> pkg.__spec__.submodule_search_locations
_NamespacePath(['/Users/jaraco/draft/path1/pkg', '/Users/jaraco/draft/path2/pkg'])
>>> foo.__spec__.submodule_search_locations
['/Users/jaraco/draft/foo']

I'm not yet sure about the story for zip files... but for simple packages, it seems the submodule_search_locations already presents the list of paths that need to be searched.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Nov 3, 2018, 16:31

I think what this really comes down to is how easily or often will people mess up when using namespace packages with importlib.resources, and then they do how hard will it be to debug? The search space is simply different with namespace packages which is why we initially didn't bother supporting it. But @jaraco is suggesting that we get that anchor point from a submodule in a namespace package as the anchor. That would be a change in semantics as we have purposefully restrict the code to package and not modules because we didn't want people caught off-guard when a package shifted into a module and thus its file system location changed.

So the real question becomes whether knowingly searching across multiple directories for a resource's location is acceptable because it is pragmatic enough to overcome the debugging headache of trying to figure out why a file wasn't discovered that you expected to be there.

But there is also the complication that this will be new semantics for namespace packages overall, e.g. _NamespaceLoader doesn't defined get_data() or anything else which could potentially anchor it anywhere like we get when you work off of an __init__.py.

Basically the options sound like:

  1. Do nothing
  2. Start accepting submodules as a way to determine locations on the file system (while still say "no" to namespace packages)
  3. Start accepting namespace packages

None of these are great. I also don't think adding some flag is going to help make this easier somehow either. I think if it's not going to be (1) then it should probably be (3) and the semantics get redefined as a search on __path__ for resources. Unfortunately that could break people as the location of __init__.py does not necessarily correspond to what is on __path__.

IOW I don't see an obvious answer.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 3, 2018, 17:12

The more I think about it, (2) anchoring on submodules, is probably not preferable. I was considering it because that's what pkg_resources did and that would help address the limitations discovered in #60.

the location of __init__.py does not necessarily correspond to what is on __path__.

What is a scenario where that's not the case? Is that not how the loader locates modules? What sort of breakage are you imagining?

_NamespaceLoader doesn't define get_data() or anything else which could potentially anchor it anywhere

Hmm. Maybe that's something the loader(s) should supply. It's probably too late to think about the abstraction layers provided by finders/loaders, but I always imagined that anything that could load a module would also be responsible for providing data.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Nov 5, 2018, 16:38

It might indeed make sense for _NamespaceLoader to define get_data() and that should "fix" this problem properly, without changes to importlib_resources. Of course, that would mean it could only be a feature in Python 3.8 (assuming the functionality is added there), so that might be less than ideal. I feel more comfortable about the semantics in that case, and _NamespaceLoader would define how to find resources from __path__ or whatever.

but I always imagined that anything that could load a module would also be responsible for providing data.

+1

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Nov 15, 2018, 12:45

Of course, that would mean it could only be a feature in Python 3.8.

Does that preclude the possibility that importlib_resources could provide a shim for backward compatibility? I'll work on a draft in this package that does just that.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Dec 27, 2018, 13:38

In the feature/traversable branch, I've started to draft the idea. It adds a new module tree which with a function from_package that returns a traversable tree for a given package. Relying on zip for zip packages and pathlib for file system paths, this function returns an object that can be used to traverse into subdirectories (should help address #58).

As you can see, relying on these abstractions makes the code in contents() and is_resource() much simpler. As of yet, the technique doesn't yet expose an API for the user to get one of these traversable objects, but it does demonstrate that the tests can pass with minimal modification.

I did have to make one modification to the test suite, to add subdirectory/ as an entry in the zip file. It is possible to construct a zip file without this entry, but when using standard unix tools, including Python's stdlib, if you were to zip up a directory tree, it would contain the directory entries. The zipp project relies on these subdirectory entries to identify directory objects. Perhaps zipp should not rely on this convention. That's something to consider for later.

Next I'd like to add support to these Traversable objects to provide a context manager supplying a path on the file system (similar to what path() does).

And I haven't yet talked about the elephant in this issue - the namespace packages. This branch does not yet do anything for namespace packages. In fact, it retains the behavior that namespace packages are disallowed. But my feeling is that it should be possible in tree.from_package to identify namespace packages and provide a generic Traversable multiplexer that would allow traversal across a set of roots.

The biggest concern I have about this approach is it is lower level and more sophisticated than the ResourceReader protocol. What I really want is a new protocol that allows loaders to supply Traversable objects and not ResourceReader objects.

Such a change would probably require a transitional approach, such as:

  • In Python 3.8, introduce a new protocol for loaders to supply traversable objects. Deprecate the ResourceReader protocol.
  • In a later Python release, drop support for the ResourceReader protocol.

Following this approach, the code for importlib_resources becomes diminishingly small, similar to what we find in importlib_metadata.api.

I'll continue to hack on this, but for now, I think this branch is ready for an initial review. I'd be happy to answer any questions or field any concerns you may have.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 6, 2019, 15:27

Perhaps zipp should not rely on this convention. That's something to consider for later.

This work was done in jaraco/zipp#4, released as zipp 0.4.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 6, 2019, 15:55

mentioned in merge request !76

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 29, 2020, 17:53

I'm going to defer this work for a subsequent milestone.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 29, 2020, 17:53

removed milestone

@ergoithz
Copy link

Seems to be fixed in Python 3.10.0 .

With Python 3.9 I'm using this workaround, just in case somebody finds it useful:

trailing = []
while True:
    try: 
        root = importlib.resources.files(namespace)
        break
    except TypeError:  # due pathlib.Path(None)
        if '.' not in namespace:
            raise
        # walk up to the parent package
        namespace, name = namespace.rsplit('.', 1)
        trailing.append(name)
# walk down from package to the namespace
for step in trailing[::-1]:
    root = root.joinpath(step)
# root is now an importlib.resources.abc.Traversable pointing to the namespace

@warsaw
Copy link
Member

warsaw commented Jul 20, 2022

I wonder if this is a use case that importlib.resources should even support? While technically speaking, the only thing that defines a PEP 420 namespace package is one without an __init__.py, in practice, are namespace package containing resources a useful thing? How would you distribute such a package? Aren't most namespace packages "virtual" things that get created when subpackage portions get installed? In which case, which distribution would contain the namespace package resources?

@FFY00
Copy link
Member

FFY00 commented Jul 20, 2022

Seems to be fixed in Python 3.10.0 .

Yes, that's when #196 landed.

I wonder if this is a use case that importlib.resources should even support? While technically speaking, the only thing that defines a PEP 420 namespace package is one without an __init__.py, in practice, are namespace package containing resources a useful thing? How would you distribute such a package? Aren't most namespace packages "virtual" things that get created when subpackage portions get installed? In which case, which distribution would contain the namespace package resources?

I don't see why not, afterall, if Python can load code from the file, why shouldn't we be able to read its contents?
AFAIK namespace packages were initially introduced to allow distributors to distribute multiple bits of packages without having conflicting __init__.py files. In those scenarios, code would be limited to the package type, with some features being supported and some but not others.
The last question does raise an interesting point, however, I don't see why not have it work with both types, given that the API might be used with unknown (to the API user) objects 😅

@warsaw
Copy link
Member

warsaw commented Jul 20, 2022

AFAIK namespace packages were initially introduced to allow distributors to distribute multiple bits of packages without having conflicting init.py files. In those scenarios, code would be limited to the package type, with some features being supported and some but not others

Right. The problem was, let's say you have packages foo.bar and foo.baz and you can install them independently. Where would you put foo/__init__.py in order to ensure that it would get install regardless of which subpackage portion you installed? This problem was further exacerbated on Linux distros (I was both a Debian and Ubuntu developer when I worked on this problem) where you wouldn't know which .deb to put the foo/__init__.py file in. There was an elaborate refcounting based approach to handling this, but it only worked if the __init__.py files were empty or at least exactly the same (which wasn't always the case, usually due to code drift). It was a hacked which mostly, but not always worked.

So now, let's say you have foo/resource.txt that you want importlib.resources to find. Don't you have exactly the same problem?

@FFY00
Copy link
Member

FFY00 commented Jul 20, 2022

Currently, we find all versions of resource.txt I believe! Which is a solution available to us, but that wasn't for the problem you described. Users can write code to handle those situations tailoring to their use-case 😊

@warsaw
Copy link
Member

warsaw commented Jul 20, 2022

I'm not sure what you mean by "all cases of resource.txt". There should only be one such file relevant to this discussion, specifically foo/resource.txt (i.e. the file in the namespace package directory. Any resource.txt files in subpackages of the namespace package foo shouldn't be relevant because they would be resources of the subpackage, which importlib.resources should have no problem finding (as long as the subpackage is a concrete package, not also a namespace package of course).

@FFY00
Copy link
Member

FFY00 commented Jul 20, 2022

I mean, if multiple packages write conflicting a foo/resource.txt file, we will find all versions on the import path. Granted, that isn't a common use-case. I think it would be highly unusual for someone to want to do this.

But, I think namespace package are fairly compelling for data storage. Let's say I have some big data files I need to support portions of the use-cases handled by my package, I can just put them in a namespace package, instead of having to use a namespace with other packages inside just to load my data. I can just handle things directly if the resource lookup fails, and I won't have to change my code if I decide to split one of those namespace packages.

@warsaw
Copy link
Member

warsaw commented Jul 21, 2022

Here's the thing though. There won't ever be anything other than one single foo/resources.txt file, and then, which portion installs it? To use my previous example, as long as you can install portions independently, then each portion will have to contain foo/resource.txt. Now as a maintainer, you have to ensure both portions have the same foo/resources.txt file; if you don't then the resource will be different if you install foo.bar before foo.baz or vice versa. The last one installed will probably "win" and that portion's foo/resource.txt file will be used.

Personally, I think a much better solution for namespace package shared resources would be to have a "resources" subpackage, say foo.resources. Make foo.bar and foo.baz depend on foo.resources so it always gets installed regardless of portion install order. Then it's totally unambiguous that getting the foo.resources/resources.txt resource will work as expected.

I still think it doesn't make sense to put anything in a top level namespace package.

@FFY00
Copy link
Member

FFY00 commented Jul 21, 2022

The resource.txt issue is completely fabricated by us. I really don't think there is any use-case where you would need to do something like that.

My points are
- IMO namespace packages are compelling for data, as presented in the 2nd paragraph of my last reply
- even if that's not the case, I think being able to load the code files via importlib.resources is valuable, and having it consistent with other types of packages is very nice

Anyway, this is just my opinion 😛

@warsaw
Copy link
Member

warsaw commented Jul 21, 2022

Anyway, this is just my opinion

I don't mind a fun conversation on a closed issue if you don't! 😄

@jaraco
Copy link
Member Author

jaraco commented Jul 22, 2022

So now, let's say you have foo/resource.txt that you want importlib.resources to find. Don't you have exactly the same problem?

Yes, the same problem as importlib has when finding foo/module.py. Namespace packages can contain modules (e.g. jaraco.collections). Resources honor the same semantics. Only one package should supply a given module/resource.

It's my understanding (though I couldn't confirm it in the PEP) that two packages supplying the same file or directory are unsupported and the behavior is undefined. That is, if both foo.bar and foo.baz supplied a foo/resources.txt, that would be considered invalid.

Agreed, if both foo.bar and foo.baz require a resources.txt, they should depend on another package that supplies that resource exactly once. Still, there's no reason that a package named foo-resources couldn't supply foo/resources.txt, which is what this change enabled, bringing parity with foo/module.py.

@jaraco
Copy link
Member Author

jaraco commented Jul 22, 2022

With Python 3.9 I'm using this workaround, just in case somebody finds it useful:

My recommendation would be to depend on importlib_resources>=3.2; python_version < "3.10" or simply depend on importlib_resources>=3.2 uniformally, rather than wrapping importlib_resources with a bespoke wrapper.

@warsaw
Copy link
Member

warsaw commented Jul 24, 2022

It's my understanding (though I couldn't confirm it in the PEP) that two packages supplying the same file or directory are unsupported and the behavior is undefined. That is, if both foo.bar and foo.baz supplied a foo/resources.txt, that would be considered invalid.

That's my understanding as well, and it jives with the semantics motivated by the Linux distro packaging example.

Still, there's no reason that a package named foo-resources couldn't supply foo/resources.txt, which is what this change enabled, bringing parity with foo/module.py

Thanks @jaraco I see what you're advocating for. What makes these cases different than the foo/__init__.py case is that in the latter, pre-PEP 420, every portion had to supply the top level __init__.py or the subpackage it provided couldn't be imported. But in the resource or module case, you can get away with only one portion providing that resource or module.

I'd still question whether it's best practice to do so, but at least it makes sense semantically. Do the importlib.resources documentation (or maybe packaging guides) discuss any of these use cases and best practices?

@jaraco
Copy link
Member Author

jaraco commented Jul 29, 2022

Do the importlib.resources documentation (or maybe packaging guides) discuss any of these use cases and best practices?

I suspect not specifically, though it does honor the basic expectation that any package can contain resources and any two namespace packages that implement the same file (resource or module) is going to cause problems. I wasn't inclined to specifically document the consistency, because in my mind, it aligned with what a person might expect intuitively, but your surprise reveals that's not the case.

I just reviewed the docs, and they are pretty sparse about the purpose of the API. It does link to the "using" guide, and that does seem like it might be an appropriate place to publish that information.

jaraco pushed a commit that referenced this issue Nov 10, 2022
…hat `.readthedocs.yml` will be deprecated) (#68)
thejcannon added a commit to pantsbuild/pants that referenced this issue Nov 21, 2022
This moves `VERSION` to be in a subdirectory of the main `pants` package. 

This is because loading resources from top-level names of namespace packages is extremely hand-wavy and ambiguous (Who "owns" `pants`? The `pants` package or the `pants.testutil` package?) So now we have it unambiguously in `pants/_version/`.

This pain is seen in #17563 which tries to convert `pants` to be an _explicit_ namespace package, which truly messes up resource loading via `pkgutil` and `importlib.resources` ([this long ticket](python/importlib_resources#68) has some context, too).

This PR makes `_version/VERSION` a symlink to the existing `VERSION` so that https://github.com/pantsbuild/setup isn't broken. This "works" because Pants is symlink oblivious in source-tree traversal, and therefore sees `pants/_version/VERSION` as `pants/VERSION`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants