Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow resources to be in subdirectories #58

Closed
jaraco opened this issue Oct 21, 2020 · 43 comments
Closed

Allow resources to be in subdirectories #58

jaraco opened this issue Oct 21, 2020 · 43 comments
Labels
enhancement New feature or request
Milestone

Comments

@jaraco
Copy link
Member

jaraco commented Oct 21, 2020

In GitLab by @rob.speer on May 16, 2018, 01:23

Suppose I have a Web application, myapp, that needs to serve a static file, which by convention needs to be in the path myapp/static/ld/context.ld.json. Suppose I also want to be able to access that file from Python code, because its contents are used in a test.

As importlib_resources is currently defined, I would need to rewrite the path as if it were a Python submodule, even though it does not contain actual Python code: path(myapp.static.ld, "context.ld.json"). I would also need to create empty files named myapp/static/__init__.py and myapp/static/ld/__init__.py, and hopefully exclude them from being served as static files.

That would be enough for me to give up and use paths relative to __file__ instead. In general, I would heartily recommend importlib if I could reasonably promise that it was an improvement over using __file__ or over existing uses of pkg_resources, which it wouldn't be if it doesn't support subdirectories.

The call I would like to be able to make in this situation is path(myapp, "static/ld/context.ld.json").

@jaraco jaraco added the enhancement New feature or request label Oct 21, 2020
@jaraco jaraco added this to the 1.1 milestone Oct 21, 2020
@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on May 16, 2018, 16:28

This was a deliberate choice, but I think you have a valid use case. @brettcannon what do you think? And if we allow this, should we make sure it gets into Python 3.7?

I'm hesitant because I don't know how much work it will take or what the semantic and implementation implications are.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @brettcannon on May 16, 2018, 16:33

I get the desire to use this, but I'm personally not convinced it's worth the headache of having to manage paths separate from the import system since the abstraction handles a lot of subtle details for us.

IOW I'm with Barry and I'm not comfortable in doing this until someone produces a PR that implements this to see what kind of complications this would introduce as for me some empty files is not a huge cost for cross-platform file loading when that's needed (I get why people who are controlling their deployment and not doing this in a library feel like not bothering).

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on May 21, 2018, 15:27

I actually have another possible use case. Over in importlib_metadata it occurs to me that I'd really like to use importlib_resources to handle the "loading the metadata from a zip file" case, but because the {egg,dist}-info directory is not a Python package, I can't use Python's import system to get to it.

I have been considering writing a custom loader for that purpose, so we could use the ResourceReader API, but that's really not a perfect fit. It would make more sense to add a low-level API to finders rather than loaders to find a distribution package's metadata, because you obviously don't want to be able to import from an {egg,dist}-info directory.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Sep 11, 2018, 19:31

Note that we are using finders, not loaders for importlib_metadata so we don't need this over there. I won't close the issue, but currently still have no plans to implement this. I'm with @brettcannon that we can re-evaluate this if someone feels strongly enough about it to submit a PR.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Oct 31, 2018, 17:27

I recently encountered another project where I adapted the implementation to use importlib_resources. This particular project had a jarfile in a subdirectory of the package. Package was baton and the resource was in baton/netsuite/. I ended up adding an empty __init__.py to baton/netsuite, making it a package, and then instead of loading netsuite/tool.jar from the baton package, I loaded tool.jar from the (empty) baton.netsuite package. A little awkward, but not horrible.

You know what might be better? I know it's a little late to re-envision the design, but what if instead of returning a context that when entered ensures there's a file on the file system, it instead returns a first-class object which is traversable. So the OP could implement his application thus:

ld = importlib_resources.path('myapp') / 'static' / 'ld'
with ld.open() as strm:
    handle_lines(strm)
other_files = ld.listdir()

Such a solution would be much more elegant in that :

  1. It doesn't require a context to delete potentially copied files (files are read from the source, whether it's a local file, file in a zip file, or some other resource such as a network file system).
  2. Users get the same familiar interface they've come to expect from pathlib (though with a possibly limited interface).
  3. All importlib_resources needs to do is construct the appropriate wrapper, and that wrapper provides read_binary, read_text through the familiar .open('b').read() and .open().read().

Such an approach would require some functionality that doesn't exist (pathlib-compatible wrappers for zip files and others).

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Nov 1, 2018, 13:30

I'd love to see zipfile compatible pathlib-like API. As @jaraco knows, this would also come in handy for importlib_metadata.

I wouldn't be opposed to a traversable object, but also remember that a motivating use case for .path()'s current API are dynamic libraries which, due to the limitations of dlopen() require a physical file system location.

Can you think of a way to support the existing .path() API and also provide a traversable object? If not, let's think about a new function that could return such a thing. I think it's a very interesting idea and would nicely support the OP use case.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Dec 17, 2018, 15:19

I've just stumbled on another project that stores "package data" in a subdirectory of the package. And as it turns out, that's what distutils indicates to do. I think it's inconsistent for importlib.resources to declare this an unsupported case but distutils (as deprecated as it is) to still be recommending this for packaging.

Can you think of a way to support the existing .path() API and also provide a traversable object?

At first blush, it feels incompatible with the implementation I have in mind. I can draft what I have in mind as a separate function and then we can analyze if there's a way to combine these into a unified implementation.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @gregory.szorc on Feb 26, 2019, 23:59

I was told to comment here after filing https://bugs.python.org/issue36128.

I can see both sides of the argument for whether the ResourceReader interface should allow resource names to exist in "subdirectories."

On one hand, preventing path separators keeps things simple and provides only 1 way to access a specific resource.

On the other, requiring resources exist within Python packages can be rather annoying (as the original reporter has stated). Just today I was refactoring Mercurial to convert various directories into Python packages and it was somewhat annoying :/ It is much more convenient to place a bunch of resource files in a friendly directory tree and access them using hierarchical addressing with path separators as the delimiter. And this conveniently maps to filesystem paths.

From my perspective as someone who has hacked together Python module importing using zero-copy (https://gregoryszorc.com/blog/2018/12/28/faster-in-memory-python-module-importing/), I love the ResourceReader interface because it allows resources to be imported from something that isn't the filesystem. My read of the ResourceReader interface is that it is supposed to be an abstraction that allows resources to exist outside of traditional filesystems. Unfortunately, it falls a bit short.

For starters, the implementation in importlib today assumes platform native path separators. If running on POSIX, it treats / specially and on Windows, \ is treated specially. If we're really talking about an agnostic interface for resource loading, we need to treat both path separators the same or pick only one that is special. Otherwise you have a leaky abstraction that behaves differently depending on OS. If path separators are special, normalizing on / would be preferred. But this code has already shipped in Python 3.7 and I doubt the backwards compatibility break to ignore \ could be stomached.

I can make the argument that resource names should be nearly anything and it is up to the ResourceReader to resolve them however it sees fit. This is truly agnostic of storage. Unfortunately, there is the practical concern that in the common case of mapping resources to filenames, we'll have multiple ways of referencing a resource from N Python packages in the Python package hierarchy. Yes, there are ways to limit this. But it feels overly complicated for a ResourceReader to enforce matters by looking for the existence of parent/child packages. Then there are more esoteric concerns, such as the fact that not all filesystems can represent all filenames. What happens when someone tries to use a resource named AUX and Windows/NTFS refuses to write that file? Or what if resources collide on case sensitivity? Or what if the filesystem can't store all Unicode sequences? There arguably needs to be restrictions on what is a valid resource name.

That being said, what is the harm for ResourceReaders allowing N ways to access a specific resource for some ResourceReaders? There are some theoretical concerns. But they feel like edge cases to me. I expect >95% of consumers of ResourceReader to be within the current package and if people are doing bad things through multiple addressing, the fault seems to be theirs. If we're really concerned about the multiple addressing problems, perhaps we could define a well-named variable on modules to control whether resources are exposed. For example, if the foo.bar module defines __no_resources__ = True (or something), foo.bar.__spec__.loader.get_resource_loader() will fail for that module and consumers will be forced to use foo.__spec__.loader.get_resource_loader() for their resource needs. (I'm not sure if this is feasible - I'm mostly just thinking aloud.) (If I were defining things from scratch, I would consider making modules opt in to exposing resources because it does feel like a special case.)

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Feb 27, 2019, 14:09

@gregory.szorc I'm a little strapped for time right now, but I just want to thank you for your very valuable feedback. It's always great to have real-world use cases to base decisions on, and I think you give us a really important data point (and one that validates the basic idea, even with its warts).

I'll follow up again once I have time to digest your comment and think about its implications.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 6, 2019, 15:55

mentioned in merge request !76

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on May 9, 2019, 14:06

assigned to @warsaw

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Jun 2, 2019, 12:57

I really wanted to get to this in time for Python 3.8, but I just couldn't get far enough with working code. Contributions are welcome.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @davidism on Oct 18, 2019, 10:35

I was removing the dependency on pkg_resources from Werkzeug's SharedDataMiddleware and Jinja's PackageLoader, and ended up not using importlib_resources or most of the loader and resource reader APIs due to these limitations. Nested directories of resources can be included in a package with MANIFEST.in, so it seemed reasonable that these would still be accessible with the resource API.

Due to backwards compatibility as well as ease of use, it wasn't possible for me to require users to add init files to every folder and subfolder. Additionally, even if init files were added, there didn't appear to be a way to access subdirectories, or even determine that a name was a directory. There were secondary requirements such as being able to get the mtime of a resource if possible as well.

I'm hesitant to bring this up, since it may have been a convenient oversight, but reader.open_resource doesn't have the "is not a directory" check that the rest of the API has, so it's possible to open files under nested directories, if you know they're already there. This worked for both filesystem and zip packages. That was enough for serving files, but not listing them.

It would be great to see support for nested resources without adding init files.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Oct 18, 2019, 13:04

@davidism Any chance you'd like to work on a PR for this?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @wimglenn on Nov 1, 2019, 16:23

Hey Barry - probably worth mentioning, this design makes importlib.resources unable to read resources from many existing packages.

This works (pytz 2019.3):

pkgutil.get_data("pytz", "zoneinfo/America/Chicago")

This doesn't:

importlib.resources.read_binary("pytz", "zoneinfo/America/Chicago")
# ValueError: 'zoneinfo/America/Chicago' must be only a file name

So it looks like one can't use importlib.resources with existing distributions that packaged data in subdirectories (of course those distributions are immutable now). And, to be honest, requiring projects change their data subdirectories into subpackages in order to be compatible with importlib.resources going forward is a big ask.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Nov 1, 2019, 17:07

Contributions welcome :)

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @wimglenn on Nov 1, 2019, 18:32

OK, should I take that to mean "it's not an intentional opinion that importlib.resources made here, just a convenience of implementation"? What's the scope here - can you think of any other weird edge-cases where subdirectories will need to work besides in the zipimporter? Pyinstaller, maybe?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Nov 4, 2019, 21:18

When we originally wrote importlib_resources we punted on implementing subdirectories for both semantic and pragmatic reasons. We've since been convinced that we should do it. This issue has everything we know about it!

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @Mekk on Nov 11, 2019, 09:35

Let me just mention my „favorite” use case on old APIs: pypa/setuptools#1635 New APIs do fine via importlib, less so via raw reader interface (code below tried on 3.7.3). So let's be careful not to allow too much:

>>> import importlib.resources
>>> importlib.resources.read_binary("multiprocessing", '../../../../etc/passwd')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 152, in read_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: '../../../../etc/passwd' must be only a file name

(this is fine)

but:

>>> import multiprocessing
>>> import sys
>>> reader = sys.modules['multiprocessing'].__spec__.loader.get_resource_reader('multiprocessing')
>>> reader.open_resource('../../../../etc/passwd')
<_io.FileIO name='/usr/lib/python3.7/multiprocessing/../../../../etc/passwd' mode='rb' closefd=True>

(this is somewhat dangerous)

Context: the idea of mapping urls to resources (and relying on resource api for validation) seems appealing to some people.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @ztane on Nov 19, 2019, 09:24

@Mekk well ... of course for as long as the accepted Stack Overflow answer uses

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")

what could go wrong ;)

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Dec 6, 2019, 17:01

So I can think of two issues here: one is security-related and one is API-related.

The security one is directory traversal. What if we just flat-out banned .. in the path? Since we control importlib.resources we can put that protection in at that level for the specified resource. Would that help deal with the /etc/passwd problem? (We can't really prevent anything that a ResourceReader would do anyway.)

The API-related one is I say if you want something in a subdirectory you are expected to pass in a pathlib.PurePath instance which will work with pathlib.PurePath.joinpath(). That deals with the path separation issue as pathlib takes care of that for us. If you manage to mess up and not do that then it's on you, but at least the docs and API can try to steer you in the general direction of what makes the most sense to prevent what will be a common user mistake. We could even go so far as to say that unless you pass in an os.PathLike object you cannot have a path separator in the resource name.

What do you people think about that in terms of design constraints to solve the key issues that led us to punt on this decision? BTW I think all of this is backwards-compatible.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 18, 2020, 11:04

In the feature/traversable branch (!76), I have what I believe is a viable approach to this issue. Here it is in action:

~ $ pip-run git+https://gitlab.com/python-devs/importlib_resources@feature/traversable pytz                                                                                                                  
Collecting git+https://gitlab.com/python-devs/importlib_resources@feature/traversable
  Cloning https://gitlab.com/python-devs/importlib_resources (to revision feature/traversable) to /private/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-req-build-nasj425r
  Running command git clone -q https://gitlab.com/python-devs/importlib_resources /private/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-req-build-nasj425r
  Running command git checkout -b feature/traversable --track origin/feature/traversable
  Switched to a new branch 'feature/traversable'
  Branch 'feature/traversable' set up to track remote branch 'feature/traversable' from 'origin'.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting pytz
  Using cached https://files.pythonhosted.org/packages/e7/f9/f0b53f88060247251bf481fa6ea62cd0d25bf1b11a87888e53ce5b7c8ad2/pytz-2019.3-py2.py3-none-any.whl
Building wheels for collected packages: importlib-resources
  Building wheel for importlib-resources (PEP 517) ... done
  Created wheel for importlib-resources: filename=importlib_resources-1.0.2-cp38-none-any.whl size=30597 sha256=ec143b514747b3aa0539379fa0722f3c33722abfb7e7da69c8a6953699f1d654
  Stored in directory: /private/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-ephem-wheel-cache-_xqgh_rz/wheels/68/0f/d6/156e93ff45c897b6608bf064c1ce26ca250cb845d747e05284
Successfully built importlib-resources
Installing collected packages: pytz, importlib-resources
Successfully installed importlib-resources-1.0.2 pytz-2019.3
Python 3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib_resources._py3 import get
>>> zoneinfo = get('pytz', 'zoneinfo')
>>> chicago = zoneinfo / 'America/Chicago'
>>> data = chicago.read_bytes()
>>> len(data)
3576

As of this writing, the get() function is not exported in the importlib_resources namespace, but that's a trivial change. Is "get" the right name for this function?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Jan 19, 2020, 16:05

@jaraco This is really interesting. Let me see if I understand what's going on here.

get() returns some kind of os.PathLike-like object, and you can use path concatenation syntax and semantics to traverse into a resource in a subdirectory. Specifically, this can be a non-Python package subdirectory. I'll assume that the implementation prevents navigating to non-subdirectories; i.e. you can't escape the enclosing resource package directory, or find your way to /etc, etc. :)

If that's how it's supposed to work, then I think this is a fantastic solution to the problem. We'd need to be clear about what the chicago object is, and how much of the Path API it supports. This is important because you're exposing read_bytes() on the object, not read_binary() as one might expect from the importlib.resources API.

Is pytz/zoneinfo a Python package? I think it's not, but I just want to be sure.

Unfortunately importlib.resources.path() already exists, I'm not sure you'd want to overload that function, because although the arguments are the same, and it's the obvious choice, the fact that existing .path() guarantees a physical file system path and is a context manager, probably prevents it from being reused.

What do you think about resolve(), traverse(), or subdirectory()?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Jan 20, 2020, 16:02

I also agree that returning an os.PathLike object that implements pathlib.PurePath and what it can of pathlib.Path would be great! That should handle most of the concerns we have had.

Perhaps there should be something that takes a module as an anchor and returns that location and then you construct everything from there using pathlib.PurePath methods? The first name that comes to mind is pathlike(module) -> OurPathObject.

Am I mistaken into thinking that if we adopted this approach it basically would negate most of the custom API we have provided (which I don't view as a bad thing)?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 9, 2020, 10:32

I'll assume that the implementation prevents navigating to non-subdirectories; i.e. you can't escape the enclosing resource package directory, or find your way to /etc, etc. :)

Actually, there's no protection against this, but I'm not sure it's important that there is. After all, you can get to /etc with pathlib.Path('/etc/'). It's not obvious to me that it should be the responsibility of this library to sanitize inputs to .get(). On further consideration, perhaps it does make sense to ensure that whatever get() returns is actually a path within the package.

We'd need to be clear about what the chicago object is, and how much of the Path API it supports.

Agreed, this is a new, replacement API that's more aligned to Pathlib conventions than the former conventions.

Is pytz/zoneinfo a Python package? I think it's not, but I just want to be sure.

No, and Yes. It's not a traditional package with a __init__, but it happens to be importable.

~ $ pip-run -q pytz                                                                                                                                                       
Python 3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytz.zoneinfo
>>> pytz.zoneinfo.__file__
>>> pytz.zoneinfo
<module 'pytz.zoneinfo' (namespace)>
>>> import pathlib
>>> pathlib.Path(pytz.__file__).parent
PosixPath('/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-ul3px_mm/pytz')
>>> [child.name for child in (pathlib.Path(pytz.__file__).parent / 'zoneinfo').iterdir()]
['Indian', 'Atlantic', 'CST6CDT', 'Poland', 'US', 'Kwajalein', 'leapseconds', 'Brazil', 'Pacific', 'zone1970.tab', 'MST', 'NZ', 'Arctic', 'Universal', 'Libya', 'Turkey', 'iso3166.tab', 'EST5EDT', 'Greenwich', 'America', 'Australia', 'Etc', 'NZ-CHAT', 'Asia', 'MET', 'Portugal', 'Antarctica', 'GMT-0', 'CET', 'Eire', 'PST8PDT', 'Jamaica', 'GMT', 'Zulu', 'Japan', 'ROC', 'GB-Eire', 'Europe', 'ROK', 'Navajo', 'Singapore', 'posixrules', 'GB', 'Mexico', 'EST', 'GMT0', 'Hongkong', 'PRC', 'zone.tab', 'Iran', 'MST7MDT', 'WET', 'W-SU', 'UCT', 'Cuba', 'Egypt', 'GMT+0', 'tzdata.zi', 'EET', 'Israel', 'Africa', 'Factory', 'UTC', 'Chile', 'HST', 'Canada', 'Iceland']
>>> pytz
<module 'pytz' from '/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-ul3px_mm/pytz/__init__.py'>

It seems that the directory is importable due to the PEP 420 namespace package implementation. I thought child namespaces of parent simple packages was not allowed, but this behavior indicates otherwise.

Regardless, the implementation is not relying on this behavior, demonstrated by it working on Python 2.7 where pytz.zoneinfo is definitely not a package.

~ $ python2.7 -m pip-run -q git+https://gitlab.com/python-devs/importlib_resources@feature/traversable pytz                                                               
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
WARNING: Python 2.7 is not recommended. 
This version is included in macOS for compatibility with legacy software. 
Future versions of macOS will not include Python 2.7. 
Instead, it is recommended that you transition to using 'python3' from within Terminal.

Python 2.7.16 (default, Dec 13 2019, 18:00:32) 
[GCC 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.32.4) (-macos10.15-objc-s on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from importlib_resources._py2 import get
>>> zoneinfo = get('pytz', 'zoneinfo')
>>> chicago = zoneinfo / 'America/Chicago'
>>> data = chicago.read_bytes()
>>> len(data)
3559

What do you think about resolve(), traverse(), or subdirectory()?

I agree path() would be best, but it's taken, and that get() is not awesome. I like resolve() and I'm lukewarm on traverse(). I'm not to keen on subdirectory() because the result could be a file reference and not a directory. I was also thinking of files() or inspect() or load(). Another approach could be to reclaim path() thus:

  • Expose new behavior as importlib.resources.__future__.path and add deprecation warning to importlib.resources.path.
  • Optionally expose existing path as importlib.resources.__legacy__.path.
  • In the future (probably 2 python iterations), make importlib.resources.path the new behavior.

(or some variation)

Another consideration I had was perhaps the resource parameter to get could be eliminated altogether. After all, the caller could always select a resource within the package. In the pytz example:

# use resource='' to disable its effect
>>> zoneinfo = get(pytz, resource='') / 'zoneinfo'
>>> zoneinfo = get(pytz, resource='').joinpath('zoneinfo')

If we use this approach, then I'm liking the function files(pkg) a lot, enough that I'll make that change in the MR.

Perhaps there should be something that takes a module as an anchor and returns that location and then you construct everything from there using pathlib.PurePath methods? The first name that comes to mind is pathlike(module) -> OurPathObject.

Well, gee. I thought it was my idea, but now that I'm reading through Brett's response, he's made the same suggestion (that surely I read earlier). Sounds like we have some convergence. What do you think of files(pkg) instead of pathlike(pkg)?

Am I mistaken into thinking that if we adopted this approach it basically would negate most of the custom API we have provided (which I don't view as a bad thing)?

It does provide a mechanism to completely bypass most if not all of the remaining API, yes, although I imagine retaining it for compatibility. I think it would be interesting to see what the implementation looks like without those functions (for curiosity).

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 9, 2020, 10:33

On further consideration, perhaps it does make sense to ensure that whatever get() returns is actually a path within the package.

Oh! If files(pkg) doesn't accept a path at all, then it's no longer the responsibility of importlib.resources to sanitize the path. That's particularly attractive.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 9, 2020, 10:42

The latest implementation implements files(pkg) as described above.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 10:10

I think it would be interesting to see what the implementation looks like without those functions (for curiosity).

I've started stripping out this functionality in the feature/drop-legacy-api branch. Progress was good, but I got stumped on Traversable.open() for zip files, which doesn't support text-decoding. This leads me to believe that's a feature zipp should support. This also suggests the Traversable API should include .open().

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 29, 2020, 17:52

I'll continue the work on extensibility support in #77. This feature is complete.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 29, 2020, 17:52

closed

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @ankostis on Apr 22, 2020, 10:17

Just a question about another API refered earlier by @wimglenn:

  • What is wrong with pkgutil.get_data()?
  • Why isn't this API enough?
  • Is it's use discouraged for the exact use case of this issue?

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Apr 22, 2020, 15:11

@ankostis because pkgutil.get_data() is using an under-defined API. importlib_resources is much more rigorously defined. Honestly, pkgutil will eventually be deprecated entirely once the final holes in support in importlib are filled.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @hinakuroori on Jun 22, 2020, 14:33

mentioned in commit hinakuroori/ament_package@0c80a10a6f7de541cc8b09c16846df5ffc9ab82f

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @hinakuroori on Jun 22, 2020, 15:35

mentioned in commit hinakuroori/sros2@f1173fa1914b7ad59d1d2b46b854a86c7bba81c0

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @hinakuroori on Jun 23, 2020, 13:33

mentioned in commit hinakuroori/sros2@dbec91d1c9efafadb9b7eeb183cc5f2663feb136

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @hinakuroori on Jun 23, 2020, 14:01

mentioned in commit hinakuroori/ament_package@4db28d437e28dfdc0972fa65d92098647691bd9a

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Jun 25, 2020, 14:10

marked this issue as related to #104

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @adam.hendry on Aug 19, 2020, 22:53

@brettcannon @warsaw I'm of the opinion we should keep pkgutil and steer away from importlib.resources, or just port the exact same functionality of pkgutil into importlib.resources (referencing https://stackoverflow.com/questions/6028000/how-to-read-a-static-file-from-inside-a-python-package/58941536?noredirect=1#comment112280625_58941536):

Benefits:

  1. From an initial timeit assesment, appears to be 2x faster than importlib.resources
  2. More pythonic (resources aren't packages, but required to be so by adding __init__.py with importlib.resources)
  3. Does not require massive rewrites of existing code (i.e. adding __init__.py's everywhere in published packages)

Cons:

  1. No context manager interface, so if users needs a long-lived resource they'll want importlib.resources (if I need a long-lived resource, I can make a context manager myself...really not worried about this)
  2. API is underdefined (not sure what that means, having looked at both the importlib.resources and pkgutil docs)

Honestly, the cons above aren't really cons to me. I can't see any benefits of importlib.resources over pkgutil. Rather, importlib.resources feels unpythonic (nonintuitive) and pkgutil does the same thing and faster. If the API being underdefined is the only issue...why not just update the API? Let's update pkgutil or port it to importlib.resources if we want it to live there rather than creating something new that confuses everyone and is problematic.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @adam.hendry on Aug 22, 2020, 19:37

@brettcannon and @warsaw I'm actually changing my answer. pkgutil also requires non-namespace packages. Hence, it also requires __init__.py files in the package directory structure anyway as well. Therefore, importlib.resources is more feature-rich, and hence the appropriate module to use. Apologies for my confusion.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @ankostis on Aug 23, 2020, 05:22

In my view the greatest benefit of importlib.resourcesvs pkgutil is that it is "compile-time" checked, i.e. linters & IDEs can pinpoint miss-named packages.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Aug 23, 2020, 17:51

Please keep in mind that we all want to support data files in non-package subdirectories, it's just that none of us has had the time to implement it. In the fine tradition of open source, "Contributions Welcome!"

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @wimglenn on Aug 26, 2020, 13:20

@warsaw Seems to be already implemented! The issue is closed, and Jason says the feature is complete. I've had a look on v3.0.0, and got pathlib instances to site-packages - when the package was zipped I got zipfile.Path instance which quacked in similar ways. Worked on both Py2 and Py3.

Example:

>>> import importlib_resources
>>> resource = importlib_resources.files("pytz") / "zoneinfo" / "America" / "Chicago"
>>> resource.read_bytes()[:10]
b'TZif2\x00\x00\x00\x00\x00'

I've updated https://github.com/wimglenn/resources-example to demonstrate importlib.resources separately from importlib_resources.

Looks pretty good now, just a waiting game for the new APIs to appear in stdlib.

@jaraco
Copy link
Member Author

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Aug 30, 2020, 21:06

That's right. The functionality is present in importlib_resources 1.3 and the same functionality is available in Python 3.9. What remains is to add support for namespace packages and work toward an API for alternate loaders to supply resources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant