Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission for SC consideration: PEP 616: String methods to remove prefixes and suffixes #23

Closed
sweeneyde opened this issue Apr 9, 2020 · 12 comments

Comments

@sweeneyde
Copy link
Member

I'd like to request that PEP 616 be reviewed by the Steering Council.

Title: String methods to remove prefixes and suffixes

Abstract: This is a proposal to add two new methods, removeprefix() and removesuffix(), to the APIs of Python's various string objects. These methods would remove a prefix or suffix (respectively) from a string, if present, and would be added to Unicode str objects, binary bytes and bytearray objects, and collections.UserString.

Links:

@vstinner
Copy link
Member

vstinner commented Apr 9, 2020

Some context. Dennis Sweeney created https://bugs.python.org/issue39939 with an implementation. I asked for a PEP: https://bugs.python.org/issue39939#msg364313 The PEP got like 100 messages, so I consider that it was properly discussed :-) The method names changed from cutprefix/cutsuffix to removeprefix/removesuffix.

I proposed to accept a tuple of str, like: removesuffix(("\n", "\r\n", "\r")). The PEP now has a good rationale to explain why the idea was abandoned:
https://www.python.org/dev/peps/pep-0616/#accepting-a-tuple-of-affixes

@vstinner
Copy link
Member

vstinner commented Apr 9, 2020

I looked in the stdlib which functions would benefit of removesuffix(). I only found 3 files in Lib/*.py:

gzip:

if fname.endswith(b'.gz'):
    fname = fname[:-3]

pydoc:

if url.endswith('.html'):
    url = url[:-5]

tarfile:

if self.name.endswith(".gz"):
    self.name = self.name[:-3]

But there are many functions which cannot use removesuffix(). Examples:

        if output.endswith("\r") and not final:
            output = output[:-1]
            self.pendingcr = True

or:

    if filename.endswith('.pyc'):
        filename = filename[:-1]

or

        if mod.lower().endswith(".py"):
            mod = mod[:-3]

@brettcannon
Copy link
Member

I found more for the string[:-len(suffix)] case.

Lib/ctypes/macholib/dyld.py
64:                yield path[:-len('.dylib')] + suffix + '.dylib'

Lib/email/feedparser.py
364:                                preamble[-1] = lastline[:-len(eolmo.group(0))]
408:                                payload = payload[:-len(mo.group(0))]
496:                        line = line[:-len(mo.group(0))]

Lib/email/_header_value_parser.py
2796:                    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)]

Lib/http/cookiejar.py
1059:                host_prefix = req_host[:-len(domain)]

Lib/test/test_importlib/import_/test_relative_imports.py
66:                uncache_names.append(name[:-len('.__init__')])
143:                            create[-1][:-len('.__init__')], count))

Lib/test/test_importlib/util.py
224:                import_name = name[:-len('.__init__')]
373:                import_name = name[:-len('.__init__')]

Lib/test/test_tools/test_i18n.py
107:                creationDate = creationDate[:-len('\\n')]

Lib/tkinter/test/runtktests.py
47:                        ".%s.%s" % (pkg_name, name[:-len(py_ext)]),

@brettcannon
Copy link
Member

Here are the matches for string[len(prefix):]:

Lib/ctypes/macholib/dyld.py
92:        yield os.path.join(executable_path, name[len('@executable_path/'):])

Lib/distutils/filelist.py
325:            pattern_re = r'%s\A%s' % (start, pattern_re[len(start):])

Lib/distutils/sysconfig.py
208:                ldshared = newcc + ldshared[len(cc):]

Lib/distutils/util.py
456:                dfile = dfile[len(prefix):]

Lib/email/feedparser.py
458:                    epilogue[0] = firstline[len(bolmo.group(0)):]

Lib/email/_header_value_parser.py
1197:    value = value[len(atext):]
2240:    value = value[len(ttext):]
2282:    value = value[len(attrtext):]
2323:    value = value[len(attrtext):]
2913:        to_encode = to_encode[len(to_encode_word):]

Lib/encodings/idna.py
128:    label1 = label[len(ace_prefix):]

Lib/http/cookiejar.py
1920:                line = line[len(header):].strip()

Lib/idlelib/help.py
163:                    d = d[len(self.hprefix):]

Lib/idlelib/pyshell.py
1229:                        line = new_base_indent + line[len(orig_base_indent):]

Lib/importlib/_bootstrap_external.py
389:            head = head[len(stripped_path):]
404:        opt_level = optimization[len(_OPT):]

Lib/lib2to3/main.py
74:                                        filename[len(self._input_base_dir):])

Lib/lib2to3/refactor.py
233:                fix_name = fix_name[len(self.FILE_PREFIX):]
666:                yield line[len(prefix):]

Lib/multiprocessing/connection.py
759:    message = message[len(CHALLENGE):]

Lib/test/test_asyncio/test_sock_lowlevel.py
147:        data = data[len(headers):]
201:        data = data[len(headers):]

Lib/test/test_bigmem.py
519:        self.assertEqual(s[len(s):], _(''))

Lib/test/test_bytes.py
1646:            self.assertEqual(list(it), data[len(orig):])

Lib/test/test_codecs.py
407:            before_sequence = before.encode(self.encoding)[len(bom):]
408:            after_sequence = after.encode(self.encoding)[len(bom):]

Lib/test/test_compileall.py
69:            bc = file.read()[len(metadata):]
233:            modpath = mod[len(self.directory+os.sep):]

Lib/test/test_email/__init__.py
153:                testnameroot = 'test_' + name[len(paramsname):]

Lib/test/test_list.py
108:            self.assertEqual(list(it), data[len(orig):])

Lib/test/test_logging.py
1805:        packet = request.packet[len(slen):]

Lib/test/test_sax.py
782:            self.xml('<foo a="1.0">Hello</foo><bar b="2.0"></bar>')[len(self.xml('')):])

Lib/test/test_socket.py
2551:            *(args + self.sendmsg_to_server_defaults[len(args):]))

Lib/test/test_ssl.py
168:    return has_tls_version(name[len('PROTOCOL_'):])

Lib/test/test_tarfile.py
178:            self.assertEqual(fobj.read(), data[len(line):],

Lib/test/test_tempfile.py
84:        nsuf  = nbase[len(nbase)-len(suf):]

Lib/test/test_tools/test_i18n.py
47:                line = line[len('msgid '):]

Lib/tkinter/test/runtktests.py
36:            pkg_name = dirpath[len(basepath) + len(os.sep):].replace('/', '.')

Lib/wsgiref/handlers.py
567:            environ['PATH_INFO'] = path[len(script):]

@brettcannon
Copy link
Member

I just realized I botched my ripgrep search, so there's more:

Lib/pathlib.py
876:            name = name[:-len(old_suffix)] + suffix
Lib/argparse.py
359:                        lines[0] = lines[0][len(indent):]
2024:            positionals[:] = positionals[len(arg_counts):]

Lib/locale.py
146:        right_spaces = s[len(stripped):]

Lib/nntplib.py
227:                token = token[len(h):] if token else None

Lib/modulefinder.py
605:                new_filename = r + original_filename[len(f):]

Lib/ntpath.py
659:                spath = new_unc_prefix + path[len(unc_prefix):]
661:                spath = path[len(prefix):]

Lib/zipimport.py
174:            key = pathname[len(self.archive + path_sep):]

@vstinner
Copy link
Member

Lib/ctypes/macholib/dyld.py
64:                yield path[:-len('.dylib')] + suffix + '.dylib'

The full code is:

            if path.endswith('.dylib'):
                yield path[:-len('.dylib')] + suffix + '.dylib'
            else:
                yield path + suffix

removesuffix() cannot be used directly here.


Lib/email/feedparser.py
364:                                preamble[-1] = lastline[:-len(eolmo.group(0))]

The full code is more complex:

 NLCRE_eol = re.compile(r'(\r\n|\r|\n)\Z')
(...)
                            eolmo = NLCRE_eol.search(lastline)
                            if eolmo:
                                preamble[-1] = lastline[:-len(eolmo.group(0))]

removesuffix() cannot be used directly: it doesn't support regex. Same for the two following examples using mo.group()


Lib/email/_header_value_parser.py
2796:                    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)]

Full code doesn't use endswith(), the suffix is always removed:

                if part.syntactic_break:
                    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)]

removesuffix() cannot be used.


Lib/http/cookiejar.py
1059:                host_prefix = req_host[:-len(domain)]

removesuffix() cannot be used.

I didn't check other examples.

@bskinn
Copy link
Contributor

bskinn commented Apr 10, 2020

IMO these instances where it is not possible to use removesuffix() are a weak argument against adoption of the removeaffix() methods of the PEP.

The removeaffix() methods, as proposed, will have a specific, constrained behavior (which, to note, I would anticipate using with some frequency in, e.g., my data analysis code). If/once adopted, the use of a removeaffix() method at a given location will implicitly communicate that the action needed in that location is exactly this specific behavior. In other cases, where the removeaffix() behavior cannot be used, the use of more complex code has the potential to signal the reader that the required logic is correspondingly more complex.

@vstinner
Copy link
Member

I wrote https://github.com/python/cpython/pull/19455/files to see how the PEP 616 could be used in the stdlib.

@sweeneyde
Copy link
Member Author

Even in situations where it "can't be used" because it would double-check whether the affix matches, it could actually still be used, and is even more performant in many cases by removing the global len lookup:

\cpython> .\PCbuild\win32\python.exe -m pyperf timeit -s "a = 'foobar'; b = 'bar'" "a.removesuffix(b)"
Mean +- std dev: 89.0 ns +- 10.3 ns

\cpython> .\PCbuild\win32\python.exe -m pyperf timeit -s "a = 'foobar'; b = 'bar'" "a[:-len(b)]"
Mean +- std dev: 141 ns +- 18 ns

For instance, although it's technically redundant, we could still write:

            if path.endswith('.dylib'):
                yield path.removesuffix('.dylib') + suffix + '.dylib'
            else:
                yield path + suffix

While not as clear-cut, it's still a benefit in my opinion.

@vstinner
Copy link
Member

@sweeneyde: I don't think that this issue is the right place to discuss that.

@vstinner
Copy link
Member

@sweeneyde: I don't think that this issue is the right place to discuss that.

I suggest to move the "can't be used" discussion in https://github.com/python/cpython/pull/19455/files PR ;-)

@vstinner
Copy link
Member

The Python Steering Council accepts the PEP 616:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants