Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement further str split variants via a builder API #168

Closed
pitaj opened this issue Jan 23, 2023 · 8 comments
Closed

Implement further str split variants via a builder API #168

pitaj opened this issue Jan 23, 2023 · 8 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@pitaj
Copy link

pitaj commented Jan 23, 2023

Proposal

Problem statement

There are multiple "options" provided one the str type:

  • reverse direction (rsplit...)
  • inclusive of the separator (split_inclusive)
  • terminated instead of separated (split_terminator, rsplit_terminator)
  • limited number of splits allowed (splitn, rsplitn)
  • split exactly once into a tuple (split_once, rsplit_once)

Currently, various useful combination of these options are missing from the str type (non-exhaustive):

  • inclusive + reversed (rsplit_inclusive)
  • inclusive + limited (splitn_inclusive, rsplitn_inclusive)
  • inclusive + once (split_inclusive_once, rsplit_inclusive_once)

Additionally, there may be demand for before- and after-inclusive versions (split_inclusive is after-inclusive).

Adding all of the combinations would quickly balloon the API surface of str.
The combinatorial explosion would only get worse if more options are added.

Motivation, use-cases

A comment on a PR adding more combinations was the inspiration for this proposal:

We're hesistant to add this many similar types and functions.
...
An idea that was brought up is to not add more iterator types, but to add new methods to the existing ones, to cover most/all of the functionality this PR adds.

Example: rsplit_inclusive

"a+b+c".split_inclusive('+').collect::<Vec<_>>() // => vec!["a+", "b+", "c"]
// reversing the iterator is not enough
"a+b+c".split_inclusive('+').rev().collect::<Vec<_>>() // => vec!["c", "b+", "a+"]
// but some desire vec!["+c", "+b", "a"]

Example: splitn_inclusive

let v: Vec<&str> = "Mary had a little lambda".splitn_inclusive(' ').collect();
assert_eq!(v, ["Mary ", "had ", "a little lambda"]);

Example: split_inclusive_once

assert_eq!("cfg=foo=bar".split_inclusive_once('='), Some(("cfg=", "foo=bar")));

Solution sketches

My proposal is to add a builder API to the existing Split type. This consists of a handful of functions that modify the splitting behavior:

impl<'a, P: Pattern<'a>> Split<'a, P> {
    // `s.split(p).once()` acts like `s.split_once(p)`
    pub fn once(self) -> Option<(&'a str, &'a str)>
    // `s.split(p).to_terminated()` acts like `s.split_terminator(p)`
    pub fn to_terminated(self) -> Self
    // `s.split(p).to_inclusive()` acts like `s.split_inclusive(p)`
    pub fn to_inclusive(self) -> Inclusive<Self>
    // `s.split(p).to_reversed()` acts like `s.rsplit(p)`
    pub fn to_reversed(self) -> Reversed<Self>
    where
         P::Searcher: ReverseSearcher<'a>
    // `s.split(p).with_limit()` acts like `s.splitn(p)`
    pub fn with_limit(self) -> Limited<Self>
}

And these modifiers can be combined to produce any of the existing splitting functions:

Existing fn Builder chain
split(pat) split(pat)
split_inclusive(pat) split(pat).to_inclusive()
rsplit(pat) split(pat).to_reversed()
split_terminator(pat) split(pat).to_terminated()
rsplit_terminator(pat) split(pat).to_terminated().to_reversed()
split(pat).to_reversed().to_terminated()
splitn(n, pat) split(pat).with_limit(n)
rsplitn(n, pat) split(pat).with_limit(n)
split_once(pat) split(pat).once()
rsplit_once(pat) split(pat).to_reversed().once()

Plus more that aren't currently available:

Imaginary fn Builder chain
rsplit_inclusive(pat) split(pat).to_inclusive().to_reversed()
split_inclusive_once(pat) split(pat).to_inclusive().once()
rsplit_inclusive_once(pat) split(pat).to_inclusive().to_reversed().once()
splitn_inclusive(n, pat) split(pat).to_inclusive().with_limit(n)
rsplitn_inclusive(n, pat) split(pat).to_inclusive().to_reversed().with_limit(n)

All of the above (with the exception of once variants) return a type that implements Iterator, DoubleEndedIterator, etc as the existing functions do. The difference is that the type returned is not a standalone struct, but instead a combinator of generic structs. For instance, split(pat).to_reversed() returns Reversed<Split<'a, P>>, split(pat).to_inclusive().to_reversed() returns Reversed<Inclusive<Split<'a, P>>>, etc.

A proof of concept is available in the str_splitter crate. When desired, I can quickly put forward an implementation PR, since the base code there is directly from the standard library source.

This approach has several benefits:

  • no combinatorial explosion of fns or types (just one combinator struct for each modifier)
  • easily extensible with any future modifiers (initiator vs terminator, including the sep before vs after)
  • produces fully useable iterators, just like the existing functions
  • builders are familiar to users of the Rust standard library and ecosystem at large

Alternate solutions

One alternate solution put forward in the aforementioned PR comment is adding extra next functions on the Split struct that will each return the next substring of the given modifier:

For example, std::slice::Split could have a method or two like .next_with_separator() or .next_separator() or .next_including_separator() or something in that direction. We didn't discuss this option in detail, but we'd like to see some exploration of alternatives that do not involve adding so many similar functions and types. An alternative with fewer functions and types might also be easier to document, and easier to learn and use for users.

The downside of this approach is that those would not be usable as iterators in their own right. Additionally, adding more orthogonal modifiers would result in the same combinatorial explosion (though to a lesser degree) of functions on the Split type.

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

@pitaj pitaj added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Jan 23, 2023
@pitaj pitaj changed the title Implement further str split combinations via a builder API Implement further str split variants via a builder API Jan 23, 2023
@clarfonthey
Copy link

Adding an extra comment here that it would be nice to also add the ability to include indices of splits, as mentioned in #222.

@Amanieu
Copy link
Member

Amanieu commented May 18, 2023

We discussed this in the libs meeting today. We don't like the large API surface that this introduces, and think this would be a better fix for an exotic-splits crate which provides an extension trait for str (and slice as well, potentially).

@Amanieu Amanieu closed this as completed May 18, 2023
@pitaj
Copy link
Author

pitaj commented May 18, 2023

I don't really understand how this is a large API surface. My implementation in the str_splitter crate has only 5 functions and 3 structs (or a single struct with const generic parameters). Sure, it's not just a couple functions, but it's also not 10.

Any such external crate will have to reimplement Pattern until it's stabilized and will also need to reimplement all of the splitting machinery from the standard library. That seems very unfortunate.

It's also weird to send people out of the standard library for rsplit_inclusive when the forward variant exists, as one example.

@Amanieu
Copy link
Member

Amanieu commented May 18, 2023

One of the particular concerns that was raised is that builder APIs are not very ergonomic to work with in practice, except in certain cases where there is a potentially unbounded number of options to apply (e.g. Command, Iterator). A better API design to avoid code duplication would be a method that supports several operations via additional parameters (possibly using Option or enum).

@BurntSushi
Copy link
Member

A better API design to avoid code duplication would be a method that supports several operations via additional parameters (possibly using Option or enum).

I would want to see a concrete design proposal here, but I'm not sure I would agree with. I am personally a fan of builders and think they work quite nicely.

But I don't mean to make any universal claims. A non-builder API might work better here. Not sure.

@the8472
Copy link
Member

the8472 commented May 18, 2023

A builder might make sense if we expect it to generalize along additional dimensions in the future, something that a method can't handle. But such a thing could still first be designed in a crate.

@pitaj
Copy link
Author

pitaj commented May 19, 2023

One of the particular concerns that was raised is that builder APIs are not very ergonomic to work with in practice

The builder API proposed here is different from most in that every stage is a totally usable iterator. It doesn't require an explicit .build() step. It's more of a combinatorial approach, but easily chainable with methods.

To me, it feels quite ergonomic. It works well with editor suggestions.

A builder might make sense if we expect it to generalize along additional dimensions in the future

Such future extensions have already been proposed, such as left/right inclusivity and (slice, index) iterators.

But such a thing could still first be designed in a crate.

I'm not sure if this was just missed, but I have implemented this approach two different ways in the str_splitter crate. Take a look at the docs for Splitter as an example:

https://docs.rs/str_splitter/latest/str_splitter/combinators/struct.Splitter.html

All of the doctests in that crate passed when I last published it, but due to relying on nightly features it could have changed.

(Also it provides as_str instead of remainder as that change was introduced after I published)

@the8472
Copy link
Member

the8472 commented May 19, 2023

That's great. If there's a lot of demand for it in the future we can reconsider it for inclusion in the library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

5 participants