Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing a string method that works for iterators and generators #57180

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Priynsh
Copy link
Contributor

@Priynsh Priynsh commented Jan 28, 2025

fixes #57072 , allowing queries like
String(Iterators.map(c->c+1, "Hello, world"))
to work. also adds tests for the same :)

Comment on lines 1297 to 1301
collected = collect(x)
if !(isa(collected, AbstractVector) && all(x -> isa(x, Char), collected))
throw(MethodError(String, (x,)))
end
return String(collected::AbstractVector{<:AbstractChar})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot guarantee that collected is an AbstractVector{<:AbstractChar}, even if it only contains Char.

Also, I'm not too fond of throwing a method error here. The method does exist, but the data may not be applicable. Consider an iterator that may return Char or Nothing. In this case, it will only throw a method error sometimes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about collected = collect(Char, itr) ? That would avoid if statement with the the all( isa) call, which requires an extra wipe through the data.
The call to collect guarantees an Array{Char,N} return value, where N == 1 if the iterator is 1-dimensional, N == 0, if the "iterator" is a number.

The String_iterator and call of IteratorSize would be avoided altogether by this:

String(itr) = String(collect(Char, itr))
String(x::AbstractArray{Char}) = throw(MethodError(String, (x, )))

Comment on lines +1281 to +1284
- **String(x::AbstractIterator)**
- Converts an iterator into a string.
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator.
- Ensures that the result is a valid string representation composed solely of characters (`Char`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just make this a concise one-liner, which is more efficient than your version too (since it needs to make fewer intermediate copies):

String(x) = sprint(join, x)
Suggested change
- **String(x::AbstractIterator)**
- Converts an iterator into a string.
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator.
- Ensures that the result is a valid string representation composed solely of characters (`Char`).
- **String(any_iterable)**
- prints an iterable object into a string using `join`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wait, do we want this? This has completely different semantics from the existing String function. For example, with this change we get:

julia> String(Int8[101, 102])
"101102"

julia> String(UInt8[101, 102])
"ef"

To me this seems like the wrong meaning of the type constructor String. This should only be used to convert things that are already a little bit 'string-like' and should not be conflated with print, which this implementation suggests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. This seems more like write then:

String(x) = sprint(io -> foreach(c -> write(io, c), x))

The print behavior was from looking at the implementation of String(::AbstractVector{<:AbstractChar}), but that probably also can also be calling write, since it is a case where print is defined to be a call to write.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea, Examples Char(0x2ee8) and String([0x2ee8]) should be consistent. Also the dependency on Litte/Big-Endiness is not expected.
I think the prototype should be like ... String(collect(Char, x)) as I mentioned before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, so the problem with making just one liners is the error statements end up being loose for cyclic iterators, or even integers in some cases. So sticking with the ... String(collect(Char, x)) prototype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be consistent

That seems to only works for ASCII examples though. For utf8 bytes, the existing String methods already define that those results are not consistent. For your example, I think the existing method that behaves like that is called transcode, though it would be hard to generalize to this (since it doesn't in general iterate the contents of the source but rather decode it)

new implementation without parsing twice
adding new tests for non-ascii characters
@Priynsh Priynsh requested a review from KlausC January 30, 2025 12:38
@Priynsh Priynsh marked this pull request as ready for review February 4, 2025 10:56
@Priynsh
Copy link
Contributor Author

Priynsh commented Feb 6, 2025

bump.

@Priynsh Priynsh requested a review from vtjnash February 6, 2025 11:21
Co-authored-by: Jameson Nash <[email protected]>
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted in the comments, this only works correctly for integers encoding for ASCII. It maybe should type assert that it is AbstractChar or that the sizeof it is 1 byte?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement String(::Base.Generator)
4 participants