Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: eachline method for strings #6933

Closed
wants to merge 1 commit into from
Closed

WIP: eachline method for strings #6933

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented May 23, 2014

eachline variant for strings with tests included. It should be compatible with eachline(stream::IO) and produce the same results for the same textual input.

Any and all style/implementation feedback is very much welcome, I am very much a novice when it comes to Julia and the standard library code base.

@ivarne
Copy link
Member

ivarne commented May 23, 2014

This seems like nice functionality (and a +1 for including tests), but I think we should have a discussion about API and naming.

Isn't this just an iterator version of split with a hardcoded split on '\n'. Maybe we should have a splititr instead, that would mimic the interface of split? Usually we have only iterator versions of such functions and document collect(...) for those who need an array, but that would not make much sense for split.

@kmsquire
Copy link
Member

The way this is typically done is to use

io = IOBuffer(str)
for eachline(io)
   # ... do stuff
end

I would think this is sufficient, but perhaps it should be documented better?

@IainNZ
Copy link
Member

IainNZ commented May 23, 2014

I kinda like it, and I'd certainly never (think to) create an IOBuffer just to do it. A splititr function is even cooler though.

@kmsquire
Copy link
Member

Another option would be to define

eachline(s::String) = eachline(IOBuffer(s))

That doesn't handle splititr, of course.

@ninjin, the code itself looks quite nice, very Julian. Cheers!

@ghost
Copy link
Author

ghost commented May 23, 2014

@ivarne: I tend to agree that a splititr is a natural next step and something I should have considered it. I also agree with @IainNZ that the IOBuffer approach did not come naturally to me. @kmsquire: It makes me happy that I am picking up the idioms, learning Julia has been wonderful.

If we had a splititr implementation, eachline would simply become:

eachline(s::String) = splititr(s, '\n')

And we could get some rather awesome string processing loops, like this slightly artificial example for TSV data:

for line in eachline(s)
    for column in splititr(line, '\t')
        # Then a miracle occurs.
    end
end

It is getting a bit late here (JST), but I will try to take the time to cook something up this weekend. First, I will have a look at the current split implementation. Possibly it would be nice to use regexps for the splititr implementation, or would it be better to keep it simple with only character/string matches? Also, should I close this pull request and open a new one or change the title of this one if I continue to work on a splititr variant?

Taking the risk to sound a bit controversial, but the the IOBuffer approach evokes something Java-esque in me. Having to wrap data in order to call rather basic functionality.

@JeffBezanson
Copy link
Member

The iterator object should not be mutable here; this kind of iteration could work exactly like normal string iteration (by character), just advancing to the next delimiter instead of to the next character boundary.

It's also a good idea to generalize EachLine to any delimiter; not much reason not to do that.

@ghost
Copy link
Author

ghost commented May 29, 2014

I just had some pleasant time to sink into this. What I did was that I went with the simple one-liner from @kmsquire, kept the tests, force-pushed to my branch. The reasoning behind this is that it is most likely better to create a new branch for splititr and that using the existing code for eachline(::IO) ensures that it behaves the same way for eachline(::String). I have hacked together an initial version of splititr and will create a separate branch and a new WIP PR for it. The goal will be to cover all the functionality of split.

Also, thanks to @JeffBezanson for the feedback. I think I get the intended usage of the iterator protocol now, but more about that in the upcoming WIP splititr PR.

@ghost
Copy link
Author

ghost commented May 29, 2014

The build failed for clang and passed for gcc, I am pretty sure the failure is unrelated to the PR.

@JeffBezanson
Copy link
Member

This seems pretty unobjectionable to me, though I realized that IOBuffer actually only accepts ByteStrings. We'd need new code to handle general strings. Have to decide how important that is.

@StefanKarpinski
Copy link
Member

Yes, this seems like a good thing to me. I suspsect that the best way to shake out issues and see how pressing they are may be to merge this part and then see what happens.

@JeffBezanson
Copy link
Member

Or if we merge #7027, this is not necessary.

@ghost
Copy link
Author

ghost commented Jun 11, 2014

I tend to agree with the last statement by @JeffBezanson, once I get around to implementing #7027 it would be more natural to use it since it will work for any String. I will change the title of this PR so that it won't be merged my accident at some point and await the completion of #7027 to finally close this PR.

@ghost ghost changed the title eachline method for strings WIP: eachline method for strings Jun 11, 2014
@jiahao jiahao force-pushed the master branch 3 times, most recently from 6c7c7e3 to 1a4c02f Compare October 11, 2014 22:06
@ihnorton ihnorton added the strings "Strings!" label Oct 7, 2015
@vtjnash
Copy link
Member

vtjnash commented Jun 26, 2019

eachline(IOBuffer(s)) is only for utf8 String, but adding this on top of #7027 seems good

@ghost ghost closed this by deleting the head repository Jan 28, 2023
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants