Performance improvements with text processing #14196

Closed
timholy opened this issue Nov 30, 2015 · 5 comments
Labels
performance Must go faster

Comments

@timholy
Member

timholy commented Nov 30, 2015

I don't have time to make this a PR right now, so this is just a placeholder so it doesn't get lost: see https://groups.google.com/d/msg/julia-users/2uaRs3JIdfw/hMLdj6wxCwAJ. The most important change was making the string encoding a type parameter of what is effectively EachLine, and then using that in downstream code to prevent type instability.

Without this, on the "Hungarian Wikipedia" test data set (see https://github.com/juditacs/wordcount), some lines come back as UTF8String and some as ASCIIString. Might as well keep it consistent.
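To make the idea concrete, here is a minimal sketch of a line iterator that carries the string type as a type parameter. It is not the code from the linked thread and not the `Base.EachLine` implementation; the names `TypedLines` and `total_length` are hypothetical. It is written for current Julia, where `ASCIIString` and `UTF8String` no longer exist, so `String` stands in for the chosen encoding.

```julia
# Hypothetical wrapper around an IO stream (an assumption for illustration,
# not Base.EachLine). S is the string type every line is converted to, so
# the element type is known to the compiler from the iterator's type alone.
struct TypedLines{S<:AbstractString, T<:IO}
    io::T
end

# Convenience constructor: the caller picks S, the IO type is taken from the argument.
TypedLines{S}(io::IO) where {S<:AbstractString} = TypedLines{S, typeof(io)}(io)

# Advertise the element type and the fact that the length is not known up front.
Base.eltype(::Type{TypedLines{S, T}}) where {S, T} = S
Base.IteratorSize(::Type{<:TypedLines}) = Base.SizeUnknown()

# Each line is converted to S, so every iteration yields the same concrete type.
function Base.iterate(itr::TypedLines{S}, state=nothing) where {S}
    eof(itr.io) && return nothing
    return (convert(S, readline(itr.io)), nothing)
end

# Downstream code can now be compiled for a single, concrete line type.
function total_length(lines)
    n = 0
    for line in lines
        n += length(line)
    end
    return n
end

lines = TypedLines{String}(IOBuffer("alma\nkörte\n"))
println(total_length(lines))
```

Because the line type is part of the iterator's type, a loop such as the one in `total_length` sees one concrete string type rather than a mix decided per line at run time.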

@tkelman tkelman added the performance Must go faster label Nov 30, 2015
@JeffBezanson
Member

ref #1792

@nalimilan
Member

Looks like this is another issue which will get fixed by merging ASCIIString and UTF8String into a single String type (#14383), which should get rid of type instability in EachLine.

@timholy What other lessons do you draw from that thread regarding text processing? I must say it's not immediately obvious to me.

@timholy timholy self-assigned this Feb 20, 2016
@timholy
Member Author

timholy commented Feb 20, 2016

Thanks for the reminder. I have a grant deadline of Wednesday, after which I'll try to remember to come back to this.

@nalimilan
Member

No worries. I was just trying to link related issues together.

@timholy
Member Author

timholy commented Feb 10, 2017

Fixed by the strings changes.

@timholy timholy closed this as completed Feb 10, 2017