show: elide long strings #40724

Closed
StefanKarpinski opened this issue May 5, 2021 · 1 comment
Labels: display and printing, feature, strings

Comments

@StefanKarpinski
Member

StefanKarpinski commented May 5, 2021

We helpfully avoid dumping huge arrays into your REPL output if you look at them, eliding the middle and showing just the corners. If someone has a gargantuan string, however, we just dump the whole thing in the REPL. Perhaps we should elide long strings as well? Inspiration: https://stackoverflow.com/questions/67400862/what-is-the-option-in-julia-repl-that-allows-only-a-limited-output. I'm not keen on just putting ... in the middle of the string data like that, but perhaps we could show the quoted head and tail of the string separated by ⋯ if the string would take up more than, say, five lines:

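function show_elided(
    io    :: IO,
    str   :: AbstractString;
    dots  :: AbstractString = " ⋯ ",
    width :: Integer = displaysize(io)[2],
    lines :: Integer = 5,
    min   :: Integer = 100,
)
    ldots = length(dots)
    # character budget: `lines` full rows minus the marker and four quote characters
    chars = max(min, lines*width - ldots - 4)
    # index of the last head character and of the first tail character;
    # each call steps over at most about chars/2 characters, never the whole string
    head = nextind(str, 0, (chars+1) ÷ 2)
    tail = prevind(str, ncodeunits(str) + 1, chars ÷ 2)
    # elide only when the skipped middle is longer than the marker plus two extra quotes
    if ldots + 3 < tail - head
        show(io, @view str[1:head])
        print(io, dots)
        show(io, @view str[tail:end])
    else
        show(io, str)
    end
end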

Example usages (randstring comes from the Random standard library, so run `using Random` first):

julia> show_elided(stdout, randstring(10)); println()
"L8DVECZcbZ"

julia> show_elided(stdout, randstring(100)); println()
"EL4zwADuvfloLuw6XczdXx5OidDpIac5RJlFdgVG1PGqYXi3agVzoLTahRPPd3xf3woyCpm6Lyv8IHPvqNgk
AvX7nI9TfKkLX6P6"

julia> show_elided(stdout, randstring(423)); println()
"GZtfbm7dHJdxeL4V2TZ1sowuiMIdFjM3kvhozFPjIENmwT890buJEpAVEzDqN2K6ERTPHny5w87OwQFk5FZQ
hAtG1jQSUkmjCm9dez21E290roOy1k53kVZxyoS99uoMiKPSMSqhRBlEAKDJdx1c0V22Kp2FjFyhPExKj4NPJ
K8gyhHwaKjtfsuHIsc2UcUJ494MvcSpcs8tsYp78lWX5wBKPOXf7UcFzWiuHLA0sWak1NK5dELpMVfbXzwrxY
Do076szkabJkiatBdxKcOiW8fLZSzLWqhhxGwfQSVtlyrL7NSLplpGsCAxFHIXUEmO79zOpVm2hKlPjYTFoO2
FbKpQ47NZ1SclHogkknZzI2mw6EpHrphoTByel7AhAwZPYPqiL2qul43aDKpXZUuAE9zy8rnCHqbSdjl1Ilr"

julia> show_elided(stdout, randstring(424)); println()
"PT9NbVKIYs4ZSrqVQddAgAFKEel5zphoiERB4xrmMxBxD7iyazi4yFK8PWOoF76kxAMm0JIDkwIaCqBBIWKs
nvZL3gshtXBQiEI3kYTSSzopDbyrXn7hd1nqr80ZyvQ4m7wbUQH17jNndhtBbf2LsZ4iyX1Ioe6LN3yirDfKm
rtttt2HC3QjrNaPSLD9Qgqn3Wjj9eexXO5VLUe4y" ⋯ "YVw7kmAafXlORaNHs5go27fTCRmhvoZXhfcDfy71
Kj1PtZ851Ote9PQHi4TVzi6Ke9jLUKBIoRkC7Srbh9yDmDRVUmmrmkfHcQsBScTsZRgrVRzeYgeIfHs2xiQby
2xsKT2jVkbPzryYF6Zpceg8gokRfjz2oPN4gJ9eMPgqtXK3TY7gwWv1WAODhEQIEU3fkLDo1ihFDI8zWTFvV"

julia> show_elided(stdout, randstring(10000)); println()
"JVoh8h7GkIcyWBAq3CvfRLXkXe07EbdmkFH3tCQim0PktYTb391XPDYPdOq3pMmudLRlkqmmOkXRQxv6K38f
ASLm27XmFWHTX6FvLtEqRDQ3IMkffXQiCIMPwiQiOO6gncJplhuUJnUQdwXkKKY9R8iGq0Z32OGcNMqZr89pf
uiToerbGJOVGb73pnujVxJJ73RBFC7Sk7G9oEdnQ" ⋯ "aUyvT1lfMIrE90JtKrDk9hn9sAFSwOiLSXvc6xKx
lAoE0RqZQBeNWHETvQEAkRLUVDQYMwWq9gaqCNh6lMCwLQilqVUDC9kjhTqkaQvr75ICjYh682JTAMUMPuiGh
Tq3rQarWguMwejI1SYgJhaPXLU0dV1yT5s1dtzepEIPKXSmBuWW2UuANA7VH15ELLwswWAvuQiMujCATIAk9"

I have inserted line breaks where my terminal wraps the screen so that you can see how this would look in a terminal (GitHub's code display doesn't wrap lines). The last example shows that even for a very long string we only show five lines with the ellipsis in the middle. The middle two examples with string lengths of 423 and 424 show that I've got the cutoff logic for when to do the elision correct: 423 fits exactly in five lines whereas 424 wouldn't, so it gets elided.
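The 423/424 boundary can be checked by replaying the index arithmetic. This is a minimal sketch, assuming an ASCII string (so character counts and code-unit indices coincide) and the 85-column terminal implied by the wrapped examples. `would_elide` is a hypothetical helper, not part of the proposal, and `ldots = 3` is an assumed marker length.

```julia
# Hypothetical helper: replays show_elided's arithmetic for an ASCII string
# of n characters, where nextind/prevind reduce to plain addition/subtraction.
function would_elide(n::Integer; width::Integer = 85, lines::Integer = 5,
                     ldots::Integer = 3, min::Integer = 100)
    chars = max(min, lines*width - ldots - 4)
    head = (chars + 1) ÷ 2        # nextind(str, 0, (chars+1) ÷ 2)
    tail = n + 1 - chars ÷ 2      # prevind(str, ncodeunits(str) + 1, chars ÷ 2)
    return ldots + 3 < tail - head
end

would_elide(423)   # false: 423 characters plus 2 quotes fill 5 × 85 columns exactly
would_elide(424)   # true: one character too many
```

As long as the `min` floor does not kick in, `tail - head` equals `n + 1 - chars`, so the test simplifies to `n + 2 > lines*width`: the string is elided exactly when it and its two quotes no longer fit. The marker length cancels out of the decision.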

Note that care needs to be taken with the implementation to avoid doing O(ncodeunits(str)) work, which is easy to do accidentally, e.g. by checking whether length(str) is larger than some minimum. This implementation avoids that by advancing a fixed number of characters from the beginning and from the end of the string and then checking whether those indices overlap. We never figure out exactly how many characters the string has, but we know whether it has enough that it needs eliding, and we know how much of the head and tail to print. The middle of a long string is never examined at all.
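The bounded walk from both ends can be isolated into a small predicate. This is a sketch of the technique only: `has_at_least` is a hypothetical name, and it relies on the documented behaviour that `nextind` and `prevind` with a count keep moving one index per step once they run off either end of the string instead of throwing.

```julia
# Hypothetical helper: true if `str` has at least `chars` characters.
# Steps over at most `chars` characters in total, however long `str` is.
function has_at_least(str::AbstractString, chars::Integer)
    head = nextind(str, 0, (chars + 1) ÷ 2)               # walk forward from the start
    tail = prevind(str, ncodeunits(str) + 1, chars ÷ 2)   # walk backward from the end
    return head < tail                                    # the two walks did not meet
end

big = "\u00e9"^1_000_000      # two million code units of two-byte characters
has_at_least(big, 100)        # true, after visiting only the first and last 50 characters
has_at_least("日本語", 3)      # true
has_at_least("日本語", 4)      # false
```

By contrast, `length(big) >= 100` would decode every character before answering.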

Another consideration is character width: this code assumes that all characters have a column width of one. Of course, some characters are two columns, others are zero, and sometimes sequences of characters combine into a single output glyph. Some characters will also be escaped when shown in a string literal instead of being printed as-is. Since we can't know the correct printed width of a string in a given terminal and font, the approximation of counting each character as a single column seems fine. Worst case, instead of printing exactly `lines` lines of text, we'll print a little more or a little less. NBD.
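To see how far the one-column-per-character approximation can drift, Base's `textwidth` gives an estimate of the printed width to compare with `length`. A small illustration (the sample strings are arbitrary, and `textwidth` itself is only an estimate of what a terminal will do):

```julia
samples = ["abc", "日本語", "e\u0301"]
for s in samples
    # length counts characters; textwidth estimates terminal columns
    println(repr(s), ": ", length(s), " characters, ", textwidth(s), " columns")
end

# Escaping changes the count as well: a tab is one character in the string
# but is shown as the two characters \t inside the quotes.
shown = sprint(show, "a\tb")
length(shown)   # 6, versus 3 characters in the original string
```

Here the CJK string is three characters but six columns, and the combining accent in the last sample adds a character without adding a column.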

@miguelraz
Contributor

This is neat - didn't know I needed it but now I know I do. Nice!
