Can/should we take repr seriously? #33260

Closed
tkf opened this issue Sep 13, 2019 · 21 comments
Labels
display and printing Aesthetics and correctness of printed representations of objects.

Comments

@tkf
Member

tkf commented Sep 13, 2019

Currently, repr in Julia may be evaluatable, or parseable, or (often) just a string representation consumable only by humans. However, it would be much more useful if it were guaranteed to be evaluatable. For example, there are many uses of repr in base/loading.jl, where it serves as a simple IPC mechanism:

julia/base/loading.jl

Lines 1148 to 1151 in 9a8b2fd

code = """
append!(empty!(Base.DEPOT_PATH), $(repr(map(abspath, DEPOT_PATH))))
append!(empty!(Base.DL_LOAD_PATH), $(repr(map(abspath, DL_LOAD_PATH))))
"""

In the case of base/loading.jl, it is OK to use repr because the types used there are known to have an evaluatable repr. However, some basic types like PkgId cannot be used in this way:

julia> repr(Base.PkgId(InteractiveUtils))
"InteractiveUtils [b77e0a4c-d291-57a0-90e8-8db25a27a240]"

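A minimal sketch of how one might check which values currently round-trip through repr. The helper name `roundtrips` is hypothetical (it is not part of Base); it simply parses and evaluates the output of repr and compares the result with the original value, treating any parse or evaluation error as a failure.

```julia
using InteractiveUtils

# Hypothetical helper (not in Base): does parsing and evaluating repr(x)
# give back a value that is isequal to x?
function roundtrips(x)
    try
        return isequal(eval(Meta.parse(repr(x))), x)
    catch
        # Parse errors and evaluation errors both count as "does not round-trip".
        return false
    end
end

roundtrips([1, 2, 3])                      # vectors of integers round-trip
roundtrips("a \"quoted\" string")          # strings are quoted and escaped by repr
roundtrips(Base.PkgId(InteractiveUtils))   # the PkgId output above is not valid Julia code
```
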
Another concrete example I have in mind where repr can be useful is to create a plot "spec" using DataVoyager.jl and dump it into a runnable Julia script. Currently, using it needs a rather convoluted invocation like show(IOContext(stdout, :compact=>false), "text/plain", spec) because repr is not a reliable entry point for evaluatable code.

We have the Serialization module for serializing complex Julia objects. Unfortunately, it cannot be used for communicating between different Julia versions. For complex serialization it is probably a good idea to use something like JSON instead of Julia code. However, I think the above examples are good enough motivation to have an evaluatable repr for simple cases.

It was suggested to add a flag like parseable (e.g., #33178 (comment), #30683 (comment)) via the IOContext mechanism. I have a concern about this direction because it is practically impossible to enforce parsability this way (a brief discussion with @JeffBezanson in #30757 (comment)). I think the overloading API should make it clear that the implementer has to make the output evaluatable. This can be done by introducing an overloading API such as show(::IO, ::MIME"application/julia", obj) or repr(::IO, obj).
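
A rough sketch of what the MIME-based entry point could look like for a user-defined type. Everything here is an assumption for illustration: Base defines no methods for `MIME"application/julia"`, and `Point`, `JuliaCode`, and `evalrepr` are hypothetical names. The point is only that the implementer opts in by writing a dedicated method whose contract is "emit evaluatable code".

```julia
# Hypothetical example type.
struct Point
    x::Int
    y::Int
end

# Alias for the MIME type named in the proposal (an assumption; Base has no methods for it).
const JuliaCode = MIME"application/julia"

# Opt-in method: the implementer promises that the output is evaluatable Julia code.
Base.show(io::IO, ::JuliaCode, p::Point) = print(io, "Point(", p.x, ", ", p.y, ")")

# Hypothetical convenience wrapper around the 3-arg show method.
evalrepr(x) = sprint(show, JuliaCode(), x)

code = evalrepr(Point(1, 2))
restored = eval(Meta.parse(code))
```

Because there is no generic fallback in this sketch, calling `evalrepr` on a type without such a method fails when the string is produced rather than later when it is parsed.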

Questions:

  • Do we need an API to produce Julia code that is (very likely to be) evaluatable? (100% guarantee is impossible when using different versions of libraries and/or Julia)
  • How should it be implemented? Based on IOContext? New entry point (e.g., show(::IO, ::MIME"application/julia", obj))? Something else?
@JeffBezanson JeffBezanson added the display and printing Aesthetics and correctness of printed representations of objects. label Sep 13, 2019
@JeffBezanson
Member

I have a concern about this direction because it is practically impossible to enforce parsability this way

I don't really see why --- telling people to handle the flag doesn't seem any worse than telling people to add a particular method.

@vtjnash
Member

vtjnash commented Sep 16, 2019

Aside: eval-uatable and parse-able can be totally unrelated things, so it's not necessarily meaningful to say it would be both. There are also many things that don't have an eval representation, but can be serialized or printed.

@JeffBezanson
Member

A big part of the problem seems to be that it's hard to characterize how array/dataframe elements should be printed in the REPL. They don't need to be parseable, and we want to use a "nice"/non-parseable representation when possible, but we also want to print type info like the f0 suffix, and we want to quote strings. It's a mix of requirements that's hard to associate with a single function like show or print that should have a simple description.

@tkf
Member Author

tkf commented Sep 16, 2019

I don't really see why --- telling people to handle the flag doesn't seem any worse than telling people to add a particular method.

I believe relying on people to not forget something is a bad idea. People forgetting to define HasEltype is a good example. I think the overloading API should help people recognize that they must construct (at least) a parseable representation.

It's a mix of requirements that's hard to associate with a single function like show or print that should have a simple description.

Isn't that an argument in favor of adding an API for the repr implementation? It would decouple repr from display and print.

@JeffBezanson
Member

OK, adding repr(::IO, x) would be pretty simple and worth considering.

One thing I want to discuss more is, what are the use cases for non-parseable show? Are there any types where we want print, show, and repr to all be different? I'm skeptical. Again, it seems most of the complexity is in how we want array elements to look, rather than in needing many different kinds of printing per se. For example, given these definitions:

print: pretty, concise, doesn't need to be parseable
show: parseable

I think the desired array element printing is very close to:

default: print
numbers: print if type is implied by eltype, otherwise show
strings: show

Any examples where that would go wrong? And are there any other contexts like this?
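
The three rules above can be sketched as a small dispatch on the element value. This is only an illustration of the proposed rule, not how Base's array printing is implemented; `element_string` is a hypothetical name, and "type is implied by eltype" is interpreted here as the element's concrete type being identical to the container's eltype.

```julia
# Hypothetical helper illustrating the proposed element-printing rule.
function element_string(x, container_eltype::Type)
    if x isa AbstractString
        # strings: show (quoted and escaped)
        return sprint(show, x)
    elseif x isa Number
        # numbers: print if the type is implied by the eltype, otherwise show
        return typeof(x) === container_eltype ? sprint(print, x) : sprint(show, x)
    else
        # default: print
        return sprint(print, x)
    end
end

element_string(1.5f0, Float32)  # type implied by eltype, so no f0 suffix
element_string(1.5f0, Any)      # type not implied, so the f0 suffix is kept
element_string("a", String)     # strings stay quoted
element_string(:a, Symbol)      # default case uses print
```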

@tkf
Member Author

tkf commented Sep 16, 2019

One thing I want to discuss more is, what are the use cases for non-parseable show?

The only reason I can think of is backward compatibility. It would be great if show(::IO, x) were not used for the array's text/plain show and were used only for repr. I'm proposing a new entry point since it sounds difficult for the entire Julia community to switch to that. It is probably easier to introduce repr(::IO, x) and then deprecate show(::IO, x) while switching to Julia 2.0.

I understand that the co-existence of similar APIs is very unsatisfactory. But adding a new API is the only option I can think of to clean up the show infrastructure and make repr work while preserving backward compatibility (which of course may be due to my lack of imagination).

I think the desired array element printing is very close to:

default: print
numbers: print if type is implied by eltype, otherwise show
strings: show

I suggest not including this logic inside array printing. Rather, I think the array printer should just set compact and typeinfo for text/plain show, and the above logic should then be implemented inside the text/plain show of each type. This is because:

  • It's usable for other printing of collections (especially table-like data structures).
  • It avoids confusion that show can be used to pretty-print element values. (OK, maybe this is weak, given that not many people implement numbers)

@tkf
Member Author

tkf commented Sep 16, 2019

I think array printer should just set compact and typeinfo for text/plain show

It should also set displaysize to handle something like a vector of vectors, so that the inner array can switch to printing like [1, 2, 3] when the height is 1.
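
For reference, a small sketch of how a container can pass hints to element printing through IOContext properties, using `sprint` with its `context` keyword. The variable names are illustrative; `:typeinfo` and `:compact` are existing IOContext keys that `show` for floating-point numbers consults.

```julia
x = 1.5f0

# Without any hint, show keeps the Float32 suffix.
plain = sprint(show, x)

# With :typeinfo set to the element type, the suffix is redundant and is omitted.
implied = sprint(show, x; context = :typeinfo => Float32)

# With :compact => true, floats are printed with fewer digits.
full  = sprint(show, 1/3)
short = sprint(show, 1/3; context = :compact => true)
```

These keys are advisory: a `show` method is free to ignore them, which is part of why the thread debates whether a flag alone can guarantee anything.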

@oxinabox
Contributor

oxinabox commented Dec 2, 2019

I think we should be using MIME types to control this?
I thought we already were.

@vtjnash
Member

vtjnash commented Dec 2, 2019

Yes, we use MIME for that. We briefly used an IOContext flag for it (multiline instead of displaysize[2] == 1, but otherwise basically the same). It was a violation of the purpose of IOContext, and thus did badly—search the old issues for this. We could potentially test for both options (MIME(text/plain) + displaysize[2] == 1 is a valid test combo in 3-arg mime-show), but the caller would need to be prepared to handle multiline output then (since IOContext values are only advisory).

@tkf
Member Author

tkf commented Jan 30, 2020

I think #34387 is a big step for fixing the situation (thanks to fchorney!). Ref: #30901 (comment)

But not everything is resolved, since show(io::IO, ::MIME"text/plain", X::AbstractArray) still does not always set IOContext(io, :compact => true):

julia/base/arrayshow.jl

Lines 329 to 332 in 1d918dd

# 1) compute new IOContext
if !haskey(io, :compact) && length(axes(X, 2)) > 1
    io = IOContext(io, :compact => true)
end

That is to say, 3-arg show for element types still cannot reliably detect whether it is being called by the show of the container.

@JeffBezanson You suggested to keep this in #34387 (comment) (just for now?). Is there a plan to improve this?

Also, it would be nice if the fallback to 2-arg show

julia/base/arrayshow.jl

Lines 108 to 111 in 1d918dd

# If the output contains line breaks, try 2-arg show instead.
if occursin('\n', sx)
    sx = sprint(show, x, context=io, sizehint=0)
end

could be removed in a future version of Julia. If 2-arg show is meant to be used for repr, we should expect it to generate larger output on average. Furthermore, the fallback to 2-arg show motivates users to overload it to customize how a value is printed inside a container. This often conflicts with how repr works.

@goretkin
Contributor

goretkin commented May 9, 2020

Instead of requiring repr to return a Meta.parse-able string, I would consider a new uneval that returns an eval-able expression. Like eval (or perhaps more like macroexpand), you can provide uneval with a module, since names require different qualification depending on which module you evaluate them in. I think this is the reason for an IOContext :module key.

I can't tell if that punts most of the problem to now serializing expressions into strings, but I do like that it separates it, and I think it addresses #33260 (comment).

example:
OffsetArrays doesn't "repr it as you build it", so a less verbose version (filter out the parser provenance LineNodes, and begin blocks) of this would be great

import OffsetArrays: OffsetArray

uneval(x) = quote $x end

function uneval(o::OffsetArray)
  p = uneval(parent(o))
  a = uneval(axes(o))
  oa = uneval(OffsetArray)
  quote
    ($oa)($p, $a)
  end
end

a = OffsetArray(rand(3,3), (3:5, 3:5))
@show repr(a)
println("show:")
show(stdout, "text/plain", a)
println()

@show string(uneval(a))
@show eval(Meta.parse(string(uneval(a)))) == a
repr(a) = "[0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967]"
show:
3×3 OffsetArray(::Array{Float64,2}, 3:5, 3:5) with eltype Float64 with indices 3:5×3:5:
 0.390171  0.52951    0.153272
 0.267477  0.982007   0.303269
 0.438121  0.0148684  0.363546
string(uneval(a)) = "begin\n    #= /private/tmp/uneval.jl:10 =#\n    (begin\n        #= /private/tmp/uneval.jl:3 =#\n        OffsetArray\n    end)(begin\n            #= /private/tmp/uneval.jl:3 =#\n            [0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967]\n        end, begin\n            #= /private/tmp/uneval.jl:3 =#\n            (3:5, 3:5)\n        end)\nend"

eval(Meta.parse(string(uneval(a)))) == a = true

EDIT:
MacroTools is wonderful.

import MacroTools: prewalk, rmlines, unblock
@show string(prewalk(rmlines ∘ unblock, uneval(a)))
"(OffsetArray)([0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967], (3:5, 3:5))"

@mbauman
Member

mbauman commented May 29, 2020

The current repr is simply documented and defined to be not much more than sprint(show, _). I'd advocate for its deprecation in 2.0. I think any fully round-trippable function probably needs to be opt-in (with no default definition) and should have a new name completely separated from show.

@StefanKarpinski
Member

Why do we need a textual format that’s round trippable? I have yet to encounter a situation where this is necessary. Seems like a very hard and complex requirement to try to satisfy.

@goretkin
Contributor

Why do we need a textual format that’s round trippable? I have yet to encounter a situation where this is necessary. Seems like a very hard and complex requirement to try to satisfy.

For me, it makes it more enjoyable to use a REPL. I think it's the same justification for why people spend effort making configuration file formats that are both human- and machine-readable.

It would let a user print out some complicated nested data structure, select a piece of it, copy the selection, and paste it into the REPL to get that value. If we weren't beholden to text terminals, and we had smarter terminals with better copying and pasting, then it would likely obviate the need for a format like this, since you could just have a presentation format and, in parallel, an underlying representation that lets you reconstruct the value from the clipboard.

@StefanKarpinski
Member

I get that it’s nice, which is why repr is suggested to have this property. But handling all the tricky cases like circularity in data structures is really absurdly difficult. That’s why I’m in favor of having a convention of round-trippability rather than a hard requirement.

@tkf
Member Author

tkf commented May 30, 2020

Why do we need a textual format that’s round trippable? I have yet to encounter a situation where this is necessary.

As I've explained in the OP, Base uses repr, and there are other situations where this would be useful. For example:

That’s why I’m in favor of having a convention of round-trippability rather than a hard requirement.

I think an API that may or may not satisfy some property is rather hard to use because you can't rely on it. That's why I think @mbauman's suggestion for making this opt-in makes sense.

@StefanKarpinski
Member

  • Vega.jl: doesn't actually need to be parsable, just reasonably readable; if you need the original user input, you should save it, e.g. like regexes do.
  • PkgBenchmark.jl: only needs to be parsable for very limited types, serialization would be the better way to do this.
  • BenchmarkCI.jl: same kind of thing, only needs to work for strings and booleans.
  • Aqua.jl: same deal, only used for vectors of strings, serialization would be the better way to do this.

Again, I'm not saying that it's not nice if the output of repr is parseable, and you can use it that way if you're confident that the values you're printing are of limited types where repr does work. But insisting that repr be able to print arbitrary objects in such a way that parsing and evaling the resulting string gives you back an equal object is just way too much. Serialization does this without the requirement that the serialized representation be a valid Julia expression, and it's already super complicated and adds a ton of complexity. What's being proposed here would foist that kind of overhead and complexity on every call to the simple repr function.

Here's an example. How would you define repr for this type:

mutable struct X
    x::X
    function X()
        x = new()
        x.x = x
    end
end

Currently we print it like this:

julia> x = X()
X(X(#= circular reference @-1 =#))

How would you print this so that the printed expression evaluates back to an equal value? Or what about this array:

julia> a = Any[1, 2, 3, 4]
4-element Array{Any,1}:
 1
 2
 3
 4

julia> a[3] = a;

julia> repr(a)
"Any[1, 2, Any[#= circular reference @-1 =#], 4]"

Do we disallow calling repr on arrays?
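
To make the failure mode concrete, here is a small sketch of what happens if the circular array's repr is parsed and evaluated anyway. It assumes the output format shown above: the circular-reference marker is a Julia comment, so the parser silently drops it and the evaluated result has a different structure from the original.

```julia
a = Any[1, 2, 3, 4]
a[3] = a                      # a now contains itself

s = repr(a)                   # the cycle is printed as a #= ... =# comment
b = eval(Meta.parse(s))       # parses without error: the comment is simply ignored

# The original cycle is gone: the third element is a fresh empty array.
lost_cycle = isempty(b[3]) && b[3] !== b
```

So the string is parseable and evaluatable, yet does not round-trip, and nothing signals the problem at either print time or parse time.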

@tkf
Member Author

tkf commented Jun 1, 2020

Vega.jl: doesn't actually need to be parsable, just reasonably readable; if you need the original user input, you should save it, e.g. like regexes do.

I don't think so. It's reasonable (and already done) to support converting an arbitrary vega/vega-lite spec back to Julia syntax. For example, you'd want to print a spec as Julia code and then put it in a script so that it can be re-used for a different dataset.

Saving the original code is not an option, since a vega-lite spec can come from elsewhere, such as a GUI for constructing the spec: https://github.com/queryverse/DataVoyager.jl

Serialization does this

As I discussed in the OP, using Serialization is not appropriate sometimes. For example, you can't cross Julia versions or sessions with different sysimages.

Also, using Serialization (or something like JSON) is a lot more complicated than constructing code to be executed. I don't think it's appropriate for something like the precompilation mechanism in Base or for packages like PkgBenchmark.jl.

What's being proposed here would foist that kind of overhead and complexity on every call to the simple repr function.

That's why I said opt-in makes sense. Why not do it when it's not super complex?

Do we disallow calling repr on arrays?

If we have repr2 that is opt-in, it'd throw if you do repr2(a). It's very safe because you'd know that it doesn't work at the time you call it rather than the time you try to parse it.
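
A minimal sketch of the opt-in behavior described here. `repr2` is the hypothetical name used in this comment; the choice of supported types below is an arbitrary assumption for illustration. Since there is no fallback method, unsupported values (including the `Vector{Any}` from the circular example) fail with a MethodError when the string is produced.

```julia
# Hypothetical opt-in API: only types with an explicit method are supported.
repr2(x) = sprint(repr2, x)

# Scalars whose 2-arg show output is known to be evaluatable.
repr2(io::IO, x::Union{Int, Bool, String}) = show(io, x)

# Vectors of supported scalars; Vector{Any} deliberately has no method.
function repr2(io::IO, xs::Vector{T}) where {T<:Union{Int, Bool, String}}
    print(io, T, "[")
    for (i, x) in enumerate(xs)
        i > 1 && print(io, ", ")
        repr2(io, x)
    end
    print(io, "]")
end

ok = eval(Meta.parse(repr2(["a", "b"]))) == ["a", "b"]

# Unsupported input fails at print time, not at parse time.
failed_early = try
    repr2(Any[1, 2])
    false
catch err
    err isa MethodError
end
```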

@JeffBezanson
Member

Yes, I don't think anybody here wants a major effort to give all objects parseable representations no matter what. There are a couple cases:

  1. Cycles and shared references: require heavy parser support, we probably won't do it.
  2. Impossible cases, e.g. foreign or stateful objects of some kind.
  3. Corner cases where parseable output is possible but awful.

So I think all anybody wants is a mode where 1 & 2 give an error at printing time, and 3 gives you the parseable output even though it's awful. A :parseable flag seems like a reasonable way to get that: initially it will be the same as repr, then just get more accurate and strict over time.

@tkf
Member Author

tkf commented Jun 1, 2020

I think a flag would not be enough because an implementer can easily forget to check :parseable. I think it's better to have an API that helps the implementer notice what properties must hold for the result of the function. I think this is essential for making sure that trying to get a parsable representation of an unsupported object throws an error at print time.

@vtjnash
Member

vtjnash commented Jun 2, 2020

Over time, Core.println() (aka jl_static_show) is slowly getting closer to being a parsable textual format. The other difficult problem is handling shared references, however, since x = []; [x, x] is rather hard to describe clearly in a textual format.
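
The shared-reference problem can be illustrated with a short sketch. Round-tripping through repr preserves equality here but not identity: the two slots that aliased one array in the original become two independent arrays after parsing and evaluating. The variable names are illustrative.

```julia
x = []
y = [x, x]                       # both slots refer to the same array

z = eval(Meta.parse(repr(y)))    # round-trip through the textual form

same_before = y[1] === y[2]      # the original shares one object
same_after  = z[1] === z[2]      # the copy holds two distinct arrays
equal_value = z == y             # still equal by value

# The difference becomes observable as soon as one element is mutated.
push!(y[1], 1)
push!(z[1], 1)
```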
