Clarification on JSONata handling of (nested) singleton arrays #93
Comments
Hi Tom, I'm really glad you raised this because it has prompted me to revisit one of the core design points of this language in an attempt to put it on a more robust footing. The idea of collating the results of a location path into a flat array is inspired by the semantics of XPath/XQuery. In XQuery, all results (and intermediate values) are contained in a I think from a pragmatic point of view, this has been good enough; but your examination as an implementor of this language, has exposed a few oddities that need to be ironed out. The issue with all of the expressions you have presented above is that JSONata, in its current implementation is treating all arrays as if they had this This was not really my original intention. I tried to keep the
Making this distinction will make all of your expressions behave as you would expect. E.g. There is a branch called |
Thanks Andrew, that makes sense. Been playing around with the singleton branch in the exerciser and I'm seeing some strange results. I'll maintain a list of these as I spot them here:
The following expression, with the path expression applied directly to inlined JSON, seems to be working as I'd expect: However, if I pull out Another example of this behaviour: But when using root context: |
Thanks Tom. I've just pushed a new commit which hopefully sorts out these inconsistencies.
@tomklapiscak are you planning to open source that Java port?
I see a potential problem here, but maybe I'm missing something. One of the things I noticed when looking through the evaluation code in This isn't a real problem, per se. It makes it easy to pass JavaScript values in and get JavaScript values out, and it probably helps a lot with performance because you don't have to box and unbox values into AST nodes. But it is limiting, because it constrains you to a 1:1 mapping to JavaScript values for your semantics. The potential issue I see here is with this notion of a "result sequence". Because you are operating with pure JavaScript values (and I want to emphasize... I'm not saying that is a bad idea), you won't have the luxury of introducing a new expression tree node to represent them. As such, you'll have to have rules like the ones you've described for keeping track of which values originated as "result sequences" and which were just arrays. My concern is that this could lead to lots of special cases in the evaluation code and that, in turn, could lead to lots of subtle bugs (like the ones @tomklapiscak is talking about when switching contexts). But I could be completely off base here.
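To make the concern above concrete, here is a minimal sketch of one way to tell "result sequences" apart from ordinary arrays while still using plain JavaScript values. The helper names (`createSequence`, `isSequence`, `normalizeResult`) and the `isSequence` flag are assumptions made for illustration; they are not taken from the JSONata source.

```javascript
// Hypothetical helpers for illustration only; not the JSONata implementation.

// Build an array that carries a hidden marker saying "I am a result sequence".
function createSequence(items = []) {
  const seq = [...items];
  Object.defineProperty(seq, "isSequence", { value: true, enumerable: false });
  return seq;
}

function isSequence(value) {
  return Array.isArray(value) && value.isSequence === true;
}

// Collapse only result sequences: empty -> undefined, singleton -> its item.
// Ordinary arrays are returned untouched, whatever their length.
function normalizeResult(value) {
  if (!isSequence(value)) return value;
  if (value.length === 0) return undefined;
  if (value.length === 1) return value[0];
  return value;
}

// A sequence holding the array [1] collapses to [1] ...
console.log(JSON.stringify(normalizeResult(createSequence([[1]])))); // [1]
// ... but a plain array holding [1] is left alone.
console.log(JSON.stringify(normalizeResult([[1]]))); // [[1]]

// The fragility: copying the array with map() silently drops the marker.
const copied = createSequence([1, 2]).map((x) => x);
console.log(isSequence(copied)); // false
```

The last two lines show why this approach needs care: any code path that copies or rebuilds the array has to re-apply the marker, which is the kind of special-casing the comment above worries about.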
I'll admit that this whole discussion is intriguing, but all seemed a bit "academic" to me -- before now... It seems I'm running into some of these same edge cases, and I'd like to get this group's take on whether what I'm trying to do makes sense, or I'm just not understanding the syntax enough to choose the right incantations... My use case is a payload of query results from a relational database, which typically has this structure:
So the payload itself is always an array, but depending on whether the original input is a single query statement or multiple statements, the payload array will contain either one series or more than one. At this point, I cannot find a common syntax that will allow me to work with both an array of 1 array of results and an array of multiple result arrays. For example, if I need to get a count of the number of series (which equals the number of query results), I would expect $count to work in both cases -- but instead, I see this:
Effectively, the outer array level has been removed when it only contains a single array! Likewise, I have to use two different expressions to get an array of device names from the 1st object in each series (each query result contains data for only 1 device):
So many of my data restructuring tasks have been simplified using JSONata -- but if I have to code different logic for an array of arrays vs. an array of 1 array, then I'll probably stick to pure JavaScript just for my own sanity. Please tell me I'm not understanding something, or making some incorrect assumptions. Steve
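For comparison, this is roughly what shape-independent logic looks like in plain JavaScript. The payloads below are hypothetical stand-ins (the original payload is not reproduced here), and the `device` field name and both function names are assumptions for illustration.

```javascript
// Hypothetical payloads: each inner array is one series of result rows.
const singleQuery = [
  [{ device: "sensor-a", value: 1 }, { device: "sensor-a", value: 2 }],
];
const multiQuery = [
  [{ device: "sensor-a", value: 1 }],
  [{ device: "sensor-b", value: 7 }],
];

// Number of series: the outer array length, regardless of how many there are.
function countSeries(payload) {
  return payload.length;
}

// Device name taken from the first row of each series.
// Optional chaining yields undefined for an empty series instead of throwing.
function firstDeviceNames(payload) {
  return payload.map((series) => series[0]?.device);
}

console.log(countSeries(singleQuery)); // 1
console.log(countSeries(multiQuery)); // 2
console.log(JSON.stringify(firstDeviceNames(singleQuery))); // ["sensor-a"]
console.log(JSON.stringify(firstDeviceNames(multiQuery))); // ["sensor-a","sensor-b"]
```

The same two functions handle one series or many because nothing here collapses a one-element outer array, which is the behaviour being asked for from the expression language.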
I haven't yet had a chance to dig into the new It seems to me that this flattening arrays and/or promoting 1-length arrays to scalars really only needs to be done in the context of certain operators. The obvious ones are So to @shrickus' point, something like Also, you have funny cases (and I saw one in the I was hoping to spend some time this afternoon looking over the |
@shrickus I agree that the The reason for |
I am working on a Java implementation of the JSONata engine, and I'm struggling to understand JSONata's treatment of singleton arrays...
From http://docs.jsonata.org/complex.html:
Within a JSONata expression or subexpression, any value (which is not itself an array) and an array containing just that value are deemed to be equivalent.
Firstly, if so - why does the following hold?
[1] = 1
=>false
(should this not be true?)
So, outside of the '=' operator itself, I suspect the singleton/primitive equivalence is effectively achieved by "unwrapping" singleton arrays as they are referenced by expressions.
I see the following behaviour:
{'a': 1 }.a
=>1
{'a': [1] }.a
=>1
{'a': [[1]] }.a
=>1
{'a': [[[1]]] }.a
=>[1]
{'a': [[[[1]]]] }.a
=>[[1]]
So it seems to me that JSONata is applying (at most) two levels of flattening to nested singleton arrays pulled out of an associative array via a path. Why 2? Why does it not completely flatten the output array in cases like this?
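The observation above can be modelled with a small sketch. This is only a model that reproduces the outputs listed in this issue, not a description of how JSONata is implemented; the function name `unwrapSingletons` and its `maxLevels` parameter are hypothetical.

```javascript
// Hypothetical model: strip at most `maxLevels` layers of singleton-array wrapping.
function unwrapSingletons(value, maxLevels) {
  let result = value;
  for (let i = 0; i < maxLevels; i++) {
    if (Array.isArray(result) && result.length === 1) {
      result = result[0]; // peel off one singleton layer
    } else {
      break; // not a singleton array, nothing more to unwrap
    }
  }
  return result;
}

// The values of 'a' from the five expressions above.
const inputs = [1, [1], [[1]], [[[1]]], [[[[1]]]]];

// Two levels of unwrapping matches the results reported in this issue.
const twoLevels = inputs.map((v) => unwrapSingletons(v, 2));
console.log(JSON.stringify(twoLevels)); // [1,1,1,[1],[[1]]]

// Unlimited unwrapping is what "completely flatten" would give instead.
const fullyUnwrapped = inputs.map((v) => unwrapSingletons(v, Infinity));
console.log(JSON.stringify(fullyUnwrapped)); // [1,1,1,1,1]
```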
Next, if I wrap the statements above in an array constructor, I see the following:
[{'a': 1 }.a]
=>[1]
[{'a': [1] }.a]
=>[1]
[{'a': [[1]] }.a]
=>[1]
These results seem reasonable (given that the embedded statement returns 1 in those cases). However:
[{'a': [[[1]]] }.a]
=>[1]
Should this not be [[1]]? (Given that the statement inside the array constructor evaluates to [1].)