store.get() and store.sparql() return different values for similar blank nodes #134

Closed
ehupin opened this issue Mar 7, 2021 · 12 comments

@ehupin

ehupin commented Mar 7, 2021

Hi,

I am quite new to the RDF world and I am currently testing node-quadstore as a solution to persist triples.
During my tests I found what seems to be weird behavior: the values returned for blank nodes differ depending on whether they are fetched using store.get() or store.sparql().

Here is a script to reproduce this:

import {Quadstore} from "quadstore";
import {newEngine} from "quadstore-comunica";
const leveldown = require('leveldown')
const rdf = require('rdf-ext')

async function test() {
    const db = leveldown('test')
    const store = new Quadstore({
        backend: db,
        comunica: newEngine(),
        dataFactory: rdf
    });

    const intermediaryNode = rdf.blankNode()
    const quads = [
        rdf.quad(
            rdf.namedNode('http://example.org/from'),
            rdf.namedNode('http://example.org/link'),
            intermediaryNode
        ),
        rdf.quad(
            intermediaryNode,
            rdf.namedNode('http://example.org/link'),
            rdf.namedNode('http://example.org/to'),
        )
    ]

    await store.open()
    await store.multiPut(quads)

    const getResult = await store.get({})
    console.log(getResult.items)
    // [
    //     QuadExt {
    //         subject: BlankNodeExt { value: 'b1' },
    //         predicate: NamedNodeExt { value: 'http://example.org/link' },
    //         object: NamedNodeExt { value: 'http://example.org/to' },
    //         graph: DefaultGraphExt { value: '' }
    //     },
    //     QuadExt {
    //         subject: NamedNodeExt { value: 'http://example.org/from' },
    //         predicate: NamedNodeExt { value: 'http://example.org/link' },
    //         object: BlankNodeExt { value: 'b1' },
    //         graph: DefaultGraphExt { value: '' }
    //     }
    // ]


    const sparqlResult = await store.sparql(`SELECT * WHERE { ?s ?p ?o}`);
    console.log(sparqlResult.items)
    // [
    //     {
    //         '?s': BlankNode { termType: 'BlankNode', value: 'b11' },
    //         '?p': NamedNodeExt { value: 'http://example.org/link' },
    //         '?o': NamedNodeExt { value: 'http://example.org/to' }
    //     },
    //     {
    //         '?s': NamedNodeExt { value: 'http://example.org/from' },
    //         '?p': NamedNodeExt { value: 'http://example.org/link' },
    //         '?o': BlankNode { termType: 'BlankNode', value: 'b12' }
    //     }
    // ]


    await store.close()
}

test()

What bugs me here is that the blank node I created is returned as a single one (b1) when I use store.get(), but it is returned as two different ones (b11 and b12) when I use store.sparql().

Am I missing something about how this works? Should I change the way I create, store, or fetch my data to prevent this behavior?

Here are the versions I use:

    "leveldown": "^5.6.0",
    "quadstore": "^8.0.0",
    "quadstore-comunica": "^0.3.1",
    "rdf-ext": "^1.3.1"
@jacoscaz
Collaborator

jacoscaz commented Mar 7, 2021

Hello @ehupin! Thank you for that code snippet, I can confirm I am able to reproduce this. It's weird: the correct behavior is the one you're getting from store.get(), and that is also the behavior I would have expected from store.sparql().

As a temporary workaround you can skolemise blank nodes into named nodes, which I would recommend anyway as blank nodes can be rather confusing.
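
The skolemisation workaround could be sketched roughly as follows. This is only an illustration, not quadstore's API: the `Term` and `Quad` types are minimal stand-ins for RDF/JS terms, and `SKOLEM_BASE`, `skolemiseTerm` and `skolemiseQuad` are hypothetical names introduced for the example.

```typescript
// Minimal stand-ins for RDF/JS terms; a real project would use a data factory instead.
type Term = { termType: 'NamedNode' | 'BlankNode' | 'DefaultGraph'; value: string };
type Quad = { subject: Term; predicate: Term; object: Term; graph: Term };

// Hypothetical base for skolem IRIs; any IRI namespace you control would do.
const SKOLEM_BASE = 'http://example.org/.well-known/genid/';

// Replace a blank node with a named node derived from its label; keep other terms as-is.
function skolemiseTerm(term: Term): Term {
  return term.termType === 'BlankNode'
    ? { termType: 'NamedNode', value: SKOLEM_BASE + term.value }
    : term;
}

// Predicates cannot be blank nodes, so only subject, object and graph are rewritten.
function skolemiseQuad(quad: Quad): Quad {
  return {
    subject: skolemiseTerm(quad.subject),
    predicate: quad.predicate,
    object: skolemiseTerm(quad.object),
    graph: skolemiseTerm(quad.graph),
  };
}

const named = (value: string): Term => ({ termType: 'NamedNode', value });
const intermediaryNode: Term = { termType: 'BlankNode', value: 'b1' };
const defaultGraph: Term = { termType: 'DefaultGraph', value: '' };

// The two quads from the reproduction script, skolemised before being stored.
const skolemised = [
  { subject: named('http://example.org/from'), predicate: named('http://example.org/link'), object: intermediaryNode, graph: defaultGraph },
  { subject: intermediaryNode, predicate: named('http://example.org/link'), object: named('http://example.org/to'), graph: defaultGraph },
].map(skolemiseQuad);

console.log(skolemised[0].object.value === skolemised[1].subject.value);
```

Because the intermediary node is now an ordinary named node, no query layer has a reason to relabel it. The sketch assumes blank node labels are unique across everything you skolemise with the same base; labels coming from different documents would need a per-document prefix.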

@rubensworks is this expected behavior in Comunica? Quadstore returns blank nodes with the same labels they had when inserted, which leads me to think that something in Comunica's handling of blank nodes might be causing this.

@namedgraph

The client shouldn't expect stable blank nodes though. That's what URIs are for :)

@jacoscaz
Collaborator

jacoscaz commented Mar 7, 2021

Just to clarify and set expectations: although blank node labels are not guaranteed to be stable, quads and the relationships between quads are. The issue here is not that the labels change between store.get() and store.sparql() but, rather, that the way they change when using the latter breaks the relationship between those quads.
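
To make the distinction concrete, here is a rough check that ignores the labels themselves and only looks at which positions share a blank node. It is a simplification that assumes both results list their triples in the same order, not a full graph isomorphism test, and `blankNodeSignature` and `sameStructure` are hypothetical helpers written for this example.

```typescript
type Term = { termType: string; value: string };
type Triple = { subject: Term; predicate: Term; object: Term };

// Rename blank nodes by order of first appearance, so that two results which
// differ only in their labels produce identical signatures.
function blankNodeSignature(triples: Triple[]): string[] {
  const seen = new Map<string, string>();
  const key = (t: Term): string => {
    if (t.termType !== 'BlankNode') return `<${t.value}>`;
    if (!seen.has(t.value)) seen.set(t.value, `_:n${seen.size}`);
    return seen.get(t.value)!;
  };
  return triples.map(t => `${key(t.subject)} ${key(t.predicate)} ${key(t.object)}`);
}

const sameStructure = (a: Triple[], b: Triple[]): boolean =>
  JSON.stringify(blankNodeSignature(a)) === JSON.stringify(blankNodeSignature(b));

const iri = (value: string): Term => ({ termType: 'NamedNode', value });
const bnode = (value: string): Term => ({ termType: 'BlankNode', value });
const link = iri('http://example.org/link');

// Labels as reported for store.get(): one blank node shared by both triples.
const fromGet: Triple[] = [
  { subject: bnode('b1'), predicate: link, object: iri('http://example.org/to') },
  { subject: iri('http://example.org/from'), predicate: link, object: bnode('b1') },
];

// Labels as reported for store.sparql(): two unrelated blank nodes.
const fromSparql: Triple[] = [
  { subject: bnode('b11'), predicate: link, object: iri('http://example.org/to') },
  { subject: iri('http://example.org/from'), predicate: link, object: bnode('b12') },
];

console.log(sameStructure(fromGet, fromSparql)); // false
```

Had store.sparql() returned the same label in both positions, whatever that label was, the check would have passed.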

@ehupin
Author

ehupin commented Mar 8, 2021

Thanks all for your answers!
First, they gave me a better understanding of the issue; beyond that, they helped me better grasp how to use blank nodes and what their limitations are. Skolemization is an interesting subject that I will definitely explore!

@rubensworks

rubensworks commented Mar 8, 2021

is this expected behavior in Comunica?

Yep, that's expected and intentional behaviour. We have to do this to ensure non-clashing bnodes when querying over multiple sources.

It's quite normal for RDF tools to modify bnode labels like this, as you can indeed never attach meaning to them when using them across different documents/contexts.

Scratch my latest reply. I didn't read the issue well enough.

Links between blank nodes are only defined within the context of a single document or query execution. However, no meaning should be attached to their concrete labels, as these can change at any time.

In that sense, the output of store.get is correct, but store.sparql is wrong. (Either the bnode in the first triple should have label b12, or the second triple should have b11)
Something probably is going wrong at the connection point between Quadstore and Comunica.
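
For illustration, relabelling that avoids clashes across sources while keeping links intact within one source could look like the sketch below. This is not Comunica's actual implementation; `createRelabeller` and the source ids are hypothetical, and the only point is that the mapping is remembered per source.

```typescript
// Returns a relabelling function scoped to one source. Within that source the
// same input label always maps to the same output label; distinct source ids
// yield distinct output labels, so labels from different sources cannot clash.
function createRelabeller(sourceId: string): (label: string) => string {
  const mapping = new Map<string, string>();
  return (label: string): string => {
    let fresh = mapping.get(label);
    if (fresh === undefined) {
      fresh = `${sourceId}_${mapping.size}`;
      mapping.set(label, fresh);
    }
    return fresh;
  };
}

const relabelA = createRelabeller('srcA');
const relabelB = createRelabeller('srcB');

// Both occurrences of 'b1' in source A receive the same new label.
console.log(relabelA('b1') === relabelA('b1'));
// The unrelated 'b1' from source B receives a different one.
console.log(relabelA('b1') === relabelB('b1'));
```

The behavior reported in this issue is what you would get if a fresh label were generated on every occurrence instead of being looked up in such a mapping.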

@jacoscaz
Collaborator

I should be able to look into this within a week from today. Apologies for the latency, I'm having a couple of very intense weeks.

@jacoscaz
Collaborator

@rubensworks I've managed to reproduce the problem with N3.Store.

I've modified the script provided by @ehupin to be able to easily switch between quadstore and N3.Store. It still uses a bunch of utils and types from quadstore as I haven't had the time to make it fully agnostic but the main parts are now implementation-independent:

  • I'm using RDF/JS interfaces to import quads (.import()) and read quads (.match());
  • I'm using the Comunica engine instance directly, passing the instantiated store to it.

Is there something in how I am packaging Comunica that might trigger this?

import {Quadstore} from "./lib/quadstore";
import {newEngine} from "quadstore-comunica";
import {BindingArrayResult, QuadArrayResult} from './lib/types';
import leveldown from 'leveldown';
import {DataFactory} from 'rdf-data-factory';
import { Store as N3Store } from 'n3';
// const rdf = require('rdf-ext')
import { ArrayIterator, wrap } from 'asynciterator';
import { streamToArray } from './lib/utils';
import {Algebra, translate} from 'sparqlalgebrajs';

async function test() {

  const rdf = new DataFactory();
  const engine = newEngine();

  const store = new N3Store();

  // const db = leveldown('test');
  // const store = new Quadstore({
  //   backend: db,
  //   comunica: newEngine(),
  //   dataFactory: rdf
  // });
  // await store.open();

  const intermediaryNode = rdf.blankNode();
  const quads = [
    rdf.quad(
      rdf.namedNode('http://example.org/from'),
      rdf.namedNode('http://example.org/link'),
      intermediaryNode
    ),
    rdf.quad(
      intermediaryNode,
      rdf.namedNode('http://example.org/link'),
      rdf.namedNode('http://example.org/to'),
    )
  ]

  await new Promise((resolve, reject) => {
    store.import(new ArrayIterator(quads))
      .on('end', resolve)
      .on('error', reject)
    ;
  });

  // @ts-ignore
  const storeQuads: Quad[] = await streamToArray(store.match());
  console.log(storeQuads);

  const sparqlQuery = 'SELECT * WHERE { ?s ?p ?o}';
  const sparqlOperation = translate(sparqlQuery, { quads: true, dataFactory: rdf });
  const sparqlResult = await engine.query(sparqlOperation, { source: store });
  // @ts-ignore
  const sparqlBindings = (await sparqlResult.bindings()).map(b => b.toObject());
  console.log(sparqlBindings);

}

test().catch((err) => {
  console.error(err);
  process.exit(1);
});

@rubensworks

Hmm, that's not good.
Not sure what could cause this.
My first guess would be somewhere here https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-quadpattern/lib/ActorQueryOperationQuadpattern.ts

Now that I think of it, this sounds similar to comunica/comunica#773, which I initially thought to be a parsing issue, but may very well have the same cause as here.

@jacoscaz
Collaborator

@rubensworks I'll see whether I can replicate this in a new test within Comunica's test suite and open an issue over there if so.

@jacoscaz
Collaborator

Opened issue upstream: comunica/comunica#795

@jacoscaz
Collaborator

jacoscaz commented Mar 28, 2021

For posterity: this has required some work in both Comunica and sparqlee, the latter being Comunica’s SPARQL expression evaluator. I think we’re relatively close to fixing this and the fix will surely be included in the next version of quadstore. Relevant issues and PRs:

@jacoscaz
Collaborator

jacoscaz commented Apr 4, 2021

Released in [email protected]!

@jacoscaz jacoscaz closed this as completed Apr 4, 2021

4 participants