Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base64Binary literals may not deserialise correctly with classic-level #152

Closed
gsvarovsky opened this issue Sep 13, 2022 · 10 comments
Closed

Comments

@gsvarovsky
Copy link
Contributor

e.g.:

dataFactory.literal('MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#base64Binary'))

does not deserialise correctly when retrieved in classic-level.

Test case (patch to existing quadstore.prototype.get.js test):

Index: test/quadstore.prototype.get.js
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/quadstore.prototype.get.js b/test/quadstore.prototype.get.js
--- a/test/quadstore.prototype.get.js	(revision 78fd616f7b55d70e8c89733fd7e6c223d5b1cc66)
+++ b/test/quadstore.prototype.get.js	(date 1663081452226)
@@ -54,6 +54,12 @@
           dataFactory.literal('Hello, World', 'en-us'),
           dataFactory.namedNode('ex://c4'),
         ),
+        dataFactory.quad(
+          dataFactory.namedNode('ex://s5'),
+          dataFactory.namedNode('ex://p5'),
+          dataFactory.literal('MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#base64Binary')),
+          dataFactory.namedNode('ex://c5'),
+        ),
       ];
       await this.store.multiPut(this.quads);
     });
@@ -287,6 +293,14 @@
       should(quads).have.length(0);
     });
 
+    it('should retrieve base64 object', async function () {
+      const { dataFactory, store } = this;
+      const { items: quads } = await store.get({
+        subject: dataFactory.namedNode('ex://s5'),
+      });
+      should(quads).be.equalToQuadArray([this.quads[7]]);
+    });
+
   });
 
   describe('Quadstore.prototype.get() w/ order', () => {

@gsvarovsky
Copy link
Contributor Author

I have a suite of "compliance" tests, like system tests, in which one sub-suite uses asymmetric cryptography to sign and verify messages between clones. That involves storing public keys in the data. All those tests fail because the public keys are corrupted when retrieved. That's what led me to write the quadstore test included in the ticket. No other tests fail – they whizz through!

In more raw terms, I write this:

Literal {
  termType: 'Literal',
  value: 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB',
  language: '',
  datatype: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2001/XMLSchema#base64Binary'
  }
}

but get back this:

Literal {
  termType: 'Literal',
  value: 'd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB\x00\x00',
  language: '',
  datatype: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2001/XMLSchema#base64Binary\x00\x00MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQs'
  }
}

@gsvarovsky
Copy link
Contributor Author

I'd be more than happy to look into this myself, if you like!

@jacoscaz
Copy link
Collaborator

Hello! Apologies for the latency, got married this weekend :). I'm still looking into this regression, for which we need specific tests. Before my break I did manage to discover that this seems to be related to any term that serializes to a string longer than a specific amount of chars. The type of the term itself appears to make no difference.

@jacoscaz
Copy link
Collaborator

Here's two value arrays: the first one related to a breaking test case in which I'm serializing a literal having length of 128 chars, the second one related to a passing test case in which I'm serializing the same literal minus one character.

Uint16Array(17) [0, 15, 0, 15, 3, 49135, 189,  0,  15, 0, 0, 0, 0, 0, 0, 0, 0]
Uint16Array(16) [0, 15, 0, 15, 3,   127,   0, 15,   0, 0, 0, 0, 0, 0, 0, 0]

The array even changes in length (!!!). Weird.

@jacoscaz
Copy link
Collaborator

This doesn't happen with the memory-level backend. Also, tests reproducing the serialization steps for a single term seem to pass with flying colors:

it('Should serialize and deserialize a single generic literal term having length of 2048', async function () {
      const { store: { dataFactory: factory }, prefixes } = this;
      const term = factory.literal(''.padStart(2048, 'abab'));
      const value = new Uint16Array(4);
      const serialized = termWriter.write(value, 0, term, prefixes);
      const value_8 = viewUint16ArrayAsUint8Array(value, 0, value.length);
      // I/O to/from backend here
      const value_16 = viewUint8ArrayAsUint16Array(value_8);
      const deserialized = termReader.read(serialized, 0, value_16, 0, factory, prefixes);
      should(deserialized.equals(term)).be.true();
    });

@jacoscaz
Copy link
Collaborator

This also doesn't happen when reproducing the steps for (de)serializing a single quad:

it('Should serialize and deserialize a quad having terms longer than 127 chars', async function () {
      const { store: { dataFactory: factory, separator }, prefixes } = this;
      const quad = factory.quad(
        factory.namedNode('http://ex.com/s'),
        factory.namedNode('http://ex.com/p'),
        factory.literal(''.padStart(128, 'abab')),
        factory.namedNode('http://ex.com/g'),
      );
      const value = new Uint16Array(16);
      const prefix = `SPOG${separator}`;
      const termNames = ['subject', 'predicate', 'object', 'graph'];
      const serialized = quadWriter.write(prefix, value, 0, quad, termNames, prefixes);
      const value_8 = viewUint16ArrayAsUint8Array(value, 0, value.length);
      const value_16 = viewUint8ArrayAsUint16Array(value_8);
      const deserialized = quadReader.read(serialized, prefix.length, value_16, 0, termNames, factory, prefixes);
      console.log(deserialized);
      should(deserialized.equals(quad)).be.true();
    });

@jacoscaz jacoscaz reopened this Sep 21, 2022
@jacoscaz
Copy link
Collaborator

@gsvarovsky could you please test again with version 11.0.3-beta.1 which I've just published to NPM?

@gsvarovsky
Copy link
Contributor Author

gsvarovsky commented Sep 21, 2022

m-ld-js Unit tests all pass
m-ld-js Compliance tests all pass (and in record time)

awesome work. 🫡

@jacoscaz
Copy link
Collaborator

All right! Will add some more tests and publish a patch version. Thank you for flagging this, it was a major regression.

@jacoscaz
Copy link
Collaborator

Version 11.0.3 is now on NPM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants