Performance - Cache expensive url operations #203
Conversation
…g by keeping fragments separated from URL (and avoid redundant frag/defrag). Conflicts: jsonschema/tests/test_benchmarks.py issue python-jsonschema#158: Use try-finally to ensure resolver scopes_stack empty when iteration breaks (no detectable performance penalty). * Replace non-python-2.6 DefragResult with named-tuple. * Add test-case checking scopes_stack empty. Conflicts: jsonschema/tests/test_validators.py jsonschema/validators.py
resolver = RefResolver.from_schema(schema)
with resolver.resolving("#/a") as resolved:
    self.assertEqual(resolved, schema["a"])
-with resolver.resolving("foo://bar/schema#/a") as resolved:
+with resolver.resolving("http://bar/schema#/a") as resolved:
I had to change these to http because (at least on py3) the behaviour of urljoin changes based on the scheme. With an unknown scheme it was not replacing the fragment at all.
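The scheme-dependent behaviour is easy to see directly. A quick illustration (on CPython 3; the set of schemes that get relative resolution lives in the module-level `urllib.parse.uses_relative` list):

```python
from urllib.parse import urljoin

# urllib.parse only performs relative resolution for schemes it knows
# about, so an unknown scheme like foo:// causes the base URL to be
# ignored entirely and the fragment-only ref comes back unchanged.
print(urljoin("http://bar/schema", "#/a"))  # http://bar/schema#/a
print(urljoin("foo://bar/schema", "#/a"))   # #/a  (base is ignored)
```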
Yeah there's some global that defines weirdly which schemes it's fine with. I don't recall why this worked before then?
I believe it worked before because it wasn't joining the ref to the rest of the url. The url was being fetched first, then the ref was being used to locate the section in the document. Now the full url is being stored as the scope instead.
Ah, OK, makes sense.
ae2f281 to 613cf3e
@@ -38,6 +38,22 @@ def __repr__(self):
        return repr(self.store)


class Cache(object):
I'd prefer to use something pre-baked for caching. functools.lru_cache with the backport module, say?
That should work. This doesn't really need the "lru" portion, but with a sufficiently large value for maxsize, that should be fine.
Apparently both functools32 and repoze.lru have this. I'll look into which ones work on py2.6.
On a related note, do you think I should add a new kwarg to RefResolver to disable caching entirely (since there is already an option to disable caching of remote refs)?
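Since ref resolution tends to touch a bounded set of URLs, a generous maxsize makes the LRU eviction a non-issue in practice. A minimal sketch of what the pre-baked decorator could look like here (illustrative names, not the actual patch; `functools32` provides the same `lru_cache` API on py2):

```python
from functools import lru_cache
from urllib.parse import urljoin

# Wrap the expensive pure function in an LRU cache; with a generous
# maxsize the "lru" eviction rarely kicks in for typical schemas.
cached_urljoin = lru_cache(maxsize=1024)(urljoin)

print(cached_urljoin("http://example.com/schema", "#/definitions/a"))
print(cached_urljoin.cache_info())  # a second identical call would be a hit
```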
Maybe a similar solution to what we do for the remote cache would work, with the cache just being an optional argument which constructs an appropriately configured lru_cache, but disabling caching would be providing a functools.lru_cache with maxsize=0 or whatever.
But yeah, sounds like a decent idea. Nice catch.
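A sketch of that suggestion (the class and attribute names here are hypothetical, not jsonschema's actual RefResolver API): the cache is any decorator with `lru_cache`'s interface, and passing one built with `maxsize=0` disables caching without a dedicated kwarg.

```python
from functools import lru_cache

class Resolver(object):
    """Illustrative only -- not jsonschema's actual RefResolver signature."""

    def __init__(self, cache=lru_cache(maxsize=1024)):
        # "cache" is any decorator with lru_cache's interface; passing
        # lru_cache(maxsize=0) disables caching entirely.
        self._urljoin = cache(self._urljoin_uncached)

    @staticmethod
    def _urljoin_uncached(base, ref):
        return base + ref  # stand-in for the real urljoin work

resolver = Resolver(cache=lru_cache(maxsize=0))
print(resolver._urljoin("http://x/schema", "#/a"))  # http://x/schema#/a
print(resolver._urljoin.cache_info().currsize)      # 0: nothing is cached
```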
So it looks like functools32 is py2.7-only (I actually need 2.6 still). repoze.lru works in 2.6, but it's not small (https://github.com/repoze/repoze.lru/blob/master/repoze/lru/__init__.py, ~373 lines). I notice that currently there are no install_requires dependencies.
I could add an install_requires for version < 3.2, or I could vendor it in jsonschema/vendor/repoze/lru.py. I usually prefer to add the dependency instead of vendoring it. Do you have a preference here?
Ech, annoying. OK, yeah, adding the dependency is fine with me.
It'd be nice to only do so where needed if there's a close enough interface between that and functools.lru_cache.
Cool, the interface is identical, I'll only install repoze.lru for python2.
Adding the dependency required a small change to where __version__ is defined. When trying to run setup, setup.py imports __init__.py, which imports validators.py, which imports compat.py, which tries to import repoze.lru, which will fail because the virtualenv doesn't exist yet. So I created a version.py, moved __version__ there, imported it from both setup.py and __init__.py, and used exec() to read it without importing it.
I believe this is a pretty common way of handling this problem.
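The pattern described above, roughly (the file path and version string here are stand-ins): setup.py reads the version file with exec() instead of importing the package, so import-time dependencies like repoze.lru are never triggered.

```python
import os
import tempfile

# Stand-in for jsonschema/version.py, which contains only the version line.
with tempfile.TemporaryDirectory() as tmp:
    version_path = os.path.join(tmp, "version.py")
    with open(version_path, "w") as f:
        f.write('__version__ = "2.4.0"\n')

    # setup.py side: exec() the file's contents rather than importing the
    # package, so no third-party imports run at setup time.
    namespace = {}
    with open(version_path) as f:
        exec(f.read(), namespace)

    version = namespace["__version__"]

print(version)  # 2.4.0
```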
Sadly, with lru_cache the benchmark is back to ~21ms (it's ~19.5ms with my custom Cache object). I guess that is good enough though. I understand wanting to use something that already exists.
I've been meaning to switch to http://vcversioner.readthedocs.org/en/latest/ which handles that process for us and isn't overcomplex like some other tools that do the same sort of thing.
Good to hear that we can swap those.
We're making the cache a parameter though right? So you should be able to drop in something faster if the 2ms are relevant for your app?
(Obviously I understand wanting stuff to be fast out of the box too :P)
Ah yes, good point. I was considering just making the maxsize a parameter, but making the decorator function itself a parameter makes sense.
Edit: Oh, it might have even just been that the value I was using for maxsize was too low for my testcase. Increasing it seems to restore the performance.
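The maxsize effect is easy to see with lru_cache's cache_info() behaviour: when the working set of refs is larger than maxsize, every lookup evicts something and the cache never hits. A toy demonstration (not the jsonschema benchmark):

```python
from functools import lru_cache

def count_calls(maxsize, refs, rounds=10):
    calls = [0]

    @lru_cache(maxsize=maxsize)
    def resolve(ref):
        calls[0] += 1
        return ref.upper()  # stand-in for the expensive resolution work

    for _ in range(rounds):
        for ref in refs:
            resolve(ref)
    return calls[0]

refs = ("#/a", "#/b", "#/c")
print(count_calls(maxsize=2, refs=refs))  # 30: every lookup misses and evicts
print(count_calls(maxsize=8, refs=refs))  # 3: one miss per ref, then all hits
```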
Left some more comments inline as I'm trying to review this in the quick couple of minutes I can grab :D Appreciated again, thanks a lot for pushing this forward.
Thanks for the review and feedback! I've got a couple open questions (how to include the lru_cache dep, and how composition would work), then I'll make the changes.
Ok, I think I have all the changes in.
Awesome. I'm out on business, but I'll have another look and try to merge on Sunday, thanks again!
OK! Think this roughly lgtm -- some of the assertions that were removed I'd like to add back since they test backwards compatible things we're forced to leave, but I can add them back in after merging, which now I hopefully will have time for tomorrow. Thanks again!
Awesome, looking forward to it. Thank you!
Started merging (in https://github.com/Julian/jsonschema/tree/perf_cache_resolving) but adding back in the assertions I was referring to actually fails, so there's some more backwards incompat. I'll try to give it a bit more time early this week, unless you see immediately what the correct way to add
Ah, I see what you were saying. My latest commit 7241db0 fixes it.
OK managed to find slightly more time, and made a few small changes -- needed to add back backwards compatibility for RefResolvers without the new methods, and I changed the parameters to take caches, rather than parameters they can use to make caches. I think I'd like to move So we're close :), sorry it's taking so long.
What would call
The same place (no not added to validator) -- I'm imagining just putting the caches on Validator (in
OK never mind, I tried it and like this way better :) Merged. Sorry again that this took so long, much appreciated!
Thanks for all the work getting this merged! Since there is a bug fix included in this change (for python3), would it be possible to do a bug fix release (v2.4.1)?
I'd like to cut 2.5, but Travis is using a build of ubuntu that's failing the test suite on PyPy, which I have to figure out. I can certainly try to cut a bugfix release, but which bug fix is in here again?
I guess I never opened a ticket for it. The issue is described in #201 (which I discarded because this fixed it in a better way).
Ah the test flakiness... OK lemme see if I can make a branch with just the fix for that on it, hopefully tomorrow, if not sooner if I can just get 2.5 out as well.
@Julian Reckon you'll have a chance to cut the bug fix release? (Or is 2.5 close)?
I hope so, sorry :( I'm back from my trip, so hoping to have some spare
@Julian wanted to follow up if the patch could go in any immediate releases?
OK, release is out. Sorry for the delay here all. Feel free to follow up with any issues.
Based on the work in #182 (I cherry-picked a few commits) for issue #158
Replaces #202
Also fixes the problem described in #201
I rebased over many of my changes in #202. Many of them were unnecessary after adding caching.
On cpython 2.7 I'm seeing the benchmark runtime go from ~56ms to ~22ms. I don't expect any significant improvement in pypy, since it's already going to have this cached.
The primary change here is to add two caches to RefResolver, one for urljoin, and another for resolving a reference. Removing the context manager from ref() validation (the latest commit) gets this down to ~19ms.
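A condensed sketch of the shape of the change (illustrative names throughout; the real RefResolver also does scope management and remote fetching): two lru_cache-backed helpers, one for URL joining and one for resolving a fragment pointer within a document.

```python
from functools import lru_cache
from urllib.parse import urldefrag, urljoin

class CachingResolver(object):
    """Toy resolver with the two caches described above (not jsonschema's API)."""

    def __init__(self, base_uri, document, cache=lru_cache(maxsize=1024)):
        self.base_uri = base_uri
        self.document = document
        # Cache 1: joining a scope with a ref is pure string work.
        self.urljoin_cached = cache(urljoin)
        # Cache 2: walking the document for a fragment pointer.
        self.resolve_fragment_cached = cache(self._resolve_fragment)

    def _resolve_fragment(self, fragment):
        part = self.document
        for key in fragment.lstrip("/").split("/"):
            if key:
                part = part[key]
        return part

    def resolve(self, ref):
        # "url" would drive remote fetching in the real resolver.
        url, fragment = urldefrag(self.urljoin_cached(self.base_uri, ref))
        return self.resolve_fragment_cached(fragment)

resolver = CachingResolver(
    "http://example.com/schema", {"a": {"type": "integer"}},
)
print(resolver.resolve("#/a"))  # {'type': 'integer'}
```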