
Performance - Cache expensive url operations #203


Merged

merged 8 commits into python-jsonschema:master from perf_cache_resolving on Apr 6, 2015

Conversation

@dnephin (Contributor) commented Mar 2, 2015

Based on the work in #182 (I cherry-picked a few commits) for issue #158
Replaces #202
Also fixes the problem described in #201

I rebased away many of my changes from #202; most of them were unnecessary after adding caching.

On CPython 2.7 I'm seeing the benchmark runtime go from ~56ms to ~22ms. I don't expect any significant improvement on PyPy, since its JIT will already have this cached.

The primary change here is to add two caches to RefResolver, one for urljoin, and another for resolving a reference.
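
A minimal sketch of that shape, assuming functools.lru_cache as the memoizer (attribute names and the resolve_from_url hook here are illustrative, not necessarily the merged code):

```python
from functools import lru_cache
from urllib.parse import urljoin  # urlparse.urljoin on Python 2

class RefResolver(object):
    def __init__(self, base_uri, referrer):
        self.referrer = referrer
        self._scopes_stack = [base_uri]
        # Memoize the two hot operations: joining the current scope
        # onto a $ref, and resolving a joined URL to a subschema.
        self._urljoin_cache = lru_cache(1024)(urljoin)
        self._resolve_cache = lru_cache(1024)(self.resolve_from_url)

    def resolve_from_url(self, url):
        # Stand-in for the real fetch-and-walk logic.
        raise NotImplementedError
```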

Removing the context manager from ref() validation (the latest commit) gets this down to ~19ms.

ankostis and others added 3 commits February 27, 2015 19:04
…g by keeping fragments separated from URL (and avoid redundant frag/defrag).
Conflicts:
	jsonschema/tests/test_benchmarks.py

issue python-jsonschema#158: Use try-finally to ensure resolver scopes_stack empty when
iteration breaks (no detectable performance penalty).

* Replace non-python-2.6 DefragResult with named-tuple.
* Add test-case checking scopes_stack empty.
Conflicts:
	jsonschema/tests/test_validators.py
	jsonschema/validators.py
@dnephin changed the title from "Performance - cache resolving" to "Performance - Cache expensive url operations" on Mar 2, 2015
```diff
 resolver = RefResolver.from_schema(schema)
 with resolver.resolving("#/a") as resolved:
     self.assertEqual(resolved, schema["a"])
-with resolver.resolving("foo://bar/schema#/a") as resolved:
+with resolver.resolving("http://bar/schema#/a") as resolved:
```
@dnephin (Contributor, Author):

I had to change these to http because (at least on py3) the behaviour of urljoin changes based on the scheme. With an unknown scheme it was not replacing the fragment at all.
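
For illustration, on Python 3 the scheme whitelist shows up like this (urllib.parse.uses_relative is the global being referred to):

```python
from urllib.parse import urljoin

# urljoin only does relative resolution for schemes listed in
# urllib.parse.uses_relative; for an unrecognized scheme it returns
# the second argument unchanged, so the fragment is never joined
# onto the base.
print(urljoin("http://bar/schema", "#/a"))  # http://bar/schema#/a
print(urljoin("foo://bar/schema", "#/a"))   # #/a  (base is discarded)
```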

@Julian (Member):

Yeah, there's some global that weirdly defines which schemes it's fine with. I don't recall why this worked before, then?

@dnephin (Contributor, Author):

I believe it worked before because the ref wasn't being joined to the rest of the URL: the URL was fetched first, and then the ref was used to locate the section in the document. Now the full URL is stored as the scope instead.

@Julian (Member):

Ah, OK, makes sense.

@dnephin force-pushed the perf_cache_resolving branch from ae2f281 to 613cf3e on March 2, 2015 00:35
```diff
@@ -38,6 +38,22 @@ def __repr__(self):
         return repr(self.store)


+class Cache(object):
```
@Julian (Member):

I'd prefer to use something pre-baked for caching. functools.lru_cache with the backport module, say?

@dnephin (Contributor, Author):

That should work. This doesn't really need the "lru" portion, but with a sufficiently large maxsize it should be fine.

Apparently both functools32 and repoze.lru have it. I'll look into which one works on py2.6.

On a related note, do you think I should add a new kwarg to RefResolver to disable caching entirely (since there is already an option to disable caching of remote refs)?

@Julian (Member):

Maybe a solution similar to what we do for the remote cache would work: the cache becomes an optional argument that defaults to an appropriately configured lru_cache, and disabling caching means passing a functools.lru_cache with maxsize=0 or whatever.

But yeah, sounds like a decent idea. Nice catch.
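
A sketch of that idea; the urljoin_cache parameter name is an assumption for illustration, not necessarily the final API:

```python
from functools import lru_cache
from urllib.parse import urljoin

from jsonschema import RefResolver

schema = {"a": {"type": "integer"}}

# Default: the resolver constructs an appropriately configured cache.
resolver = RefResolver.from_schema(schema)

# Opting out: an lru_cache with maxsize=0 stores nothing, so every
# call falls through to the real urljoin.
resolver = RefResolver(
    base_uri="", referrer=schema,
    urljoin_cache=lru_cache(maxsize=0)(urljoin),
)
```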

@dnephin (Contributor, Author):

So it looks like functools32 is py2.7 only (I actually need 2.6 still).

repoze.lru works on 2.6, but it's not small (https://github.com/repoze/repoze.lru/blob/master/repoze/lru/__init__.py is ~373 lines). I notice that there are currently no install_requires dependencies.

I could add an install_requires for Python < 3.2, or I could vendor it as jsonschema/vendor/repoze/lru.py. I usually prefer adding the dependency over vendoring. Do you have a preference here?

@Julian (Member):

Ech, annoying. OK, yeah adding the dependency is fine with me.

It'd be nice to only do so where needed if there's a close enough interface between that and functools.lru_cache.
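
A sketch of the conditional import being discussed, assuming it lives in the package's compat module:

```python
# compat.py (sketch): prefer the stdlib cache on Python 3.2+, fall
# back to the py2 backport, which has a compatible interface.
try:
    from functools import lru_cache
except ImportError:  # Python 2.6 / 2.7
    from repoze.lru import lru_cache
```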

@dnephin (Contributor, Author):

Cool, the interface is identical, so I'll only install repoze.lru for Python 2.

Adding a dependency required a small change to where __version__ is defined.

When running setup, setup.py imports __init__.py, which imports validators.py, which imports compat.py, which tries to import repoze.lru; that fails because the dependency isn't installed in the virtualenv yet. So I created a version.py, moved __version__ there, and had both setup.py and __init__.py read it, with setup.py using exec() to read it without importing the package.

I believe this is a pretty common way of handling this problem.
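
The pattern described above looks roughly like this in setup.py (the jsonschema/version.py path follows the comment; the rest is illustrative):

```python
# setup.py (sketch)
import os

here = os.path.dirname(os.path.abspath(__file__))

# Read __version__ without importing the jsonschema package, which
# would drag in dependencies that aren't installed yet.
version_ns = {}
with open(os.path.join(here, "jsonschema", "version.py")) as f:
    exec(f.read(), version_ns)

# ...then hand version_ns["__version__"] to setuptools.setup(version=...).
```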

@dnephin (Contributor, Author):

Sadly, with lru_cache the benchmark is back to ~21ms (it's ~19.5ms with my custom Cache object). I guess that's good enough, though; I understand wanting to use something that already exists.

@Julian (Member):

I've been meaning to switch to http://vcversioner.readthedocs.org/en/latest/ which handles that process for us and isn't overcomplex like some other tools that do the same sort of thing.

Good to hear that we can swap those.

@Julian (Member):

We're making the cache a parameter though right? So you should be able to drop in something faster if the 2ms are relevant for your app?

(Obviously I understand wanting stuff to be fast out of the box too :P)

@dnephin (Contributor, Author):

Ah yes, good point. I was considering just making the maxsize a parameter, but making the decorator function itself a parameter makes sense.

Edit: it might even just have been that the maxsize I was using was too low for my test case. Increasing it seems to restore the performance.
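
For what it's worth, functools.lru_cache exposes cache_info(), which makes this kind of sizing problem visible:

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def square(x):
    return x * x

for i in range(1000):
    square(i % 64)  # working set (64 keys) exceeds maxsize (32)

# Many misses with currsize stuck at maxsize means the cache is
# thrashing and maxsize should be raised.
print(square.cache_info())
# CacheInfo(hits=0, misses=1000, maxsize=32, currsize=32)
```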

@Julian (Member) commented Mar 3, 2015

Left some more comments inline as I'm trying to review this in the quick couple of minutes I can grab :D

Appreciated again, thanks a lot for pushing this forward.

@dnephin (Contributor, Author) commented Mar 3, 2015

Thanks for the review and feedback! I've got a couple of open questions (how to include the lru_cache dependency, and how composition would work), then I'll make the changes.

@dnephin (Contributor, Author) commented Mar 4, 2015

Ok, I think I have all the changes in.

@Julian (Member) commented Mar 5, 2015

Awesome. I'm out on business, but I'll have another look and try to merge on Sunday, thanks again!

@Julian (Member) commented Mar 10, 2015

OK! I think this roughly LGTM -- some of the assertions that were removed I'd like to add back, since they test backwards-compatible things we're forced to keep, but I can add them back after merging, which I'll hopefully have time for tomorrow.

Thanks again!

@dnephin (Contributor, Author) commented Mar 10, 2015

Awesome, looking forward to it. Thank you!

@Julian (Member) commented Mar 15, 2015

Started merging (in https://github.com/Julian/jsonschema/tree/perf_cache_resolving), but adding back the assertions I was referring to actually fails, so there's some more backwards incompatibility. I'll try to give it a bit more time early this week, unless you immediately see the correct way to add base_uri back.

@dnephin (Contributor, Author) commented Mar 15, 2015

Ah, I see what you were saying. My latest commit 7241db0 fixes it.

@Julian (Member) commented Mar 22, 2015

OK, managed to find slightly more time and made a few small changes: I needed to add back backwards compatibility for RefResolvers without the new methods, and I changed the parameters to take caches rather than parameters they can use to make caches.

I think I'd like to move RefResolver.resolve into a function that handles the backwards compat (and leave it off the interface), and possibly combine it with push_scope, since that's how you're using it (so, say, resolve_into = resolve + push_scope, similar to the context manager). That's sort of where I was going before when suggesting composition, but I think it can work as a function. After all that, we'll probably have to run your benchmark again to make sure we've still got whatever performance gains we're looking for.

So we're close :), sorry it's taking so long.
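
A hypothetical sketch of that shape (the names and call sequence are illustrative, not the merged code):

```python
def ref(validator, ref, instance, schema):
    resolver = validator.resolver
    # resolve() does the cached urljoin + document fetch in one shot;
    # push_scope/pop_scope replace the resolving() context manager.
    scope, resolved = resolver.resolve(ref)
    resolver.push_scope(scope)
    try:
        for error in validator.descend(instance, resolved):
            yield error
    finally:
        resolver.pop_scope()
```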

@dnephin (Contributor, Author) commented Mar 22, 2015

What would call resolve_into()? _validators.ref()? Would it be added to the list of validators directly, instead of ref()?

@Julian (Member) commented Mar 30, 2015

The same place (no, not added to the validators) -- I'm imagining just putting the caches on the Validator (in create). It's probably easier to show you the change and see if it makes sense. Sorry this is taking so long; I'm just trying to avoid digging ourselves into a hole by changing interfaces we have little control over (RefResolver) when we can possibly introduce a new one (or, at worst, slightly modify one we do control easily, the Validator interface). I'm hoping to do that by the end of the week and ping you.

@Julian merged commit 7241db0 into python-jsonschema:master Apr 6, 2015
@Julian (Member) commented Apr 6, 2015

OK never mind, I tried it and like this way better :)

Merged. Sorry again that this took so long, much appreciated!

@dnephin (Contributor, Author) commented Apr 6, 2015

Thanks for all the work getting this merged!

Since there is a bug fix (for Python 3) included in this change, would it be possible to do a bugfix release (v2.4.1)?

@dnephin deleted the perf_cache_resolving branch April 6, 2015 15:00
@Julian (Member) commented Apr 6, 2015

I'd like to cut 2.5, but Travis is using a build of ubuntu that's failing the test suite on PyPy, which I have to figure out.

I can certainly try to cut a bugfix release, but which bug fix is in here again?

@dnephin (Contributor, Author) commented Apr 6, 2015

I guess I never opened a ticket for it. The issue is described in #201 (which I discarded because this fixed it in a better way).

@Julian (Member) commented Apr 6, 2015

Ah, the test flakiness... OK, lemme see if I can make a branch with just that fix on it, hopefully tomorrow, or sooner if I can just get 2.5 out as well.

@bpicolo commented May 12, 2015

@Julian Reckon you'll have a chance to cut the bug fix release? (Or is 2.5 close?)

@Julian (Member) commented May 14, 2015

I hope so, sorry :( I'm back from my trip, so hoping to have some spare bandwidth over the weekend.

@prat0318 commented Jun 4, 2015

@Julian, wanted to follow up on whether the patch could go out in a release soon?

@Julian (Member) commented Jun 5, 2015

OK, the release is out. Sorry for the delay here, all. Feel free to follow up with any issues.
