Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add the Ability to use Redis for Storage #85

Closed
wants to merge 15 commits into from

Conversation

pseudomuto
Copy link
Contributor

@minrk @rgbkrk

Now that #81 has been merged, it's possible to define external storage providers that can be used instead of the in memory trie/array.

Normally, I'd just add providers for redis, memcached, etc. here, but as we discussed before having all of those dependencies is not great.

The idea

Instead of making a provider for each thing, I've added the ability to use an external script to manage persistence. This script can be any executable thing and use whatever backend storage mechanism you like, without adding any dependencies in this project.

For example, I've used a simple file that I pickle/unpickle to in the tests here. It would be trivial to implement a version of that script that used anything else.

Essentially, you start CHP with the --storage-command /path/to/script option. That script should respond to the following API:

$ /path/to/script get <path>
$ /path/to/script get_all
$ /path/to/script add <path> <base64_json>
$ /path/to/script update <path> <base64_json>
$ /path/to/script remove <path>
$ /path/to/script exists <path>

The reason the JSON data is base64 encoded is because everything else I tried, resulted in the returned objects not being parseable by JSON.parse

@@ -33,7 +33,7 @@ using the npm package manager:
npm install -g configurable-http-proxy

To install from the source code found in this GitHub repo:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear it was vim not me! 😄

@@ -71,7 +71,8 @@ matching route is found in the proxy table:
-V, --version output the version number
--ip <ip-address> Public-facing IP of the proxy
--port <n> (defaults to 8000) Public-facing port of the proxy

--storage-command <script> The optional external executable to use for persistence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can come up with a more specific name here than --storage-command.

One suggestion:
--persistent-storage <path> Path to executable for persistent storage of route table (optional)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch my earlier suggestion of persistent-storage. storage-command still feels a bit vague to me as a name perhaps:
--external-storage <path> Path to executable script that stores routing table externally instead of in-memory

@@ -248,3 +248,26 @@ first part of the URL path, e.g.:
"/otherdomain.biz": "http://10.0.1.4:5555",
}
```

## Using External Storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: I'll have more detail after Min and Kyle's reviews.

## Route table and external storage (optional)

With the default configuration,  configurable-http-proxy (CHP) places the route table in the local process' memory.

If you would like to use external storage for the route table, i.e. redis, memcached, mysql, and others, you may specify an executable to use external storage. You would:

@willingc
Copy link
Contributor

willingc commented Oct 1, 2016

Hi @pseudomuto @minrk @rgbkrk,

I have a few questions of clarification on this PR.

I'm a bit confused on:

  1. whether we want to persist the routes in memory store to external storage;
  2. whether we want to not use memory store at all and use only external storage;
  3. what happens if the external storage is unavailable?

@minrk
Copy link
Member

minrk commented Oct 1, 2016

@pseudomuto does it make more sense to define a small REST API for external storage services, rather than invoking a script every time? I'm not sure what's better. Preferences, @rgbkrk?

It might be worth exploring whether redis and/or memcache clients are small enough dependencies that it's not a big deal to add them. While I am not sure about going that way, and it is certainly less extensible, it would be a lot simpler for users, etc. (fewer moving parts). We could also consider the possibility that we only really need to support one or two options. Just because there are a million ways to store data, doesn't mean we need to necessarily support a generic mechanism for arbitrary external storage.

@willingc

  1. no, external storage is an alternative to in-memory storage, for when the default CHP behavior is not desired (for persisting, sharding proxy instances)
  2. yes, I think in-memory should be used in the vast majority of cases. Only for those cases where multiple instances, extra redundancy, etc. are required would external storage be appropriate. There is a high cost to having to call out to an external agent for a lookup, so I wouldn't recommend doing that unless the scaling is required.
  3. I suspect the proxy becomes non-functional in this case. Alternatively, it could fallback on an in-memory cache. Given the modest scope and goals of CHP, I don't think handling memached going down is something we would need to deal with.

@rgbkrk
Copy link
Member

rgbkrk commented Oct 1, 2016

My preference would be to have a REST API or build in support for at least one reliable external storage option. We can always change things later.

@willingc
Copy link
Contributor

willingc commented Oct 1, 2016

@minrk Thanks for the clarifications.

@pseudomuto
Copy link
Contributor Author

My preference would be to have a REST API or built-in support for at least one reliable external storage option.

I'm down with that! For selfish reasons, I'd like it to be redis, but is there something else that you guys would prefer?

@pseudomuto
Copy link
Contributor Author

Alright...I've removed the external script type and added a redis implementation. If you guys are happy with this, let me know and I'll update the readme and add the config options.

@rgbkrk
Copy link
Member

rgbkrk commented Oct 1, 2016

Cool, I'm happy with this. I'm fine with Redis for now, though I much prefer Postgres in production. This is looking pretty solid.

@pseudomuto
Copy link
Contributor Author

Added options for configuring redis/storage-provider and updated the README (I regenerated the options by running bin/configurable-http-proxy -h so they'd all be up to date).

@willingc you had some suggestions for a better option name, does --storage-provider seem clear enough?

--default-target <host> Default proxy target (proto://host[:port])
--error-target <host> Alternate server for handling proxy errors (proto://host[:port])
--error-path <path> Alternate server for handling proxy errors (proto://host[:port])
--redirect-port <redirect-port> Redirect HTTP requests on this port to the server on HTTPS
--pid-file <pid-file> Write our PID to a file

--storage-provider <provider> The storage provider for the route table (defaults to memory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pseudomuto Thanks! storage-provider works better than my suggestions.

I wonder if we should make the following redis options more generic i.e. --storage-provider-host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I chose to use --redis-XYZ is to make it clear that these only apply to redis. I'm a little concerned that naming these options --storage-provider-XYZ would imply that there are multiple options for external storage.

I'm happy to concede here though, if you feel this would be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like --redis, as the args are likely to be different for each provider. It gives it a nice namespace.

Question, though: I'm not so experienced at deploying redis, but how many args are there likely to be used? Is it enough that we might want to have a full (JSON?) passthrough to redis-client constructor args, rather than a new PR for every redis option? Or are these three all we are likely to ever use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of options, though honestly I can't see most of them being used for this purpose. The list of supported options outlines them all.

One thing we could do is use --redis-url and drop the others. Then you could specify an options you want in the URL, like:

--redis-url redis://[[user][:password@]][host][:port][/db-number][?db=db-number[&password=bar[&option=value]]]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the sound of redis-url, but you have more experience than I do. What would you prefer as a consumer of this CLI?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for --redis-url for simple yet effective.

a little concerned that naming these options --storage-provider-XYZ would imply that there are multiple options for external storage.

Good point @pseudomuto 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thought, I think we should use --redis-url. It'll make it more flexible, and provide the ability for anyone that needs auth/retry settings etc.

@@ -247,3 +254,20 @@ first part of the URL path, e.g.:
"/otherdomain.biz": "http://10.0.1.4:5555",
}
```

## Using Redis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Redis as a Storage Provider


## Using Redis

If you require multiple instances of CHP to be running and kept in sync, you can use [redis] to to store the route table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your deployment runs multiple CHP instances, you may need to keep these instances in sync. You may use a storage provider, i.e. [redis], to store the route table instead of the default which is storing the table in memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: If you use an external storage provider, such as redis, the route table will not be stored in memory.

If you require multiple instances of CHP to be running and kept in sync, you can use [redis] to to store the route table
rather than the default in memory option.

To do so, you'll need to pass in a couple of options to CHP when launching it. Specifically, you'll need to set the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To configure CHP to launch with redis as the external storage provider, run the following:

configurable-http-proxy --storage-provider redis

This command launches CHP and uses [redis] on `localhost:6379 and the default redis database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wish to run [redis] on a different host, port, or database location, you should set the [additional options]. For example, pass the additional option when launching CHP:

configurable-http-proxy --storage-provider redis --storage-provider-port 6380

@willingc
Copy link
Contributor

willingc commented Oct 2, 2016

@pseudomuto I like the naming choice.

I've made some suggestions for clarity on the doc. Thanks!

@@ -18,6 +18,7 @@
"devDependencies": {
"jasmine": "^2.5.1",
"jshint": "^2.9.2",
"redis": "~2.6",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a devDependency, or a regular dependency? If it's not a regular dependency, how would a user install CHP with redis support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! This should definitely be a regular dependency!

@pseudomuto
Copy link
Contributor Author

Made redis a regular dependency and updated the README with some of @willingc's suggestions

@pseudomuto pseudomuto changed the title Add the Ability to use External Storage Add the Ability to use Redis for Storage Oct 3, 2016
@pseudomuto
Copy link
Contributor Author

pseudomuto commented Oct 3, 2016

I'm just testing this out in a test environment now. A couple of issues have come up that will need to be address before this can go out (making a list to keep track here):

  • Last activity should be stored as ISO string
  • Need to support trie/closest match for the redis implementation as well (so things like /hub/spawn will be routed to the default_target)
  • Figure out issue with POST requests

@pseudomuto pseudomuto force-pushed the external_storage branch 2 times, most recently from 9cf28cb to d1c3a1a Compare October 3, 2016 20:08
@minrk
Copy link
Member

minrk commented Oct 12, 2016

@pseudomuto it looks like your last commits address the unchecked box. Is this ready to go?

@pseudomuto
Copy link
Contributor Author

There's still an issue when running this in production. For some reason, switching to the redis provider causes POST requests (all GETs seem to be working as expected) in JupyterHub to timeout. It seems like the body of the request doesn't get passed on or something.

For example, after setting the storage provider and options, I can't login to JH. The proxy does get the right target now, but the request just hangs until it times out.

I'm not sure what's causing this just yet. I've added a new checkount 😏

@minrk minrk changed the title Add the Ability to use Redis for Storage [WIP] Add the Ability to use Redis for Storage Jan 26, 2017
@narenst
Copy link

narenst commented Feb 4, 2017

Any updates on this? I could really use a store backed CHP :)

@willingc
Copy link
Contributor

willingc commented Apr 3, 2017

Hi @pseudomuto,

Thanks for all the work to this point. Is there anything we can do to help move this forward?

cc/ @minrk

@pseudomuto
Copy link
Contributor Author

@willingc I'm sorry to say that I won't be able to continue working on this right now (nor likely in the near future).

If someone's interested in taking this over, I can leave the PR up, at least as a point of reference. If not, we can just close it.

@willingc
Copy link
Contributor

willingc commented Apr 3, 2017

@pseudomuto No apologies needed. We are pleased that you have contributed this code. ☀️ 🍰

Thanks for being flexible about its use. I will leave it open for now. This way it is available if @minrk or others wish to adopt it.

@minrk
Copy link
Member

minrk commented Apr 5, 2017

No problem! I'm happy to leave this here for anyone to pick up and bring over the finish line. I probably won't have a chance to do that soon, but anyone is welcome to give it a go.

@andrelu
Copy link

andrelu commented Sep 14, 2017

Hello everyone, any update on this?

@willingc
Copy link
Contributor

Hi @andrelu, There's no status change since the April 5th message. We've left this open should someone from the community would like to pick it up. It's unlikely that @minrk or I will be working on this in the near future. Thanks for asking.

@minrk
Copy link
Member

minrk commented Oct 25, 2017

Superseded by #129, which enables pluggable storage without requiring that they be added to CHP itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants