Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cull idle kernels #2215

Merged
merged 4 commits into from
Apr 5, 2017
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 72 additions & 2 deletions notebook/services/kernels/kernelmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,15 +11,17 @@

from tornado import gen, web
from tornado.concurrent import Future
from tornado.ioloop import IOLoop
from tornado.ioloop import IOLoop, PeriodicCallback

from jupyter_client.multikernelmanager import MultiKernelManager
from traitlets import Dict, List, Unicode, TraitError, default, validate
from traitlets import Dict, List, Unicode, TraitError, Integer, default, validate

from notebook.utils import to_os_path
from notebook._tz import utcnow, isoformat
from ipython_genutils.py3compat import getcwd

from datetime import datetime, timedelta


class MappingKernelManager(MultiKernelManager):
"""A KernelManager that handles notebook mapping and HTTP error handling"""
Expand All @@ -34,6 +36,10 @@ def _default_kernel_manager_class(self):

_kernel_connections = Dict()

_culler_callback = None

_initialized_culler = False

@default('root_dir')
def _default_root_dir(self):
try:
Expand All @@ -52,6 +58,26 @@ def _update_root_dir(self, proposal):
raise TraitError("kernel root dir %r is not a directory" % value)
return value

cull_kernels_after_minutes_env = 'CULL_KERNELS_AFTER_MINUTES'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing a way to set traitlets with environment variables is done in the kernel gateway code base, but it's not common in the notebook codebase.

cull_kernels_after_minutes_default = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify the trait declarations for now. @parente is right that the environment variable declarations aren't generally used here, so we can have a single assignment:

cull_idle_timeout = Integer(0, config=True,
    help="...
)

without the @default generator or other variables defined.

Plus, I think it might be clearer to use seconds everywhere, rather than a _minutes timeout here and a _seconds timeout below. How about we call them:

cull_idle_timeout: timeout (in seconds) after which a kernel is considered idle and ready to be culled.
cull_interval: interval (in seconds) on which to check for idle kernels to cleanup...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you - this has been addressed in the subsequent commit.

cull_kernels_after_minutes = Integer(cull_kernels_after_minutes_default, config=True,
help="""Duration (minutes) in which a kernel must remain idle before it can be culled. Culling is disabled (0) by default."""
)

@default('cull_kernels_after_minutes')
def cull_kernels_after_minutes_value(self):
return int(os.getenv(self.cull_kernels_after_minutes_env, self.cull_kernels_after_minutes_default))

kernel_culling_interval_seconds_env = 'KERNEL_CULLING_INTERVAL_SECONDS'
kernel_culling_interval_seconds_default = 300 # 5 minutes
kernel_culling_interval_seconds = Integer(kernel_culling_interval_seconds_default, config=True,
help="""The interval (seconds) in which kernels are culled if exceeding the idle duration."""
)

@default('kernel_culling_interval_seconds')
def kernel_culling_interval_seconds_value(self):
return int(os.getenv(self.kernel_culling_interval_seconds_env, self.kernel_culling_interval_seconds_default))

#-------------------------------------------------------------------------
# Methods for managing kernels and sessions
#-------------------------------------------------------------------------
Expand Down Expand Up @@ -105,6 +131,11 @@ def start_kernel(self, kernel_id=None, path=None, **kwargs):
else:
self._check_kernel_id(kernel_id)
self.log.info("Using existing kernel: %s" % kernel_id)

# Initialize culling if not already
if not self._initialized_culler:
self.initialize_culler()

# py2-compat
raise gen.Return(kernel_id)

Expand Down Expand Up @@ -225,3 +256,42 @@ def record_activity(msg_list):

kernel._activity_stream.on_recv(record_activity)

def initialize_culler(self):
"""Start idle culler if 'cull_kernels_after_minutes' is greater than zero.

Regardless of that value, set flag that we've been here.
"""
if not self._initialized_culler and self.cull_kernels_after_minutes > 0:
if self._culler_callback is None:
loop = IOLoop.current()
if self.kernel_culling_interval_seconds <= 0: #handle case where user set invalid value
self.log.warn("Invalid value for 'kernel_culling_interval_seconds' detected (%s) - using default value (%s).",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.warn is deprecated in Python in favor of log.warning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

self.kernel_culling_interval_seconds, self.kernel_culling_interval_seconds_default)
self.kernel_culling_interval_seconds = self.kernel_culling_interval_seconds_default
self._culler_callback = PeriodicCallback(
self.cull_kernels, 1000*self.kernel_culling_interval_seconds, loop)
self.log.info("Culling kernels with idle durations > %s minutes at %s second intervals ...",
self.cull_kernels_after_minutes, self.kernel_culling_interval_seconds)
self._culler_callback.start()

self._initialized_culler = True

def cull_kernels(self):
self.log.debug("Polling every %s seconds for kernels idle > %s minutes...",
self.kernel_culling_interval_seconds, self.cull_kernels_after_minutes)
for kId, kernel in self._kernels.items():
self.cull_kernel(kId, kernel)

def cull_kernel(self, kId, kernel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cull_kernel can just take a kernel_id argument, and the above can be for kernel_id in list(self._kernels). The list will make a copy of the keys, which may be needed, as shutting down a kernel will remove it from the _kernels dictionary and you can't iterate over a dictionary while changing it, for example:

for kernel_id in list(self._kernels):
    self.cull_kernel_if_idle(kernel_id)

def cull_kernel_if_idle(self, kernel_id):
    kernel = self._kernels[kernel_id]
    ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - thanks.

activity = kernel.last_activity
name = kernel.kernel_name
self.log.debug("kId=%s, name=%s, last_activity=%s", kId, name, activity)
if activity is not None:
dtNow = utcnow()
#dtActivity = datetime.strptime(activity,'%Y-%m-%dT%H:%M:%S.%f')
dtIdle = dtNow - activity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python, we try to use _ to separate words, such as kernel_id, idle_time instead of camelCase

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had noticed the convention previously but am finding this to be a very difficult habit to break - thanks for pointing it out.

if dtIdle > timedelta(minutes=self.cull_kernels_after_minutes): # can be culled
idleDuration = int(dtIdle.total_seconds()/60.0)
self.log.warn("Culling kernel '%s' (%s) due to %s minutes of inactivity.", name, kId, idleDuration)
self.shutdown_kernel(kId)