Provide python or CL interface to generate StudyJob yaml and/or StudyJob #240

cwbeitel · 2018-11-11T20:15:41Z

It would be convenient if users could define and create a Katib StudyJob YAML from a Python interface. It would also be nice to be able to submit the StudyJob and poll its status from Python.

In the case of kubeflow/examples#322, it looks like we can easily launch a kubeflow/pipelines pipeline for a single train_mnist call. But when we want to tune the same model with Katib, a YAML file has to be written by hand.
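As a rough illustration of the YAML-generation half of this request, the sketch below builds a StudyJob manifest as a plain Python dict. The field names are modeled on the v1alpha1 StudyJob examples Katib shipped at the time and are an assumption to check against the CRD your cluster installs. The helper names (`make_studyjob`, `to_manifest_text`) are hypothetical and the worker template is left as a placeholder. The output is JSON, which is also valid YAML 1.2, so no extra YAML library is needed.

```python
import json


def make_studyjob(name, namespace, param_ranges, worker_template,
                  objective='Validation-accuracy', goal=0.99,
                  request_count=4, metrics=('accuracy',),
                  algorithm='random', suggestions_per_request=3):
    """Return a StudyJob manifest as a dict (hypothetical helper).

    param_ranges maps a training-script flag to (type, min, max),
    e.g. {'--lr': ('double', 0.1, 0.3)}. Field names are assumed from
    the v1alpha1 StudyJob examples; verify them against your CRD.
    """
    parameterconfigs = [
        {
            'name': flag,
            'parametertype': ptype,
            # The v1alpha1 examples write the bounds as strings.
            'feasible': {'min': str(lo), 'max': str(hi)},
        }
        for flag, (ptype, lo, hi) in param_ranges.items()
    ]
    return {
        'apiVersion': 'kubeflow.org/v1alpha1',
        'kind': 'StudyJob',
        'metadata': {'name': name, 'namespace': namespace},
        'spec': {
            'studyName': name,
            'owner': 'crd',
            'optimizationtype': 'maximize',
            'objectivevaluename': objective,
            'optimizationgoal': goal,
            'requestcount': request_count,
            'metricsnames': list(metrics),
            'parameterconfigs': parameterconfigs,
            # Go template for the worker Job that runs one trial.
            'workerSpec': {'goTemplate': {'rawTemplate': worker_template}},
            'suggestionSpec': {
                'suggestionAlgorithm': algorithm,
                'requestNumber': suggestions_per_request,
            },
        },
    }


def to_manifest_text(studyjob):
    """Serialize the manifest; JSON is valid YAML 1.2, so kubectl accepts it."""
    return json.dumps(studyjob, indent=2)


if __name__ == '__main__':
    job = make_studyjob(
        name='mnist-lr-search',  # hypothetical StudyJob name
        namespace='kubeflow',
        param_ranges={'--lr': ('double', 0.1, 0.3)},
        worker_template='<worker Job template goes here>',
    )
    print(to_manifest_text(job))
```

The printed text can be saved to a file and applied with `kubectl apply -f`.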

One approach would be to enable Katib jobs to be configured and launched from a CLI. This could then be wrapped in a kfp.ContainerOp (roughly as below), making it more testable and easier to include in a broader pipeline.
Another would be to enable Katib jobs to be triggered from a Python call. This could also be wrapped in a ContainerOp, but I'm guessing that doing it this way would let the result of the op be consumed more readily.

```python
import kfp.dsl as kfp


def training_op(min_learningrate, max_learningrate, step_name='katib-studyjob'):
    # 'katib-studyjob-launcher' is the proposed CLI; it does not exist yet.
    return kfp.ContainerOp(
        name=step_name,
        image='katib/mxnet-mnist-example',
        command=['katib-studyjob-launcher'],
        arguments=[
            '--cmd', 'python', '/mxnet/example/image-classification/train_mnist.py',
            # Search range for the --lr hyperparameter.
            '--hparam="--lr","%s","%s"' % (min_learningrate, max_learningrate),
            # Arguments after '--' are passed unchanged to train_mnist.py.
            '--',
            '--batch-size', '64',
            # ...
        ],
    )


@kfp.pipeline(
    name='KatibStudyJob',
    description='Tune train_mnist.py with a Katib StudyJob',
)
def kubeflow_training(
    min_learningrate: kfp.PipelineParam = kfp.PipelineParam(name='min_learningrate', value=0.1),
    max_learningrate: kfp.PipelineParam = kfp.PipelineParam(name='max_learningrate', value=0.3),
    # ...
):
    training = training_op(min_learningrate, max_learningrate)
```
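The `katib-studyjob-launcher` command in the snippet is only a proposal. A minimal sketch of how such a launcher might split its argv, assuming the `--cmd` / `--hparam` / `--` convention used above, could look like this; the function name and the choice to treat every range as a `double` parameter are assumptions.

```python
import argparse
import csv


def parse_launcher_args(argv):
    """Parse argv for a hypothetical katib-studyjob-launcher.

    Arguments before '--' configure the launcher; arguments after it are
    fixed flags appended to the training command unchanged.
    """
    if '--' in argv:
        split = argv.index('--')
        own, passthrough = argv[:split], argv[split + 1:]
    else:
        own, passthrough = list(argv), []

    parser = argparse.ArgumentParser(prog='katib-studyjob-launcher')
    parser.add_argument('--cmd', nargs='+', required=True,
                        help='training command, e.g. python train_mnist.py')
    parser.add_argument('--hparam', action='append', default=[],
                        help='search range written as "--flag","min","max"')
    args = parser.parse_args(own)

    param_ranges = {}
    for spec in args.hparam:
        # csv handles the quoted, comma-separated '"--lr","0.1","0.3"' form.
        flag, lo, hi = next(csv.reader([spec]))
        # Assumption: every range is a continuous (double) parameter.
        param_ranges[flag] = ('double', float(lo), float(hi))

    return {'command': args.cmd + passthrough, 'param_ranges': param_ranges}
```

Because ContainerOp arguments are passed without a shell, the quotes in `--hparam="--lr",...` reach the launcher literally, which is why the value is parsed as a CSV record rather than split on commas.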

/cc @jlewi @texasmichelle

janvdvegt · 2018-12-02T09:28:06Z

What about submitting a StudyJob directly through the Kubernetes API from Python? That's what I was already working on for my own project, so that we could fire off experiments from our GUI. The only dependency it adds is the Kubernetes Python client package. In my case I also needed to create a Role and RoleBinding that allow the Pods running the Python package to interact with StudyJobs. I'm open to contributing this if you are interested.
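A minimal sketch of this approach with the official `kubernetes` Python client, assuming StudyJobs are served as `studyjobs` in the `kubeflow.org/v1alpha1` API group. `create_namespaced_custom_object` and `get_namespaced_custom_object` are real `CustomObjectsApi` methods; the helper names, the `condition` status field and its terminal values are assumptions to verify against your Katib version. The `api` object is passed in so the polling logic can be exercised without a cluster. The Role covers the permission mentioned above and still needs a RoleBinding to the pod's service account.

```python
import time

# Assumed coordinates of the StudyJob CRD (Katib v1alpha1).
GROUP, VERSION, PLURAL = 'kubeflow.org', 'v1alpha1', 'studyjobs'


def submit_studyjob(api, namespace, body):
    """Create a StudyJob; `api` is a kubernetes.client.CustomObjectsApi."""
    return api.create_namespaced_custom_object(
        GROUP, VERSION, namespace, PLURAL, body)


def wait_for_studyjob(api, namespace, name, poll_seconds=30, timeout=3600,
                      condition_field='condition',
                      terminal=('Completed', 'Failed'), sleep=time.sleep):
    """Poll a StudyJob until its status reaches a terminal condition.

    The status field name and its values are assumptions; check the
    StudyJob status your Katib version actually reports.
    """
    waited = 0
    while True:
        obj = api.get_namespaced_custom_object(
            GROUP, VERSION, namespace, PLURAL, name)
        condition = (obj.get('status') or {}).get(condition_field)
        if condition in terminal:
            return obj
        if waited >= timeout:
            raise TimeoutError(
                f'StudyJob {name} still {condition!r} after {waited}s')
        sleep(poll_seconds)
        waited += poll_seconds


def studyjob_role(namespace, name='studyjob-editor'):
    """Role letting a pod's service account manage StudyJobs.

    Bind it to that service account with a RoleBinding.
    """
    return {
        'apiVersion': 'rbac.authorization.k8s.io/v1',
        'kind': 'Role',
        'metadata': {'name': name, 'namespace': namespace},
        'rules': [{
            'apiGroups': [GROUP],
            'resources': [PLURAL],
            'verbs': ['create', 'get', 'list', 'watch', 'delete'],
        }],
    }
```

Inside a pod you would typically call `kubernetes.config.load_incluster_config()` and pass `kubernetes.client.CustomObjectsApi()` as `api`.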

YujiOshima · 2018-12-04T00:47:02Z

@janvdvegt That's great! I'm very interested in your project, and you are very welcome to contribute!
We could provide two levels of interface (IF):
A high-level IF that manages StudyJobs through your project.
A low-level IF that calls the Katib API directly.

andreyvelich · 2020-10-16T23:46:28Z

We created a Python SDK for Katib to run Experiments.

issue-label-bot · 2020-10-16T23:46:35Z

Issue-Label Bot is automatically applying the labels:

Label	Probability
kind/feature	0.98

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

cwbeitel mentioned this issue Nov 11, 2018

Support for tensor2tensor ranged_hparams #241

Open

andreyvelich closed this as completed Oct 16, 2020

issue-label-bot bot added the kind/feature label Oct 16, 2020
