
Custom model changes needed to unify with SQLFlow model zoo #1476


Closed
typhoonzero opened this issue Nov 18, 2019 · 8 comments

@typhoonzero
Collaborator

Background

Unify model zoo implementation of SQLFlow and ElasticDL: sql-machine-learning/models#22
WIP PR: sql-machine-learning/models#27

Custom Model Requirements for Unifying ModelZoo

  1. Support a feature_columns argument when initializing a model.
  2. Support a default eval_metrics_fn, so that this function is not "required" when writing a custom model definition.
  3. The current loss(output, labels) function cannot be reused in Keras's model.compile; it should be compatible with Keras loss functions such as keras.losses.mean_squared_error(y_true, y_pred) (https://keras.io/losses/). See the sketch after this list.
  4. A dataset_fn is still needed when reading data from MaxCompute: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/odps_iris_dnn_model/odps_iris_dnn_model.py
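As an illustration of point 3, the sketch below contrasts the current loss(output, labels) convention with a Keras-compatible loss whose arguments follow the (y_true, y_pred) order and which can be passed to model.compile; the function bodies are illustrative only, not code from the model zoo.

import tensorflow as tf

# Current custom-model convention: loss(output, labels)
def loss(output, labels):
    return tf.reduce_mean(tf.keras.losses.mean_squared_error(labels, output))

# Keras-compatible convention: loss(y_true, y_pred),
# usable as model.compile(loss=keras_style_loss)
def keras_style_loss(y_true, y_pred):
    return tf.keras.losses.mean_squared_error(y_true, y_pred)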
@terrytangyuan
Member

  1. We already support this. Users just need to add feature column definitions as part of their model definition; something like the following would work (see the tutorial at https://www.tensorflow.org/tutorials/structured_data/feature_columns for details, and the sketch after this list for how feature_columns itself might be defined).
import tensorflow as tf
from tensorflow.keras import layers

# feature_columns is a list of tf.feature_column definitions (see the sketch below)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dense(1, activation='sigmoid')
])
  2. What would be a good default for eval_metrics_fn? I think this is highly dependent on the type of model that's defined, e.g. regression vs. classification.
  3. I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.
  4. It's not needed if you are using SQLFlow directly, since dataset_fn is automatically generated by ElasticDL codegen.
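To make the snippet above self-contained, here is a minimal sketch of how feature_columns itself might be defined inside the model definition file; the column names are placeholders for whatever the user's data actually contains, not anything fixed by SQLFlow or ElasticDL.

import tensorflow as tf

# Illustrative numeric feature columns for iris-style data; the names are placeholders.
feature_columns = [
    tf.feature_column.numeric_column(name)
    for name in ["sepal_length", "sepal_width", "petal_length", "petal_width"]
]

# tf.keras.layers.DenseFeatures(feature_columns) then consumes dict-style inputs,
# e.g. {"sepal_length": <tensor>, ...}, so the model above can be fed from a
# tf.data.Dataset of feature dicts.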

@typhoonzero
Collaborator Author

What would be a good default for eval_metrics_fn? I think this is highly dependent on the type of model that's defined, e.g. regression vs. classification.

Well, I'll think about this. It's true that an eval_metrics_fn is needed for every model.
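For context, one possible default for classification models might look like the sketch below; it assumes ElasticDL's convention that eval_metrics_fn returns a dict mapping metric names to callables over labels and predictions, and it is only an illustration of the idea rather than an agreed-upon default.

import tensorflow as tf

# Hypothetical classification default; a regression model would need different metrics.
def eval_metrics_fn():
    return {
        "accuracy": lambda labels, predictions: tf.equal(
            tf.argmax(predictions, axis=1, output_type=tf.int32),
            tf.cast(tf.reshape(labels, [-1]), tf.int32),
        )
    }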

It's not needed if you are using SQLFlow directly, since dataset_fn is automatically generated by ElasticDL codegen.

Is it possible to test a model using MaxCompute as the dataset so that the dataset_fn is automatically generated by ElasticDL? That way, when we need to test whether a model works, we don't need to write a dataset_fn again.
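For reference, the kind of dataset_fn under discussion looks roughly like the sketch below, loosely modeled on the linked odps_iris_dnn_model example; the signature and parsing logic here are assumptions for illustration and may differ from the actual ElasticDL interface.

import tensorflow as tf

# Rough sketch: split a numeric record read from MaxCompute/ODPS into features and a label.
def dataset_fn(dataset, mode):
    def _parse(record):
        features = record[:-1]
        label = tf.cast(record[-1], tf.int32)
        return features, label

    return dataset.map(_parse)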

@terrytangyuan
Member

Is it possible to test a model using MaxCompute as the dataset so that the dataset_fn is automatically generated by ElasticDL? That way, when we need to test whether a model works, we don't need to write a dataset_fn again.

I think it should be auto-generated by SQLFlow instead, since ElasticDL's model definition must contain information like feature column names and the label column name, which currently has to be written by the user as part of dataset_fn. This kind of information is easier to obtain through SQLFlow's extended query.

@typhoonzero
Collaborator Author

@terrytangyuan The problem is that if we want many model developers to contribute models, dataset_fn should not be part of the model. It may be a function used to test that the model works, but it should not live in the model's definition file.

@typhoonzero
Collaborator Author

I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.

The order of the output and labels arguments should be the same as in Keras's loss functions.

@terrytangyuan
Member

@typhoonzero and I synced offline. To summarize, we will:

  1. Automatically create dataset_fn() internally in ElasticDL, so that when an ODPS data source is used it is generated automatically without having to be implemented in the model definition file.
  2. Allow specifying feature column names and the label column name through ElasticDL, e.g. via --data_reader_params or --envs.
  3. Start with supporting the Keras subclassing API first. The functional Keras API requires tf.keras.layers.Input, which needs to know the input shape; that is not easy to support now that dataset_fn and the model are decoupled (see the sketch after this list).
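To illustrate point 3, here is a minimal sketch of the two styles; the layer sizes and input shape are placeholders. The subclassing style needs no input shape at definition time, while the functional style requires tf.keras.layers.Input(shape=...), which is hard to fill in while dataset_fn and the model are decoupled.

import tensorflow as tf

# Subclassing style: no input shape required when the model is defined.
class CustomModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(128, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self.out(self.hidden(inputs))

# Functional style: the input shape must be known up front.
inputs = tf.keras.layers.Input(shape=(4,))  # (4,) is a placeholder shape
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(
    tf.keras.layers.Dense(128, activation="relu")(inputs)
)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)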

@terrytangyuan
Member

terrytangyuan commented Nov 20, 2019

I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.

The order of the output and labels arguments should be the same as in Keras's loss functions.

@typhoonzero This should be fixed by #1490. Please test to see if it works now.

@terrytangyuan
Member

@typhoonzero This can be closed now, right?
