
Custom model changes needed to unify with SQLFlow model zoo #1476


Closed
typhoonzero opened this issue Nov 18, 2019 · 8 comments

@typhoonzero
Collaborator

Background

Unify model zoo implementation of SQLFlow and ElasticDL: sql-machine-learning/models#22
WIP PR: sql-machine-learning/models#27

Custom Model Requirements for Unifying ModelZoo

  1. Support a feature_columns argument when initializing a model.
  2. Support a default eval_metrics_fn, so that this function is not "required" when writing a custom model definition.
  3. The current loss(output, labels) function cannot be reused in Keras's model.compile; it should be compatible with Keras loss functions such as keras.losses.mean_squared_error(y_true, y_pred) (https://keras.io/losses/). See the sketch after this list.
  4. A dataset_fn is still needed when reading data from MaxCompute: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/odps_iris_dnn_model/odps_iris_dnn_model.py
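As an illustration of point 3, the sketch below contrasts the current loss(output, labels) convention with a Keras-compatible loss whose arguments follow the (y_true, y_pred) order and which can be passed to model.compile; the function bodies are illustrative only, not code from the model zoo.

import tensorflow as tf

# Current custom-model convention: loss(output, labels)
def loss(output, labels):
    return tf.reduce_mean(tf.keras.losses.mean_squared_error(labels, output))

# Keras-compatible convention: loss(y_true, y_pred),
# usable as model.compile(loss=keras_style_loss)
def keras_style_loss(y_true, y_pred):
    return tf.keras.losses.mean_squared_error(y_true, y_pred)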
@terrytangyuan
Member

  1. We already support this. Users just need to add feature column definitions as part of their model definition; something like the following would work (see the tutorial at https://www.tensorflow.org/tutorials/structured_data/feature_columns for details, and the sketch after this list for how feature_columns itself might be defined).
import tensorflow as tf
from tensorflow.keras import layers

# feature_columns is a list of tf.feature_column definitions (see the sketch below)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dense(1, activation='sigmoid')
])
  2. What would be a good default for eval_metrics_fn? I think this is highly dependent on the type of model that's defined, e.g. regression vs. classification.
  3. I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.
  4. It's not needed if you are using SQLFlow directly, since dataset_fn is automatically generated by ElasticDL codegen.
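To make the snippet above self-contained, here is a minimal sketch of how feature_columns itself might be defined inside the model definition file; the column names are placeholders for whatever the user's data actually contains, not anything fixed by SQLFlow or ElasticDL.

import tensorflow as tf

# Illustrative numeric feature columns for iris-style data; the names are placeholders.
feature_columns = [
    tf.feature_column.numeric_column(name)
    for name in ["sepal_length", "sepal_width", "petal_length", "petal_width"]
]

# tf.keras.layers.DenseFeatures(feature_columns) then consumes dict-style inputs,
# e.g. {"sepal_length": <tensor>, ...}, so the model above can be fed from a
# tf.data.Dataset of feature dicts.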

@typhoonzero
Collaborator Author

What would be a good default for eval_metrics_fn? I think this is highly dependent on the type of model that's defined, e.g. regression vs. classification.

Well, I'll think about this. It's true that an eval_metrics_fn is needed for every model.
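For context, one possible default for classification models might look like the sketch below; it assumes ElasticDL's convention that eval_metrics_fn returns a dict mapping metric names to callables over labels and predictions, and it is only an illustration of the idea rather than an agreed-upon default.

import tensorflow as tf

# Hypothetical classification default; a regression model would need different metrics.
def eval_metrics_fn():
    return {
        "accuracy": lambda labels, predictions: tf.equal(
            tf.argmax(predictions, axis=1, output_type=tf.int32),
            tf.cast(tf.reshape(labels, [-1]), tf.int32),
        )
    }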

It's not needed if you are using SQLFlow directly, since dataset_fn is automatically generated by ElasticDL codegen.

Is it possible to test a model using MaxCompute as the dataset so that the dataset_fn is automatically generated by ElasticDL? That way, when we need to test whether a model works, we don't need to write a dataset_fn again.
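For reference, the kind of dataset_fn under discussion looks roughly like the sketch below, loosely modeled on the linked odps_iris_dnn_model example; the signature and parsing logic here are assumptions for illustration and may differ from the actual ElasticDL interface.

import tensorflow as tf

# Rough sketch: split a numeric record read from MaxCompute/ODPS into features and a label.
def dataset_fn(dataset, mode):
    def _parse(record):
        features = record[:-1]
        label = tf.cast(record[-1], tf.int32)
        return features, label

    return dataset.map(_parse)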

@terrytangyuan
Member

Is it possible to test a model using MaxCompute as the dataset so that the dataset_fn is automatically generated by ElasticDL? That way, when we need to test whether a model works, we don't need to write a dataset_fn again.

I think it should be auto-generated by SQLFlow instead, since ElasticDL's model definition must contain information like feature column names and the label column name, which currently has to be written by the user as part of dataset_fn. This kind of information is easier to obtain through SQLFlow's extended query.

@typhoonzero
Collaborator Author

@terrytangyuan The problem is that if we want many model developers to contribute models, dataset_fn should not be part of the model. It may be a function used to test that the model works, but it should not live in the model's definition file.

@typhoonzero
Collaborator Author

I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.

The order of the output and labels arguments should be the same as in Keras's loss functions.

@terrytangyuan
Member

@typhoonzero and I synced offline. To summarize, we will:

  1. Automatically create dataset_fn() internally in ElasticDL, so that when an ODPS data source is used it is generated automatically without having to be implemented in the model definition file.
  2. Allow specifying feature column names and the label column name through ElasticDL, e.g. via --data_reader_params or --envs.
  3. Start with supporting the Keras subclassing API first. The functional Keras API requires tf.keras.layers.Input, which needs to know the input shape; that is not easy to support now that dataset_fn and the model are decoupled (see the sketch after this list).
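To illustrate point 3, here is a minimal sketch of the two styles; the layer sizes and input shape are placeholders. The subclassing style needs no input shape at definition time, while the functional style requires tf.keras.layers.Input(shape=...), which is hard to fill in while dataset_fn and the model are decoupled.

import tensorflow as tf

# Subclassing style: no input shape required when the model is defined.
class CustomModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(128, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self.out(self.hidden(inputs))

# Functional style: the input shape must be known up front.
inputs = tf.keras.layers.Input(shape=(4,))  # (4,) is a placeholder shape
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(
    tf.keras.layers.Dense(128, activation="relu")(inputs)
)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)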

@terrytangyuan
Member

terrytangyuan commented Nov 20, 2019

I thought it should be compatible already. @LiMinghao1994 @workingloong @brightcoder01 can take a look at this if that's not the case.

The order of the output and labels arguments should be the same as in Keras's loss functions.

@typhoonzero This should be fixed by #1490. Please test to see if it works now.

@terrytangyuan
Member

@typhoonzero This can be closed now, right?
