model zoo design #949

Closed

typhoonzero
Collaborator

No description provided.

```sql
TRAIN sqlflow.org/modelzoo/iris_dnn_128x32
WITH model.learning_rate=0.001, model.learning_rate_decay="cosine_decay" ...
INTO modeldb.my_iris_dnn_model_fine_tune;
```

@QiJune QiJune Oct 9, 2019

Another common case could be creating a new variant model based on the standard model (maybe just adjusting some layers) and training it with a new dataset.

For example, we could use the DeepFM model for recommendation, but the model structure may differ slightly across data sources and scenarios.

Such advanced users could also contribute the new variant model back to the model zoo.

@typhoonzero typhoonzero changed the title [WIP] model zoo design model zoo design Oct 11, 2019

QiJune commented Oct 11, 2019

The data pipeline is a little complex. For now we only add a check mechanism in `model_meta.json` to validate the input format; we can complete the design gradually.
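
As a rough illustration of such an input-format check, the sketch below validates a row of input data against a metadata file. The `inputs` key and its `name`/`dtype` fields are assumptions made up for this example; the actual `model_meta.json` schema is not shown in this thread and may differ.

```python
import json

# Hypothetical model_meta.json content; the real schema may differ.
SAMPLE_META = json.loads("""
{
  "inputs": [
    {"name": "sepal_length", "dtype": "float"},
    {"name": "sepal_width", "dtype": "float"}
  ]
}
""")

# Map the assumed dtype strings to the Python types accepted for each.
_DTYPES = {"float": (int, float), "int": (int,), "string": (str,)}


def check_input_format(meta, row):
    """Return a list of problems found in `row` (a dict of column -> value)."""
    problems = []
    for spec in meta["inputs"]:
        name, dtype = spec["name"], spec["dtype"]
        if name not in row:
            problems.append(f"missing column: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[name], bool) or not isinstance(row[name], _DTYPES[dtype]):
            problems.append(f"column {name} is not of type {dtype}")
    return problems
```

An empty list means the row matches the declared input format; anything else can be reported before training or prediction starts.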

QiJune previously approved these changes Oct 11, 2019
@@ -0,0 +1,187 @@
# Model Zoo

The SQLFlow model zoo is a place to store model definitions, pre-trained model weights, and model documentation. You can train, predict, or analyze directly with one of the models using SQLFlow, or you can fine-tune a model or apply transfer learning to fit it to your dataset.
Collaborator

@brightcoder01 brightcoder01 Oct 11, 2019

Our model zoo has something in common with https://www.tensorflow.org/hub; maybe we can use it as a reference.

Collaborator Author

Thanks, will take a look~

The directory name describes the model's type, network structure, and which dataset is
used to train the pre-trained weights. You can access `sqlflow.org/modelzoo/iris_dnn_128x32/README.md`
from the browser to get the model's full documentation. All models under `sqlflow.org/modelzoo` are
developed under `https://github.com/sql-machine-learning/models` weights is only stored under
Collaborator

@brightcoder01 brightcoder01 Oct 11, 2019

weights is only stored under => weights are only stored under

terrytangyuan previously approved these changes Oct 11, 2019

Some details about the files in one model:

- `model_meta.json` contains important information used to load and run this model; a sample is shown below:
Collaborator

@brightcoder01 brightcoder01 Oct 11, 2019

In the design of SQLFlow, how do we express feature engineering logic (such as preprocessing logic using TF-Transform)? Should we put it in the SELECT clause, such as `SELECT SUM(X), MEAN(Y) FROM some_table`, or in the model definition?

Collaborator Author

@typhoonzero typhoonzero Oct 12, 2019

We currently do data pre-processing with SQL statements. The pre-processing statements generate temporary tables for training and validation, and the training SQL statement then reads from those temporary tables directly.

We have not considered cases that use TF-Transform for now; we may add that part of the design in the future.
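
The pre-processing flow described in this reply can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in database. The table names `raw_iris` and `iris_train`, the derived `sepal_area` column, and the sample rows are all hypothetical; the sketch only shows the pattern of a pre-processing statement writing a temporary table that a later training statement would select from.

```python
import sqlite3

# In-memory database standing in for the real data source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_iris (sepal_length REAL, sepal_width REAL, class INTEGER);
INSERT INTO raw_iris VALUES
    (5.1, 3.5, 0), (NULL, 3.0, 1), (6.3, 2.9, 2), (5.8, 2.7, 2);

-- Pre-processing step: drop incomplete rows and derive a feature,
-- writing the result to a temporary table.
CREATE TEMPORARY TABLE iris_train AS
SELECT sepal_length,
       sepal_width,
       sepal_length * sepal_width AS sepal_area,
       class
FROM raw_iris
WHERE sepal_length IS NOT NULL;
""")

# The training statement would then select from the temporary table directly.
train_rows = conn.execute("SELECT * FROM iris_train").fetchall()
```

Here `train_rows` holds only the cleaned rows, each with the extra derived column, which is what a subsequent training statement would consume.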

@typhoonzero typhoonzero dismissed stale reviews from terrytangyuan and QiJune via 00e7f61 October 12, 2019 01:00
```sql
PREDICT predict_result.class
USING sqlflow.org/modelzoo/iris_dnn_128x32;
```
1. Train a model from scratch using the model definition:
Collaborator

The model zoo doesn't train anything.

Collaborator Author

Why not? SQLFlow users can choose to train a model from scratch or to fine-tune one; either is possible. We cannot limit usage to fine-tuning or transfer learning only.


## The Model Zoo Hosting Service

The model zoo hosting service is a file service that can be accessed from the internet.
Collaborator

The model definitions are Python programs. We cannot make a file service that provides program downloading. This violates most security rules.

A viable solution I see for publishing a version of a model zoo is to package the model definitions into a Docker image, together with the SQLFlow server.

Collaborator Author

I added subdirectories in the file service for versioning. In fact, versioning Docker images also amounts to saving a different image for each version.
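
One way to picture versioning by subdirectory is the sketch below. The layout `<base>/<model>/<version>/<file>` and the `v<major>.<minor>` naming are assumptions for illustration only; the thread says only that versions live in subdirectories and does not specify the layout.

```python
import posixpath


def versioned_path(base, model_name, version, filename):
    """Build a path under an assumed <base>/<model>/<version>/<file> layout."""
    return posixpath.join(base, model_name, version, filename)


def latest_version(versions):
    """Pick the highest version from names like 'v1.2' (assumed naming)."""
    # Compare numerically so that 'v1.10' sorts after 'v1.2'.
    return max(versions, key=lambda v: tuple(int(p) for p in v.lstrip("v").split(".")))


# Hypothetical version names; a real service would list its subdirectories.
newest = latest_version(["v1.2", "v1.10", "v0.9"])
readme = versioned_path("sqlflow.org/modelzoo", "iris_dnn_128x32", newest, "README.md")
```

Comparing version components as integers rather than strings avoids ranking `v1.2` above `v1.10`.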

@typhoonzero
Collaborator Author

close due to #1042

@typhoonzero typhoonzero deleted the model_zoo_design branch July 17, 2020 02:08
8 participants