-
Notifications
You must be signed in to change notification settings - Fork 579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Modular TensorFlow Graph C API #318
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the proposal. Would this RFC also enable custom XLA pass plugins, or is mainly targeting TF graph?
Hi, @eric-haibin-lin , XLA pass is not included in this RFC, it is mainly targeting TF graph. |
thanks for the reply! |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
There will be a public design review meeting for this RFC. Meeting participants are expected to read the RFC ahead of time as the meeting will focus on the remaining issues/questions. |
@googlebot I fixed it. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
f12c89a
to
639e1dc
Compare
|
||
### Struct/Function/Object Overview | ||
- Struct | ||
- Struct that should be filled by the plugin: `P_OptimizerConfigFns`, `P_RegistrationParams`, `TF_OptimizerBuilder` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does P_
prefix mean that the struct should be filled by the plugin? I am thinking if TP_
prefix would be better here (that might match what we do for StreamExecutor
C API better where we have SP
and SE
prefixes).
Also, should TF_OptimizerBuilder
be (T)P_OptimizerBuilder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I have already changed those names.
|
||
Graph [Optimize](https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/core/grappler/optimizers/graph_optimizer.h#L58) function is the main part that plugin authors need to implement. The C API looks like below. Both input and output graphs are represented by serialized `TF_Buffer` objects: | ||
```cpp | ||
void P_Optimize(void* optimizer, TF_Buffer* graph_buf, TF_Buffer* optimized_graph_buf, TF_Status* s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have a struct for optimizer
instead of void*
? I am wondering if there would be some properties we would want to pass in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mechanism is following Compute
function in kernel C API. Here void* optimizer
is a struct defined in plugin and created in P_Create
function, it only contains some members that are used in plugin.
I am wondering if there would be some properties we would want to pass in the future.
I guess what you referred is TP_OptimizerBuilder
. Previously it is a function call with fixed param, but just as you mentioned, making it as a struct would be more flexible. Currently only create/delete/optimize is included since these are most useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have already changed and updated the RFC.
|
||
Plugin: | ||
```cpp | ||
Status P_MessageToBuffer(const plugin::protobuf::MessageLite& in, TF_Buffer* out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure, P_MessageToBuffer
and P_BufferToMessage
are not a part of the API surface, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think the name needs to be changed, they are confusing, maybe "plugin::MessageToBuffer" is more clear. I will clarify it in RFC.
|
||
void TF_InitGraphPlugin(P_RegistrationParams* params, TF_Status* status) { | ||
// Plugin authors can turn on/off some optimizers. | ||
params.config_fns.get_remapping = get_remapping; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be TF_Bool fields instead of functions that return TF_Bool? Is there a reason that makes a function better in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It optional for plugin users to set these configurations. It not set, function ptr here is a nullptr, then it will use default value defined in rewrite_config.proto. TF_Bool can not represent default value.
|
||
* **Configuring existing optimizers** | ||
|
||
If pluggable graph optimizer is registered to a device type, e.g., GPU, its optional for plugin authors to provide a recommended configuration indicate whether some of existing optimizers in proper can be turned on/off, by populating flags in `P_RegistrationParams`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would existing optimizer configuration also be per-device-type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, plugin authors should register the optimizer along with device_type and configuration, and only one optimizer and configuration is allowed for each device type.
Making TP_OptimizerBuilder as a struct.
c0b0e7f
to
0cfdfd7
Compare
0cfdfd7
to
302e0ac
Compare
What does it mean optimizer per device in practice? Grappler meta-optimizer runs on graphs that could be placed on multiple devices at the same time. |
/cc @kulinseth and Alex, regarding TF dialect question raised during the meeting. I guess we can do something similar to https://github.com/tensorflow/mlir-hlo to pull TF dialect out of the TF repo, but this will be trickier because it unfortunately has dependencies (not much) on tensorflow core. You probably can get a more informed answer in this group: https://groups.google.com/a/tensorflow.org/g/mlir |
Design review meeting notes@annarev: Should structs populated by plug-in be prefixed with @annarev: Should the // Struct for Optimizer. Plugin authors must provide an optimize function.
// Creation and deletion functions are optional.
typedef struct TP_Optimizer {
size_t struct_size;
void* ext; // reserved for future use
void* (*create_func)();
void (*optimize_func)(void*, TF_Buffer*, TF_Buffer*);
void (*destory_func)(void*);
} TP_Optimizer; @kulinseth: Is the format for 'tf' dialect in the buffer TBD? @aselle: Are there any version compatibility requirements for Grappler? @annarev: @annarev: Can the configurations in @annarev: Would existing optimizer configuration also be per-device-type? (Original comment) @ezhulenev: What does it mean to have an optimizer per a device in practice? (Original comment) @sanjoy: Why do you have to tie a graph optimization pass to a PluggableDevice? @ezhulenev: What about optimizer plug-ins with conflicting configurations? @kulinseth: Is the order of these plug-in passes towards the end of all Grappler passes? @annarev: Should @annarev: What does @kulinseth: Quick question. For custom ops that are exposed through an op kernel library and are visible in the TensorFlow namespace, would they work with the plug-in passes? Alexandre Rames: Maybe this is not the right venue for the question. I’d like to learn more about the visibility of the @ematejska: The RFC should be updated to reflect the conclusions on conflicting plug-in optimizer configurations. After that, we will see if we can conclude the RFC on the PR or if we will need another design review meeting. |
@ezhulenev Thanks! for the pointer, much appreciated. Looks like this would be a good template to start with. Like you said as we dig into details we will know if the dependencies can be resolved.
|
f20edd5
to
d068e2f
Compare
d068e2f
to
444b95f
Compare
cc: @penpornk @ezhulenev @kulinseth @annarev @aselle @rmlarsen @ematejska @Jianhui-Li @ematejska @sanjoy
Please review. |
Apologies: above comment should have been for a different RFC. |
The above should have been posted to a different RFC. Apologies for the confusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ShengYang1 Thank you for the changes and sorry for the delay! I took a quick look and have minor comments. The rest seems fine to me.
void* ext; // reserved for future use | ||
void* (*create_func)(); | ||
void (*optimize_func)(void*, TF_Buffer*, TF_Buffer*); | ||
void (*destory_func)(void*); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo:
void (*destory_func)(void*); | |
void (*destroy_func)(void*); |
Same as other places (there are a few other "destory"s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. All comments have been addressed.
|:-------------- |:-------------- |:-------------- |:-------------- |:-------------- | | ||
| ON | ON/DEFAULT | ON/DEFAULT | **ON**| The optimizer is enabled. | | ||
| ON | OFF | OFF | **OFF**| The optimizer is disabled, unless users manually unload the plugin. Grappler prints warnings to remind users that config has been changed based on plugin's config.| | ||
| ON | ON/DEFAULT | OFF | **OFF**| The optimizer is disabled if at least one plugin turns off it. Grappler prints warnings to remind users that config has been changed based on plugin's config, and potention performance regression may happen due to the conflict configs.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "turns off it" should be "turns it off". Same for other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| ON | OFF | OFF | **OFF**| The optimizer is disabled, unless users manually unload the plugin. Grappler prints warnings to remind users that config has been changed based on plugin's config.| | ||
| ON | ON/DEFAULT | OFF | **OFF**| The optimizer is disabled if at least one plugin turns off it. Grappler prints warnings to remind users that config has been changed based on plugin's config, and potention performance regression may happen due to the conflict configs.| | ||
| ON | OFF | ON/DEFAULT | **OFF**| Same as previous scenario.| | ||
| OFF | ON/DEFAULT/OFF | ON/DEFAULT/OFF | **OFF**| The optimizer is always disabled when user turns off it. | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add another paragraph (after the table) saying that plug-ins are responsible for benchmarking and ensuring that turning off the optimizer(s) doesn't cause nontrivial performance degradations? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
``` | ||
|
||
When multiple plugins are registered and running for a graph, an error would be raised if their recommended configurations are different. Users should unload one of the plugins or disable plugin optimizers to resolve the conflict. | ||
When multiple plugins are successfully registered and running for a graph, their recommended configurations may differ with each other. If any of the config has turned off the optimizer, the optimizer will be disabled. The following table lists all possible scenarios. In the table, plugin's config is represented by tristates, `ON` means turn on optimizer, `OFF` means turn off optimzier, and `DEFAULT` means uses proper's config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo:
When multiple plugins are successfully registered and running for a graph, their recommended configurations may differ with each other. If any of the config has turned off the optimizer, the optimizer will be disabled. The following table lists all possible scenarios. In the table, plugin's config is represented by tristates, `ON` means turn on optimizer, `OFF` means turn off optimzier, and `DEFAULT` means uses proper's config. | |
When multiple plugins are successfully registered and running for a graph, their recommended configurations may differ with each other. If any of the config has turned off the optimizer, the optimizer will be disabled. The following table lists all possible scenarios. In the table, plugin's config is represented by tristates, `ON` means turn on optimizer, `OFF` means turn off optimizer, and `DEFAULT` means uses proper's config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Hi all, To push this RFC forward, I'd like to set a deadline for responses. It has been nine days since the authors made the changes so I think another week is sufficient. Please raise your concerns here by next Tuesday, December 8, 2020. If there are no outstanding concerns by then, I think we should wrap this up. |
|
||
* **GraphDef** | ||
|
||
The compatibility of `GraphDef` between plugin and proper is guaranteed by protobuf library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compatibility of `GraphDef` between plugin and proper is guaranteed by protobuf library. | |
The compatibility of `GraphDef` between plugin and proper follows the same compatibility [rules](https://developers.google.com/protocol-buffers/docs/cpptutorial?hl=en#extending-a-protocol-buffer) and [guarantees](https://developers.google.com/protocol-buffers/docs/proto3?hl=en#updating) as protobuf library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion. Updated.
@ematejska The deadline for additional feedback has passed. Since there is no outstanding concern, should we merge this? |
This comment has been minimized.
This comment has been minimized.
Hi, @ShengYang1 I am not sure whether this is the right place to ask question, but I will appreciate that if you can give me some advice:-). In your RFC, you mentioned that:
I found that the utility classes in the
So I think just copying these files into plugin side mabye not enough and my question is: If I want to modify the graph itself and do some pattern matches on the graph to support large set of operators fusion in the plugin side, are there any other better solutions? Thanks very much! |
Thanks for your question. Actually I prefer directly copying all those APIs( |
@ShengYang1 Thanks very much for your valuable suggestions , I will try:-). |
This RFC will be open for comment until Wednesday, Nov 18th, 2020.
Modular TensorFlow Graph C API
Objective
TensorFlow currently provides a C++ API for registering a custom graph optimizer in Grappler. This project aims to create a modular/plugin-based TensorFlow implementation with C APIs. Plugins will be able to register custom graph optimizers. Users only need to install the plugin in a specified directory, and the mechanism is able to discover and plug in the capabilities offered by the plugin.
This RFC is based on the Modular TensorFlow RFC, which aims at extending the TensorFlow design to plug in capabilities like adding a new graph optimizer.