Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: TensorFlow Build Improvements #179

Closed
wants to merge 0 commits into from
Closed

Conversation

bmzhao
Copy link
Member

@bmzhao bmzhao commented Nov 23, 2019

This RFC will be open for comment until Friday, December 13th, 2019.

cc @gunan @martinwicke

Tensorflow Build TLC

Status Proposed
Author(s) Brian Zhao ([email protected])
Sponsor Gunhan Gulsoy ([email protected])
Updated 2019-11-22

Objective

Simplify Tensorflow’s build so that

  1. Builds are more granular
  2. Builds are easier to reason about
  3. TF’s build is well positioned to immediately benefit from Bazel shared library support

@ewilderj ewilderj changed the title RFC: TF Build TLC RFC: TensorFlow Build Improvements Nov 25, 2019
@@ -0,0 +1,213 @@
# Tensorflow Build TLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Tensorflow Build TLC
# Tensorflow Build Improvements

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ewilderj ewilderj added the RFC: Proposed RFC Design Document label Nov 25, 2019
Simplify Tensorflow’s build so that
1. Builds are more granular
2. Builds are easier to reason about
3. TF’s build is well positioned to immediately benefit from [Bazel shared library support](https://docs.google.com/document/d/1qYHGcxP9BEi7t9b53r8STr9G3XVtvwy9V-EXqr54Xxg/edit#)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please link to a public document on this feature, or remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (although the original doc link was a public doc).

brijk7
brijk7 previously approved these changes Nov 26, 2019
@brijk7 brijk7 dismissed their stale review November 26, 2019 02:40

Didn't mean to approve before feedback period.

@reuben
Copy link

reuben commented Nov 26, 2019

It's great to see this proposal. For DeepSpeech we have similar constraints as Android, we want to keep code size down, so for a while now we've been using very fine grained deps, and for one case we had to break up the TensorFlow target by hand to avoid a large binary size increase from a seemingly small dependency. That change in particular leads to several megabytes difference in our final binary size compared to simply depending on //tensorflow/core/kernels:cwise_op. This proposal seems to be going in the right direction and is much appreciated.

@bmzhao
Copy link
Member Author

bmzhao commented Nov 27, 2019

It's great to see this proposal. For DeepSpeech we have similar constraints as Android, we want to keep code size down, so for a while now we've been using very fine grained deps, and for one case we had to break up the TensorFlow target by hand to avoid a large binary size increase from a seemingly small dependency. That change in particular leads to several megabytes difference in our final binary size compared to simply depending on //tensorflow/core/kernels:cwise_op. This proposal seems to be going in the right direction and is much appreciated.

Thank you! We hope this will help make Tensorflow's build more approachable and usable in other projects!

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 27, 2019
… is now guaranteed to have cc_libraries foo_proto_cc_headers_only and foo_proto_cc_impl. The name foo_proto_cc aliases to either foo_proto_cc_headers_only or foo_proto_cc_impl depending on whether make_default_target_header_only is True.

This also fixes a bug where header only tf_proto_libraries did not declare dependencies on the header-only tf_proto_libraries that they depend on.

This is needed as part of the Tensorflow Build Improvement RFC, described here: tensorflow/community#179

PiperOrigin-RevId: 282674607
Change-Id: I408ef6a9dd841d2c3779c7d526c4b83d75040b53
Copy link
Member

@seanpmorgan seanpmorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much needed proposal, thanks! Looking forward to the lowered barrier


When multiple build targets contain the same source files, they form parallel build graphs. This is currently the case with the build targets [//tensorflow/core:mobile_srcs_no_runtime](https://github.com/tensorflow/tensorflow/blob/500438a24419f8f0921bc8bb52d1951e3ee4235e/tensorflow/core/BUILD#L1572-L1582) and [//tensorflow/core:framework_internal_impl](https://github.com/tensorflow/tensorflow/blob/500438a24419f8f0921bc8bb52d1951e3ee4235e/tensorflow/core/BUILD#L2654-L2666) with respect to any C++ files in tensorflow/core/util/*. Any build target that accidentally includes both of these targets in its dependencies will have [ODR violations](https://en.wikipedia.org/wiki/One_Definition_Rule).

This parallel build graph exists because Android would like to produce a minimal Tensorflow shared object by pulling in only a subset of Tensorflow's codebase. The reason why Android couldn't simply use the existing Tensorflow Build targets is because they are [too large](https://docs.google.com/document/d/1kg_tVB1g3c5isPaP1OXW_KZwF_hcTiKz5afLkKjiSz8/edit#heading=h.8tfaww3ddb83). Therefore, if we address [problem 1](https://docs.google.com/document/d/1kg_tVB1g3c5isPaP1OXW_KZwF_hcTiKz5afLkKjiSz8/edit#heading=h.ytms1uc3twy), we should be able to refactor the Android build to re-use these new smaller targets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these meant to link to a google doc instead of headers in this doc? If so they are private.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch! Yes these were supposed to links back to headers in the same markdown file. Just fixed, thanks!

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 28, 2019
This change is part of the refactoring described in Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 282970277
Change-Id: I1430d0beea8917bd4fb167f30c740e2e9329b3cc
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 4, 2019
This change is part of the refactoring described in Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 283675621
Change-Id: I355d72b476c3f4222dd3e83768a374bf3cc8beb3
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 5, 2019
…ules. This allows tensorflow to start experimenting with rules_cc's cc_shared_library implementation.

This change is part of the Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 283908432
Change-Id: Id497dfee6211650f0e10ccb1332ac625af2f23fd

For tensorflow/core/BUILD, we will introduce BUILD files in:

- [tensorflow/core/common_runtime](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are MKL components in several of these folders but default builds don't include them and won't detect errors during the refactoring process. How can we help make this smoother?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Clayne,

That's a great point; so far the changes we've made don't touch anything MKL specific, but as we start refactoring core/common_runtime we will need to be careful not to break MKL integration.

My understanding is that Tensorflow builds use MKL by default via Eigen: https://github.com/tensorflow/tensorflow/blob/d26ba01420ed9e4e1a55f22ca0e3034fdd3d3671/tensorflow/core/kernels/BUILD#L756-L758, meaning hopefully our CI will start yelling at us if a breakage occurs; but I'll double check with @gunan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, most of MKL is not covered under the presubmits we are running. By default, only tensor contraction ops use MKL-DNN.

I think the majority of MKL code is under tensorflow/core/kernels, as MKL kernel implementations. The requirement there is, nothing outside tensorflow/core/kernels should define kernels as dependencies (except for binaries, or pip package).
I think MKL also has some code outside kernels, and for these, it depends on where exactly the code is. As Brian suggested, we will start from the bottom layer of the code, and slowly build up our work.
But if you could refactor your code to minimize number of dependencies, break any include cycles and build a clear hierarchy within your code/files, that would help us greatly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can look at that. Most of the MKL code is in mkl_* files and/or surrounded by #ifdef INTEL_MKL_DNN code blocks so should be fairly easy to identify when restructuring BUILDs

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 7, 2019
…d wires it into tensorflow/core/BUILD.

This change is part of the refactoring described in Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 284279060
Change-Id: I1ac535f3f811643dceccd8caafa4c4dcfdfd3b88
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 8, 2019
…rm:stacktrace_handler into tensorflow/core/BUILD.

This change also removes tf_additional_lib_srcs, which after bazel starlark expansion amounts to just stacktrace_handler. This change is part of the refactoring described in the Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 284385812
Change-Id: I2ad704e196cf354008a3f7e5132d6bdd9f4569d9
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 8, 2019
This change also removes the glob tensorflow/core/platform:legacy_lib_internal_srcs, and replaces it with the leftover from the glob's expansion.
This change is part of the Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 284386433
Change-Id: I052f3381c76fbaea52f3299618c6961ec506fbcc
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 8, 2019
…latform-specific headers explicit, and also helps make checking in BUILD files in subdirectories of tf/core/platform easier.

This change is part of the Tensorflow Build Improvements RFC: tensorflow/community#179

PiperOrigin-RevId: 284392848
Change-Id: I635d2051bc064d88dbdd8fb146636887adc19dbb
pull bot pushed a commit to Pandinosaurus/tensorflow that referenced this pull request Dec 8, 2019
…e_google, and extract their common files into a single filegroup. This change helps prepare for a subsequent change adding BUILD files to tf/core/platform/default and tf/core/platform/windows.

This change is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179

PiperOrigin-RevId: 284436373
Change-Id: I695ef8042388ad2bc17c763b87f4f7d0e189cd74
ArmageddonKnight pushed a commit to UofT-EcoSystem/tensorflow that referenced this pull request Dec 10, 2019
…tform/windows/BUILD.

This is part of the refactoring described in the Tensorflow Build Improvements RFC: tensorflow/community#179
Subsequent changes will migrate targets from build_refactor.bzl into the new BUILD files.

PiperOrigin-RevId: 284712709
Change-Id: I650eb200ba0ea87e95b15263bad53b0243732ef5
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 10, 2019
…/build_refactor.bzl into tf/core/platform/windows/BUILD.

Subsequent changes will do the same for tf/core/platform/default/BUILD, and remove build_refactor.bzl.
This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179

PiperOrigin-RevId: 284858768
Change-Id: I60dc0b81c6d5e7c81722e33c4d8ce290499e9b54
ArmageddonKnight pushed a commit to UofT-EcoSystem/tensorflow that referenced this pull request Dec 11, 2019
…/build_refactor.bzl into tf/core/platform/default/BUILD.

Subsequent changes will change tensorflow/core/platform/BUILD to use these targets, and remove build_refactor.bzl.
This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179

PiperOrigin-RevId: 284885829
Change-Id: I80b0722f11d89b4cb0bef628beedd24d882ac1b4
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 11, 2019
…BUILD and tf/core/platform/windows/BUILD, allowing us to finally remove build_refactor.bzl.

This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179

PiperOrigin-RevId: 284910748
Change-Id: I6bd2447a22d3f5c81988959a7fece803ba34bc70
ArmageddonKnight pushed a commit to UofT-EcoSystem/tensorflow that referenced this pull request Dec 11, 2019
…BUILD and tf/core/platform/windows/BUILD, allowing us to finally remove build_refactor.bzl.

This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179

PiperOrigin-RevId: 284913256
Change-Id: I46b7bd50f2d2056ac54f7adf49a7a3ecc5ccbe86
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 22, 2019
This is part of the refactoring described in tensorflow/community#179

PiperOrigin-RevId: 286819947
Change-Id: Iff7841d87244a10ab4554e61b2731350020ff236
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 23, 2019
…build_config:protos_cc, and tensorflow/core/platform/default/build_config:protos_cc_impl.

This is part of the refactoring described in tensorflow/community#179

PiperOrigin-RevId: 286821940
Change-Id: I7295a47489788f7ad52dad639ea8db60b100c6a6
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 23, 2019
…ild_config/BUILD.

This helps pave the way for removing the build_config folder.
This is part of the refactoring described in tensorflow/community#179

PiperOrigin-RevId: 286822930
Change-Id: Ic9d0d064ed14c5c9bfb1822ae958b402ac2581bd
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 23, 2019
…ve custom macros defined in tf/core/platform/default/build_config.bzl, and remove the build_config folder.

This is part of the refactoring described in tensorflow/community#179

PiperOrigin-RevId: 286827875
Change-Id: Ied11f461cba3a18b48669be4853be8fd9a1f0f72
@bmzhao
Copy link
Member Author

bmzhao commented Jan 7, 2020

As mentioned in SIG Build, here's a link to the master spreadsheet of the current status of the refactoring so far:
https://docs.google.com/spreadsheets/d/11QLBDRa2XHNAtA_Wby81Q9xWnDHTp6ChJrVkstbBRBE/edit?usp=sharing

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 9, 2020
This is necessary to start using bazel's experimental cc_shared_library support.
This change is part of the refactoring described in tensorflow/community#179

PiperOrigin-RevId: 288808507
Change-Id: Ie78fb4ff8dad128ebef280037cce4d3c4f42addc
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 10, 2020
…d moving existing usages of cc_library, cc_test, and cc_binary to rules_cc's version for a subset of the build known to be part of libtensorflow_framework. We will migrate further subdirectories of tf/core as we go along. This is part of Tensorflow's build refactoring, described in: tensorflow/community#179

PiperOrigin-RevId: 289161138
Change-Id: Ic28a5b032a44315ea0528ad8c6737b36eb1d27a6
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 15, 2020
…ibrary to tf/.bazelrc after patching the iOS build failure. This basically is e635ec0 with an additional patch to rules_swift. This change is part of the build refactoring described in tensorflow/community#179

PiperOrigin-RevId: 289776116
Change-Id: I7f29e0e0b4447334a334ad888f464e12fbe29485
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 15, 2020
…ibrary to tf/.bazelrc after patching the iOS build failure. This basically is e635ec0 with an additional patch to rules_swift. This change is part of the build refactoring described in tensorflow/community#179

PiperOrigin-RevId: 289909322
Change-Id: I06f10d811f0ca598047e837acc0230afbf290e6f
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 19, 2020
…on of cc build rules. This is part of the build refactoring described in tensorflow/community#179

PiperOrigin-RevId: 290466858
Change-Id: I9b38b1b7f44f1defea9be6ffb9e5da0c5ca99fb5
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 21, 2020
…factoring described in tensorflow/community#179.

PiperOrigin-RevId: 290757344
Change-Id: I2c4214ad7b34372c1ff1fbdd94baf5d3721c2a5e
ArmageddonKnight pushed a commit to UofT-EcoSystem/tensorflow that referenced this pull request Jan 28, 2020
…refactoring described in tensorflow/community#179.

PiperOrigin-RevId: 291839268
Change-Id: I4ace5769ae4a41753c56114bbb9d841a0035d9e8
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 12, 2020
This is necessary to start using bazel's experimental cc_shared_library support.
This change is part of the refactoring described in tensorflow/community#179

PiperOrigin-RevId: 294736824
Change-Id: I8d13c71545cf2f2edd47818b40fa413e3e9c3254
@bmzhao bmzhao requested a review from ematejska as a code owner February 19, 2020 00:44
@bhack
Copy link
Contributor

bhack commented Apr 14, 2020

This RFC will be open for comment until Friday, December 13th, 2019.

Can we extend the deadline? Is this closed for comment?
Can we use a specific label after the public deadline in the case it is closed for comment and just waiting the internal approval queue?
Generally we have no visibility about this processes after the deadline.

@mihaimaruseac
Copy link
Contributor

Hi, @bhack
Since this PR is still open, if there are comments please comment on the RFC using the GitHub review interface.
If there are comments against a closed PR for RFCs, you should open an issue and there will be discussion. Depending on the implementation status (which can always be checked on the other ecosystem repositories), the implementation might change based on the discussion.

@bhack
Copy link
Contributor

bhack commented Apr 14, 2020

@mihaimaruseac Without any other label feedback I don't understand the status of an open PR after the public RFC deadline. Can you clarify this a little bit related to the point 5 of our RFC governance at https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md?

@theadactyl
Copy link
Contributor

@ematejska can you check on status of this RFC and make sure it's in the right bucket?
@bhack let's make sure to save process meta-commentary in corresponding issues to make them easier to track and refer back to - feel free to open one on this topic or ping a stale issue if one is already open

@bmzhao
Copy link
Member Author

bmzhao commented Apr 27, 2020

Oops as I was updating my fork, I accidentally rewrote this commit, which automatically closed this PR; unfortunately, it looks like re-pushing will create a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.