|
| 1 | +# tensorflow/api-owners review practices |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This is an attempt to gather commonly discussed topics when doing API |
| 6 | +reviews. It’ll hopefully be a useful resource to both API owners and people |
| 7 | +proposing API changes. [TF API Owners](https://github.com/orgs/tensorflow/teams/api-owners) |
| 8 | +meet twice weekly to discuss changes. We try to get to PRs on the next meeting, |
| 9 | +but we don’t always make it all the way through. If your change is particularly |
| 10 | +urgent, please ping the PR to notify us of any urgency. |
| 11 | + |
| 12 | +## Process |
| 13 | + |
| 14 | +We only look at changes which have already been approved by other reviewers. If |
| 15 | +there are major outstanding comments, we will wait with API review until those |
| 16 | +are resolved. If there are questions for API owners, explicitly raise this in |
| 17 | +the comments to get an answer. |
| 18 | + |
| 19 | + |
| 20 | +## High level points |
| 21 | + |
| 22 | +### Backward and forward compatibility |
| 23 | +We avoid backwards-incompatible API changes. We also avoid |
| 24 | +backwards-incompatible behavior changes, such as restricting the set of valid |
| 25 | +inputs to a function or extending the set of valid outputs of a function. Adding |
| 26 | +support for previously not supported behavior is okay, as are changes to |
| 27 | +explicitly experimental APIs (see section below). When needing to provide a new |
| 28 | +or different behavior, we strongly prefer a new version of the API over breaking |
| 29 | +backwards compatibility. Note that we are free to deprecate APIs; we just cannot |
| 30 | +break code which relies on their documented behavior. We need to worry about |
| 31 | +backward compatibility both of our python APIs and of the serialized GraphDefs, |
| 32 | +and in general breaking serialized GraphDefs is worse than breaking the python |
| 33 | +APIs. |
| 34 | + |
| 35 | +Forward compatibility is more subtle: we should avoid changing the graph |
| 36 | +produced by currently correct python code without a three weeks notice. This |
| 37 | +comes up most frequently when adding new ops, but also applies to non-obvious |
| 38 | +things such as the graph emitted by gradients or pfor. |
| 39 | + |
| 40 | + |
| 41 | +### Docstrings |
| 42 | + |
| 43 | +TF APIs should have comprehensive documentation in the form of docstrings. If at |
| 44 | +all possible these docstrings should have runnable examples, and these examples |
| 45 | +should form a doctest so they stay correct. The examples should demonstrate an |
| 46 | +end-to-end user workflow, such that it’s clear how to generate the necessary |
| 47 | +inputs for the API and what to do with the outputs. The docstring should be |
| 48 | +understandable by someone who is not familiar with TF. See the [guide to writing |
| 49 | +TF docstrings](https://www.tensorflow.org/community/contribute/docs_ref) for |
| 50 | +more information. |
| 51 | + |
| 52 | +Our documentation generator for classes only sees methods, so prefer defining |
| 53 | +members as properties instead of assigning them in `__init__`. |
| 54 | + |
| 55 | +Docstrings should only refer to other public TF API symbols (i.e. do not refer |
| 56 | +to other symbols defined in the same file as a function which is just now being |
| 57 | +made public) and should refer to public API symbols by their full exported name. |
| 58 | + |
| 59 | +### Common names |
| 60 | + |
| 61 | +Prefer keepdims over keep_dims. Prefer axis over dim. Data types are called |
| 62 | +dtype. name is a common last argument of ops but backward compatibility mandates |
| 63 | +that new arguments are added after the last existing argument, even if that |
| 64 | +results in name not being the last argument. |
| 65 | + |
| 66 | +We generally prefer spelling things out over using abbreviations except when |
| 67 | +abbreviations are more standard than spelling things out (i.e. don’t spell out |
| 68 | +linalg or svd). When in doubt we ask domain experts or use web search to see |
| 69 | +what spelling is most common. |
| 70 | + |
| 71 | +If possible we prefer to name things in a similar way to numpy (e.g., we would |
| 72 | +not pick einsum as a name, but numpy and others before it have, and that |
| 73 | +precedent is very strong). |
| 74 | + |
| 75 | +We prefer experimental namespaces (i.e. tf.data.experimental.foobar) over |
| 76 | +experimental-prefixed names (i.e. tf.data.experimental_foobar) except when |
| 77 | +adding an experimental class method, or an experimental argument. Experimental |
| 78 | +endpoints should be deprecated in a minor release before they can be removed in |
| 79 | +the next. We would like new experimental symbols to be things which will |
| 80 | +eventually end up in core TF as opposed to things we expect will be phased out |
| 81 | +with no clear replacement. The best expectation to have for an experimental |
| 82 | +endpoint is that the “experimental” will simply be removed. If you don’t believe |
| 83 | +that’ll work, it should probably not be added in its current form. |
| 84 | + |
| 85 | +### Style |
| 86 | + |
| 87 | +Generally, follow Google style. |
| 88 | + |
| 89 | +Avoid redundancy. Do not write arguments of the form `function(..., |
| 90 | +enable_feature=False, feature_config=None)` if you can also write `function(..., |
| 91 | +feature_config=None)`, where implicitly, `enable_feature = feature_config is not |
| 92 | +None`. |
| 93 | + |
| 94 | +Try to embed well with the ambient language. Think about how your API interacts |
| 95 | +with language idioms (e.g., in Python: can it be hashed, i.e., used as a dict |
| 96 | +key? Is it iterable? Is it a mapping? Can it be equality compared? |
| 97 | +Ordered?). Think about how your API interacts with other pieces of the Python |
| 98 | +ecosystem as well— is there an analogue in Numpy or PyTorch that we should |
| 99 | +consider aligning with? |
| 100 | + |
| 101 | +Use language-native constructs wherever you can. In Python, a tuple should be a |
| 102 | +tuple. The bar for custom configuration objects is relatively high, a dict or |
| 103 | +namedtuple goes a long way. |
| 104 | + |
| 105 | +In particular, do not expose protobufs directly as part of an API. You can use |
| 106 | +protobufs for serialization or to encode network traffic. Protobufs should |
| 107 | +always be an implementation detail, and never visible on the API surface. Use |
| 108 | +language native constructs (dicts or classes for Python, structs for C/C++) if |
| 109 | +you need configuration objects. |
| 110 | + |
| 111 | +Avoid global (or any non-local) state as much as possible (this includes Python |
| 112 | +'with' scopes). If you need global context, consider whether it can be |
| 113 | +thread-local. The TF API is supposed to be thread-safe. Avoid stateful operation |
| 114 | +(mutability) if you can. Both features make it hard to reason about code, and |
| 115 | +make composability harder to achieve. |
| 116 | + |
| 117 | +We prefer strings ("auto", "never", etc) over enums (tf.namespace.AUTO, |
| 118 | +etc). Strings are easier to type, and forces us to document all possible values |
| 119 | +and their semantics in the docstrings of all places which accept the string, as |
| 120 | +opposed to only in the enum definition, which is a little friendlier. |
| 121 | + |
| 122 | +### Orthogonality and integration with the existing APIs |
| 123 | + |
| 124 | +Is the new API implementable in terms of existing APIs? If so, we might want to |
| 125 | +consider pointing users to using the existing APIs. Does the new API add enough |
| 126 | +value over a combination of existing APIs? Does the API solve only a specific |
| 127 | +problem (that’s usually a sign combinations of existing APIs would be |
| 128 | +preferred)? |
| 129 | + |
| 130 | +If not, are existing APIs implementable in terms of the new API? If this is |
| 131 | +simple, we might want to steer users towards the new and away from the old API |
| 132 | +(possibly, old APIs should be deprecated along with introducing the new API). |
| 133 | + |
| 134 | +If neither is the case, it might be possible that there is a more general API |
| 135 | +which makes both the existing API and the new API easy to express. We try to |
| 136 | +keep global consistency of our API in mind when reviewing new changes. |
| 137 | + |
| 138 | +How will this API work together with others? Does it do something slightly |
| 139 | +differently than others? Does it expect inputs which match what other parts of |
| 140 | +TensorFlow produce? Does its output match what other parts of TensorFlow can |
| 141 | +consume? |
| 142 | + |
| 143 | +Does it do things the same way other similar pieces in TensorFlow do it? E.g., |
| 144 | +if a common pattern to achieve a behavior is an extra argument, don't use a |
| 145 | +function decorator to achieve the same in a different area of the API. |
| 146 | + |
| 147 | +Two wrongs don’t make a right. That is, if a bad API already exists in TF, that |
| 148 | +does not give license to new APIs to be bad in the same way. Improvement must be |
| 149 | +balanced with consistency, however, and sometimes it’s okay to carry small |
| 150 | +imperfections into new APIs for the sake of consistency with old APIs. |
| 151 | + |
| 152 | +### Optional arguments with default values |
| 153 | + |
| 154 | +Many APIs have optional arguments with a default value. Our recommendation is to |
| 155 | +use `None` as the default value of any optional arguments and have the |
| 156 | +implementation be responsible for handling it as opposed to using a default |
| 157 | +value that directly represents the behavior (e.g. `aggregate='sum'`). The |
| 158 | +latter prevents the implementation from distinguishing between the caller not |
| 159 | +setting the argument vs. the caller setting the argument to the default value, |
| 160 | +which may be needed when the default behavior is changing. |
| 161 | + |
| 162 | +### Does it belong in TF at all? |
| 163 | + |
| 164 | +As TF evolves there’s a tendency to put everything inside of it, with costs |
| 165 | +compounding over the long term. If there is a reasonable home for a new API |
| 166 | +outside core TF (say in addons, io, TFP, or other projects entirely) that can be |
| 167 | +strongly preferrable. If new code can be released as independent libraries, it |
| 168 | +should be. This is especially true for APIs that are actively evolving; core TF |
| 169 | +imposes many restrictions, so it’s far better to trial new APIs outside of the |
| 170 | +core library. |
| 171 | + |
| 172 | +## Adding new ops |
| 173 | + |
| 174 | +Adding new ops to TF should be done with care. We generally prefer not adding |
| 175 | +new ops if possible, but performance, hardware compatibility, and other concerns |
| 176 | +often do require new ops. |
| 177 | + |
| 178 | +When adding new ops, look for: |
| 179 | + |
| 180 | + - closure under automatic differentiation (i.e. we avoid ops which are |
| 181 | + differentiable but not twice-differentiable, or which are technically |
| 182 | + differentiable but not marked as such) |
| 183 | + - performant kernels (it’s better not to have an op than to have an op with a |
| 184 | + suboptimal kernel; we need to make sure kernel experts have reviewed the |
| 185 | + code) |
| 186 | + - broadcasting (all numerical ops should broadcast using numpy rules) |
| 187 | + - does support for this op have to be added to pfor/vectorized_map? |
| 188 | + - dtype support (in general all numerical ops should support the common |
| 189 | + integer, floating point, and complex dtypes, if they all make sense; we need |
| 190 | + to watch out for int32 on GPUs though) |
| 191 | + - device support (cuda kernels should be implemented if possible, and similarly |
| 192 | + a tf/xla bridge entry should be added if it makes sense) |
| 193 | + - attributes versus inputs (anything which can be an input to an operation |
| 194 | + should be an input, and attributes should only be used to parametrize the |
| 195 | + operation in ways that affect the output dtypes or sometimes shapes) |
| 196 | + - state management (is the op stateful? Can it instead be made stateless and |
| 197 | + rely on optimizations like memory reuse for performance? Can it be made to |
| 198 | + keep its state using one of the existing mechanisms like variables? If not, |
| 199 | + its state should be encapsulated using resource handles if at all possible) |
| 200 | + - we generally don’t like ops which are supported only on a single device (be |
| 201 | + it CPU, GPU, XLA, TPU, etc) and prefer to have at least a plan for writing |
| 202 | + device-agnostic code |
| 203 | + - should the python layer for this operation support raggedtensor/sparsetensor? |
| 204 | + |
| 205 | +## Experimental APIs |
| 206 | + |
| 207 | +Experimental APIs are APIs which have the word 'experimental' somewhere in their |
| 208 | +name; for example `tf.experimental.foo`, or `tf.foo.experimental.Bar`, or |
| 209 | +`tf.foo(experimental_bar=True)` or `tf.Foo().experimental_bar()`. We generally |
| 210 | +prefer experimental namespaces when possible, so prefer |
| 211 | +`tf.foo.experimental.Bar` over `tf.foo.ExperimentalBar`. |
| 212 | + |
| 213 | +Experimental APIs are APIs intended to be added to TensorFlow as-is, but which |
| 214 | +we reserve the right to change in backwards-incompatible ways if we have |
| 215 | +to. This is different from apis in `tensorflow/addons`, many of which are not |
| 216 | +necessarily intended to be added to core TF as they might have a more narrow use |
| 217 | +case initially (if APIs in `tensorflow/addons` do become widely useful they can |
| 218 | +"graduate" to core, either using experimental or not). |
| 219 | + |
| 220 | +No temporary APIs should be added to experimental (i.e. "we just need this until |
| 221 | +certain bugfix or certain new feature becomes available" is not a valid reason |
| 222 | +to add an API with experimental in the name.) |
| 223 | + |
| 224 | +No API with known deficiencies should be added to experimental. Experimental |
| 225 | +APIs should, to the best of our knowledge, not be expected to change in a known |
| 226 | +way (no argument with a known bad name, etc). Experimental can, however, be used |
| 227 | +for APIs which are a work-in-progress: it's fine to add experimental methods to |
| 228 | +a base class even if those methods are only implemented on some subclasses as |
| 229 | +long as we expect all classes to eventually implement those. |
| 230 | + |
| 231 | +The same amount of due diligence required for a real API is required for an |
| 232 | +experimental API: this means tests, benchmarks, documentation, end-to-end |
| 233 | +examples, etc |
| 234 | + |
| 235 | +Experimental APIs are not a license to break users. This means: |
| 236 | + 1. we do not remove experimental APIs which are widely used without an effort |
| 237 | + to help migrate users away |
| 238 | + 2. experimental APIs are not removed without warning and don't have |
| 239 | + backwards-incompatible changes made to them without warning (the warning can be |
| 240 | + a deprecation on version 2.x and removal on 2.x+1, but plain removal on 2.x |
| 241 | + with no notice on 2.x-1 is not ok) |
| 242 | + |
| 243 | +Small changes which are mentioned in relnotes and have obvious fixes might be |
| 244 | +made (for example if adding a new argument to a long argument list and we |
| 245 | +believe there are few pass-by-position users we might allow the new argument to |
| 246 | +be added to the middle and not the end of the parameter list). |
| 247 | + |
| 248 | +Large backwards-incompatible changes to experimental APIs still require an |
| 249 | +`experimental_foo_v2` or similar backwards-compatible evolution plan to avoid |
| 250 | +breaking users of the existing experimental API. |
| 251 | + |
| 252 | +No API endpoint should stay in experimental forever. If a particular |
| 253 | +experimental API hasn't had major changes in two minor releases we should remove |
| 254 | +the experimental annotation from the API name or delete it. If we do want to |
| 255 | +delete it we need to have a deprecation plan that can migrate all users to some |
| 256 | +other API endpoint or composition of existing APIs. In rare cases experimental |
| 257 | +APIs can continue to be iterated on after many releases (see TPUStrategy); this |
| 258 | +only applies for fairly large API surfaces. |
| 259 | + |
| 260 | +When removing the experimental annotation we should, if at all possible, allow |
| 261 | +escape routes to not break existing code. This means toplevel symbols |
| 262 | +`tf.experimental.foo` and methods like `tf.Class.experimental_foo` should get a |
| 263 | +deprecation warning on 2.x before deletion on 2.x+1; we should use the |
| 264 | +doc_controls decorators to not pollute API docs with deprecated "graduated" |
| 265 | +experimental APIs. For experimental function arguments we should consider |
| 266 | +catching `**kwargs` to raise the proper warnings for at least one version (note |
| 267 | +though that `**kwargs` is generally discouraged from our APIs; we prefer |
| 268 | +explicitly named keyword arguments if at all possible). |
0 commit comments