Expand the stakeholders section #73

Closed
wants to merge 4 commits into from
96 changes: 94 additions & 2 deletions capi_problems.rst
@@ -80,6 +80,11 @@ describes this complex state of affairs in terms of the
actions that different stakeholders need to perform through
the C API.

There are two main groups of C API users:

* External users. They only consume the API.
* CPython developers. They define, implement, and consume the API inside the CPython code.

Member

In terms of the structure of this section, I think this binary division into "us" and "them" gives too much weight to the internal usage in cpython. I tried in the section to include a list of types of users, and this lumps everything other than cpython core into one category, with the others being sub-categories.

Can we, instead of talking about "two main types of users", just add the internal usage in cpython core as one of the stakeholders, and in the cpython core section talk about what they do or don't need?


There are actions which are generic, and required by
all types of API users:

@@ -94,12 +99,99 @@ all types of API users:
* Manage sub-interpreters
* Handle and send signals

External users often support multiple Python versions
or Python implementations. For them, these are the common
requirements:

* Specification/documentation of the API and ABI
* Backwards/forwards compatibility and stability for:
* API (code works as-is but requires recompilation)
* ABI (compiled binaries work)
* Across CPython versions
* Across Python implementations
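
A rough way to see the API/ABI distinction from the list above on a running interpreter is to look at which compiled-extension filename suffixes it accepts. This sketch uses only `importlib.machinery.EXTENSION_SUFFIXES` and `sys.version_info`. Treating a suffix that contains the version digits as "version-specific" and one that contains `abi3` as "Stable ABI" is a heuristic assumed here, and the actual suffixes differ by platform and build.

```python
import importlib.machinery
import sys

# Filename suffixes of compiled extension modules this interpreter will import.
suffixes = importlib.machinery.EXTENSION_SUFFIXES

# Heuristic (an assumption of this sketch): a suffix that names the running
# minor version, e.g. "312" for 3.12, marks a binary tied to that version's ABI.
version_tag = f"{sys.version_info.major}{sys.version_info.minor}"
version_specific = [s for s in suffixes if version_tag in s]

# Heuristic: on platforms that use it, "abi3" marks Stable ABI binaries,
# which are meant to load unchanged on later CPython 3.x versions.
stable_abi = [s for s in suffixes if "abi3" in s]

print("accepted suffixes:", suffixes)
print("version-specific: ", version_specific)
print("stable ABI:       ", stable_abi)
```

An extension that only needs API compatibility is rebuilt for each version-specific suffix, while one that relies on ABI compatibility ships a single binary for several versions.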

Note: in practice, there will be tradeoffs between some of the requirements,
for example, stability vs. performance.

Groups of users are:

**CPython developers**

CPython developers change the CPython internals to implement new
functionality or improve/optimize the current functionality. While ideally,
they would be free to alter any internal details, in practice, anything that
is intentionally (or even unintentionally) exposed to external C API users
should remain compatible (depending on the concrete policy for the
specific API). If it is part of the Stable ABI, it must remain binary
compatible.

One recent notable example is that a significant amount of work on the
Gilectomy project and on `PEP 703 <https://peps.python.org/pep-0703/>`__
was spent on dealing with reference counting, specifically on
keeping the reference counting contract, which is only exposed in the C API,
while making it scale with parallel execution.
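
As a small illustration of the contract mentioned above, CPython lets Python code observe an object's reference count through `sys.getrefcount`. This is a sketch that assumes CPython: the absolute numbers are an implementation detail that varies between versions and builds, so the hypothetical helper below only looks at the change caused by storing one extra reference.

```python
import sys


def refcount_delta_when_stored(obj):
    """Return how much the reported reference count grows when `obj`
    is stored in one additional container (hypothetical helper)."""
    before = sys.getrefcount(obj)
    holder = [obj]  # the list now owns one more reference to obj
    after = sys.getrefcount(obj)
    del holder  # dropping the list releases that reference again
    return after - before


print(refcount_delta_when_stored(object()))
```

Extensions that call `Py_INCREF` and `Py_DECREF` depend on exactly this bookkeeping, which is why changing how counts are stored affects them.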

For more details see the Discuss
`thread <https://discuss.python.org/t/lets-get-rid-of-the-stable-abi-but-keep-the-limited-api/18458>`__:
Let’s get rid of the stable ABI, but keep the limited API.

Because the CPython code is by definition intended to run only on a given
version of CPython and is always recompiled, the following requirements
*do not apply* to CPython developers:

Member

Can we reword this somehow? Everyone's requirements apply to CPython core devs because they are the stewards of the language.

Contributor Author

Yes, that is an absolutely valid point. What about "...do not apply to the C API usage in the CPython core codebase"?


* Compatibility with alternative Python implementations

Member @iritkatriel, Oct 3, 2023

The CPython test suite is actually maintained to support alternative implementations. There is a decorator to indicate which tests cover implementation details that other implementations don't need to care about.

Contributor Author

Sure, that's the public contract of Python, but from the point of view of a CPython developer, I don't think it matters whether the API used to implement CPython itself supports alternative Pythons or not (and that should not be visible in Python's public contract). For alternative Pythons, what matters is whether the C API that extensions use is implementable with their internal structure. Those two APIs (for CPython core and for 3rd party extensions) are currently one thing (or overlap a lot), but they do not have to be.

Maybe it should clarify that in the section "CPython developers" we focus on the development of CPython core and the C API's requirements for that purpose only. If those requirements are in contradiction with requirements for other use-cases, it's fine. I think that's the point of collecting the requirements.

Member @iritkatriel, Oct 3, 2023

Is it even correct to say that CPython Core (excluding stdlib extensions) is using the cpython API? It can access whatever it wants.

Member

Perhaps we should just focus on the requirements that different stakeholders have, and omit statements about what requirements they don't care about. So rather than explaining how python core doesn't need to support multiple versions, we mention that many extension writers do.

Contributor Author @steve-s, Oct 3, 2023

This is a good point. The document does not actually specify what exactly "C API" means in its context. Is it the limited API, things exposed in Python.h, or things exposed in any header in Python's include directory? In any case, those header files and the functions (structs, macros, ...) declared in them are shared between external code (extensions, embedders) and the interpreter itself. Also, the introduction of this document mentions:

It [C API] evolved from what was initially the internal API between the C code of the interpreter and the Python language and libraries

I think it is a real issue that the current C API (even if we think of it in its most restricted version, the limited API) hinders the development of the internals of CPython itself. I believe the linked discussion demonstrates that clearly. There is also a discussion about the stable ABI and PEP 703, because PEP 703 needs to hide implementation details of reference counting behind actual function calls as opposed to C macros, i.e., break the stable ABI. I think all of that is evidence that there should be a stakeholder section for CPython and that it should list those issues.

My specific point here is that whatever CPython uses to implement itself does not actually need to work for other Pythons and does not need any stability guarantees. Whether that is the same API, some superset, or something completely different is a matter for the design/implementation of whatever comes out of assessing the requirements described here.

Admittedly, I believe that a C API for 3rd party extensions that hides the implementation details well enough to be really future proof, not only for other Pythons but also for future development of CPython, must be inherently different from what CPython uses internally. The fact that the current C API can somehow serve both as the public C API for external code and as the internal API actually shows that it is not abstract enough. However, this is a conclusion/solution, which is what I am trying to avoid in this document, and I am rather trying to describe the problems that lead me to this conclusion :-)

I will think about rephrasing this section a bit more. Any suggestions are welcome :-)

* API/ABI stability

Note: this section is concerned with CPython core, not the standard library.

**Extension writers**

These are the traditional users of the C API, and their requirements
are listed above.
Different users have different motivations for developing extensions, and
some of their expectations and requirements differ accordingly.

*Performance of Python code*

Functionality that can be implemented in idiomatic Python code is instead
implemented as an extension (or existing Python code is rewritten).

Such code performs many Python-specific operations, such as accessing
attributes, and usually defines multiple Python classes and other complex
Python specific structures.

Because these users do not actually want to write C, but only want
the performance of C, these extensions are often implemented through
alternative APIs, most notably through `Cython`, which allows writing
code "like Python".

Another example includes packages such as NumPy or Pandas, where part of the
package falls into this category (NumPy's dtypes and the complex logic
around them) and other parts fall into the following category.

Requirements: access to many Python specifics (e.g., defining a metaclass)
at the best performance.

*Performance of numerical computations*

Code that is intended to provide the best possible performance through low level
techniques not available in Python or through external libraries, such as
TensorFlow.

Such code needs to retrieve raw data from Python, then perform the computation,
and then transfer the results back to Python.

Requirements: fast bulk read and write access to raw data encapsulated
in some Python structures.
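
The kind of raw-data access described above can be sketched from the Python side with the buffer protocol, which is what `memoryview` exposes. This is a minimal illustration using only the standard library `array` module; a real numerical extension would obtain the same buffer from C, and the variable names are chosen here for illustration only.

```python
from array import array

# A contiguous block of C doubles owned by a Python object.
samples = array("d", [1.0, 2.0, 3.0])

with memoryview(samples) as view:
    # The view describes the raw layout without copying the data.
    item_format, item_size, total_bytes = view.format, view.itemsize, view.nbytes

    # Writes through the view land directly in the underlying storage.
    view[0] = 10.0

    # Bulk transfer: copy the whole block into another buffer in one step.
    result = array("d", [0.0] * len(view))
    with memoryview(result) as out:
        out[:] = view

print(item_format, item_size, total_bytes)
print(samples.tolist(), result.tolist())
```

The slice assignment copies the entire block at once instead of converting each element to a Python object, which is the property bulk access is after.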

*Binding for native libraries*

Extensions that provide a Python interface to native libraries offering
functionality not available in Python. Example: psutil.

Requirements: ability to expose native functions to Python. While most
such extensions define Python classes and other Python-specific constructs,
this is not a strict requirement. The Python "layer" around the native
functions can be implemented in Python code.
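
The split between a thin native layer and a Python "layer" around it can be sketched with `ctypes`. This is only an illustration under stated assumptions: it assumes a POSIX system where `ctypes.CDLL(None)` exposes the C standard library already loaded into the process, it binds the standard C function `strlen`, and `byte_length` is a hypothetical wrapper name. It does not use the C API itself.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to symbols already loaded
# into the current process, which includes the C standard library.
_libc = ctypes.CDLL(None)

# Thin native layer: declare the C signature `size_t strlen(const char *)`.
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def byte_length(text: str) -> int:
    """Python layer (hypothetical helper): validate and encode the
    argument, then delegate to the native function."""
    if "\0" in text:
        raise ValueError("embedded NUL characters are not supported")
    return _libc.strlen(text.encode("utf-8"))


print(byte_length("hello"))
```

Argument validation and encoding stay in Python, so the native side only has to expose a plain function.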


**Embedders**
