Object layout #69

Closed
markshannon opened this issue Jul 20, 2021 · 6 comments

@markshannon
Member

This is a meta issue for object layout and the parts of the VM that interact through objects.

Object layout considerations include:

Which bits of information should be stored in the object and which externally?
For example, the memory allocator (PyMem_Malloc or plain malloc) needs to know the size of each allocation, but objects are also self-describing, so the same information is recorded twice.

The size of the object header, and what different layouts we want to support. Which objects should be treated specially and which should use a generic layout?

How will the state of an object be recorded, so that it can be handled efficiently by the cycle GC, refcounting and the allocator?

How compact can we make objects, specifically plain Python objects?
In Java, an object consists of a header containing a reference to the class (either a pointer or some ID) and some GC information, followed by the values. This should be our target for most objects.

@markshannon
Member Author

And since #72 requires changing the layout of the dict as the object degenerates, this is also relevant:
Dictionary modifications to enable optimizations

@markshannon
Member Author

As a bit of motivation:
For this class

class C:
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3
        self.d = 4
        self.e = 5

The object itself takes 6 words:

  • GC 1
  • GC 2
  • ref count
  • class pointer
  • weakref
  • __dict__ pointer

and the dictionary takes 13 words:

  • GC 1
  • GC 2
  • ref count
  • class
  • size
  • version
  • keys pointer
  • values pointer
  • values[5]

for a total of 19 words.

With a semi-compact header and compact object layout this can be reduced to 9 words:

  • GC
  • ref count
  • class pointer
  • __dict__ pointer
  • values[5]

roughly halving memory usage.
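
The word counts above can be checked with a few lines of arithmetic. This is only a sketch: the strings are the labels from the lists in this comment, not actual CPython struct members, and the 8-byte word is an assumption for a 64-bit build.

```python
# Labels copied from the lists above; they are descriptive names, not CPython fields.
CURRENT_OBJECT = ["GC 1", "GC 2", "ref count", "class pointer", "weakref", "__dict__ pointer"]
CURRENT_DICT_HEADER = ["GC 1", "GC 2", "ref count", "class", "size", "version",
                       "keys pointer", "values pointer"]
COMPACT_HEADER = ["GC", "ref count", "class pointer", "__dict__ pointer"]

N_ATTRS = 5       # self.a .. self.e in class C
WORD_BYTES = 8    # assumption: 64-bit build

# Current layout: object header + dict header + one values slot per attribute.
current_words = len(CURRENT_OBJECT) + len(CURRENT_DICT_HEADER) + N_ATTRS
# Compact layout: one shared header with the values stored inline.
compact_words = len(COMPACT_HEADER) + N_ATTRS

print(f"current: {current_words} words ({current_words * WORD_BYTES} bytes)")
print(f"compact: {compact_words} words ({compact_words * WORD_BYTES} bytes)")
print(f"compact / current = {compact_words / current_words:.0%}")
```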

@gvanrossum
Collaborator

But the design in #72 requires us to fix the size of values[] before knowing how large it will grow, and proposes 16 values, so it would be 20 words (21 until we actually have the semi-compact header from #70), so more than the original 19. Gains in space will only show up when there are more than 6 or 7 attributes. (There may be other gains due to LOAD_ATTR specialization.)
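
To make the break-even point concrete, here is a rough sketch of the comparison. It assumes the word counts quoted earlier in this issue (6 words of object header plus 8 words of dict header today, a 4-word semi-compact header, or 5 words while two GC words remain) and the fixed 16-slot values[] mentioned above; the function names are made up for illustration.

```python
CURRENT_FIXED_WORDS = 6 + 8    # object header + dict header, from the lists in this issue
PREALLOCATED_VALUES = 16       # fixed values[] size discussed for #72

def current_words(n_attrs):
    """Words used today: both headers plus one values slot per attribute."""
    return CURRENT_FIXED_WORDS + n_attrs

def fixed_array_words(header_words=4):
    """Words used with a preallocated values[], regardless of attribute count."""
    return header_words + PREALLOCATED_VALUES

def first_saving(header_words=4):
    """Smallest attribute count at which the fixed-array layout is smaller."""
    return next(n for n in range(PREALLOCATED_VALUES + 1)
                if current_words(n) > fixed_array_words(header_words))

print(fixed_array_words(4), first_saving(4))   # with the semi-compact header
print(fixed_array_words(5), first_saving(5))   # until the semi-compact header lands
```

Under these assumptions the fixed-array layout only wins once an instance has 7 attributes (8 without the semi-compact header), which matches the "more than 6 or 7" estimate.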

@markshannon
Member Author

We need to fix the size of values[] for each object, not for all objects of the same class, so we can converge on a near-ideal size.
See #72 (comment)
Say we need four attributes for instances of a class. The first instance would have space for 20, the second 19 and so on, down to five or six (depending on how much headroom we want).
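
A toy model of that convergence, purely for illustration: it assumes the capacity shrinks by exactly one slot per new instance until it reaches the number of attributes needed plus some headroom. The function name, the decrement rule and the default parameters are hypothetical and not taken from #72.

```python
def capacity_sequence(attrs_needed, n_instances, start=20, headroom=1):
    """Per-instance values[] capacity under a hypothetical shrink-by-one policy.

    Each new instance gets one slot fewer than the previous one, but never
    fewer than attrs_needed + headroom.
    """
    floor = attrs_needed + headroom
    return [max(start - i, floor) for i in range(n_instances)]

# Four attributes needed, one slot of headroom: 20, 19, ... down to 5.
caps = capacity_sequence(attrs_needed=4, n_instances=18)
print(caps)

# With two slots of headroom the sequence settles at 6 instead.
print(capacity_sequence(attrs_needed=4, n_instances=18, headroom=2)[-1])
```

Only the first few instances carry the oversized array; every later instance gets the converged size.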

@markshannon
Member Author

#72 is done (at least for now; it is partly deferred).
#80 is done (once the PR is merged).
That leaves #125 and #77, which are only loosely related, so I'm closing this issue.
