Add minimum threshold for number of buffers #1858

Merged (3 commits) on Nov 7, 2018

brada4 (Contributor) commented Nov 6, 2018

And don't break the expectation that the single-threaded version can be used from many threads.
#1847

martin-frbg (Collaborator)

Also related are #938 and #1141, though it could be that it is ultimately the GEMM buffer handling that needs looking into.

brada4 (Contributor, Author) commented Nov 6, 2018

Yes, it is a sweep-the-dirt-under-the-doormat kind of solution.

martin-frbg (Collaborator)

Not sure we want to go all the way to 64, since at least with the non-TLS implementation this will have a real impact on memory requirements?

brada4 (Contributor, Author) commented Nov 6, 2018

The only effect is this, in memory.c:

```c
#endif /* defined(SMP) */
    local_memory_table = (struct alloc_t **)malloc(sizeof(struct alloc_t *) * NUM_BUFFERS);
    memset(local_memory_table, 0, sizeof(struct alloc_t *) * NUM_BUFFERS);
#if defined(SMP)
```

i.e. 62 more 64-bit pointers (496 bytes) get pre-allocated for the 1-CPU version.

martin-frbg (Collaborator)

Yes, but that's the experimental TLS version you are looking at; I'm pretty sure that in the other (older, default) implementation it corresponds to an actual allocation.

brada4 (Contributor, Author) commented Nov 6, 2018

For the "old" version, roughly (8+8+8+8+48) * 62 = 4960 bytes, given this structure:

```c
static volatile struct {
  BLASULONG lock;
  void *addr;
#if defined(WHEREAMI) && !defined(USE_OPENMP)
  int   pos;
#endif
  int used;
#ifndef __64BIT__
  char dummy[48];
#else
  char dummy[40];
#endif
```

martin-frbg (Collaborator)

I'm just looking for reasons why nobody tried to decouple NUM_BUFFERS from NUM_CPU (NUM_THREADS) before, although this problem has come up in some form every few months over the past few years.

brada4 (Contributor, Author) commented Nov 6, 2018

I think it is one buffer (my reading) or two (the formula there) per thread; the big issue is that the single-threaded version depends on this structure at all.
