|
| 1 | +# Cache |
| 2 | + |
| 3 | +- <https://en.wikipedia.org/wiki/CPU_cache> |
| 4 | +- <http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code> |
| 5 | +- <http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra> |
| 6 | +- <http://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size> |
| 7 | +- <http://stackoverflow.com/questions/763262/how-does-one-write-code-that-best-utilizes-the-cpu-cache-to-improve-performance> |
| 8 | +- <http://stackoverflow.com/questions/7905760/matrix-multiplication-small-difference-in-matrix-size-large-difference-in-timi> |
| 9 | +- <http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops> |
| 10 | + |
| 11 | +## Example |
| 12 | + |
| 13 | +TODO: add an ASCII art example, first direct mapped, then set associative. |
| 14 | + |
| 15 | +## Direct mapped |
| 16 | + |
| 17 | +One possible cache location per memory address. |
| 18 | + |
| 19 | +Upside: simple circuit, fast lookup because only one location has to be checked, and small area. |
| 20 | + |
| 21 | +Downside: a new line might evict an entry that was recently accessed, even if all other entries are old (a conflict miss). |
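
The conflict problem can be sketched with a toy model. This is a minimal illustration, not how hardware is built: the `DirectMappedCache` class, the 4-line size, and the access trace are all assumptions made for the example, and the model stores whole block addresses instead of separate tag bits.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each block address maps to exactly one slot."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # One stored block address per slot; None means the slot is empty.
        self.lines = [None] * num_lines

    def access(self, block_addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        index = block_addr % self.num_lines  # the only slot this block may use
        if self.lines[index] == block_addr:
            return True
        self.lines[index] = block_addr  # evict whatever was there
        return False


def count_misses(cache, trace):
    """Run a list of block addresses through the cache and count the misses."""
    return sum(not cache.access(addr) for addr in trace)


# Blocks 0 and 4 share slot 0 of a 4-line cache, so they keep evicting
# each other while slots 1 to 3 stay unused.
conflict_trace = [0, 4, 0, 4, 0, 4]
direct_misses = count_misses(DirectMappedCache(4), conflict_trace)
print(direct_misses)  # prints 6: every access misses
```

Every access in the trace misses even though three of the four slots are never used.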
| 22 | + |
| 23 | +## Fully associative |
| 24 | + |
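| 25 | +Any memory block can be stored in any cache entry, so there are no conflict misses: an entry only has to be evicted when the whole cache is full. |
| 26 | + |
| 27 | +But every lookup must compare the tag against all entries in parallel, which costs area and power, so this is only practical for small caches, such as some TLBs. |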
| 28 | + |
| 29 | +## Set associative |
| 30 | + |
| 31 | +Middle ground between direct mapped and fully associative. |
| 32 | + |
| 33 | +- 2 way associative example <https://www.youtube.com/watch?v=mCF5XNn_xfA> |
| 34 | + |
| 35 | +Now the index bits of the address select the set where the line might be, and the line can sit in any way of that set, so the tag is compared against every entry in the set. |
| 36 | + |
| 37 | +Unlike direct mapped, you now have the fun choice of which entry to evict when a set is full: <https://en.wikipedia.org/wiki/Cache_replacement_policies> |
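
A minimal sketch of a set associative lookup, assuming LRU as the replacement policy (one choice among many). The `SetAssociativeCache` class and the 2 sets by 2 ways geometry are hypothetical, chosen only to keep the example small.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement inside each set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each set keeps its resident blocks ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        blocks = self.sets[block_addr % self.num_sets]
        if block_addr in blocks:
            blocks.move_to_end(block_addr)  # mark as most recently used
            return True
        if len(blocks) == self.ways:
            blocks.popitem(last=False)  # set is full: evict the LRU block
        blocks[block_addr] = True
        return False


# 2 sets x 2 ways = 4 lines in total. Blocks 0 and 4 both map to set 0,
# but the set has room for both, so only the first two accesses miss.
cache = SetAssociativeCache(num_sets=2, ways=2)
results = [cache.access(addr) for addr in [0, 4, 0, 4, 0, 4]]
print(results)  # prints [False, False, True, True, True, True]
```

In a direct mapped cache with the same 4 lines, blocks 0 and 4 would share one slot and evict each other on every access. Here a third block mapping to set 0 (block 8, say) is what finally forces an eviction, and the policy decides which of the two residents goes.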
| 38 | + |
| 39 | +## Bits |
| 40 | + |
| 41 | +Ignoring SMP coherency for now. |
| 42 | + |
| 43 | +### Validity bit |
| 44 | + |
| 45 | +If false, indicates that the given data is invalid, and must be re-fetched. |
| 46 | + |
| 47 | +When it is set to invalid: |
| 48 | + |
| 49 | +- at startup, everything is set invalid, otherwise we wouldn't be able to differentiate between valid data and the noise present at startup. This is the major use case. |
| 50 | +- another processor modifies main memory at an address whose line we hold in cache: <https://en.wikipedia.org/wiki/Bus_snooping> |
| 51 | + |
| 52 | +### Dirty bit |
| 53 | + |
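| 54 | +Set whenever the CPU writes to the cache line, cleared when the line is written back to main memory. Only needed for write-back caches: it tells the cache which lines must be written to memory before they are evicted. |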
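
The two bits can be sketched together in a toy write-back cache that holds a single line. Everything here is an assumption made for illustration: the `OneLineWriteBackCache` class, the dict standing in for main memory, and the `invalidate` method that imitates a snooped write from another processor.

```python
class CacheLine:
    """One line of a toy write-back cache, with valid and dirty bits."""

    def __init__(self):
        self.valid = False  # at startup the contents are noise
        self.dirty = False
        self.addr = None
        self.data = None


class OneLineWriteBackCache:
    def __init__(self, memory):
        self.memory = memory  # dict standing in for main memory
        self.line = CacheLine()

    def _fill(self, addr):
        line = self.line
        if line.valid and line.addr == addr:
            return  # hit: nothing to fetch
        if line.valid and line.dirty:
            self.memory[line.addr] = line.data  # write back before evicting
        line.addr, line.data = addr, self.memory[addr]
        line.valid, line.dirty = True, False

    def read(self, addr):
        self._fill(addr)
        return self.line.data

    def write(self, addr, value):
        self._fill(addr)
        self.line.data = value
        self.line.dirty = True  # the cache now differs from main memory

    def invalidate(self, addr):
        # Simplification: a real protocol must also deal with dirty data here.
        if self.line.valid and self.line.addr == addr:
            self.line.valid = False


memory = {0: 10, 1: 20}
cache = OneLineWriteBackCache(memory)
cache.write(0, 11)
stale = memory[0]  # main memory has not been updated yet
cache.read(1)      # evicts the dirty line, which forces the write back
print(stale, memory[0])  # prints 10 11
```

The write only reaches main memory when the dirty line is evicted, and an invalidated line is re-fetched on the next access.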
| 55 | + |
| 56 | +## Tag |
| 57 | + |
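| 58 | +The most significant bits of the original address, stored alongside the cache line, used to decide whether a lookup is a hit or a miss: several addresses map to the same cache location, and the tag tells them apart. |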
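
As a sketch, an address can be split into tag, index and offset fields with shifts and masks. The geometry below (64-byte lines and 64 sets, so 6 offset bits and 6 index bits) is an assumption for the example, and `split_address` is a hypothetical helper name.

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)                 # byte within the line
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # which set or slot
    tag = addr >> (offset_bits + index_bits)                 # remaining MSBs
    return tag, index, offset


# Assumed geometry: 64-byte lines (6 offset bits), 64 sets (6 index bits).
tag, index, offset = split_address(0x12345, offset_bits=6, index_bits=6)
print(hex(tag), index, offset)  # prints 0x12 13 5
```

Address `0x13345` has the same index and offset but tag `0x13`, so it competes for the same cache location and only the stored tag distinguishes the two.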
| 59 | + |
| 60 | +## Virtual or physical memory |
| 61 | + |
| 62 | +Four possibilities: the index and the tag can each be taken from either the virtual or the physical address (VIVT, VIPT, PIVT, PIPT). |
| 63 | + |
| 64 | +TODO: which one is best / most common and why? |
| 65 | + |
| 66 | +## Cache coherency |
| 67 | + |
| 68 | +Many CPUs modify memory, each through its own cache. How do we keep the caches up to date with each other? |
| 69 | + |
| 70 | +- <https://en.wikipedia.org/wiki/Cache_coherence> |
| 71 | +- <https://en.wikipedia.org/wiki/Snarfing> |
| 72 | +- <https://en.wikipedia.org/wiki/Bus_snooping> |
| 73 | +- <https://en.wikipedia.org/wiki/Dragon_protocol> |
| 74 | +- <https://en.wikipedia.org/wiki/Firefly_(cache_coherence_protocol)> |
| 75 | +- <https://en.wikipedia.org/wiki/Write-once_(cache_coherence)> |
| 76 | +- <https://en.wikipedia.org/wiki/MESIF_protocol> |
| 77 | +- <https://en.wikipedia.org/wiki/MERSI_protocol> |
| 78 | +- <https://en.wikipedia.org/wiki/MOESI_protocol> |
| 79 | +- <https://en.wikipedia.org/wiki/MOSI_protocol> |
| 80 | +- <https://en.wikipedia.org/wiki/MESI_protocol> |
| 81 | +- <https://en.wikipedia.org/wiki/MSI_protocol> |
| 82 | +- ARM AMBA 4 ACE |