Commit e4e2e06

More CPU arch

1 parent c66ab6a commit e4e2e06

13 files changed: +281 −81 lines

README.md (+14 −4)

@@ -9,9 +9,11 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [How to learn](how-to-learn.md)
 1. [Instruction sets](instruction-sets.md)
 1. [Other architectures](other-architectures.md)
-1. [ARM](https://github.com/cirosantilli/arm-assembly-cheat)
-1. [RISC-V](risc-v.md)
-1. [Microcontrollers](microcontrollers.md)
+1. [ARM](https://github.com/cirosantilli/arm-assembly-cheat)
+1. [RISC-V](risc-v.md)
+1. [Microcontrollers](microcontrollers.md)
+1. Educational
+    1. [Y86](y86.md)
 1. [RISC vs CISC](risc-vs-cisc.md)
 1. [Microcode](microcode.md)
 1. [System vs application programming](system-vs-application-programming.md)

@@ -28,6 +30,12 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [CPU Hardware design](cpu-hardware-design.md)
 1. [CPU Optimizations](cpu-optimizations.md)
 1. [CPU bugs](cpu-bugs.md)
+1. [Cache](cache.md)
+1. [Instruction level parallelism](instruction-level-parallelism.md)
+1. [Pipeline](pipeline.md)
+1. [Branch prediction](branch-prediction.md)
+1. [Superscalar](superscalar.md)
+1. [VLIW](vliw.md)
 1. [IA-32](ia-32.md)
 1. [main.asm](main.asm)
 1. [hello_world.asm](hello_world.asm)

@@ -121,6 +129,9 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [ELF](elf.md)
 1. [ELF Hello World Tutorial](http://www.cirosantilli.com/elf-hello-world)
 1. [Library](library/)
+1. Dynamic libraries
+    1. [ld-linux.so](ld-linux-so.md)
+    1. [ldd](ldd.md)
 1. [Compiler generated](compiler-generated/)
 1. [Binutils](binutils.md)
 1. [ld](ld.md)

@@ -129,7 +140,6 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [objcopy](objcopy.md)
 1. [objdump](objdump.md)
 1. [size](size.md)
-1. [ldd](ldd.md)
 1. [misc](misc.md)
 1. [Bibliography](bibliography.md)
 1. Related tutorials

branch-prediction.md (+7)

@@ -0,0 +1,7 @@
# Branch prediction

Only makes sense in pipelined processors: at a branch, the processor guesses which side will be taken and feeds those instructions into the pipeline before the condition is resolved.

- <http://en.wikipedia.org/wiki/Speculative_execution>
- <http://en.wikipedia.org/wiki/Branch_predictor>
- <http://en.wikipedia.org/wiki/Memory_dependence_prediction>

cache.md (+82)

@@ -0,0 +1,82 @@
# Cache

- <https://en.wikipedia.org/wiki/CPU_cache>
- <http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code>
- <http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra>
- <http://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size>
- <http://stackoverflow.com/questions/763262/how-does-one-write-code-that-best-utilizes-the-cpu-cache-to-improve-performance>
- <http://stackoverflow.com/questions/7905760/matrix-multiplication-small-difference-in-matrix-size-large-difference-in-timi>
- <http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops>

## Example

TODO: ASCII art example, first direct mapped, then set associative.

## Direct mapped

One possible cache location per memory address.

Upside: simple circuit, fast lookup, small area.

Downside: you might evict an entry that was recently accessed, even if all other entries are old.

## Fully associative

Any memory block can go into any cache entry, so conflicts never force an eviction.

But every tag must be compared in parallel on each access, which costs too much circuit area and time for large caches, so full associativity is only used for small structures.

## Set associative

Middle ground between direct mapped and fully associative.

- 2 way associative example <https://www.youtube.com/watch?v=mCF5XNn_xfA>

Now the address selects the set where the block might be, and the tag can be anywhere within that set.

Unlike direct mapped, you now have the fun choice of which entry to evict when a set is full: <https://en.wikipedia.org/wiki/Cache_replacement_policies>
## Bits

Ignoring SMP coherency for now.

### Validity bit

If false, indicates that the line holds no valid data and must be re-fetched.

When it is set to invalid:

- at startup, everything is set invalid, otherwise we would not be able to differentiate valid data from the noise present at power-on. This is the major use case.
- another processor modifies main memory behind a line that we hold: <https://en.wikipedia.org/wiki/Bus_snooping>

### Dirty bit

Set whenever the CPU writes to the cache, unset when the line is written back to main memory.

## Tag

Part of the original address (the most significant bits), stored in the cache to disambiguate whether a lookup is a hit or not.

## Virtual or physical memory

Four possibilities: the cache can be indexed and tagged with either virtual or physical addresses (VIVT, VIPT, PIVT, PIPT).

TODO: which one is best / most common and why? (L1 caches are typically VIPT: the index comes from page-offset bits, so the set can be looked up in parallel with the TLB translation.)

## Cache coherency

Many CPUs modify the same memory, each through its own cache. How are the caches kept up to date?

- <https://en.wikipedia.org/wiki/Cache_coherence>
- <https://en.wikipedia.org/wiki/Snarfing>
- <https://en.wikipedia.org/wiki/Bus_snooping>
- <https://en.wikipedia.org/wiki/Dragon_protocol>
- <https://en.wikipedia.org/wiki/Firefly_(cache_coherence_protocol)>
- <https://en.wikipedia.org/wiki/Write-once_(cache_coherence)>
- <https://en.wikipedia.org/wiki/MESIF_protocol>
- <https://en.wikipedia.org/wiki/MERSI_protocol>
- <https://en.wikipedia.org/wiki/MOESI_protocol>
- <https://en.wikipedia.org/wiki/MOSI_protocol>
- <https://en.wikipedia.org/wiki/MESI_protocol>
- <https://en.wikipedia.org/wiki/MSI_protocol>
- ARM AMBA 4 ACE

cpu-optimizations.md (−40)

@@ -6,52 +6,12 @@ Compilers take most of those into consideration.
 
 ## Instruction level parallelism
 
-<https://en.wikipedia.org/wiki/Instruction-level_parallelism>
-
-### Instruction pipelining
-
-- <http://en.wikipedia.org/wiki/Instruction_pipeline>
-- <http://en.wikipedia.org/wiki/Operand_forwarding> <http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html>
-- <http://en.wikipedia.org/wiki/Bubble_%28computing%29>
-- <https://en.wikipedia.org/wiki/Very_long_instruction_word>
-- <https://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29>
-- <https://en.wikipedia.org/wiki/Orthogonal_instruction_set>
-- <https://en.wikipedia.org/wiki/Bubble_(computing)> (AKA pipeline stall)
-
-Tutorials:
-
-- <https://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/>
-- <https://www.cs.uaf.edu/2010/fall/cs441/lecture/09_16_pipelining.html>
-
 ### Out of order processing
 
 <http://en.wikipedia.org/wiki/Out-of-order_execution>
 
 - <http://en.wikipedia.org/wiki/Register_renaming>
 
-### Branch prediction
-
-Only makes sense in pipelined processors: at a branch, it tries to guess which side will be taken, and puts those instructions in the pipeline.
-
-- <http://en.wikipedia.org/wiki/Speculative_execution>
-- <http://en.wikipedia.org/wiki/Branch_predictor>
-- <http://en.wikipedia.org/wiki/Memory_dependence_prediction>
-
-### Superscalar architecture
-
-TODO vs pipeline and instruction level parallelism?
-
-- <http://en.wikipedia.org/wiki/Superscalar>
-
 ## Megahertz myth
 
 - <http://en.wikipedia.org/wiki/Megahertz_myth>
-
-## Cache
-
-- <http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code>
-- <http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra>
-- <http://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size>
-- <http://stackoverflow.com/questions/763262/how-does-one-write-code-that-best-utilizes-the-cpu-cache-to-improve-performance>
-- <http://stackoverflow.com/questions/7905760/matrix-multiplication-small-difference-in-matrix-size-large-difference-in-timi>
-- <http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops>
instruction-level-parallelism.md (+3)

@@ -0,0 +1,3 @@
# Instruction level parallelism

<https://en.wikipedia.org/wiki/Instruction-level_parallelism>

ld-linux-so.md (+61)

@@ -0,0 +1,61 @@
# ld-linux.so

TODO: what does it do exactly? How is it called? By the kernel?

    man ld.so

`/lib64/ld-linux-x86-64.so.2` is the usual 64-bit version.

`ld-linux.so*` is the executable that does the dynamic loading for every dynamically linked executable.

As such, it cannot have any dependencies itself.

Its path is specified in the `.interp` section of ELF files, which Linux reads and uses to call `ld-linux`.

The default on Ubuntu 14.04 is `/lib64/ld-linux-x86-64.so.2`.

The program to be run can also be passed as an argument to `ld-linux` directly:

    /lib64/ld-linux-x86-64.so.2 a.out

Then:

    man execve

says that the path of the loader is stored in the ELF file, and `readelf -a` shows a program header devoted to it:

    INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                   0x000000000000001c 0x000000000000001c  R      1
        [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

On Ubuntu 16.04, provided by the `libc6` package.
## LD_DEBUG

TODO

## LD_LIBRARY_PATH

You can also add directories to the library search path with this environment variable.

Don't rely on this method for production.

    export LD_LIBRARY_PATH='/path/to/link'

## ld.so

TODO: what is it?

## ld.so.conf

TODO. E.g. Ubuntu 16.04 Mesa:

    /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf

## /etc/ld.so.conf

TODO.

## ldconfig

TODO

ldd.md (+2)

@@ -1,5 +1,7 @@
 # ldd
 
+Prints information about how the dynamic loader `ld-linux.so` would resolve an executable's libraries.
+
 List required shared libraries of an executable and if they can be found.
 
 `binutils` package.

library/README.md (−28)

@@ -52,34 +52,6 @@ Since the disadvantages are so minor, it is almost always better to use dynamic
 
 <http://www.ibm.com/developerworks/library/l-dynamic-libraries/>
 
-## ld.so
-
-## ld-linux.so
-
-    man ld.so
-
-`ld-linux.so*` is the program that does the dynamic loading for every executable.
-
-Its path is specified in the `.interp` section of ELF files, which Linux reads and uses to call `ld-linux`.
-
-The default on Ubuntu 14.04 is `/lib64/ld-linux-x86-64.so.2`.
-
-The program to be run is passed as an argument to `ld-linux`:
-
-    /lib64/ld-linux-x86-64.so.2 a.out
-
-### LD_DEBUG
-
-TODO
-
-### LD_LIBRARY_PATH
-
-You can also add to path with environment variables.
-
-Don't rely on this method for production.
-
-    export LD_LIBRARY_PATH='/path/to/link'
-
 ## Search path
 
 Find where GCC search path for both `.a` and `.so`:

pipeline.md (+41)

@@ -0,0 +1,41 @@
# Pipeline

A throughput vs latency tradeoff: each individual instruction takes at least as long, but many instructions are in flight at once. The throughput gain is so large that pipelining is always used today.

- <http://en.wikipedia.org/wiki/Instruction_pipeline>
- <http://en.wikipedia.org/wiki/Operand_forwarding> <http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html>
- <http://en.wikipedia.org/wiki/Bubble_%28computing%29>
- <https://en.wikipedia.org/wiki/Very_long_instruction_word>
- <https://en.wikipedia.org/wiki/Orthogonal_instruction_set>
- <https://en.wikipedia.org/wiki/Bubble_(computing)> (AKA pipeline stall)

Tutorials:

- <https://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/>
- <https://www.cs.uaf.edu/2010/fall/cs441/lecture/09_16_pipelining.html>
- <https://www.youtube.com/watch?v=euhQ_hdDGA8> talks about the classic 5 stage pipeline
- <https://compas.cs.stonybrook.edu/course/cse502-s13/lectures/cse502-L4-pipelining.pdf> good structural diagram
## Hazards

Types:

- data
- structural: the same pipeline stage is needed by two instructions at the same time
- control: branches

<https://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29>

## Classic RISC pipeline

Most tutorials cover this one, so it is likely a good idea to learn it really well.

Seems to come from MIPS, so basic MIPS assembly will help you.

<https://en.wikipedia.org/wiki/Classic_RISC_pipeline>

## Implementations

Vendors document their pipeline lengths.

Intel: <https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures>

risc-v.md (+27 −9)

@@ -117,6 +117,26 @@ Previously closed source custom ISA I think, then opened and front-end hacked fo
 
 Uses ModelSim...
 
+## Tethered vs untethered
+
+<https://youtu.be/XSyH9T-Cj4w?t=64> a tethered core cannot do IO by itself: <https://www.youtube.com/watch?v=XSyH9T-Cj4w>
+
+Rocket is tethered, lowRISC untethered.
+
+## Hardware implementations
+
+- <https://github.com/lowRISC/lowrisc-chip>
+- <https://github.com/ucb-bar/rocket-chip>
+- <https://github.com/ucb-bar/riscv-boom>
+- <https://github.com/cliffordwolf/picorv32>
+- Pulpino
+
+## Prototypes
+
+9 silicon prototypes: <https://web.archive.org/web/20160904102006/https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-17.pdf>
+
+<https://web.archive.org/web/20160904102554/https://people.eecs.berkeley.edu/%7Eyunsup/papers/riscv-esscirc2014.pdf>
+
 ## News
 
 2016 indiegogo https://www.indiegogo.com/projects/risc-v-microprocessor/x/6766065#/

@@ -129,16 +149,14 @@ Uses ModelSim...
 
 <https://www.quora.com/Would-RISC-V-become-the-dominant-CPU-architecture-in-the-next-5-years-given-that-Google-Oracle-and-HP-are-strongly-rallying-behind-RISC-V>
 
-## Tethered vs untethered
+<http://www.design-reuse.com/news/40903/codasip-and-baysand-partnership-makes-risc-v-based-asics-an-ideal-choice-for-iot-designs.html>
 
-<https://youtu.be/XSyH9T-Cj4w?t=64> tethered cannot do IO on itself: <https://www.youtube.com/watch?v=XSyH9T-Cj4w>
+<https://www.crowdsupply.com/onchip/open-v>: crowd-funded attempt at a RISC-V microcontroller, open down to the RTL level, on a dev board. $50 each, delivery in 1.5 years.
 
-Rocket it tethered, lowRISC untethered.
+## Conferences
 
-## Hardware implementations
+- <http://orconf.org/index.html>
 
-- <https://github.com/lowRISC/lowrisc-chip>
-- <https://github.com/ucb-bar/rocket-chip>
-- <https://github.com/ucb-bar/riscv-boom>
-- <https://github.com/cliffordwolf/picorv32>
-- Pulpino
+## Companies
+
+Good place to search: <http://orconf.org/index.html#sponsors>

superscalar.md (+5)

@@ -0,0 +1,5 @@
# Superscalar architecture

TODO vs pipeline and instruction level parallelism? (Roughly: a superscalar CPU issues more than one instruction per cycle to duplicated execution units, while a pipeline overlaps the stages of successive instructions; both are forms of instruction level parallelism and are combined in practice.)

<http://en.wikipedia.org/wiki/Superscalar>
