Commit e4e2e06

More CPU arch

1 parent c66ab6a commit e4e2e06

13 files changed: +281 −81 lines

README.md (+14 −4)

@@ -9,9 +9,11 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [How to learn](how-to-learn.md)
 1. [Instruction sets](instruction-sets.md)
 1. [Other architectures](other-architectures.md)
-1. [ARM](https://github.com/cirosantilli/arm-assembly-cheat)
-1. [RISC-V](risc-v.md)
-1. [Microcontrollers](microcontrollers.md)
+1. [ARM](https://github.com/cirosantilli/arm-assembly-cheat)
+1. [RISC-V](risc-v.md)
+1. [Microcontrollers](microcontrollers.md)
+1. Educational
+    1. [Y86](y86.md)
 1. [RISC vs CISC](risc-vs-cisc.md)
 1. [Microcode](microcode.md)
 1. [System vs application programming](system-vs-application-programming.md)

@@ -28,6 +30,12 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [CPU Hardware design](cpu-hardware-design.md)
 1. [CPU Optimizations](cpu-optimizations.md)
 1. [CPU bugs](cpu-bugs.md)
+1. [Cache](cache.md)
+1. [Instruction level parallelism](instruction-level-parallelism.md)
+1. [Pipeline](pipeline.md)
+1. [Branch prediction](branch-prediction.md)
+1. [Superscalar](superscalar.md)
+1. [VLIW](vliw.md)
 1. [IA-32](ia-32.md)
 1. [main.asm](main.asm)
 1. [hello_world.asm](hello_world.asm)

@@ -121,6 +129,9 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [ELF](elf.md)
 1. [ELF Hello World Tutorial](http://www.cirosantilli.com/elf-hello-world)
 1. [Library](library/)
+1. Dynamic libraries
+    1. [ld-linux.so](ld-linux-so.md)
+    1. [ldd](ldd.md)
 1. [Compiler generated](compiler-generated/)
 1. [Binutils](binutils.md)
 1. [ld](ld.md)

@@ -129,7 +140,6 @@ x86 userland minimal examples. Hundreds of runnable asserts. Containers (ELF), l
 1. [objcopy](objcopy.md)
 1. [objdump](objdump.md)
 1. [size](size.md)
-1. [ldd](ldd.md)
 1. [misc](misc.md)
 1. [Bibliography](bibliography.md)
 1. Related tutorials

branch-prediction.md (+7)

@@ -0,0 +1,7 @@
# Branch prediction

Only makes sense in pipelined processors: at a branch, the processor guesses which side will be taken and feeds those instructions into the pipeline before the condition is resolved.

- <http://en.wikipedia.org/wiki/Speculative_execution>
- <http://en.wikipedia.org/wiki/Branch_predictor>
- <http://en.wikipedia.org/wiki/Memory_dependence_prediction>

cache.md (+82)

@@ -0,0 +1,82 @@
# Cache

- <https://en.wikipedia.org/wiki/CPU_cache>
- <http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code>
- <http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra>
- <http://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size>
- <http://stackoverflow.com/questions/763262/how-does-one-write-code-that-best-utilizes-the-cpu-cache-to-improve-performance>
- <http://stackoverflow.com/questions/7905760/matrix-multiplication-small-difference-in-matrix-size-large-difference-in-timi>
- <http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops>

## Example

TODO: ASCII art example, first direct mapped, then set associative.

## Direct mapped

One possible cache location per memory address.

Upside: simple circuit, fast lookup, small area.

Downside: you might evict an entry that was recently accessed, even if all other entries are old.

## Fully associative

Any memory block can go into any cache entry, so conflicts never force an eviction.

But every tag must be compared in parallel on each access, which costs too much circuit area and time for large caches, so full associativity is only used for small structures.

## Set associative

Middle ground between direct mapped and fully associative.

- 2 way associative example <https://www.youtube.com/watch?v=mCF5XNn_xfA>

Now the address selects the set where the block might be, and the tag can be anywhere within that set.

Unlike direct mapped, you now have the fun choice of which entry to evict when a set is full: <https://en.wikipedia.org/wiki/Cache_replacement_policies>
## Bits

Ignoring SMP coherency for now.

### Validity bit

If false, indicates that the line holds no valid data and must be re-fetched.

When it is set to invalid:

- at startup, everything is set invalid, otherwise we would not be able to differentiate valid data from the noise present at power-on. This is the major use case.
- another processor modifies main memory behind a line that we hold: <https://en.wikipedia.org/wiki/Bus_snooping>

### Dirty bit

Set whenever the CPU writes to the cache, unset when the line is written back to main memory.

## Tag

Part of the original address (the most significant bits), stored in the cache to disambiguate whether a lookup is a hit or not.

## Virtual or physical memory

Four possibilities: the cache can be indexed and tagged with either virtual or physical addresses (VIVT, VIPT, PIVT, PIPT).

TODO: which one is best / most common and why? (L1 caches are typically VIPT: the index comes from page-offset bits, so the set can be looked up in parallel with the TLB translation.)

## Cache coherency

Many CPUs modify the same memory, each through its own cache. How are the caches kept up to date?

- <https://en.wikipedia.org/wiki/Cache_coherence>
- <https://en.wikipedia.org/wiki/Snarfing>
- <https://en.wikipedia.org/wiki/Bus_snooping>
- <https://en.wikipedia.org/wiki/Dragon_protocol>
- <https://en.wikipedia.org/wiki/Firefly_(cache_coherence_protocol)>
- <https://en.wikipedia.org/wiki/Write-once_(cache_coherence)>
- <https://en.wikipedia.org/wiki/MESIF_protocol>
- <https://en.wikipedia.org/wiki/MERSI_protocol>
- <https://en.wikipedia.org/wiki/MOESI_protocol>
- <https://en.wikipedia.org/wiki/MOSI_protocol>
- <https://en.wikipedia.org/wiki/MESI_protocol>
- <https://en.wikipedia.org/wiki/MSI_protocol>
- ARM AMBA 4 ACE

cpu-optimizations.md (−40)

@@ -6,52 +6,12 @@ Compilers take most of those into consideration.
 
 ## Instruction level parallelism
 
-<https://en.wikipedia.org/wiki/Instruction-level_parallelism>
-
-### Instruction pipelining
-
-- <http://en.wikipedia.org/wiki/Instruction_pipeline>
-- <http://en.wikipedia.org/wiki/Operand_forwarding> <http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html>
-- <http://en.wikipedia.org/wiki/Bubble_%28computing%29>
-- <https://en.wikipedia.org/wiki/Very_long_instruction_word>
-- <https://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29>
-- <https://en.wikipedia.org/wiki/Orthogonal_instruction_set>
-- <https://en.wikipedia.org/wiki/Bubble_(computing)> (AKA pipeline stall)
-
-Tutorials:
-
-- <https://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/>
-- <https://www.cs.uaf.edu/2010/fall/cs441/lecture/09_16_pipelining.html>
-
 ### Out of order processing
 
 <http://en.wikipedia.org/wiki/Out-of-order_execution>
 
 - <http://en.wikipedia.org/wiki/Register_renaming>
 
-### Branch prediction
-
-Only makes sense in pipelined processors: at a branch, it tries to guess which side will be taken, and puts those instructions in the pipeline.
-
-- <http://en.wikipedia.org/wiki/Speculative_execution>
-- <http://en.wikipedia.org/wiki/Branch_predictor>
-- <http://en.wikipedia.org/wiki/Memory_dependence_prediction>
-
-### Superscalar architecture
-
-TODO vs pipeline and instruction level parallelism?
-
-- <http://en.wikipedia.org/wiki/Superscalar>
-
 ## Megahertz myth
 
 - <http://en.wikipedia.org/wiki/Megahertz_myth>
-
-## Cache
-
-- <http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code>
-- <http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra>
-- <http://stackoverflow.com/questions/8469427/how-and-when-to-align-to-cache-line-size>
-- <http://stackoverflow.com/questions/763262/how-does-one-write-code-that-best-utilizes-the-cpu-cache-to-improve-performance>
-- <http://stackoverflow.com/questions/7905760/matrix-multiplication-small-difference-in-matrix-size-large-difference-in-timi>
-- <http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops>
instruction-level-parallelism.md (+3)

@@ -0,0 +1,3 @@
# Instruction level parallelism

<https://en.wikipedia.org/wiki/Instruction-level_parallelism>

ld-linux-so.md (+61)

@@ -0,0 +1,61 @@
# ld-linux.so

TODO: what does it do exactly? How is it called? By the kernel?

    man ld.so

`/lib64/ld-linux-x86-64.so.2` is the usual 64-bit version.

`ld-linux.so*` is the executable that does the dynamic loading for every dynamically linked executable.

As such, it cannot have any dependencies itself.

Its path is specified in the `.interp` section of ELF files, which Linux reads and uses to call `ld-linux`.

The default on Ubuntu 14.04 is `/lib64/ld-linux-x86-64.so.2`.

The program to be run can also be passed as an argument to `ld-linux` directly:

    /lib64/ld-linux-x86-64.so.2 a.out

Then:

    man execve

says that the path of the loader is stored in the ELF file, and `readelf -a` shows a program header devoted to it:

    INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                   0x000000000000001c 0x000000000000001c  R      1
        [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

On Ubuntu 16.04, provided by the `libc6` package.
## LD_DEBUG

TODO

## LD_LIBRARY_PATH

You can also add directories to the library search path with this environment variable.

Don't rely on this method for production.

    export LD_LIBRARY_PATH='/path/to/link'

## ld.so

TODO: what is it?

## ld.so.conf

TODO. E.g. Ubuntu 16.04 Mesa:

    /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf

## /etc/ld.so.conf

TODO.

## ldconfig

TODO

ldd.md (+2)

@@ -1,5 +1,7 @@
 # ldd
 
+Prints information about how the dynamic loader `ld-linux.so` would resolve an executable's libraries.
+
 List required shared libraries of an executable and if they can be found.
 
 `binutils` package.

library/README.md (−28)

@@ -52,34 +52,6 @@ Since the disadvantages are so minor, it is almost always better to use dynamic
 
 <http://www.ibm.com/developerworks/library/l-dynamic-libraries/>
 
-## ld.so
-
-## ld-linux.so
-
-    man ld.so
-
-`ld-linux.so*` is the program that does the dynamic loading for every executable.
-
-Its path is specified in the `.interp` section of ELF files, which Linux reads and uses to call `ld-linux`.
-
-The default on Ubuntu 14.04 is `/lib64/ld-linux-x86-64.so.2`.
-
-The program to be run is passed as an argument to `ld-linux`:
-
-    /lib64/ld-linux-x86-64.so.2 a.out
-
-### LD_DEBUG
-
-TODO
-
-### LD_LIBRARY_PATH
-
-You can also add to path with environment variables.
-
-Don't rely on this method for production.
-
-    export LD_LIBRARY_PATH='/path/to/link'
-
 ## Search path
 
 Find where GCC search path for both `.a` and `.so`:

pipeline.md (+41)

@@ -0,0 +1,41 @@
# Pipeline

A throughput vs latency tradeoff: each individual instruction takes at least as long, but many instructions are in flight at once. The throughput gain is so large that pipelining is always used today.

- <http://en.wikipedia.org/wiki/Instruction_pipeline>
- <http://en.wikipedia.org/wiki/Operand_forwarding> <http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html>
- <http://en.wikipedia.org/wiki/Bubble_%28computing%29>
- <https://en.wikipedia.org/wiki/Very_long_instruction_word>
- <https://en.wikipedia.org/wiki/Orthogonal_instruction_set>
- <https://en.wikipedia.org/wiki/Bubble_(computing)> (AKA pipeline stall)

Tutorials:

- <https://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/>
- <https://www.cs.uaf.edu/2010/fall/cs441/lecture/09_16_pipelining.html>
- <https://www.youtube.com/watch?v=euhQ_hdDGA8> talks about the classic 5 stage pipeline
- <https://compas.cs.stonybrook.edu/course/cse502-s13/lectures/cse502-L4-pipelining.pdf> good structural diagram
## Hazards

Types:

- data
- structural: the same pipeline stage is needed by two instructions at the same time
- control: branches

<https://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29>

## Classic RISC pipeline

Most tutorials cover this one, so it is likely a good idea to learn it really well.

Seems to come from MIPS, so basic MIPS assembly will help you.

<https://en.wikipedia.org/wiki/Classic_RISC_pipeline>

## Implementations

Vendors document their pipeline lengths.

Intel: <https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures>

risc-v.md (+27 −9)

@@ -117,6 +117,26 @@ Previously closed source custom ISA I think, then opened and front-end hacked fo
 
 Uses ModelSim...
 
+## Tethered vs untethered
+
+<https://youtu.be/XSyH9T-Cj4w?t=64> a tethered core cannot do IO by itself: <https://www.youtube.com/watch?v=XSyH9T-Cj4w>
+
+Rocket is tethered, lowRISC untethered.
+
+## Hardware implementations
+
+- <https://github.com/lowRISC/lowrisc-chip>
+- <https://github.com/ucb-bar/rocket-chip>
+- <https://github.com/ucb-bar/riscv-boom>
+- <https://github.com/cliffordwolf/picorv32>
+- Pulpino
+
+## Prototypes
+
+9 silicon prototypes: <https://web.archive.org/web/20160904102006/https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-17.pdf>
+
+<https://web.archive.org/web/20160904102554/https://people.eecs.berkeley.edu/%7Eyunsup/papers/riscv-esscirc2014.pdf>
+
 ## News
 
 2016 indiegogo https://www.indiegogo.com/projects/risc-v-microprocessor/x/6766065#/

@@ -129,16 +149,14 @@ Uses ModelSim...
 
 <https://www.quora.com/Would-RISC-V-become-the-dominant-CPU-architecture-in-the-next-5-years-given-that-Google-Oracle-and-HP-are-strongly-rallying-behind-RISC-V>
 
-## Tethered vs untethered
+<http://www.design-reuse.com/news/40903/codasip-and-baysand-partnership-makes-risc-v-based-asics-an-ideal-choice-for-iot-designs.html>
 
-<https://youtu.be/XSyH9T-Cj4w?t=64> tethered cannot do IO on itself: <https://www.youtube.com/watch?v=XSyH9T-Cj4w>
+<https://www.crowdsupply.com/onchip/open-v>: crowd-funded attempt at a RISC-V microcontroller, open down to the RTL level, on a dev board. $50 each, delivery in 1.5 years.
 
-Rocket it tethered, lowRISC untethered.
+## Conferences
 
-## Hardware implementations
+- <http://orconf.org/index.html>
 
-- <https://github.com/lowRISC/lowrisc-chip>
-- <https://github.com/ucb-bar/rocket-chip>
-- <https://github.com/ucb-bar/riscv-boom>
-- <https://github.com/cliffordwolf/picorv32>
-- Pulpino
+## Companies
+
+Good place to search: <http://orconf.org/index.html#sponsors>

superscalar.md (+5)

@@ -0,0 +1,5 @@
# Superscalar architecture

TODO vs pipeline and instruction level parallelism? (Roughly: a superscalar CPU issues more than one instruction per cycle to duplicated execution units, while a pipeline overlaps the stages of successive instructions; both are forms of instruction level parallelism and are combined in practice.)

<http://en.wikipedia.org/wiki/Superscalar>
