@@ -89,33 +89,71 @@ Note: as the SSTable does not support redundant keys, there is no ambiguity betw
89
89
90
90
### SSTFooter
91
91
```
92
- +-------+-------+-----+-------------+---------+---------+
93
- | Block | Block | ... | IndexOffset | NumTerm | Version |
94
- +-------+-------+-----+-------------+---------+---------+
95
- |----( # of blocks)---|
92
+ +-----+----------------+-------------+-------------+---------+---------+
93
+ | Fst | BlockAddrStore | StoreOffset | IndexOffset | NumTerm | Version |
94
+ +-----+----------------+-------------+-------------+---------+---------+
96
95
```
97
- - Block(SSTBlock): uses IndexValue for its Values format
96
+ - Fst(Fst): finite state transducer mapping keys to a block number
97
+ - BlockAddrStore(BlockAddrStore): store mapping a block number to its BlockAddr
98
+ - StoreOffset(u64): Offset to start of the BlockAddrStore. If zero, see the SingleBlockSStable section
98
99
- IndexOffset(u64): Offset to the start of the SSTFooter
99
100
- NumTerm(u64): number of terms in the sstable
100
- - Version(u32): Currently equal to 2
101
+ - Version(u32): Currently equal to 3
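The fixed-size fields at the end of the SSTFooter can be read back with a sketch like the one below. It is not the library's implementation: the names `FooterTail`, `read_u64_le` and `parse_footer_tail` are hypothetical, and it assumes the four fields are stored little-endian and tightly packed at the very end of the file, which the layout above does not state explicitly.

```rust
/// Trailing fixed-size fields of the SSTFooter (hypothetical struct name).
#[derive(Debug, PartialEq)]
struct FooterTail {
    store_offset: u64,
    index_offset: u64,
    num_term: u64,
    version: u32,
}

/// StoreOffset + IndexOffset + NumTerm (u64 each) + Version (u32).
const TAIL_LEN: usize = 8 + 8 + 8 + 4;

/// Reads a little-endian u64 at `pos` (assumption: little-endian encoding).
fn read_u64_le(bytes: &[u8], pos: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[pos..pos + 8]);
    u64::from_le_bytes(buf)
}

/// Parses the last TAIL_LEN bytes of `file`; returns None if it is too short.
fn parse_footer_tail(file: &[u8]) -> Option<FooterTail> {
    if file.len() < TAIL_LEN {
        return None;
    }
    let tail = &file[file.len() - TAIL_LEN..];
    let mut version_buf = [0u8; 4];
    version_buf.copy_from_slice(&tail[24..28]);
    Some(FooterTail {
        store_offset: read_u64_le(tail, 0),
        index_offset: read_u64_le(tail, 8),
        num_term: read_u64_le(tail, 16),
        version: u32::from_le_bytes(version_buf),
    })
}
```

A reader would typically check `version` first, then use `index_offset` and `store_offset` to locate the Fst and the BlockAddrStore.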
101
102
102
- ### IndexValue
103
- ```
104
- +------------+----------+-------+-------+-----+
105
- | EntryCount | StartPos | Entry | Entry | ... |
106
- +------------+----------+-------+-------+-----+
107
- |---( # of entries)---|
108
- ```
103
+ ### Fst
109
104
110
- - EntryCount(VInt): number of entries
111
- - StartPos(VInt): the start pos of the first (data) block referenced by this (index) block
112
- - Entry (IndexEntry)
105
+ Fst is in the format of tantivy\_fst
113
106
114
- ### Entry
115
- ```
116
- +----------+--------------+
117
- | BlockLen | FirstOrdinal |
118
- +----------+--------------+
119
- ```
120
- - BlockLen(VInt): length of the block
121
- - FirstOrdinal(VInt): ordinal of the first element in the given block
107
+ ### BlockAddrStore
108
+
109
+ +---------+-----------+-----------+-----+-----------+-----------+-----+
110
+ | MetaLen | BlockMeta | BlockMeta | ... | BlockData | BlockData | ... |
111
+ +---------+-----------+-----------+-----+-----------+-----------+-----+
112
+ |---------(N blocks)----------|---------(N blocks)----------|
113
+
114
+ - MetaLen(u64): length of the BlockMeta section
115
+ - BlockMeta(BlockAddrBlockMetadata): metadata to seek through BlockData
116
+ - BlockData(CompactedBlockAddr): bitpacked per block metadata
117
+
118
+ ### BlockAddrBlockMetadata
119
+
120
+ +--------+------------+--------------+------------+--------------+-------------------+-----------------+----------+
121
+ | Offset | RangeStart | FirstOrdinal | RangeSlope | OrdinalSlope | FirstOrdinalNBits | RangeStartNBits | BlockLen |
122
+ +--------+------------+--------------+------------+--------------+-------------------+-----------------+----------+
123
+
124
+ - Offset(u64): offset of the corresponding BlockData in the datastream
125
+ - RangeStart(u64): the start position of the first block
126
+ - FirstOrdinal(u64): the first ordinal of the first block
127
+ - RangeSlope(u32): slope predicted for start range evolution (see computation in BlockData)
128
+ - OrdinalSlope(u64): slope predicted for first ordinal evolution (see computation in BlockData)
129
+ - FirstOrdinalNBits(u8): number of bits per ordinal in datastream (see computation in BlockData)
130
+ - RangeStartNBits(u8): number of bits per range start in datastream (see computation in BlockData)
131
+
132
+ ### BlockData
133
+
134
+ +-----------------+-------------------+---------------+
135
+ | RangeStartDelta | FirstOrdinalDelta | FinalRangeEnd |
136
+ +-----------------+-------------------+---------------+
137
+ | ------(BlockLen repetitions)---------|
138
+
139
+ - RangeStartDelta(var): RangeStartNBits *bits* of little endian number. See below for decoding
140
+ - FirstOrdinalDelta(var): FirstOrdinalNBits *bits* of little endian number. See below for decoding
141
+ - FinalRangeEnd(var): RangeStartNBits *bits* of integer. See below for decoding
142
+
143
+ Converting a BlockData of index Index and a BlockAddrBlockMetadata to an actual block address is done as follows:
144
+ range\_prediction := RangeStart + Index * RangeSlope;
145
+ range\_derivation := RangeStartDelta - (1 << (RangeStartNBits-1));
146
+ range\_start := range\_prediction + range\_derivation
147
+
148
+ The same computation applies to the ordinal, using FirstOrdinal, OrdinalSlope, FirstOrdinalDelta and FirstOrdinalNBits.
149
+
150
+ Note that `range_derivation` can take a negative value. `RangeStartDelta` is just its translation to a positive range.
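The decoding rule above, and the bias that keeps the stored delta non-negative, can be illustrated with the following sketch. The struct `RangeMeta` and the function names are hypothetical, only the range-start fields are modeled, and the code assumes RangeStartNBits is between 1 and 64; bit unpacking of the BlockData stream is left out.

```rust
/// Subset of BlockAddrBlockMetadata needed for range starts (hypothetical struct).
struct RangeMeta {
    range_start: u64,
    range_slope: u32,
    range_start_nbits: u8, // assumed to be in 1..=64
}

/// range_prediction := RangeStart + Index * RangeSlope
fn range_prediction(meta: &RangeMeta, index: u64) -> i128 {
    meta.range_start as i128 + index as i128 * meta.range_slope as i128
}

/// Offset added to the signed derivation so that the stored delta is non-negative.
fn bias(meta: &RangeMeta) -> i128 {
    1i128 << (meta.range_start_nbits - 1)
}

/// Decodes a stored RangeStartDelta into the actual range start.
fn decode_range_start(meta: &RangeMeta, index: u64, range_start_delta: u64) -> u64 {
    // range_derivation may be negative, hence the signed arithmetic.
    let range_derivation = range_start_delta as i128 - bias(meta);
    (range_prediction(meta, index) + range_derivation) as u64
}

/// Inverse operation: computes the delta that would be stored for `actual`.
fn encode_range_start_delta(meta: &RangeMeta, index: u64, actual: u64) -> u64 {
    (actual as i128 - range_prediction(meta, index) + bias(meta)) as u64
}
```

With RangeStart = 100, RangeSlope = 10 and RangeStartNBits = 4, the prediction for index 3 is 130 and the bias is 8, so a stored delta of 5 decodes to 127 (a derivation of -3).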
151
+
152
+
153
+ ## SingleBlockSStable
154
+
155
+ The format used for the index is meant to be compact; however, it has a constant cost of around 70
156
+ bytes, which isn't negligible for a table containing very few keys.
157
+ To limit the impact of that constant cost, single block sstables omit the Fst and BlockAddrStore from
158
+ their index. Instead a block with first ordinal of 0, range start of 0 and range end of IndexOffset
159
+ is implicitly used for every operation.
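As a rough illustration of the single block case, the sketch below returns the implicit block address when StoreOffset is zero. The `BlockAddr` fields and the function name `single_block_addr` are assumptions made for this example, not the library's definitions.

```rust
use std::ops::Range;

/// Address of a data block (hypothetical field layout).
#[derive(Debug, PartialEq)]
struct BlockAddr {
    first_ordinal: u64,
    byte_range: Range<u64>,
}

/// Returns the implicit block of a single block sstable, or None when
/// StoreOffset is non-zero and the Fst and BlockAddrStore must be used.
fn single_block_addr(store_offset: u64, index_offset: u64) -> Option<BlockAddr> {
    if store_offset == 0 {
        Some(BlockAddr {
            first_ordinal: 0,
            // range start of 0, range end of IndexOffset
            byte_range: 0..index_offset,
        })
    } else {
        None
    }
}
```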