Commit 48648f5

lundibundi authored and targos committed
benchmark: add lines to scatter plots
Adds lines between the points of the same category in scatter.R plots.

PR-URL: #22074
Reviewed-By: Andreas Madsen <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
1 parent 75a9192 commit 48648f5

3 files changed: +51 -42 lines changed


benchmark/scatter.R

+1
@@ -79,6 +79,7 @@ if (!is.null(plot.filename)) {
     width=.1, na.rm=TRUE
   );
   p = p + geom_point();
+  p = p + geom_line();
   p = p + ylab("rate of operations (higher is better)");
   p = p + ggtitle(dat[1, 1]);
   ggsave(plot.filename, p);

doc/guides/doc_img/scatter-plot.png

-11.3 KB

doc/guides/writing-and-running-benchmarks.md

+50 -42
@@ -270,56 +270,64 @@ After generating the csv, a comparison table can be created using the
 the `--plot filename` option.
 
 ```console
-$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunk --category encoding --plot scatter-plot.png --log
-
-aggregating variable: inlen
-
-chunk     encoding      mean confidence.interval
-   16        ascii 1111933.3           221502.48
-   16 base64-ascii  167508.4            33116.09
-   16  base64-utf8  122666.6            25037.65
-   16         utf8  783254.8           159601.79
-   64        ascii 2623462.9           399791.36
-   64 base64-ascii  462008.3            85369.45
-   64  base64-utf8  420108.4            85612.05
-   64         utf8 1358327.5           235152.03
-  256        ascii 3730343.4           371530.47
-  256 base64-ascii  663281.2            80302.73
-  256  base64-utf8  632911.7            81393.07
-  256         utf8 1554216.9           236066.53
- 1024        ascii 4399282.0           186436.46
- 1024 base64-ascii  730426.6            63806.12
- 1024  base64-utf8  680954.3            68076.33
- 1024         utf8 1554832.5           237532.07
+$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log
+
+aggregating variable: inLen
+
+chunkLen     encoding      rate confidence.interval
+      16        ascii 1515855.1           334492.68
+      16 base64-ascii  403527.2            89677.70
+      16  base64-utf8  322352.8            70792.93
+      16      utf16le 1714567.5           388439.81
+      16         utf8 1100181.6           254141.32
+      64        ascii 3550402.0           661277.65
+      64 base64-ascii 1093660.3           229976.34
+      64  base64-utf8  997804.8           227238.04
+      64      utf16le 3372234.0           647274.88
+      64         utf8 1731941.2           360854.04
+     256        ascii 5033793.9           723354.30
+     256 base64-ascii 1447962.1           236625.96
+     256  base64-utf8 1357269.2           231045.70
+     256      utf16le 4039581.5           655483.16
+     256         utf8 1828672.9           360311.55
+    1024        ascii 5677592.7           624771.56
+    1024 base64-ascii 1494171.7           227302.34
+    1024  base64-utf8 1399218.9           224584.79
+    1024      utf16le 4157452.0           630416.28
+    1024         utf8 1824266.6           359628.52
 ```
 
-Because the scatter plot can only show two variables (in this case _chunk_ and
-_encoding_) the rest is aggregated. Sometimes aggregating is a problem, this
+Because the scatter plot can only show two variables (in this case _chunkLen_
+and _encoding_) the rest is aggregated. Sometimes aggregating is a problem, this
 can be solved by filtering. This can be done while benchmarking using the
 `--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
 afterwards using tools such as `sed` or `grep`. In the `sed` case be
 sure to keep the first line since that contains the header information.
 
 ```console
-$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunk --category encoding --plot scatter-plot.png --log
-
-chunk     encoding       mean confidence.interval
-   16        ascii  701285.96           21233.982
-   16 base64-ascii  107719.07            3339.439
-   16  base64-utf8   72966.95            2438.448
-   16         utf8  475340.84           17685.450
-   64        ascii 2554105.08           87067.132
-   64 base64-ascii  330120.32            8551.707
-   64  base64-utf8  249693.19            8990.493
-   64         utf8 1128671.90           48433.862
-  256        ascii 4841070.04          181620.768
-  256 base64-ascii  849545.53           29931.656
-  256  base64-utf8  809629.89           33773.496
-  256         utf8 1489525.15           49616.334
- 1024        ascii 4931512.12          165402.805
- 1024 base64-ascii  863933.22           27766.982
- 1024  base64-utf8  827093.97           24376.522
- 1024         utf8 1487176.43           50128.721
+$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log
+
+chunkLen     encoding      rate confidence.interval
+      16        ascii 1302078.5            71692.27
+      16 base64-ascii  338669.1            15159.54
+      16  base64-utf8  281904.2            20326.75
+      16      utf16le 1381515.5            58533.61
+      16         utf8  831183.2            33631.01
+      64        ascii 4363402.8           224030.00
+      64 base64-ascii 1036825.9            48644.72
+      64  base64-utf8  780059.3            60994.98
+      64      utf16le 3900749.5           158366.84
+      64         utf8 1723710.6            80665.65
+     256        ascii 8472896.1           511822.51
+     256 base64-ascii 2215884.6           104347.53
+     256  base64-utf8 1996230.3           131778.47
+     256      utf16le 5824147.6           234550.82
+     256         utf8 2019428.8           100913.36
+    1024        ascii 8340189.4           598855.08
+    1024 base64-ascii 2201316.2           111777.68
+    1024  base64-utf8 2002272.9           128843.11
+    1024      utf16le 5789281.7           240642.77
+    1024         utf8 2025551.2            81770.69
 ```
 
 ![compare tool boxplot](doc_img/scatter-plot.png)
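
Note on the filtering step described in the guide: the `sed` pattern selects rows by field position (the fourth field equal to `128`), so it depends on the column order of the CSV. A minimal sketch of a name-based alternative follows. The helper `filter_csv` is hypothetical and not part of the benchmark tooling. It assumes fields are separated by a comma and optional spaces, that header names are unquoted, and that no field contains an embedded comma. Real output may quote header names, in which case the column argument would need to include the quotes.

```shell
# Hypothetical helper (not part of the Node.js benchmark tools):
# keep the header line plus every row whose named column equals a value.
# Assumes ", "-separated fields, unquoted header names, no embedded commas.
filter_csv() {
  col="$1"
  val="$2"
  awk -F', *' -v col="$col" -v val="$val" '
    # Header: find the index of the requested column, then always print it.
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      print
      next
    }
    # Data rows: print only when the column was found and the value matches.
    idx && $idx == val
  '
}
```

Under those assumptions, `cat scatter.csv | filter_csv inLen 128` could take the place of the `sed` stage before piping into `Rscript benchmark/scatter.R`. If the named column does not exist, only the header is printed.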
