
Commit 5220c44

Authored Apr 19, 2024
Merge pull request #13 from DanRoscigno/docs-as-code-update
edits
2 parents b62e7fb + 8713921 commit 5220c44

3 files changed: +137 −26 lines

content/documentation/modules/ROOT/pages/docs-as-code.adoc (+114 −3)
@@ -1,8 +1,27 @@
-= Single sourcing documentation code snippets from end to end tests
+= Docs-As-Code
 
-== Two lifecycles for technical documentation
+Docs-as-code can have many meanings. I include:
 
-=== Documenting a feature
+* Source control
+* Automated testing
+* Peer reviews
+* Enforcing a style-guide
+* Collecting and responding to reader feedback
+
+I think that most of the above is well understood; however, the area that I think needs to be improved in general is automated
+testing. Here is a breakdown of what I think automated testing is:
+
+* Doc test builds
+* Link checking
+* Style-guide enforcement
+* Code-sample testing
+
+I don't think that code-sample testing is widely adopted. I believe that most organizations fall in the "test when someone complains"
+camp.
+
+== Single-sourced code samples
+
+=== Complaint-driven workflow
 
 [[complaint-driven-pipeline-diagram]]
 .Complaint Driven Writing Pipeline
@@ -22,6 +41,7 @@ P[/Receive Complaint/]
 --> E
 ----
 
+=== Test-driven workflow
 
 [[CI-driven-pipeline-diagram]]
 .CI Test Driven Writing Pipeline
@@ -36,3 +56,94 @@ M[/Breaking change/] -->
 P[/CI Fails/] -- Update the tests -->C
 C
 ----
+
+=== Overview
+
+A year or two ago I saw a GitHub issue related to a bug in some documentation
+that I wrote. Later in the same week I saw another issue about a different page
+in the docs. Both of these issues were related to key features that the
+community and customers were using in production.
+
+After retesting the steps in the docs and confirming that the issues were
+accurate, I looked through the release notes and found the related breaking
+changes.
+
+Waiting for the community and customers to find the bugs in the docs or bugs in
+the code is a common problem, and it is embarrassing.
+
+The best way to know when software changes is to run tests against every code
+change. This is common for code changes, but somehow the code changes and sample
+data used in the tests don't make their way into the documentation.
+
+=== The fix
+
+* Treat the docs as code.
+* As end-to-end docs (tutorials, quick starts, how-to guides) are designed, they
+should be written as test plans.
+* Automate the test plan.
+* Write the doc, but instead of copy/pasting the code snippets (SQL in my case)
+into the docs, import the snippets directly from the automated test.
+* Run the test suite on a regular basis.
+* As tests fail, get the code fixed if the failure indicates a bug, or update the
+test to include the new behavior of the system. The update to the test should cause
+an update to the documentation, as the doc system is pulling the code snippets
+from the tests.
+
+=== Example
+
+A recent feature of the project I am working on queries data in files stored in object
+storage, figures out the schema of the data, then creates and populates a table in a
+database.
+
+The SQL that causes this magic to happen looks like this:
+
+.Create table from S3 using FILES() table function
+[,sql]
+----
+CREATE TABLE DocsQA.user_behavior_inferred
+AS SELECT * FROM FILES (
+    "path" = "s3://starrocks-examples/user_behavior_ten_million_rows.parquet",
+    "format" = "parquet",
+    "aws.s3.region" = "us-east-1",
+    "aws.s3.access_key" = "AAAAAAAAAAAAAAAAAAAA",
+    "aws.s3.secret_key" = "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"
+);
+----
+
+Yesterday I would have copied the above out of the SQL client I used to run
+the query and pasted it into a Markdown file. Today I would instead grab
+the snippet from the test specification with this syntax:
+
+[,markdown]
+----
+```sql reference title="Create table from S3 using FILES() table function"
+https://github.com/DanRoscigno/docs/blob/6d6fcf905162adf80bd094cb9dd133a5c557bdd3/SQL/files_table_fxn.sql#L1-L11
+```
+----
+
+NOTE: This is `docusaurus-theme-github-codeblock` syntax, not Asciidoc. With Asciidoc I would include content by
+https://docs.asciidoctor.org/asciidoc/latest/directives/include-tagged-regions/[tagged regions^,target="_blank"].
+
+In Docusaurus this renders as:
+
+image::shared:testSQL.png[Create table from S3]
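
For comparison, here is a minimal sketch of the Asciidoc tagged-regions approach mentioned in the note above. The tag name `files-ddl` is a hypothetical example; the idea is that the tested SQL file marks the snippet with tag comments:

[,sql]
----
-- tag::files-ddl[]
CREATE TABLE DocsQA.user_behavior_inferred
AS SELECT * FROM FILES (
    "path" = "s3://starrocks-examples/user_behavior_ten_million_rows.parquet",
    "format" = "parquet",
    "aws.s3.region" = "us-east-1"
    -- credentials omitted for brevity
);
-- end::files-ddl[]
----

The doc page then includes only that region, so a change to the test flows straight into the docs:

....
include::SQL/files_table_fxn.sql[tag=files-ddl]
....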
+
+=== Proof of concept
+
+An implementation of the above is described at
+https://github.com/DanRoscigno/SingleSourceCodeTestingAndDocs/blob/main/README.md[Single-sourcing docs from code^,target="_blank"].
+
+== Reader feedback
+
+Collecting reader feedback is important. Many of the feedback widgets available rely on systems that are blocked by
+browser ad blockers. I am using a React component that collects feedback and writes to PostHog and does not rely on
+cookies. Each week a
+https://github.com/StarRocks/starrocks/blob/main/.github/workflows/weekly-docs-feedback.yml[scheduled GitHub workflow^,target="_blank"]
+collects the feedback from PostHog and generates an issue with the reader feedback. The same workflow queries Algolia
+for the top successful searches and the top failed searches in the docs. This informs the documentation team and
+product management on which features or commands are important to readers.
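
A rough sketch of how such a weekly collection job can be scheduled; the cron expression and the script step are illustrative assumptions, not the contents of the linked workflow:

[,yaml]
----
name: weekly-docs-feedback
on:
  schedule:
    - cron: "0 6 * * 1" # once a week, Monday 06:00 UTC
jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical step: pull feedback from PostHog, query Algolia for top
      # searches, and open a GitHub issue summarizing both.
      - run: python scripts/collect_feedback.py
----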
+
+== CI checks
+
+Link checking, Markdown linting, and build tests are done on each commit to documentation pull requests by the https://github.com/StarRocks/starrocks/blob/main/.github/workflows/ci-doc-checker.yml#L62-L135[doc CI job^,target="_blank"].
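
A minimal sketch of what per-commit doc checks can look like, assuming markdownlint for linting, lychee for link checking, and a Docusaurus-style build; these tool choices are illustrative, not the contents of the linked job:

[,yaml]
----
name: docs-ci
on:
  pull_request:
    paths: ["docs/**"]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Markdown linting
      - run: npx markdownlint-cli "docs/**/*.md"
      # Link checking
      - uses: lycheeverse/lychee-action@v1
        with:
          args: docs/
      # Doc build test
      - run: yarn install && yarn build
----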

content/documentation/modules/ROOT/pages/index.adoc (+23 −23)
@@ -13,53 +13,53 @@ or have been modified.
 This example is designed to be followed step by step to integrate the database with a specific
 third-party visualization tool.
 
-When I wrote this guide I pulled out the reusable content (the ports used by the database and where to
-find the connection details in the commercial UI) into reusable snippets and these are imported wherever
+When I wrote this guide, I pulled out the reusable content (the ports used by the database and where to
+find the connection details in the commercial UI) into reusable snippets. These are now imported wherever
 needed. Imports have always been available in Asciidoc, but this was missing in Docusaurus until recently.
 
 Key features of this guide:
 
 * Identify the goal and show the end result.
 * Provide a step-by-step procedure.
 * Include all the information necessary to complete the task.
-* Limit links out to download files necessary to implement the integration and a sample dataset used to verify the integration.
+* Limit links to those needed for downloading files to implement the integration and a sample dataset used to verify the integration.
 
 The rest of the visualization tool integrations at ClickHouse follow the same pattern.
 
-Typically, integration documentation would be limited to "install the driver, add the connection string, press the test button". I deviated from this because the community often had problems with using the integration once the connection was established. Deflecting issues reported in Slack and to the support desk is important to both the users and the support team.
+Typically, integration documentation would be limited to "install the driver, add the connection string, press the test button". I deviated from this because the community often had problems using the integration once the connection was established. Deflecting issues reported in Slack and to the support desk is important to both the users and the support team.
 
 https://clickhouse.com/docs/en/integrations/superset[Superset Integration^,target="_blank"]
 
 == How-To and Explanation
 
 This is a guide for someone who has already gone through the basics of starting the database, creating a
-"Hello World" table and loading a few rows of data. This document exemplifies one place where I
-combine explanation with a how-to guide. This is meant to both teach someone how to load data, and
+"Hello World" table, and loading a few rows of data. This document exemplifies one place where I
+combine explanation with a how-to guide. This is meant to both teach someone how to load data and
 explain what they should be considering while they work through the process.
 
-In the NYPD Complaint Data guide I guide a new user of the ClickHouse analytical database through
+In the NYPD Complaint Data documentation, I guide a new user of the ClickHouse analytical database through
 investigating the structure and content of an input file containing a dataset, determining the proper
 schema for the database table that the data will be stored in, how to transform the data while
 ingesting it, and how to run some interesting queries against that data.
 
-Most guides in this product space tell the reader "type this, click that, clean up". I find that
+Most guides in this product space tell the reader "Type this, click that, clean up". I find that
 type of guide to be boring, and I wonder if the method presented is a "good" method or the simplest
 for the author to write.
 
 Database guides often use very simple datasets that are guaranteed to work. This is necessary for the
-very first tutorial type content designed to get the product installed and the very first table created.
+very first tutorial-type content designed to get the product installed and the very first table created.
 Beyond that point, the reader needs to learn about how to understand their data and the database so
 that they can make proper decisions. When I wrote this guide I had almost no experience with the
-product. My mentor recommended that I "figure it out and write down everything that I learned". This
-first example is the result of that advice.
+product. My mentor recommended that I "Write down everything that I learn while working through the
+process". This first example is the result of that advice.
 
 Here is an example from the NYPD Complaint Data document that I believe is a good way to present
 a system for learning about the data, and properly configuring the database table to store the data
 efficiently:
 
 NOTE: The queries are not shown in the excerpt.
 
-> In order to figure out what types should be used for the fields it is necessary to know what the data looks like. For example, the field JURISDICTION_CODE is a numeric: should it be a UInt8, or an Enum, or is Float64 appropriate?
+> To figure out what types should be used for the fields, it is necessary to know what the data looks like. For example, the field JURISDICTION_CODE is a numeric: should it be a UInt8, or an Enum, or is Float64 appropriate?
 >
 > The query response shows that the JURISDICTION_CODE fits well in a UInt8.
 >
@@ -70,13 +70,13 @@ NOTE: The queries are not shown in the excerpt.
 > The dataset in use at the time of writing has only a few hundred distinct parks and playgrounds in the PARK_NM column. This is a small number based on the LowCardinality recommendation to stay below 10,000 distinct strings in a LowCardinality(String) field.
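
The excerpt omits the queries, so as a hedged illustration, a type-profiling query of the kind the guide describes might look like this (the table name `NYPD_Complaint` is assumed for the sketch):

[,sql]
----
-- If the minimum and maximum both fit in 0..255, a UInt8 column is appropriate.
SELECT min(JURISDICTION_CODE), max(JURISDICTION_CODE)
FROM NYPD_Complaint;
----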

 The document continues to teach a few more very important techniques for analyzing and manipulating
-data, and then finishes up with some queries and advice on what to learn next.
+data and then finishes up with some queries and advice on what to learn next.
 
 https://web.archive.org/web/20230317111529/https://clickhouse.com/docs/en/getting-started/example-datasets/nypd_complaint_data[ClickHouse guide to analyzing NYPD complaint data^,target="_blank"]
 
 == Solution guides
 
-The documentation at Elastic was traditionally product based. This meant that the documentation was split up into these separate sets:
+The documentation at Elastic was traditionally product-based. This meant that the documentation was split up into these separate sets:
 
 * Search engine
 * Visualization tool
@@ -85,13 +85,13 @@ The documentation at Elastic was traditionally product based. This meant that th
 
 This separation of the documentation meant that the reader had to know which tools they needed, what terminology each of the tools used to describe the same idea, and which tool to pick if there were multiple options for a specific task. This issue hit me personally when I was trying to set up a new feature. I searched for the feature, and the search engine documentation came up first in the results, so I followed that guide. I had to use pages of JSON configuration to get the integration working. I was speaking with some of the other writers about how difficult this was to configure, and the writer for the visualization tool told me that there was a button to configure that. This conversation led to regular knowledge sharing among the writers and the course developers so that we could provide end-to-end scenario-based documentation that highlighted the best way to accomplish tasks. There are several solution guides, and I worked on these:
 
-https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-observability.html[Getting started with Observability^,target="_blank"]
+https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-observability.html[Getting Started with Observability^,target="_blank"]
 
 https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-kubernetes.html[Monitor your Kubernetes Infrastructure^,target="_blank"]
 
 https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-siem-security.html[Use Elastic Security for SIEM^,target="_blank"]
 
-== Use-case based How-To guides
+== Use-case-based How-To guides
 
 I am not a fan of using four lines of data to introduce the reader to a database product capable of
 ingesting and analyzing billions of rows of data or analyzing those billions of rows of data where they
@@ -103,10 +103,10 @@ that align with the needs of the community:
 * Analyzing data in Apache Iceberg
 * Analyzing data in Apache Hudi
 
-There is some complexity to configuring the integrations with cloud storage, Apache Iceberg, and Apache Hudi. To make this easier for the reader I wrote Docker compose files to deploy MinIO, Iceberg, and Hudi. I think that this is appropriate, as the reader that wants to use external storage with StarRocks is likely familiar with the external storage. In addition to the compose files I documented the settings necessary, and in the case of the Hudi integration I submitted a pull request to the Hudi maintainers to improve their compose-based tutorial.
+There is some complexity to configuring the integrations with cloud storage, Apache Iceberg, and Apache Hudi. To make this easier for the reader, I wrote Docker compose files to deploy MinIO, Iceberg, and Hudi. I think that this is appropriate, as the reader who wants to use external storage with StarRocks is likely familiar with the external storage. In addition to the compose files, I documented the settings necessary, and in the case of the Hudi integration I submitted a pull request to the Hudi maintainers to improve their compose-based tutorial.
 
 The "Basics" Quick Start is a step-by-step guide with no explanation until the end. There are some
-complicated manipulations of the data during loading. In the document I ask the reader to wait until they
+complicated manipulations of the data during loading. In the document, I ask the reader to wait until they
 have finished the entire process and promise to provide them with the details.
 
 > The curl commands look complex, but they are explained in detail at the end of the tutorial. For now, we recommend running the commands and running some SQL to analyze the data, and then reading about the data loading details at the end.
@@ -139,21 +139,21 @@ https://docs.starrocks.io/docs/quick_start/[StarRocks Quick Starts^,target="_bla
 
 I love to collaborate with other people. Learning from other people and sharing my knowledge with others is a central part of who I am.
 When Elastic was a start-up, we were "for developers and by developers". Even though I had a marketing title, the Elastic leadership was
-very clear: My job was to make sure that every word on the website was truthful. I loved that. I worked on the content on the website, but
+very clear: My job was to make sure that every word on the website was truthful. I loved that. I worked on the content of the website, but
 most of my time was spent writing blogs, presenting on webinars, and building demos. Some of the content I produced is described in this section.
 
 === Google Anthos
 
 I joined Elastic as the Product Marketing Manager (PMM) for ingest products and Kubernetes. When Google Anthos was being developed,
 Google did not have an on-premise logging solution and partnered with Elastic to provide one. I wrote the documentation for the
-integration. Google now has their own logging solution, so the documentation was pulled, here is a
+integration. Google now has a logging solution, so the documentation was pulled; here is a
 https://drive.google.com/file/d/1stnwF87lsOFE_95m-UKQDuZ4vkQosejp/view[PDF^,target="_blank"].
 
 === Kubernetes
 
 I was working the Elastic booth at KubeCon 2018, and almost everyone who came to visit the booth told me that they loved
-Elasticsearch. As the PMM for ingest products I was interested in what agents were popular with the community. All but a
-handful of the people I spoke with were using Fluentd or Fluent Bit to feed Logstash. In order to raise awareness of Elastic
+Elasticsearch. As the PMM for ingest products, I was interested in what agents were popular with the community. All but a
+handful of the people I spoke with were using Fluentd or Fluent Bit to feed Logstash. To raise awareness of Elastic
 agents similar in functionality to Fluentd and Fluent Bit, I joined the Kubernetes SIG-Docs and published this guide in the
 Kubernetes documentation.
 
@@ -173,5 +173,5 @@ https://www.elastic.co/customer-success/resources?tab=2[Elastic Support Engineer
 
 Some people prefer a short video when they want an introduction to a new technique. I recorded this to give people an overview of the https://www.youtube.com/watch?v=IO_uXPKQht0[Elastic Kubernetes operator^,target="_blank"].
 
-There are more blogs, videos, and webinars available in the
+There are more blogs, videos, and webinars available on the
 https://www.elastic.co/search/?q=roscigno&size=n_20_n[Elastic search page^,target="_blank"]