Commit 680c821

dblock authored
Add CI with link checker. (opensearch-project#3584)
* Add CI with link checker.
* Capture URI::InvalidURIError.
* Use HEAD and catch URI errors.
* Retry on a 405 with a GET.
* Replaced external link checker with ruby-link-checker.
* Don't exit with an exception.
* Run internal link checker on build/ci.
* Added broken links issue template.
* Added host exclusions that 404 or fail on bots.
* Raise anyway because Jekyll does it for us.
* Fix broken links.
* Only run link checker on main.
* Re-add check-links.sh.
* Run once a day on cron.

---------

Signed-off-by: dblock <[email protected]>
1 parent 04f12af commit 680c821
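The bullets above describe the checking strategy: parse each URL, issue a cheap HEAD request first, and fall back to a GET when a server answers 405 Method Not Allowed, while trapping URI::InvalidURIError for malformed links. A minimal sketch of that logic in Ruby, using the Typhoeus client this commit adds to the Gemfile (illustrative only; the production logic lives in the ruby-link-checker gem):

require 'typhoeus'
require 'uri'

def link_ok?(url)
  URI.parse(url)                                       # raises URI::InvalidURIError on malformed URLs
  response = Typhoeus.head(url, followlocation: true)  # cheap first pass
  # Some hosts reject HEAD outright; retry those with a GET.
  response = Typhoeus.get(url, followlocation: true) if response.code == 405
  (200..299).cover?(response.code)
rescue URI::InvalidURIError
  false
end

puts link_ok?('https://opensearch.org') # => true, network permitting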

File tree

24 files changed: +279 −163 lines
.github/ISSUE_TEMPLATE/broken_links.md

+7
@@ -0,0 +1,7 @@
+---
+title: '[AUTOCUT] Broken links'
+labels: 'bug'
+---
+
+Links checker has failed on push of your commit.
+Please examine the workflow log {{ env.WORKFLOW_URL }}.

.github/workflows/jekyll-build.yml

+16
@@ -0,0 +1,16 @@
+name: Jekyll Build Verification
+
+on: [pull_request]
+
+jobs:
+  check:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+      - uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: '3.0'
+          bundler-cache: true
+      - run: |
+          JEKYLL_LINK_CHECKER=internal bundle exec jekyll build --future
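The build step toggles link checking through an environment variable rather than a CLI flag: JEKYLL_LINK_CHECKER=internal restricts pull-request builds to same-site links, leaving external checks to the scheduled workflow below. The plugin that reads the variable is not among the hunks shown here, so this is only a sketch of how such a gate might look, assuming a Jekyll :post_render site hook (the _plugins/link-checker.rb path is hypothetical):

# _plugins/link-checker.rb -- hypothetical shape of the env-var gate.
Jekyll::Hooks.register :site, :post_render do |site|
  mode = ENV['JEKYLL_LINK_CHECKER'] || ENV['JEKYLL_FATAL_LINK_CHECKER']
  next if mode.nil? || mode.empty?

  site.pages.each do |page|
    # ... scan page.output for <a href> targets; check internal links always,
    # external ones only when the mode requests them; raise when the fatal
    # variable is set so the build fails and the issue workflow fires ...
  end
end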

.github/workflows/link-checker.yml

+25
@@ -0,0 +1,25 @@
+name: Check Links
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "30 11 * * *"
+jobs:
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: '3.0'
+          bundler-cache: true
+      - run: |
+          JEKYLL_FATAL_LINK_CHECKER=all bundle exec jekyll build --future
+      - name: Create Issue On Build Failure
+        if: ${{ failure() }}
+        uses: dblock/create-a-github-issue@v3
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          WORKFLOW_URL: "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+        with:
+          update_existing: true
+          filename: .github/ISSUE_TEMPLATE/broken_links.md
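On failure, the dblock/create-a-github-issue action renders the template from the top of this diff, substituting the step's env into {{ env.WORKFLOW_URL }}; with update_existing: true it refreshes the open "[AUTOCUT] Broken links" issue instead of filing duplicates. A rendered issue body would read roughly as follows (repository path and run ID illustrative):

Links checker has failed on push of your commit.
Please examine the workflow log https://github.com/opensearch-project/documentation-website/actions/runs/4424242.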

Gemfile

+5 −1
@@ -32,4 +32,8 @@ gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]
 gem "wdm", "~> 0.1.0" if Gem.win_platform?

 # Installs webrick dependency for building locally
-gem "webrick", "~> 1.7"
+gem "webrick", "~> 1.7"
+
+# Link checker
+gem "typhoeus"
+gem "ruby-link-checker"

_api-reference/explain.md

+1 −1
@@ -10,7 +10,7 @@ Introduced 1.0

 Wondering why a specific document ranks higher (or lower) for a query? You can use the explain API for an explanation of how the relevance score (`_score`) is calculated for every result.

-OpenSearch uses a probabilistic ranking framework called [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) to calculate relevance scores. Okapi BM25 is based on the original [TF/IDF](http://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/package-summary.html#scoring) framework used by Apache Lucene.
+OpenSearch uses a probabilistic ranking framework called [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) to calculate relevance scores. Okapi BM25 is based on the original [TF/IDF](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/package-summary.html#scoring) framework used by Apache Lucene.

 The explain API is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly for the purpose of troubleshooting.
 {: .warning }

_clients/OSC-dot-net.md

+1 −1
@@ -15,7 +15,7 @@ This getting started guide illustrates how to connect to OpenSearch, index docum

 ## Installing OpenSearch.Client

-To install OpenSearch.Client, download the [OpenSearch.Client NuGet package](https://www.nuget.org/packages/OpenSearch.Client) and add it to your project in an IDE of your choice. In Microsoft Visual Studio, follow the steps below:
+To install OpenSearch.Client, download the [OpenSearch.Client NuGet package](https://www.nuget.org/packages/OpenSearch.Client/) and add it to your project in an IDE of your choice. In Microsoft Visual Studio, follow the steps below:
 - In the **Solution Explorer** panel, right-click on your solution or project and select **Manage NuGet Packages for Solution**.
 - Search for the OpenSearch.Client NuGet package, and select **Install**.

_config.yml

+4 −4
@@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
 url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
 permalink: /:path/

-opensearch_version: 2.6.0
-opensearch_dashboards_version: 2.6.0
-opensearch_major_minor_version: 2.6
-lucene_version: 9_5_0
+opensearch_version: '2.6.0'
+opensearch_dashboards_version: '2.6.0'
+opensearch_major_minor_version: '2.6'
+lucene_version: '9_5_0'

 # Build settings
 markdown: kramdown
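Quoting these values is more than style: YAML resolves bare scalars to native types, and under Ruby's Psych (YAML 1.1 number rules) underscores are digit separators, so an unquoted 9_5_0 loads as the integer 950 and 2.6 as a float. That would corrupt Liquid interpolations such as the {{site.lucene_version}} URL fixed in _api-reference/explain.md above. A quick check in Ruby:

require 'yaml'

YAML.safe_load("lucene_version: 9_5_0")   # => {"lucene_version"=>950}
YAML.safe_load("lucene_version: '9_5_0'") # => {"lucene_version"=>"9_5_0"}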

_dashboards/reporting.md

+1 −1
@@ -54,7 +54,7 @@ This problem can occur for two reasons:

 - You don't have the correct version of `headless-chrome` to match the operating system on which OpenSearch Dashboards is running. Download the [correct version](https://github.com/opensearch-project/reporting/releases/tag/chromium-1.12.0.0).

-- You're missing additional dependencies. Install the required dependencies for your operating system from the [additional libraries](https://github.com/opensearch-project/dashboards-reports/blob/main/dashboards-reports/rendering-engine/headless-chrome/README.md#additional-libaries) section.
+- You're missing additional dependencies. Install the required dependencies for your operating system from the [additional libraries](https://github.com/opensearch-project/dashboards-reports/blob/1.x/dashboards-reports/rendering-engine/headless-chrome/README.md#additional-libaries) section.

 ### Characters not loading in reports

_data-prepper/common-use-cases/trace-analytics.md

+1 −1
@@ -39,7 +39,7 @@ The [OpenTelemetry source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/c
 There are three processors for the trace analytics feature:

 * *otel_trace_raw* - The *otel_trace_raw* processor receives a collection of [span](https://github.com/opensearch-project/data-prepper/blob/fa65e9efb3f8d6a404a1ab1875f21ce85e5c5a6d/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/trace/Span.java) records from [*otel-trace-source*]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/otel-trace/), and performs stateful processing, extraction, and completion of trace-group-related fields.
-* *otel_trace_group* - The *otel_trace_group* processor fills in the missing trace-group-related fields in the collection of [span](https://github.com/opensearch-project/data-prepper/blob/fa65e9efb3f8d6a404a1ab1875f21ce85e5c5a6d/data-prepper-api/src/main/java/com/amazon/dataprepper/model/trace/Span.java) records by looking up the OpenSearch backend.
+* *otel_trace_group* - The *otel_trace_group* processor fills in the missing trace-group-related fields in the collection of [span](https://github.com/opensearch-project/data-prepper/blob/298e7931aa3b26130048ac3bde260e066857df54/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/trace/Span.java) records by looking up the OpenSearch backend.
 * *service_map_stateful* – The *service_map_stateful* processor performs the required preprocessing for trace data and builds metadata to display the `service-map` dashboards.
_data-prepper/managing-data-prepper/configuring-log4j.md

+1 −1
@@ -11,7 +11,7 @@ You can configure logging using Log4j in Data Prepper.

 ## Logging

-Data Prepper uses [SLF4J](http://www.slf4j.org/) with a [Log4j 2 binding](http://logging.apache.org/log4j/2.x/log4j-slf4j-impl/).
+Data Prepper uses [SLF4J](https://www.slf4j.org/) with a [Log4j 2 binding](https://logging.apache.org/log4j/2.x/log4j-slf4j-impl.html).

 For Data Prepper versions 2.0 and later, the Log4j 2 configuration file can be found and edited in `config/log4j2.properties` in the application's home directory. The default properties for Log4j 2 can be found in `log4j2-rolling.properties` in the *shared-config* directory.
_data-prepper/managing-data-prepper/monitoring.md

+2 −2
@@ -11,11 +11,11 @@ You can monitor Data Prepper with metrics using [Micrometer](https://micrometer.

 ## JVM and system metrics

-JVM and system metrics are runtime metrics that are used to monitor Data Prepper instances. They include metrics for classloaders, memory, garbage collection, threads, and others. For more information, see [JVM and system metrics](https://micrometer.io/docs/ref/jvm).
+JVM and system metrics are runtime metrics that are used to monitor Data Prepper instances. They include metrics for classloaders, memory, garbage collection, threads, and others. For more information, see [JVM and system metrics](https://micrometer.io/?/docs/ref/jvm).

 ### Naming

-JVM and system metrics follow predefined names in [Micrometer](https://micrometer.io/docs/concepts#_naming_meters). For example, the Micrometer metrics name for memory usage is `jvm.memory.used`. Micrometer changes the name to match the metrics system. Following the same example, `jvm.memory.used` is reported to Prometheus as `jvm_memory_used`, and is reported to Amazon CloudWatch as `jvm.memory.used.value`.
+JVM and system metrics follow predefined names in [Micrometer](https://micrometer.io/?/docs/concepts#_naming_meters). For example, the Micrometer metrics name for memory usage is `jvm.memory.used`. Micrometer changes the name to match the metrics system. Following the same example, `jvm.memory.used` is reported to Prometheus as `jvm_memory_used`, and is reported to Amazon CloudWatch as `jvm.memory.used.value`.

 ### Serving
_data-prepper/pipelines/configuration/sources/http-source.md

+1 −1
@@ -19,7 +19,7 @@ request_timeout | No | Integer | The request timeout, in milliseconds. Default v
 thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default value is `200`.
 max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`.
 max_pending_requests | No | Integer | The maximum allowed number of tasks in the `ScheduledThreadPool` work queue. Default value is `1024`.
-authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/1.2.0/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
 ssl | No | Boolean | Enables TLS/SSL. Default value is false.
 ssl_certificate_file | Conditionally | String | SSL certificate chain file path or Amazon Simple Storage Service (Amazon S3) path. Amazon S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to true and `use_acm_certificate_for_ssl` is set to false.
 ssl_key_file | Conditionally | String | SSL key file path or Amazon S3 path. Amazon S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to true and `use_acm_certificate_for_ssl` is set to false.

_data-prepper/pipelines/configuration/sources/otel-metrics-source.md

+1 −1
@@ -25,7 +25,7 @@ sslKeyFile | Conditionally | String | File-system path or Amazon S3 path to the
 useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using a certificate and private key from AWS Certificate Manager (ACM). Default value is `false`.
 acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
 awsRegion | Conditionally | String | Represents the AWS Region used by ACM or Amazon S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` is the Amazon S3 path.
-authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/1.2.0/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).

 <!--- ## Configuration
_data-prepper/pipelines/configuration/sources/otel-trace.md

+1 −1
@@ -31,7 +31,7 @@ sslKeyFile | Conditionally | String | File system path or Amazon S3 path to the
 useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using a certificate and private key from AWS Certificate Manager (ACM). Default value is `false`.
 acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
 awsRegion | Conditionally | String | Represents the AWS region used by ACM or Amazon S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are Amazon S3 paths.
-authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This parameter uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This parameter uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/1.2.0/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).

 <!--- ## Configuration

_ml-commons-plugin/algorithms.md

+1 −1
@@ -59,7 +59,7 @@ The training process supports multi-threads, but the number of threads should be

 ## Linear regression

-Linear regression maps the linear relationship between inputs and outputs. In ML Commons, the linear regression algorithm is adopted from the public machine learning library [Tribuo](https://tribuo.org/), which offers multidimensional linear regression models. The model supports the linear optimizer in training, including popular approaches like Linear Decay, SQRT_DECAY, [ADA](http://chrome-extension//gphandlahdpffmccakmbngmbjnjiiahp/https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), [ADAM](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/Adam.html), and [RMS_DROP](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/RMSProp.html).
+Linear regression maps the linear relationship between inputs and outputs. In ML Commons, the linear regression algorithm is adopted from the public machine learning library [Tribuo](https://tribuo.org/), which offers multidimensional linear regression models. The model supports the linear optimizer in training, including popular approaches like Linear Decay, SQRT_DECAY, [ADA](https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), [ADAM](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/Adam.html), and [RMS_DROP](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/RMSProp.html).

 ### Parameters
_observing-your-data/ad/index.md

+1 −1
@@ -14,7 +14,7 @@ An anomaly in OpenSearch is any unusual behavior change in your time-series data

 It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.

-Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://api.semanticscholar.org/CorpusID:927435).
+Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).

 You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.