PanicOnWarnings option to detect SQL warnings and fail the copy process #1500

grodowski · 2025-02-28T17:01:04Z

Related issue: #1498

Description

This PR adds a CLI flag --panic-on-warnings that makes gh-ost fail after copying each batch when:

the expected, selected row count differs from actual copied row count (affectedRows)
SQL warnings are present, indicating why some rows were dropped

I also decided to always log all SQL warnings, which could be useful to find issues with data truncation or not null constraints even when the row count matches.

Opening as draft with a few remaining items:

review documentation
review test coverage
~~check explain analyze on the updated range select query~~

In case this PR introduced Go code changes:

contributed code is using same conventions as original code
script/cibuild returns with no formatting errors, build errors or unit test errors.

…ancy

- log insert warnings always - terminate if row count doesn't match and the PanicOnWarnings flag is set

…-type-change/extra_args Co-authored-by: Bastian Bartmann <[email protected]>

grodowski · 2025-03-11T14:54:19Z

Ran gh-ost locally on a 1.7GiB table with 110k rows with generated blob data to check if there's any noticable impact of the added count(*) for each batch.

Migrate cmd:

gh-ost --allow-on-master --allow-master-master --table="large_books" --alter="add index author_index(author(255))" --database="schema_migrations_test_app_development" --user="root" --password="" --initially-drop-old-table --initially-drop-ghost-table --default-retries=3 --verbose --debug --host=$MYSQL_HOST --port=$MYSQL_PORT --execute

Generic time benchmark on two binaries build on master and grodowski/raise_on_warnings (gh-ost-new)

gh-ost-master --allow-on-master --allow-master-master ... 0.99s user 1.45s system 7% cpu 30.708 total
gh-ost-master --allow-on-master --allow-master-master ... 0.98s user 1.43s system 7% cpu 31.321 total

gh-ost-new --allow-on-master --allow-master-master ... 1.00s user 1.46s system 7% cpu 32.743 total
gh-ost-new --allow-on-master --allow-master-master ... 0.98s user 1.41s system 7% cpu 31.117 total

Then, ran these explains on the copy query (BuildUniqueKeyRangeEndPreparedQueryViaTemptable):

-- before
explain analyze select /* gh-ost mydb.tbl test */ id
from (
    select
        id
    from
        large_books
    where id >= 50000 and id <= 52000
    order by
        id asc
    limit 1000) select_osc_chunk
order by
    id desc
limit 1

| -> Limit: 1 row(s)  (cost=1021..1021 rows=1) (actual time=0.943..0.943 rows=1 loops=1)
    -> Sort: select_osc_chunk.id DESC, limit input to 1 row(s) per chunk  (cost=1021..1021 rows=1) (actual time=0.942..0.942 rows=1 loops=1)
        -> Table scan on select_osc_chunk  (cost=906..921 rows=1000) (actual time=0.746..0.843 rows=1000 loops=1)
            -> Materialize  (cost=906..906 rows=1000) (actual time=0.744..0.744 rows=1000 loops=1)
                -> Limit: 1000 row(s)  (cost=806 rows=1000) (actual time=0.137..0.64 rows=1000 loops=1)
                    -> Filter: ((large_books.id >= 50000) and (large_books.id <= 52000))  (cost=806 rows=3992) (actual time=0.136..0.579 rows=1000 loops=1)
                        -> Covering index range scan on large_books using PRIMARY over (50000 <= id <= 52000)  (cost=806 rows=3992) (actual time=0.132..0.446 rows=1000 loops=1)

-- after
explain analyze select /* gh-ost mydb.tbl test */
    id,
    (select count(*) from (
        select
            id
        from
            large_books
        where id >= 50000 and id <= 52000
        order by
            id asc
        limit 1000
    ) select_osc_chunk)
from (
    select
        id
    from
        large_books
    where id >= 50000 and id <= 52000
    order by
        id asc
    limit 1000
) select_osc_chunk
order by
    id desc
limit 1;

| -> Limit: 1 row(s)  (cost=1021..1021 rows=1) (actual time=1.21..1.21 rows=1 loops=1)
    -> Sort: select_osc_chunk.id DESC, limit input to 1 row(s) per chunk  (cost=1021..1021 rows=1) (actual time=1.21..1.21 rows=1 loops=1)
        -> Table scan on select_osc_chunk  (cost=906..921 rows=1000) (actual time=0.955..1.08 rows=1000 loops=1)
            -> Materialize  (cost=906..906 rows=1000) (actual time=0.953..0.953 rows=1000 loops=1)
                -> Limit: 1000 row(s)  (cost=806 rows=1000) (actual time=0.161..0.813 rows=1000 loops=1)
                    -> Filter: ((large_books.id >= 50000) and (large_books.id <= 52000))  (cost=806 rows=3992) (actual time=0.16..0.732 rows=1000 loops=1)
                        -> Covering index range scan on large_books using PRIMARY over (50000 <= id <= 52000)  (cost=806 rows=3992) (actual time=0.157..0.574 rows=1000 loops=1)
-> Select #2 (subquery in projection; run only once)
    -> Aggregate: count(0)  (cost=1021..1021 rows=1) (actual time=1.04..1.04 rows=1 loops=1)
        -> Table scan on select_osc_chunk  (cost=906..921 rows=1000) (actual time=0.894..0.992 rows=1000 loops=1)
            -> Materialize  (cost=906..906 rows=1000) (actual time=0.894..0.894 rows=1000 loops=1)
                -> Limit: 1000 row(s)  (cost=806 rows=1000) (actual time=0.14..0.774 rows=1000 loops=1)
                    -> Filter: ((large_books.id >= 50000) and (large_books.id <= 52000))  (cost=806 rows=3992) (actual time=0.14..0.694 rows=1000 loops=1)
                        -> Covering index range scan on large_books using PRIMARY over (50000 <= id <= 52000)  (cost=806 rows=3992) (actual time=0.139..0.535 rows=1000 loops=1)

Looks like there will be no significant change in perf?

Copilot

Pull Request Overview

This PR introduces a new CLI flag (panic-on-warnings) that causes gh-ost to fail when SQL warnings are detected during row copying and also logs all SQL warnings.

A new flag "--panic-on-warnings" is added in the main CLI and corresponding context fields.
The iteration range calculation now also returns an expected row count with adjusted logic to process SQL warnings.
Various SQL builder functions and tests have been updated to support the enhanced query structure and error handling.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
dev.yml	Adds configuration for testing environments and dependency management.
go/logic/applier.go	Updates range end calculation and SQL warning processing logic.
go/sql/builder_test.go	Adjusts expected queries and argument lists in tests to match new behavior.
go/logic/migrator.go	Incorporates expected range size into chunk iteration and warning checks.
go/sql/builder.go	Modifies SQL builder queries with minor refactorings and marks an unused variable.
go/cmd/gh-ost/main.go	Introduces the new CLI flag for panic-on-warnings.
go/base/context.go	Adds new context fields for managing SQL warnings and panic configuration.

Comments suppressed due to low confidence (1)

go/sql/builder.go:278

[nitpick] The variable 'uniqueKeyColumnDescending' is declared but not used anywhere in the function. Removing it could improve code clarity.

uniqueKeyColumnDescending := make([]string, len(uniqueKeyColumnNames)) // TODO unused variable

Copilot · 2025-03-11T14:58:38Z

go/logic/migrator.go

+						for _, warning := range this.migrationContext.MigrationLastInsertSQLWarnings {
+							this.migrationContext.Log.Infof("ApplyIterationInsertQuery has SQL warnings! %s", warning)
+						}
+						if expectedRangeSize != rowsAffected {


The error generated by terminateRowIteration is not being returned, which may inadvertently allow execution to continue even after detecting SQL warnings. Consider returning the error from terminateRowIteration to abort the operation as intended.

meiji163 · 2025-03-12T23:03:42Z

Looks good! I will test the performance of this internally (hopefully this week)

grodowski and others added 9 commits February 13, 2025 11:48

First pass at raise_on_warnings

e78c3a7

Update dev.yml

76a29d7

add CLI option, ignore expected warnings, check for row count discrep…

68d66eb

…ancy

Count rows in each insert range prepared query

e94a555

- log insert warnings always - terminate if row count doesn't match and the PanicOnWarnings flag is set

Lint comments - ensure errors are handled in show warnings

39300cf

update TestBuildUniqueKeyRangeEndPreparedQuery

e018fbc

Localtests for PanicOnWarnings with data loss

69d8158

Unwrap CTE (mysql 5)

7681d1c

Update localtests/panic-on-warnings-duplicate-unique-values-on-column…

b2aeab3

…-type-change/extra_args Co-authored-by: Bastian Bartmann <[email protected]>

Merge branch 'master' into grodowski/raise_on_warnings

3033eb4

grodowski marked this pull request as ready for review March 11, 2025 14:57

Copilot bot review requested due to automatic review settings March 11, 2025 14:57

grodowski requested review from rashiq, meiji163 and timvaillancourt as code owners March 11, 2025 14:57

Copilot AI reviewed Mar 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PanicOnWarnings option to detect SQL warnings and fail the copy process #1500

PanicOnWarnings option to detect SQL warnings and fail the copy process #1500

grodowski commented Feb 28, 2025 •

edited

Loading

grodowski commented Mar 11, 2025 •

edited

Loading

Copilot AI left a comment

Copilot AI Mar 11, 2025

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

meiji163 commented Mar 12, 2025 •

edited

Loading

PanicOnWarnings option to detect SQL warnings and fail the copy process #1500

Are you sure you want to change the base?

PanicOnWarnings option to detect SQL warnings and fail the copy process #1500

Conversation

grodowski commented Feb 28, 2025 • edited Loading

Description

grodowski commented Mar 11, 2025 • edited Loading

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Copilot AI Mar 11, 2025

Choose a reason for hiding this comment

meiji163 commented Mar 12, 2025 • edited Loading

grodowski commented Feb 28, 2025 •

edited

Loading

grodowski commented Mar 11, 2025 •

edited

Loading

meiji163 commented Mar 12, 2025 •

edited

Loading