DataShard and SchemeShard: handle borrowed parts in data erasure #15451

lex007in · 2025-03-07T09:47:18Z

Changelog entry

...

Changelog category

Not for changelog (changelog entry is not required)

Description for reviewers

Return an error when borrowed parts are present in DataShard. SchemeShard will retry these failed DataCleanup attempts.
In case of split/merge, SchemeShard will wait for the old tablet to be deleted.
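
The behavior described above can be illustrated with a minimal, self-contained sketch. It is a toy model rather than the code from this PR: `TShardModel`, `ECleanupStatus`, and `CleanupWithRetries` are hypothetical names, and the retry callback merely simulates borrowed parts being returned between attempts.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical stand-ins; none of these names come from the YDB code base.
enum class ECleanupStatus { Ok, BorrowedParts };

struct TShardModel {
    std::set<std::string> BorrowedParts; // parts still shared after a split/merge
    uint64_t CleanedGeneration = 0;

    // Refuse to report success while any borrowed part is still present.
    ECleanupStatus ForceDataCleanup(uint64_t generation) {
        assert(generation > 0);
        if (!BorrowedParts.empty()) {
            return ECleanupStatus::BorrowedParts;
        }
        CleanedGeneration = generation;
        return ECleanupStatus::Ok;
    }
};

// Coordinator side: repeat the request until the shard reports success or the
// attempts run out. onRetry models whatever happens between two attempts.
template <class TOnRetry>
int CleanupWithRetries(TShardModel& shard, uint64_t generation, int maxAttempts, TOnRetry onRetry) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (shard.ForceDataCleanup(generation) == ECleanupStatus::Ok) {
            return attempt;
        }
        onRetry(shard);
    }
    return -1; // gave up
}

// Two borrowed parts, one of which is returned between consecutive attempts.
int ExampleBorrowedPartsRetry() {
    TShardModel shard;
    shard.BorrowedParts = {"part-1", "part-2"};
    return CleanupWithRetries(shard, 1, 5, [](TShardModel& s) {
        if (!s.BorrowedParts.empty()) {
            s.BorrowedParts.erase(s.BorrowedParts.begin());
        }
    });
}
```

In this model the first two attempts fail and the third succeeds once both parts are gone.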

github-actions · 2025-03-07T09:48:27Z

🟢 2025-03-26 00:46:56 UTC The validation of the Pull Request description is successful.

github-actions · 2025-03-07T09:50:35Z

⚪ 2025-03-07 09:50:34 UTC Pre-commit check linux-x86_64-release-asan for a71b5d4 has started.
⚪ 2025-03-07 09:50:49 UTC Artifacts will be uploaded here
⚪ 2025-03-07 09:53:33 UTC ya make is running...
🟡 2025-03-07 11:28:50 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
12133	11881	0	184	32	36

⚪ 2025-03-07 11:30:07 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-07 11:49:00 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
317 (only retried tests)	261	0	18	7	31

⚪ 2025-03-07 11:49:13 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-03-07 12:01:19 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
92 (only retried tests)	60	0	0	4	28

🟢 2025-03-07 12:01:29 UTC Build successful.
🟡 2025-03-07 12:01:59 UTC ydbd size 3.7 GiB changed* by +322.7 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash	main: `a790a6d`	merge: `a71b5d4`	diff	diff %
ydbd size	3 994 271 672 Bytes	3 994 602 080 Bytes	+322.7 KiB	+0.008%
ydbd stripped size	1 388 750 024 Bytes	1 388 830 888 Bytes	+79.0 KiB	+0.006%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-07T09:51:09Z

⚪ 2025-03-07 09:51:08 UTC Pre-commit check linux-x86_64-relwithdebinfo for a71b5d4 has started.
⚪ 2025-03-07 09:51:12 UTC Artifacts will be uploaded here
⚪ 2025-03-07 09:54:03 UTC ya make is running...
🟡 2025-03-07 11:17:15 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
26498	23916	0	2	2466	114

⚪ 2025-03-07 11:19:33 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-03-07 11:36:08 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
161 (only retried tests)	57	0	0	0	104

🟢 2025-03-07 11:36:18 UTC Build successful.
🟢 2025-03-07 11:36:37 UTC ydbd size 2.1 GiB changed* by +2.2 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `1237379`	merge: `a71b5d4`	diff	diff %
ydbd size	2 293 645 624 Bytes	2 293 647 920 Bytes	+2.2 KiB	+0.000%
ydbd stripped size	480 484 512 Bytes	480 484 960 Bytes	+448 Bytes	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-21T10:33:45Z

⚪ 2025-03-21 10:33:45 UTC Pre-commit check linux-x86_64-relwithdebinfo for 4ea03fe has started.
⚪ 2025-03-21 10:34:00 UTC Artifacts will be uploaded here
⚪ 2025-03-21 10:36:56 UTC ya make is running...
🔴 2025-03-21 11:04:20 UTC Build failed, see the logs. Also see fail summary

github-actions · 2025-03-21T10:35:10Z

⚪ 2025-03-21 10:35:10 UTC Pre-commit check linux-x86_64-release-asan for 4ea03fe has started.
⚪ 2025-03-21 10:35:24 UTC Artifacts will be uploaded here
⚪ 2025-03-21 10:38:07 UTC ya make is running...
🔴 2025-03-21 11:04:28 UTC Build failed, see the logs. Also see fail summary

github-actions · 2025-03-21T11:23:33Z

⚪ 2025-03-21 11:23:32 UTC Pre-commit check linux-x86_64-release-asan for a21d5c0 has started.
⚪ 2025-03-21 11:23:37 UTC Artifacts will be uploaded here
⚪ 2025-03-21 11:26:29 UTC ya make is running...
🟡 2025-03-21 12:40:41 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
12162	12076	0	29	22	35

⚪ 2025-03-21 12:41:49 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-21 12:55:39 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
124 (only retried tests)	79	0	3	8	34

⚪ 2025-03-21 12:55:50 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-03-21 13:08:31 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
73 (only retried tests)	30	0	3	6	34

🟢 2025-03-21 13:08:38 UTC Build successful.
🟢 2025-03-21 13:09:07 UTC ydbd size 3.8 GiB changed* by +9.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `6dc2e21`	merge: `a21d5c0`	diff	diff %
ydbd size	4 073 892 992 Bytes	4 073 902 624 Bytes	+9.4 KiB	+0.000%
ydbd stripped size	1 409 061 352 Bytes	1 409 064 040 Bytes	+2.6 KiB	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-21T11:30:41Z

⚪ 2025-03-21 11:30:41 UTC Pre-commit check linux-x86_64-relwithdebinfo for a21d5c0 has started.
⚪ 2025-03-21 11:31:08 UTC Artifacts will be uploaded here
⚪ 2025-03-21 11:34:26 UTC ya make is running...
🟡 2025-03-21 12:39:46 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
19670	18329	0	5	1229	107

⚪ 2025-03-21 12:41:39 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-21 12:53:09 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
154 (only retried tests)	48	0	1	0	105

⚪ 2025-03-21 12:53:21 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-03-21 13:03:41 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
147 (only retried tests)	42	0	1	0	104

🟢 2025-03-21 13:03:50 UTC Build successful.
🟢 2025-03-21 13:04:17 UTC ydbd size 2.2 GiB changed* by +4.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `6dc2e21`	merge: `a21d5c0`	diff	diff %
ydbd size	2 314 348 712 Bytes	2 314 353 208 Bytes	+4.4 KiB	+0.000%
ydbd stripped size	484 361 728 Bytes	484 362 624 Bytes	+896 Bytes	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-21T14:39:32Z

⚪ 2025-03-21 14:39:32 UTC Pre-commit check linux-x86_64-relwithdebinfo for 0e825db has started.
⚪ 2025-03-21 14:40:11 UTC Artifacts will be uploaded here
⚪ 2025-03-21 14:43:31 UTC ya make is running...
🟢 2025-03-21 15:49:46 UTC ydbd size 2.2 GiB changed* by +7.7 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `22e1472`	merge: `0e825db`	diff	diff %
ydbd size	2 315 427 832 Bytes	2 315 435 744 Bytes	+7.7 KiB	+0.000%
ydbd stripped size	484 584 608 Bytes	484 585 888 Bytes	+1.2 KiB	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-21T14:47:17Z

⚪ 2025-03-21 14:47:17 UTC Pre-commit check linux-x86_64-release-asan for 0e825db has started.
⚪ 2025-03-21 14:47:32 UTC Artifacts will be uploaded here
⚪ 2025-03-21 14:50:27 UTC ya make is running...
🟢 2025-03-21 16:14:39 UTC ydbd size 3.8 GiB changed* by +64.8 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `331ebd5`	merge: `0e825db`	diff	diff %
ydbd size	4 075 664 496 Bytes	4 075 730 880 Bytes	+64.8 KiB	+0.002%
ydbd stripped size	1 409 660 072 Bytes	1 409 671 144 Bytes	+10.8 KiB	+0.001%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

snaury · 2025-03-21T14:54:52Z

ydb/core/tx/datashard/datashard__data_cleanup.cpp

+            Response = std::make_unique<TEvDataShard::TEvForceDataCleanupResult>(
+                record.GetDataCleanupGeneration(),
+                Self->TabletID(),
+                NKikimrTxDataShard::TEvForceDataCleanupResult::FAILED);


Minor: in addition to a bare status with no details, it would be good to also send some ErrorReason that could be logged on the client side (for example, by SchemeShard). It also bothers me somewhat that this error will keep repeating until something changes on the shard, yet there is no way to learn that something has changed. Will SchemeShard repeat the request over and over?

Adding an ErrorReason string would increase the message size for no good reason, although maybe that is not a problem.
Yes, SchemeShard will keep retrying, and every error that DataCleanup can return can and should be retried. We could rename FAILED to something explicit, such as RETRYABLE_ERROR.

Added more distinct enum values and logged them.
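
As a rough illustration of the outcome of this thread (more specific status values instead of a free-form error string, with the status logged on the receiving side), here is a hedged sketch. The enum values, `TCleanupResult`, and the log wording are assumptions made for the example and are not the actual `TEvForceDataCleanupResult` definition.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical status set; the real enum values in the PR may differ.
enum class ECleanupResult { Ok, WrongShardState, BorrowedParts };

const char* ToString(ECleanupResult status) {
    switch (status) {
        case ECleanupResult::Ok: return "OK";
        case ECleanupResult::WrongShardState: return "WRONG_SHARD_STATE";
        case ECleanupResult::BorrowedParts: return "BORROWED_PARTS";
    }
    return "UNKNOWN";
}

struct TCleanupResult {
    uint64_t Generation;
    uint64_t TabletId;
    ECleanupResult Status;
};

// Per the discussion, every non-OK result is treated as retryable.
bool ShouldRetry(const TCleanupResult& result) {
    return result.Status != ECleanupResult::Ok;
}

// A small enum keeps the message compact while still giving the receiver
// something specific to log.
std::string DescribeForLog(const TCleanupResult& result) {
    std::ostringstream out;
    out << "data cleanup at tablet " << result.TabletId
        << ", generation " << result.Generation
        << ": " << ToString(result.Status);
    return out.str();
}

std::string ExampleLogLine() {
    TCleanupResult result{3, 72075186224037888ull, ECleanupResult::BorrowedParts};
    assert(ShouldRetry(result));
    return DescribeForLog(result);
}
```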

snaury · 2025-03-21T14:57:07Z

ydb/core/tx/schemeshard/schemeshard__tenant_data_erasure_manager.cpp

+            LOG_DEBUG_S(ctx, NKikimrServices::FLAT_TX_SCHEMESHARD,
+                "TTxCompleteDataErasureShard: data erasure failed at DataShard #" << record.GetTabletId()
+                    << ", schemestard: " << Self->TabletID());
+            return; // will be retried after timout in the queue


The comment does not make clear what this timeout is or when the request will be repeated. Are we sure nothing needs to be done with it in the queue?

This queue is designed so that OnDone() must be called explicitly for tasks whose processing has finished (inside OnDone the task is removed from the queue at that moment). If OnDone is not called, the timeout handler is invoked later; for cleanup it is here: https://github.com/ydb-platform/ydb/blob/main/ydb/core/tx/schemeshard/schemeshard__tenant_data_erasure_manager.cpp#L182, and it is the one that retries at the end. The timeout itself is set in the cleanup config.
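
The queue contract described in this reply (explicit `OnDone()` removes a task, otherwise a timeout handler retries it) can be modeled with a short sketch. `TRetryQueue` and its methods are hypothetical and much simpler than the real SchemeShard queue; time is passed in as a plain integer for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// Hypothetical model of a task queue with explicit completion.
class TRetryQueue {
public:
    explicit TRetryQueue(uint64_t timeout) : Timeout(timeout) {}

    void Enqueue(uint64_t shardId) { Waiting.push_back(shardId); }

    // Start every waiting task at time `now` and return the shards to contact.
    std::vector<uint64_t> StartWaiting(uint64_t now) {
        std::vector<uint64_t> started(Waiting.begin(), Waiting.end());
        for (uint64_t shardId : started) {
            Running[shardId] = now + Timeout;
        }
        Waiting.clear();
        return started;
    }

    // Must be called explicitly for a finished task; removes it from the queue.
    void OnDone(uint64_t shardId) { Running.erase(shardId); }

    // Timeout handler: tasks never marked done go back to the waiting list.
    std::size_t OnTimeout(uint64_t now) {
        std::size_t requeued = 0;
        for (auto it = Running.begin(); it != Running.end();) {
            if (it->second <= now) {
                Waiting.push_back(it->first);
                it = Running.erase(it);
                ++requeued;
            } else {
                ++it;
            }
        }
        return requeued;
    }

    bool Empty() const { return Waiting.empty() && Running.empty(); }

private:
    uint64_t Timeout;
    std::deque<uint64_t> Waiting;
    std::map<uint64_t, uint64_t> Running; // shard id -> deadline
};

// Shard 2 fails its first attempt, so its handler returns without OnDone and
// the timeout brings the task back for a second attempt.
int ExampleAttemptsForFailingShard() {
    TRetryQueue queue(/*timeout=*/10);
    queue.Enqueue(1);
    queue.Enqueue(2);
    int attemptsForShard2 = 0;

    for (uint64_t shardId : queue.StartWaiting(/*now=*/0)) {
        if (shardId == 2) {
            ++attemptsForShard2; // failure reply: return without OnDone
        } else {
            queue.OnDone(shardId);
        }
    }
    queue.OnTimeout(/*now=*/10);
    for (uint64_t shardId : queue.StartWaiting(/*now=*/10)) {
        if (shardId == 2) {
            ++attemptsForShard2;
        }
        queue.OnDone(shardId); // the second attempt succeeds
    }
    assert(queue.Empty());
    return attemptsForShard2;
}
```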

ydb/core/tx/schemeshard/ut_data_erasure/ut_data_erasure.cpp

github-actions · 2025-03-21T17:05:32Z

⚪ 2025-03-21 17:05:32 UTC Pre-commit check linux-x86_64-release-asan for 0dfc973 has started.
⚪ 2025-03-21 17:05:47 UTC Artifacts will be uploaded here
⚪ 2025-03-21 17:08:58 UTC ya make is running...
🟡 2025-03-21 18:55:59 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
14132	14029	0	58	9	36

⚪ 2025-03-21 18:57:15 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-21 19:09:33 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
153 (only retried tests)	115	0	3	2	33

⚪ 2025-03-21 19:09:45 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-03-21 19:21:01 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
64 (only retried tests)	29	0	3	1	31

🟢 2025-03-21 19:21:08 UTC Build successful.
🟢 2025-03-21 19:21:38 UTC ydbd size 3.8 GiB changed* by +10.8 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `a95ce0d`	merge: `0dfc973`	diff	diff %
ydbd size	4 075 721 248 Bytes	4 075 732 312 Bytes	+10.8 KiB	+0.000%
ydbd stripped size	1 409 668 456 Bytes	1 409 671 496 Bytes	+3.0 KiB	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-21T17:07:12Z

⚪ 2025-03-21 17:07:12 UTC Pre-commit check linux-x86_64-relwithdebinfo for 0dfc973 has started.
⚪ 2025-03-21 17:07:27 UTC Artifacts will be uploaded here
⚪ 2025-03-21 17:10:27 UTC ya make is running...
🟡 2025-03-21 18:44:00 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
28614	26004	0	3	2492	115

⚪ 2025-03-21 18:47:52 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-03-21 18:58:36 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
166 (only retried tests)	63	0	0	0	103

🟢 2025-03-21 18:58:46 UTC Build successful.
🟢 2025-03-21 18:59:05 UTC ydbd size 2.2 GiB changed* by +5.6 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `a95ce0d`	merge: `0dfc973`	diff	diff %
ydbd size	2 315 431 248 Bytes	2 315 436 944 Bytes	+5.6 KiB	+0.000%
ydbd stripped size	484 584 992 Bytes	484 586 048 Bytes	+1.0 KiB	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-26T00:46:59Z

⚪ 2025-03-26 00:46:58 UTC Pre-commit check linux-x86_64-release-asan for f688c04 has started.
⚪ 2025-03-26 00:47:13 UTC Artifacts will be uploaded here
⚪ 2025-03-26 00:50:15 UTC ya make is running...
🟡 2025-03-26 02:33:13 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
14150	14066	0	39	11	34

⚪ 2025-03-26 02:34:22 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-26 02:46:42 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
122 (only retried tests)	83	0	5	3	31

⚪ 2025-03-26 02:46:52 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-03-26 02:58:07 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
65 (only retried tests)	28	0	5	4	28

🟢 2025-03-26 02:58:14 UTC Build successful.
🟢 2025-03-26 02:58:42 UTC ydbd size 3.8 GiB changed* by +10.2 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `f7971e1`	merge: `f688c04`	diff	diff %
ydbd size	4 085 067 968 Bytes	4 085 078 456 Bytes	+10.2 KiB	+0.000%
ydbd stripped size	1 411 362 856 Bytes	1 411 365 576 Bytes	+2.7 KiB	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-03-26T00:47:08Z

⚪ 2025-03-26 00:47:08 UTC Pre-commit check linux-x86_64-relwithdebinfo for f688c04 has started.
⚪ 2025-03-26 00:47:33 UTC Artifacts will be uploaded here
⚪ 2025-03-26 00:51:19 UTC ya make is running...
🟡 2025-03-26 02:30:18 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
28695	26026	0	4	2558	107

⚪ 2025-03-26 02:32:38 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-03-26 02:45:15 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
168 (only retried tests)	66	0	0	0	102

🟢 2025-03-26 02:45:22 UTC Build successful.
🟢 2025-03-26 02:45:45 UTC ydbd size 2.2 GiB changed* by +5.5 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash	main: `f7971e1`	merge: `f688c04`	diff	diff %
ydbd size	2 322 075 280 Bytes	2 322 080 896 Bytes	+5.5 KiB	+0.000%
ydbd stripped size	485 431 104 Bytes	485 432 096 Bytes	+992 Bytes	+0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

ydb/core/tx/schemeshard/schemeshard__tenant_data_erasure_manager.cpp

github-actions · 2025-03-26T13:20:07Z

⚪ 2025-03-26 13:20:06 UTC Pre-commit check linux-x86_64-release-asan for 6d2861a has started.
⚪ 2025-03-26 13:20:13 UTC Artifacts will be uploaded here
⚪ 2025-03-26 13:23:26 UTC ya make is running...
🟡 2025-03-26 15:18:39 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
14175	14042	0	84	14	35

⚪ 2025-03-26 15:20:03 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-26 15:34:54 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
178 (only retried tests)	137	0	6	3	32

⚪ 2025-03-26 15:35:05 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-03-26 15:50:00 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
65 (only retried tests)	29	0	4	3	29

🟢 2025-03-26 15:50:09 UTC Build successful.
🟡 2025-03-26 15:50:41 UTC ydbd size 3.8 GiB changed* by +231.3 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash	main: `22ccfe5`	merge: `6d2861a`	diff	diff %
ydbd size	4 110 540 160 Bytes	4 110 777 032 Bytes	+231.3 KiB	+0.006%
ydbd stripped size	1 420 396 008 Bytes	1 420 475 336 Bytes	+77.5 KiB	+0.006%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

snaury · 2025-03-26T13:20:32Z

ydb/core/tx/schemeshard/schemeshard_impl.cpp

-            }
+        }
+        if (DataErasureManager->GetStatus() == EDataErasureStatus::IN_PROGRESS) {
+            Execute(CreateTxAddEntryToDataErasure(dataErasureShards), this->ActorContext());


How safe is it to do this here at all? SetPartitioning is called from a split/merge operation inside a transaction, and here some other transaction gets scheduled. The split/merge completion may commit successfully while this transaction never even gets its turn. Would the shards then simply end up lost? It also bothers me that SetPartitioning is called while schemeshard is loading; it seems this place was never meant to host such complex logic or actions.

Yes, the main concern is that SetPartitioning() is being loaded with duties that do not belong to it. I have already suggested looking into how this could be reworked.

There is no need to guard against SetPartitioning() being called during TxInit: it is key to continuing the data erasure process if it was already running before the restart. That said, comments describing the dependencies in the loading order of the DataErasureManager state and the table shards are certainly missing.

"this transaction never even gets its turn"

Is that the case when schemeshard restarts?
Then the data erasure status will be IN_PROGRESS, and the SetPartitioning() calls during TxInit will run and refresh the current list of shards to clean. Logically correct, but it seems heavy. It would be better to make one pass over the full shard list on restart instead of running a separate transaction per table.
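
The closing suggestion, one pass over all shards on restart rather than one transaction per table, might look roughly like the sketch below. It is only an illustration of the idea: `TTables`, `CollectUntrackedShards`, and the table paths are made up for the example, and the PR itself does not implement this.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using TShardId = uint64_t;

// Hypothetical view of loaded state: table path -> current partitioning.
using TTables = std::map<std::string, std::vector<TShardId>>;

// Single pass over every table: gather all shards that the erasure process
// does not track yet, so that one batch (one transaction) can add them all.
std::vector<TShardId> CollectUntrackedShards(const TTables& tables,
                                             const std::set<TShardId>& tracked) {
    std::set<TShardId> seen;
    std::vector<TShardId> batch;
    for (const auto& entry : tables) {
        for (TShardId shard : entry.second) {
            if (tracked.count(shard) == 0 && seen.insert(shard).second) {
                batch.push_back(shard);
            }
        }
    }
    return batch;
}

std::vector<TShardId> ExampleRestartBatch() {
    TTables tables = {{"/Root/A", {1, 2}}, {"/Root/B", {3, 4}}};
    std::set<TShardId> tracked = {1, 3};
    std::vector<TShardId> batch = CollectUntrackedShards(tables, tracked);
    for (TShardId shard : batch) {
        assert(tracked.count(shard) == 0); // nothing already tracked is re-added
    }
    return batch;
}
```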

github-actions · 2025-03-26T13:29:06Z

⚪ 2025-03-26 13:29:05 UTC Pre-commit check linux-x86_64-relwithdebinfo for 6d2861a has started.
⚪ 2025-03-26 13:29:12 UTC Artifacts will be uploaded here
⚪ 2025-03-26 13:32:30 UTC ya make is running...
🟡 2025-03-26 15:13:44 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
28720	26063	0	3	2542	112

⚪ 2025-03-26 15:16:29 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-03-26 15:30:53 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
161 (only retried tests)	55	0	0	0	106

🟢 2025-03-26 15:31:06 UTC Build successful.
🟡 2025-03-26 15:31:26 UTC ydbd size 2.2 GiB changed* by +112.3 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash	main: `22ccfe5`	merge: `6d2861a`	diff	diff %
ydbd size	2 342 731 376 Bytes	2 342 846 344 Bytes	+112.3 KiB	+0.005%
ydbd stripped size	489 884 928 Bytes	489 906 048 Bytes	+20.6 KiB	+0.004%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

ijon · 2025-03-31T18:34:07Z

ydb/core/tx/schemeshard/schemeshard__delete_tablet_reply.cpp

+            if (Self->DataErasureManager->GetStatus() == EDataErasureStatus::IN_PROGRESS) {
+                Self->Execute(Self->CreateTxCancelDataErasureShards({ShardIdx}));
+            }


Why did the CancelDataErasureShards call have to be moved here?

In the previous place the tablet had not been deleted yet, so the cleanup could complete successfully before that tablet was deleted, which is bad and violates the deletion guarantees. Here, we arrive only after Hive has replied that the tablet is deleted.
(Yes, there is still the problem that Hive actually replies before it has deleted the data in BlobStorage, but we plan to address that separately.)
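
The ordering argument in this reply can be shown with a small model: a shard scheduled for deletion keeps the erasure incomplete until its deletion is confirmed. `TErasureTracker` and its methods are hypothetical names for this sketch and do not mirror the real DataErasureManager interface.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical tracker: erasure is complete once no shard is pending.
class TErasureTracker {
public:
    void AddShard(uint64_t shardId) { Pending.insert(shardId); }

    // The shard reported a successful cleanup.
    void OnShardCleaned(uint64_t shardId) { Pending.erase(shardId); }

    // The shard is dropped from the pending set only when its deletion has
    // been confirmed, not when the deletion is merely requested.
    void OnDeleteConfirmed(uint64_t shardId) { Pending.erase(shardId); }

    bool IsComplete() const { return Pending.empty(); }

private:
    std::set<uint64_t> Pending;
};

bool ExampleCompletesOnlyAfterDeletion() {
    TErasureTracker tracker;
    tracker.AddShard(1); // new shard produced by a split
    tracker.AddShard(2); // old shard, deletion requested but not confirmed
    tracker.OnShardCleaned(1);

    // The old tablet still exists, so erasure must not be reported as done.
    bool completeBeforeReply = tracker.IsComplete();
    assert(!completeBeforeReply);

    tracker.OnDeleteConfirmed(2);
    bool completeAfterReply = tracker.IsComplete();
    return !completeBeforeReply && completeAfterReply;
}
```

Had shard 2 been removed from the pending set at request time, `IsComplete()` would already return true while the old tablet still held data, which is the situation the reply describes as violating the deletion guarantees.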

ijon · 2025-03-31T20:32:48Z

ydb/core/tx/schemeshard/schemeshard_impl.cpp

-            }
+        }
+        if (DataErasureManager->GetStatus() == EDataErasureStatus::IN_PROGRESS) {
+            Execute(CreateTxAddEntryToDataErasure(dataErasureShards), this->ActorContext());


Yes, the main concern is that SetPartitioning() is being loaded with duties that do not belong to it. I have already suggested looking into how this could be reworked.

There is no need to guard against SetPartitioning() being called during TxInit: it is key to continuing the data erasure process if it was already running before the restart. That said, comments describing the dependencies in the loading order of the DataErasureManager state and the table shards are certainly missing.

"this transaction never even gets its turn"

Is that the case when schemeshard restarts?
Then the data erasure status will be IN_PROGRESS, and the SetPartitioning() calls during TxInit will run and refresh the current list of shards to clean. Logically correct, but it seems heavy. It would be better to make one pass over the full shard list on restart instead of running a separate transaction per table.

ijon

OK for now.
But follow-up development is needed.

github-actions bot added the not-for-changelog label Mar 7, 2025

lex007in requested a review from snaury March 7, 2025 12:27

lex007in marked this pull request as ready for review March 7, 2025 12:27

lex007in self-assigned this Mar 7, 2025

github-actions bot added not-for-changelog and removed not-for-changelog labels Mar 7, 2025

lex007in force-pushed the borrow branch from 6007823 to e21f627 Compare March 21, 2025 10:32

lex007in requested a review from a team as a code owner March 21, 2025 10:32

lex007in changed the title ~~LocalDB: add waiting for borrow parts returning in DataCleanup logic~~ DataShard and SchemeShard: handle borrowed parts in data erasure Mar 21, 2025

github-actions bot added not-for-changelog and removed not-for-changelog labels Mar 21, 2025

lex007in force-pushed the borrow branch 2 times, most recently from 3414a3e to 3d62e99 Compare March 21, 2025 11:19

lex007in requested a review from molotkov-and March 21, 2025 11:23

DataShard and SchemeShard: handle borrowed parts in data erasure

9d644a1

lex007in force-pushed the borrow branch from 3d62e99 to 9d644a1 Compare March 21, 2025 14:35

snaury reviewed Mar 21, 2025

Review fixes

dd9edc9

lex007in requested a review from snaury March 24, 2025 07:41

Wait tablet deletion and add split/merge tests

b2ae2e5

github-actions bot added not-for-changelog and removed not-for-changelog labels Mar 26, 2025

snaury reviewed Mar 26, 2025

Fix status enum in TEvForceDataCleanupResult

c92971e

snaury approved these changes Mar 26, 2025

lex007in requested a review from ijon March 26, 2025 14:38

ijon reviewed Mar 31, 2025

ijon approved these changes Apr 1, 2025

lex007in mentioned this pull request Apr 1, 2025

SchemeShard: data erasure refactoring #16604

Open

lex007in merged commit 4556432 into ydb-platform:main Apr 1, 2025
15 checks passed

lex007in deleted the borrow branch April 1, 2025 14:23

lex007in added a commit to lex007in/ydb that referenced this pull request Apr 1, 2025

DataShard and SchemeShard: handle borrowed parts in data erasure (ydb…

67f585f

…-platform#15451)

lex007in mentioned this pull request Apr 1, 2025

Data cleanup fixes #16627

Merged

lex007in added a commit that referenced this pull request Apr 4, 2025

Data cleanup fixes (#16627)

1d89df8

Fixes for data cleanup edge cases: - DataShard: clean readsets in DataCleanup (#15438) - DataShard and SchemeShard: handle borrowed parts in data erasure (#15451)
