Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-37120][cdc-connector] Add ending split chunk first to avoid TaskManager oom #3856

Merged
merged 1 commit into from
Mar 5, 2025

Conversation

beryllw
Copy link
Contributor

@beryllw beryllw commented Jan 14, 2025

Add ending split chunk first to avoid TaskManager oom

@beryllw beryllw changed the title Add ending split chunk first to avoid TaskManager oom [FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom Jan 14, 2025
@github-actions github-actions bot added the base label Jan 15, 2025
@beryllw
Copy link
Contributor Author

beryllw commented Jan 15, 2025

CI:https://github.com/beryllw/flink-cdc/actions/runs/12779452404
@leonardBang Could you please help me retrigger the CI?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 15, 2025

Adding the ending split chunk first changes the snapshot read result order. I will fix the unit test.

@leonardBang
Copy link
Contributor

THanks @beryllw for the contribution, the idea makes sense to me, but we need to consider the compatibility, a configuration is recommended

@beryllw
Copy link
Contributor Author

beryllw commented Jan 15, 2025

THanks @beryllw for the contribution, the idea makes sense to me, but we need to consider the compatibility, a configuration is recommended

If data consistency is not an issue, are there any compatibility concerns we need to address?

@lvyanquan
Copy link
Contributor

What‘s more, could you add this optimization to MongoDBChunkSplitter?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 16, 2025

What‘s more, could you add this optimization to MongoDBChunkSplitter?

Sure, i will check MongoDBChunkSplitter.

@beryllw
Copy link
Contributor Author

beryllw commented Jan 17, 2025

What‘s more, could you add this optimization to MongoDBChunkSplitter?

#3704 (comment)

I overlooked this issue #3704, good idea. Maybe support AssignStrategy in base module is better? Do we implement this in a future PR or in this PR?

@lvyanquan
Copy link
Contributor

lvyanquan commented Jan 17, 2025

I missed this issue #3704, good idea. Maybe support AssignStrategy in base module is better? Do we implement this in a future PR or in this PR?

My previous PR provided both ascending_order and descending_order order configurations, but I think it would be better to place both start and end unbounded chunks at the beginning, so you can go ahead.

I think if there is no impact on restarting from the previous state, there is no compatibility issue, but adding a parameter to control it would be safer.

@github-actions github-actions bot added the docs Improvements or additions to documentation label Jan 17, 2025
@lvyanquan
Copy link
Contributor

I think adding a test in MySqlSourceITCase to test the case of restoring from failure will be better.

@beryllw
Copy link
Contributor Author

beryllw commented Jan 20, 2025

I think adding a test in MySqlSourceITCase to test the case of restoring from failure will be better.

agree.

@beryllw beryllw requested a review from lvyanquan January 20, 2025 08:57
@lvyanquan
Copy link
Contributor

lvyanquan commented Jan 22, 2025

LGTM.
I'm worried if this parameter is a bit complicated, what about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using ending? As chunk is a concept in snapshot phase.
WDYT @beryllw @leonardBang?

@beryllw
Copy link
Contributor Author

beryllw commented Jan 23, 2025

LGTM. I'm worried if this parameter is a bit complicated, what about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using ending? As chunk is a concept in snapshot phase. WDYT @beryllw @leonardBang?

What about scan.incremental.snapshot.assign-ending-chunk-first.enabled, when the write throughput is low, the ending chunk is not equal to the maximum chunk.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 7, 2025

@leonardBang @lvyanquan PTAL

@lvyanquan
Copy link
Contributor

Thanks @beryllw for this update, since you've added this feature to jdbc common module, it seems like that this config option was not added in DynamicTableSourceFactory and was not passed to Source.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 11, 2025

@beryllw
Copy link
Contributor Author

beryllw commented Feb 12, 2025

@beryllw beryllw requested a review from lvyanquan February 12, 2025 02:04
Copy link
Contributor

@lvyanquan lvyanquan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beryllw for this contribution and it looks good to me.

I think you can create one jira to make the default value of this option to true after one or two major versions and testing in product environment.

@lvyanquan
Copy link
Contributor

And you might need to do some rebase to fix the ci failure.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 18, 2025

And you might need to do some rebase to fix the ci failure.

Sure.

@beryllw beryllw changed the title [FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom [FLINK-37120][cdc-connector] Add ending split chunk first to avoid TaskManager oom Feb 18, 2025
@beryllw
Copy link
Contributor Author

beryllw commented Feb 18, 2025

https://github.com/beryllw/flink-cdc/actions/runs/13387743918
@leonardBang Please help me retrigger the CI.

Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beryllw for the contribution, I left two comments

Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beryllw for the great work, LGTM

@lvyanquan
Copy link
Contributor

Hi, @beryllw. Could you check the failed cases? If some are unrelated to the change, please add some description.

@beryllw
Copy link
Contributor Author

beryllw commented Feb 24, 2025

Looks like unrelated to this change, here is the CI beryllw#7, and I also manually tested the failed tests(except for MySqlConnectorITCase).

@beryllw
Copy link
Contributor Author

beryllw commented Feb 26, 2025

@leonardBang Is there anything else that needs to be completed? I'm concerned about potential conflicts later on.Could you please help me move this forward?

@leonardBang
Copy link
Contributor

@leonardBang Is there anything else that needs to be completed? I'm concerned about potential conflicts later on.Could you please help me move this forward?

Hey, @beryllw Thank you for your reminder, I'll help investigate the failure case and merge your PR.

@leonardBang
Copy link
Contributor

@leonardBang Is there anything else that needs to be completed? I'm concerned about potential conflicts later on.Could you please help me move this forward?

I guess the failed test cases in SqlServer and MongoDB may be related to our change, as some test case may assume the legacy split order will be assign first, could you check them?

@beryllw
Copy link
Contributor Author

beryllw commented Feb 27, 2025

I guess the failed test cases in SqlServer and MongoDB may be related to our change, as some test case may assume the legacy split order will be assign first, could you check them?

Sure, I will recheck the failed test cases in SqlServer and MongoDB.

@leonardBang
Copy link
Contributor

Rebase this PR to master to obtain fixed patches, let's see what happens next @beryllw

@leonardBang
Copy link
Contributor

CI passed, merging....

@leonardBang leonardBang merged commit 2c69986 into apache:master Mar 5, 2025
28 checks passed
@beryllw
Copy link
Contributor Author

beryllw commented Mar 6, 2025

Rebase this PR to master to obtain fixed patches, let's see what happens next @beryllw

Sorry, I've been too busy to check earlier. Thanks for your help.

@leonardBang
Copy link
Contributor

Rebase this PR to master to obtain fixed patches, let's see what happens next @beryllw

Sorry, I've been too busy to check earlier. Thanks for your help.

You're welcome @beryllw , no need to say sorry, the community is just our interest, our own work is the first priority ^^

lvyanquan pushed a commit to lvyanquan/flink-cdc that referenced this pull request Mar 7, 2025
…nded splits firstly to avoid buffering too much data

 This closes apache#3856.
SML0127 pushed a commit to SML0127/flink-cdc-connectors that referenced this pull request Mar 12, 2025
…nded splits firstly to avoid buffering too much data

 This closes apache#3856.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants