Commit c8c3516

Isha Amoncar authored and suranjan committed
[BACKPORT 2.12] [CDCSDK] [#9019] CDC SDK Client API and Java Console Subscriber
Summary:
Original commits:
- d294abf / D13836
- 6b15b16 / D15860
- cf5fead / D16057
- 5408e30 / D15989
- 53afee99e7cfe912fa7ec579ce470e999701a074 / D16176

Github Master Ticket: #9019
Design Document: https://docs.google.com/document/d/1_xZqU5UgzCu1W--kci3ajU7_iYXXHMQvudmybDI-Xsk/edit
Functional Spec: https://docs.google.com/document/u/2/d/1nHuzHQ-qYVPbKi2dqo_drzSXMq00h7w5oi0JDf0GD1U/edit#heading=h.jmqfs7jgvvg8

This is the client-side change that exposes some APIs to be consumed by CDC consumers. Currently, these APIs are not public and are to be consumed by our Debezium connector. For testing purposes, we have written a console subscriber.

[CDCSDK][#11679] Add missing license headers to Java files

The following files have missing license headers:
1. `CreateCDCStreamRequest.java`
2. `GetCheckpointRequest.java`
3. `GetCheckpointResponse.java`
4. `GetDBStreamInfoRequest.java`
5. `GetDBStreamInfoResponse.java`
6. `SetCheckpointRequest.java`
7. `SetCheckpointResponse.java`

This diff adds these missing headers.

[#11779][CDCSDK] Add option to send a DDL record based on a flag value in GetChangesRequest

Before this change, if some data had already been consumed for a stream ID and a client came up with the same stream ID and requested changes, it received only the changes. This was a problem for `Debezium`: when the connector was restarted, it received the changes directly, without any DDL record. Debezium needs the DDL record to process the schema info for the columns; without it, the client hit a `NullPointerException`, which effectively crashed the connector.

[#11729][DocDB][xCluster] Fix for replication not working if user upgrades to a branch with CDCSDK code changes

With the changes for CDCSDK, we have separate `source_type` values, i.e. `XCLUSTER` for xCluster replication and `CDCSDK` for the new changes. Similarly, there is another option, `checkpoint_type`, which can have `IMPLICIT` and `EXPLICIT` values. A replication stream created before upgrading was unable to continue replication after upgrading to the latest version, because the `source_type` and `checkpoint_type` options were missing from it; they were only introduced with the CDCSDK changes.

Test Fixes for 2.12

Test Plan:
Jenkins: skip
Unit tests in java for APIs and CDC behavior. We have done some long-running testing with applications. We have also run the YB-sample apps and enabled CDC on the table. Verified that all the events are received.
Tested the complete CDC with Debezium pipeline with the specified change.
Command to run test: `ybd --cxx-test integration-tests_cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNeedSchemaInfoFlag`
* Manually tested with a custom build on dev portal

Reviewers: nicolas, bogdan, ybase, rahuldesirazu, sdash, iamoncar, zyu, jhe, mkantimath, sergei
Reviewed By: sergei
Differential Revision: https://phabricator.dev.yugabyte.com/D16251
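The upgrade fix described in the summary can be illustrated with a small standalone sketch. It is not the code from this commit: the key strings `source_type` and `checkpoint_type` and the values `XCLUSTER` and `IMPLICIT` are taken from the names in the commit message, while `StreamOptions` and `AddDefaultsIfMissing` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical option keys, based on the names in the commit message.
// The real constants (cdc::kSourceType, cdc::kCheckpointType) live in the
// YugabyteDB sources and may hold different strings.
const char kSourceTypeKey[] = "source_type";
const char kCheckpointTypeKey[] = "checkpoint_type";

using StreamOptions = std::unordered_map<std::string, std::string>;

// Adds the pre-CDCSDK defaults for any option the stream does not carry.
// emplace() leaves an existing entry untouched, so explicit values win.
void AddDefaultsIfMissing(StreamOptions* options) {
  options->emplace(kSourceTypeKey, "XCLUSTER");
  options->emplace(kCheckpointTypeKey, "IMPLICIT");
}
```

A stream created before the upgrade ends up with both defaults, while a stream that already carries `CDCSDK` and `EXPLICIT` keeps its values.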
1 parent b952e40 commit c8c3516

File tree

85 files changed, +8713 -451 lines changed


ent/src/yb/cdc/cdc_service.cc

+23 -2
@@ -256,11 +256,15 @@ class CDCServiceImpl::Impl {
       it->last_streamed_op_id = op_id;
     }

-  std::shared_ptr<Schema> GetOrAddSchema(const ProducerTabletInfo& producer_tablet) {
+  std::shared_ptr<Schema> GetOrAddSchema(const ProducerTabletInfo &producer_tablet,
+                                         const bool need_schema_info) {
     std::lock_guard<decltype(mutex_)> l(mutex_);
     auto it = cdc_state_metadata_.find(producer_tablet);

     if (it != cdc_state_metadata_.end()) {
+      if (need_schema_info) {
+        it->current_schema = std::make_shared<Schema>();
+      }
       return it->current_schema;
     }
     CDCStateMetadataInfo info = CDCStateMetadataInfo {
@@ -645,6 +649,20 @@ CHECKED_STATUS VerifyArg(const SetCDCCheckpointRequestPB& req) {
   return Status::OK();
 }

+// This function is to handle the upgrade scenario where the DB is upgraded from a version
+// without CDCSDK changes to the one with it. So in case, some required options are missing,
+// the default values will be added for the same.
+void AddDefaultOptionsIfMissing(std::unordered_map<std::string, std::string>* options) {
+  if ((*options).find(cdc::kSourceType) == (*options).end()) {
+    (*options).emplace(cdc::kSourceType, CDCRequestSource_Name(cdc::CDCRequestSource::XCLUSTER));
+  }
+
+  if ((*options).find(cdc::kCheckpointType) == (*options).end()) {
+    (*options).emplace(cdc::kCheckpointType,
+                       CDCCheckpointType_Name(cdc::CDCCheckpointType::IMPLICIT));
+  }
+}
+
 } // namespace

 template <class ReqType, class RespType>
@@ -1101,7 +1119,7 @@ void CDCServiceImpl::GetChanges(const GetChangesRequestPB* req,
     std::string commit_timestamp;
     OpId last_streamed_op_id;

-    auto cached_schema = impl_->GetOrAddSchema(producer_tablet);
+    auto cached_schema = impl_->GetOrAddSchema(producer_tablet, req->need_schema_info());
     s = cdc::GetChangesForCDCSDK(
         req->stream_id(), req->tablet_id(), cdc_sdk_op_id, record, tablet_peer, mem_tracker,
         &msgs_holder, resp, &commit_timestamp, &cached_schema,
@@ -2108,6 +2126,9 @@ Result<std::shared_ptr<StreamMetadata>> CDCServiceImpl::GetStream(const std::str
   RETURN_NOT_OK(client()->GetCDCStream(stream_id, &ns_id, &object_ids, &options));

   auto stream_metadata = std::make_shared<StreamMetadata>();
+
+  AddDefaultOptionsIfMissing(&options);
+
   for (const auto& option : options) {
     if (option.first == kRecordType) {
       SCHECK(CDCRecordType_Parse(option.second, &stream_metadata->record_type),
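
The `GetOrAddSchema` change above replaces the cached schema with a fresh, empty one whenever `need_schema_info` is set. The following is a minimal single-threaded sketch of that caching pattern, assuming a toy `Schema` struct and a hypothetical `SchemaCache` class keyed by a tablet id string; the real code keys on `ProducerTabletInfo` and holds a mutex. The sketch only shows the cache reset. How the empty schema leads to a DDL record being sent is handled downstream and is not shown in this diff.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for the real Schema class; an empty column list means
// that no schema information has been cached yet.
struct Schema {
  std::vector<std::string> columns;
};

// Hypothetical cache keyed by tablet id (no locking, for illustration only).
class SchemaCache {
 public:
  // Returns the cached schema for a tablet, creating an empty one on first
  // use. When need_schema_info is set, the existing entry is replaced by a
  // fresh empty schema so the caller starts without cached schema state.
  std::shared_ptr<Schema> GetOrAdd(const std::string& tablet_id, bool need_schema_info) {
    auto it = cache_.find(tablet_id);
    if (it != cache_.end()) {
      if (need_schema_info) {
        it->second = std::make_shared<Schema>();
      }
      return it->second;
    }
    auto schema = std::make_shared<Schema>();
    cache_.emplace(tablet_id, schema);
    return schema;
  }

 private:
  std::map<std::string, std::shared_ptr<Schema>> cache_;
};
```

Calling `GetOrAdd` with the flag unset returns the same object every time, while a call with the flag set hands back a new empty schema that later calls then share.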

ent/src/yb/integration-tests/cdcsdk_ysql-test.cc

+102 -1
@@ -123,7 +123,15 @@ class CDCSDKYsqlTest : public CDCSDKTestBase {
     return resp.cluster_config().cluster_uuid();
   }

-  // the range is exclusive of end i.e. [start, end)
+  Status DropDB(Cluster* cluster) {
+    const std::string db_name = "testdatabase";
+    RETURN_NOT_OK(CreateDatabase(&test_cluster_, db_name, true));
+    auto conn = VERIFY_RESULT(cluster->ConnectToDB(db_name));
+    RETURN_NOT_OK(conn.ExecuteFormat("DROP DATABASE $0", kNamespaceName));
+    return Status::OK();
+  }
+
+  // The range is exclusive of end i.e. [start, end)
   void WriteRows(uint32_t start, uint32_t end, Cluster* cluster) {
     auto conn = EXPECT_RESULT(cluster->ConnectToDB(kNamespaceName));
     LOG(INFO) << "Writing " << end - start << " row(s)";
@@ -159,6 +167,18 @@ class CDCSDKYsqlTest : public CDCSDKTestBase {
     change_req->mutable_from_cdc_sdk_checkpoint()->set_write_id(0);
   }

+  void PrepareChangeRequest(
+      GetChangesRequestPB* change_req, const CDCStreamId& stream_id,
+      const google::protobuf::RepeatedPtrField<master::TabletLocationsPB>& tablets,
+      const CDCSDKCheckpointPB& cp) {
+    change_req->set_stream_id(stream_id);
+    change_req->set_tablet_id(tablets.Get(0).tablet_id());
+    change_req->mutable_from_cdc_sdk_checkpoint()->set_index(cp.index());
+    change_req->mutable_from_cdc_sdk_checkpoint()->set_term(cp.term());
+    change_req->mutable_from_cdc_sdk_checkpoint()->set_key(cp.key());
+    change_req->mutable_from_cdc_sdk_checkpoint()->set_write_id(cp.write_id());
+  }
+
   void PrepareSetCheckpointRequest(
       SetCDCCheckpointRequestPB* set_checkpoint_req,
       const CDCStreamId stream_id,
@@ -253,6 +273,44 @@ class CDCSDKYsqlTest : public CDCSDKTestBase {
     LOG(INFO) << "Got " << ins_count << " insert records";
     ASSERT_EQ(expected_records_size, ins_count);
   }
+
+  Result<GetChangesResponsePB> VerifyIfDDLRecordPresent(
+      const CDCStreamId& stream_id,
+      const google::protobuf::RepeatedPtrField<master::TabletLocationsPB>& tablets,
+      bool expect_ddl_record, bool is_first_call, const CDCSDKCheckpointPB* cp = nullptr) {
+    GetChangesRequestPB req;
+    GetChangesResponsePB resp;
+
+    if (cp == nullptr) {
+      PrepareChangeRequest(&req, stream_id, tablets);
+    } else {
+      PrepareChangeRequest(&req, stream_id, tablets, *cp);
+    }
+
+    // The default value for need_schema_info is false.
+    if (expect_ddl_record) {
+      req.set_need_schema_info(true);
+    }
+
+    RpcController get_changes_rpc;
+    RETURN_NOT_OK(cdc_proxy_->GetChanges(req, &resp, &get_changes_rpc));
+
+    if (resp.has_error()) {
+      return StatusFromPB(resp.error().status());
+    }
+
+    auto record = resp.cdc_sdk_proto_records(0);
+
+    // If it's the first call to GetChanges, we will get a DDL record irrespective of the
+    // value of need_schema_info.
+    if (is_first_call || expect_ddl_record) {
+      EXPECT_EQ(record.row_message().op(), RowMessage::DDL);
+    } else {
+      EXPECT_NE(record.row_message().op(), RowMessage::DDL);
+    }
+
+    return resp;
+  }
 };

 TEST_F(CDCSDKYsqlTest, YB_DISABLE_TEST_IN_TSAN(TestBaseFunctions)) {
@@ -328,6 +386,49 @@ TEST_F(CDCSDKYsqlTest, YB_DISABLE_TEST_IN_TSAN(MultiRowInsertion)) {
   LOG(INFO) << "Got " << ins_count << " insert records";
   ASSERT_EQ(expected_records_size, ins_count);
 }
+
+TEST_F(CDCSDKYsqlTest, YB_DISABLE_TEST_IN_TSAN(DropDatabase)) {
+  ASSERT_OK(SetUpWithParams(3, 1, false));
+  CDCStreamId stream_id = ASSERT_RESULT(CreateDBStream());
+  ASSERT_OK(DropDB(&test_cluster_));
+}
+
+TEST_F(CDCSDKYsqlTest, YB_DISABLE_TEST_IN_TSAN(TestNeedSchemaInfoFlag)) {
+  ASSERT_OK(SetUpWithParams(1, 1, false));
+
+  auto table = ASSERT_RESULT(CreateTable(&test_cluster_, kNamespaceName, kTableName));
+
+  google::protobuf::RepeatedPtrField<master::TabletLocationsPB> tablets;
+  ASSERT_OK(test_client()->GetTablets(
+      table, 0, &tablets, /* partition_list_version = */ nullptr));
+
+  std::string table_id = ASSERT_RESULT(GetTableId(&test_cluster_, kNamespaceName, kTableName));
+  CDCStreamId stream_id = ASSERT_RESULT(CreateDBStream());
+
+  ASSERT_OK(SetInitialCheckpoint(stream_id, tablets));
+
+  // This will write one row with PK = 0.
+  WriteRows(0 /* start */, 1 /* end */, &test_cluster_);
+
+  // This is the first call to GetChanges, we will get a DDL record.
+  GetChangesResponsePB resp = ASSERT_RESULT(VerifyIfDDLRecordPresent(stream_id, tablets, false,
+                                                                     true));
+
+  // Write another row to the database with PK = 1.
+  WriteRows(1 /* start */, 2 /* end */, &test_cluster_);
+
+  // We will not get any DDL record here since this is not the first call and the flag
+  // need_schema_info is also unset.
+  resp = ASSERT_RESULT(VerifyIfDDLRecordPresent(stream_id, tablets, false, false,
+                                                &resp.cdc_sdk_checkpoint()));
+
+  // Write another row to the database with PK = 2.
+  WriteRows(2 /* start */, 3 /* end */, &test_cluster_);
+
+  // We will get a DDL record since we have enabled the need_schema_info flag.
+  resp = ASSERT_RESULT(VerifyIfDDLRecordPresent(stream_id, tablets, true, false,
+                                                &resp.cdc_sdk_checkpoint()));
+}
 } // namespace enterprise
 } // namespace cdc
 } // namespace yb
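
The three `GetChanges` calls in `TestNeedSchemaInfoFlag` follow a simple rule: the first record is a DDL record on the first call, or whenever `need_schema_info` is set. A toy model of that rule is sketched below. `FakeStream` and `Op` are hypothetical names, and the class is not the real `GetChanges` RPC; it assumes exactly one pending insert per call, as in the test.

```cpp
#include <vector>

enum class Op { DDL, INSERT };

// Toy stand-in for a CDC stream on a single tablet.
class FakeStream {
 public:
  // Returns the records for one pending row. A DDL record is prepended on
  // the first call, or whenever the caller sets need_schema_info.
  std::vector<Op> GetChanges(bool need_schema_info) {
    std::vector<Op> ops;
    if (!schema_sent_ || need_schema_info) {
      ops.push_back(Op::DDL);
      schema_sent_ = true;
    }
    ops.push_back(Op::INSERT);
    return ops;
  }

 private:
  bool schema_sent_ = false;
};
```

Run against this model, the sequence in the test (flag unset, flag unset, flag set) yields a leading DDL record, then a leading insert, then a leading DDL record again.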

ent/src/yb/master/catalog_manager_ent.cc

+36 -2
@@ -209,6 +209,34 @@ class CDCStreamLoader : public Visitor<PersistentCDCStreamInfo> {
  public:
   explicit CDCStreamLoader(CatalogManager* catalog_manager) : catalog_manager_(catalog_manager) {}

+  void AddDefaultValuesIfMissing(const SysCDCStreamEntryPB& metadata,
+                                 CDCStreamInfo::WriteLock* l) {
+    bool source_type_present = false;
+    bool checkpoint_type_present = false;
+
+    // Iterate over all the options to check if checkpoint_type and source_type are present.
+    for (auto option : metadata.options()) {
+      if (option.key() == cdc::kSourceType) {
+        source_type_present = true;
+      }
+      if (option.key() == cdc::kCheckpointType) {
+        checkpoint_type_present = true;
+      }
+    }
+
+    if (!source_type_present) {
+      auto source_type_opt = l->mutable_data()->pb.add_options();
+      source_type_opt->set_key(cdc::kSourceType);
+      source_type_opt->set_value(cdc::CDCRequestSource_Name(cdc::XCLUSTER));
+    }
+
+    if (!checkpoint_type_present) {
+      auto checkpoint_type_opt = l->mutable_data()->pb.add_options();
+      checkpoint_type_opt->set_key(cdc::kCheckpointType);
+      checkpoint_type_opt->set_value(cdc::CDCCheckpointType_Name(cdc::IMPLICIT));
+    }
+  }
+
   Status Visit(const CDCStreamId& stream_id, const SysCDCStreamEntryPB& metadata)
       REQUIRES(catalog_manager_->mutex_) {
     DCHECK(!ContainsKey(catalog_manager_->cdc_stream_map_, stream_id))
@@ -244,6 +272,10 @@ class CDCStreamLoader : public Visitor<PersistentCDCStreamInfo> {
     auto l = stream->LockForWrite();
     l.mutable_data()->pb.CopyFrom(metadata);

+    // If no source_type and checkpoint_type is present, that means the stream was created in
+    // a previous version where these options were not present.
+    AddDefaultValuesIfMissing(metadata, &l);
+
     // If the table has been deleted, then mark this stream as DELETING so it can be deleted by the
     // catalog manager background thread. Otherwise if this stream is missing an entry
     // for state, then mark its state as Active.
@@ -2505,7 +2537,8 @@ std::vector<scoped_refptr<CDCStreamInfo>> CatalogManager::FindCDCStreamsForTable
   for (const auto& entry : cdc_stream_map_) {
     auto ltm = entry.second->LockForRead();
     // for xCluster the first entry will be the table_id
-    if (ltm->table_id().Get(0) == table_id && !ltm->started_deleting()) {
+    if (!ltm->table_id().empty() && ltm->table_id().Get(0) == table_id &&
+        !ltm->started_deleting()) {
       streams.push_back(entry.second);
     }
   }
@@ -2983,7 +3016,8 @@ Status CatalogManager::ListCDCStreams(const ListCDCStreamsRequestPB* req,
       continue;
     }

-    if (filter_table && table->id() != entry.second->table_id().Get(0)) {
+    if (filter_table && !entry.second->table_id().empty() &&
+        table->id() != entry.second->table_id().Get(0)) {
       continue; // Skip deleting/deleted streams and streams from other tables.
     }
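
Both catalog manager hunks above add an emptiness check before `table_id().Get(0)`, so the first element is only read when the list has one. A minimal sketch of the same guard follows, assuming a hypothetical `StreamInfo` struct with a plain vector of table ids in place of the protobuf-backed `CDCStreamInfo`; `IsStreamForTable` is likewise an illustrative name.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stream record used only for this sketch.
struct StreamInfo {
  std::vector<std::string> table_ids;
  bool started_deleting;
};

// True when the stream's first table id matches and the stream is not being
// deleted. The emptiness check comes first and short-circuits, so front()
// is never called on an empty list.
bool IsStreamForTable(const StreamInfo& stream, const std::string& table_id) {
  return !stream.table_ids.empty() && stream.table_ids.front() == table_id &&
         !stream.started_deleting;
}
```

The order of the conditions matters here: moving the comparison ahead of the emptiness check would bring back the unguarded access the diff removes.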

java/yb-cdc/pom.xml

+43 -3
@@ -28,16 +28,17 @@
     <dependency>
       <groupId>org.apache.kafka</groupId>
       <artifactId>kafka-clients</artifactId>
-      <version>2.8.1</version>
+      <version>2.3.0</version>
     </dependency>
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
-      <version>1.10.2</version>
+      <version>1.9.0</version>
     </dependency>
     <dependency>
       <groupId>commons-io</groupId>
       <artifactId>commons-io</artifactId>
+      <version>2.5</version>
     </dependency>
     <dependency>
       <groupId>org.yb</groupId>
@@ -69,10 +70,12 @@
     <dependency>
       <groupId>commons-cli</groupId>
       <artifactId>commons-cli</artifactId>
+      <version>1.2</version>
     </dependency>
     <dependency>
       <groupId>commons-codec</groupId>
       <artifactId>commons-codec</artifactId>
+      <version>1.10</version>
     </dependency>
     <dependency>
       <groupId>org.apache.commons</groupId>
@@ -83,6 +86,33 @@
       <artifactId>gson</artifactId>
       <version>2.8.0</version>
     </dependency>
+    <dependency>
+      <groupId>org.postgresql</groupId>
+      <artifactId>postgresql</artifactId>
+      <version>42.2.23</version>
+    </dependency>
+    <dependency>
+      <groupId>org.mybatis</groupId>
+      <artifactId>mybatis</artifactId>
+      <version>3.4.5</version>
+    </dependency>
+    <!-- dependency for YCQL driver -->
+    <dependency>
+      <groupId>com.yugabyte</groupId>
+      <artifactId>java-driver-core</artifactId>
+      <version>4.6.0-yb-6</version>
+    </dependency>
+    <!-- dependency for smart jdbc driver yugabyte -->
+    <dependency>
+      <groupId>com.yugabyte</groupId>
+      <artifactId>jdbc-yugabytedb</artifactId>
+      <version>42.3.0-beta.1</version>
+    </dependency>
+    <dependency>
+      <groupId>${junit.groupId}</groupId>
+      <artifactId>junit</artifactId>
+      <scope>test</scope>
+    </dependency>
   </dependencies>

   <build>
@@ -114,6 +144,7 @@
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-dependency-plugin</artifactId>
+        <version>3.2.0</version>
         <executions>
           <execution>
             <id>copy-dependencies</id>
@@ -134,13 +165,14 @@
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.3.0</version>
         <configuration>
           <finalName>yb-cdc-connector</finalName>
           <appendAssemblyId>false</appendAssemblyId>
           <archive>
             <manifest>
               <addClasspath>true</addClasspath>
-              <mainClass>org.yb.cdc.Main</mainClass>
+              <mainClass>org.yb.cdc.CDCConsoleSubscriber</mainClass>
             </manifest>
           </archive>
           <descriptorRefs>
@@ -166,6 +198,14 @@
           <preparationGoals>clean verify</preparationGoals>
         </configuration>
       </plugin>
+      <plugin>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <version>2.22.1</version>
+      </plugin>
+      <plugin>
+        <artifactId>maven-jar-plugin</artifactId>
+        <version>3.0.2</version>
+      </plugin>
     </plugins>
   </build>
 </project>
