-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-32097][Connectors/Kinesis] Implement support for Kinesis deaggregation #188
base: main
Are you sure you want to change the base?
Conversation
In order to implement record deaggregation, the RecordBatch uses the AggregatorUtil, from the KCL 3, with the subscribed shard to deaggregate. The deaggregated records are now instance of KinesisClientRecord since the AggregatorUtil requires it. Therefore, there is a bit of refactor to update the class of the record that was received. Also, the RecordBatch class was extracted out of KinesisShardSplitReaderBase
…ng deaggregation. There was a need to create aggregated records to test if the records are being deaggregated correctly. For that there is a dependency that can create aggregated records, but unfortunately it does not have a version available in the maven repository that is compatible with the kcl 3.x. So it uses a release of github that is compatible. Also, the protobuf version set in the dependency management was not compatible with the kcl 3.x
… for aggregation tests.
…plitReader for aggregated records.
…m stream to loop.
Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html) |
@hlteoh37 could you take a look? |
@Lzgpom Building this locally and I've got a spotless-check format violation. |
I tested it locally, with a stream produced by KPL with aggregation enabled, and it worked fine. |
I tried to run |
Uhm, |
If I run a simple |
@nicusX I was using the java 17 and the spotless did not work. I changed to java 11 and it worked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM
I verified that it now does not fail checkstyle checks.
Note for maintainers: There is a breaking change in the public API that forces a major version change.
<!-- the kinesis aggregator dependency since it is not available in maven central --> | ||
<!-- look into issue https://github.com/awslabs/kinesis-aggregation/issues/120 --> | ||
<groupId>com.github.awslabs.kinesis-aggregation</groupId> | ||
<artifactId>amazon-kinesis-aggregator</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has Apache 2.0 licence, all good so
https://github.com/awslabs/kinesis-aggregation
@@ -60,7 +60,7 @@ default void open(DeserializationSchema.InitializationContext context) throws Ex | |||
* @param output the identifier of the shard the record was sent to | |||
* @throws IOException exception when deserializing record | |||
*/ | |||
void deserialize(Record record, String stream, String shardId, Collector<T> output) | |||
void deserialize(KinesisClientRecord record, String stream, String shardId, Collector<T> output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change to the public interface
@@ -50,10 +49,10 @@ | |||
/** Base implementation of the SplitReader for reading from KinesisShardSplits. */ | |||
@Internal | |||
public abstract class KinesisShardSplitReaderBase | |||
implements SplitReader<Record, KinesisShardSplit> { | |||
implements SplitReader<KinesisClientRecord, KinesisShardSplit> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change actually changes the interface exposed for the KinesisSource, so it would be a backwards incompatible change. Is there a way we can wrap this internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been looking at the implications of not changing from Record
into KinesisClientRecord
and this is what I have found:
- The deaggreated records would have to be converted from
KinesisClientRecord
, sinceAggregatorUtil
only usesKinesisClientRecord
, toRecord
adding a slight overhead. - The
KinesisClientRecord
has thesubSequenceNumber
that could and should be used for checkpointing.
Apart from that I don't see any more issues. Either way I don't mind changing it back to Record
.
Purpose of the change
This PR implements the Kinesis deaggregation. The implementation was heavily based on the old kinesis connector.
Verifying this change
This change added tests and can be verified as follows:
Further ToDos and Follow-ups
Checkpoints do not take into consideration the sequence number of the deaggregated records.
Brief change log
AggregatorUtil
from the KCL 3.xRecordBatch
regarding deaggregation. There was a need for aggregated records to test. So it was used https://github.com/awslabs/kinesis-aggregation to do so. Unfortunatly, it does have a version in the maven repository compatible with KCL 3.x so it was used a workaround.Significant changes
@Public(Evolving)
)