-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSHARP-5442: Atlas Search doesn't respect the correct serializer #1583
base: main
Are you sure you want to change the base?
Conversation
@BorisDog I've added you as a reviewer since it seems you were the one that worked on search, mostly. This is still a work in progress, I've broke some things in the process.
|
|
||
var renderedField = _arrayField.Render(args); | ||
|
||
var serializer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incredibly ugly, but was just testing if it worked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rstam this is what I was referring to.
Do you think it would be worth adding a new property to RenderArgs
so that we can get directly the serializer we need (similar to RenderForElemMatch
for instance)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is on the right track but can use a little tweaking to how the right serializer is found and to reduce duplication between how scalar and array fields are handled.
Something like this:
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args)
{
IBsonSerializer<TValue> valueSerializer;
if (_field != null)
{
var renderedField = _field.Render(args);
valueSerializer = renderedField.FieldSerializer;
}
else
{
var renderedArrayField = _arrayField.Render(args);
valueSerializer = (IBsonSerializer<TValue>)ArraySerializerHelper.GetItemSerializer(renderedArrayField.FieldSerializer);
}
var document = new BsonDocument();
using var bsonWriter = new BsonDocumentWriter(document);
var context = BsonSerializationContext.CreateRoot(bsonWriter);
bsonWriter.WriteStartDocument();
bsonWriter.WriteName("value");
valueSerializer.Serialize(context, _value);
bsonWriter.WriteEndDocument();
return document;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, this solves my doubts. That makes sense, I was mostly focusing on how to get the correct serializer before trying to refactor.
Regarding IBsonArraySerializer
, the majority of times when we implement that interface we also implement IChildSerializerConfigurable
, and that makes sense. It also seems that for this use case the result we'd get would be the same, so what would you think about using IChildSerializerConfigurable
instead?
To be honest this is more of a generic question, I'm curious about if there's any preference on using one over the other, of course considering that TryGetItemSerializationInfo
can potentially return more info (not in this case though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IBsonArraySerializer
and IChildSerializerConfigurable
are orthogonal. We can't just implement IChildSerializerConfigurable
instead of IBsonArraySerializer
.
In this case we must implement IBsonArraySerializer
.
It is true that in general whenever implementing IBsonArraySerializer
we would likely also want to implement IChildSerializerConfigurable
. However, that applies mostly to public
serializers that a user might want to configure. This is an internal
serializer, so I think implementing IBsonArraySerializer
is sufficient.
public EqualsSearchDefinition(FieldDefinition<TDocument, TField> path, TField value, SearchScoreDefinition<TDocument> score) | ||
: base(OperatorType.Equals, new SingleSearchPathDefinition<TDocument>(path), score) | ||
{ | ||
ValidateType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I question whether we need to have ValidateType
at all. Looks to me like this was just a client-side limitation related to what .NET types ToBsonValue
knew how to convert to BSON, but now that we are using the proper serializer this limitation should no longer exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with Boris and he confirmed that it is a server limitation which types are supported.
But even so, we probably shouldn't be checking this client side. We should just send the value to the server and let it return an error if it doesn't support the data type.
That way our driver will work well with all supported versions of the server, which might have different limitations. And if future server versions support new value types we won't have to make any changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like this idea. I tried to keep it consistent with what was there before, but I agree that moving this check server side would allow us to be more free in what we do and don't support over time.
I need to remember to do the same with the other operators too.
|
||
var renderedField = _arrayField.Render(args); | ||
|
||
var serializer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is on the right track but can use a little tweaking to how the right serializer is found and to reduce duplication between how scalar and array fields are handled.
Something like this:
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args)
{
IBsonSerializer<TValue> valueSerializer;
if (_field != null)
{
var renderedField = _field.Render(args);
valueSerializer = renderedField.FieldSerializer;
}
else
{
var renderedArrayField = _arrayField.Render(args);
valueSerializer = (IBsonSerializer<TValue>)ArraySerializerHelper.GetItemSerializer(renderedArrayField.FieldSerializer);
}
var document = new BsonDocument();
using var bsonWriter = new BsonDocumentWriter(document);
var context = BsonSerializationContext.CreateRoot(bsonWriter);
bsonWriter.WriteStartDocument();
bsonWriter.WriteName("value");
valueSerializer.Serialize(context, _value);
bsonWriter.WriteEndDocument();
return document;
}
@@ -317,6 +317,8 @@ internal class IEnumerableSerializer<TItem> : SerializerBase<IEnumerable<TItem>> | |||
{ | |||
private readonly IBsonSerializer<TItem> _itemSerializer; | |||
|
|||
public IBsonSerializer<TItem> ItemSerializer => _itemSerializer; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this new public property have the class implement IBsonArraySerializer
and implement the TryGetItemSerializationIinfo
method:
public bool TryGetItemSerializationInfo(out BsonSerializationInfo serialization
{
serializationInfo = new(elementName: null, _itemSerializer, typeof(TItem));
return true;
}
All serializers for array-like values should implement IBsonArraySerializer
. I don't know why this class did not already implement it.
var renderedField = _arrayField.Render(args); | ||
|
||
var serializer = | ||
(renderedField.ValueSerializer as FieldValueSerializerHelper.IEnumerableSerializer<TField>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of trying to cast to the very specific class FieldValueSerializerHelper.IEnumerableSerializer<TField>
use the more general from (used everywhere else) to get the item serializer from an array-like serializer:
valueSerializer = (IBsonSerializer<TValue>)ArraySerializerHelper.GetItemSerializer(renderedArrayField.FieldSerializer);
This is why FieldValueSerializerHelper.IEnumerableSerializer<TItem>
should implement IBsonArraySerializer
(as all array-like serializers should). Not sure why it didn't already...
public EqualsSearchDefinition(FieldDefinition<TDocument, TField> path, TField value, SearchScoreDefinition<TDocument> score) | ||
: base(OperatorType.Equals, new SingleSearchPathDefinition<TDocument>(path), score) | ||
{ | ||
ValidateType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with Boris and he confirmed that it is a server limitation which types are supported.
But even so, we probably shouldn't be checking this client side. We should just send the value to the server and let it return an error if it doesn't support the data type.
That way our driver will work well with all supported versions of the server, which might have different limitations. And if future server versions support new value types we won't have to make any changes.
tests/MongoDB.Driver.Tests/Search/SearchDefinitionBuilderTests.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too familiar with the Search API but as far as I can tell this looks OK.
You should get one more LGTM before merging to main though.
@sanych-sun can you take a look too? |
@BorisDog another thing to consider is that this is a behaviour breaking change, so we need to consider if we want to merge this now or wait. I think it could also be seen as a bug fix to be honest, but we should take a decision. |
value switch | ||
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args) | ||
{ | ||
var searchPathDefinition = (SingleSearchPathDefinition<TDocument>)_path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit inefficient that we need to render the path twice.
Also it would be good to extract this to a single helper method.
What do you and @rstam think about having new internal Render
method in SearchPathDefinition
, something like
(BsonValue, RenderedFieldDefinition) Render(...)
And passing field serializer in OperatorSearchDefinition.RenderArguments()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. I also think it's a little complicated to extract a method like that to SearchPathDefinition
, since (BsonValue, RenderedFieldDefinition) Render(...)
would make sense for a SingleSearchPathDefinition
, but not for MultiSearchPathDefinition
or WildcardSearchPathDefinition
.
The issue here is that operators support various degrees of SearchPathDefinition
. I'm trying to think of a more general solution that does not make breaking changes in the public API, but so far it all feels kinda clunky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I meant that the returned RenderedFieldDefinition
would be null in some cases. I agree that it's not the greatest design, but I don't have any better suggestion as of now. I does have the upside of not rendering the path twice.
As long as all the API changes are internal, we can evolve them in the future if we come up with a better idea (the new suggested Render method should be internal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 60c1d57, I've added a fix to avoid rendering the path twice. It's not super nice, and I'm not sure it's worth it given that probably the network call of the atlas search would be much greater than the time needed to render the path twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we want to have special cases depending on the path we could also specialize OperatorSearchDefinition.RenderArguments
and produce a different method for each path type, for example OperatorSearchDefinition.RenderArgumentsForSingleSearchPath
where we can pass additional arguments. This would require quite some changes, but could also allow to control which kind of path can be used with certain kinds of operators.
@BorisDog I've done some corrections following your suggestions, and it looks much better than before, thank you 😄 |
|
||
internal virtual (BsonValue, IBsonSerializer) RenderAndGetFieldSerializer(RenderArgs<TDocument> args) => (Render(args), null); | ||
|
||
internal (string renderedPath, IBsonSerializer fieldSerializer) RenderFieldWithSerializer(FieldDefinition<TDocument> fieldDefinition, RenderArgs<TDocument> args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RenderFieldWithSerializer
might be sound like "Render the field with provided serializer"
Would RenderFieldAndGetFieldSerializer
be more accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it makes sense
AssertRendered( | ||
subject.Equals("x", value), | ||
$"{{ equals: {{ path: 'x', value: {valueRendered} }} }}"); | ||
//There is no property called "x" where to pick up a properly configured GuidSerializer for the tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand this exclusion. Is the default guid serializer different from the one used by serialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to improve the comment, I agree it's not super clear.
The issue here is that when we use a string path, we're going to get the default serializer for the value, so GuidSerializer
. In v3.0 we decided to have GuidRepresentation.Unspecified
as the default Guid representation, and this will throw an exception when trying to serialize/deserialize. Normally a user would go over this by registering a default GuidSerializer
with the proper representation (BsonSerializer.Register...(GuidSerializer)
), but we can't do it because the registered serializers are global and this would be a problem for all the other tests.
For the typed path on line 305 it's not an issue since we get the default field serializer from the class, and the Guid
property has the BsonGuidRepresentation
attribute on it.
This is more of a testing issue and should be solved when we can pass down the serialization domain 😁
@@ -313,29 +319,55 @@ public void Equals_should_render_supported_type<T>( | |||
new object[] { (float)1, "1", Exp(p => p.Float), nameof(Person.Float) }, | |||
new object[] { (double)1, "1", Exp(p => p.Double), nameof(Person.Double) }, | |||
new object[] { DateTime.MinValue, "ISODate(\"0001-01-01T00:00:00Z\")", Exp(p => p.Birthday), "dob" }, | |||
new object[] { DateTimeOffset.MaxValue, "ISODate(\"9999-12-31T23:59:59.999Z\")", Exp(p => p.DateTimeOffset), nameof(Person.DateTimeOffset) }, | |||
new object[] { DateTimeOffset.MaxValue, """{ "DateTime" : { "$date" : "9999-12-31T23:59:59.999Z" }, "Ticks" : 3155378975999999999, "Offset" : 0 }""", Exp(p => p.DateTimeOffset), nameof(Person.DateTimeOffset) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this mean that we are changing the defaults also for non typed queries (SearchDefinitionBuilder<BsonDocument>
)?
If so, that might be considered a bug and a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it will change also for non typed queries. They would need to convert the DateTime
themselves if they want to keep the same behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably a breaking change, but I'm not sure this can be considered a bug. If you're working directly with Bson documents then I assume that you would like to control how to the values are serialized in any case.
@BorisDog, as an additional note. I've tried making an untyped query (using |
@@ -135,6 +160,8 @@ private protected enum OperatorType | |||
protected readonly SearchPathDefinition<TDocument> _path; | |||
protected readonly SearchScoreDefinition<TDocument> _score; | |||
|
|||
protected bool _useConfiguredSerialization = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this is a good idea.
Might as well call this _useTheWrongSerializer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not convinced about the naming or the actual API/solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this bool
should even exist.
I somewhat tongue in cheek suggested a name that illustrates why this makes no sense to me.
We have a bug: we are using the wrong serializer.
We are fixing the bug: we now use the correct serializer.
But somehow we think a user would like to use the wrong serializer???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point, but I suppose that some developers could already use this feature as is, and this would be considered a breaking change (even though it was not correct). The idea here would be to support this behaviour until next major and then get rid of it.
@BorisDog am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this sounds like a valid concern, I can't figure out any scenario in which it would actually be valid.
In essence we are replacing are "hand rolled" serializer (the ToBsonValue
method) with using the correct serializer.
There are only two scenarios:
- ToBsonValue and the configured serializer return the same serialized values
- Or they return different values
In the first case there is no problem.
In the second case the value returned by ToBsonValue
is wrong and I don't see how the query could have worked, and therefore why we would keep the existing behavior.
Can you point me to a test that passes when _useConfiguredSerialization
is false
and fails when it is true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is with pipelines using BsonDocument
(unfortunately this happens more often than we'd like).
For example DateTimeOffset
is serialized by default as document, while users with BsonDocument
pipelines might be relying on it to be serialized as BsonDate
.
I think in case we should introduce the "correct" behavior in the next major version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@papafe can you point me to a test that passes when _useConfiguredSerialization
is false
and fails when it is true
?
I want to confirm that this is not just a hypothetical concern, and if real, whether there isn't a better way to address than turning off the fix entirely by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rstam sorry for the late reply.
If you look at the tests Equals_should_render_supported_type
(it's already in main) and Equals_with_configured_serializer_should_render_supported_type
, they are essentially the same test but with different test data (EqualsSupportedTypesTestData
and EqualsWithConfiguredSerializerSupportedTypesTestData
). The only difference between the two sets of data is the line with DateTimeOffset
, and that's because in the current implementation that is serialized as a BsonDate
while the default DateTimeOffsetSerializer
uses a document representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rstam did you have the change to look at this?
@@ -135,6 +160,8 @@ private protected enum OperatorType | |||
protected readonly SearchPathDefinition<TDocument> _path; | |||
protected readonly SearchScoreDefinition<TDocument> _score; | |||
|
|||
protected bool _useConfiguredSerialization = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this bool
should even exist.
I somewhat tongue in cheek suggested a name that illustrates why this makes no sense to me.
We have a bug: we are using the wrong serializer.
We are fixing the bug: we now use the correct serializer.
But somehow we think a user would like to use the wrong serializer???
@@ -135,6 +160,8 @@ private protected enum OperatorType | |||
protected readonly SearchPathDefinition<TDocument> _path; | |||
protected readonly SearchScoreDefinition<TDocument> _score; | |||
|
|||
protected bool _useConfiguredSerialization = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this sounds like a valid concern, I can't figure out any scenario in which it would actually be valid.
In essence we are replacing are "hand rolled" serializer (the ToBsonValue
method) with using the correct serializer.
There are only two scenarios:
- ToBsonValue and the configured serializer return the same serialized values
- Or they return different values
In the first case there is no problem.
In the second case the value returned by ToBsonValue
is wrong and I don't see how the query could have worked, and therefore why we would keep the existing behavior.
Can you point me to a test that passes when _useConfiguredSerialization
is false
and fails when it is true
?
JIRA ticket