When using the ADO.NET provider to read a UTF-8 CHAR(n) field containing at least one character outside of the Basic Multilingual Plane (e.g. any emoji), the result will be improperly truncated. As an example, reading a CHAR(1) field containing the character '😊' (code point 0x1F60A) will result in a string value containing only the high surrogate (0xD83D). If this same character is stored in a VARCHAR(1) field, reading it works as expected.
I believe the cause of this issue can be found in GdsStatement.ReadRawValue.
After reading the string value from the IXdrReader, that value is truncated to remove the extra characters that were present in the buffer as padding. However, this truncation combines usage of the DbField.CharCount property (the number of Unicode code points stored in the field) with the .NET string.Length property and string.Substring method (which are based on the number of UTF-16 code units), leading to incorrect behavior when a single code point is encoded using multiple code units.
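The mismatch described above can be reproduced without a database. The following is a minimal sketch, not the provider's actual code: `CharPaddingSketch`, `TruncateToCodeUnits`, and `TruncateToCodePoints` are hypothetical names, and `charCount` stands in for `DbField.CharCount`. The first method mimics a truncation that treats the code-point count as a UTF-16 code-unit count; the second shows one possible way to truncate by code points instead.

```csharp
using System;

public static class CharPaddingSketch
{
    // Mimics the problematic pattern: charCount (a code-point count) is
    // compared against string.Length and passed to Substring, both of
    // which work in UTF-16 code units.
    public static string TruncateToCodeUnits(string value, int charCount)
    {
        return value.Length > charCount ? value.Substring(0, charCount) : value;
    }

    // One possible alternative (an assumption, not the project's fix):
    // walk the string and keep at most charCount code points, counting a
    // surrogate pair as a single code point.
    public static string TruncateToCodePoints(string value, int charCount)
    {
        int units = 0;       // UTF-16 code units consumed so far
        int codePoints = 0;  // Unicode code points consumed so far
        while (units < value.Length && codePoints < charCount)
        {
            units += char.IsSurrogatePair(value, units) ? 2 : 1;
            codePoints++;
        }
        return value.Substring(0, units);
    }
}
```

With the CHAR(1) value from the report, `TruncateToCodeUnits("\U0001F60A", 1)` keeps only the high surrogate 0xD83D, while `TruncateToCodePoints("\U0001F60A", 1)` keeps the whole character. For a padded BMP value such as `"a   "`, both methods return `"a"`.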
Referenced code: GdsStatement.ReadRawValue in NETProvider/src/FirebirdSql.Data.FirebirdClient/Client/Managed/Version10/GdsStatement.cs, lines 1534 to 1539 in 4230c1e.