Improper truncation when reading from UTF-8 CHAR(n) fields containing characters outside of the Basic Multilingual Plane #1213

Open
YetNothingThunders opened this issue Feb 27, 2025 · 2 comments

@YetNothingThunders

When using the ADO.NET provider to read a UTF-8 CHAR(n) field containing at least one character outside of the Basic Multilingual Plane (e.g. any emoji), the result will be improperly truncated. As an example, reading a CHAR(1) field containing the character '😊' (code point 0x1F60A) will result in a string value containing only the high surrogate (0xD83D). If this same character is stored in a VARCHAR(1) field, reading it works as expected.
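The symptom can be reproduced without a database. The following is a minimal sketch that uses only standard .NET string APIs; the class and variable names are illustrative and not taken from the provider:

```csharp
using System;

class SurrogateDemo
{
    static void Main()
    {
        // One Unicode code point (U+1F60A), stored as two UTF-16 code units.
        string s = "\U0001F60A";

        // string.Length counts UTF-16 code units, so this prints 2.
        Console.WriteLine(s.Length);

        // Cutting after one code unit keeps only the first half of the pair.
        string cut = s.Substring(0, 1);

        // Prints D83D, the high surrogate on its own.
        Console.WriteLine(((int)cut[0]).ToString("X4"));

        // Prints True: the remaining char is an unpaired high surrogate.
        Console.WriteLine(char.IsHighSurrogate(cut[0]));
    }
}
```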

I believe the cause of this issue can be found in GdsStatement.ReadRawValue:

    var s = xdr.ReadString(innerCharset, field.Length);
    if ((field.Length % field.Charset.BytesPerCharacter) == 0 &&
        s.Length > field.CharCount)
    {
        return s.Substring(0, field.CharCount);
    }

After reading the string value from the IXdrReader, that value is truncated to remove the extra characters that were present in the buffer as padding. However, this truncation combines usage of the DbField.CharCount property (the number of Unicode code points stored in the field) with the .NET string.Length property and string.Substring method (which are based on the number of UTF-16 code units), leading to incorrect behavior when a single code point is encoded using multiple code units.
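One possible way to avoid the mismatch is to trim by code points instead of code units. The following is a rough sketch under that assumption, not the provider's actual fix: `CodePointTruncation` and `TruncateToCodePoints` are hypothetical names, and how such a helper would be wired into `ReadRawValue` is left open.

```csharp
using System;

static class CodePointTruncation
{
    // Hypothetical helper (not part of the provider): returns the prefix of s
    // that contains at most maxCodePoints Unicode code points.
    public static string TruncateToCodePoints(string s, int maxCodePoints)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (maxCodePoints < 0) throw new ArgumentOutOfRangeException(nameof(maxCodePoints));

        int index = 0; // position in UTF-16 code units
        int count = 0; // code points consumed so far

        while (index < s.Length && count < maxCodePoints)
        {
            // A valid surrogate pair takes two code units but is one code point.
            // An unpaired surrogate is counted as a single unit.
            index += char.IsSurrogatePair(s, index) ? 2 : 1;
            count++;
        }

        // Avoid allocating a new string when nothing needs to be cut.
        return index == s.Length ? s : s.Substring(0, index);
    }
}
```

With this helper, `TruncateToCodePoints("\U0001F60A", 1)` returns the full two-unit emoji, while `TruncateToCodePoints("a   ", 1)` still strips the trailing padding and returns `"a"`.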

@cincuranet
Member

That's an interesting one. :) I'll look at it.

@mrotteveel
Member

I had a similar issue in Jaybird (see FirebirdSQL/jaybird#760 and FirebirdSQL/jaybird@45ad2eb).
