Decoding binary as utf-8 #132

Open
martinamps opened this issue Sep 16, 2015 · 6 comments
Comments

@martinamps

I get this trace:

Traceback (most recent call last):
  File "./test.py", line 33, in <module>
    main()
  File "./test.py", line 26, in main
    for binlogevent in stream:
  File "/usr/local/lib/python2.7/dist-packages/pymysqlreplication/binlogstream.py", line 262, in fetchone
    self.__freeze_schema)
  File "/usr/local/lib/python2.7/dist-packages/pymysqlreplication/packet.py", line 98, in __init__
    freeze_schema = freeze_schema)
  File "/usr/local/lib/python2.7/dist-packages/pymysqlreplication/event.py", line 141, in __init__
    self.query = tmp.decode("utf-8")
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 460: invalid start byte

I printed a repr of the packet and it is essentially:

INSERT INTO x(......, ip) VALUES (....,  'b\xae\xe1\xbd');

This is row-based replication, where the master was originally sent INET6_ATON('::1'), for example.
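For anyone who wants to reproduce this without a replication setup, here is a minimal sketch in Python 3 (the trace above is from Python 2.7, but the decode step fails the same way). The statement text is a hypothetical stand-in for the elided query, and `try_decode` is an illustrative helper, not part of pymysqlreplication.

```python
# Hypothetical stand-in for the elided statement; only the binary
# literal is taken from the repr shown above.
raw_query = b"INSERT INTO x(ip) VALUES ('b\xae\xe1\xbd');"


def try_decode(data):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = try_decode(raw_query)

if error is not None:
    # 0xae is a UTF-8 continuation byte, so it cannot start a character.
    print("decode failed at byte offset", error.start, "-", error.reason)
```

The byte 0xae has the bit pattern of a continuation byte, which is why the codec reports it as an invalid start byte.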

What's the recommended solution here? I'm surprised no one else has hit this as many column types leverage binary.

Thanks!

@julien-duponchelle
Owner

Can you post the type of the column?


@martinamps
Author

Apologies - it’s a varbinary(15)


@julien-duponchelle
Owner

No problem :) Thanks a lot for the report. Sorry if my question came across as rude, I had just woken up :P


@martinamps
Author

No problem at all. Let me know if I can help debug this further. For now I have just wrapped it in a try/except block so it fails a bit more gracefully, and I was planning to dig a little deeper tomorrow. Bedtime here on the west coast!


@julien-duponchelle
Owner

I think we can either ignore the invalid Unicode characters (so existing applications do not break) or return a byte string.
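A rough sketch of what those two options could look like, written for Python 3. `decode_query` is a hypothetical helper for illustration only; it is not the library's API and not necessarily how a fix would be implemented in event.py.

```python
def decode_query(raw, errors="strict"):
    """Decode a raw query as UTF-8.

    With errors="ignore" or errors="replace" the result is always text.
    With the default "strict" mode, undecodable input is returned
    unchanged as a byte string instead of raising.
    """
    try:
        return raw.decode("utf-8", errors)
    except UnicodeDecodeError:
        return raw


# Fragment containing the binary literal from the report.
raw = b"VALUES ('b\xae\xe1\xbd')"

ignored = decode_query(raw, errors="ignore")    # undecodable bytes dropped
replaced = decode_query(raw, errors="replace")  # U+FFFD substituted
as_bytes = decode_query(raw)                    # original bytes returned
```

Both lossy modes keep callers working with text but silently change the binary value, while the byte-string fallback preserves the data at the cost of the return type varying between events.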

@martinamps
Author

I agree. It would probably be useful to parse it out in the future, but for now the potential exceptions should be fixed.
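As one possible reading of "parse it out", assuming the goal is to turn the packed value produced by INET6_ATON back into a readable address once the raw bytes are available, a Python 3 sketch using the standard `ipaddress` module might look like this. `unpack_inet6_aton` is a hypothetical name for illustration.

```python
import ipaddress


def unpack_inet6_aton(value):
    """Convert a packed 4-byte (IPv4) or 16-byte (IPv6) value to text.

    ipaddress.ip_address raises ValueError for any other length.
    """
    return str(ipaddress.ip_address(value))


# The four bytes from the report, and the packed form of ::1.
ipv4_text = unpack_inet6_aton(b"b\xae\xe1\xbd")
ipv6_text = unpack_inet6_aton(b"\x00" * 15 + b"\x01")
```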

