Issue with encoding of Unicode characters #1903

Closed
scottbarstow opened this issue Mar 3, 2017 · 23 comments

Comments

@scottbarstow
Contributor

scottbarstow commented Mar 3, 2017

When sending Japanese characters in SMS body, the message that goes out to the provider is not being encoded correctly.

If you set the SMS body to either:

Original: 日本語
Encoded (UTF-8): %e6%97%a5%e6%9c%ac%e8%aa%9e

It shows up at the destination as '???'

Here's the curl command to send the message:

curl -X POST http://sid:pwd@ip/restcomm/2012-04-24/Accounts/ACae6e420f425248d6a26948ages -d "To=number" -d "From=number" -d "Body=%e6%97%a5%e6%9c%ac%e8%aa%9e"

Here is the resulting payload from the SMS API:

<RestcommResponse>
  <SMSMessage>
    <Sid>sid</Sid>
    <DateCreated>Fri, 3 Mar 2017 01:29:16 +0000</DateCreated>
    <DateUpdated>Fri, 3 Mar 2017 01:29:16 +0000</DateUpdated>
    <DateSent/>
    <AccountSid>sid</AccountSid>
    <From>number</From>
    <To>number</To>
    <Body>日本語</Body>
    <Status>sending</Status>
    <Direction>outbound-api</Direction>
    <Price>0</Price>
    <PriceUnit>USD</PriceUnit>
    <ApiVersion>2012-04-24</ApiVersion>
    <Uri>/2012-04-24/Accounts/ACae6e420f425248d6a26948c17a9e2acf/SMS/Messages/SM85c966c2ede04fd481af30677c621440</Uri>
  </SMSMessage>
</RestcommResponse>

Last, here is how it shows up on the handset.

image
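
The `???` symptom is typical of text being pushed through a character set that cannot represent it somewhere along the delivery path. The thread does not confirm where that happens in this case, so the following is only a minimal sketch of the general mechanism, not Restcomm code. Latin-1 is used here purely as a stand-in for "a charset without Japanese characters".

```python
# Illustration only: what different encodings do with the test string.
text = "日本語"

# UTF-8 represents each of these characters with 3 bytes.
# The hex matches the percent-encoded body used in the curl command.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex())  # e697a5e69cace8aa9e

# UCS-2 / UTF-16 big-endian uses 2 bytes for each of these characters.
ucs2_bytes = text.encode("utf-16-be")
print(ucs2_bytes.hex())  # 65e5672c8a9e

# A single-byte charset has no slot for these characters, so a lossy
# encoder substitutes '?' for each one (Latin-1 is an assumed example).
lossy = text.encode("latin-1", errors="replace")
print(lossy)  # b'???'
```

Three input characters become exactly three question marks, which matches what arrives at the destination.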

@leftyb
Contributor

leftyb commented Mar 21, 2017

@scottbarstow @deruelle
You can define the encoding of the message using -d "Encoding=encoding".
At the moment we support:

UCS_2("UCS-2"),
UTF_8("UTF-8"),
GSM("GSM");

But I have been testing with Japanese and none of them is working. I will run some more tests and post an update.

Regards

@scottbarstow
Contributor Author

Thanks for the update @leftyb

leftyb pushed a commit that referenced this issue Mar 23, 2017
leftyb pushed a commit that referenced this issue Mar 23, 2017
leftyb pushed a commit that referenced this issue Apr 5, 2017
maria-farooq pushed a commit that referenced this issue Apr 6, 2017
* master:
  Path to properly handle old format S3 URL and try to normalise them and update DB This close #2029
  Improvement for G1GC settings This refer to #2019
  Patch to protect when recording file not in the filesystem. This close #2028
  Patch Improve Actor Supervisor strategy and avoid timeouts This refer to #2022
  Patch Improve Actor Supervisor strategy and avoid timeouts This refer to #2022
  Patch Improve Actor Supervisor strategy and avoid timeouts This close #2023 This close #2022
  Work in progress to Improve Actor Supervisor strategy to avoid timeouts This refer to #2023
  issue #1903 multilingua OUTBOUND SMS.
  issue #1903
@scottbarstow scottbarstow reopened this Apr 11, 2017
@scottbarstow
Contributor Author

This is still not fixed. I also tried sending Japanese from an RVD app and I get ??? for the message.

@scottbarstow
Contributor Author

I also tested it by sending the characters via an API call, as before. Same result. Use the curl command above to test and you'll be able to reproduce it. @deruelle @leftyb

@deruelle
Member

@scottbarstow can you retest? I may have spoken too soon about the upgrade, but the cloud is upgraded now. I just ran a test from Olympus and from the REST API and it's working for me: I receive the SMS with the Japanese characters.

@scottbarstow
Contributor Author

scottbarstow commented Apr 18, 2017 via email

@scottbarstow
Contributor Author

It's working @deruelle. Thanks for re-checking. Will close

@scottbarstow
Contributor Author

scottbarstow commented Apr 19, 2017

@deruelle @leftyb This fixed one thing and broke another. The message length is now limited to 70 characters even for non-Unicode messages. It appears that everything is now being encoded as Unicode, which takes 2 bytes per character.

I've proven it with the following message:

This works:
Welcome to Global Bank! Thanks for texting our new teller service. 123

This does not:
Welcome to Global Bank! Thanks for texting our new teller service. 1234

See: https://help.nexmo.com/hc/en-us/articles/204076866-How-long-is-a-single-SMS-body-
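
The 70-character cutoff follows from the size of a single SMS: 140 bytes of user data, which holds 160 characters at 7 bits each (GSM default alphabet) or 70 characters at 16 bits each (UCS-2). The sketch below only redoes that arithmetic for the two test messages. It assumes every character is in the GSM basic alphabet, which holds for these messages, and the helper name `max_chars` is made up for illustration.

```python
# One SMS carries 140 bytes of user data.
SMS_PAYLOAD_BITS = 140 * 8


def max_chars(bits_per_char):
    """Characters that fit in a single, unsplit SMS at the given width."""
    return SMS_PAYLOAD_BITS // bits_per_char


GSM7_LIMIT = max_chars(7)   # GSM default alphabet, 7 bits per character
UCS2_LIMIT = max_chars(16)  # UCS-2, 16 bits per character

ok = "Welcome to Global Bank! Thanks for texting our new teller service. 123"
too_long = ok + "4"

for msg in (ok, too_long):
    print(len(msg), len(msg) <= UCS2_LIMIT, len(msg) <= GSM7_LIMIT)
# 70 True True
# 71 False True
```

The 71-character message fits comfortably in a GSM-encoded SMS but overflows a UCS-2 one, which is consistent with every message being sent as Unicode.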

@deruelle
Member

@scottbarstow I don't think we can do anything about it in this particular issue, as Japanese needs to be sent as Unicode. SMS splitting is tracked separately in #1489. Ping me offline about the priority of that one.

@leftyb
Contributor

leftyb commented Apr 20, 2017

@deruelle @scottbarstow The only solution I can see is to use GSM encoding by default (as it was before) and allow the encoding to be specified when sending the message. For REST API requests this can be implemented quickly (as mentioned before, by adding -d "Encoding=encoding"). For RVD we may need to ask @otsakir for help adding that option.

@scottbarstow are you interested in sending SMS messages through REST API requests at the moment?

@deruelle
Member

@leftyb that's a good workaround for countries that don't use Unicode, but it will not help countries that do. For those we will need to solve #1489.

@deruelle
Member

Can we close this one @leftyb @scottbarstow ?

@scottbarstow
Contributor Author

@leftyb @deruelle If we know authoritatively that Encoding=Unicode works for API then we can close. However, we need to open an issue related to this for RVD, as right now it does not support any other encoding.

@scottbarstow
Contributor Author

scottbarstow commented Jun 27, 2017

@leftyb I just tested again with the following:

curl -X POST https://account:[email protected]/restcomm/2012-04-24/Accounts/account/SMS/Messages -d "To=to" -d "From=from" -d "Body=%e6%97%a5%e6%9c%ac%e8%aa%9e" -d "Encoding=UTF_8"
Still have the same issue, so I guess it's not fixed.

@scottbarstow
Contributor Author

I also tried UCS_2 encoding. That's slightly better (in that it renders Unicode chars), but the chars are wrong.

@scottbarstow
Contributor Author

@deruelle Kevin is scheduling a trip to Asia mid next month. This will have to be fixed prior to that and thoroughly tested, including RVD support for Unicode.

@marca56 Can you please make sure this gets on the right people's lists to get fixed?

@marca56

marca56 commented Jul 1, 2017

@scottbarstow and @deruelle ... from the comment thread above, it looks like @leftyb was working on this. Shouldn't this be @gvagenas for Connect?

@leftyb
Contributor

leftyb commented Jul 18, 2017

@scottbarstow I have tested with various languages and the API still works properly. You don't need to use the percent-encoded version of the text, just the plain text, and then add -d "Encoding=UCS_2".

For RVD, please open a separate issue.
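
The advice to pass the plain text rather than a pre-encoded string matters most when an HTTP client library builds the form body, because the library percent-encodes the value itself. The sketch below uses only the Python standard library and the parameter names that appear in the curl commands (`To`, `From`, `Body`, `Encoding`). The values are placeholders, and nothing is actually sent to Restcomm.

```python
from urllib.parse import parse_qs, urlencode

# Plain text in, percent-encoded exactly once by the library.
form = {"To": "to", "From": "from", "Body": "日本語", "Encoding": "UCS_2"}
payload = urlencode(form)
print(payload)
# To=to&From=from&Body=%E6%97%A5%E6%9C%AC%E8%AA%9E&Encoding=UCS_2

# A server that decodes the form once recovers the original text.
decoded = parse_qs(payload)["Body"][0]
print(decoded)  # 日本語

# Passing an already percent-encoded string gets it encoded a second
# time, so the server would see literal '%e6%97...' characters.
double = urlencode({"Body": "%e6%97%a5%e6%9c%ac%e8%aa%9e"})
print(double)
# Body=%25e6%2597%25a5%25e6%259c%25ac%25e8%25aa%259e
```

With curl, `-d` sends the data as given, while `--data-urlencode` performs the encoding step for you.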

@scottbarstow
Contributor Author

@leftyb I retested this morning and it seems to work for me now as well. It was definitely a different result the other day. Thanks, I will open a separate issue for RVD.

@leftyb
Contributor

leftyb commented Jul 18, 2017

Great @scottbarstow.

@deruelle just to note that I tested with text splitting enabled, with help from @vetss, and it works as well. So it seems that the message can reach 125 characters.

@deruelle
Member

@leftyb great. Can it even go beyond 140 characters and be split correctly, even for regular encoding?
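
For reference, standard concatenated SMS reserves 6 bytes of each part for a header that lets the handset reassemble the message, so each part carries fewer characters than an unsplit SMS. The sketch below works out the resulting part counts. It assumes the common 6-byte header and does not describe how Restcomm itself splits messages, which this thread leaves open. The function name `segments` is made up for illustration.

```python
import math

PAYLOAD_BYTES = 140  # user data in one SMS
UDH_BYTES = 6        # assumed concatenation header per part


def segments(units, bits_per_unit):
    """Number of SMS parts for a message of `units` characters.

    bits_per_unit is 7 for the GSM default alphabet, 16 for UCS-2.
    """
    single = PAYLOAD_BYTES * 8 // bits_per_unit
    if units <= single:
        return 1  # fits unsplit, no header needed
    per_part = (PAYLOAD_BYTES - UDH_BYTES) * 8 // bits_per_unit
    return math.ceil(units / per_part)


print(segments(160, 7), segments(161, 7))  # 1 2
print(segments(70, 16), segments(71, 16))  # 1 2
```

Under these assumptions each part of a split message holds 153 GSM characters or 67 UCS-2 characters.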

@leftyb
Contributor

leftyb commented Jul 19, 2017

@deruelle It seems that there is a limitation in RC itself. I will need to check on that.

@deruelle
Member

@leftyb let's open a separate issue to track that then.
