Summary: | TAO Codesets: UTF8_Latin1_Translator doesn't handle U+007F+ codepoints properly | ||
---|---|---|---|
Product: | TAO | Reporter: | Daniel van den Ouden <daniel.van.den.ouden> |
Component: | other | Assignee: | DOC Center Support List (internal) <tao-support> |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P3 | ||
Version: | 2.1.2 | ||
Hardware: | All | ||
OS: | All | ||
URL: | https://groups.google.com/forum/?fromgroups&hl=en#!topic/comp.soft-sys.ace/kkC28cPvDgU | ||
Attachments: | Patch and unit test |
Created attachment 1405: Patch and unit test

Overview: Using codepoints above U+007F with the UTF8_Latin1_Translator yields incorrect results. Multibyte sequences are not handled properly when translating from UTF-8 to Latin1, and codepoints in the range [U+0080 - U+00BF] are incorrectly written as single bytes when translating from Latin1 to UTF-8.

Steps to Reproduce:
1) Run the attached unit test. It starts a client with UTF-8 and a server with Latin1 as the native codeset, and the server uses the translator. The client sends two strings, both encoded in UTF-8: one containing all ASCII codepoints, the other containing all additional (non-ASCII) Latin1 codepoints. The server translates them to Latin1, translates them back to UTF-8, and sends them back to the client. The client then compares the received strings with the ones it sent to check whether they are identical.

Actual Results: The string with ASCII characters is handled correctly; the other string is not.

Expected Results: Both strings should be handled correctly.

Build Date & Platform: TAO version 2.1.2, released Sat May 19 14:28:57 CEST 2012, tested on Windows XP

Additional Builds and Platforms: TAO version 2.0.2, released Wed Apr 20 09:52:52 CEST 2011, tested on AIX 5.3

Additional Information: I don't know how to create a proper patch, so I've attached a zip archive containing the unit test, a unified diff for the translator .cpp file, and a fixed version of the translator .cpp file.
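For reference, the mapping the translator has to implement can be sketched outside TAO. This is a minimal standalone sketch, not the TAO translator and not the attached fix: `latin1_to_utf8`, `utf8_to_latin1`, and `self_check` are hypothetical names, and the sketch works on plain `std::string` buffers rather than CDR streams. Because every Latin-1 byte value equals its Unicode codepoint, bytes 0x00-0x7F pass through unchanged, while bytes 0x80-0xFF must become two-byte UTF-8 sequences with lead byte 0xC2 (U+0080-U+00BF) or 0xC3 (U+00C0-U+00FF). `self_check` loosely mirrors the round trip described in the steps above.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode a Latin-1 string as UTF-8.
std::string latin1_to_utf8(const std::string& in) {
  std::string out;
  out.reserve(in.size() * 2);
  for (unsigned char c : in) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));  // ASCII: one byte, unchanged
    } else {
      // U+0080..U+00FF: 110000xx 10xxxxxx. This includes U+0080..U+00BF
      // (lead byte 0xC2), which must not be written as a single byte.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}

// Hypothetical helper: decode UTF-8 into Latin-1. Throws on malformed input
// or on any codepoint above U+00FF.
std::string utf8_to_latin1(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else if (c == 0xC2 || c == 0xC3) {
      // Two-byte sequence: consume the continuation byte as well.
      if (i + 1 >= in.size())
        throw std::runtime_error("truncated UTF-8 sequence");
      unsigned char c2 = static_cast<unsigned char>(in[++i]);
      if ((c2 & 0xC0) != 0x80)
        throw std::runtime_error("invalid UTF-8 continuation byte");
      out.push_back(static_cast<char>(((c & 0x1F) << 6) | (c2 & 0x3F)));
    } else {
      // Stray continuation byte, overlong lead (0xC0/0xC1), or the start
      // of a codepoint above U+00FF: none of these maps to Latin-1.
      throw std::runtime_error("not a UTF-8 encoding of a Latin-1 codepoint");
    }
  }
  return out;
}

// Loosely mirrors the reported test: UTF-8 strings go to a Latin-1 side,
// which decodes and re-encodes them; the echoed bytes must match.
void self_check() {
  std::string ascii, extra;
  // NUL is skipped because CORBA strings are NUL-terminated.
  for (int c = 0x01; c < 0x80; ++c) ascii.push_back(static_cast<char>(c));
  for (int c = 0x80; c <= 0xFF; ++c) extra.push_back(static_cast<char>(c));
  for (const std::string& latin1 : {ascii, extra}) {
    std::string sent = latin1_to_utf8(latin1);                // client side
    std::string echoed = latin1_to_utf8(utf8_to_latin1(sent)); // server side
    assert(echoed == sent);
  }
  // Every non-ASCII Latin-1 character takes exactly two UTF-8 bytes.
  assert(latin1_to_utf8(extra).size() == 2 * extra.size());
}
```

Rejecting unrepresentable input with an exception is only one possible policy; substituting a replacement character instead would make the non-ASCII round trip lossy for anything above U+00FF.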