Bug 4050

Summary: TAO Codesets: UTF8_Latin1_Translator doesn't handle U+007F+ codepoints properly
Product: TAO
Reporter: Daniel van den Ouden <daniel.van.den.ouden>
Component: other
Assignee: DOC Center Support List (internal) <tao-support>
Status: NEW
Severity: normal    
Priority: P3    
Version: 2.1.2   
Hardware: All   
OS: All   
URL: https://groups.google.com/forum/?fromgroups&hl=en#!topic/comp.soft-sys.ace/kkC28cPvDgU
Attachments: Patch and unit test

Description Daniel van den Ouden 2012-06-15 03:53:26 CDT
Created attachment 1405
Patch and unit test

Overview:

    Using codepoints above U+007F with the UTF8_Latin1_Translator yields incorrect results. Multibyte sequences are not handled properly when translating from UTF-8 to Latin1. When translating from Latin1 to UTF-8, codepoints in the range [U+0080 - U+00BF] are incorrectly written as single bytes instead of two-byte sequences.
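    For reference, the expected mapping can be sketched as follows. This is not the attached patch and does not use the TAO translator interface; the function names latin1_to_utf8 and utf8_to_latin1 are hypothetical and only illustrate the byte-level conversion. Every Latin1 byte from 0x80 to 0xFF must become a two-byte UTF-8 sequence with lead byte 0xC2 or 0xC3, and the reverse direction must accept exactly those sequences.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a Latin1 byte string as UTF-8.
// Bytes below 0x80 are copied; bytes 0x80..0xFF become two bytes.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // Lead byte carries the top 2 bits, trail byte the low 6 bits.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: decode UTF-8 into Latin1.
// Returns false if the input is malformed or contains a codepoint
// above U+00FF, which Latin1 cannot represent.
bool utf8_to_latin1(const std::string& in, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
            continue;
        }
        // Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
        if ((c != 0xC2 && c != 0xC3) || i + 1 >= in.size()) {
            return false;
        }
        unsigned char t = static_cast<unsigned char>(in[++i]);
        if ((t & 0xC0) != 0x80) {
            return false;  // not a continuation byte
        }
        out.push_back(static_cast<char>(((c & 0x03) << 6) | (t & 0x3F)));
    }
    return true;
}
```

    With these helpers, a string holding all bytes 0x80..0xFF should survive a Latin1 to UTF-8 to Latin1 round trip unchanged, which is the property the attached unit test checks through the translator.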

Steps to Reproduce: 

    1) Run the attached unit test. It will start a client with UTF-8 and a server with Latin1 as native codeset. The server will use the translator. The client will send two strings: one containing all ASCII codepoints, the other containing all extra Latin1 codepoints. Both strings are encoded in UTF-8. The server will translate them to Latin1, translate them back to UTF-8 and send them back to the client. The client will compare the received strings with the ones it sent to see if they're identical.

Actual Results: 

    The string with ASCII characters is handled correctly; the string with the extra Latin1 codepoints is not.

Expected Results:

    Both strings should be handled correctly.

Build Date & Platform: 

    TAO version 2.1.2, released Sat May 19 14:28:57 CEST 2012, tested on Windows XP

Additional Builds and Platforms: 

    TAO version 2.0.2, released Wed Apr 20 09:52:52 CEST 2011, tested on AIX 5.3

Additional Information:

    I have no idea how to create a patch. I've attached a zip archive containing the unit test, a unified diff for the translator cpp and a fixed version of the translator cpp.