Bug 28760 - System.Text.Encoding.UTF8.GetString returns corrupted string
Status: RESOLVED NOT_REPRODUCIBLE
Alias: None
Product: Class Libraries
Classification: Mono
Component: System
Version: 3.12.0
Hardware: PC Linux
Importance: --- normal
Target Milestone: Untriaged
Assignee: Bugzilla
Reported: 2015-04-03 21:35 UTC by webmaster
Modified: 2015-04-08 19:53 UTC (History)



Description webmaster 2015-04-03 21:35:07 UTC
mono 3.12.1
os: ubuntu server 14.04 (64 bit)

BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes("a中文显示d")) returns the following valid UTF-8 byte sequence (as a hex string):

61-E4-B8-AD-E6-96-87-E6-98-BE-E7-A4-BA-64

but the following does not return the original string:

System.Text.Encoding.UTF8.GetString(System.Text.Encoding.UTF8.GetBytes("a中文显示d"))

It's somehow corrupted: a Console.Write of the above gives "a?????a".
Comment 1 webmaster 2015-04-03 21:47:40 UTC
OK, even Console.WriteLine("a中文显示d") doesn't print the original text ...
Comment 2 webmaster 2015-04-04 00:10:41 UTC
But "a中文显示d" == System.Text.Encoding.UTF8.GetString(System.Text.Encoding.UTF8.GetBytes("a中文显示d")) returns true.
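Note: an equality check that returns true suggests the bytes survive the round trip and only the display step is failing. A minimal Node.js sketch of the same check is below. It uses only the built-in Buffer; the helper names utf8Hex and roundTrips are hypothetical, and the hex format merely imitates what BitConverter.ToString prints. Comparing hex dumps rather than printed text keeps the console's own encoding out of the picture.

```javascript
// Format a string's UTF-8 bytes as upper-case hex pairs joined by '-',
// imitating the output style of .NET's BitConverter.ToString.
function utf8Hex(text) {
  return Array.from(Buffer.from(text, 'utf8'))
    .map((b) => b.toString(16).toUpperCase().padStart(2, '0'))
    .join('-');
}

// Encode to UTF-8 bytes, decode again, and compare with the original string.
function roundTrips(text) {
  const bytes = Buffer.from(text, 'utf8');
  return bytes.toString('utf8') === text;
}

const sample = 'a中文显示d';
console.log(utf8Hex(sample));    // hex dump is pure ASCII, safe on any console
console.log(roundTrips(sample)); // true means the data itself is intact
```

If the hex dump matches the one in the description while the printed string still shows question marks, the string data is fine and the output path is the place to look.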
Comment 3 webmaster 2015-04-04 00:58:07 UTC
By the way, the above anomaly happens in a nodejs environment (mono is accessed via edge.js). I checked that when running as a top-level console application, the above problem does not appear.
Comment 4 webmaster 2015-04-04 01:04:47 UTC
In order to avoid potential problems in the marshaling layer, the text is base64 encoded in the nodejs part and decoded inside the mono part:


var edge = require('edge');
var base64 = require('js-base64').Base64;

var f = edge.func(function () {/*
    async (input) => {
        var buf = Convert.FromBase64String(input as string);
        var x = System.Text.Encoding.UTF8.GetString(buf);
        Console.WriteLine("v82clr: `" + x + "`");
        Console.WriteLine("a中文显示d" == x);
        return x;
    }
*/});

var k = "a中文显示d";
var r = f(base64.encode(k), true);
console.log('original: "' + k +'"');
console.log('clr2v8: `' + r + '`');
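Note: the base64 transport step can also be checked on the nodejs side alone, without edge or mono involved. The sketch below uses Node's built-in Buffer instead of js-base64; the helper names toBase64Utf8 and fromBase64Utf8 are hypothetical. It assumes the payload is meant to be the base64 form of the string's UTF-8 bytes, which is what the C# side decodes with Convert.FromBase64String followed by UTF8.GetString.

```javascript
// Encode text as UTF-8 bytes, then as base64, using only Node's Buffer.
function toBase64Utf8(text) {
  return Buffer.from(text, 'utf8').toString('base64');
}

// Reverse step: base64 back to bytes, then bytes decoded as UTF-8.
function fromBase64Utf8(payload) {
  return Buffer.from(payload, 'base64').toString('utf8');
}

const k = 'a中文显示d';
const payload = toBase64Utf8(k);

console.log(payload);                               // ASCII-only payload
console.log(/^[A-Za-z0-9+/]*={0,2}$/.test(payload)); // standard base64 alphabet
console.log(fromBase64Utf8(payload) === k);          // round trip on the JS side
```

If this round trip holds, the payload handed to the CLR is well-formed, so any remaining garbling is more likely to come from how the decoded text is printed than from the encoding itself.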
Comment 5 Marek Safar 2015-04-08 13:15:17 UTC
It works for me with normal console.
Comment 6 webmaster 2015-04-08 19:53:17 UTC
Did you try to run it inside the nodejs + edgejs environment, as shown above? It's fine (namely not "corrupted") if one starts the mono process inside a shell.

What I found is that on the servers specified above (two of them, one physical machine and one virtual machine) the string is not really corrupted; rather, the console inside the nodejs + edgejs environment cannot display UTF-8 strings, even though the "LANG" environment variable is set properly in the execution context. Maybe this is because the font directories were not inherited from the parent process, or because of some other problem.
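Note: one thing that can be checked from inside the node process is which locale variable actually takes effect. Under POSIX rules LC_ALL overrides LC_CTYPE, which overrides LANG, so a correct LANG alone may not be enough if one of the others is set. The sketch below is only a diagnostic aid; the helper names effectiveCtype and namesUtf8 are hypothetical, and whether the embedded mono runtime picks its console encoding from these variables in this setup is an assumption, not something confirmed in this report.

```javascript
// Return the locale variable that governs character encoding, following the
// POSIX precedence LC_ALL > LC_CTYPE > LANG. Empty values count as unset.
function effectiveCtype(env) {
  for (const name of ['LC_ALL', 'LC_CTYPE', 'LANG']) {
    if (env[name]) {
      return { name, value: env[name] };
    }
  }
  return null;
}

// True when the locale string names a UTF-8 codeset, e.g. "en_US.UTF-8".
function namesUtf8(locale) {
  return /\.utf-?8(@.*)?$/i.test(locale);
}

const found = effectiveCtype(process.env);
if (found === null) {
  console.log('no locale variable set; the C/POSIX locale applies');
} else {
  console.log(found.name + '=' + found.value + ' utf8=' + namesUtf8(found.value));
}
// Output that is piped or captured by a service manager is not a terminal.
console.log('stdout is a TTY: ' + Boolean(process.stdout.isTTY));
```

Running this both from an interactive shell and from the context where the node server is actually started would show whether the two environments differ.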

I believe it is still a problem, if not a very important one. For example, it misled us in the wrong direction while we were tracking down a real encoding-settings problem in another part of our system. But I am not sure whether it is mono's problem or not.