Let's say I have an array of bytes:
var myArr = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87 };
It has 6 elements, but it is the UTF-8 encoding of abąć, which has 4 letters. Typically you would call
Encoding.UTF8.GetString(myArr);
to convert it to a string. But let's assume that myArr is actually bigger (there are more bytes at the end), and I know before converting that I only want the first 4 letters. How can I efficiently convert this array to a string? It would also be preferable to get the index of the last byte in myArr that corresponds to the end of the converted string.
Example:
// 3 more bytes at the end of the previously defined myArr
var myArr = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
var str = MyConvert(myArr, 4); // read 4 utf8 letters
// str is "abąć"
// ideally I would also learn that MyConvert stopped at index 6 in myArr
The resulting string str should have str.Length == 4.
It looks like Decoder has your back here, in particular with the somewhat huge Convert method. I think you'd want:
var decoder = Encoding.UTF8.GetDecoder();
var chars = new char[4];
// The "true" argument is flush: no further input will be passed to this decoder.
decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
    true, out int bytesUsed, out int charsUsed, out bool completed);
Complete sample using the data in your question:
using System;
using System.Text;

public class Test
{
    static void Main()
    {
        var bytes = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
        var decoder = Encoding.UTF8.GetDecoder();
        // The output buffer size caps how many chars are decoded.
        var chars = new char[4];
        decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
            true, out int bytesUsed, out int charsUsed, out bool completed);
        Console.WriteLine($"Completed: {completed}");
        Console.WriteLine($"Bytes used: {bytesUsed}");
        Console.WriteLine($"Chars used: {charsUsed}");
        Console.WriteLine($"Text: {new string(chars, 0, charsUsed)}");
    }
}
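If you want something closer to the MyConvert call from the question, the same Decoder.Convert call can be wrapped in a small helper. This is only a sketch: the class name Utf8Prefix and the extra out parameter are made up for illustration, not part of .NET. Note that maxChars counts UTF-16 chars (the same unit as string.Length), so a character outside the Basic Multilingual Plane uses up two of them.

```csharp
using System;
using System.Text;

public static class Utf8Prefix
{
    // Hypothetical helper, not a .NET API: decodes at most maxChars
    // UTF-16 chars from the start of bytes and reports in bytesUsed
    // how many input bytes were consumed to produce them.
    public static string MyConvert(byte[] bytes, int maxChars, out int bytesUsed)
    {
        bytesUsed = 0;
        // Nothing to decode; also avoids calling Convert with an empty buffer.
        if (maxChars <= 0 || bytes.Length == 0)
        {
            return string.Empty;
        }

        var decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[maxChars];
        // flush is true because this decoder is not reused for more input.
        decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
            true, out bytesUsed, out int charsUsed, out _);
        return new string(chars, 0, charsUsed);
    }
}
```

Calling Utf8Prefix.MyConvert(myArr, 4, out int bytesUsed) on the nine-byte array from the question should return "abąć" with bytesUsed equal to 6, which is the index of the first byte that was not consumed.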
See more on this question at Stack Overflow