c++ - codepage - How do you use WideCharToMultiByte correctly?


Here are a couple of functions (based on Brian Bondy's example) that use WideCharToMultiByte and MultiByteToWideChar to convert between std::wstring and std::string, using UTF-8 so that no data is lost.

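```cpp
#include <windows.h>
#include <string>

// Convert a wide Unicode string to a UTF-8 string
std::string utf8_encode(const std::wstring &wstr)
{
    if (wstr.empty()) return std::string();
    // First call: a NULL buffer and size 0 make the function return the number of bytes needed.
    // An explicit length (not -1) is passed, so the count excludes the NUL terminator,
    // which std::string manages on its own.
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    if (size_needed <= 0) return std::string(); // conversion failed; check GetLastError()
    std::string strTo(size_needed, 0);
    // Second call: convert directly into the string's buffer
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
    return strTo;
}

// Convert a UTF-8 string to a wide Unicode string
std::wstring utf8_decode(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    if (size_needed <= 0) return std::wstring(); // conversion failed; check GetLastError()
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}
```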
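To see why CP_UTF8 loses no data, it may help to look at what the encoding itself does. The following is a minimal portable sketch, not the Windows implementation: the function name utf16_to_utf8 is hypothetical, it takes std::u16string rather than std::wstring so it compiles where wchar_t is not 16 bits, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Windows API): encode UTF-16 as UTF-8.
// BMP characters become 1-3 bytes; surrogate pairs (code points above
// U+FFFF) become 4 bytes, so every code point survives the conversion.
std::string utf16_to_utf8(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // assumption of this sketch: unpaired surrogate -> U+FFFD
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"\u00e9") yields the two bytes C3 A9, and the surrogate pair for U+1F600 yields four bytes, F0 9F 98 80.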

I've read the documentation for WideCharToMultiByte, but I'm stuck on this parameter:

lpMultiByteStr [out] Pointer to a buffer that receives the converted string.

I'm not quite sure how to properly initialize the variable and pass it to the function.

Elaborating on the answer provided by Brian R. Bondy: here is an example showing why you can't simply size the output buffer to the number of wide characters in the source string:

```c
#include <windows.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>

/* string consisting of several Asian characters */
wchar_t wcsString[] = L"\u9580\u961c\u9640\u963f\u963b\u9644";

int main()
{
    size_t wcsChars = wcslen(wcsString);

    /* cchWideChar = -1: the input is NUL-terminated, so the returned size includes the terminator */
    int sizeRequired = WideCharToMultiByte(950, 0, wcsString, -1, NULL, 0, NULL, NULL);
    printf("Wide chars in wcsString: %u\n", (unsigned)wcsChars);
    printf("Bytes required for CP950 encoding (excluding NUL terminator): %d\n", sizeRequired - 1);

    sizeRequired = WideCharToMultiByte(CP_UTF8, 0, wcsString, -1, NULL, 0, NULL, NULL);
    printf("Bytes required for UTF8 encoding (excluding NUL terminator): %d\n", sizeRequired - 1);
    return 0;
}
```

And the output:

```
Wide chars in wcsString: 6
Bytes required for CP950 encoding (excluding NUL terminator): 12
Bytes required for UTF8 encoding (excluding NUL terminator): 18
```
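The UTF-8 figure follows directly from the encoding's byte lengths: each of the six characters lies between U+0800 and U+FFFF, so each takes 3 bytes. A minimal sketch of that count, assuming UTF-16 input (the helper utf8_bytes_needed is hypothetical and not part of the Windows API; counting an unpaired surrogate as 3 bytes assumes it is replaced by U+FFFD):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes a UTF-16 string occupies in UTF-8,
// excluding any NUL terminator.
std::size_t utf8_bytes_needed(const std::u16string &s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;                      // ASCII
        } else if (u < 0x800) {
            bytes += 2;                      // e.g. Latin accents, Greek, Cyrillic
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                      // surrogate pair: one code point, 4 bytes
            ++i;
        } else {
            bytes += 3;                      // rest of the BMP, including CJK
        }
    }
    return bytes;
}
```

With the six characters from the example, the string holds 6 code units but utf8_bytes_needed reports 18, so a buffer of wcslen(wcsString) + 1 = 7 bytes would be far too small for CP_UTF8.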

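You use the lpMultiByteStr [out] parameter by creating a new char array and then passing that array in to be filled. Sizing it as the string length + 1 leaves room for the NUL terminator after the conversion, but that is only enough when every character converts to a single byte (for example, plain ASCII text). For multibyte code pages such as CP_UTF8 or CP950, first call the function with cbMultiByte set to 0 to get the required size.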

Here are a couple of helper functions you may find useful; they show how all of the parameters are used.

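```cpp
#include <windows.h>
#include <string>

std::string wstrtostr(const std::wstring &wstr)
{
    // Convert a Unicode (UTF-16) string to a string in the ANSI code page (CP_ACP)
    // cchWideChar = -1: the input is NUL-terminated, so the size includes the terminator
    int size = WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, NULL, 0, NULL, NULL);
    if (size <= 0) return std::string();
    char *szTo = new char[size];
    // lpDefaultChar = NULL: use the code page's default for unmappable characters;
    // lpUsedDefaultChar = NULL: don't report whether that happened
    WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, szTo, size, NULL, NULL);
    std::string strTo = szTo;
    delete[] szTo;
    return strTo;
}

std::wstring strtowstr(const std::string &str)
{
    // Convert a string in the ANSI code page (CP_ACP) to a Unicode (UTF-16) string
    int size = MultiByteToWideChar(CP_ACP, 0, str.c_str(), -1, NULL, 0);
    if (size <= 0) return std::wstring();
    wchar_t *wszTo = new wchar_t[size];
    MultiByteToWideChar(CP_ACP, 0, str.c_str(), -1, wszTo, size);
    std::wstring wstrTo = wszTo;
    delete[] wszTo;
    return wstrTo;
}
```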

Whenever the documentation shows a parameter that is a pointer to a type and describes it as an output parameter, you create a variable of that type and pass in a pointer to it. The function uses that pointer to fill in your variable.

Here is a simple example to make this clearer:

```cpp
// pX is an out parameter; it fills your variable with 10.
void fillXWith10(int *pX)
{
    *pX = 10;
}

int main(int argc, char **argv)
{
    int X;
    fillXWith10(&X);
    return 0;
}
```
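The same idea applies when the out parameter is a buffer, as with lpMultiByteStr: the caller owns the memory and passes a pointer together with its size. Below is a minimal sketch using a hypothetical function copy_greeting, loosely modeled on the WideCharToMultiByte convention of returning the required size when the buffer size is 0 and returning 0 when the buffer is too small; it illustrates the pattern and does not reproduce the real API.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical function with a buffer out parameter:
// - out == nullptr or outSize == 0: write nothing, return the bytes needed (including NUL)
// - outSize too small: write nothing, return 0 to signal failure
// - otherwise: fill the caller's buffer and return the number of bytes written
int copy_greeting(char *out, int outSize)
{
    const char *greeting = "hello, world";
    const int needed = static_cast<int>(std::strlen(greeting)) + 1;
    if (out == nullptr || outSize == 0) return needed;
    if (outSize < needed) return 0;
    std::memcpy(out, greeting, needed);
    return needed;
}

// Caller side: query the size, allocate that much, then pass a pointer
// to the buffer so the function can fill it in.
std::string get_greeting()
{
    int size = copy_greeting(nullptr, 0);
    std::vector<char> buffer(size);
    if (copy_greeting(buffer.data(), size) == 0) return std::string();
    return std::string(buffer.data()); // copies up to the NUL terminator
}
```

get_greeting() returns "hello, world". The query-then-fill sequence is the same one utf8_encode uses, and std::vector frees the buffer automatically instead of requiring a matching delete[].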