
file_get_contents() breaks UTF-8 characters

OK. I found out that file_get_contents() is not causing this problem. There is a different cause, which I discuss in another question. Silly me.

See this question: Why does DOM change encoding?

I am loading an HTML page from an external server. The HTML is UTF-8 encoded and contains characters such as ľ, š, è, ť, ž, etc. When I load the HTML with file_get_contents() like this:

$html = file_get_contents('http://example.com/foreign.html');

It garbles the UTF-8 characters and loads Å, ¾, ¤ and similar nonsense instead of the proper UTF-8 characters.
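For readers wondering where symbols like Å and ¾ come from, here is a minimal sketch of one common mechanism, assuming some layer reads the UTF-8 bytes as ISO-8859-1 and re-encodes them. The variable names are illustrative only, and the snippet says nothing about which layer is at fault in this case.

```php
<?php
// "š" (U+0161) encoded as UTF-8 is the two bytes C5 A1.
$utf8 = "\xC5\xA1";

// Assumption: the bytes are interpreted as ISO-8859-1 and re-encoded as UTF-8.
// Each byte then becomes a character of its own.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";    // c5a1
echo bin2hex($garbled), "\n"; // c385c2a1, which displays as "Å¡"
```

The same mechanism turns "ž" (bytes C5 BE) into "Å¾", which matches the symptoms described above.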

How can I solve this?

UPDATE:

I tried saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, which means that file_get_contents() is already returning broken HTML.
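One way to test that conclusion is to look at the raw bytes before anything else touches them. The helper below is a hypothetical sketch (describe_bytes is not a PHP function); it only reports whether a string is well-formed UTF-8 and prints its first bytes in hex.

```php
<?php
// Hypothetical helper: says whether a byte string is well-formed UTF-8
// and shows its first bytes in hex for manual inspection.
function describe_bytes(string $bytes, int $limit = 16): string
{
    $valid = mb_check_encoding($bytes, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return $valid . ': ' . bin2hex(substr($bytes, 0, $limit));
}

echo describe_bytes("\xC5\xBE"), "\n"; // "ž" as UTF-8
echo describe_bytes("\xBE"), "\n";     // a lone ISO-8859-1 byte
```

Passing the result of file_get_contents() to such a helper would show whether the bytes are already damaged on arrival or only later, when they are escaped or printed.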

UPDATE 2:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Language" content="sk" />
<title>Test</title>
</head>
<body>
<?php
$html = file_get_contents('http://example.com');
echo htmlentities($html);
?>
</body>
</html>

I think you simply have a double character set conversion there :D

It may be because you opened an HTML document inside an HTML document. So in the end you have something that looks like this:

<!DOCTYPE html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<!DOCTYPE html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Test</title>.......

Using mb_detect_encoding may therefore lead you to other problems.
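As a possible illustration of a double conversion: the test page in the question calls htmlentities() without naming a charset. Before PHP 5.4 that function assumed ISO-8859-1 by default, so UTF-8 input was escaped byte by byte. The sketch below passes the charset explicitly to show the difference; whether this applies depends on the PHP version in use, and the asker attributes the actual problem to a different cause.

```php
<?php
$s = "\xC5\xA1"; // "š" as UTF-8

// Treating the UTF-8 bytes as ISO-8859-1 escapes each byte separately.
$wrong = htmlentities($s, ENT_QUOTES, 'ISO-8859-1');

// Declaring the real charset keeps the character intact.
$right = htmlentities($s, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only escapes markup characters and leaves "š" alone.
$plain = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n", $right, "\n";
```

On a page that is already served as UTF-8, htmlspecialchars() with an explicit 'UTF-8' argument is usually enough to display fetched markup as text.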

With Turkish text, neither mb_convert_encoding nor any other character set conversion worked.

urlencode did not work either, because it converts the space character to +. For percent-encoding it must be %20.

This one worked!

$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);
$data = file_get_contents($url);
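To make the difference concrete, here is a small sketch comparing the two encoders and wrapping the replacement step in a helper. The helper name encode_url_keep_separators and the sample URL are made up for illustration.

```php
<?php
// Hypothetical helper that mirrors the snippet above with one str_replace() call:
// percent-encode everything, then restore ":" and "/".
function encode_url_keep_separators(string $url): string
{
    return str_replace(['%3A', '%2F'], [':', '/'], rawurlencode($url));
}

$plus    = urlencode('my file.txt');    // space becomes "+"
$percent = rawurlencode('my file.txt'); // space becomes "%20"
$encoded = encode_url_keep_separators('http://example.com/my file.txt');

echo $plus, "\n", $percent, "\n", $encoded, "\n";
```

Note that this approach also encodes "?", "=" and "&", so it suits URLs without a query string; a URL with parameters would need those parts encoded separately.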

I am working with 35,000 lines of data.

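$f = fopen("veri1.txt", "r");
// fgets() returns false at end of file, so test its result instead of feof()
while (($line = fgets($f)) !== false) {
    // Convert each UTF-8 line to HTML entities
    echo mb_convert_encoding($line, 'HTML-ENTITIES', "UTF-8");
}
fclose($f);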

This code converts my strange characters into normal ones.
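A note for newer PHP versions: the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2. One possible substitute is mb_encode_numericentity, sketched below with a hypothetical wrapper name. It emits numeric entities rather than named ones, so the output differs textually but should render the same in a browser.

```php
<?php
// Hypothetical helper: replaces every non-ASCII character with a numeric
// HTML entity. The map reads [first code point, last code point, offset, mask].
function to_numeric_entities(string $utf8): string
{
    return mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

$line = "\xC5\xA1koda"; // "škoda" as UTF-8
$out  = to_numeric_entities($line);

echo $out, "\n"; // &#353;koda
```

ASCII characters fall outside the map and pass through unchanged, so existing markup in the line is not escaped.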

Try this too

$url = 'http://www.domain.com/';
$html = file_get_contents($url);
// Convert from UTF-8 to ISO-8859-1, transliterating characters that have no equivalent
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);
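Because the argument order of iconv() is easy to mix up, here is a short sketch of both directions. The first argument names the encoding the input is already in, the second the encoding you want. The sample strings are illustrative.

```php
<?php
$latin1 = "caf\xE9";     // "café" in ISO-8859-1
$utf8   = "caf\xC3\xA9"; // "café" in UTF-8

// iconv(from, to, string)
$toUtf8   = iconv('ISO-8859-1', 'UTF-8', $latin1);
$toLatin1 = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

echo bin2hex($toUtf8), "\n";   // 636166c3a9
echo bin2hex($toLatin1), "\n"; // 636166e9
```

Which direction is needed depends on what the remote page actually sends and what your own page declares, so it is worth checking both before picking one.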

I had a similar problem with Polish.

I tried:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));

I tried:

$fileEndEnd = utf8_encode($fileEndEnd);

I tried:

$fileEndEnd = iconv( "UTF-8", "UTF-8", $fileEndEnd );

And then:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");

This last one worked perfectly!
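A plausible reading of why the earlier attempts changed nothing, assuming the input was already valid UTF-8: two of them convert UTF-8 to UTF-8, and utf8_encode (deprecated as of PHP 8.2) treats its input as ISO-8859-1 and encodes it a second time. The sketch below uses mb_convert_encoding in place of utf8_encode to show the same effect.

```php
<?php
$utf8 = "\xC5\x82"; // "ł" as UTF-8, valid input

// Converting UTF-8 to UTF-8 returns the same bytes.
$viaIconv = iconv('UTF-8', 'UTF-8', $utf8);

// With only UTF-8 in the candidate list, valid input is detected as UTF-8,
// so this is also a UTF-8 to UTF-8 conversion.
$detected  = mb_detect_encoding($utf8, 'UTF-8', true);
$viaDetect = mb_convert_encoding($utf8, 'UTF-8', $detected);

// Treating the same bytes as ISO-8859-1 (what utf8_encode assumes)
// encodes each byte again and doubles the length.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($viaIconv), ' ', bin2hex($viaDetect), ' ', bin2hex($double), "\n";
```

Converting to entities works differently: it produces pure ASCII output, so the result no longer depends on which charset the page is displayed in.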

A solution suggested in the comments of the PHP manual entry for file_get_contents:

function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
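Since detection can guess wrong, a more predictable variant of the same idea is sketched here: leave well-formed UTF-8 untouched and otherwise assume one fallback encoding. The function name ensure_utf8 and the ISO-8859-1 default are assumptions for illustration, not part of the manual comment.

```php
<?php
// Hypothetical variant: keep well-formed UTF-8 as is, otherwise assume a
// single fallback encoding (ISO-8859-1 here is an assumption, not a fact
// about any particular server).
function ensure_utf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$alreadyUtf8 = ensure_utf8("caf\xC3\xA9"); // unchanged
$fromLatin1  = ensure_utf8("caf\xE9");     // converted

echo bin2hex($alreadyUtf8), ' ', bin2hex($fromLatin1), "\n";
```

It could be applied to the string returned by file_get_contents(). The trade-off is that input in any other legacy encoding will be converted incorrectly, so the fallback should match what the source really sends.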

You can also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
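For context, a minimal sketch of what that function affects: it sets the default encoding that mb_* functions use when none is passed. As far as I know it does not change the bytes that file_get_contents() returns.

```php
<?php
// Set the default encoding used by mb_* functions when none is passed.
mb_internal_encoding('UTF-8');

$s = "\xC5\xBE"; // "ž" as UTF-8

$bytes = strlen($s);    // counts bytes
$chars = mb_strlen($s); // counts characters using the internal encoding

echo $bytes, ' ', $chars, "\n"; // 2 1
```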