php - utf8 - How to remove multiple UTF-8 BOM sequences before "<!DOCTYPE>"?

I'm using PHP5 (CGI) to output template files from the filesystem and am having trouble spitting out raw HTML.

private function fetch($name)
{
    $path = $this->j->config['template_path'] . $name . '.html';
    if (!file_exists($path)) {
        dbgerror('Could not find the template "' . $name . '" in ' . $path);
    }
    $f = fopen($path, 'r');
    $t = fread($f, filesize($path));
    fclose($f);
    if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
        $t = substr($t, 3);
    }
    return $t;
}

Even though I've added the BOM fix, I'm still having problems getting Firefox to accept it. You can see a live copy here: http://ircb.in/jisti/ (and the template file, which I put up at http://ircb.in/jisti/home.html if you want to check it).

Any idea how to fix this? o_o

This global function solves it for systems whose base charset is UTF-8. Thanks!

This might help. Let me know if you'd like me to expand on my thought process.

<?php
//
// labeled TESTINGSTRIPZ.php
//
define('CHARSET', 'UTF-8');
$stringy = "\xef\xbb\xbf\"quoted text\" ";
$str_find_array = array("\xef\xbb\xbf");
$str_replace_array = array('');
$RESULT = trim(
    mb_convert_encoding(
        str_replace($str_find_array, $str_replace_array, strip_tags($stringy)),
        'UTF-8',
        mb_detect_encoding(strip_tags($stringy))
    )
);
print("YOUR RESULT IS: " . $RESULT . PHP_EOL);
?>

Result:

terminal$ php TESTINGSTRIPZ.php
YOUR RESULT IS: "quoted text"
// < with no hidden char.

Another way to remove the BOM, which is the Unicode code point U+FEFF:

$str = preg_replace('/\x{FEFF}/u', '', $file);
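
Note that the unanchored pattern removes U+FEFF wherever it appears in the string, not only at the start. A minimal sketch of an anchored variant that strips only leading occurrences, assuming the input is valid UTF-8 (the function name `strip_leading_feff` is made up for illustration):

```php
<?php
// Hypothetical helper: remove one or more U+FEFF code points at the start only.
function strip_leading_feff($text)
{
    // The /u flag makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null when the subject is not valid UTF-8,
    // in which case the input is returned unchanged.
    $result = preg_replace('/^\x{FEFF}+/u', '', $text);
    return $result === null ? $text : $result;
}

// Two leading BOMs are removed; a U+FEFF in the middle is left alone.
var_dump(strip_leading_feff("\xEF\xBB\xBF\xEF\xBB\xBFabc") === 'abc');
var_dump(strip_leading_feff("a\xEF\xBB\xBFb") === "a\xEF\xBB\xBFb");
```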

If you are reading from some API using file_get_contents and get an inexplicable NULL from json_decode, check the value of json_last_error(): sometimes the value returned by file_get_contents has a stray BOM that is almost invisible when you inspect the string, but it makes json_last_error() return JSON_ERROR_SYNTAX (4).

>>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all");
=> "\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}"
>>> json_decode($json);
=> null
>>>

In this case, check the first 3 bytes; echoing them is not very useful because the BOM is invisible in most setups:

>>> substr($json, 0, 3)
=> " "
>>> substr($json, 0, 3) == pack('H*','EFBBBF');
=> true
>>>

If the line above returns TRUE for you, then a simple test may fix the problem:

>>> json_decode($json[0] == "{" ? $json : substr($json, 3))
=> {#204
     +"orgao": [
       {#203
         +"Nome": "Tribunal de Justiça",
         +"ID_Orgao": "59",
         +"Condicao": "1",
       },
     ],
     ...
   }
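
The `$json[0] == "{"` test assumes the payload is a JSON object; a payload that starts with `[` would lose its first three bytes. A rough sketch that compares against the BOM bytes instead and reports decode errors, where the helper name `json_decode_without_bom` is invented for this example and `json_last_error_msg()` requires PHP 5.5 or later:

```php
<?php
// Hypothetical helper: strip a single leading UTF-8 BOM, then decode.
function json_decode_without_bom($json)
{
    // Compare the first three bytes against the BOM itself.
    if (substr($json, 0, 3) === "\xEF\xBB\xBF") {
        $json = substr($json, 3);
    }
    $data = json_decode($json, true);
    // A literal JSON null also decodes to null, so check the error code too.
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    return $data;
}

$data = json_decode_without_bom("\xEF\xBB\xBF" . '{"orgao":[{"ID_Orgao":"59"}]}');
echo $data['orgao'][0]['ID_Orgao'], PHP_EOL; // prints 59
```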

An extra method to do the same job:

function remove_utf8_bom_head($text)
{
    if (substr(bin2hex($text), 0, 6) === 'efbbbf') {
        $text = substr($text, 3);
    }
    return $text;
}

The other methods I found did not work in my case.

I hope it helps in some special case.

If anyone is importing CSV, the code below is useful:

$header = fgetcsv($handle);
foreach ($header as $key => $val) {
    $bom = pack('H*', 'EFBBBF');
    $val = preg_replace("/^$bom/", '', $val);
    $header[$key] = $val;
}

Try:

// -------- read the file-content ----
$str = file_get_contents($source_file);
// -------- remove the utf-8 BOM ----
$str = str_replace("\xEF\xBB\xBF", '', $str);
// -------- get the Object from JSON ----
$obj = json_decode($str);

You can use the following code to remove the UTF-8 BOM:

// Remove UTF8 BOM
function remove_utf8_bom($text)
{
    $bom = pack('H*', 'EFBBBF');
    $text = preg_replace("/^$bom/", '', $text);
    return $text;
}

b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf", backslashes included. If you want to check for a BOM, you need to use double quotes, so that the \x sequences are actually interpreted as bytes:

"\xef\xbb\xbf"

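The difference between the two quoting styles is easy to check directly. A small illustration (the variable names are arbitrary):

```php
<?php
$single = '\xef\xbb\xbf'; // 12 literal characters: backslash, x, e, f, ...
$double = "\xef\xbb\xbf"; // 3 bytes: 0xEF 0xBB 0xBF

echo strlen($single), PHP_EOL;  // prints 12
echo strlen($double), PHP_EOL;  // prints 3
echo bin2hex($double), PHP_EOL; // prints efbbbf
```

So the `substr($t, 0, 3) == b'\xef\xbb\xbf'` check in the question compares three bytes against a twelve-character string and can never be true.
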
Your files also seem to contain a lot more garbage than just a single leading BOM:

$ curl http://ircb.in/jisti/ | xxd
0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef  ................
0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068  .....<!DOCTYPE h
0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561  tml>.<html>.<hea
...
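
The dump shows seven consecutive BOMs (21 bytes) before `<!DOCTYPE html>`, so removing only the first three bytes is not enough. A minimal sketch that strips every leading BOM in one pass; the function name `strip_leading_boms` is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: remove all UTF-8 BOMs (EF BB BF) at the start of a string.
function strip_leading_boms($text)
{
    // No /u flag: the pattern works on raw bytes, and PCRE itself
    // interprets the \xEF\xBB\xBF escapes inside the single-quoted pattern.
    return preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $text);
}

// Simulate the response from the dump: seven BOMs, then the markup.
$html = str_repeat("\xEF\xBB\xBF", 7) . "<!DOCTYPE html>\n";
echo strip_leading_boms($html); // prints <!DOCTYPE html>
```

In the `fetch()` method from the question, a call like this could take the place of the `substr()` check. If each BOM comes from a separate template file saved with a BOM, which seems likely but is not confirmed by the dump, re-saving those files as UTF-8 without BOM would remove the problem at its source.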