
How do I remove accents from characters in a PHP string?




I'm trying to remove accents from characters in a PHP string as the first step in making the string usable in a URL.

I'm using the following code:

    $input = "Fóø Bår";
    setlocale(LC_ALL, "en_US.utf8");
    $output = iconv("utf-8", "ascii//TRANSLIT", $input);
    print($output);

The output I would expect is something like this:

F'oo Bar

However, instead of being transliterated, the accented characters are replaced with question marks:

F?? B?r

Everything I can find online indicates that setting the locale will fix this problem, but I am already doing that. I have already checked the following details:

  1. The locale I am setting is supported by the server (it is included in the list produced by locale -a)
  2. The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (they are included in the list produced by iconv -l)
  3. The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in mercator's answer)
  4. The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE)

The cause of the problem:

The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.

Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
- PHP manual's introduction to iconv

Details about the iconv implementation that PHP uses are included in the output of the phpinfo function.

(I can't recompile PHP with the correct iconv library on the server I'm working with for this project, so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)
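As a quicker alternative to reading through the phpinfo output, the iconv extension exposes its implementation name and version as the constants ICONV_IMPL and ICONV_VERSION. The following is a minimal sketch; the helper name describe_iconv is made up for this illustration, and the exact strings reported (typically something like glibc or libiconv) depend on how PHP was built.

```php
<?php
// Hypothetical helper: reports which iconv implementation PHP was built against.
// ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension.
function describe_iconv(): string
{
    if (!extension_loaded('iconv')) {
        return 'iconv extension not loaded';
    }

    return sprintf('iconv implementation: %s, version: %s', ICONV_IMPL, ICONV_VERSION);
}

echo describe_iconv(), PHP_EOL;
```

If this reports glibc and //TRANSLIT produces question marks, one of the table-based answers is probably the more practical route.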


What about the WordPress implementation?

    function remove_accents($string) {
        if ( !preg_match('/[\x80-\xff]/', $string) )
            return $string;

        $chars = array(
        // Decompositions for Latin-1 Supplement
        chr(195).chr(128) => 'A', chr(195).chr(129) => 'A', chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
        chr(195).chr(132) => 'A', chr(195).chr(133) => 'A', chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
        chr(195).chr(137) => 'E', chr(195).chr(138) => 'E', chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
        chr(195).chr(141) => 'I', chr(195).chr(142) => 'I', chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
        chr(195).chr(146) => 'O', chr(195).chr(147) => 'O', chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
        chr(195).chr(150) => 'O', chr(195).chr(153) => 'U', chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
        chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y', chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
        chr(195).chr(161) => 'a', chr(195).chr(162) => 'a', chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
        chr(195).chr(165) => 'a', chr(195).chr(167) => 'c', chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
        chr(195).chr(170) => 'e', chr(195).chr(171) => 'e', chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
        chr(195).chr(174) => 'i', chr(195).chr(175) => 'i', chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
        chr(195).chr(179) => 'o', chr(195).chr(180) => 'o', chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
        chr(195).chr(185) => 'u', chr(195).chr(186) => 'u', chr(195).chr(187) => 'u', chr(195).chr(188) => 'u',
        chr(195).chr(189) => 'y', chr(195).chr(191) => 'y',
        // Decompositions for Latin Extended-A
        chr(196).chr(128) => 'A', chr(196).chr(129) => 'a', chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
        chr(196).chr(132) => 'A', chr(196).chr(133) => 'a', chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
        chr(196).chr(136) => 'C', chr(196).chr(137) => 'c', chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
        chr(196).chr(140) => 'C', chr(196).chr(141) => 'c', chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
        chr(196).chr(144) => 'D', chr(196).chr(145) => 'd', chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
        chr(196).chr(148) => 'E', chr(196).chr(149) => 'e', chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
        chr(196).chr(152) => 'E', chr(196).chr(153) => 'e', chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
        chr(196).chr(156) => 'G', chr(196).chr(157) => 'g', chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
        chr(196).chr(160) => 'G', chr(196).chr(161) => 'g', chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
        chr(196).chr(164) => 'H', chr(196).chr(165) => 'h', chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
        chr(196).chr(168) => 'I', chr(196).chr(169) => 'i', chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
        chr(196).chr(172) => 'I', chr(196).chr(173) => 'i', chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
        chr(196).chr(176) => 'I', chr(196).chr(177) => 'i', chr(196).chr(178) => 'IJ', chr(196).chr(179) => 'ij',
        chr(196).chr(180) => 'J', chr(196).chr(181) => 'j', chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
        chr(196).chr(184) => 'k', chr(196).chr(185) => 'L', chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
        chr(196).chr(188) => 'l', chr(196).chr(189) => 'L', chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
        chr(197).chr(128) => 'l', chr(197).chr(129) => 'L', chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
        chr(197).chr(132) => 'n', chr(197).chr(133) => 'N', chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
        chr(197).chr(136) => 'n', chr(197).chr(137) => 'N', chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
        chr(197).chr(140) => 'O', chr(197).chr(141) => 'o', chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
        chr(197).chr(144) => 'O', chr(197).chr(145) => 'o', chr(197).chr(146) => 'OE', chr(197).chr(147) => 'oe',
        chr(197).chr(148) => 'R', chr(197).chr(149) => 'r', chr(197).chr(150) => 'R', chr(197).chr(151) => 'r',
        chr(197).chr(152) => 'R', chr(197).chr(153) => 'r', chr(197).chr(154) => 'S', chr(197).chr(155) => 's',
        chr(197).chr(156) => 'S', chr(197).chr(157) => 's', chr(197).chr(158) => 'S', chr(197).chr(159) => 's',
        chr(197).chr(160) => 'S', chr(197).chr(161) => 's', chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
        chr(197).chr(164) => 'T', chr(197).chr(165) => 't', chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
        chr(197).chr(168) => 'U', chr(197).chr(169) => 'u', chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
        chr(197).chr(172) => 'U', chr(197).chr(173) => 'u', chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
        chr(197).chr(176) => 'U', chr(197).chr(177) => 'u', chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
        chr(197).chr(180) => 'W', chr(197).chr(181) => 'w', chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
        chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z', chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
        chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z', chr(197).chr(190) => 'z', chr(197).chr(191) => 's'
        );

        $string = strtr($string, $chars);

        return $string;
    }

To better understand what this function does, see the corresponding conversion table:

À => A Á => A  => A à => A Ä => A Å => A Ç => C È => E É => E Ê => E Ë => E Ì => I Í => I Î => I Ï => I Ñ => N Ò => O Ó => O Ô => O Õ => O Ö => O Ù => U Ú => U Û => U Ü => U Ý => Y ß => s à => a á => a â => a ã => a ä => a å => a ç => c è => e é => e ê => e ë => e ì => i í => i î => i ï => i ñ => n ò => o ó => o ô => o õ => o ö => o ù => u ú => u û => u ü => u ý => y ÿ => y Ā => A ā => a Ă => A ă => a Ą => A ą => a Ć => C ć => c Ĉ => C ĉ => c Ċ => C ċ => c Č => C č => c Ď => D ď => d Đ => D đ => d Ē => E ē => e Ĕ => E ĕ => e Ė => E ė => e Ę => E ę => e Ě => E ě => e Ĝ => G ĝ => g Ğ => G ğ => g Ġ => G ġ => g Ģ => G ģ => g Ĥ => H ĥ => h Ħ => H ħ => h Ĩ => I ĩ => i Ī => I ī => i Ĭ => I ĭ => i Į => I į => i İ => I ı => i IJ => IJ ij => ij Ĵ => J ĵ => j Ķ => K ķ => k ĸ => k Ĺ => L ĺ => l Ļ => L ļ => l Ľ => L ľ => l Ŀ => L ŀ => l Ł => L ł => l Ń => N ń => n Ņ => N ņ => n Ň => N ň => n ʼn => N Ŋ => n ŋ => N Ō => O ō => o Ŏ => O ŏ => o Ő => O ő => o Œ => OE œ => oe Ŕ => R ŕ => r Ŗ => R ŗ => r Ř => R ř => r Ś => S ś => s Ŝ => S ŝ => s Ş => S ş => s Š => S š => s Ţ => T ţ => t Ť => T ť => t Ŧ => T ŧ => t Ũ => U ũ => u Ū => U ū => u Ŭ => U ŭ => u Ů => U ů => u Ű => U ű => u Ų => U ų => u Ŵ => W ŵ => w Ŷ => Y ŷ => y Ÿ => Y Ź => Z ź => z Ż => Z ż => z Ž => Z ž => z ſ => s

You can generate this conversion table yourself by simply iterating over the function's $chars array:

    foreach ($chars as $k => $v) {
        printf("%s -> %s\n", $k, $v);
    }


I just created a removeAccents method based on reading this thread and this other one too (How to remove accents and turn letters into "plain" ASCII characters?).

The method is here: https://github.com/lingtalfi/Bat/blob/master/StringTool.md#removeaccents

The tests are here: https://github.com/lingtalfi/Bat/blob/master/btests/StringTool/removeAccents/stringTool.removeAccents.test.php ,

and here is what has been tested so far:

    $a = [
        // easy
        '',
        'a',
        'après',
        'dédé fait la fête ?',
        // hard
        'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ',
        'ŻŹĆŃĄŚŁĘÓżźćńąśłęó',
        'qqqqŻŹĆŃĄŚŁĘÓżźćńąśłęóqqq',
        'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöøùúûüýÿ',
        'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
        'ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİĴĵĶķ',
        'ĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽž',
        'ſƒƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǺǻǾǿ',
        'Ǽǽ',
    ];

It converts only the accented things (letters / ligatures / cedillas / some letters with a line through them / ...?).

Here is the content of the method: ( https://github.com/lingtalfi/Bat/blob/master/StringTool.php#L83 )

    public static function removeAccents($str)
    {
        static $map = [
            // single letters
            'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'ą' => 'a', 'å' => 'a', 'ā' => 'a', 'ă' => 'a', 'ǎ' => 'a', 'ǻ' => 'a',
            'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Ą' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ă' => 'A', 'Ǎ' => 'A', 'Ǻ' => 'A',
            'ç' => 'c', 'ć' => 'c', 'ĉ' => 'c', 'ċ' => 'c', 'č' => 'c',
            'Ç' => 'C', 'Ć' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Č' => 'C',
            'ď' => 'd', 'đ' => 'd',
            'Ð' => 'D', 'Ď' => 'D', 'Đ' => 'D',
            'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ę' => 'e', 'ē' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ě' => 'e',
            'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ę' => 'E', 'Ē' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ě' => 'E',
            'ƒ' => 'f',
            'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g',
            'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G',
            'ĥ' => 'h', 'ħ' => 'h',
            'Ĥ' => 'H', 'Ħ' => 'H',
            'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ĩ' => 'i', 'ī' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ſ' => 'i', 'ǐ' => 'i',
            'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ĩ' => 'I', 'Ī' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ǐ' => 'I',
            'ĵ' => 'j',
            'Ĵ' => 'J',
            'ķ' => 'k',
            'Ķ' => 'K',
            'ł' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ľ' => 'l', 'ŀ' => 'l',
            'Ł' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ľ' => 'L', 'Ŀ' => 'L',
            'ñ' => 'n', 'ń' => 'n', 'ņ' => 'n', 'ň' => 'n', 'ʼn' => 'n',
            'Ñ' => 'N', 'Ń' => 'N', 'Ņ' => 'N', 'Ň' => 'N',
            'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ð' => 'o', 'ø' => 'o', 'ō' => 'o', 'ŏ' => 'o', 'ő' => 'o', 'ơ' => 'o', 'ǒ' => 'o', 'ǿ' => 'o',
            'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O', 'Ō' => 'O', 'Ŏ' => 'O', 'Ő' => 'O', 'Ơ' => 'O', 'Ǒ' => 'O', 'Ǿ' => 'O',
            'ŕ' => 'r', 'ŗ' => 'r', 'ř' => 'r',
            'Ŕ' => 'R', 'Ŗ' => 'R', 'Ř' => 'R',
            'ś' => 's', 'š' => 's', 'ŝ' => 's', 'ş' => 's',
            'Ś' => 'S', 'Š' => 'S', 'Ŝ' => 'S', 'Ş' => 'S',
            'ţ' => 't', 'ť' => 't', 'ŧ' => 't',
            'Ţ' => 'T', 'Ť' => 'T', 'Ŧ' => 'T',
            'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ũ' => 'u', 'ū' => 'u', 'ŭ' => 'u', 'ů' => 'u', 'ű' => 'u', 'ų' => 'u', 'ư' => 'u', 'ǔ' => 'u', 'ǖ' => 'u', 'ǘ' => 'u', 'ǚ' => 'u', 'ǜ' => 'u',
            'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ũ' => 'U', 'Ū' => 'U', 'Ŭ' => 'U', 'Ů' => 'U', 'Ű' => 'U', 'Ų' => 'U', 'Ư' => 'U', 'Ǔ' => 'U', 'Ǖ' => 'U', 'Ǘ' => 'U', 'Ǚ' => 'U', 'Ǜ' => 'U',
            'ŵ' => 'w',
            'Ŵ' => 'W',
            'ý' => 'y', 'ÿ' => 'y', 'ŷ' => 'y',
            'Ý' => 'Y', 'Ÿ' => 'Y', 'Ŷ' => 'Y',
            'ż' => 'z', 'ź' => 'z', 'ž' => 'z',
            'Ż' => 'Z', 'Ź' => 'Z', 'Ž' => 'Z',
            // accentuated ligatures
            'Ǽ' => 'A', 'ǽ' => 'a',
        ];
        return strtr($str, $map);
    }


I think the problem here is that your encodings consider ä and å to be different symbols from 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

http://ie2.php.net/strtr


When using iconv, the locale must be set:

    function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ')
    {
        echo '<tt>';
        echo iconv('utf8', 'ascii//TRANSLIT', $text);
        echo '</tt><br/>';
    }

    test_enc();
    setlocale(LC_ALL, 'cs_CZ.utf8');
    test_enc();
    setlocale(LC_ALL, 'en_US.utf8');
    test_enc();

This yields:

    ????????? ????????? f?? b?r F?? B?R ae
    escrzyaie ESCRZYAIE fo? bar FO? BAR ae
    escrzyaie ESCRZYAIE fo? bar FO? BAR ae

I don't have any locales other than cs_CZ and en_US installed, so I can't test others.

In C# I have seen a solution that converts the text to a normalized Unicode form: the accents are split off from their base letters and then filtered out by the Unicode non-spacing mark category.
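The same normalization idea can be sketched in PHP, assuming the intl extension is available: Normalizer::normalize with FORM_D decomposes each accented letter into a base letter plus combining marks, and a Unicode-aware regex then drops the marks (\p{Mn} is the non-spacing mark category). The function name strip_combining_marks is made up for this illustration. Letters that have no decomposition, such as ø, pass through unchanged.

```php
<?php
// Hypothetical helper: removes combining accents via Unicode decomposition.
// Assumes ext-intl; without it the input is returned unchanged.
function strip_combining_marks(string $text): string
{
    if (!class_exists('Normalizer')) {
        return $text;
    }

    // NFD splits "ó" into "o" followed by U+0301 (combining acute accent).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8
    }

    // Remove every non-spacing mark; the /u modifier enables UTF-8 matching.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    return $stripped ?? $text;
}

// With intl loaded this prints "Foø Bar": ó and å lose their marks, ø is kept.
echo strip_combining_marks('Fóø Bår'), PHP_EOL;
```

Because characters like ø, ł or ß survive, this approach usually still needs a small replacement table for a fully ASCII result.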


Indeed, it's a matter of taste. There are many flavors of converting such letters.

    function replaceAccents($str)
    {
        $a = array(
            'À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï',
            'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß',
            'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï',
            'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ',
            'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ',
            'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě',
            'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ',
            'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'IJ', 'ij', 'Ĵ', 'ĵ', 'Ķ', 'ķ',
            'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł',
            'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ʼn',
            'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ',
            'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř',
            'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š',
            'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ',
            'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų',
            'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ',
            'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ',
            'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ',
            'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ'
        );
        $b = array(
            'A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I',
            'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's',
            'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i',
            'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y',
            'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd',
            'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e',
            'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h',
            'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k',
            'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l',
            'N', 'n', 'N', 'n', 'N', 'n', 'n',
            'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe',
            'R', 'r', 'R', 'r', 'R', 'r',
            'S', 's', 'S', 's', 'S', 's', 'S', 's',
            'T', 't', 'T', 't', 'T', 't',
            'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u',
            'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f',
            'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o',
            'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u',
            'A', 'a', 'AE', 'ae', 'O', 'o'
        );
        return str_replace($a, $b, $str);
    }


This is a piece of code I found and use often:

    function stripAccents($stripAccents){
        return strtr($stripAccents, 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ', 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
    }


I agree with georgebrock's comment.

If you find a way to make //TRANSLIT work, you can build friendly URLs:

  1. use iconv with //TRANSLIT, e.g. ñ => n~
  2. remove non-alphanumeric, non-whitespace characters inside words: $url = preg_replace( '/(\w)[^\w\s](\w)/', '$1$2', $url );
  3. replace the remaining separators: $url = preg_replace( '/[^a-z0-9]+/', '-', $url );
  4. remove double / leading / trailing '-', e.g.: $url = preg_replace( '/(?:(^|\-)\-+|\-$)/', '', $url );

If you can't get it to work, replace step 1 with a strtr / character-based replacement, like Xetius' solution.
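The steps above can be sketched as a single function. This is only an illustration under stated assumptions: it uses the strtr fallback for step 1 with a tiny caller-supplied map (a real map would be one of the larger tables in this thread), it lowercases the string first because the pattern in step 3 only keeps a-z and 0-9, and it uses trim() for the leading and trailing dashes. The name slugify is hypothetical.

```php
<?php
// Hypothetical helper: builds a URL-friendly slug from a UTF-8 string.
// $map is a caller-supplied accent table (assumption: array form of strtr).
function slugify(string $text, array $map): string
{
    // Step 1 (fallback variant): table-based transliteration instead of iconv.
    $url = strtr($text, $map);

    // Assumption: lowercase first, since step 3 only keeps a-z and 0-9.
    $url = strtolower($url);

    // Step 2: drop punctuation that sits inside a word (e.g. the ' in "don't").
    $url = preg_replace('/(\w)[^\w\s](\w)/', '$1$2', $url);

    // Step 3: collapse every remaining run of separators into a single dash.
    $url = preg_replace('/[^a-z0-9]+/', '-', $url);

    // Step 4: strip leading and trailing dashes.
    return trim($url, '-');
}

$map = ['ó' => 'o', 'ø' => 'o', 'å' => 'a', 'ñ' => 'n', 'é' => 'e'];
echo slugify('Fóø Bår', $map), PHP_EOL; // foo-bar
```

Characters missing from the map end up as dashes in step 3, so the completeness of the table decides how readable the slug is.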


I merged Cazuma Nii Cavalcanti's implementation with Junior Mayhé's character list, hoping to save some of you some time.

    function stripAccents($str) {
        return strtr(
            utf8_decode($str),
            utf8_decode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïñòóôõöøùúûüýÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƒƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǺǻǼǽǾǿ'),
            'AAAAAAAECEEEEIIIIDNOOOOOOUUUUYsaaaaaaaeceeeeiiiinoooooouuuuyyAaAaAaCcCcCcCcDdDdEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIiIJijJjKkLlLlLlLlllNnNnNnnOoOoOoOEoeRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuWwYyYZzZzZzsfOoUuAaIiOoUuUuUuUuUuAaAEaeOo'
        );
    }


The easiest way is to use PHP's native iconv() function.

    echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', "Thîs îs à vêry wrong séntènce!");
    // output: This is a very wrong sentence!
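Since the question shows that //TRANSLIT behaves differently depending on the iconv implementation and locale, a defensive wrapper may be useful. The sketch below is an assumption-laden illustration, not part of the original answer: it tries iconv first and falls back to a caller-supplied strtr table when the conversion fails or introduces question marks that were not in the input. The name to_ascii and the $fallbackMap parameter are hypothetical.

```php
<?php
// Hypothetical helper: try iconv transliteration, fall back to a strtr table.
function to_ascii(string $text, array $fallbackMap): string
{
    // iconv() returns false on failure; @ silences the accompanying notice.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);

    // A '?' that was not in the input suggests transliteration did not work.
    $introducedQuestionMark = $converted !== false
        && strpos($converted, '?') !== false
        && strpos($text, '?') === false;

    if ($converted === false || $introducedQuestionMark) {
        return strtr($text, $fallbackMap);
    }

    return $converted;
}

// The exact result for accented input depends on the iconv implementation.
echo to_ascii('Thîs îs à vêry wrong séntènce!', ['î' => 'i', 'à' => 'a', 'ê' => 'e', 'é' => 'e', 'è' => 'e']), PHP_EOL;
```

Some implementations transliterate é as 'e (as in the F'oo Bar output the question expected), so any output may still need the punctuation clean-up described in the URL-building answer.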


I can't reproduce your problem. I get the expected output.

How exactly are you using mb_detect_encoding() to verify that your string is in fact UTF-8?

If I simply call mb_detect_encoding($input) on both a UTF-8 and an ISO-8859-1 encoded version of your string, both calls return "UTF-8", so that function isn't particularly reliable.

iconv() gives me a PHP "notice" when it receives the wrongly encoded string and only echoes "F", but that might just be due to different PHP/iconv settings or versions (?).

I suggest you try calling mb_check_encoding($input, "utf-8") first to verify that your string really is UTF-8. I think it probably isn't.
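A small sketch of that check, assuming the mbstring extension is loaded. The two byte strings are written with escape sequences so the result does not depend on how this file itself is encoded; the variable names are illustrative.

```php
<?php
// "Fóø Bår" encoded as UTF-8 (two bytes per accented letter).
$utf8 = "F\xC3\xB3\xC3\xB8 B\xC3\xA5r";

// The same text encoded as ISO-8859-1 (one byte per accented letter).
$latin1 = "F\xF3\xF8 B\xE5r";

// mb_check_encoding() validates the byte sequence against the given encoding.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

If the second case is what the input looks like, the string needs converting to UTF-8 before any of the UTF-8 based approaches in this thread will work.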


You can use a key => value array with strtr() safely for UTF-8 characters, even if they are multi-byte.

    function no_accent($str){
        $accents = array(
            'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'Ā' => 'A', 'ā' => 'a', 'Ă' => 'A', 'ă' => 'a', 'Ą' => 'A', 'ą' => 'a',
            'Ç' => 'C', 'ç' => 'c', 'Ć' => 'C', 'ć' => 'c', 'Ĉ' => 'C', 'ĉ' => 'c', 'Ċ' => 'C', 'ċ' => 'c', 'Č' => 'C', 'č' => 'c',
            'Ð' => 'D', 'ð' => 'd', 'Ď' => 'D', 'ď' => 'd', 'Đ' => 'D', 'đ' => 'd',
            'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'Ē' => 'E', 'ē' => 'e', 'Ĕ' => 'E', 'ĕ' => 'e', 'Ė' => 'E', 'ė' => 'e', 'Ę' => 'E', 'ę' => 'e', 'Ě' => 'E', 'ě' => 'e',
            'Ĝ' => 'G', 'ĝ' => 'g', 'Ğ' => 'G', 'ğ' => 'g', 'Ġ' => 'G', 'ġ' => 'g', 'Ģ' => 'G', 'ģ' => 'g',
            'Ĥ' => 'H', 'ĥ' => 'h', 'Ħ' => 'H', 'ħ' => 'h',
            'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'Ĩ' => 'I', 'ĩ' => 'i', 'Ī' => 'I', 'ī' => 'i', 'Ĭ' => 'I', 'ĭ' => 'i', 'Į' => 'I', 'į' => 'i', 'İ' => 'I', 'ı' => 'i',
            'Ĵ' => 'J', 'ĵ' => 'j',
            'Ķ' => 'K', 'ķ' => 'k', 'ĸ' => 'k',
            'Ĺ' => 'L', 'ĺ' => 'l', 'Ļ' => 'L', 'ļ' => 'l', 'Ľ' => 'L', 'ľ' => 'l', 'Ŀ' => 'L', 'ŀ' => 'l', 'Ł' => 'L', 'ł' => 'l',
            'Ñ' => 'N', 'ñ' => 'n', 'Ń' => 'N', 'ń' => 'n', 'Ņ' => 'N', 'ņ' => 'n', 'Ň' => 'N', 'ň' => 'n', 'ʼn' => 'n', 'Ŋ' => 'N', 'ŋ' => 'n',
            'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'Ō' => 'O', 'ō' => 'o', 'Ŏ' => 'O', 'ŏ' => 'o', 'Ő' => 'O', 'ő' => 'o',
            'Ŕ' => 'R', 'ŕ' => 'r', 'Ŗ' => 'R', 'ŗ' => 'r', 'Ř' => 'R', 'ř' => 'r',
            'Ś' => 'S', 'ś' => 's', 'Ŝ' => 'S', 'ŝ' => 's', 'Ş' => 'S', 'ş' => 's', 'Š' => 'S', 'š' => 's', 'ſ' => 's',
            'Ţ' => 'T', 'ţ' => 't', 'Ť' => 'T', 'ť' => 't', 'Ŧ' => 'T', 'ŧ' => 't',
            'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'Ũ' => 'U', 'ũ' => 'u', 'Ū' => 'U', 'ū' => 'u', 'Ŭ' => 'U', 'ŭ' => 'u', 'Ů' => 'U', 'ů' => 'u', 'Ű' => 'U', 'ű' => 'u', 'Ų' => 'U', 'ų' => 'u',
            'Ŵ' => 'W', 'ŵ' => 'w',
            'Ý' => 'Y', 'ý' => 'y', 'ÿ' => 'y', 'Ŷ' => 'Y', 'ŷ' => 'y', 'Ÿ' => 'Y',
            'Ź' => 'Z', 'ź' => 'z', 'Ż' => 'Z', 'ż' => 'z', 'Ž' => 'Z', 'ž' => 'z'
        );
        return strtr($str, $accents);
    }

It also saves you the UTF-8 decoding / encoding step.
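To illustrate why the array form matters for UTF-8 input, here is a minimal comparison of the two strtr() call styles. The three-argument form translates single bytes, while the array form replaces whole substrings. The variable names are illustrative, and the accented letter is written as escaped bytes so the example does not depend on the file's encoding.

```php
<?php
$input = "caf\xC3\xA9"; // "café" in UTF-8; é is the two bytes C3 A9

// Three-argument form: works byte by byte. $from is 2 bytes and $to is 1,
// so only the first byte (C3) is mapped and the stray A9 byte is left behind.
$byteWise = strtr($input, "\xC3\xA9", 'e');

// Array form: whole substrings are matched, so multi-byte keys are safe.
$pairWise = strtr($input, ["\xC3\xA9" => 'e']);

var_dump($byteWise === "cafe\xA9"); // bool(true): no longer valid UTF-8
var_dump($pairWise === 'cafe');     // bool(true)
```

This is also why the string-based stripAccents variants in this thread only behave on single-byte encodings such as ISO-8859-1.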


You can use urlencode. It doesn't do quite what you want (remove accents), but it will give you a usable URL string.

    $output = urlencode($input);
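For reference, this is roughly what that looks like for the string from the question. It is a small sketch with the UTF-8 bytes written out explicitly so the result is independent of the source file's encoding: each non-ASCII byte becomes a %XX escape and the space becomes a plus sign.

```php
<?php
// "Fóø Bår" as UTF-8 bytes: ó = C3 B3, ø = C3 B8, å = C3 A5.
$input = "F\xC3\xB3\xC3\xB8 B\xC3\xA5r";

// urlencode() percent-encodes every byte outside [A-Za-z0-9_.-] and turns spaces into '+'.
$output = urlencode($input);

echo $output, PHP_EOL; // F%C3%B3%C3%B8+B%C3%A5r
```

The result is safe to put in a URL but not especially readable, which is why the other answers strip the accents instead.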

In Perl I could just use a translate regex, but I can't think of the PHP equivalent:

    $input =~ tr/áâàå/aaaa/;

etc...

You could do this using preg_replace:

    // The u modifier makes PCRE treat patterns and subject as UTF-8, so each
    // accented letter is matched as one character rather than byte by byte.
    $patterns[0] = '/[áâàåä]/u';
    $patterns[1] = '/[ðéêèë]/u';
    $patterns[2] = '/[íîìï]/u';
    $patterns[3] = '/[óôòøõö]/u';
    $patterns[4] = '/[úûùü]/u';
    $patterns[5] = '/æ/u';
    $patterns[6] = '/ç/u';
    $patterns[7] = '/ß/u';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';
    $replacements[3] = 'o';
    $replacements[4] = 'u';
    $replacements[5] = 'ae';
    $replacements[6] = 'c';
    $replacements[7] = 'ss';

    $output = preg_replace($patterns, $replacements, $input);

(Please note this was typed from a foggy, beer-ridden Friday after-lunch memory, so it may not be 100% correct.)

Or you could make a hash table and do the replacement based on that.


A UTF-8 friendly version of the simple function posted above by Gino:

    function stripAccents($str) {
        return strtr(
            utf8_decode($str),
            utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'),
            'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY'
        );
    }

I had to come up with this because my PHP document was UTF-8 encoded.

Hope that helps.
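Note that utf8_decode() is deprecated as of PHP 8.2. A possible equivalent, sketched here under the assumption that the mbstring extension is available and that this source file is saved as UTF-8, converts both strings to ISO-8859-1 with mb_convert_encoding() so that every accented letter becomes a single byte before strtr() runs. The function name strip_accents_latin1 is made up for this illustration.

```php
<?php
// Hypothetical variant of stripAccents() without the deprecated utf8_decode().
// Assumes ext-mbstring and a UTF-8 encoded source file.
function strip_accents_latin1(string $str): string
{
    $from = 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ';
    $to   = 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY';

    // After conversion each accented letter is exactly one byte, so the
    // byte-wise three-argument strtr() lines up with the ASCII $to string.
    return strtr(
        mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8'),
        mb_convert_encoding($from, 'ISO-8859-1', 'UTF-8'),
        $to
    );
}

echo strip_accents_latin1('Ça été très drôle'), PHP_EOL; // Ca ete tres drole
```

Characters that do not exist in ISO-8859-1 are replaced by mbstring's substitute character (a question mark by default), so this only suits text limited to Western European letters.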


Here is a simple function that I generally use to remove accents:

    function str_without_accents($str, $charset='utf-8')
    {
        $str = htmlentities($str, ENT_NOQUOTES, $charset);

        $str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '\1', $str);
        $str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '\1', $str); // for ligatures, e.g. '&oelig;'
        $str = preg_replace('#&[^;]+;#', '', $str); // remove the remaining entities

        return $str; // or add this : mb_strtoupper($str); for uppercase :)
    }


If you have http://php.net/manual/en/book.intl.php available, this solves your problem:

    $string = "Fóø Bår";
    $transliterator = Transliterator::createFromRules(
        ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
        Transliterator::FORWARD
    );
    echo $normalized = $transliterator->transliterate($string);


An improved version of the remove_accents() function, following the formatting of the latest WordPress version (4.3), is:

    function mbstring_binary_safe_encoding( $reset = false ) {
        static $encodings = array();
        static $overloaded = null;

        if ( is_null( $overloaded ) )
            $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );

        if ( false === $overloaded )
            return;

        if ( ! $reset ) {
            $encoding = mb_internal_encoding();
            array_push( $encodings, $encoding );
            mb_internal_encoding( 'ISO-8859-1' );
        }

        if ( $reset && $encodings ) {
            $encoding = array_pop( $encodings );
            mb_internal_encoding( $encoding );
        }
    }

    function reset_mbstring_encoding() {
        mbstring_binary_safe_encoding( true );
    }

    function seems_utf8( $str ) {
        mbstring_binary_safe_encoding();
        $length = strlen($str);
        reset_mbstring_encoding();
        for ($i=0; $i < $length; $i++) {
            $c = ord($str[$i]);
            if ($c < 0x80) $n = 0; // 0bbbbbbb
            elseif (($c & 0xE0) == 0xC0) $n=1; // 110bbbbb
            elseif (($c & 0xF0) == 0xE0) $n=2; // 1110bbbb
            elseif (($c & 0xF8) == 0xF0) $n=3; // 11110bbb
            elseif (($c & 0xFC) == 0xF8) $n=4; // 111110bb
            elseif (($c & 0xFE) == 0xFC) $n=5; // 1111110b
            else return false; // Does not match any model
            for ($j=0; $j<$n; $j++) { // n bytes matching 10bbbbbb follow ?
                if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                    return false;
            }
        }
        return true;
    }

    function remove_accents( $string ) {
        if ( !preg_match('/[\x80-\xff]/', $string) )
            return $string;

        if (seems_utf8($string)) {
            $chars = array(
            // Decompositions for Latin-1 Supplement
            chr(194).chr(170) => 'a', chr(194).chr(186) => 'o',
            chr(195).chr(128) => 'A', chr(195).chr(129) => 'A', chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
            chr(195).chr(132) => 'A', chr(195).chr(133) => 'A', chr(195).chr(134) => 'AE', chr(195).chr(135) => 'C',
            chr(195).chr(136) => 'E', chr(195).chr(137) => 'E', chr(195).chr(138) => 'E', chr(195).chr(139) => 'E',
            chr(195).chr(140) => 'I', chr(195).chr(141) => 'I', chr(195).chr(142) => 'I', chr(195).chr(143) => 'I',
            chr(195).chr(144) => 'D', chr(195).chr(145) => 'N', chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
            chr(195).chr(148) => 'O', chr(195).chr(149) => 'O', chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
            chr(195).chr(154) => 'U', chr(195).chr(155) => 'U', chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
            chr(195).chr(158) => 'TH', chr(195).chr(159) => 's', chr(195).chr(160) => 'a', chr(195).chr(161) => 'a',
            chr(195).chr(162) => 'a', chr(195).chr(163) => 'a', chr(195).chr(164) => 'a', chr(195).chr(165) => 'a',
            chr(195).chr(166) => 'ae', chr(195).chr(167) => 'c', chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
            chr(195).chr(170) => 'e', chr(195).chr(171) => 'e', chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
            chr(195).chr(174) => 'i', chr(195).chr(175) => 'i', chr(195).chr(176) => 'd', chr(195).chr(177) => 'n',
            chr(195).chr(178) => 'o', chr(195).chr(179) => 'o', chr(195).chr(180) => 'o', chr(195).chr(181) => 'o',
            chr(195).chr(182) => 'o', chr(195).chr(184) => 'o', chr(195).chr(185) => 'u', chr(195).chr(186) => 'u',
            chr(195).chr(187) => 'u', chr(195).chr(188) => 'u', chr(195).chr(189) => 'y', chr(195).chr(190) => 'th',
            chr(195).chr(191) => 'y', chr(195).chr(152) => 'O',
            // Decompositions for Latin Extended-A
            chr(196).chr(128) => 'A', chr(196).chr(129) => 'a', chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
            chr(196).chr(132) => 'A', chr(196).chr(133) => 'a', chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
            chr(196).chr(136) => 'C', chr(196).chr(137) => 'c', chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
            chr(196).chr(140) => 'C', chr(196).chr(141) => 'c', chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
            chr(196).chr(144) => 'D', chr(196).chr(145) => 'd', chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
            chr(196).chr(148) => 'E', chr(196).chr(149) => 'e', chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
            chr(196).chr(152) => 'E', chr(196).chr(153) => 'e', chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
            chr(196).chr(156) => 'G', chr(196).chr(157) => 'g', chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
            chr(196).chr(160) => 'G', chr(196).chr(161) => 'g', chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
            chr(196).chr(164) => 'H', chr(196).chr(165) => 'h', chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
            chr(196).chr(168) => 'I', chr(196).chr(169) => 'i', chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
            chr(196).chr(172) => 'I', chr(196).chr(173) => 'i', chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
            chr(196).chr(176) => 'I', chr(196).chr(177) => 'i', chr(196).chr(178) => 'IJ', chr(196).chr(179) => 'ij',
            chr(196).chr(180) => 'J', chr(196).chr(181) => 'j', chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
            chr(196).chr(184) => 'k', chr(196).chr(185) => 'L', chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
            chr(196).chr(188) => 'l', chr(196).chr(189) => 'L', chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
            chr(197).chr(128) => 'l', chr(197).chr(129) => 'L', chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
            chr(197).chr(132) => 'n', chr(197).chr(133) => 'N', chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
            chr(197).chr(136) => 'n', chr(197).chr(137) => 'N', chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
            chr(197).chr(140) => 'O', chr(197).chr(141) => 'o', chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
            chr(197).chr(144) => 'O', chr(197).chr(145) => 'o', chr(197).chr(146) => 'OE', chr(197).chr(147) => 'oe',
            chr(197).chr(148) => 'R', chr(197).chr(149) => 'r', chr(197).chr(150) => 'R', chr(197).chr(151) => 'r',
            chr(197).chr(152) => 'R', chr(197).chr(153) => 'r', chr(197).chr(154) => 'S', chr(197).chr(155) => 's',
            chr(197).chr(156) => 'S', chr(197).chr(157) => 's', chr(197).chr(158) => 'S', chr(197).chr(159) => 's',
            chr(197).chr(160) => 'S', chr(197).chr(161) => 's', chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
            chr(197).chr(164) => 'T', chr(197).chr(165) => 't', chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
            chr(197).chr(168) => 'U', chr(197).chr(169) => 'u', chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
            chr(197).chr(172) => 'U', chr(197).chr(173) => 'u', chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
            chr(197).chr(176) => 'U', chr(197).chr(177) => 'u', chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
            chr(197).chr(180) => 'W', chr(197).chr(181) => 'w', chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
            chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z', chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
            chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z', chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
            // Decompositions for Latin Extended-B
            chr(200).chr(152) => 'S', chr(200).chr(153) => 's', chr(200).chr(154) => 'T', chr(200).chr(155) => 't',
            // Euro Sign
            chr(226).chr(130).chr(172) => 'E',
            // GBP (Pound) Sign
            chr(194).chr(163) => '',
            // Vowels with diacritic (Vietnamese)
            // unmarked
            chr(198).chr(160) => 'O', chr(198).chr(161) => 'o', chr(198).chr(175) => 'U', chr(198).chr(176) => 'u',
            // grave accent
            chr(225).chr(186).chr(166) => 'A', chr(225).chr(186).chr(167) => 'a',
            chr(225).chr(186).chr(176) => 'A', chr(225).chr(186).chr(177) => 'a',
            chr(225).chr(187).chr(128) => 'E', chr(225).chr(187).chr(129) => 'e',
            chr(225).chr(187).chr(146) => 'O', chr(225).chr(187).chr(147) => 'o',
            chr(225).chr(187).chr(156) => 'O', chr(225).chr(187).chr(157) => 'o',
            chr(225).chr(187).chr(170) => 'U', chr(225).chr(187).chr(171) => 'u',
            chr(225).chr(187).chr(178) => 'Y', chr(225).chr(187).chr(179) => 'y',
            // hook
            chr(225).chr(186).chr(162) => 'A', chr(225).chr(186).chr(163) => 'a',
            chr(225).chr(186).chr(168) => 'A', chr(225).chr(186).chr(169) => 'a',
            chr(225).chr(186).chr(178) => 'A', chr(225).chr(186).chr(179) => 'a',
            chr(225).chr(186).chr(186) => 'E', chr(225).chr(186).chr(187) => 'e',
            chr(225).chr(187).chr(130) => 'E', chr(225).chr(187).chr(131) => 'e',
            chr(225).chr(187).chr(136) => 'I', chr(225).chr(187).chr(137) => 'i',
            chr(225).chr(187).chr(142) => 'O', chr(225).chr(187).chr(143) => 'o',
            chr(225).chr(187).chr(148) => 'O', chr(225).chr(187).chr(149) => 'o',
            chr(225).chr(187).chr(158) => 'O', chr(225).chr(187).chr(159) => 'o',
            chr(225).chr(187).chr(166) => 'U', chr(225).chr(187).chr(167) => 'u',
            chr(225).chr(187).chr(172) => 'U', chr(225).chr(187).chr(173) => 'u',
            chr(225).chr(187).chr(182) => 'Y', chr(225).chr(187).chr(183) => 'y',
            // tilde
            chr(225).chr(186).chr(170) => 'A', chr(225).chr(186).chr(171) => 'a',
            chr(225).chr(186).chr(180) => 'A', chr(225).chr(186).chr(181) => 'a',
            chr(225).chr(186).chr(188) => 'E', chr(225).chr(186).chr(189) => 'e',
            chr(225).chr(187).chr(132) => 'E', chr(225).chr(187).chr(133) => 'e',
            chr(225).chr(187).chr(150) => 'O', chr(225).chr(187).chr(151) => 'o',
            chr(225).chr(187).chr(160) => 'O', chr(225).chr(187).chr(161) => 'o',
            chr(225).chr(187).chr(174) => 'U', chr(225).chr(187).chr(175) => 'u',
            chr(225).chr(187).chr(184) => 'Y', chr(225).chr(187).chr(185) => 'y',
            // acute accent
            chr(225).chr(186).chr(164) => 'A', chr(225).chr(186).chr(165) => 'a',
            chr(225).chr(186).chr(174) => 'A', chr(225).chr(186).chr(175) => 'a',
            chr(225).chr(186).chr(190) => 'E', chr(225).chr(186).chr(191) => 'e',
            chr(225).chr(187).chr(144) => 'O', chr(225).chr(187).chr(145) => 'o',
            chr(225).chr(187).chr(154) => 'O', chr(225).chr(187).chr(155) => 'o',
            chr(225).chr(187).chr(168) => 'U', chr(225).chr(187).chr(169) => 'u',
            // dot below
            chr(225).chr(186).chr(160) => 'A', chr(225).chr(186).chr(161) => 'a',
            chr(225).chr(186).chr(172) => 'A', chr(225).chr(186).chr(173) => 'a',
            chr(225).chr(186).chr(182) => 'A', chr(225).chr(186).chr(183) => 'a',
            chr(225).chr(186).chr(184) => 'E', chr(225).chr(186).chr(185) => 'e',
            chr(225).chr(187).chr(134) => 'E', chr(225).chr(187).chr(135) => 'e',
            chr(225).chr(187).chr(138) => 'I', chr(225).chr(187).chr(139) => 'i',
            chr(225).chr(187).chr(140) => 'O', chr(225).chr(187).chr(141) => 'o',
            chr(225).chr(187).chr(152) => 'O', chr(225).chr(187).chr(153) => 'o',
            chr(225).chr(187).chr(162) => 'O', chr(225).chr(187).chr(163) => 'o',
            chr(225).chr(187).chr(164) => 'U', chr(225).chr(187).chr(165) => 'u',
            chr(225).chr(187).chr(176) => 'U', chr(225).chr(187).chr(177) => 'u',
            chr(225).chr(187).chr(180) => 'Y', chr(225).chr(187).chr(181) => 'y',
            // Vowels with diacritic (Chinese, Hanyu Pinyin)
            chr(201).chr(145) => 'a',
            // macron
            chr(199).chr(149) => 'U', chr(199).chr(150) => 'u',
            // acute accent
            chr(199).chr(151) => 'U', chr(199).chr(152) => 'u',
            // caron
            chr(199).chr(141) => 'A', chr(199).chr(142) => 'a', chr(199).chr(143) => 'I', chr(199).chr(144) => 'i',
            chr(199).chr(145) => 'O', chr(199).chr(146) => 'o', chr(199).chr(147) => 'U', chr(199).chr(148) => 'u',
            chr(199).chr(153) => 'U',
chr(199).chr(154) => 'u',
// grave accent
chr(199).chr(155) => 'U', chr(199).chr(156) => 'u',
);

$string = strtr($string, $chars);
} else {
    $chars = array();
    // Assume ISO-8859-1 if not UTF-8
    $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
        .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
        .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
        .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
        .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
        .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
        .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
        .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
        .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
        .chr(252).chr(253).chr(255);

    $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";

    $string = strtr($string, $chars['in'], $chars['out']);
    $double_chars = array();
    $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
    $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
    $string = str_replace($double_chars['in'], $double_chars['out'], $string);
}

return $string;
}

My answer is an update of @dynamic's solution, since Romanian diacritics (and perhaps those of other languages) weren't converted. I wrote a minimal function, and it works like a charm.

print_r(remove_accents('Iași, Iași County, Romania'));


One trick I stumbled upon on the web is to run the string through htmlentities and then strip the encoded characters:

$stripped = preg_replace('`&[^;]+;`', '', htmlentities($string));

It is not perfect (accented characters are removed rather than replaced by their base letters), but it works well in some cases.
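A variant of the same idea keeps the base letter of each entity instead of discarding it. The following is only a sketch: the function name `strip_accents_via_entities` is made up for illustration, and it relies on the HTML 4.01 named-entity table, so characters without a named entity there (for example the Romanian ș) pass through unchanged.

```php
<?php
// Hypothetical helper, not part of any library.
function strip_accents_via_entities(string $text): string
{
    // Encode characters that have a named entity, e.g. "é" becomes "&eacute;".
    $encoded = htmlentities($text, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

    // Keep the base letter of accent entities: "&eacute;" becomes "e".
    $encoded = preg_replace(
        '/&([A-Za-z])(?:acute|cedil|caron|circ|grave|ring|slash|tilde|uml);/',
        '$1',
        $encoded
    );

    // Keep both letters of ligature entities: "&aelig;" becomes "ae".
    $encoded = preg_replace('/&([A-Za-z]{2})lig;/', '$1', $encoded);

    // Turn every remaining entity (such as "&amp;") back into its character.
    return html_entity_decode($encoded, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
}

echo strip_accents_via_entities('Fóø Bår'), "\n"; // Foo Bar
```

Decoding at the end means that symbols such as `&` survive untouched, so a separate step is still needed before the result is safe to use in a URL.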

However, you are writing about building a URL string, so urlencode and its counterpart urldecode may be a better fit. Or, if you are building a query string, use http_build_query instead.
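As a rough illustration of the encoding route, the sketch below percent-encodes the string from the question instead of removing its accents. The `/articles/` path and the `q` and `page` parameter names are invented for the example; `rawurlencode` is used for the path segment because it encodes spaces as `%20` rather than `+`.

```php
<?php
$title = 'Fóø Bår';

// Path segment: every non-ASCII byte and the space are percent-encoded.
$segment = rawurlencode($title);

// Query string: http_build_query encodes each value and joins the pairs.
// The separator is passed explicitly so the result does not depend on php.ini.
$query = http_build_query(['q' => $title, 'page' => 2], '', '&');

echo "/articles/{$segment}?{$query}\n";
// /articles/F%C3%B3%C3%B8%20B%C3%A5r?q=F%C3%B3%C3%B8+B%C3%A5r&page=2
```

The accents are preserved this way (urldecode restores the original string), which may or may not be what you want for a human-readable URL.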


The WordPress implementation is definitely the safest for UTF-8 strings. For Latin1 strings, a simple strtr does the job, but make sure you save the script in Latin1 format, not UTF-8.
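One way to sidestep the file-encoding caveat, sketched here under the assumption that the input really is ISO-8859-1, is to write the source characters as byte escapes so the script itself stays pure ASCII. The function name and the short character list are illustrative only.

```php
<?php
// Hypothetical helper covering just a handful of Latin1 characters.
function strip_accents_latin1(string $latin1): string
{
    // ISO-8859-1 bytes for: à á â è é ê ñ ó ü
    $from = "\xE0\xE1\xE2\xE8\xE9\xEA\xF1\xF3\xFC";
    $to   = "aaaeeenou";

    // The three-argument form of strtr maps byte N of $from to byte N of $to.
    return strtr($latin1, $from, $to);
}

echo strip_accents_latin1("caf\xE9"), "\n"; // cafe
```

This byte-for-byte form is exactly why it must not be used on UTF-8 input: there each accented character occupies two or more bytes, so the positions in `$from` and `$to` no longer line up.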


$unwanted_array = array( ''&amp;'' => ''and'', ''&'' => ''and'', ''@'' => ''at'', ''©'' => ''c'', ''®'' => ''r'', ''̊''=>'''',''̧''=>'''',''̨''=>'''',''̄''=>'''',''̱''=>'''', ''Á''=>''a'',''á''=>''a'',''À''=>''a'',''à''=>''a'',''Ă''=>''a'',''ă''=>''a'',''ắ''=>''a'',''Ắ''=>''A'',''Ằ''=>''A'', ''ằ''=>''a'',''ẵ''=>''a'',''Ẵ''=>''A'',''ẳ''=>''a'',''Ẳ''=>''A'',''Â''=>''a'',''â''=>''a'',''ấ''=>''a'',''Ấ''=>''A'', ''ầ''=>''a'',''Ầ''=>''a'',''ẩ''=>''a'',''Ẩ''=>''A'',''Ǎ''=>''a'',''ǎ''=>''a'',''Å''=>''a'',''å''=>''a'',''Ǻ''=>''a'', ''ǻ''=>''a'',''Ä''=>''a'',''ä''=>''a'',''ã''=>''a'',''Ã''=>''A'',''Ą''=>''a'',''ą''=>''a'',''Ā''=>''a'',''ā''=>''a'', ''ả''=>''a'',''Ả''=>''a'',''Ạ''=>''A'',''ạ''=>''a'',''ặ''=>''a'',''Ặ''=>''A'',''ậ''=>''a'',''Ậ''=>''A'',''Æ''=>''ae'', ''æ''=>''ae'',''Ǽ''=>''ae'',''ǽ''=>''ae'',''ẫ''=>''a'',''Ẫ''=>''A'', ''Ć''=>''c'',''ć''=>''c'',''Ĉ''=>''c'',''ĉ''=>''c'',''Č''=>''c'',''č''=>''c'',''Ċ''=>''c'',''ċ''=>''c'',''Ç''=>''c'',''ç''=>''c'', ''Ď''=>''d'',''ď''=>''d'',''Ḑ''=>''D'',''ḑ''=>''d'',''Đ''=>''d'',''đ''=>''d'',''Ḍ''=>''D'',''ḍ''=>''d'',''Ḏ''=>''D'',''ḏ''=>''d'',''ð''=>''d'',''Ð''=>''D'', ''É''=>''e'',''é''=>''e'',''È''=>''e'',''è''=>''e'',''Ĕ''=>''e'',''ĕ''=>''e'',''ê''=>''e'',''ế''=>''e'',''Ế''=>''E'',''ề''=>''e'', ''Ề''=>''E'',''Ě''=>''e'',''ě''=>''e'',''Ë''=>''e'',''ë''=>''e'',''Ė''=>''e'',''ė''=>''e'',''Ę''=>''e'',''ę''=>''e'',''Ē''=>''e'', ''ē''=>''e'',''ệ''=>''e'',''Ệ''=>''E'',''Ə''=>''e'',''ə''=>''e'',''ẽ''=>''e'',''Ẽ''=>''E'',''ễ''=>''e'', ''Ễ''=>''E'',''ể''=>''e'',''Ể''=>''E'',''ẻ''=>''e'',''Ẻ''=>''E'',''ẹ''=>''e'',''Ẹ''=>''E'', ''ƒ''=>''f'', ''Ğ''=>''g'',''ğ''=>''g'',''Ĝ''=>''g'',''ĝ''=>''g'',''Ǧ''=>''G'',''ǧ''=>''g'',''Ġ''=>''g'',''ġ''=>''g'',''Ģ''=>''g'',''ģ''=>''g'', ''H̲''=>''H'',''h̲''=>''h'',''Ĥ''=>''h'',''ĥ''=>''h'',''Ȟ''=>''H'',''ȟ''=>''h'',''Ḩ''=>''H'',''ḩ''=>''h'',''Ħ''=>''h'',''ħ''=>''h'',''Ḥ''=>''H'',''ḥ''=>''h'', 
''Ỉ''=>''I'',''Í''=>''i'',''í''=>''i'',''Ì''=>''i'',''ì''=>''i'',''Ĭ''=>''i'',''ĭ''=>''i'',''Î''=>''i'',''î''=>''i'',''Ǐ''=>''i'',''ǐ''=>''i'', ''Ï''=>''i'',''ï''=>''i'',''Ḯ''=>''I'',''ḯ''=>''i'',''Ĩ''=>''i'',''ĩ''=>''i'',''İ''=>''i'',''Į''=>''i'',''į''=>''i'',''Ī''=>''i'',''ī''=>''i'', ''ỉ''=>''I'',''Ị''=>''I'',''ị''=>''i'',''IJ''=>''ij'',''ij''=>''ij'',''ı''=>''i'', ''Ĵ''=>''j'',''ĵ''=>''j'', ''Ķ''=>''k'',''ķ''=>''k'',''Ḵ''=>''K'',''ḵ''=>''k'', ''Ĺ''=>''l'',''ĺ''=>''l'',''Ľ''=>''l'',''ľ''=>''l'',''Ļ''=>''l'',''ļ''=>''l'',''Ł''=>''l'',''ł''=>''l'',''Ŀ''=>''l'',''ŀ''=>''l'', ''Ń''=>''n'',''ń''=>''n'',''Ň''=>''n'',''ň''=>''n'',''Ñ''=>''N'',''ñ''=>''n'',''Ņ''=>''n'',''ņ''=>''n'',''Ṇ''=>''N'',''ṇ''=>''n'',''Ŋ''=>''n'',''ŋ''=>''n'', ''Ó''=>''o'',''ó''=>''o'',''Ò''=>''o'',''ò''=>''o'',''Ŏ''=>''o'',''ŏ''=>''o'',''Ô''=>''o'',''ô''=>''o'',''ố''=>''o'',''Ố''=>''O'',''ồ''=>''o'', ''Ồ''=>''O'',''ổ''=>''o'',''Ổ''=>''O'',''Ǒ''=>''o'',''ǒ''=>''o'',''Ö''=>''o'',''ö''=>''o'',''Ő''=>''o'',''ő''=>''o'',''Õ''=>''o'',''õ''=>''o'', ''Ø''=>''o'',''ø''=>''o'',''Ǿ''=>''o'',''ǿ''=>''o'',''Ǫ''=>''O'',''ǫ''=>''o'',''Ǭ''=>''O'',''ǭ''=>''o'',''Ō''=>''o'',''ō''=>''o'',''ỏ''=>''o'', ''Ỏ''=>''O'',''Ơ''=>''o'',''ơ''=>''o'',''ớ''=>''o'',''Ớ''=>''O'',''ờ''=>''o'',''Ờ''=>''O'',''ở''=>''o'',''Ở''=>''O'',''ợ''=>''o'',''Ợ''=>''O'', ''ọ''=>''o'',''Ọ''=>''O'',''ọ''=>''o'',''Ọ''=>''O'',''ộ''=>''o'',''Ộ''=>''O'',''ỗ''=>''o'',''Ỗ''=>''O'',''ỡ''=>''o'',''Ỡ''=>''O'', ''Œ''=>''oe'',''œ''=>''oe'', ''ĸ''=>''k'', ''Ŕ''=>''r'',''ŕ''=>''r'',''Ř''=>''r'',''ř''=>''r'',''ṙ''=>''r'',''Ŗ''=>''r'',''ŗ''=>''r'',''Ṛ''=>''R'',''ṛ''=>''r'',''Ṟ''=>''R'',''ṟ''=>''r'', ''S̲''=>''S'',''s̲''=>''s'',''Ś''=>''s'',''ś''=>''s'',''Ŝ''=>''s'',''ŝ''=>''s'',''Š''=>''s'',''š''=>''s'',''Ş''=>''s'',''ş''=>''s'', ''Ṣ''=>''S'',''ṣ''=>''s'',''Ș''=>''S'',''ș''=>''s'', ''ſ''=>''z'',''ß''=>''ss'',''Ť''=>''t'',''ť''=>''t'',''Ţ''=>''t'',''ţ''=>''t'',''Ṭ''=>''T'',''ṭ''=>''t'',''Ț''=>''T'', 
''ț''=>''t'',''Ṯ''=>''T'',''ṯ''=>''t'',''™''=>''tm'',''Ŧ''=>''t'',''ŧ''=>''t'', ''Ú''=>''u'',''ú''=>''u'',''Ù''=>''u'',''ù''=>''u'',''Ŭ''=>''u'',''ŭ''=>''u'',''Û''=>''u'',''û''=>''u'',''Ǔ''=>''u'',''ǔ''=>''u'',''Ů''=>''u'',''ů''=>''u'', ''Ü''=>''u'',''ü''=>''u'',''Ǘ''=>''u'',''ǘ''=>''u'',''Ǜ''=>''u'',''ǜ''=>''u'',''Ǚ''=>''u'',''ǚ''=>''u'',''Ǖ''=>''u'',''ǖ''=>''u'',''Ű''=>''u'',''ű''=>''u'', ''Ũ''=>''u'',''ũ''=>''u'',''Ų''=>''u'',''ų''=>''u'',''Ū''=>''u'',''ū''=>''u'',''Ư''=>''u'',''ư''=>''u'',''ứ''=>''u'',''Ứ''=>''U'',''ừ''=>''u'',''Ừ''=>''U'', ''ử''=>''u'',''Ử''=>''U'',''ự''=>''u'',''Ự''=>''U'',''ụ''=>''u'',''Ụ''=>''U'',''ủ''=>''u'',''Ủ''=>''U'',''ữ''=>''u'',''Ữ''=>''U'', ''Ŵ''=>''w'',''ŵ''=>''w'', ''Ý''=>''y'',''ý''=>''y'',''ỳ''=>''y'',''Ỳ''=>''Y'',''Ŷ''=>''y'',''ŷ''=>''y'',''ÿ''=>''y'',''Ÿ''=>''y'',''ỹ''=>''y'',''Ỹ''=>''Y'',''ỷ''=>''y'',''Ỷ''=>''Y'', ''Z̲''=>''Z'',''z̲''=>''z'',''Ź''=>''z'',''ź''=>''z'',''Ž''=>''z'',''ž''=>''z'',''Ż''=>''z'',''ż''=>''z'',''Ẕ''=>''Z'',''ẕ''=>''z'', ''þ''=>''p'',''ʼn''=>''n'',''А''=>''a'',''а''=>''a'',''Б''=>''b'',''б''=>''b'',''В''=>''v'',''в''=>''v'',''Г''=>''g'',''г''=>''g'',''Ґ''=>''g'',''ґ''=>''g'', ''Д''=>''d'',''д''=>''d'',''Е''=>''e'',''е''=>''e'',''Ё''=>''jo'',''ё''=>''jo'',''Є''=>''e'',''є''=>''e'',''Ж''=>''zh'',''ж''=>''zh'',''З''=>''z'',''з''=>''z'', ''И''=>''i'',''и''=>''i'',''І''=>''i'',''і''=>''i'',''Ї''=>''i'',''ї''=>''i'',''Й''=>''j'',''й''=>''j'',''К''=>''k'',''к''=>''k'',''Л''=>''l'',''л''=>''l'', ''М''=>''m'',''м''=>''m'',''Н''=>''n'',''н''=>''n'',''О''=>''o'',''о''=>''o'',''П''=>''p'',''п''=>''p'',''Р''=>''r'',''р''=>''r'',''С''=>''s'',''с''=>''s'', ''Т''=>''t'',''т''=>''t'',''У''=>''u'',''у''=>''u'',''Ф''=>''f'',''ф''=>''f'',''Х''=>''h'',''х''=>''h'',''Ц''=>''c'',''ц''=>''c'',''Ч''=>''ch'',''ч''=>''ch'', ''Ш''=>''sh'',''ш''=>''sh'',''Щ''=>''sch'',''щ''=>''sch'',''Ъ''=>''-'', ''ъ''=>''-'',''Ы''=>''y'',''ы''=>''y'',''Ь''=>''-'',''ь''=>''-'', 
''Э''=>''je'',''э''=>''je'',''Ю''=>''ju'',''ю''=>''ju'',''Я''=>''ja'',''я''=>''ja'',''א''=>''a'',''ב''=>''b'',''ג''=>''g'',''ד''=>''d'',''ה''=>''h'',''ו''=>''v'', ''ז''=>''z'',''ח''=>''h'',''ט''=>''t'',''י''=>''i'',''ך''=>''k'',''כ''=>''k'',''ל''=>''l'',''ם''=>''m'',''מ''=>''m'',''ן''=>''n'',''נ''=>''n'',''ס''=>''s'',''ע''=>''e'', ''ף''=>''p'',''פ''=>''p'',''ץ''=>''C'',''צ''=>''c'',''ק''=>''q'',''ר''=>''r'',''ש''=>''w'',''ת''=>''t'' ); $accentsRemoved = strtr( $stringToRemoveAccents , $unwanted_array );
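Since the goal in the question is a URL-friendly string, the array form of strtr shown above can be combined with a cleanup step. The sketch below is an assumption about how that might look, not part of the answer: `slugify` is a made-up name and the map is a tiny excerpt standing in for the full array.

```php
<?php
// Hypothetical helper; the map is a short excerpt for illustration only.
function slugify(string $text): string
{
    $map = [
        'ó' => 'o', 'ø' => 'o', 'å' => 'a',
        'é' => 'e', 'ñ' => 'n', 'ß' => 'ss',
    ];

    // With an array argument, strtr replaces whole substrings, so
    // multi-byte UTF-8 keys are matched as units rather than byte by byte.
    $ascii = strtolower(strtr($text, $map));

    // Collapse every run of characters outside a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    return trim($slug, '-');
}

echo slugify('Fóø Bår'), "\n"; // foo-bar
```

Any character missing from the map ends up as a hyphen here, so a real map needs to cover every script you expect in the input.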


<?php
/*
 * Thanks:
 * - The idea of extracting the unaccented equivalents of accented characters
 *   with the help of the HTML entities conversion was taken from the
 *   ICanBoogie package by Olivier Laviale
 *   {@link http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html}
 */
function accentCharsModifier($str){
    // Only do any work if the string contains multi-byte characters.
    if(($length=mb_strlen($str,"UTF-8"))<strlen($str)){
        $i=$count=0;
        while($i<$length){
            // Look only at characters that take more than one byte.
            if(strlen($c=mb_substr($str,$i,1,"UTF-8"))>1){
                $he=htmlentities($c);
                // Try, in order: accent entities, ligatures, then any other entity.
                if(($nC=preg_replace("#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#", "\\1", $he))!=$he ||
                   ($nC=preg_replace("#&([A-Za-z]{2})(?:lig);#", "\\1", $he))!=$he ||
                   ($nC=preg_replace("#&[^;]+;#", "", $he))!=$he){
                    $str=str_replace($c,$nC,$str,$count);
                    // If the character was removed, the string got shorter.
                    if($nC==""){$length=$length-$count;$i--;}
                }
            }
            $i++;
        }
    }
    return $str;
}
echo accentCharsModifier("&éôpkAÈû");
?>