Replacing accented characters in PHP
I am trying to replace accented characters with their plain equivalents. Below is what I am currently doing.
$string = "Éric Cantona";
$strict = strtolower($string);
echo "After Lower: ".$strict;
$patterns[0] = '/[á|â|à|å|ä]/';
$patterns[1] = '/[ð|é|ê|è|ë]/';
$patterns[2] = '/[í|î|ì|ï]/';
$patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
$patterns[4] = '/[ú|û|ù|ü]/';
$patterns[5] = '/æ/';
$patterns[6] = '/ç/';
$patterns[7] = '/ß/';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
$strict = preg_replace($patterns, $replacements, $strict);
echo "Final: ".$strict;
This gives me:
After Lower: éric cantona
Final: ric cantona
So the code above gives me ric cantona, but I want the output to be eric cantona.
Can anyone help me see where I am going wrong?
Disclaimer: I no longer stand behind this answer (I was blind at the time). But thanks for the upvotes =P
You can take this as a base. It comes from WordPress, where it is used to generate pretty URLs (the entry point is the slugify() function):
/**
* Converts all accent characters to ASCII characters.
*
* If there are no accent characters, then the string given is just returned.
*
* @param string $string Text that might have accent characters
* @return string Filtered string with replaced "nice" characters.
*/
function remove_accents($string) {
if (!preg_match('/[\x80-\xff]/', $string))
return $string;
if (seems_utf8($string)) {
$chars = array(
// Decompositions for Latin-1 Supplement
chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
chr(195).chr(191) => 'y',
// Decompositions for Latin Extended-A
chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
// Euro Sign
chr(226).chr(130).chr(172) => 'E',
// GBP (Pound) Sign
chr(194).chr(163) => '');
$string = strtr($string, $chars);
} else {
// Assume ISO-8859-1 if not UTF-8
$chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
.chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
.chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
.chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
.chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
.chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
.chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
.chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
.chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
.chr(252).chr(253).chr(255);
$chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";
$string = strtr($string, $chars['in'], $chars['out']);
$double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
$double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
$string = str_replace($double_chars['in'], $double_chars['out'], $string);
}
return $string;
}
/**
* Checks to see if a string is utf8 encoded.
*
* @author bmorel at ssi dot fr
*
* @param string $Str The string to be checked
* @return bool True if $Str fits a UTF-8 model, false otherwise.
*/
function seems_utf8($Str) { # by bmorel at ssi dot fr
$length = strlen($Str);
for ($i = 0; $i < $length; $i++) {
if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb
elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb
elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb
elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb
elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b
else return false; # Does not match any model
for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80))
return false;
}
}
return true;
}
function utf8_uri_encode($utf8_string, $length = 0) {
$unicode = '';
$values = array();
$num_octets = 1;
$unicode_length = 0;
$string_length = strlen($utf8_string);
for ($i = 0; $i < $string_length; $i++) {
$value = ord($utf8_string[$i]);
if ($value < 128) {
if ($length && ($unicode_length >= $length))
break;
$unicode .= chr($value);
$unicode_length++;
} else {
if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3;
$values[] = $value;
if ($length && ($unicode_length + ($num_octets * 3)) > $length)
break;
if (count( $values ) == $num_octets) {
if ($num_octets == 3) {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
$unicode_length += 9;
} else {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
$unicode_length += 6;
}
$values = array();
$num_octets = 1;
}
}
}
return $unicode;
}
/**
* Sanitizes title, replacing whitespace with dashes.
*
* Limits the output to alphanumeric characters, underscore (_) and dash (-).
* Whitespace becomes a dash.
*
* @param string $title The title to be sanitized.
* @return string The sanitized title.
*/
function slugify($title) {
$title = strip_tags($title);
// Preserve escaped octets.
$title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
// Remove percent signs that are not part of an octet.
$title = str_replace('%', '', $title);
// Restore octets.
$title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
$title = remove_accents($title);
if (seems_utf8($title)) {
if (function_exists('mb_strtolower')) {
$title = mb_strtolower($title, 'UTF-8');
}
$title = utf8_uri_encode($title, 200);
}
$title = strtolower($title);
$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
$title = preg_replace('/\s+/', '-', $title);
$title = preg_replace('|-+|', '-', $title);
$title = trim($title, '-');
return $title;
}
I just came across Lizard's answer, which is extremely useful, especially when you are doing some kind of sorting. Isn't it beautiful how many characters we need to say more or less the same thing ;)
If anyone else is looking for a comprehensive solution (as far as the comments above are concerned), here is the copy-and-paste version:
/**
* Replace language-specific characters by ASCII-equivalents.
* @param string $s
* @return string
*/
public static function normalizeChars($s) {
$replace = array(
'ъ'=>'-', 'Ь'=>'-', 'Ъ'=>'-', 'ь'=>'-',
'Ă'=>'A', 'Ą'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae',
'Þ'=>'B',
'Ć'=>'C', 'ץ'=>'C', 'Ç'=>'C',
'È'=>'E', 'Ę'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E',
'Ğ'=>'G',
'İ'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I',
'Ł'=>'L',
'Ñ'=>'N', 'Ń'=>'N',
'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe',
'Ş'=>'S', 'Ś'=>'S', 'Ș'=>'S', 'Š'=>'S',
'Ț'=>'T',
'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue',
'Ý'=>'Y',
'Ź'=>'Z', 'Ž'=>'Z', 'Ż'=>'Z',
'â'=>'a', 'ǎ'=>'a', 'ą'=>'a', 'á'=>'a', 'ă'=>'a', 'ã'=>'a', 'Ǎ'=>'a', 'а'=>'a', 'А'=>'a', 'å'=>'a', 'à'=>'a', 'א'=>'a', 'Ǻ'=>'a', 'Ā'=>'a', 'ǻ'=>'a', 'ā'=>'a', 'ä'=>'ae', 'æ'=>'ae', 'Ǽ'=>'ae', 'ǽ'=>'ae',
'б'=>'b', 'ב'=>'b', 'Б'=>'b', 'þ'=>'b',
'ĉ'=>'c', 'Ĉ'=>'c', 'Ċ'=>'c', 'ć'=>'c', 'ç'=>'c', 'ц'=>'c', 'צ'=>'c', 'ċ'=>'c', 'Ц'=>'c', 'Č'=>'c', 'č'=>'c', 'Ч'=>'ch', 'ч'=>'ch',
'ד'=>'d', 'ď'=>'d', 'Đ'=>'d', 'Ď'=>'d', 'đ'=>'d', 'д'=>'d', 'Д'=>'D', 'ð'=>'d',
'є'=>'e', 'ע'=>'e', 'е'=>'e', 'Е'=>'e', 'Ə'=>'e', 'ę'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'Ē'=>'e', 'Ė'=>'e', 'ė'=>'e', 'ě'=>'e', 'Ě'=>'e', 'Є'=>'e', 'Ĕ'=>'e', 'ê'=>'e', 'ə'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e',
'ф'=>'f', 'ƒ'=>'f', 'Ф'=>'f',
'ġ'=>'g', 'Ģ'=>'g', 'Ġ'=>'g', 'Ĝ'=>'g', 'Г'=>'g', 'г'=>'g', 'ĝ'=>'g', 'ğ'=>'g', 'ג'=>'g', 'Ґ'=>'g', 'ґ'=>'g', 'ģ'=>'g',
'ח'=>'h', 'ħ'=>'h', 'Х'=>'h', 'Ħ'=>'h', 'Ĥ'=>'h', 'ĥ'=>'h', 'х'=>'h', 'ה'=>'h',
'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', 'į'=>'i', 'ĭ'=>'i', 'ı'=>'i', 'Ĭ'=>'i', 'И'=>'i', 'ĩ'=>'i', 'ǐ'=>'i', 'Ĩ'=>'i', 'Ǐ'=>'i', 'и'=>'i', 'Į'=>'i', 'י'=>'i', 'Ї'=>'i', 'Ī'=>'i', 'І'=>'i', 'ї'=>'i', 'і'=>'i', 'ī'=>'i', 'ij'=>'ij', 'IJ'=>'ij',
'й'=>'j', 'Й'=>'j', 'Ĵ'=>'j', 'ĵ'=>'j', 'я'=>'ja', 'Я'=>'ja', 'Э'=>'je', 'э'=>'je', 'ё'=>'jo', 'Ё'=>'jo', 'ю'=>'ju', 'Ю'=>'ju',
'ĸ'=>'k', 'כ'=>'k', 'Ķ'=>'k', 'К'=>'k', 'к'=>'k', 'ķ'=>'k', 'ך'=>'k',
'Ŀ'=>'l', 'ŀ'=>'l', 'Л'=>'l', 'ł'=>'l', 'ļ'=>'l', 'ĺ'=>'l', 'Ĺ'=>'l', 'Ļ'=>'l', 'л'=>'l', 'Ľ'=>'l', 'ľ'=>'l', 'ל'=>'l',
'מ'=>'m', 'М'=>'m', 'ם'=>'m', 'м'=>'m',
'ñ'=>'n', 'н'=>'n', 'Ņ'=>'n', 'ן'=>'n', 'ŋ'=>'n', 'נ'=>'n', 'Н'=>'n', 'ń'=>'n', 'Ŋ'=>'n', 'ņ'=>'n', 'ʼn'=>'n', 'Ň'=>'n', 'ň'=>'n',
'о'=>'o', 'О'=>'o', 'ő'=>'o', 'õ'=>'o', 'ô'=>'o', 'Ő'=>'o', 'ŏ'=>'o', 'Ŏ'=>'o', 'Ō'=>'o', 'ō'=>'o', 'ø'=>'o', 'ǿ'=>'o', 'ǒ'=>'o', 'ò'=>'o', 'Ǿ'=>'o', 'Ǒ'=>'o', 'ơ'=>'o', 'ó'=>'o', 'Ơ'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe',
'פ'=>'p', 'ף'=>'p', 'п'=>'p', 'П'=>'p',
'ק'=>'q',
'ŕ'=>'r', 'ř'=>'r', 'Ř'=>'r', 'ŗ'=>'r', 'Ŗ'=>'r', 'ר'=>'r', 'Ŕ'=>'r', 'Р'=>'r', 'р'=>'r',
'ș'=>'s', 'с'=>'s', 'Ŝ'=>'s', 'š'=>'s', 'ś'=>'s', 'ס'=>'s', 'ş'=>'s', 'С'=>'s', 'ŝ'=>'s', 'Щ'=>'sch', 'щ'=>'sch', 'ш'=>'sh', 'Ш'=>'sh', 'ß'=>'ss',
'т'=>'t', 'ט'=>'t', 'ŧ'=>'t', 'ת'=>'t', 'ť'=>'t', 'ţ'=>'t', 'Ţ'=>'t', 'Т'=>'t', 'ț'=>'t', 'Ŧ'=>'t', 'Ť'=>'t', '™'=>'tm',
'ū'=>'u', 'у'=>'u', 'Ũ'=>'u', 'ũ'=>'u', 'Ư'=>'u', 'ư'=>'u', 'Ū'=>'u', 'Ǔ'=>'u', 'ų'=>'u', 'Ų'=>'u', 'ŭ'=>'u', 'Ŭ'=>'u', 'Ů'=>'u', 'ů'=>'u', 'ű'=>'u', 'Ű'=>'u', 'Ǖ'=>'u', 'ǔ'=>'u', 'Ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'У'=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'Ǚ'=>'u', 'Ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue',
'в'=>'v', 'ו'=>'v', 'В'=>'v',
'ש'=>'w', 'ŵ'=>'w', 'Ŵ'=>'w',
'ы'=>'y', 'ŷ'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', 'Ŷ'=>'y',
'Ы'=>'y', 'ž'=>'z', 'З'=>'z', 'з'=>'z', 'ź'=>'z', 'ז'=>'z', 'ż'=>'z', 'ſ'=>'z', 'Ж'=>'zh', 'ж'=>'zh'
);
return strtr($s, $replace);
}
Note some slight changes regarding the German umlauts (ä => ae).
Edit: now includes more characters based on the post by user3682119 (except the copyright symbol) and daker's comment.
So I found this on the php.net page for the preg_replace function:
// replace accented chars
$string = "Zacarías Ferreíra"; // my definition for string variable
$accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/';
$string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8');
$string = preg_replace($accents,'$1',$string_encoded);
If you have encoding problems, you may get something like "Zacarás Ferreira". In that case, simply decode the string first and then use the code above:
$string = utf8_decode("ZacarÃÂas FerreÃÂra");
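The entity-based approach above can be wrapped in a small function. This is only a sketch: the name strip_accents_entities() is made up for illustration, and the tilde, ring, slash and caron suffixes are additions to the pattern quoted from php.net (so that ñ, å, ø and š are covered too). It assumes the input is valid UTF-8.

```php
<?php
// Sketch only: strip_accents_entities() is a hypothetical helper name.
function strip_accents_entities(string $text): string
{
    // Turn every character that has a named HTML entity into that entity,
    // e.g. "í" becomes "&iacute;".
    $encoded = htmlentities($text, ENT_NOQUOTES, 'UTF-8');

    // Keep only the base letter(s) of accent-style entities.
    // tilde, ring, slash and caron are additions to the original pattern.
    $pattern = '/&([A-Za-z]{1,2})(?:grave|acute|circ|cedil|uml|lig|tilde|ring|slash|caron);/';
    $stripped = preg_replace($pattern, '$1', $encoded);

    // Decode whatever entities are left (such as &amp;) back to characters.
    return html_entity_decode($stripped, ENT_NOQUOTES, 'UTF-8');
}

echo strip_accents_entities("Zacarías Ferreíra"), "\n";  // Zacarias Ferreira
```

The final html_entity_decode() call matters because htmlentities() also encodes characters such as & that have nothing to do with accents.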
As of PHP 5.4 the intl extension provides a new class called Transliterator.
I believe it is the best way to remove diacritics, for two reasons:
Transliterator is based on ICU, so it uses the tables of the ICU library. ICU is a great project, developed over the years to provide comprehensive tables and functionality. Any table you write yourself will never be as complete as ICU's.
In UTF-8, characters can be represented in different ways. For example, the character ñ can be stored as a single (multibyte) character, or as the combination of the characters ˜ (multibyte) and n. In addition, some characters in Unicode are homographs: they look the same but have different code points. For this reason it is also important to normalize the string.
Here is some sample code, taken from an old answer of mine:
<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
$normalized = $transliterator->transliterate($e);
echo $e. ' --> '.$normalized."\n";
}
?>
Result:
abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto
The first argument passed to createFromRules performs both the removal of diacritics and the normalization of the string.
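To make the point about the two representations of ñ concrete, here is a minimal sketch. The helper name strip_marks() is hypothetical. It uses the intl Normalizer class when it is loaded and otherwise only handles input that is already decomposed, so it is a reduced illustration of the idea and not a replacement for Transliterator.

```php
<?php
// Sketch only: strip_marks() is a hypothetical helper name.
function strip_marks(string $text): string
{
    // Decompose first when the intl extension is loaded, so that a
    // precomposed "ñ" becomes "n" followed by U+0303 (combining tilde).
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
        if ($decomposed !== false) {
            $text = $decomposed;
        }
    }
    // \p{Mn} matches nonspacing marks; the u modifier makes PCRE read UTF-8.
    $result = preg_replace('/\p{Mn}+/u', '', $text);
    return $result ?? $text;
}

$precomposed = "\u{00F1}";   // ñ as one code point
$combining   = "n\u{0303}";  // n followed by a combining tilde

var_dump($precomposed === $combining);  // bool(false): different bytes
echo strip_marks($combining), "\n";     // n
```

Both strings look the same on screen, which is why a lookup table keyed on the precomposed form alone can miss input that arrives decomposed.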
This worked for me:
<?php
setlocale(LC_ALL, "en_US.utf8");
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
?>
I tried all sorts of things based on the variations listed in the answers, but only the following worked:
$unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
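The array form of strtr() used above is what makes this safe for UTF-8 text. A short sketch of the difference from the three-argument form, assuming the script file itself is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8, so "é" is the two bytes C3 A9.
$name = "éric";

// Array form: whole byte strings are matched, so multibyte keys are safe.
$viaArray = strtr($name, ['é' => 'e']);

// Three-argument form: bytes are mapped one by one. "é" has two bytes and
// "e" has one, so only the first byte (C3) is mapped and A9 is left behind.
$viaBytes = strtr($name, 'é', 'e');

echo $viaArray, "\n";            // eric
echo bin2hex($viaBytes), "\n";   // 65a9726963
```

The second result is no longer valid UTF-8, which is the kind of damage that shows up as missing or garbled characters in the output.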
To remove diacritics, use iconv:
$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);
or
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
Note that PHP has some odd bugs here: it (sometimes?) needs a locale to be set with setlocale() for these conversions to work.
Edit: just tested, and it handles most of your diacritics out of the box:
$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123";
echo iconv('UTF-8','ASCII//TRANSLIT',$val);
Output:
a|a|a|a|a ?|e|e|e|e i|i|i|i o|o|o|?|o|o u|u|u|u ae c ss abc ABC 123
So you may want to fix those two odd cases (ð and ø) by hand before calling iconv, or dig into PHP's internals and fix it there.
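Fixing the two odd cases by hand before calling iconv could look like the sketch below. The helper name to_ascii(), the replacement letters chosen for ð and ø, and the locale names are all assumptions made for illustration; what iconv produces for other characters depends on the iconv implementation and the locale on your system.

```php
<?php
// Sketch only: to_ascii() is a hypothetical helper name.
function to_ascii(string $text): string
{
    // Handle the characters that came out as "?" in the test above
    // before handing the rest to iconv. The letters are a choice made here.
    $text = strtr($text, ['ð' => 'd', 'Ð' => 'D', 'ø' => 'o', 'Ø' => 'O']);

    // Transliteration can depend on the locale, so try to set a UTF-8 one
    // (these locale names are assumptions and may not exist everywhere).
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

    // @ silences the notice iconv raises when a character cannot be converted.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // Fall back to the pre-mapped text if iconv gives up.
    return $ascii === false ? $text : $ascii;
}

echo to_ascii("ð ø abc"), "\n";  // d o abc
```

Note that setlocale() changes process-wide state, so in a larger application you may prefer to set the locale once at startup.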
An updated answer based on @BurninLeo's answer:
function replace_spec_char($subject) {
$char_map = array(
"ъ" => "-", "ь" => "-", "Ъ" => "-", "Ь" => "-",
"А" => "A", "Ă" => "A", "Ǎ" => "A", "Ą" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "Ǻ" => "A", "Ā" => "A", "א" => "A",
"Б" => "B", "ב" => "B", "Þ" => "B",
"Ĉ" => "C", "Ć" => "C", "Ç" => "C", "Ц" => "C", "צ" => "C", "Ċ" => "C", "Č" => "C", "©" => "C", "ץ" => "C",
"Д" => "D", "Ď" => "D", "Đ" => "D", "ד" => "D", "Ð" => "D",
"È" => "E", "Ę" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "Е" => "E", "Ē" => "E", "Ė" => "E", "Ě" => "E", "Ĕ" => "E", "Є" => "E", "Ə" => "E", "ע" => "E",
"Ф" => "F", "Ƒ" => "F",
"Ğ" => "G", "Ġ" => "G", "Ģ" => "G", "Ĝ" => "G", "Г" => "G", "ג" => "G", "Ґ" => "G",
"ח" => "H", "Ħ" => "H", "Х" => "H", "Ĥ" => "H", "ה" => "H",
"I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "Į" => "I", "Ĭ" => "I", "I" => "I", "И" => "I", "Ĩ" => "I", "Ǐ" => "I", "י" => "I", "Ї" => "I", "Ī" => "I", "І" => "I",
"Й" => "J", "Ĵ" => "J",
"ĸ" => "K", "כ" => "K", "Ķ" => "K", "К" => "K", "ך" => "K",
"Ł" => "L", "Ŀ" => "L", "Л" => "L", "Ļ" => "L", "Ĺ" => "L", "Ľ" => "L", "ל" => "L",
"מ" => "M", "М" => "M", "ם" => "M",
"Ñ" => "N", "Ń" => "N", "Н" => "N", "Ņ" => "N", "ן" => "N", "Ŋ" => "N", "נ" => "N", "ʼn" => "N", "Ň" => "N",
"Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "О" => "O", "Ő" => "O", "Ŏ" => "O", "Ō" => "O", "Ǿ" => "O", "Ǒ" => "O", "Ơ" => "O",
"פ" => "P", "ף" => "P", "П" => "P",
"ק" => "Q",
"Ŕ" => "R", "Ř" => "R", "Ŗ" => "R", "ר" => "R", "Р" => "R", "®" => "R",
"Ş" => "S", "Ś" => "S", "Ș" => "S", "Š" => "S", "С" => "S", "Ŝ" => "S", "ס" => "S",
"Т" => "T", "Ț" => "T", "ט" => "T", "Ŧ" => "T", "ת" => "T", "Ť" => "T", "Ţ" => "T",
"Ù" => "U", "Û" => "U", "Ú" => "U", "Ū" => "U", "У" => "U", "Ũ" => "U", "Ư" => "U", "Ǔ" => "U", "Ų" => "U", "Ŭ" => "U", "Ů" => "U", "Ű" => "U", "Ǖ" => "U", "Ǜ" => "U", "Ǚ" => "U", "Ǘ" => "U",
"В" => "V", "ו" => "V",
"Ý" => "Y", "Ы" => "Y", "Ŷ" => "Y", "Ÿ" => "Y",
"Ź" => "Z", "Ž" => "Z", "Ż" => "Z", "З" => "Z", "ז" => "Z", "S" => "Z",
"а" => "a", "ă" => "a", "ǎ" => "a", "ą" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "ǻ" => "a", "ā" => "a", "א" => "a",
"б" => "b", "ב" => "b", "þ" => "b",
"ĉ" => "c", "ć" => "c", "ç" => "c", "ц" => "c", "צ" => "c", "ċ" => "c", "č" => "c", "©" => "c", "ץ" => "c",
"Ч" => "ch", "ч" => "ch",
"д" => "d", "ď" => "d", "đ" => "d", "ד" => "d", "ð" => "d",
"è" => "e", "ę" => "e", "é" => "e", "ë" => "e", "ê" => "e", "е" => "e", "ē" => "e", "ė" => "e", "ě" => "e", "ĕ" => "e", "є" => "e", "ə" => "e", "ע" => "e",
"ф" => "f", "ƒ" => "f",
"ğ" => "g", "ġ" => "g", "ģ" => "g", "ĝ" => "g", "г" => "g", "ג" => "g", "ґ" => "g",
"ח" => "h", "ħ" => "h", "х" => "h", "ĥ" => "h", "ה" => "h",
"i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "į" => "i", "ĭ" => "i", "ı" => "i", "и" => "i", "ĩ" => "i", "ǐ" => "i", "י" => "i", "ї" => "i", "ī" => "i", "і" => "i",
"й" => "j", "Й" => "j", "Ĵ" => "j", "ĵ" => "j",
"ĸ" => "k", "כ" => "k", "ķ" => "k", "к" => "k", "ך" => "k",
"ł" => "l", "ŀ" => "l", "л" => "l", "ļ" => "l", "ĺ" => "l", "ľ" => "l", "ל" => "l",
"מ" => "m", "м" => "m", "ם" => "m",
"ñ" => "n", "ń" => "n", "н" => "n", "ņ" => "n", "ן" => "n", "ŋ" => "n", "נ" => "n", "ʼn" => "n", "ň" => "n",
"ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "о" => "o", "ő" => "o", "ŏ" => "o", "ō" => "o", "ǿ" => "o", "ǒ" => "o", "ơ" => "o",
"פ" => "p", "ף" => "p", "п" => "p",
"ק" => "q",
"ŕ" => "r", "ř" => "r", "ŗ" => "r", "ר" => "r", "р" => "r", "®" => "r",
"ş" => "s", "ś" => "s", "ș" => "s", "š" => "s", "с" => "s", "ŝ" => "s", "ס" => "s",
"т" => "t", "ț" => "t", "ט" => "t", "ŧ" => "t", "ת" => "t", "ť" => "t", "ţ" => "t",
"ù" => "u", "û" => "u", "ú" => "u", "ū" => "u", "у" => "u", "ũ" => "u", "ư" => "u", "ǔ" => "u", "ų" => "u", "ŭ" => "u", "ů" => "u", "ű" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u",
"в" => "v", "ו" => "v",
"ý" => "y", "ы" => "y", "ŷ" => "y", "ÿ" => "y",
"ź" => "z", "ž" => "z", "ż" => "z", "з" => "z", "ז" => "z", "ſ" => "z",
"™" => "tm",
"@" => "at",
"Ä" => "ae", "Ǽ" => "ae", "ä" => "ae", "æ" => "ae", "ǽ" => "ae",
"ij" => "ij", "IJ" => "ij",
"я" => "ja", "Я" => "ja",
"Э" => "je", "э" => "je",
"ё" => "jo", "Ё" => "jo",
"ю" => "ju", "Ю" => "ju",
"œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe",
"щ" => "sch", "Щ" => "sch",
"ш" => "sh", "Ш" => "sh",
"ß" => "ss",
"Ü" => "ue",
"Ж" => "zh", "ж" => "zh",
);
return strtr($subject, $char_map);
}
$string = "Ħí ŧħə®ë, юßť å test!";
echo replace_spec_char($string);
Ħí ŧħə®ë, юßť å test!
=> Hi there, jusst a test!
This version does not mix uppercase and lowercase characters, except for the longer replacements (e.g. ss, ch, sch). It also adds @ ® ©.
Also, if you want to generate a regular expression that matches regardless of special characters:
rss => '[rŕřŘŗŖרŔРр](?:[sșсŜšśסşСŝ][sșсŜšśסşСŝ]|[ß])'
A Vala implementation of this: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477
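A rough PHP sketch of generating such a pattern from a grouped list follows. The function build_loose_pattern() and the small $groups table are hypothetical; a real table would be generated from a full list of variants. Unlike the rss example above, this sketch does not handle multi-letter equivalents such as ss and ß.

```php
<?php
// Sketch only: build_loose_pattern() and $groups are hypothetical names.
// $groups maps a base letter to the variants that should also match it.
function build_loose_pattern(string $term, array $groups): string
{
    $parts = [];
    // Split into code points (the u modifier makes PCRE read UTF-8).
    foreach (preg_split('//u', $term, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $variants = $groups[$char] ?? '';
        // Quote every piece so regex metacharacters stay literal.
        $parts[] = '[' . preg_quote($char . $variants, '/') . ']';
    }
    return '/' . implode('', $parts) . '/u';
}

$groups  = ['e' => 'èéêë', 'i' => 'ìíîï', 'c' => 'çćč'];
$pattern = build_loose_pattern('eric', $groups);

var_dump(preg_match($pattern, 'éric cantona'));  // int(1)
```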
Here is the basic list to work from. With a regex substitution (in Sublime Text) or a small script, you can build whatever you need from this array.
"-" => "ъьЪЬ",
"A" => "АĂǍĄÀÃÁÆÂÅǺĀא",
"B" => "БבÞ",
"C" => "ĈĆÇЦצĊČ©ץ",
"D" => "ДĎĐדÐ",
"E" => "ÈĘÉËÊЕĒĖĚĔЄƏע",
"F" => "ФƑ",
"G" => "ĞĠĢĜГגҐ",
"H" => "חĦХĤה",
"I" => "IÏÎÍÌĮĬIИĨǏיЇĪІ",
"J" => "ЙĴ",
"K" => "ĸכĶКך",
"L" => "ŁĿЛĻĹĽל",
"M" => "מМם",
"N" => "ÑŃНŅןŊנʼnŇ",
"O" => "ØÓÒÔÕОŐŎŌǾǑƠ",
"P" => "פףП",
"Q" => "ק",
"R" => "ŔŘŖרР®",
"S" => "ŞŚȘŠСŜס",
"T" => "ТȚטŦתŤŢ",
"U" => "ÙÛÚŪУŨƯǓŲŬŮŰǕǛǙǗ",
"V" => "Вו",
"Y" => "ÝЫŶŸ",
"Z" => "ŹŽŻЗזS",
"a" => "аăǎąàãáæâåǻāא",
"b" => "бבþ",
"c" => "ĉćçцצċč©ץ",
"ch" => "ч",
"d" => "дďđדð",
"e" => "èęéëêеēėěĕєəע",
"f" => "фƒ",
"g" => "ğġģĝгגґ",
"h" => "חħхĥה",
"i" => "iïîíìįĭıиĩǐיїīі",
"j" => "йĵ",
"k" => "ĸכķкך",
"l" => "łŀлļĺľל",
"m" => "מмם",
"n" => "ñńнņןŋנʼnň",
"o" => "øóòôõоőŏōǿǒơ",
"p" => "פףп",
"q" => "ק",
"r" => "ŕřŗרр®",
"s" => "şśșšсŝס",
"t" => "тțטŧתťţ",
"u" => "ùûúūуũưǔųŭůűǖǜǚǘ",
"v" => "вו",
"y" => "ýыŷÿ",
"z" => "źžżзזſ",
"tm" => "™",
"at" => "@",
"ae" => "ÄǼäæǽ",
"ch" => "Чч",
"ij" => "ijIJ",
"j" => "йЙĴĵ",
"ja" => "яЯ",
"je" => "Ээ",
"jo" => "ёЁ",
"ju" => "юЮ",
"oe" => "œŒöÖ",
"sch" => "щЩ",
"sh" => "шШ",
"ss" => "ß",
"tm" => "™",
"ue" => "Ü",
"zh" => "Жж"
strtolower only works on single-byte encodings such as iso-8859-1. You could try mb_strtolower instead.
Or, if you have to deal with multibyte extensions anyway, you might as well use iconv's transliteration support:
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);
Edit:
It seems I was a bit hasty. You appear to be using iso-8859-1, so your current strategy will work. You just need to write the regexps properly, e.g.:
'/(ð|é|ê|è|ë)/'
not:
'/[ð|é|ê|è|ë]/'
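For UTF-8 input, the difference between the two forms is easy to see in a short experiment. This is a sketch that assumes the script file is saved as UTF-8; the third variant, with the u modifier, is an addition that is not part of the answer above.

```php
<?php
// Assumes this file is saved as UTF-8, so "é" is the two bytes C3 A9.
$name = "éric";

// Character class, no u modifier: the class is a set of single bytes,
// so C3 and A9 are each replaced on their own.
echo preg_replace('/[ð|é|ê|è|ë]/', 'e', $name), "\n";   // eeric

// Alternation, no u modifier: each alternative is a whole byte sequence.
echo preg_replace('/(ð|é|ê|è|ë)/', 'e', $name), "\n";   // eric

// Character class with the u modifier: the class holds code points.
echo preg_replace('/[ðéêèë]/u', 'e', $name), "\n";      // eric
```

Inside a character class the | is a literal pipe and not an alternation operator, so it can be dropped from the class in any case.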
I know that this question was asked a long, long time ago...
I was looking for a short and elegant solution, but could not find a satisfying one, for two reasons:
First, most of the existing solutions replace a list of characters with a list of other characters. Unfortunately, that requires the PHP script file itself to be saved in a specific encoding, which may not be desired.
Second, using iconv seems like a good approach, but it is not enough on its own, since converting a single character can yield one character, two characters, or a fatal exception.
So I wrote this small function that does the job:
function replaceAccent($string, $replacement = '_')
{
$alnumPattern = '/^[a-zA-Z0-9 ]+$/';
if (preg_match($alnumPattern, $string)) {
return $string;
}
$ret = array_map(
function ($chr) use ($alnumPattern, $replacement) {
if (preg_match($alnumPattern, $chr)) {
return $chr;
} else {
$chr = @iconv('ISO-8859-1', 'ASCII//TRANSLIT', $chr);
if (strlen($chr) == 1) {
return $chr;
} elseif (strlen($chr) > 1) {
$ret = '';
foreach (str_split($chr) as $char2) {
if (preg_match($alnumPattern, $char2)) {
$ret .= $char2;
}
}
return $ret;
} else {
// replace whatever iconv fail to convert by something else
return $replacement;
}
}
},
str_split($string)
);
return implode($ret);
}
I've searched around, and your idea for accent stripping is quite good and cost-effective, but your regex is written incorrectly and is missing two modifiers. Long story short, the regexes must be:
$patterns[0] = '/[áâàåä]/ui';
$patterns[1] = '/[ðéêèë]/ui';
$patterns[2] = '/[íîìï]/ui';
$patterns[3] = '/[óôòøõö]/ui';
$patterns[4] = '/[úûùü]/ui';
$patterns[5] = '/æ/ui';
$patterns[6] = '/ç/ui';
$patterns[7] = '/ß/ui';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
As you can see, it is quite similar, but the most important thing is the modifiers after the second slash of the regular expression. When a regular expression looks like /[someCoolRegex]/ui, the u specifies that it must use Unicode and the i specifies that it is case-insensitive. I've tested my own version along with the answers in this forum, and I must say it is more cost-effective than using strtr.
Hope someone reads this answer.
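Put together as a function and applied to the string from the question, the patterns above could be used as in the sketch below. The name strip_accents_regex() is hypothetical, the script file is assumed to be saved as UTF-8, and the final strtolower() call is an addition so that the result matches the output the question asks for.

```php
<?php
// Sketch only: strip_accents_regex() is a hypothetical helper name.
function strip_accents_regex(string $text): string
{
    $map = [
        '/[áâàåä]/ui'  => 'a',
        '/[ðéêèë]/ui'  => 'e',
        '/[íîìï]/ui'   => 'i',
        '/[óôòøõö]/ui' => 'o',
        '/[úûùü]/ui'   => 'u',
        '/æ/ui'        => 'ae',
        '/ç/ui'        => 'c',
        '/ß/ui'        => 'ss',
    ];
    $result = preg_replace(array_keys($map), array_values($map), $text);
    // preg_replace returns null on invalid UTF-8; keep the input in that case.
    return strtolower($result ?? $text);
}

echo strip_accents_regex("Éric Cantona"), "\n";  // eric cantona
```

Because of the i modifier, the uppercase É is matched by the lowercase class and replaced with e, so no separate lowercasing step is needed before the replacement.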
If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this will solve your problem:
$string = "Éric Cantona";
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);
protected $_convertTable = array(
'&' => 'and', '@' => 'at', '©' => 'c', '®' => 'r', 'À' => 'a',
'Á' => 'a', 'Â' => 'a', 'Ä' => 'a', 'Å' => 'a', 'Æ' => 'ae','Ç' => 'c',
'È' => 'e', 'É' => 'e', 'Ë' => 'e', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i',
'Ï' => 'i', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o', 'Ö' => 'o',
'Ø' => 'o', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u', 'Ü' => 'u', 'Ý' => 'y',
'ß' => 'ss','à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
'æ' => 'ae','ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ò' => 'o', 'ó' => 'o',
'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u',
'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'p', 'ÿ' => 'y', 'Ā' => 'a',
'ā' => 'a', 'Ă' => 'a', 'ă' => 'a', 'Ą' => 'a', 'ą' => 'a', 'Ć' => 'c',
'ć' => 'c', 'Ĉ' => 'c', 'ĉ' => 'c', 'Ċ' => 'c', 'ċ' => 'c', 'Č' => 'c',
'č' => 'c', 'Ď' => 'd', 'ď' => 'd', 'Đ' => 'd', 'đ' => 'd', 'Ē' => 'e',
'ē' => 'e', 'Ĕ' => 'e', 'ĕ' => 'e', 'Ė' => 'e', 'ė' => 'e', 'Ę' => 'e',
'ę' => 'e', 'Ě' => 'e', 'ě' => 'e', 'Ĝ' => 'g', 'ĝ' => 'g', 'Ğ' => 'g',
'ğ' => 'g', 'Ġ' => 'g', 'ġ' => 'g', 'Ģ' => 'g', 'ģ' => 'g', 'Ĥ' => 'h',
'ĥ' => 'h', 'Ħ' => 'h', 'ħ' => 'h', 'Ĩ' => 'i', 'ĩ' => 'i', 'Ī' => 'i',
'ī' => 'i', 'Ĭ' => 'i', 'ĭ' => 'i', 'Į' => 'i', 'į' => 'i', 'İ' => 'i',
'ı' => 'i', 'IJ' => 'ij','ij' => 'ij','Ĵ' => 'j', 'ĵ' => 'j', 'Ķ' => 'k',
'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'l', 'ĺ' => 'l', 'Ļ' => 'l', 'ļ' => 'l',
'Ľ' => 'l', 'ľ' => 'l', 'Ŀ' => 'l', 'ŀ' => 'l', 'Ł' => 'l', 'ł' => 'l',
'Ń' => 'n', 'ń' => 'n', 'Ņ' => 'n', 'ņ' => 'n', 'Ň' => 'n', 'ň' => 'n',
'ʼn' => 'n', 'Ŋ' => 'n', 'ŋ' => 'n', 'Ō' => 'o', 'ō' => 'o', 'Ŏ' => 'o',
'ŏ' => 'o', 'Ő' => 'o', 'ő' => 'o', 'Œ' => 'oe','œ' => 'oe','Ŕ' => 'r',
'ŕ' => 'r', 'Ŗ' => 'r', 'ŗ' => 'r', 'Ř' => 'r', 'ř' => 'r', 'Ś' => 's',
'ś' => 's', 'Ŝ' => 's', 'ŝ' => 's', 'Ş' => 's', 'ş' => 's', 'Š' => 's',
'š' => 's', 'Ţ' => 't', 'ţ' => 't', 'Ť' => 't', 'ť' => 't', 'Ŧ' => 't',
'ŧ' => 't', 'Ũ' => 'u', 'ũ' => 'u', 'Ū' => 'u', 'ū' => 'u', 'Ŭ' => 'u',
'ŭ' => 'u', 'Ů' => 'u', 'ů' => 'u', 'Ű' => 'u', 'ű' => 'u', 'Ų' => 'u',
'ų' => 'u', 'Ŵ' => 'w', 'ŵ' => 'w', 'Ŷ' => 'y', 'ŷ' => 'y', 'Ÿ' => 'y',
'Ź' => 'z', 'ź' => 'z', 'Ż' => 'z', 'ż' => 'z', 'Ž' => 'z', 'ž' => 'z',
'ſ' => 'z', 'Ə' => 'e', 'ƒ' => 'f', 'Ơ' => 'o', 'ơ' => 'o', 'Ư' => 'u',
'ư' => 'u', 'Ǎ' => 'a', 'ǎ' => 'a', 'Ǐ' => 'i', 'ǐ' => 'i', 'Ǒ' => 'o',
'ǒ' => 'o', 'Ǔ' => 'u', 'ǔ' => 'u', 'Ǖ' => 'u', 'ǖ' => 'u', 'Ǘ' => 'u',
'ǘ' => 'u', 'Ǚ' => 'u', 'ǚ' => 'u', 'Ǜ' => 'u', 'ǜ' => 'u', 'Ǻ' => 'a',
'ǻ' => 'a', 'Ǽ' => 'ae','ǽ' => 'ae','Ǿ' => 'o', 'ǿ' => 'o', 'ə' => 'e',
'Ё' => 'jo','Є' => 'e', 'І' => 'i', 'Ї' => 'i', 'А' => 'a', 'Б' => 'b',
'В' => 'v', 'Г' => 'g', 'Д' => 'd', 'Е' => 'e', 'Ж' => 'zh','З' => 'z',
'И' => 'i', 'Й' => 'j', 'К' => 'k', 'Л' => 'l', 'М' => 'm', 'Н' => 'n',
'О' => 'o', 'П' => 'p', 'Р' => 'r', 'С' => 's', 'Т' => 't', 'У' => 'u',
'Ф' => 'f', 'Х' => 'h', 'Ц' => 'c', 'Ч' => 'ch','Ш' => 'sh','Щ' => 'sch',
'Ъ' => '-', 'Ы' => 'y', 'Ь' => '-', 'Э' => 'je','Ю' => 'ju','Я' => 'ja',
'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
'ж' => 'zh','з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k', 'л' => 'l',
'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
'ш' => 'sh','щ' => 'sch','ъ' => '-','ы' => 'y', 'ь' => '-', 'э' => 'je',
'ю' => 'ju','я' => 'ja','ё' => 'jo','є' => 'e', 'і' => 'i', 'ї' => 'i',
'Ґ' => 'g', 'ґ' => 'g', 'א' => 'a', 'ב' => 'b', 'ג' => 'g', 'ד' => 'd',
'ה' => 'h', 'ו' => 'v', 'ז' => 'z', 'ח' => 'h', 'ט' => 't', 'י' => 'i',
'ך' => 'k', 'כ' => 'k', 'ל' => 'l', 'ם' => 'm', 'מ' => 'm', 'ן' => 'n',
'נ' => 'n', 'ס' => 's', 'ע' => 'e', 'ף' => 'p', 'פ' => 'p', 'ץ' => 'C',
'צ' => 'c', 'ק' => 'q', 'ר' => 'r', 'ש' => 'w', 'ת' => 't', '™' => 'tm',
);
This is from Magento; I use it for basically everything.