java - Remove accents from a String
I adapted Rabi's solution to my needs. I hope it helps someone:
import java.util.HashMap;
import java.util.Map;

// Lookup table from accented character to its unaccented replacement.
// It is filled lazily on the first call. This is not thread-safe, so
// populate it in a static block if several threads may call removeAccents.
private static Map<Character, Character> MAP_NORM;
public static String removeAccents(String value)
{
    if (MAP_NORM == null || MAP_NORM.size() == 0)
    {
        MAP_NORM = new HashMap<Character, Character>();
        MAP_NORM.put('À', 'A');
        MAP_NORM.put('Á', 'A');
        MAP_NORM.put('Â', 'A');
        MAP_NORM.put('Ã', 'A');
        MAP_NORM.put('Ä', 'A');
        MAP_NORM.put('È', 'E');
        MAP_NORM.put('É', 'E');
        MAP_NORM.put('Ê', 'E');
        MAP_NORM.put('Ë', 'E');
        MAP_NORM.put('Í', 'I');
        MAP_NORM.put('Ì', 'I');
        MAP_NORM.put('Î', 'I');
        MAP_NORM.put('Ï', 'I');
        MAP_NORM.put('Ù', 'U');
        MAP_NORM.put('Ú', 'U');
        MAP_NORM.put('Û', 'U');
        MAP_NORM.put('Ü', 'U');
        MAP_NORM.put('Ò', 'O');
        MAP_NORM.put('Ó', 'O');
        MAP_NORM.put('Ô', 'O');
        MAP_NORM.put('Õ', 'O');
        MAP_NORM.put('Ö', 'O');
        MAP_NORM.put('Ñ', 'N');
        MAP_NORM.put('Ç', 'C');
        // Ordinal indicators, section sign and superscript digits
        MAP_NORM.put('ª', 'A');
        MAP_NORM.put('º', 'O');
        MAP_NORM.put('§', 'S');
        MAP_NORM.put('³', '3');
        MAP_NORM.put('²', '2');
        MAP_NORM.put('¹', '1');
        MAP_NORM.put('à', 'a');
        MAP_NORM.put('á', 'a');
        MAP_NORM.put('â', 'a');
        MAP_NORM.put('ã', 'a');
        MAP_NORM.put('ä', 'a');
        MAP_NORM.put('è', 'e');
        MAP_NORM.put('é', 'e');
        MAP_NORM.put('ê', 'e');
        MAP_NORM.put('ë', 'e');
        MAP_NORM.put('í', 'i');
        MAP_NORM.put('ì', 'i');
        MAP_NORM.put('î', 'i');
        MAP_NORM.put('ï', 'i');
        MAP_NORM.put('ù', 'u');
        MAP_NORM.put('ú', 'u');
        MAP_NORM.put('û', 'u');
        MAP_NORM.put('ü', 'u');
        MAP_NORM.put('ò', 'o');
        MAP_NORM.put('ó', 'o');
        MAP_NORM.put('ô', 'o');
        MAP_NORM.put('õ', 'o');
        MAP_NORM.put('ö', 'o');
        MAP_NORM.put('ñ', 'n');
        MAP_NORM.put('ç', 'c');
    }
    if (value == null) {
        return "";
    }
    // Replace each mapped character in place; unmapped characters are kept.
    StringBuilder sb = new StringBuilder(value);
    for (int i = 0; i < value.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if (c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }
    return sb.toString();
}
Is there a way on Android, which (as far as I know) has no java.text.Normalizer, to remove all accents from a String? For example, "éàù" should become "eau".
I'd like to avoid parsing the String and checking each character, if possible!
This is probably not the most efficient solution, but it works on every Android version:
import java.util.HashMap;
import java.util.Map;

private static Map<Character, Character> MAP_NORM;
static { // Greek characters normalization
    MAP_NORM = new HashMap<Character, Character>();
    MAP_NORM.put('ά', 'α');
    MAP_NORM.put('έ', 'ε');
    MAP_NORM.put('ί', 'ι');
    MAP_NORM.put('ό', 'ο');
    MAP_NORM.put('ύ', 'υ');
    MAP_NORM.put('ή', 'η');
    MAP_NORM.put('ς', 'σ'); // final sigma folded to ordinary sigma
    MAP_NORM.put('ώ', 'ω');
    // Accented capitals are mapped to lower-case letters
    MAP_NORM.put('Ά', 'α');
    MAP_NORM.put('Έ', 'ε');
    MAP_NORM.put('Ί', 'ι');
    MAP_NORM.put('Ό', 'ο');
    MAP_NORM.put('Ύ', 'υ');
    MAP_NORM.put('Ή', 'η');
    MAP_NORM.put('Ώ', 'ω');
}
public static String removeAccents(String s) {
    if (s == null) {
        return null;
    }
    // Replace each mapped character in place; unmapped characters are kept.
    StringBuilder sb = new StringBuilder(s);
    for (int i = 0; i < s.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if (c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }
    return sb.toString();
}
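The same Greek mappings can also be written as a switch on char, which avoids boxing every character into a Character for the map lookup. This is a sketch, not the answer's code: the class name GreekTonos and the helper fold are hypothetical, and it assumes the source file is compiled as UTF-8. Like the map above, it lower-cases accented capitals and turns a word-final ς into σ, so "λόγος" comes out as "λογοσ".

```java
public class GreekTonos {
    // Hypothetical helper: same mappings as the HashMap in the answer above.
    static char fold(char c) {
        switch (c) {
            case 'ά': case 'Ά': return 'α';
            case 'έ': case 'Έ': return 'ε';
            case 'ί': case 'Ί': return 'ι';
            case 'ό': case 'Ό': return 'ο';
            case 'ύ': case 'Ύ': return 'υ';
            case 'ή': case 'Ή': return 'η';
            case 'ώ': case 'Ώ': return 'ω';
            case 'ς': return 'σ'; // final sigma folded to ordinary sigma
            default: return c;    // everything else is left untouched
        }
    }

    public static String removeAccents(String s) {
        if (s == null) {
            return null;
        }
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = fold(chars[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("Καλημέρα")); // Καλημερα
        System.out.println(removeAccents("λόγος"));    // λογοσ
    }
}
```

Unaccented capitals such as Κ are not in the table, so only the accented capitals end up lower-cased.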
While Guillaume's answer works, it removes every non-ASCII character from the string. If you want to keep those characters, try this code (where string is the input string):
import java.text.Normalizer;

// Hypothetical wrapper method; the answer's code is its body.
public static String stripDiacritics(String string) {
    // Convert input string to decomposed Unicode (NFD) so that the
    // diacritical marks used in many European scripts (such as the
    // "C WITH CIRCUMFLEX" → ĉ) become separate characters.
    // Also use compatibility decomposition (K) so that characters
    // that have the exact same meaning as one or more other
    // characters (such as "㎏" → "kg" or "ﾋ" → "ヒ") match when
    // comparing them.
    string = Normalizer.normalize(string, Normalizer.Form.NFKD);
    StringBuilder result = new StringBuilder();
    int offset = 0, strLen = string.length();
    while (offset < strLen) {
        int character = string.codePointAt(offset);
        offset += Character.charCount(character);
        // Only process characters that are not combining Unicode
        // characters. This way all the decomposed diacritical marks
        // (and some other not-that-important modifiers) that were
        // part of the original string or produced by the NFKD
        // normalizer above disappear.
        switch (Character.getType(character)) {
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
                // Some combining character found: skip it
                break;
            default:
                // Every character that is kept is also lower-cased
                result.appendCodePoint(Character.toLowerCase(character));
        }
    }
    // Since we stripped all combining Unicode characters in the
    // previous while-loop there should be no combining character
    // remaining in the string and the composed and decomposed
    // versions of the string should be equivalent. This also means
    // we do not need to convert the string back to composed Unicode
    // before returning it.
    return result.toString();
}
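To see the difference this answer describes, here is a minimal sketch, assuming java.text.Normalizer is available, that runs an ASCII-only filter and a combining-mark filter on the same mixed Greek/Spanish input. The class and method names are hypothetical, and unlike the loop above this version strips the Mn and Mc categories with a regular expression and does not lower-case anything. It assumes the source file is compiled as UTF-8.

```java
import java.text.Normalizer;

public class KeepNonAscii {
    // Decompose (NFKD), then remove only combining marks (categories Mn and Mc).
    // Every other character, ASCII or not, is kept as is.
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[\\p{Mn}\\p{Mc}]", "");
    }

    // ASCII-only filter for comparison: decompose (NFD), then drop
    // everything outside 7-bit ASCII, which also drops Greek letters.
    static String asciiOnly(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String input = "Ελλάδα, señor";
        System.out.println(stripMarks(input)); // Ελλαδα, senor
        System.out.println(asciiOnly(input));  // , senor
        System.out.println(stripMarks("5 ㎏")); // 5 kg
    }
}
```

The combining-mark filter keeps the Greek word and only loses the tonos, while the ASCII filter throws the whole word away.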
The accented Latin characters (à, é, ñ and so on) all have code values greater than 127: in Java they fall in the Latin-1 Supplement range (128 to 255), which matches the ISO-8859-1 "extended ASCII" code page. So you could go through every character in the string and, whenever its decimal code is greater than 127, remap it to the equivalent you want. There is no easy way to map accented characters to their unaccented counterparts: you would have to keep some kind of map in memory from the extended codes to the unaccented characters.
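The "code greater than 127" idea can be sketched with a 128-entry array indexed by code minus 128, so each lookup is a plain array access instead of a map lookup. This is an illustrative sketch only: the class name and the ACCENTED/PLAIN mapping strings are hypothetical and cover just the common Latin-1 letters, characters above 255 (Greek, for example) pass through unchanged, and the source file is assumed to be compiled as UTF-8.

```java
public class ExtendedCodeRemap {
    // Hypothetical mapping data: ACCENTED.charAt(i) becomes PLAIN.charAt(i).
    private static final String ACCENTED =
        "ÀÁÂÃÄÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÇÑàáâãäèéêëìíîïòóôõöùúûüçñ";
    private static final String PLAIN =
        "AAAAAEEEEIIIIOOOOOUUUUCNaaaaaeeeeiiiiooooouuuucn";

    // Lookup table for codes 128..255, indexed by (code - 128).
    // A value of 0 means "no replacement".
    private static final char[] TABLE = new char[128];

    static {
        for (int i = 0; i < ACCENTED.length(); i++) {
            TABLE[ACCENTED.charAt(i) - 128] = PLAIN.charAt(i);
        }
    }

    public static String unaccent(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Codes 0..127 are plain ASCII and never need remapping;
            // codes above 255 are outside this table and are left alone.
            if (ch > 127 && ch < 256 && TABLE[ch - 128] != 0) {
                ch = TABLE[ch - 128];
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unaccent("éàù"));        // eau
        System.out.println(unaccent("Canción Ñu")); // Cancion Nu
    }
}
```

Building the table from two parallel strings keeps the mapping data readable while still giving constant-time lookups; the trade-off is that the two strings must stay the same length and in the same order.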
java.text.Normalizer is available on Android (since API level 9, Android 2.3), so you can use it.
EDIT: For reference, here is how to use Normalizer:
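import java.text.Normalizer;

// Split accented letters into base letter + combining accent
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// Remove everything outside 7-bit ASCII, including the now-separate accents
string = string.replaceAll("[^\\p{ASCII}]", "");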
(pasted from the link in the comments below)
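As a runnable check of the NFD plus ASCII-filter snippet, here is a minimal sketch that wraps it in a method (the class name AsciiFold and the method toAscii are hypothetical) and applies it to the question's example. It also shows a limitation worth knowing: letters with no Unicode decomposition, such as ß and ø, are not turned into a base letter, so the ASCII filter drops them entirely. The source file is assumed to be compiled as UTF-8.

```java
import java.text.Normalizer;

public class AsciiFold {
    // NFD splits "é" into "e" + U+0301; the regex then removes every
    // character outside 7-bit ASCII, which takes the accents with it.
    static String toAscii(String string) {
        string = Normalizer.normalize(string, Normalizer.Form.NFD);
        return string.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("éàù"));    // eau
        System.out.println(toAscii("Straße")); // Strae (ß has no decomposition)
        System.out.println(toAscii("søren"));  // sren (neither does ø)
    }
}
```

If such letters matter, they need an explicit mapping like the ones in the map-based answers.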