JavaScript: Efficiently replace all accented characters in a string?


Here is a more complete version based on the Unicode standard, taken from here: http://semplicewebsites.com/removing-accents-javascript

Some examples:

    > "Piqué".latinize();
    "Pique"
    > "Piqué".isLatin();
    false
    > "Pique".isLatin();
    true
    > "Piqué".latinise().isLatin();
    true

For a poor man's implementation of near-collation-correct sorting on the client side, I need a JavaScript function that does efficient single-character replacement in a string.

Here is what I mean (note that this applies to German text; other languages sort differently):

    native sorting gets it wrong: a b c o u z ä ö ü
    collation-correct would be:   a ä b c o ö u ü z

Basically, I need all occurrences of "ä" in a given string replaced by "a" (and so on). This way the result of native sorting would be very close to what a user would expect (or what a database would return).

Other languages have facilities to do just that: Python supplies str.translate(), in Perl there is tr/…/…/, XPath has a function translate(), ColdFusion has ReplaceList(). But what about JavaScript?

Here is what I have right now.

    // s would be a rather short string (something like
    // 200 characters at max, most of the time much less)
    function makeSortString(s) {
      var translate = {
        "ä": "a", "ö": "o", "ü": "u",
        "Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
      };
      var translate_re = /[öäüÖÄÜ]/g;
      return ( s.replace(translate_re, function(match) {
        return translate[match];
      }) );
    }

For starters, I don't like the fact that the regex is rebuilt every time I call the function. I guess a closure can help in this regard, but I don't seem to get the hang of it for some reason.

Can someone think of something more efficient?

The answers below fall into two categories:

String replacement functions of varying degrees of completeness and efficiency (what I was originally asking about)
A late mention of String#localeCompare, which is now widely supported among JS engines and could solve this category of problem much more elegantly.

Based on the solution by Jason Bunting, here is what I use now.

The whole thing is for the jQuery tablesorter plug-in: for (nearly correct) sorting of non-English tables with the tablesorter plug-in, it is necessary to use a custom textExtraction function.

This one:

translates the most common accented letters to unaccented ones (the list of supported letters is easily expandable)
changes dates in German format ('dd.mm.yyyy') to a recognized format ('yyyy-mm-dd')

Be careful to save the JavaScript file in UTF-8 encoding, or it won't work.

    // file encoding must be UTF-8!
    function getTextExtractor() {
      return (function() {
        var patternLetters = /[öäüÖÄÜáàâéèêúùûóòôÁÀÂÉÈÊÚÙÛÓÒÔß]/g;
        var patternDateDmy = /^(?:\D+)?(\d{1,2})\.(\d{1,2})\.(\d{2,4})$/;
        var lookupLetters = {
          "ä": "a", "ö": "o", "ü": "u",
          "Ä": "A", "Ö": "O", "Ü": "U",
          "á": "a", "à": "a", "â": "a",
          "é": "e", "è": "e", "ê": "e",
          "ú": "u", "ù": "u", "û": "u",
          "ó": "o", "ò": "o", "ô": "o",
          "Á": "A", "À": "A", "Â": "A",
          "É": "E", "È": "E", "Ê": "E",
          "Ú": "U", "Ù": "U", "Û": "U",
          "Ó": "O", "Ò": "O", "Ô": "O",
          "ß": "s"
        };
        var letterTranslator = function(match) {
          return lookupLetters[match] || match;
        };

        return function(node) {
          var text = $.trim($(node).text());
          var date = text.match(patternDateDmy);
          if (date)
            return [date[3], date[2], date[1]].join("-");
          else
            return text.replace(patternLetters, letterTranslator);
        };
      })();
    }

You can use it like this:

$("table.sortable").tablesorter({ textExtraction: getTextExtractor() });
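If controlling the file encoding is impractical, one way around the UTF-8 caveat is to write the non-ASCII characters as \uXXXX escapes, so the source file itself stays pure ASCII. This is only a minimal sketch for the six German umlauts; the names umlautPattern, umlautLookup and stripUmlauts are made up for this illustration and are not part of the code above.

```javascript
// ASCII-only source file:
// \u00E4 = ä, \u00F6 = ö, \u00FC = ü, \u00C4 = Ä, \u00D6 = Ö, \u00DC = Ü
var umlautPattern = /[\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC]/g;
var umlautLookup = {
  "\u00E4": "a", "\u00F6": "o", "\u00FC": "u",
  "\u00C4": "A", "\u00D6": "O", "\u00DC": "U"
};

// Hypothetical helper: replaces each umlaut with its base letter.
function stripUmlauts(s) {
  return s.replace(umlautPattern, function (match) {
    return umlautLookup[match];
  });
}

console.log(stripUmlauts("\u00C4rger, m\u00FCde")); // "Arger, mude"
```

The trade-off is readability: the escaped table is harder to proofread than the literal one, so it may be worth keeping the literal characters in a comment next to it.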

I think this might work a little cleaner/better (though I haven't tested its performance):

    String.prototype.stripAccents = function() {
      var translate_re = /[àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ]/g;
      var translate = 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY';
      return (this.replace(translate_re, function(match){
        return translate.substr(translate_re.source.indexOf(match)-1, 1);
      }) );
    };

Or if you are still too worried about performance, let's get the best of both worlds:

    String.prototype.stripAccents = function() {
      var in_chrs = 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ',
          out_chrs = 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY',
          transl = {};
      eval('var chars_rgx = /['+in_chrs+']/g');
      for(var i = 0; i < in_chrs.length; i++){
        transl[in_chrs.charAt(i)] = out_chrs.charAt(i);
      }
      return this.replace(chars_rgx, function(match){
        return transl[match];
      });
    };

EDIT (by @Tomalak)

I appreciate the idea. However, there are several things wrong with the implementation, as outlined in the comment below.

Here is how I would implement it.

    var stripAccents = (function () {
      var in_chrs = 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ',
          out_chrs = 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY',
          chars_rgx = new RegExp('[' + in_chrs + ']', 'g'),
          transl = {}, i,
          lookup = function (m) { return transl[m] || m; };

      for (i=0; i<in_chrs.length; i++) {
        transl[ in_chrs[i] ] = out_chrs[i];
      }

      return function (s) { return s.replace(chars_rgx, lookup); }
    })();

The correct terminology for such accents is diacritics. After googling this term, I found this function, which is part of backbone.paginator. It has a very complete collection of diacritics and replaces them with their most intuitive ASCII character. I found this to be the most complete JavaScript solution available today.

The complete function for future reference:

I can't speak to what you are trying to do specifically with the function itself, but if you don't like the regex being built every time, here are two solutions and some caveats about each.

Here is one way to do this:

    function makeSortString(s) {
      if(!makeSortString.translate_re) makeSortString.translate_re = /[öäüÖÄÜ]/g;
      var translate = {
        "ä": "a", "ö": "o", "ü": "u",
        "Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
      };
      return ( s.replace(makeSortString.translate_re, function(match) {
        return translate[match];
      }) );
    }

Obviously, this makes the regex a property of the function itself. The only thing you may not like about this (or you may, I guess it depends) is that the regex can now be modified outside of the function's body. So, someone could do this to modify the regex used internally:

    makeSortString.translate_re = /[a-z]/g;

So, there is that option.

One way to get a closure, and thus prevent someone from modifying the regex, would be to define this as an anonymous function assignment like this:

    var makeSortString = (function() {
      var translate_re = /[öäüÖÄÜ]/g;
      return function(s) {
        var translate = {
          "ä": "a", "ö": "o", "ü": "u",
          "Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
        };
        return ( s.replace(translate_re, function(match) {
          return translate[match];
        }) );
      }
    })();

Hopefully this is useful to you.

UPDATE: It's early and I don't know why I didn't see the obvious before, but it might also be useful to put the translate object in a closure as well:

    var makeSortString = (function() {
      var translate_re = /[öäüÖÄÜ]/g;
      var translate = {
        "ä": "a", "ö": "o", "ü": "u",
        "Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
      };
      return function(s) {
        return ( s.replace(translate_re, function(match) {
          return translate[match];
        }) );
      }
    })();

I solved it another way, if you like.

Here I used two arrays: searchChars contains the characters to be replaced, and replaceChars contains the desired characters.

    var text = "your input string";
    var searchChars = ['Å','Ä','å','Ö','ö']; // add more characters.
    var replaceChars = ['A','A','a','O','o']; // exact same index as searchChars.
    var index;
    for (var i = 0; i < text.length; i++) {
      if( $.inArray(text[i], searchChars) >-1 ){ // $.inArray() is from jQuery.
        index = searchChars.indexOf(text[i]);
        text = text.slice(0, i) + replaceChars[index] + text.slice(i+1,text.length);
      }
    }

A direct port to javascript of Kierons solution: https://github.com/rwarasaurus/nano/blob/master/system/helpers.php#L61-73 :

/** * Normalise a string replacing foreign characters * * @param {String} str * @return {String} str */ var normalize = (function () { var a = [''À'', ''Á'', ''Â'', ''Ã'', ''Ä'', ''Å'', ''Æ'', ''Ç'', ''È'', ''É'', ''Ê'', ''Ë'', ''Ì'', ''Í'', ''Î'', ''Ï'', ''Ð'', ''Ñ'', ''Ò'', ''Ó'', ''Ô'', ''Õ'', ''Ö'', ''Ø'', ''Ù'', ''Ú'', ''Û'', ''Ü'', ''Ý'', ''ß'', ''à'', ''á'', ''â'', ''ã'', ''ä'', ''å'', ''æ'', ''ç'', ''è'', ''é'', ''ê'', ''ë'', ''ì'', ''í'', ''î'', ''ï'', ''ñ'', ''ò'', ''ó'', ''ô'', ''õ'', ''ö'', ''ø'', ''ù'', ''ú'', ''û'', ''ü'', ''ý'', ''ÿ'', ''Ā'', ''ā'', ''Ă'', ''ă'', ''Ą'', ''ą'', ''Ć'', ''ć'', ''Ĉ'', ''ĉ'', ''Ċ'', ''ċ'', ''Č'', ''č'', ''Ď'', ''ď'', ''Đ'', ''đ'', ''Ē'', ''ē'', ''Ĕ'', ''ĕ'', ''Ė'', ''ė'', ''Ę'', ''ę'', ''Ě'', ''ě'', ''Ĝ'', ''ĝ'', ''Ğ'', ''ğ'', ''Ġ'', ''ġ'', ''Ģ'', ''ģ'', ''Ĥ'', ''ĥ'', ''Ħ'', ''ħ'', ''Ĩ'', ''ĩ'', ''Ī'', ''ī'', ''Ĭ'', ''ĭ'', ''Į'', ''į'', ''İ'', ''ı'', ''Ĳ'', ''ĳ'', ''Ĵ'', ''ĵ'', ''Ķ'', ''ķ'', ''Ĺ'', ''ĺ'', ''Ļ'', ''ļ'', ''Ľ'', ''ľ'', ''Ŀ'', ''ŀ'', ''Ł'', ''ł'', ''Ń'', ''ń'', ''Ņ'', ''ņ'', ''Ň'', ''ň'', ''ŉ'', ''Ō'', ''ō'', ''Ŏ'', ''ŏ'', ''Ő'', ''ő'', ''Œ'', ''œ'', ''Ŕ'', ''ŕ'', ''Ŗ'', ''ŗ'', ''Ř'', ''ř'', ''Ś'', ''ś'', ''Ŝ'', ''ŝ'', ''Ş'', ''ş'', ''Š'', ''š'', ''Ţ'', ''ţ'', ''Ť'', ''ť'', ''Ŧ'', ''ŧ'', ''Ũ'', ''ũ'', ''Ū'', ''ū'', ''Ŭ'', ''ŭ'', ''Ů'', ''ů'', ''Ű'', ''ű'', ''Ų'', ''ų'', ''Ŵ'', ''ŵ'', ''Ŷ'', ''ŷ'', ''Ÿ'', ''Ź'', ''ź'', ''Ż'', ''ż'', ''Ž'', ''ž'', ''ſ'', ''ƒ'', ''Ơ'', ''ơ'', ''Ư'', ''ư'', ''Ǎ'', ''ǎ'', ''Ǐ'', ''ǐ'', ''Ǒ'', ''ǒ'', ''Ǔ'', ''ǔ'', ''Ǖ'', ''ǖ'', ''Ǘ'', ''ǘ'', ''Ǚ'', ''ǚ'', ''Ǜ'', ''ǜ'', ''Ǻ'', ''ǻ'', ''Ǽ'', ''ǽ'', ''Ǿ'', ''ǿ'']; var b = [''A'', ''A'', ''A'', ''A'', ''A'', ''A'', ''AE'', ''C'', ''E'', ''E'', ''E'', ''E'', ''I'', ''I'', ''I'', ''I'', ''D'', ''N'', ''O'', ''O'', ''O'', ''O'', ''O'', ''O'', ''U'', ''U'', ''U'', ''U'', ''Y'', ''s'', ''a'', ''a'', ''a'', ''a'', ''a'', ''a'', ''ae'', ''c'', ''e'', ''e'', ''e'', ''e'', ''i'', ''i'', ''i'', ''i'', ''n'', ''o'', ''o'', ''o'', ''o'', ''o'', 
''o'', ''u'', ''u'', ''u'', ''u'', ''y'', ''y'', ''A'', ''a'', ''A'', ''a'', ''A'', ''a'', ''C'', ''c'', ''C'', ''c'', ''C'', ''c'', ''C'', ''c'', ''D'', ''d'', ''D'', ''d'', ''E'', ''e'', ''E'', ''e'', ''E'', ''e'', ''E'', ''e'', ''E'', ''e'', ''G'', ''g'', ''G'', ''g'', ''G'', ''g'', ''G'', ''g'', ''H'', ''h'', ''H'', ''h'', ''I'', ''i'', ''I'', ''i'', ''I'', ''i'', ''I'', ''i'', ''I'', ''i'', ''IJ'', ''ij'', ''J'', ''j'', ''K'', ''k'', ''L'', ''l'', ''L'', ''l'', ''L'', ''l'', ''L'', ''l'', ''l'', ''l'', ''N'', ''n'', ''N'', ''n'', ''N'', ''n'', ''n'', ''O'', ''o'', ''O'', ''o'', ''O'', ''o'', ''OE'', ''oe'', ''R'', ''r'', ''R'', ''r'', ''R'', ''r'', ''S'', ''s'', ''S'', ''s'', ''S'', ''s'', ''S'', ''s'', ''T'', ''t'', ''T'', ''t'', ''T'', ''t'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''W'', ''w'', ''Y'', ''y'', ''Y'', ''Z'', ''z'', ''Z'', ''z'', ''Z'', ''z'', ''s'', ''f'', ''O'', ''o'', ''U'', ''u'', ''A'', ''a'', ''I'', ''i'', ''O'', ''o'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''U'', ''u'', ''A'', ''a'', ''AE'', ''ae'', ''O'', ''o'']; return function (str) { var i = a.length; while (i--) str = str.replace(a[i], b[i]); return str; }; }());

And a slightly modified version, using a char-map instead of two arrays:

To compare these two methods I made a simple benchmark: http://jsperf.com/replace-foreign-characters

    /**
     * Normalise a string replacing foreign characters
     *
     * @param {String} str
     * @return {String}
     */
    var normalize = (function () {
      var map = {
        "À": "A", "Á": "A", "Â": "A", "Ã": "A", "Ä": "A", "Å": "A", "Æ": "AE", "Ç": "C",
        "È": "E", "É": "E", "Ê": "E", "Ë": "E", "Ì": "I", "Í": "I", "Î": "I", "Ï": "I",
        "Ð": "D", "Ñ": "N", "Ò": "O", "Ó": "O", "Ô": "O", "Õ": "O", "Ö": "O", "Ø": "O",
        "Ù": "U", "Ú": "U", "Û": "U", "Ü": "U", "Ý": "Y", "ß": "s",
        "à": "a", "á": "a", "â": "a", "ã": "a", "ä": "a", "å": "a", "æ": "ae", "ç": "c",
        "è": "e", "é": "e", "ê": "e", "ë": "e", "ì": "i", "í": "i", "î": "i", "ï": "i",
        "ñ": "n", "ò": "o", "ó": "o", "ô": "o", "õ": "o", "ö": "o", "ø": "o",
        "ù": "u", "ú": "u", "û": "u", "ü": "u", "ý": "y", "ÿ": "y",
        "Ā": "A", "ā": "a", "Ă": "A", "ă": "a", "Ą": "A", "ą": "a",
        "Ć": "C", "ć": "c", "Ĉ": "C", "ĉ": "c", "Ċ": "C", "ċ": "c", "Č": "C", "č": "c",
        "Ď": "D", "ď": "d", "Đ": "D", "đ": "d",
        "Ē": "E", "ē": "e", "Ĕ": "E", "ĕ": "e", "Ė": "E", "ė": "e", "Ę": "E", "ę": "e", "Ě": "E", "ě": "e",
        "Ĝ": "G", "ĝ": "g", "Ğ": "G", "ğ": "g", "Ġ": "G", "ġ": "g", "Ģ": "G", "ģ": "g",
        "Ĥ": "H", "ĥ": "h", "Ħ": "H", "ħ": "h",
        "Ĩ": "I", "ĩ": "i", "Ī": "I", "ī": "i", "Ĭ": "I", "ĭ": "i", "Į": "I", "į": "i", "İ": "I", "ı": "i",
        "Ĳ": "IJ", "ĳ": "ij", "Ĵ": "J", "ĵ": "j", "Ķ": "K", "ķ": "k",
        "Ĺ": "L", "ĺ": "l", "Ļ": "L", "ļ": "l", "Ľ": "L", "ľ": "l", "Ŀ": "L", "ŀ": "l", "Ł": "L", "ł": "l",
        "Ń": "N", "ń": "n", "Ņ": "N", "ņ": "n", "Ň": "N", "ň": "n", "ŉ": "n",
        "Ō": "O", "ō": "o", "Ŏ": "O", "ŏ": "o", "Ő": "O", "ő": "o", "Œ": "OE", "œ": "oe",
        "Ŕ": "R", "ŕ": "r", "Ŗ": "R", "ŗ": "r", "Ř": "R", "ř": "r",
        "Ś": "S", "ś": "s", "Ŝ": "S", "ŝ": "s", "Ş": "S", "ş": "s", "Š": "S", "š": "s",
        "Ţ": "T", "ţ": "t", "Ť": "T", "ť": "t", "Ŧ": "T", "ŧ": "t",
        "Ũ": "U", "ũ": "u", "Ū": "U", "ū": "u", "Ŭ": "U", "ŭ": "u", "Ů": "U", "ů": "u", "Ű": "U", "ű": "u", "Ų": "U", "ų": "u",
        "Ŵ": "W", "ŵ": "w", "Ŷ": "Y", "ŷ": "y", "Ÿ": "Y",
        "Ź": "Z", "ź": "z", "Ż": "Z", "ż": "z", "Ž": "Z", "ž": "z", "ſ": "s", "ƒ": "f",
        "Ơ": "O", "ơ": "o", "Ư": "U", "ư": "u",
        "Ǎ": "A", "ǎ": "a", "Ǐ": "I", "ǐ": "i", "Ǒ": "O", "ǒ": "o", "Ǔ": "U", "ǔ": "u",
        "Ǖ": "U", "ǖ": "u", "Ǘ": "U", "ǘ": "u", "Ǚ": "U", "ǚ": "u", "Ǜ": "U", "ǜ": "u",
        "Ǻ": "A", "ǻ": "a", "Ǽ": "AE", "ǽ": "ae", "Ǿ": "O", "ǿ": "o"
      },
      // \W matches every character outside [A-Za-z0-9_], which includes all keys of the map
      nonWord = /\W/g,
      mapping = function (c) { return map[c] || c; };

      return function (str) { return str.replace(nonWord, mapping); };
    }());

A simple and easy way:

    function removeAccents(p) {
      var c = 'áàãâäéèêëíìîïóòõôöúùûüçÁÀÃÂÄÉÈÊËÍÌÎÏÓÒÕÖÔÚÙÛÜÇ';
      var s = 'aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC';
      var n = '';
      for (var i = 0; i < p.length; i++) {
        // indexOf() rather than search(): search() would treat the character as a regex
        var pos = c.indexOf(p.charAt(i));
        n += pos >= 0 ? s.charAt(pos) : p.charAt(i);
      }
      return n;
    }

Then do this:

    removeAccents("Thís ís ân accêntéd phráse");

Output:

    "This is an accented phrase"

Based on existing answers and some suggestions, I've created this one:

String.prototype.removeAccents = function() { var removalMap = { ''A'' : /[AⒶＡÀÁÂẦẤẪẨÃĀĂẰẮẴẲȦǠÄǞẢÅǺǍȀȂẠẬẶḀĄ]/g, ''AA'' : /[Ꜳ]/g, ''AE'' : /[ÆǼǢ]/g, ''AO'' : /[Ꜵ]/g, ''AU'' : /[Ꜷ]/g, ''AV'' : /[ꜸꜺ]/g, ''AY'' : /[Ꜽ]/g, ''B'' : /[BⒷＢḂḄḆɃƂƁ]/g, ''C'' : /[CⒸＣĆĈĊČÇḈƇȻꜾ]/g, ''D'' : /[DⒹＤḊĎḌḐḒḎĐƋƊƉꝹ]/g, ''DZ'' : /[ǱǄ]/g, ''Dz'' : /[ǲǅ]/g, ''E'' : /[EⒺＥÈÉÊỀẾỄỂẼĒḔḖĔĖËẺĚȄȆẸỆȨḜĘḘḚƐƎ]/g, ''F'' : /[FⒻＦḞƑꝻ]/g, ''G'' : /[GⒼＧǴĜḠĞĠǦĢǤƓꞠꝽꝾ]/g, ''H'' : /[HⒽＨĤḢḦȞḤḨḪĦⱧⱵꞍ]/g, ''I'' : /[IⒾＩÌÍÎĨĪĬİÏḮỈǏȈȊỊĮḬƗ]/g, ''J'' : /[JⒿＪĴɈ]/g, ''K'' : /[KⓀＫḰǨḲĶḴƘⱩꝀꝂꝄꞢ]/g, ''L'' : /[LⓁＬĿĹĽḶḸĻḼḺŁȽⱢⱠꝈꝆꞀ]/g, ''LJ'' : /[Ǉ]/g, ''Lj'' : /[ǈ]/g, ''M'' : /[MⓂＭḾṀṂⱮƜ]/g, ''N'' : /[NⓃＮǸŃÑṄŇṆŅṊṈȠƝꞐꞤ]/g, ''NJ'' : /[Ǌ]/g, ''Nj'' : /[ǋ]/g, ''O'' : /[OⓄＯÒÓÔỒỐỖỔÕṌȬṎŌṐṒŎȮȰÖȪỎŐǑȌȎƠỜỚỠỞỢỌỘǪǬØǾƆƟꝊꝌ]/g, ''OI'' : /[Ƣ]/g, ''OO'' : /[Ꝏ]/g, ''OU'' : /[Ȣ]/g, ''P'' : /[PⓅＰṔṖƤⱣꝐꝒꝔ]/g, ''Q'' : /[QⓆＱꝖꝘɊ]/g, ''R'' : /[RⓇＲŔṘŘȐȒṚṜŖṞɌⱤꝚꞦꞂ]/g, ''S'' : /[SⓈＳẞŚṤŜṠŠṦṢṨȘŞⱾꞨꞄ]/g, ''T'' : /[TⓉＴṪŤṬȚŢṰṮŦƬƮȾꞆ]/g, ''TZ'' : /[Ꜩ]/g, ''U'' : /[UⓊＵÙÚÛŨṸŪṺŬÜǛǗǕǙỦŮŰǓȔȖƯỪỨỮỬỰỤṲŲṶṴɄ]/g, ''V'' : /[VⓋＶṼṾƲꝞɅ]/g, ''VY'' : /[Ꝡ]/g, ''W'' : /[WⓌＷẀẂŴẆẄẈⱲ]/g, ''X'' : /[XⓍＸẊẌ]/g, ''Y'' : /[YⓎＹỲÝŶỸȲẎŸỶỴƳɎỾ]/g, ''Z'' : /[ZⓏＺŹẐŻŽẒẔƵȤⱿⱫꝢ]/g, ''a'' : /[aⓐａẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐ]/g, ''aa'' : /[ꜳ]/g, ''ae'' : /[æǽǣ]/g, ''ao'' : /[ꜵ]/g, ''au'' : /[ꜷ]/g, ''av'' : /[ꜹꜻ]/g, ''ay'' : /[ꜽ]/g, ''b'' : /[bⓑｂḃḅḇƀƃɓ]/g, ''c'' : /[cⓒｃćĉċčçḉƈȼꜿↄ]/g, ''d'' : /[dⓓｄḋďḍḑḓḏđƌɖɗꝺ]/g, ''dz'' : /[ǳǆ]/g, ''e'' : /[eⓔｅèéêềếễểẽēḕḗĕėëẻěȅȇẹệȩḝęḙḛɇɛǝ]/g, ''f'' : /[fⓕｆḟƒꝼ]/g, ''g'' : /[gⓖｇǵĝḡğġǧģǥɠꞡᵹꝿ]/g, ''h'' : /[hⓗｈĥḣḧȟḥḩḫẖħⱨⱶɥ]/g, ''hv'' : /[ƕ]/g, ''i'' : /[iⓘｉìíîĩīĭïḯỉǐȉȋịįḭɨı]/g, ''j'' : /[jⓙｊĵǰɉ]/g, ''k'' : /[kⓚｋḱǩḳķḵƙⱪꝁꝃꝅꞣ]/g, ''l'' : /[lⓛｌŀĺľḷḹļḽḻſłƚɫⱡꝉꞁꝇ]/g, ''lj'' : /[ǉ]/g, ''m'' : /[mⓜｍḿṁṃɱɯ]/g, ''n'' : /[nⓝｎǹńñṅňṇņṋṉƞɲŉꞑꞥ]/g, ''nj'' : /[ǌ]/g, ''o'' : /[oⓞｏòóôồốỗổõṍȭṏōṑṓŏȯȱöȫỏőǒȍȏơờớỡởợọộǫǭøǿɔꝋꝍɵ]/g, ''oi'' : /[ƣ]/g, ''ou'' : /[ȣ]/g, ''oo'' : /[ꝏ]/g, ''p'' : /[pⓟｐṕṗƥᵽꝑꝓꝕ]/g, ''q'' : /[qⓠｑɋꝗꝙ]/g, ''r'' : /[rⓡｒŕṙřȑȓṛṝŗṟɍɽꝛꞧꞃ]/g, ''s'' : /[sⓢｓßśṥŝṡšṧṣṩșşȿꞩꞅẛ]/g, ''t'' : 
/[tⓣｔṫẗťṭțţṱṯŧƭʈⱦꞇ]/g, ''tz'' : /[ꜩ]/g, ''u'' : /[uⓤｕùúûũṹūṻŭüǜǘǖǚủůűǔȕȗưừứữửựụṳųṷṵʉ]/g, ''v'' : /[vⓥｖṽṿʋꝟʌ]/g, ''vy'' : /[ꝡ]/g, ''w'' : /[wⓦｗẁẃŵẇẅẘẉⱳ]/g, ''x'' : /[xⓧｘẋẍ]/g, ''y'' : /[yⓨｙỳýŷỹȳẏÿỷẙỵƴɏỿ]/g, ''z'' : /[zⓩｚźẑżžẓẕƶȥɀⱬꝣ]/g, }; var str = this; for(var latin in removalMap) { var nonLatin = removalMap[latin]; str = str.replace(nonLatin , latin); } return str; }

It uses real chars instead of unicode list and works well.

You can use it like

"ąąą".removeAccents(); // returns "aaa"

You can easily convert this function so that it does not extend the String prototype. However, as I'm a fan of using the String prototype in such cases, you'll have to do that yourself.
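For readers who prefer not to extend String.prototype, the same loop can be wrapped in a plain function. The following is only a sketch: smallRemovalMap is a deliberately tiny, hypothetical stand-in for the full removalMap above, and the standalone function name is an assumption made for this illustration.

```javascript
// Deliberately tiny, hypothetical subset of the full map above.
var smallRemovalMap = {
  'a': /[àáâãäåą]/g,
  'e': /[èéêëę]/g,
  'o': /[òóôõö]/g
};

// Standalone variant: takes the string as an argument instead of using `this`.
function removeAccents(str) {
  for (var latin in smallRemovalMap) {
    if (Object.prototype.hasOwnProperty.call(smallRemovalMap, latin)) {
      str = str.replace(smallRemovalMap[latin], latin);
    }
  }
  return str;
}

console.log(removeAccents("ąąą")); // "aaa"
```

Keeping the function off the prototype avoids clashes with other libraries that might define a method of the same name.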

I just wanted to post my solution using String#localeCompare

I made a Prototype Version of this:

    String.prototype.strip = function() {
      var translate_re = /[öäüÖÄÜß ]/g;
      var translate = {
        "ä":"a", "ö":"o", "ü":"u",
        "Ä":"A", "Ö":"O", "Ü":"U",
        " ":"_", "ß":"ss" // probably more to come
      };
      return (this.replace(translate_re, function(match){
        return translate[match];
      }) );
    };

Use it like this:

    var teststring = 'ä ö ü Ä Ö Ü ß';
    teststring.strip();

This returns the string a_o_u_A_O_U_ss (the original string itself is not modified).

If you want to achieve sorting where "ä" comes after "a" and is not treated as the same, then you can use a function like mine.

You can always change the alphabet to get different or even weird sortings. However, if you want some letters to be equivalent, then you have to manipulate the strings like a = a.replace(/ä/, 'a') or similar, as many have already replied above. I've included the uppercase letters in case someone wants to have all uppercase words before all lowercase words (then you have to omit .toLowerCase()).
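The function this answer refers to did not survive in this copy. As a rough sketch of the idea only (the ALPHABET string and the name alphabetCompare are assumptions made for this illustration, not the author's code), a comparator driven by a custom alphabet could look like this:

```javascript
// Hypothetical alphabet: the position in this string defines the sort order,
// so "ä" sorts directly after "a" instead of after "z".
var ALPHABET = 'aäbcdefghijklmnoöpqrsßtuüvwxyz';

function alphabetCompare(a, b) {
  a = a.toLowerCase();
  b = b.toLowerCase();
  var len = Math.min(a.length, b.length);
  for (var i = 0; i < len; i++) {
    // Characters missing from ALPHABET get index -1, so they sort first
    // and compare as equal to each other.
    var ia = ALPHABET.indexOf(a.charAt(i));
    var ib = ALPHABET.indexOf(b.charAt(i));
    if (ia !== ib) return ia - ib;
  }
  return a.length - b.length; // the shorter string wins on a common prefix
}

console.log(['z', 'ä', 'b', 'a'].sort(alphabetCompare)); // ['a', 'ä', 'b', 'z']
```

Reordering the characters in ALPHABET is all it takes to change the resulting order.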

If you''re looking specifically for a way to convert accented characters to non-accented characters, rather than a way to sort accented characters, with a little finagling, the String.localeCompare function can be manipulated to find the basic latin characters that match the extended ones. For example, you might want to produce a human friendly url slug from a page title. If so, you can do something like this:

This should perform quite well, but if further optimization were needed, a binary search could be used with localeCompare as the comparator to locate the base character. Note that case is preserved, and options allow for either preserving, replacing, or removing characters that aren't alphabetical, or do not have matching latin characters they can be replaced with. This implementation is faster and more flexible, and should work with new characters as they are added. The disadvantage is that compound characters like 'ꝡ' have to be handled specifically, if they need to be supported.
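The snippet this answer describes is missing from this copy. The following is not the author's implementation, just a minimal sketch of the underlying trick under stated assumptions: it relies on the engine supporting the locales and options arguments of localeCompare (with sensitivity set to 'base', accent and case differences are ignored), it uses a simple linear scan rather than the binary search mentioned above, and the names BASE_LETTERS, toBaseLetter and toBaseLatin are hypothetical.

```javascript
var BASE_LETTERS = 'abcdefghijklmnopqrstuvwxyz';

// Find the basic Latin letter that compares as equal to `ch` when accents
// and case are ignored; return `ch` unchanged if there is none.
function toBaseLetter(ch) {
  for (var i = 0; i < BASE_LETTERS.length; i++) {
    var base = BASE_LETTERS.charAt(i);
    if (ch.localeCompare(base, 'en', { sensitivity: 'base' }) === 0) {
      // Preserve the case of the input character.
      return ch === ch.toLowerCase() ? base : base.toUpperCase();
    }
  }
  return ch;
}

function toBaseLatin(s) {
  return s.split('').map(toBaseLetter).join('');
}

console.log(toBaseLatin('Crème Brûlée'));
```

Characters that the collation does not treat as a variant of a single base letter (for example ligatures or 'ß') are passed through unchanged in this sketch and would need explicit handling.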

A long time ago I did this in Java and found someone else's solution based on a single string that captures the part of the Unicode table that was important for the conversion; the rest was converted to ? or any other replacement character. So I tried to convert it to JavaScript. Mind that I'm no JS expert. :-)

    var TAB_00C0 =
        "AAAAAAACEEEEIIII" + "DNOOOOO*OUUUUYIs" +
        "aaaaaaaceeeeiiii" + "?nooooo/ouuuuy?y" +
        "AaAaAaCcCcCcCcDd" + "DdEeEeEeEeEeGgGg" +
        "GgGgHhHhIiIiIiIi" + "IiJjJjKkkLlLlLlL" +
        "lLlNnNnNnnNnOoOo" + "OoOoRrRrRrSsSsSs" +
        "SsTtTtTtUuUuUuUu" + "UuUuWwYyYZzZzZzF";

    function stripDiacritics(source) {
      var result = source.split('');
      for (var i = 0; i < result.length; i++) {
        var c = source.charCodeAt(i);
        if (c >= 0x00c0 && c <= 0x017f) {
          result[i] = String.fromCharCode(TAB_00C0.charCodeAt(c - 0x00c0));
        } else if (c > 127) {
          result[i] = '?';
        }
      }
      return result.join('');
    }

    stripDiacritics("Šupa, čo? ľšťčžýæøåℌð")

This converts most of the Latin-1 and Latin-2 Unicode characters. It is not able to translate a single character into multiple characters. I don't know its performance in JS; in Java this is by far the fastest of the common solutions (6-50x), as there is no map, no regex, nothing. It produces strict ASCII output, potentially with a loss of information, but the size of the output matches the input.

I tested the snippet with http://www.webtoolkitonline.com/javascript-tester.html and it produced Supa, co? lstczyaoa?? as expected.

Not a single answer mentions String.localeCompare, which happens to do exactly what you originally wanted, but not what you're asking for.

    var list = ['a', 'b', 'c', 'o', 'u', 'z', 'ä', 'ö', 'ü'];
    list.sort((a, b) => a.localeCompare(b));
    console.log(list);
    // Outputs ['a', 'ä', 'b', 'c', 'o', 'ö', 'u', 'ü', 'z']

The second and third parameters of localeCompare (locales and options) are not supported by older browsers, though. It's an option worth considering nonetheless.
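To illustrate what those two extra parameters do, here is a small sketch that assumes an engine with locale support (current browsers, or Node.js with ICU data). The function name compareIgnoringAccents and the choice of the 'de' locale are assumptions made for this example. With sensitivity set to 'base', strings that differ only in accents or case compare as equal, which is the equivalence the question was trying to achieve by replacing characters.

```javascript
// Hypothetical comparator: German locale, accents and case ignored.
function compareIgnoringAccents(a, b) {
  return a.localeCompare(b, 'de', { sensitivity: 'base' });
}

console.log(compareIgnoringAccents('a', 'ä'));      // 0: treated as equal
console.log(compareIgnoringAccents('ä', 'z') < 0);  // true: ä sorts before z
console.log(compareIgnoringAccents('a', 'b') < 0);  // true
```

In engines without locale support the extra arguments are ignored, so the result falls back to the implementation's default comparison.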

Simply normalize the string and then strip the combining diacritical marks with a replace:

    var str = "Letras Á É Í Ó Ú Ñ - á é í ó ú ñ...";
    console.log (str.normalize ("NFKD").replace (/[\u0300-\u036F]/g, ""));
    // Letras A E I O U N - a e i o u n...

See normalize

Then you can use this function:

    function noTilde (s) {
      if (s.normalize != undefined) {
        s = s.normalize ("NFKD");
      }
      return s.replace (/[\u0300-\u036F]/g, "");
    }

The complete solution to your request is:

function convert_accented_characters(str){ var conversions = new Object(); conversions[''ae''] = ''ä|æ|ǽ''; conversions[''oe''] = ''ö|œ''; conversions[''ue''] = ''ü''; conversions[''Ae''] = ''Ä''; conversions[''Ue''] = ''Ü''; conversions[''Oe''] = ''Ö''; conversions[''A''] = ''À|Á|Â|Ã|Ä|Å|Ǻ|Ā|Ă|Ą|Ǎ''; conversions[''a''] = ''à|á|â|ã|å|ǻ|ā|ă|ą|ǎ|ª''; conversions[''C''] = ''Ç|Ć|Ĉ|Ċ|Č''; conversions[''c''] = ''ç|ć|ĉ|ċ|č''; conversions[''D''] = ''Ð|Ď|Đ''; conversions[''d''] = ''ð|ď|đ''; conversions[''E''] = ''È|É|Ê|Ë|Ē|Ĕ|Ė|Ę|Ě''; conversions[''e''] = ''è|é|ê|ë|ē|ĕ|ė|ę|ě''; conversions[''G''] = ''Ĝ|Ğ|Ġ|Ģ''; conversions[''g''] = ''ĝ|ğ|ġ|ģ''; conversions[''H''] = ''Ĥ|Ħ''; conversions[''h''] = ''ĥ|ħ''; conversions[''I''] = ''Ì|Í|Î|Ï|Ĩ|Ī|Ĭ|Ǐ|Į|İ''; conversions[''i''] = ''ì|í|î|ï|ĩ|ī|ĭ|ǐ|į|ı''; conversions[''J''] = ''Ĵ''; conversions[''j''] = ''ĵ''; conversions[''K''] = ''Ķ''; conversions[''k''] = ''ķ''; conversions[''L''] = ''Ĺ|Ļ|Ľ|Ŀ|Ł''; conversions[''l''] = ''ĺ|ļ|ľ|ŀ|ł''; conversions[''N''] = ''Ñ|Ń|Ņ|Ň''; conversions[''n''] = ''ñ|ń|ņ|ň|ŉ''; conversions[''O''] = ''Ò|Ó|Ô|Õ|Ō|Ŏ|Ǒ|Ő|Ơ|Ø|Ǿ''; conversions[''o''] = ''ò|ó|ô|õ|ō|ŏ|ǒ|ő|ơ|ø|ǿ|º''; conversions[''R''] = ''Ŕ|Ŗ|Ř''; conversions[''r''] = ''ŕ|ŗ|ř''; conversions[''S''] = ''Ś|Ŝ|Ş|Š''; conversions[''s''] = ''ś|ŝ|ş|š|ſ''; conversions[''T''] = ''Ţ|Ť|Ŧ''; conversions[''t''] = ''ţ|ť|ŧ''; conversions[''U''] = ''Ù|Ú|Û|Ũ|Ū|Ŭ|Ů|Ű|Ų|Ư|Ǔ|Ǖ|Ǘ|Ǚ|Ǜ''; conversions[''u''] = ''ù|ú|û|ũ|ū|ŭ|ů|ű|ų|ư|ǔ|ǖ|ǘ|ǚ|ǜ''; conversions[''Y''] = ''Ý|Ÿ|Ŷ''; conversions[''y''] = ''ý|ÿ|ŷ''; conversions[''W''] = ''Ŵ''; conversions[''w''] = ''ŵ''; conversions[''Z''] = ''Ź|Ż|Ž''; conversions[''z''] = ''ź|ż|ž''; conversions[''AE''] = ''Æ|Ǽ''; conversions[''ss''] = ''ß''; conversions[''IJ''] = ''Ĳ''; conversions[''ij''] = ''ĳ''; conversions[''OE''] = ''Œ''; conversions[''f''] = ''ƒ''; for(var i in conversions){ var re = new RegExp(conversions[i],"g"); str = str.replace(re,i); } return str; }

https://.com/a/37511463

With ES2015/ES6 String.prototype.normalize():

    const str = "Crème Brulée"
    str.normalize('NFD').replace(/[\u0300-\u036f]/g, "")
    > 'Creme Brulee'

Two things are happening here:

normalize()ing to the NFD Unicode normal form decomposes combined graphemes into combinations of simple ones. The è of Crème ends up expressed as e + ̀ .
Using a regex character class to match the U+0300 → U+036F range, it is now trivial to globally get rid of the diacritics, which the Unicode standard conveniently groups as the Combining Diacritical Marks Unicode block.
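The two steps can be made visible by printing the code points before and after normalization. This is just an illustrative sketch; the helper name codePoints is made up for this example.

```javascript
// Format a string's code points as "U+XXXX" labels.
function codePoints(s) {
  return Array.from(s).map(function (ch) {
    var hex = ch.codePointAt(0).toString(16).toUpperCase();
    return "U+" + ("0000" + hex).slice(-4);
  });
}

var composed = "\u00E8";                     // è as one precomposed code point
var decomposed = composed.normalize("NFD");  // e followed by a combining grave accent

console.log(codePoints(composed));    // ["U+00E8"]
console.log(codePoints(decomposed));  // ["U+0065", "U+0300"]

// Step two: U+0300 falls inside the U+0300-U+036F block, so the regex removes it.
console.log(decomposed.replace(/[\u0300-\u036f]/g, "")); // "e"
```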
See comment for performance testing.

Alternatively, if you just want sorting

Intl.Collator has sufficient support (~85% right now); a polyfill is also available, but I haven't tested it.

    const c = new Intl.Collator();
    ['creme brulee', 'crème brulée', 'crame brulai', 'crome brouillé',
     'creme brulay', 'creme brulfé', 'creme bruléa'].sort(c.compare)
    [ 'crame brulai', 'creme brulay', 'creme bruléa', 'creme brulee',
      'crème brulée', 'creme brulfé', 'crome brouillé' ]

    // For comparison, the default sort (by UTF-16 code units) puts 'crème' last:
    ['creme brulee', 'crème brulée', 'crame brulai', 'crome brouillé'].sort()
    ["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]
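Applied to the German letters from the original question, a collator created for an explicit locale gives the collation-correct order. This sketch assumes an engine that ships locale data for 'de' (current browsers, or Node.js with ICU); the variable names are made up for the example.

```javascript
var germanCollator = new Intl.Collator('de');
var letters = ['a', 'b', 'c', 'o', 'u', 'z', 'ä', 'ö', 'ü'];

// slice() keeps the original array untouched; sort() works on the copy.
var sorted = letters.slice().sort(germanCollator.compare);

console.log(sorted.join(' ')); // a ä b c o ö u ü z
```

Creating the collator once and reusing its compare function avoids setting up the locale on every comparison, which is the usual reason to prefer it over repeated localeCompare calls when sorting larger lists.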