
What is the regular expression to extract all the emojis from a string?




I have a UTF-8 encoded string. For example:

Thats a nice joke 😆😆😆 😛

I need to extract all the emojis present in the sentence, and they could be any emoji at all.

When this sentence is displayed in the terminal with the command less text.txt, it looks like this:

Thats a nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>

These are the corresponding Unicode code points for the emoji. The codes for all emojis can be found on emojitracker.

To find all the occurrences, I used the regex pattern (<U\+\w+?>), but it didn't work on the UTF-8 encoded string.

Here is my code:

    String s = "Thats a nice joke 😆😆😆 😛";
    Pattern pattern = Pattern.compile("(<U\\+\\w+?>)");
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for (int i = 0; i < matchList.size(); i++) {
        System.out.println(matchList.get(i));
    }
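The <U+1F606> text is only how less chose to display those characters. The Java string itself holds the emoji as characters, each stored as a UTF-16 surrogate pair, so it contains no literal "<U+" for that pattern to find.

Below is a minimal sketch that prints each code point in the same U+XXXX notation. It assumes Java 8+ for String.codePoints(), and the class name CodePointDump is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointDump {

    // Formats every code point of s as "U+XXXX" (at least four hex digits),
    // mirroring the notation that less displays.
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // "joke " followed by U+1F606, written as its surrogate pair
        String s = "joke \uD83D\uDE06";
        System.out.println(s.length());   // the emoji occupies two chars
        System.out.println(describe(s));
    }
}
```

Running main should print 7 and then [U+006A, U+006F, U+006B, U+0065, U+0020, U+1F606].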

This PDF says "Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs". So I want to capture any character that falls within this range.


> This PDF says "Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs". So let's say I want to capture any character that falls within this range. Now what?

OK, but I'll just note that the emoji in your question (U+1F606 and U+1F61B) are outside that range. :-)

The fact that these are above 0xFFFF complicates things, because Java strings store UTF-16. So we can't just use a simple character class for them; we're going to have surrogate pairs. (More: http://www.unicode.org/faq/utf_bom.html)

U+1F300 in UTF-16 ends up being the pair \uD83C\uDF00; U+1F5FF ends up being \uD83D\uDDFF. Note that the first character went up, so we crossed at least one boundary. That means we have to know which ranges of surrogate pairs we're looking for.
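The pairs above follow from the UTF-16 encoding rule:

1. Subtract 0x10000 from the code point.
2. Add the top 10 bits of the result to 0xD800 to get the high surrogate.
3. Add the bottom 10 bits to 0xDC00 to get the low surrogate.

Below is a small sketch of that arithmetic. It cross-checks the result against Character.highSurrogate and Character.lowSurrogate, which are available since Java 7. The class and method names are illustrative.

```java
final class SurrogateMath {

    // Returns {high, low} for a supplementary code point (>= 0x10000),
    // computed with the UTF-16 formula rather than a library call.
    static char[] toPair(int codePoint) {
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        for (int cp : new int[] {0x1F300, 0x1F5FF, 0x1F606}) {
            char[] pair = toPair(cp);
            // Cross-check against the JDK's own helpers.
            boolean agrees = pair[0] == Character.highSurrogate(cp)
                    && pair[1] == Character.lowSurrogate(cp);
            System.out.printf("U+%X -> \\u%X\\u%X (matches JDK: %b)%n",
                    cp, (int) pair[0], (int) pair[1], agrees);
        }
    }
}
```

This yields \uD83C\uDF00 for U+1F300 and \uD83D\uDDFF for U+1F5FF, matching the pairs quoted above.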

Not being steeped in the inner workings of UTF-16, I wrote a program to find out (source at the end; I'd double-check it if I were you, rather than trusting me). It tells me we're looking for \uD83C followed by anything in the range \uDF00-\uDFFF (inclusive), or \uD83D followed by anything in the range \uDC00-\uDDFF (inclusive).

So, armed with that knowledge, in theory we could now write a pattern:

    // This is wrong, keep reading
    Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");

It's an alternation of two non-capturing groups: the first for pairs starting with \uD83C, and the second for pairs starting with \uD83D.

But that fails (it doesn't find anything). I'm fairly sure it's because we're trying to specify half of a surrogate pair in several places:

    Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
    // Half of a pair --------------^------^------^-----------^------^------^

We can't just split surrogate pairs like that; they're called surrogate pairs for a reason. :-)

Consequently, I don't think we can use regular expressions (or indeed any string-based approach) for this. I think we have to search through char arrays.

char arrays hold UTF-16 values, so we can find those pairs in the data if we look for them the hard way:

    String s = new StringBuilder()
        .append("Thats a nice joke ")
        .appendCodePoint(0x1F606)
        .appendCodePoint(0x1F606)
        .appendCodePoint(0x1F606)
        .append(" ")
        .appendCodePoint(0x1F61B)
        .toString();
    char[] chars = s.toCharArray();
    int index;
    char ch1;
    char ch2;

    index = 0;
    while (index < chars.length - 1) { // -1 because we're looking for two-char-long things
        ch1 = chars[index];
        if ((int)ch1 == 0xD83C) {
            ch2 = chars[index+1];
            if ((int)ch2 >= 0xDF00 && (int)ch2 <= 0xDFFF) {
                System.out.println("Found emoji at index " + index);
                index += 2;
                continue;
            }
        }
        else if ((int)ch1 == 0xD83D) {
            ch2 = chars[index+1];
            if ((int)ch2 >= 0xDC00 && (int)ch2 <= 0xDDFF) {
                System.out.println("Found emoji at index " + index);
                index += 2;
                continue;
            }
        }
        ++index;
    }

Obviously it's only debug-level code, but it does the job. With your string and your emoji it won't find anything, of course, since they're outside the range. But if you change the upper bound on the second pair to 0xDEFF instead of 0xDDFF, it will. I have no idea whether that would also include non-emojis, though.
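The same scan can also be written over code points instead of raw char pairs, letting String.codePointAt do the surrogate decoding.

The sketch below assumes the widened bounds described above: \uD83C\uDF00 through \uD83D\uDEFF, which is U+1F300 to U+1F6FF. Like the original, it reports char indexes. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class EmojiScan {

    // Assumed range: U+1F300..U+1F6FF, the same span as the char-pair
    // check above once its upper bound is widened to 0xDEFF.
    static final int LOW = 0x1F300;
    static final int HIGH = 0x1F6FF;

    // Returns the char index (not the code point index) of every match.
    static List<Integer> findEmojiIndexes(String s) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);        // joins a valid surrogate pair
            if (cp >= LOW && cp <= HIGH) {
                hits.add(i);
            }
            i += Character.charCount(cp);     // 2 for supplementary characters
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = new StringBuilder()
                .append("Thats a nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
        for (int index : findEmojiIndexes(s)) {
            System.out.println("Found emoji at index " + index);
        }
    }
}
```

For the question's sentence this should report indexes 18, 20, 22 and 25.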

Source of my program to figure out what the surrogate ranges were:

    public class FindRanges {

        public static void main(String[] args) {
            char last0 = '\0';
            char last1 = '\0';
            for (int x = 0x1F300; x <= 0x1F5FF; ++x) {
                char[] chars = new StringBuilder().appendCodePoint(x).toString().toCharArray();
                if (chars[0] != last0) {
                    if (last0 != '\0') {
                        System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
                    }
                    System.out.print("\\u" + Integer.toHexString((int)chars[0]).toUpperCase() + " \\u" + Integer.toHexString((int)chars[1]).toUpperCase());
                    last0 = chars[0];
                }
                last1 = chars[1];
            }
            if (last0 != '\0') {
                System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
            }
        }
    }

Output:

    \uD83C \uDF00-\uDFFF
    \uD83D \uDC00-\uDDFF


This is what I use to remove emojis, and so far it has proven to preserve all other alphabets.

    // Android-specific: Build is android.os.Build
    private static String remove_Emojis(String name) {
        // we will store all the letters in this list
        ArrayList<Character> nonEmoji = new ArrayList<>();
        // and when we rebuild the name we will put it in here
        String newName = "";
        // we are going to loop through, checking each character to see if it's an emoji or not
        for (int i = 0; i < name.length(); i++) {
            if (Character.isLetterOrDigit(name.charAt(i))) {
                nonEmoji.add(name.charAt(i));
            } else {
                // this is just a 2nd check in case the other method didn't allow some letter
                // (Character.isAlphabetic needs API level 19+)
                if (Build.VERSION.SDK_INT > 18) {
                    if (Character.isAlphabetic(name.charAt(i))) {
                        nonEmoji.add(name.charAt(i));
                    }
                }
            }
            if (name.charAt(i) == ' ') { // may want to consider adding '-' or '/'
                nonEmoji.add(' '); // just add it
            }
            if (name.charAt(i) == '@' && !name.contains(" ")) { // I put this in for email addresses
                nonEmoji.add('@');
            }
        }
        // finally just loop through building it back out
        for (int i = 0; i < nonEmoji.size(); i++) {
            newName += nonEmoji.get(i);
        }
        return newName;
    }
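Outside Android, android.os.Build is not available, so here is a JDK-only sketch of the same whitelist idea, assuming Java 8+. It works on code points, so letters outside the BMP survive too.

- The kept set (letters, digits, alphabetic characters, spaces and '@') is modelled on the method above.
- Unlike that method, '@' is kept unconditionally.
- Like that method, other punctuation such as '.' or '-' is dropped.

The class name is illustrative.

```java
final class EmojiStripper {

    // Keeps letters, digits, alphabetic characters, spaces and '@';
    // drops everything else, which removes emoji but also punctuation.
    static String keepLettersDigitsSpaces(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints()
            .filter(cp -> Character.isLetterOrDigit(cp)
                    || Character.isAlphabetic(cp)
                    || cp == ' '
                    || cp == '@')
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The question's sentence, with the emoji written as surrogate pairs
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println("[" + keepLettersDigitsSpaces(s) + "]");
    }
}
```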


This worked for me in Java 8:

    public static String mysqlSafe(String input) {
        if (input == null) return null;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            if (i < (input.length() - 1)) { // Emojis are two characters long in java, e.g. a rocket emoji is "\uD83D\uDE80";
                if (Character.isSurrogatePair(input.charAt(i), input.charAt(i + 1))) {
                    i += 1; //also skip the second character of the emoji
                    continue;
                }
            }
            sb.append(input.charAt(i));
        }
        return sb.toString();
    }
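Be aware that this drops every surrogate pair, that is, every character outside the Basic Multilingual Plane, not just emoji:

- A rare CJK ideograph such as U+20000 disappears as well.
- BMP emoji such as U+2600 (☀) are kept.

That is presumably the intent of the name, since MySQL's 3-byte utf8 character set cannot store characters above U+FFFF (utf8mb4 can). Here is a compact equivalent, sketched with Java 8 streams and Character.isBmpCodePoint; the class name is illustrative.

```java
final class MysqlSafe {

    // Keeps only BMP code points (U+0000..U+FFFF); anything that needs a
    // surrogate pair in UTF-16 (4 bytes in UTF-8) is dropped. A lone,
    // unpaired surrogate is kept, just as in the loop above.
    static String mysqlSafe(String input) {
        if (input == null) return null;
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .filter(Character::isBmpCodePoint)
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mysqlSafe("rocket \uD83D\uDE80 sun \u2600"));
    }
}
```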


There are two ways to solve this tricky problem.

The first is to use third-party libraries such as emoji-java and emoji4j, which are mentioned above. You can easily use methods like containsEmoji or removesEmoji, etc. But in your own apps you need to keep these libraries up to date.

As for me, I wanted a simple solution to this problem.

After a whole day of searching, I found a magic expression:

    "(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)"

which I've tested in Java. It solved my problem perfectly.

You can see it on the GitHub page:

https://github.com/zly394/EmojiRegex

Notes:

The answer provided by @Eric Nakagawa contains some errors, so it doesn't work correctly.
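Here is a minimal usage sketch of that expression, trimmed to its first and third alternatives (U+1F300 to U+1F5FF and U+1F600 to U+1F64F) to keep it short.

It works for two reasons:

- The Java compiler turns each \uXXXX pair in the string literal into a real surrogate pair.
- Pattern then reads each pair as a single code point, so the class bounds are whole characters.

The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MagicRegexDemo {

    // Two alternatives taken from the longer expression above:
    // U+1F300..U+1F5FF and U+1F600..U+1F64F, written as surrogate pairs.
    static final Pattern EMOJI = Pattern.compile(
            "[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]");

    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOJI.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(extract(s).size()); // 4
    }
}
```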


The best regex to extract ALL emoji is this:

    (?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])

It identifies many single-character emoji that the other answers don't take into account. For more on how this regex works, take a look at this post: https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb#.enomgcu63


You can generate your own regex with this tool whenever the specification changes (screenshot here).

For UTF-8/32 (string) mode, expanded:

    " # Use the 'Mega-Conversion' tool to change into other syntaxes"
    " # -------------------------------------------------------------"
    " "
    " [#*0-9] \\x{FE0F} \\x{20E3}"
    " | [\\x{A9}\\x{AE}\\x{203C}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21A9}\\x{21AA}\\x{231A}\\x{231B}\\x{2328}\\x{23CF}\\x{23E9}-\\x{23F3}\\x{23F8}-\\x{23FA}\\x{24C2}\\x{25AA}\\x{25AB}\\x{25B6}\\x{25C0}\\x{25FB}-\\x{25FE}\\x{2600}-\\x{2604}\\x{260E}\\x{2611}\\x{2614}\\x{2615}\\x{2618}]"
    " | \\x{261D} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262A}\\x{262E}\\x{262F}\\x{2638}-\\x{263A}\\x{2640}\\x{2642}\\x{2648}-\\x{2653}\\x{265F}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267B}\\x{267E}\\x{267F}\\x{2692}-\\x{2697}\\x{2699}\\x{269B}\\x{269C}\\x{26A0}\\x{26A1}\\x{26AA}\\x{26AB}\\x{26B0}\\x{26B1}\\x{26BD}\\x{26BE}\\x{26C4}\\x{26C5}\\x{26C8}\\x{26CE}\\x{26CF}\\x{26D1}\\x{26D3}\\x{26D4}\\x{26E9}\\x{26EA}\\x{26F0}-\\x{26F5}\\x{26F7}\\x{26F8}]"
    " | \\x{26F9}"
    " (?:"
    " \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{26FA}\\x{26FD}\\x{2702}\\x{2705}\\x{2708}\\x{2709}]"
    " | [\\x{270A}-\\x{270D}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{270F}\\x{2712}\\x{2714}\\x{2716}\\x{271D}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274C}\\x{274E}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27A1}\\x{27B0}\\x{27BF}\\x{2934}\\x{2935}\\x{2B05}-\\x{2B07}\\x{2B1B}\\x{2B1C}\\x{2B50}\\x{2B55}\\x{3030}\\x{303D}\\x{3297}\\x{3299}\\x{1F004}\\x{1F0CF}\\x{1F170}\\x{1F171}\\x{1F17E}\\x{1F17F}\\x{1F18E}\\x{1F191}-\\x{1F19A}]"
    " | \\x{1F1E6} [\\x{1F1E8}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F2}\\x{1F1F4}\\x{1F1F6}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FD}\\x{1F1FF}]"
    " | \\x{1F1E7} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EF}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
    " | \\x{1F1E8} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1EE}\\x{1F1F0}-\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}-\\x{1F1FF}]"
    " | \\x{1F1E9} [\\x{1F1EA}\\x{1F1EC}\\x{1F1EF}\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1FF}]"
    " | \\x{1F1EA} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1ED}\\x{1F1F7}-\\x{1F1FA}]"
    " | \\x{1F1EB} [\\x{1F1EE}-\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1F7}]"
    " | \\x{1F1EC} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EE}\\x{1F1F1}-\\x{1F1F3}\\x{1F1F5}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FE}]"
    " | \\x{1F1ED} [\\x{1F1F0}\\x{1F1F2}\\x{1F1F3}\\x{1F1F7}\\x{1F1F9}\\x{1F1FA}]"
    " | \\x{1F1EE} [\\x{1F1E8}-\\x{1F1EA}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}]"
    " | \\x{1F1EF} [\\x{1F1EA}\\x{1F1F2}\\x{1F1F4}\\x{1F1F5}]"
    " | \\x{1F1F0} [\\x{1F1EA}\\x{1F1EC}-\\x{1F1EE}\\x{1F1F2}\\x{1F1F3}\\x{1F1F5}\\x{1F1F7}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
    " | \\x{1F1F1} [\\x{1F1E6}-\\x{1F1E8}\\x{1F1EE}\\x{1F1F0}\\x{1F1F7}-\\x{1F1FB}\\x{1F1FE}]"
    " | \\x{1F1F2} [\\x{1F1E6}\\x{1F1E8}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1FF}]"
    " | \\x{1F1F3} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F4}\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}\\x{1F1FF}]"
    " | \\x{1F1F4} \\x{1F1F2}"
    " | \\x{1F1F5} [\\x{1F1E6}\\x{1F1EA}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1F3}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FC}\\x{1F1FE}]"
    " | \\x{1F1F6} \\x{1F1E6}"
    " | \\x{1F1F7} [\\x{1F1EA}\\x{1F1F4}\\x{1F1F8}\\x{1F1FA}\\x{1F1FC}]"
    " | \\x{1F1F8} [\\x{1F1E6}-\\x{1F1EA}\\x{1F1EC}-\\x{1F1F4}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FD}-\\x{1F1FF}]"
    " | \\x{1F1F9} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1ED}\\x{1F1EF}-\\x{1F1F4}\\x{1F1F7}\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FF}]"
    " | \\x{1F1FA} [\\x{1F1E6}\\x{1F1EC}\\x{1F1F2}\\x{1F1F3}\\x{1F1F8}\\x{1F1FE}\\x{1F1FF}]"
    " | \\x{1F1FB} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1EE}\\x{1F1F3}\\x{1F1FA}]"
    " | \\x{1F1FC} [\\x{1F1EB}\\x{1F1F8}]"
    " | \\x{1F1FD} \\x{1F1F0}"
    " | \\x{1F1FE} [\\x{1F1EA}\\x{1F1F9}]"
    " | \\x{1F1FF} [\\x{1F1E6}\\x{1F1F2}\\x{1F1FC}]"
    " | [\\x{1F201}\\x{1F202}\\x{1F21A}\\x{1F22F}\\x{1F232}-\\x{1F23A}\\x{1F250}\\x{1F251}\\x{1F300}-\\x{1F321}\\x{1F324}-\\x{1F384}]"
    " | \\x{1F385} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F386}-\\x{1F393}\\x{1F396}\\x{1F397}\\x{1F399}-\\x{1F39B}\\x{1F39E}-\\x{1F3C1}]"
    " | \\x{1F3C2} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F3C3}\\x{1F3C4}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F3C5}\\x{1F3C6}]"
    " | \\x{1F3C7} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F3C8}\\x{1F3C9}]"
    " | \\x{1F3CA}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F3CB}\\x{1F3CC}]"
    " (?:"
    " \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F3CD}-\\x{1F3F0}]"
    " | \\x{1F3F3}"
    " (?: \\x{FE0F} \\x{200D} \\x{1F308} )?"
    " | \\x{1F3F4}"
    " (?:"
    " \\x{200D} \\x{2620} \\x{FE0F}"
    " | \\x{E0067} \\x{E0062}"
    " (?:"
    " \\x{E0065} \\x{E006E} \\x{E0067}"
    " | \\x{E0073} \\x{E0063} \\x{E0074}"
    " | \\x{E0077} \\x{E006C} \\x{E0073}"
    " )"
    " \\x{E007F}"
    " )?"
    " | [\\x{1F3F5}\\x{1F3F7}-\\x{1F440}]"
    " | \\x{1F441}"
    " (?: \\x{FE0F} \\x{200D} \\x{1F5E8} \\x{FE0F} )?"
    " | [\\x{1F442}\\x{1F443}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F444}\\x{1F445}]"
    " | [\\x{1F446}-\\x{1F450}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F451}-\\x{1F465}]"
    " | [\\x{1F466}\\x{1F467}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F468}"
    " (?:"
    " \\x{200D}"
    " (?:"
    " [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
    " | \\x{2764} \\x{FE0F} \\x{200D}"
    " (?: \\x{1F48B} \\x{200D} )?"
    " \\x{1F468}"
    " | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
    " | \\x{1F466}"
    " (?: \\x{200D} \\x{1F466} )?"
    " | \\x{1F467}"
    " (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
    " | [\\x{1F468}\\x{1F469}] \\x{200D}"
    " (?:"
    " \\x{1F466}"
    " (?: \\x{200D} \\x{1F466} )?"
    " | \\x{1F467}"
    " (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
    " )"
    " | [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
    " )"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?:"
    " \\x{200D}"
    " (?:"
    " [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
    " | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
    " )"
    " )?"
    " )?"
    " | \\x{1F469}"
    " (?:"
    " \\x{200D}"
    " (?:"
    " [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
    " | \\x{2764} \\x{FE0F} \\x{200D}"
    " (?: \\x{1F48B} \\x{200D} )?"
    " [\\x{1F468}\\x{1F469}]"
    " | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
    " | \\x{1F466}"
    " (?: \\x{200D} \\x{1F466} )?"
    " | \\x{1F467}"
    " (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
    " | \\x{1F469} \\x{200D}"
    " (?:"
    " \\x{1F466}"
    " (?: \\x{200D} \\x{1F466} )?"
    " | \\x{1F467}"
    " (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
    " )"
    " | [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
    " )"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?:"
    " \\x{200D}"
    " (?:"
    " [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
    " | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
    " )"
    " )?"
    " )?"
    " | [\\x{1F46A}-\\x{1F46D}]"
    " | \\x{1F46E}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F46F}"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " | \\x{1F470} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F471}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F472} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F473}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F474}-\\x{1F476}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F477}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F478} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F479}-\\x{1F47B}]"
    " | \\x{1F47C} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F47D}-\\x{1F480}]"
    " | [\\x{1F481}\\x{1F482}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F483} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F484}"
    " | \\x{1F485} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F486}\\x{1F487}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F488}-\\x{1F4A9}]"
    " | \\x{1F4AA} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F4AB}-\\x{1F4FD}\\x{1F4FF}-\\x{1F53D}\\x{1F549}-\\x{1F54E}\\x{1F550}-\\x{1F567}\\x{1F56F}\\x{1F570}\\x{1F573}]"
    " | \\x{1F574} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F575}"
    " (?:"
    " \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F576}-\\x{1F579}]"
    " | \\x{1F57A} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F587}\\x{1F58A}-\\x{1F58D}]"
    " | [\\x{1F590}\\x{1F595}\\x{1F596}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F5A4}\\x{1F5A5}\\x{1F5A8}\\x{1F5B1}\\x{1F5B2}\\x{1F5BC}\\x{1F5C2}-\\x{1F5C4}\\x{1F5D1}-\\x{1F5D3}\\x{1F5DC}-\\x{1F5DE}\\x{1F5E1}\\x{1F5E3}\\x{1F5E8}\\x{1F5EF}\\x{1F5F3}\\x{1F5FA}-\\x{1F644}]"
    " | [\\x{1F645}-\\x{1F647}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F648}-\\x{1F64A}]"
    " | \\x{1F64B}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F64C} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F64D}\\x{1F64E}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F64F} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F680}-\\x{1F6A2}]"
    " | \\x{1F6A3}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F6A4}-\\x{1F6B3}]"
    " | [\\x{1F6B4}-\\x{1F6B6}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F6B7}-\\x{1F6BF}]"
    " | \\x{1F6C0} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F6C1}-\\x{1F6C5}\\x{1F6CB}]"
    " | \\x{1F6CC} [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F6CD}-\\x{1F6D2}\\x{1F6E0}-\\x{1F6E5}\\x{1F6E9}\\x{1F6EB}\\x{1F6EC}\\x{1F6F0}\\x{1F6F3}-\\x{1F6F9}\\x{1F910}-\\x{1F917}]"
    " | [\\x{1F918}-\\x{1F91C}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F91D}"
    " | [\\x{1F91E}\\x{1F91F}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | [\\x{1F920}-\\x{1F925}]"
    " | \\x{1F926}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F927}-\\x{1F92F}]"
    " | [\\x{1F930}-\\x{1F936}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F937}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F938}\\x{1F939}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | \\x{1F93A}"
    " | \\x{1F93C}"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " | [\\x{1F93D}\\x{1F93E}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F940}-\\x{1F945}\\x{1F947}-\\x{1F970}\\x{1F973}-\\x{1F976}\\x{1F97A}\\x{1F97C}-\\x{1F9A2}\\x{1F9B0}-\\x{1F9B4}]"
    " | [\\x{1F9B5}\\x{1F9B6}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F9B7}"
    " | [\\x{1F9B8}\\x{1F9B9}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F9C0}-\\x{1F9C2}\\x{1F9D0}]"
    " | [\\x{1F9D1}-\\x{1F9D5}] [\\x{1F3FB}-\\x{1F3FF}]?"
    " | \\x{1F9D6}"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F9D7}-\\x{1F9DD}]"
    " (?:"
    " \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
    " | [\\x{1F3FB}-\\x{1F3FF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " )?"
    " | [\\x{1F9DE}\\x{1F9DF}]"
    " (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
    " | [\\x{1F9E0}-\\x{1F9FF}]"

For UTF-16 (string) mode, compressed:

    "[#*0-9]\\uFE0F\\u20E3|[\\u00A9\\u00AE\\u203C\\u2049\\u2122\\u2139\\u2"
    "194-\\u2199\\u21A9\\u21AA\\u231A\\u231B\\u2328\\u23CF\\u23E9-\\u23F3\\"
    "u23F8-\\u23FA\\u24C2\\u25AA\\u25AB\\u25B6\\u25C0\\u25FB-\\u25FE\\u260"
    "0-\\u2604\\u260E\\u2611\\u2614\\u2615\\u2618]|\\u261D(?:\\uD83C[\\uDF"
    "FB-\\uDFFF])?|[\\u2620\\u2622\\u2623\\u2626\\u262A\\u262E\\u262F\\u26"
    "38-\\u263A\\u2640\\u2642\\u2648-\\u2653\\u265F\\u2660\\u2663\\u2665\\u"
    "2666\\u2668\\u267B\\u267E\\u267F\\u2692-\\u2697\\u2699\\u269B\\u269C\\"
    "u26A0\\u26A1\\u26AA\\u26AB\\u26B0\\u26B1\\u26BD\\u26BE\\u26C4\\u26C5\\"
    "u26C8\\u26CE\\u26CF\\u26D1\\u26D3\\u26D4\\u26E9\\u26EA\\u26F0-\\u26F5"
    "\\u26F7\\u26F8]|\\u26F9(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640"
    "\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\u26FA\\u"
    "26FD\\u2702\\u2705\\u2708\\u2709]|[\\u270A-\\u270D](?:\\uD83C[\\uDFF"
    "B-\\uDFFF])?|[\\u270F\\u2712\\u2714\\u2716\\u271D\\u2721\\u2728\\u273"
    "3\\u2734\\u2744\\u2747\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2763\\u27"
    "64\\u2795-\\u2797\\u27A1\\u27B0\\u27BF\\u2934\\u2935\\u2B05-\\u2B07\\u"
    "2B1B\\u2B1C\\u2B50\\u2B55\\u3030\\u303D\\u3297\\u3299]|\\uD83C(?:[\\u"
    "DC04\\uDCCF\\uDD70\\uDD71\\uDD7E\\uDD7F\\uDD8E\\uDD91-\\uDD9A]|\\uDDE"
    "6\\uD83C[\\uDDE8-\\uDDEC\\uDDEE\\uDDF1\\uDDF2\\uDDF4\\uDDF6-\\uDDFA\\u"
    "DDFC\\uDDFD\\uDDFF]|\\uDDE7\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEF\\uDD"
    "F1-\\uDDF4\\uDDF6-\\uDDF9\\uDDFB\\uDDFC\\uDDFE\\uDDFF]|\\uDDE8\\uD83C"
    "[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDEE\\uDDF0-\\uDDF5\\uDDF7\\uDDFA-\\u"
    "DDFF]|\\uDDE9\\uD83C[\\uDDEA\\uDDEC\\uDDEF\\uDDF0\\uDDF2\\uDDF4\\uDDF"
    "F]|\\uDDEA\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDED\\uDDF7-\\uDDFA]"
    "|\\uDDEB\\uD83C[\\uDDEE-\\uDDF0\\uDDF2\\uDDF4\\uDDF7]|\\uDDEC\\uD83C["
    "\\uDDE6\\uDDE7\\uDDE9-\\uDDEE\\uDDF1-\\uDDF3\\uDDF5-\\uDDFA\\uDDFC\\uD"
    "DFE]|\\uDDED\\uD83C[\\uDDF0\\uDDF2\\uDDF3\\uDDF7\\uDDF9\\uDDFA]|\\uDD"
    "EE\\uD83C[\\uDDE8-\\uDDEA\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9]|\\uDDEF\\uD8"
    "3C[\\uDDEA\\uDDF2\\uDDF4\\uDDF5]|\\uDDF0\\uD83C[\\uDDEA\\uDDEC-\\uDDE"
    "E\\uDDF2\\uDDF3\\uDDF5\\uDDF7\\uDDFC\\uDDFE\\uDDFF]|\\uDDF1\\uD83C[\\u"
    "DDE6-\\uDDE8\\uDDEE\\uDDF0\\uDDF7-\\uDDFB\\uDDFE]|\\uDDF2\\uD83C[\\uD"
    "DE6\\uDDE8-\\uDDED\\uDDF0-\\uDDFF]|\\uDDF3\\uD83C[\\uDDE6\\uDDE8\\uDD"
    "EA-\\uDDEC\\uDDEE\\uDDF1\\uDDF4\\uDDF5\\uDDF7\\uDDFA\\uDDFF]|\\uDDF4\\"
    "uD83C\\uDDF2|\\uDDF5\\uD83C[\\uDDE6\\uDDEA-\\uDDED\\uDDF0-\\uDDF3\\uD"
    "DF7-\\uDDF9\\uDDFC\\uDDFE]|\\uDDF6\\uD83C\\uDDE6|\\uDDF7\\uD83C[\\uDD"
    "EA\\uDDF4\\uDDF8\\uDDFA\\uDDFC]|\\uDDF8\\uD83C[\\uDDE6-\\uDDEA\\uDDEC"
    "-\\uDDF4\\uDDF7-\\uDDF9\\uDDFB\\uDDFD-\\uDDFF]|\\uDDF9\\uD83C[\\uDDE6"
    "\\uDDE8\\uDDE9\\uDDEB-\\uDDED\\uDDEF-\\uDDF4\\uDDF7\\uDDF9\\uDDFB\\uDD"
    "FC\\uDDFF]|\\uDDFA\\uD83C[\\uDDE6\\uDDEC\\uDDF2\\uDDF3\\uDDF8\\uDDFE\\"
    "uDDFF]|\\uDDFB\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDEE\\uDDF3\\uDD"
    "FA]|\\uDDFC\\uD83C[\\uDDEB\\uDDF8]|\\uDDFD\\uD83C\\uDDF0|\\uDDFE\\uD8"
    "3C[\\uDDEA\\uDDF9]|\\uDDFF\\uD83C[\\uDDE6\\uDDF2\\uDDFC]|[\\uDE01\\uD"
    "E02\\uDE1A\\uDE2F\\uDE32-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF21\\uDF24-"
    "\\uDF84]|\\uDF85(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDF86-\\uDF93\\uDF9"
    "6\\uDF97\\uDF99-\\uDF9B\\uDF9E-\\uDFC1]|\\uDFC2(?:\\uD83C[\\uDFFB-\\u"
    "DFFF])?|[\\uDFC3\\uDFC4](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\"
    "uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDFC5\\uDFC6"
    "]|\\uDFC7(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDFC8\\uDFC9]|\\uDFCA(?:\\"
    "u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2"
    "640\\u2642]\\uFE0F)?)?|[\\uDFCB\\uDFCC](?:\\uD83C[\\uDFFB-\\uDFFF]("
    "?:\\u200D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uF"
    "E0F)?|[\\uDFCD-\\uDFF0]|\\uDFF3(?:\\uFE0F\\u200D\\uD83C\\uDF08)?|\\u"
    "DFF4(?:\\u200D\\u2620\\uFE0F|\\uDB40\\uDC67\\uDB40\\uDC62\\uDB40(?:\\"
    "uDC65\\uDB40\\uDC6E\\uDB40\\uDC67|\\uDC73\\uDB40\\uDC63\\uDB40\\uDC74"
    "|\\uDC77\\uDB40\\uDC6C\\uDB40\\uDC73)\\uDB40\\uDC7F)?|[\\uDFF5\\uDFF7"
    "-\\uDFFF])|\\uD83D(?:[\\uDC00-\\uDC40]|\\uDC41(?:\\uFE0F\\u200D\\uD8"
    "3D\\uDDE8\\uFE0F)?|[\\uDC42\\uDC43](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\"
    "uDC44\\uDC45]|[\\uDC46-\\uDC50](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDC"
    "51-\\uDC65]|[\\uDC66\\uDC67](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC68(?"
    ":\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83"
    "D(?:\\uDC8B\\u200D\\uD83D)?\\uDC68|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDF"
    "A4\\uDFA8\\uDFEB\\uDFED]|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?"
    "|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?|[\\uDC68\\uDC69]\\u200D\\"
    "uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83D["
    "\\uDC66\\uDC67])?)|[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD"
    "83E[\\uDDB0-\\uDDB3])|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D(?:[\\u2695"
    "\\u2696\\u2708]\\uFE0F|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uD"
    "FEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD8"
    "3E[\\uDDB0-\\uDDB3]))?)?|\\uDC69(?:\\u200D(?:[\\u2695\\u2696\\u2708"
    "]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83D(?:\\uDC8B\\u200D\\uD83D)?[\\uDC"
    "68\\uDC69]|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]"
    "|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83"
    "D[\\uDC66\\uDC67])?|\\uDC69\\u200D\\uD83D(?:\\uDC66(?:\\u200D\\uD83D"
    "\\uDC66)?|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?)|[\\uDCBB\\uDCB"
    "C\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD83E[\\uDDB0-\\uDDB3])|\\uD83C[\\u"
    "DFFB-\\uDFFF](?:\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\uD83C[\\u"
    "DF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCB"
    "C\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD83E[\\uDDB0-\\uDDB3]))?)?|[\\uDC6"
    "A-\\uDC6D]|\\uDC6E(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
    "\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC6F(?:\\u200D[\\u2"
    "640\\u2642]\\uFE0F)?|\\uDC70(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC71(?"
    ":\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\"
    "u2640\\u2642]\\uFE0F)?)?|\\uDC72(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC"
    "73(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
    "0D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC74-\\uDC76](?:\\uD83C[\\uDFFB-\\"
    "uDFFF])?|\\uDC77(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
    "uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC78(?:\\uD83C[\\uDF"
    "FB-\\uDFFF])?|[\\uDC79-\\uDC7B]|\\uDC7C(?:\\uD83C[\\uDFFB-\\uDFFF])"
    "?|[\\uDC7D-\\uDC80]|[\\uDC81\\uDC82](?:\\u200D[\\u2640\\u2642]\\uFE0"
    "F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uD"
    "C83(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC84|\\uDC85(?:\\uD83C[\\uDFFB-"
    "\\uDFFF])?|[\\uDC86\\uDC87](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C"
    "[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC88-\\uD"
    "CA9]|\\uDCAA(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDCAB-\\uDCFD\\uDCFF-\\"
    "uDD3D\\uDD49-\\uDD4E\\uDD50-\\uDD67\\uDD6F\\uDD70\\uDD73]|\\uDD74(?:"
    "\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD75(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
    "00D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?"
    "|[\\uDD76-\\uDD79]|\\uDD7A(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD87\\uD"
    "D8A-\\uDD8D]|[\\uDD90\\uDD95\\uDD96](?:\\uD83C[\\uDFFB-\\uDFFF])?|["
    "\\uDDA4\\uDDA5\\uDDA8\\uDDB1\\uDDB2\\uDDBC\\uDDC2-\\uDDC4\\uDDD1-\\uDD"
    "D3\\uDDDC-\\uDDDE\\uDDE1\\uDDE3\\uDDE8\\uDDEF\\uDDF3\\uDDFA-\\uDE44]|"
    "[\\uDE45-\\uDE47](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
    "uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE48-\\uDE4A]|\\uDE"
    "4B(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
    "0D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4C(?:\\uD83C[\\uDFFB-\\uDFFF])?|"
    "[\\uDE4D\\uDE4E](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\u"
    "DFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4F(?:\\uD83C[\\uDFF"
    "B-\\uDFFF])?|[\\uDE80-\\uDEA2]|\\uDEA3(?:\\u200D[\\u2640\\u2642]\\uF"
    "E0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|["
    "\\uDEA4-\\uDEB3]|[\\uDEB4-\\uDEB6](?:\\u200D[\\u2640\\u2642]\\uFE0F|"
    "\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE"
    "B7-\\uDEBF]|\\uDEC0(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDEC1-\\uDEC5\\u"
    "DECB]|\\uDECC(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDECD-\\uDED2\\uDEE0-"
    "\\uDEE5\\uDEE9\\uDEEB\\uDEEC\\uDEF0\\uDEF3-\\uDEF9])|\\uD83E(?:[\\uDD"
    "10-\\uDD17]|[\\uDD18-\\uDD1C](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD1D|"
    "[\\uDD1E\\uDD1F](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD20-\\uDD25]|\\uD"
    "D26(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
    "00D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDD27-\\uDD2F]|[\\uDD30-\\uDD36]("
    "?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD37(?:\\u200D[\\u2640\\u2642]\\uFE0"
    "F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\u"
    "DD38\\uDD39](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFF"
    "F](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDD3A|\\uDD3C(?:\\u200D[\\"
    "u2640\\u2642]\\uFE0F)?|[\\uDD3D\\uDD3E](?:\\u200D[\\u2640\\u2642]\\u"
    "FE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|"
    "[\\uDD40-\\uDD45\\uDD47-\\uDD70\\uDD73-\\uDD76\\uDD7A\\uDD7C-\\uDDA2\\"
    "uDDB0-\\uDDB4]|[\\uDDB5\\uDDB6](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDB"
    "7|[\\uDDB8\\uDDB9](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
    "\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDDC0-\\uDDC2\\uDDD"
    "0]|[\\uDDD1-\\uDDD5](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDD6(?:\\u200D"
    "[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u"
    "2642]\\uFE0F)?)?|[\\uDDD7-\\uDDDD](?:\\u200D[\\u2640\\u2642]\\uFE0F"
    "|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uD"
    "DDE\\uDDDF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\uDDE0-\\uDDFF])"



Assuming you're asking about the standard Unicode emoji ranges (different vendors use different blocks), you can consider these three ranges:

  • 0x20a0 - 0x32ff
  • 0x1f000 - 0x1ffff
  • 0xfe4e5 - 0xfe4ee
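Here is a sketch of a check against those three ranges, taken as given above; the class name is illustrative. The first range is broad: it also covers characters that are not emoji, such as the euro sign (U+20AC) and Japanese kana (for example U+3042), so expect false positives.

```java
final class EmojiRangeCheck {

    // The three ranges listed above, inclusive, as {low, high} pairs.
    private static final int[][] RANGES = {
        {0x20A0, 0x32FF},
        {0x1F000, 0x1FFFF},
        {0xFE4E5, 0xFE4EE},
    };

    static boolean inListedRanges(int codePoint) {
        for (int[] r : RANGES) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Counts the code points of s that fall in any listed range.
    static long count(String s) {
        return s.codePoints().filter(EmojiRangeCheck::inListedRanges).count();
    }

    public static void main(String[] args) {
        System.out.println(inListedRanges(0x1F606)); // true
        System.out.println(inListedRanges('A'));     // false
        System.out.println(inListedRanges(0x20AC));  // true, although the euro sign is not an emoji
    }
}
```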

Además de toda la explicación reflexiva que TJCrowder ha compartido con nosotros, debe decirse que, a partir de Java 7, es posible hacer coincidir los pares sustituidos con codificación UTF-16 con facilidad.

Take a look at the docs:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

A Unicode character can also be represented in a regular expression by using its hex notation (hexadecimal code point value) directly, as described in the construct \x{...}. For example, the supplementary character U+2011F can be specified as \x{2011F} instead of as two consecutive Unicode escape sequences for the surrogate pair \uD840\uDD1F.
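Applied to the question, the \x{...} construct lets a character class be written directly over supplementary code points, with no surrogate arithmetic. A minimal sketch, assuming Java 7 or later: it covers Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF) and, because the emoji in the question (U+1F606 and U+1F61B) actually sit in the adjacent Emoticons block, extends the range to U+1F64F. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodePointRegex {

    // The \x{...} construct takes a full code point.
    // U+1F300-U+1F5FF: Miscellaneous Symbols and Pictographs
    // U+1F600-U+1F64F: Emoticons
    private static final Pattern EMOJI = Pattern.compile("[\\x{1F300}-\\x{1F64F}]+");

    static List<String> extract(String s) {
        List<String> matches = new ArrayList<>();
        Matcher m = EMOJI.matcher(s);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // "Thats a nice joke 😆😆😆 😛" written with escapes
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(extract(s)); // [😆😆😆, 😛]
    }
}
```

Because of the `+` quantifier, adjacent emoji are returned as one run; drop it to get one match per emoji.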

However, if you cannot move to Java 7, you can extend the handy UnicodeEscaper provided by Guava.

Here is an implementation, just for the sake of example:

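    import com.google.common.escape.UnicodeEscaper;

    public class SimpleEscaper extends UnicodeEscaper {
        @Override
        protected char[] escape(int codePoint) {
            // Escape code points in U+1F000..U+1FFFF as their bare hex value
            if (codePoint >= 0x1f000 && codePoint <= 0x1ffff) {
                return Integer.toHexString(codePoint).toCharArray();
            }
            // null tells UnicodeEscaper to leave this code point unchanged
            return null;
        }
    }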
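If Guava is not available either, the same idea can be sketched with plain JDK calls that predate Java 7 (`codePointAt`, `Character.charCount`, `appendCodePoint`): walk the string one code point at a time and replace those in U+1F000–U+1FFFF with their hex value. This is a swapped-in alternative, not the Guava approach above; the class and method names are hypothetical and, like SimpleEscaper, it writes the bare hex digits with no delimiter.

```java
// Hypothetical JDK-only equivalent of SimpleEscaper (no Guava dependency).
public class PlainEscaper {

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int codePoint = s.codePointAt(i); // reads a whole surrogate pair when present
            if (codePoint >= 0x1f000 && codePoint <= 0x1ffff) {
                out.append(Integer.toHexString(codePoint)); // U+1F606 becomes "1f606"
            } else {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(escape(s)); // Thats a nice joke 1f6061f6061f606 1f61b
    }
}
```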


You can also use the emoji4j library.

    String emojiText = "A 🐱, 🐱 and a 🐭 became friends. For 🐶's birthday party, they all had 🍔s, 🍟s, 🍪s and 🍰.";
    EmojiUtils.removeAllEmojis(emojiText); // returns "A , and a became friends. For 's birthday party, they all had s, s, s and ."


I had a similar problem. The following has worked well for me and matches surrogate pairs:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SplitByUnicode {
        public static void main(String[] argv) {
            String string = "Thats a nice joke 😆😆😆 😛";
            System.out.println("Original String:" + string);
            // Broad class written with surrogate code units: it catches the
            // emoji below, but it is not specific to emoji.
            String regexPattern = "[\uD83C-\uDBFF\uDC00-\uDFFF]+";
            Pattern pattern = Pattern.compile(regexPattern);
            Matcher matcher = pattern.matcher(string);
            List<String> matchList = new ArrayList<String>();
            while (matcher.find()) {
                matchList.add(matcher.group());
            }
            for (int i = 0; i < matchList.size(); i++) {
                System.out.println(i + ":" + matchList.get(i));
            }
        }
    }

The output is:

    Original String:Thats a nice joke 😆😆😆 😛
    0:😆😆😆
    1:😛

I found the regex at https://.com/a/24071599/915972
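The regex above works in terms of surrogates because every emoji in the question lies outside the Basic Multilingual Plane and is therefore stored in a Java String as two UTF-16 chars, a high surrogate followed by a low surrogate. A small sketch using only `java.lang.Character` and `String` methods makes that visible; the class and helper names are hypothetical.

```java
public class SurrogateDemo {

    // Returns the UTF-16 code units of a code point as hex, e.g. "D83D DE06".
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String joke = "\uD83D\uDE06"; // U+1F606, the first emoji in the question
        System.out.println(joke.length());                          // 2 chars
        System.out.println(joke.codePointCount(0, joke.length()));  // 1 code point
        System.out.println(Character.isSurrogatePair(joke.charAt(0), joke.charAt(1))); // true
        System.out.println(utf16Units(0x1F606));                    // D83D DE06
        System.out.println(utf16Units(0x1F61B));                    // D83D DE1B
    }
}
```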


Using emoji-java, I wrote a simple method that removes all emojis, including Fitzpatrick modifiers. It requires an external library, but it is easier to maintain than those monster regexes.

Usage:

    import com.vdurmont.emoji.EmojiParser;

    String input = "A string 😄with a \uD83D\uDC66\uD83C\uDFFFfew 😉emojis!";
    String result = EmojiParser.removeAllEmojis(input);

emoji-java Maven installation:

    <dependency>
        <groupId>com.vdurmont</groupId>
        <artifactId>emoji-java</artifactId>
        <version>3.1.3</version>
    </dependency>

Gradle:

    compile 'com.vdurmont:emoji-java:3.1.3'

EDIT: the previously submitted answer has been merged into the emoji-java source code.


You can do it like this:

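    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Thats a nice joke 😆😆😆 😛";
    // U+1F000-U+1F3FF and U+1F400-U+1F7FF (written as surrogate pairs), plus U+2600-U+27FF
    Pattern pattern = Pattern.compile(
            "[\ud83c\udc00-\ud83c\udfff]|[\ud83d\udc00-\ud83d\udfff]|[\u2600-\u27ff]",
            Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();
    while (matcher.find()) {
        matchList.add(matcher.group());
    }
    for (int i = 0; i < matchList.size(); i++) {
        System.out.println(matchList.get(i));
    }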


Emoji regex

    public static final String sEmojiRegex = "(?:[\\u2700-\\u27bf]|"
            + "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|"
            + "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?"
            + "(?:\\u200d(?:[^\\ud800-\\udfff]|"
            + "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|"
            + "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|"
            + "[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|"
            + "[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|"
            + "[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|"
            + "[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|"
            + "\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|"
            + "\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";

Some emojis (1627):

// count = 1627 public static final String sEmojiTest = "😀😃😄😁😆😅😂🤣☺️😊😇🙂🙃😉😌😍😘😗😙😚😋😜😝😛🤑🤗🤓😎🤡🤠😏😒😞😔😟😕🙁☹️😣😖😫😩😤😠😡😶😐😑😯😦😧😮😲😵😳😱😨😰😢😥🤤😭😓😪😴🙄🤔🤥😬🤐🤢🤧😷🤒🤕😈👿👹👺💩👻💀☠️👽👾🤖🎃😺😸😹😻😼😽🙀😿😾👐🙌👏🙏🤝👍👎👊✊🤛🤜🤞✌️🤘👌👈👉👆👇☝️✋🤚🖐🖖👋🤙💪🖕✍️🤳💅💍💄💋👄👅👂👃👣👁👀🗣👤👥👶👦👧👨👩👱‍♀👱👴👵👲👳‍♀👳👮‍♀👮👷‍♀👷💂‍♀💂🕵️‍♀️🕵👩‍⚕👨‍⚕👩‍🌾👨‍🌾👩‍🍳👨‍🍳👩‍🎓👨‍🎓👩‍🎤👨‍🎤👩‍🏫👨‍🏫👩‍🏭👨‍🏭👩‍💻👨‍💻👩‍💼👨‍💼👩‍🔧👨‍🔧👩‍🔬👨‍🔬👩‍🎨👨‍🎨👩‍🚒👨‍🚒👩‍✈👨‍✈👩‍🚀👨‍🚀👩‍⚖👨‍⚖🤶🎅👸🤴👰🤵👼🤰🙇‍♀🙇💁💁‍♂🙅🙅‍♂🙆🙆‍♂🙋🙋‍♂🤦‍♀🤦‍♂🤷‍♀🤷‍♂🙎🙎‍♂🙍🙍‍♂💇💇‍♂💆💆‍♂🕴💃🕺👯👯‍♂🚶‍♀🚶🏃‍♀🏃👫👭👬💑👩‍❤️‍👩👨‍❤️‍👨💏👩‍❤️‍💋‍👩👨‍❤️‍💋‍👨👪👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👦‍👦👨‍👩‍👧‍👧👩‍👩‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👦‍👦👩‍👩‍👧‍👧👨‍👨‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👦‍👦👨‍👨‍👧‍👧👩‍👦👩‍👧👩‍👧‍👦👩‍👦‍👦👩‍👧‍👧👨‍👦👨‍👧👨‍👧‍👦👨‍👦‍👦👨‍👧‍👧👚👕👖👔👗👙👘👠👡👢👞👟👒🎩🎓👑⛑🎒👝👛👜💼👓🕶🌂☂️🐶🐱🐭🐹🐰🦊🐻🐼🐨🐯🦁🐮🐷🐽🐸🐵🙈🙉🙊🐒🐔🐧🐦🐤🐣🐥🦆🦅🦉🦇🐺🐗🐴🦄🐝🐛🦋🐌🐚🐞🐜🕷🕸🐢🐍🦎🦂🦀🦑🐙🦐🐠🐟🐡🐬🦈🐳🐋🐊🐆🐅🐃🐂🐄🦌🐪🐫🐘🦏🦍🐎🐖🐐🐏🐑🐕🐩🐈🐓🦃🕊🐇🐁🐀🐿🐾🐉🐲🌵🎄🌲🌳🌴🌱🌿☘️🍀🎍🎋🍃🍂🍁🍄🌾💐🌷🌹🥀🌻🌼🌸🌺🌎🌍🌏🌕🌖🌗🌘🌑🌒🌓🌔🌚🌝🌞🌛🌜🌙💫⭐️🌟✨⚡️🔥💥☄☀️🌤⛅️🌥🌦🌈☁️🌧⛈🌩🌨☃️⛄️❄️🌬💨🌪🌫🌊💧💦☔️🍏🍎🍐🍊🍋🍌🍉🍇🍓🍈🍒🍑🍍🥝🥑🍅🍆🥒🥕🌽🌶🥔🍠🌰🥜🍯🥐🍞🥖🧀🥚🍳🥓🥞🍤🍗🍖🍕🌭🍔🍟🥙🌮🌯🥗🥘🍝🍜🍲🍥🍣🍱🍛🍚🍙🍘🍢🍡🍧🍨🍦🍰🎂🍮🍭🍬🍫🍿🍩🍪🥛🍼☕️🍵🍶🍺🍻🥂🍷🥃🍸🍹🍾🥄🍴🍽⚽️🏀🏈⚾️🎾🏐🏉🎱🏓🏸🥅🏒🏑🏏⛳️🏹🎣🥊🥋⛸🎿⛷🏂🏋️‍♀️🏋🤺🤼‍♀🤼‍♂🤸‍♀🤸‍♂⛹️‍♀️⛹🤾‍♀🤾‍♂🏌️‍♀️🏌🏄‍♀🏄🏊‍♀🏊🤽‍♀🤽‍♂🚣‍♀🚣🏇🚴‍♀🚴🚵‍♀🚵🎽🏅🎖🥇🥈🥉🏆🏵🎗🎫🎟🎪🤹‍♀🤹‍♂🎭🎨🎬🎤🎧🎼🎹🥁🎷🎺🎸🎻🎲🎯🎳🎮🎰🚗🚕🚙🚌🚎🏎🚓🚑🚒🚐🚚🚛🚜🛴🚲🛵🏍🚨🚔🚍🚘🚖🚡🚠🚟🚃🚋🚞🚝🚄🚅🚈🚂🚆🚇🚊🚉🚁🛩✈️🛫🛬🚀🛰💺🛶⛵️🛥🚤🛳⛴🚢⚓️🚧⛽️🚏🚦🚥🗺🗿🗽⛲️🗼🏰🏯🏟🎡🎢🎠⛱🏖🏝⛰🏔🗻🌋🏜🏕⛺️🛤🛣🏗🏭🏠🏡🏘🏚🏢🏬🏣🏤🏥🏦🏨🏪🏫🏩💒🏛⛪️🕌🕍🕋⛩🗾🎑🏞🌅🌄🌠🎇🎆🌇🌆🏙🌃🌌🌉🌁⌚️📱📲💻⌨️🖥🖨🖱🖲🕹🗜💽💾💿📀📼📷📸📹🎥📽🎞📞☎️📟📠📺📻🎙🎚🎛⏱⏲⏰🕰⌛️⏳📡🔋🔌💡🔦🕯🗑🛢💸💵💴💶💷💰💳💎⚖️🔧🔨⚒🛠⛏🔩⚙️⛓🔫💣🔪🗡⚔️🛡🚬⚰️⚱️🏺🔮📿💈⚗️🔭🔬🕳💊💉🌡🚽🚰🚿🛁🛀🛎🔑🗝🚪🛋🛏🛌🖼🛍🛒🎁🎈🎏🎀🎊🎉🎎🏮🎐✉️📩📨📧💌📥📤📦🏷📪📫📬📭📮📯📜📃📄📑📊📈📉🗒🗓📆📅📇🗃🗳🗄📋📁📂🗂🗞📰📓📔📒📕📗📘📙📚📖🔖🔗📎🖇📐📏📌📍✂️🖊🖋✒️🖌🖍📝✏️🔍🔎🔏🔐🔒🔓❤️💛💚💙💜🖤💔❣️💕💞💓💗💖💘💝💟☮️✝️☪️🕉☸️✡️🔯🕎☯️☦️🛐⛎♈️♉️♊️♋️♌️♍️♎️♏️♐️♑️♒️♓️🆔⚛️🉑☢️☣️📴📳🈶🈚️🈸🈺🈷️✴️🆚💮🉐㊙️㊗️🈴🈵🈹🈲🅰️🅱️🆎🆑🅾️🆘❌⭕️🛑⛔️📛🚫💯💢♨️🚷🚯🚳🚱🔞📵🚭❗️❕❓❔‼️⁉️🔅🔆〽️⚠️🚸🔱⚜️🔰♻️✅🈯️💹❇️✳️❎🌐💠Ⓜ️🌀💤🏧🚾♿️🅿️🈳🈂️🛂🛃🛄🛅🚹🚺🚼🚻🚮🎦📶🈁🔣ℹ️🔤🔡🔠🆖🆗🆙🆒🆕🆓0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣🔟🔢#️⃣*️⃣▶️⏸⏯⏹⏺⏭⏮⏩⏪⏫⏬◀️🔼🔽➡️⬅️⬆️⬇️↗️↘️↙️↖️↕️↔️↪️↩️⤴️⤵️🔀🔁🔂🔄🔃🎵🎶➕➖➗✖️💲💱™️©️®️〰️➰➿🔚🔙🔛🔝🔜✔️☑️🔘⚪️⚫️🔴🔵🔺🔻🔸🔹🔶🔷🔳🔲▪️▫️◾️◽️◼️◻️⬛️⬜️🔈🔇🔉🔊🔔🔕📣📢👁‍🗨💬💭🗯♠️♣️♥️♦️🃏🎴🀄️🕐🕑🕒🕓🕔🕕🕖🕗🕘🕙🕚🕛🕜🕝🕞🕟🕠🕡🕢🕣🕤🕥🕦🕧🏳️🏴🏁🚩🏳️‍🌈🇦🇫🇦🇽🇦🇱🇩🇿🇦🇸🇦🇩🇦🇴🇦🇮🇦🇶🇦🇬🇦🇷🇦🇲🇦🇼🇦🇺🇦🇹🇦🇿🇧🇸🇧🇭🇧🇩🇧🇧🇧🇾🇧🇪🇧🇿🇧🇯🇧🇲🇧🇹🇧🇴🇧🇶🇧🇦🇧🇼🇧🇷🇮🇴🇻🇬🇧🇳🇧🇬🇧🇫🇧🇮🇨🇻🇰🇭🇨🇲🇨🇦🇮🇨🇰🇾🇨🇫🇹🇩🇨🇱🇨🇳🇨🇽🇨🇨🇨🇴🇰🇲🇨🇬🇨🇩🇨🇰🇨🇷🇨🇮🇭🇷🇨🇺🇨🇼🇨🇾🇨🇿🇩🇰🇩🇯🇩🇲🇩🇴🇪🇨🇪🇬🇸🇻🇬🇶🇪🇷🇪🇪🇪🇹🇪🇺🇫🇰🇫🇴🇫🇯🇫🇮🇫🇷🇬🇫🇵🇫🇹🇫🇬🇦🇬🇲🇬🇪🇩🇪🇬🇭🇬🇮🇬🇷🇬🇱🇬🇩🇬🇵🇬🇺🇬🇹🇬🇬🇬🇳🇬🇼🇬🇾🇭🇹🇭🇳🇭🇰🇭🇺🇮🇸🇮🇳🇮🇩🇮🇷🇮🇶🇮🇪🇮🇲🇮🇱🇮🇹🇯🇲🇯🇵🎌🇯🇪🇯🇴🇰🇿🇰🇪🇰🇮🇽🇰🇰🇼🇰🇬🇱🇦🇱🇻🇱🇧🇱🇸🇱🇷🇱🇾🇱🇮🇱🇹🇱🇺🇲🇴🇲🇰🇲🇬🇲🇼🇲🇾🇲🇻🇲🇱🇲🇹🇲🇭🇲🇶🇲🇷🇲🇺🇾🇹🇲🇽🇫🇲🇲🇩🇲🇨🇲🇳🇲🇪🇲🇸🇲🇦🇲🇿🇲🇲🇳🇦🇳🇷🇳🇵🇳🇱🇳🇨🇳🇿
🇳🇮🇳🇪🇳🇬🇳🇺🇳🇫🇲🇵🇰🇵🇳🇴🇴🇲🇵🇰🇵🇼🇵🇸🇵🇦🇵🇬🇵🇾🇵🇪🇵🇭🇵🇳🇵🇱🇵🇹🇵🇷🇶🇦🇷🇪🇷🇴🇷🇺🇷🇼🇧🇱🇸🇭🇰🇳🇱🇨🇵🇲🇻🇨🇼🇸🇸🇲🇸🇹🇸🇦🇸🇳🇷🇸🇸🇨🇸🇱🇸🇬🇸🇽🇸🇰🇸🇮🇸🇧🇸🇴🇿🇦🇬🇸🇰🇷🇸🇸🇪🇸🇱🇰🇸🇩🇸🇷🇸🇿🇸🇪🇨🇭🇸🇾🇹🇼🇹🇯🇹🇿🇹🇭🇹🇱🇹🇬🇹🇰🇹🇴🇹🇹🇹🇳🇹🇷🇹🇲🇹🇨🇹🇻🇺🇬🇺🇦🇦🇪🇬🇧🇺🇸🇻🇮🇺🇾🇺🇿🇻🇺🇻🇦🇻🇪🇻🇳🇼🇫🇪🇭🇾🇪🇿🇲🇿🇼⚽️🏀🏈⚾️🎾🏐🏉🎱🏓🏸🥅🏒🏑🏏⛳️🏹🎣🥊🥋⛸🎿⛷🏂🏋️‍♀️🏋🏻‍♀️🏋🏼‍♀️🏋🏽‍♀️🏋🏾‍♀️🏋🏿‍♀️🏋️🏋🏻🏋🏼🏋🏽🏋🏾🏋🏿🤺🤼‍♀️🤼‍♂️🤸‍♀️🤸🏻‍♀️🤸🏼‍♀️🤸🏽‍♀️🤸🏾‍♀️🤸🏿‍♀️🤸‍♂️🤸🏻‍♂️🤸🏼‍♂️🤸🏽‍♂️🤸🏾‍♂️🤸🏿‍♂️⛹️‍♀️⛹🏻‍♀️⛹🏼‍♀️⛹🏽‍♀️⛹🏾‍♀️⛹🏿‍♀️⛹️⛹🏻⛹🏼⛹🏽⛹🏾⛹🏿🤾‍♀️🤾🏻‍♀️🤾🏼‍♀️🤾🏽‍♀️🤾🏾‍♀️🤾🏿‍♀️🤾‍♂️🤾🏻‍♂️🤾🏼‍♂️🤾🏽‍♂️🤾🏾‍♂️🤾🏿‍♂️🏌️‍♀️🏌🏻‍♀️🏌🏼‍♀️🏌🏽‍♀️🏌🏾‍♀️🏌🏿‍♀️🏌️🏌🏻🏌🏼🏌🏽🏌🏾🏌🏿🏄‍♀️🏄🏻‍♀️🏄🏼‍♀️🏄🏽‍♀️🏄🏾‍♀️🏄🏿‍♀️🏄🏄🏻🏄🏼🏄🏽🏄🏾🏄🏿🏊‍♀️🏊🏻‍♀️🏊🏼‍♀️🏊🏽‍♀️🏊🏾‍♀️🏊🏿‍♀️🏊🏊🏻🏊🏼🏊🏽🏊🏾🏊🏿🤽‍♀️🤽🏻‍♀️🤽🏼‍♀️🤽🏽‍♀️🤽🏾‍♀️🤽🏿‍♀️🤽‍♂️🤽🏻‍♂️🤽🏼‍♂️🤽🏽‍♂️🤽🏾‍♂️🤽🏿‍♂️🚣‍♀️🚣🏻‍♀️🚣🏼‍♀️🚣🏽‍♀️🚣🏾‍♀️🚣🏿‍♀️🚣🚣🏻🚣🏼🚣🏽🚣🏾🚣🏿🏇🏇🏻🏇🏼🏇🏽🏇🏾🏇🏿🚴‍♀️🚴🏻‍♀️🚴🏼‍♀️🚴🏽‍♀️🚴🏾‍♀️🚴🏿‍♀️🚴🚴🏻🚴🏼🚴🏽🚴🏾🚴🏿🚵‍♀️🚵🏻‍♀️🚵🏼‍♀️🚵🏽‍♀️🚵🏾‍♀️🚵🏿‍♀️🚵🚵🏻🚵🏼🚵🏽🚵🏾🚵🏿🎽🏅🎖🥇🥈🥉🏆🏵🎗🎫🎟🎪🤹‍♀️🤹‍♂️🎭🎨🎬🎤🎧🎼🎹🥁🎷🎺🎸🎻🎲🎯🎳🎮🎰";

Function to test the emojis:

    public void checkMatchingEmojis() {
        final Pattern pattern = Pattern.compile(sEmojiRegex);
        final Matcher matcher = pattern.matcher(sEmojiTest);
        int foundEmojiCount = 0;
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            foundEmojiCount++;
        }
        System.out.println("*******************************************");
        System.out.println("Input Emoji count = 1627");
        System.out.println("Captured Emoji count = " + foundEmojiCount);
        System.out.println("*******************************************");
    }

Here is the gist, tested against all Unicode 10 emoji.

Thanks to Kevin Scott for writing a great example.