
java - How does Guava's Splitter.onPattern(..).split() differ from String.split(..)?




You have found a bug!

    // s is presumably the Splitter from the question:
    // Splitter.onPattern("(?=\\d)|\\W")
    System.out.println(s.split("abc82")); // [abc, 8]
    System.out.println(s.split("abc8"));  // [abc]

This is the method Splitter uses to split Strings (Splitter.SplittingIterator::computeNext):

    @Override
    protected String computeNext() {
        /*
         * The returned string will be from the end of the last match to the
         * beginning of the next one. nextStart is the start position of the
         * returned substring, while offset is the place to start looking for a
         * separator.
         */
        int nextStart = offset;
        while (offset != -1) {
            int start = nextStart;
            int end;

            int separatorPosition = separatorStart(offset);
            if (separatorPosition == -1) {
                end = toSplit.length();
                offset = -1;
            } else {
                end = separatorPosition;
                offset = separatorEnd(separatorPosition);
            }
            if (offset == nextStart) {
                /*
                 * This occurs when some pattern has an empty match, even if it
                 * doesn't match the empty string -- for example, if it requires
                 * lookahead or the like. The offset must be increased to look for
                 * separators beyond this point, without changing the start position
                 * of the next returned substring -- so nextStart stays the same.
                 */
                offset++;
                if (offset >= toSplit.length()) {
                    offset = -1;
                }
                continue;
            }

            while (start < end && trimmer.matches(toSplit.charAt(start))) {
                start++;
            }
            while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
                end--;
            }

            if (omitEmptyStrings && start == end) {
                // Don't include the (unused) separator in next split string.
                nextStart = offset;
                continue;
            }

            if (limit == 1) {
                // The limit has been reached, return the rest of the string as the
                // final item. This is tested after empty string removal so that
                // empty strings do not count towards the limit.
                end = toSplit.length();
                offset = -1;
                // Since we may have changed the end, we need to trim it again.
                while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
                    end--;
                }
            } else {
                limit--;
            }

            return toSplit.subSequence(start, end).toString();
        }
        return endOfData();
    }

The area of interest is:

    if (offset == nextStart) {
        /*
         * This occurs when some pattern has an empty match, even if it
         * doesn't match the empty string -- for example, if it requires
         * lookahead or the like. The offset must be increased to look for
         * separators beyond this point, without changing the start position
         * of the next returned substring -- so nextStart stays the same.
         */
        offset++;
        if (offset >= toSplit.length()) {
            offset = -1;
        }
        continue;
    }

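This logic works well unless the empty match occurs at the end of a String, that is, immediately before its last character. In that case offset++ makes offset equal to toSplit.length(), the >= check sets offset to -1, and the loop ends without ever returning that last character. This is how that part should look (note the change from >= to >):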

    if (offset == nextStart) {
        /*
         * This occurs when some pattern has an empty match, even if it
         * doesn't match the empty string -- for example, if it requires
         * lookahead or the like. The offset must be increased to look for
         * separators beyond this point, without changing the start position
         * of the next returned substring -- so nextStart stays the same.
         */
        offset++;
        if (offset > toSplit.length()) {
            offset = -1;
        }
        continue;
    }
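To see the effect of that one-character change in isolation, here is a simplified sketch of the loop built directly on java.util.regex. It is not Guava's code: the class name, the split method and the useProposedCheck flag are made up for illustration, and trimming, omitEmptyStrings and limit are left out. It assumes separatorStart and separatorEnd behave like Matcher.find(int), Matcher.start() and Matcher.end().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Splitter.SplittingIterator, reduced to the
// empty-match handling discussed above.
class EmptyMatchSplitSketch {

    // useProposedCheck = false mimics the quoted `>=` comparison,
    // useProposedCheck = true uses the proposed `>` comparison.
    static List<String> split(Pattern separator, String toSplit, boolean useProposedCheck) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = separator.matcher(toSplit);
        int offset = 0;
        while (offset != -1) {              // one pass per computeNext() call
            int nextStart = offset;
            while (offset != -1) {
                int end;
                if (matcher.find(offset)) { // assumed equivalent of separatorStart/End
                    end = matcher.start();
                    offset = matcher.end();
                } else {
                    end = toSplit.length();
                    offset = -1;
                }
                if (offset == nextStart) {
                    // Empty match at the current position: look further right,
                    // but keep nextStart so no character is dropped.
                    offset++;
                    boolean pastEnd = useProposedCheck
                            ? offset > toSplit.length()
                            : offset >= toSplit.length();
                    if (pastEnd) {
                        offset = -1;
                    }
                    continue;
                }
                parts.add(toSplit.substring(nextStart, end));
                break;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?=\\d)|\\W");
        System.out.println(split(p, "abc8", false));  // [abc]
        System.out.println(split(p, "abc8", true));   // [abc, 8]
        System.out.println(split(p, "abc82", false)); // [abc, 8]
        System.out.println(split(p, "abc82", true));  // [abc, 8, 2]
    }
}
```

With the `>=` comparison the sketch reproduces the two outputs reported at the top of this answer; with `>` the trailing character is returned as its own element.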

I recently used the power of a lookahead regular expression to split a string:

    "abc8".split("(?=\\d)|\\W")

Printed to the console (for example via Arrays.toString), this expression yields:

    [abc, 8]

Very pleased with this result, I wanted to move it over to Guava for further development, which looked like this:

    Splitter.onPattern("(?=\\d)|\\W").split("abc8")

To my surprise, the output changed to:

    [abc]

Why?


Guava's Splitter seems to have a bug when a pattern produces an empty (zero-width) match. If you create a Matcher and print what it matches:

    Pattern pattern = Pattern.compile("(?=\\d)|\\W");
    Matcher matcher = pattern.matcher("abc8");
    while (matcher.find()) {
        System.out.println(matcher.start() + "," + matcher.end());
    }

You get the output 3,3: a zero-width match at index 3, just before the 8. Splitter appears to treat that position as if the match had consumed the 8, so it simply splits there and returns only abc.

As a workaround you can use, for example, Pattern#split(String), which seems to give the correct result:

    Pattern.compile("(?=\\d)|\\W").split("abc8")
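A small runnable check of that workaround, using only the JDK. The class and helper names are made up for illustration. The second input, "abc82", is the one for which the Splitter was reported to drop the trailing character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class: splits with Pattern#split instead of Guava's Splitter.
class PatternSplitCheck {

    // Same separator as in the question: a zero-width lookahead before a digit,
    // or any single non-word character.
    static final Pattern SEPARATOR = Pattern.compile("(?=\\d)|\\W");

    // Arrays.toString is needed because printing a String[] directly
    // shows only its type and hash code.
    static String splitToString(String input) {
        return Arrays.toString(SEPARATOR.split(input));
    }

    public static void main(String[] args) {
        System.out.println(splitToString("abc8"));  // [abc, 8]
        System.out.println(splitToString("abc82")); // [abc, 8, 2]
    }
}
```

Pattern#split returns a String[] rather than an Iterable<String>, so callers that relied on Splitter's lazy iteration would need to adapt.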