Regex to match sloppy fractions / mixed numbers
I have a series of text strings that contain mixed numbers (i.e. a whole part and a fractional part). The problem is that the text is full of human-entered sloppiness:
- The whole part may or may not exist (e.g. "1/3" has none)
- The fractional part may or may not exist (e.g. "10" has none)
- The two parts may be separated by spaces and/or hyphens (e.g. "10 1/3", "10-1/3", "10 - 1/3").
- The fraction itself may or may not have spaces between the numbers and the slash (e.g. "1 /3", "1/ 3", "1 / 3").
- There may be other text after the fraction that needs to be ignored
I need a regular expression that can parse these elements so that I can build a proper number out of this mess.
Here is a regular expression that handles all of the data I have been able to throw at it:
(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$
This puts the digits into the following capture groups:
- Group 1: the whole part of the mixed number, if it exists
- Group 2: the numerator, if a fraction exists
- Group 3: the denominator, if a fraction exists
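As a rough illustration of how the three groups could be combined into a single number, here is a minimal sketch. The helper name mixed_to_number is hypothetical (it is not part of the answer), and the sketch assumes the string starts at the number with no leading text, and that a zero denominator should simply be skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: combine the three capture groups into one number.
# Possessive quantifiers (\d++) need Perl 5.10 or later.
sub mixed_to_number {
    my ($text) = @_;

    # Same pattern as above; every part is optional, so it always matches.
    return unless $text =~ m{(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};
    my ($whole, $num, $den) = ($1, $2, $3);

    # Neither a whole part nor a fraction was captured.
    return unless defined($whole) || defined($num);

    my $value = defined($whole) ? $whole : 0;
    # Assumption: ignore the fraction when the denominator is zero.
    $value += $num / $den if defined($num) && $den != 0;
    return $value;
}

printf "%.4f\n", mixed_to_number("10 - 1/3 cups");   # 10.3333
printf "%.4f\n", mixed_to_number("1 / 3");           # 0.3333
printf "%.4f\n", mixed_to_number("10");              # 10.0000
```

Because the pattern is not anchored and every piece is optional, a string with leading text (such as "about 10") matches without capturing anything, which is why the helper checks that at least one group is defined.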
Also, here is RegexBuddy's explanation of the elements (which helped enormously while building it):
Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match a single digit 0..9 «\d++»
      Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
   Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)»
      Match the character “ ” literally « *»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “/” literally «/»
Match the character “ ” literally « *»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “-” literally «-?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “ ” literally « *»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below «(?:(\d+) */ *(\d+))?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the regular expression below and capture its match into backreference number 2 «(\d+)»
      Match a single digit 0..9 «\d+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “ ” literally « *»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the character “/” literally «/»
   Match the character “ ” literally « *»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the regular expression below and capture its match into backreference number 3 «(\d+)»
      Match a single digit 0..9 «\d+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character that is not a line break character «.*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
I think it may be easier to handle the different cases (full mixed number, fraction only, whole number only) separately. For example:
sub parse_mixed {
    my ($mixed) = @_;
    # Whole part, a separator of spaces and/or hyphens, then a fraction.
    if ($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {
        return $1 + $2 / $3;
    # Fraction only.
    } elsif ($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {
        return $1 / $2;
    # Whole number only.
    } elsif ($mixed =~ /^ *(\d+)(\D.*)?$/) {
        return $1;
    }
    return;    # no number found
}
print parse_mixed("10"), "\n";
print parse_mixed("1/3"), "\n";
print parse_mixed("1 / 3"), "\n";
print parse_mixed("10 1/3"), "\n";
print parse_mixed("10-1/3"), "\n";
print parse_mixed("10 - 1/3"), "\n";
If you are using Perl 5.10, this is how I would write it.
m{
  ^
  \s*                # skip leading spaces
  (?'whole'
    \d++
    (?! \s*[\/] )    # there should not be a slash immediately following a whole number
  )?                 # optional, so that a bare fraction such as "1/3" still matches
  \s*
  (?:                # the rest should fail or succeed as a group
    -?               # ignore possible neg sign
    \s*
    (?'numerator'
      \d+
    )
    \s*
    [\/]
    \s*
    (?'denominator'
      \d+
    )
  )?
}x
You can then access the values through the %+ variable like this:
$+{whole};
$+{numerator};
$+{denominator};
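To show how the named captures might be turned into a number, here is a minimal sketch. The names $mixed_re and named_to_number are hypothetical, and the sketch assumes the whole-number group is optional so that a bare fraction also matches, and that a zero denominator is skipped.

```perl
use strict;
use warnings;
use 5.010;    # named captures, possessive quantifiers, and //

# Hypothetical compact form of the named-capture pattern.
my $mixed_re = qr{
    ^ \s*
    (?'whole' \d++ (?! \s* / ) )?    # assumption: whole part is optional
    \s*
    (?: -? \s* (?'numerator' \d+ ) \s* / \s* (?'denominator' \d+ ) )?
}x;

# Hypothetical helper: build the number from the entries of %+.
sub named_to_number {
    my ($text) = @_;
    return unless $text =~ $mixed_re;

    # Copy %+ right away; it only lists groups that actually captured.
    my %part = %+;
    return unless %part;    # matched, but found no digits at all

    my $value = $part{whole} // 0;
    # Assumption: ignore the fraction when the denominator is zero.
    $value += $part{numerator} / $part{denominator}
        if exists $part{numerator} && $part{denominator} != 0;
    return $value;
}

printf "%.4f\n", named_to_number("  10 - 1/3");   # 10.3333
printf "%.4f\n", named_to_number("1 / 3");        # 0.3333
```

Copying %+ into a lexical hash before doing anything else avoids losing the captures if a later regex match runs in the same scope.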