How can I extract all quotations in a text?
I like this one:
perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'
It's a bit verbose, but it handles escaped quotes and backtracks much better than the simplest implementation. What it's saying is:
my $re = qr{
    "                 # Begin it with literal quote
    (
      (?>             # prevent backtracking once the alternation has been
                      # satisfied. It either agrees or it does not. This expression
                      # only needs one direction, or we fail out of the branch
          [^"\\]      # a character that is not a dquote or a backslash
      |   \\+         # OR if a backslash, then any number of backslashes followed by
          [^"]        # something that is not a quote
      |   \\          # OR again a backslash
          (?>\\\\)*   # followed by any number of *pairs* of backslashes (as units)
          "           # and a quote
      )*              # any number of *set* qualifying phrases
    )                 # all batched up together
    "                 # Ended by a literal quote
}x;
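For comparison, here is a minimal Python sketch of escape-aware extraction. It does not reproduce the Perl expression above; it swaps in the simpler, widely used pattern "((?:[^"\\]|\\.)*)", which treats a backslash plus the character after it as one unit. The name extract_quoted and the sample text are illustrative assumptions, not part of the original answer.

```python
import re

# Swapped-in pattern (not the Perl regex above): inside the quotes, accept
# either a character that is neither a quote nor a backslash, or a backslash
# together with the single character that follows it.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_quoted(text):
    """Return the contents of each double-quoted span, escapes left intact."""
    return QUOTED.findall(text)


if __name__ == "__main__":
    # Hypothetical sample: the first quotation contains two escaped quotes.
    sample = r'say "a \"b\" c" and "d"'
    for item in extract_quoted(sample):
        print(item)
```

Because a backslash always consumes the next character, an escaped quote never closes the match, while a quote preceded by an escaped backslash still does.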
If you don't need that much power, say the text is likely to be plain dialogue rather than structured quoting, then
/"([^"]*)"/
will probably work as well as anything else.
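The trade-off can be sketched in Python with the same simple pattern. The function name simple_quotes and both sample strings are assumptions made for illustration.

```python
import re

# Same idea as /"([^"]*)"/: a quote, any run of non-quote characters, a quote.
SIMPLE = re.compile(r'"([^"]*)"')


def simple_quotes(text):
    """Return the text between each pair of double quotes."""
    return SIMPLE.findall(text)


if __name__ == "__main__":
    dialogue = '"HAL," noted Frank, "said that everything was going extremely well."'
    print(simple_quotes(dialogue))

    # With escaped quotes the pattern stops at the first inner quote,
    # so one quotation comes back as fragments.
    print(simple_quotes(r'"a \"b\" c"'))
```

On plain dialogue the two quotations come back whole; on the escaped sample the single quotation is split into pieces, which is the case the longer expression is meant to handle.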
I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:
echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"HAL,"
"said that everything was going extremely well.”
Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"
etc.
No regexp solution will work if you have nested quotes, but for your examples this works well:
$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" \
 | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"
$ cat eula.txt | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
grep -o "\"[^\"]*\""
This greps for " + anything except a quote, any number of times + "
The -o makes it output only the matched text, not the whole line.
grep -o '"[^"]*"' file
The '-o' option prints only the text that matches the pattern.
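Since the question also allows Python, the behaviour of grep -o with this pattern can be approximated as below. This is a rough sketch, not a drop-in replacement for grep: the function name grep_o and the sample lines are assumptions, and matching is done line by line so a quotation that spans a line break is not found.

```python
import re

# Same pattern as the grep examples; the surrounding quotes stay in the
# output, as they do with grep -o.
QUOTED = re.compile(r'"[^"]*"')


def grep_o(lines):
    """Yield every quoted span found in each line, one result per match."""
    for line in lines:
        for match in QUOTED.finditer(line):
            yield match.group(0)


if __name__ == "__main__":
    # Hypothetical input; pass sys.stdin or an open file instead of this
    # list to use the function as a filter.
    sample_lines = [
        'the "EULA" covers the "Software"\n',
        'a line without any quotations\n',
    ]
    for hit in grep_o(sample_lines):
        print(hit)
```

Any iterable of lines works as input, so the same function covers both the echo and the cat example from the question.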