python - How to input a regular expression in string.replace?
The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace ).
You have to use the re module:
import re
newline = re.sub(r"</?\[[0-9]+>", "", line)
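As a quick check of the difference described above, here is a minimal sketch that runs both calls on a shortened version of the sample line from the question. The variable names are only illustrative.

```python
import re

line = "this is a paragraph with<[1> in between</[1> and the<[99> end</[99>."

# str.replace treats its first argument as literal text, so this
# pattern-looking string matches nothing and the line comes back unchanged.
unchanged = line.replace("<[*>", "")

# re.sub interprets the pattern: "<", an optional "/", a literal "[",
# one or more digits, then ">".
cleaned = re.sub(r"</?\[[0-9]+>", "", line)

print(unchanged == line)  # True
print(cleaned)            # this is a paragraph with in between and the end.
```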
I need some help declaring a regular expression. My inputs look like the following:
this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.
and there are many other lines in the txt files
with<[3> such tags </[3>
The required output is:
this is a paragraph with in between and then there are cases ... where the number ranges from 1-100.
and there are many other lines in the txt files
with such tags
I have tried this:
#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader:
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')
        print line
I have also tried this (but it seems I am using the wrong regex syntax):
        line2 = line.replace('<[*> ', '')
        line = line2.replace('</[*> ', '')
        line2 = line.replace('<[*>', '')
        line = line2.replace('</[*>', '')
I don't want to hard-code the replace calls from 1 to 99...
This tested snippet should do it:
import re
line = re.sub(r"</?\[\d+>", "", line)
Edit: Here is a commented version explaining how it works:
line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)
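Free-spacing mode can also be switched on with the re.VERBOSE flag instead of the inline (?x) marker. The following is a sketch of the same pattern written that way. TAG_PATTERN is a hypothetical name, not something from the answer.

```python
import re

# Same pattern as the commented version, with free-spacing mode enabled
# through the re.VERBOSE flag rather than an inline (?x).
TAG_PATTERN = re.compile(r"""
    <      # a literal '<'
    /?     # an optional '/'
    \[     # a literal '['
    \d+    # one or more digits
    >      # a literal '>'
    """, re.VERBOSE)

line = "with<[3> such tags </[3>"
result = TAG_PATTERN.sub("", line)
print(repr(result))  # 'with such tags '
```

Note that the space before the closing tag stays in place, because the pattern removes only the tags themselves.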
Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: "metacharacters" that need to be escaped (i.e. with a backslash placed in front; the rules are different inside and outside character classes). There is an excellent online tutorial at: www.regular-expressions.info . The time you spend there will pay for itself many times over. Happy regexing!
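To make the point about metacharacters concrete, here is a small sketch showing escaping outside a character class, the relaxed rules inside one, and re.escape for matching a string literally. The sample strings and variable names are made up for illustration.

```python
import re

# Outside a character class, "[" opens a class, so it needs a backslash
# to match a literal bracket.
brackets = re.findall(r"\[\d+", "<[1> and <[99>")
print(brackets)  # ['[1', '[99']

# Inside a character class, "." and "+" are ordinary characters.
symbols = re.findall(r"[.+]", "1+1=2.")
print(symbols)  # ['+', '.']

# re.escape adds whatever backslashes are needed to match text literally.
literal_tag = re.escape("<[1>")
stripped = re.sub(literal_tag, "", "with<[1> in between")
print(stripped)  # with in between
```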
The easiest way
import re
txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>'
out = re.sub("(<[^>]+>)", '', txt)
print(out)
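One thing to keep in mind with this pattern: <[^>]+> removes anything between angle brackets, not only the numbered tags. The sketch below compares it with the narrower pattern on a made-up string that also contains an ordinary <b> tag.

```python
import re

text = "x <b>bold</b> y<[1> z</[1>"

# The broad pattern strips every <...> span, including the <b> tags.
broad = re.sub(r"<[^>]+>", "", text)

# The narrow pattern strips only tags of the form <[N> and </[N>.
narrow = re.sub(r"</?\[\d+>", "", text)

print(broad)   # x bold y z
print(narrow)  # x <b>bold</b> y z
```

If the files contain nothing else in angle brackets, both patterns give the same result.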
I would go like this (regex explained in comments):
import re
# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")
# </{0,}\[\d+>
#
# Match the character “<” literally «<»
# Match the character “/” literally «/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»
subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.
and there are many other lines in the txt files
with<[3> such tags </[3>"""
result = pattern.sub("", subject)
print(result)
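A compiled pattern is convenient when the same substitution runs over many lines, as in the question's loop over text files. Here is a minimal sketch of that idea. The names TAG_PATTERN and strip_tags are hypothetical, and the sample lines stand in for a file.

```python
import re

TAG_PATTERN = re.compile(r"</?\[\d+>")  # compiled once, reused for every line


def strip_tags(lines):
    """Yield each line with the numbered tags removed."""
    for line in lines:
        yield TAG_PATTERN.sub("", line)


sample = ["with<[3> such tags </[3>\n", "no tags here\n"]
cleaned_lines = list(strip_tags(sample))
for cleaned in cleaned_lines:
    print(cleaned, end="")
```

Because a file object opened with open() iterates over its lines, it can be passed to strip_tags directly in place of the sample list.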
If you want to learn more about regular expressions, I recommend reading the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.
You don't have to use a regular expression (for your sample string):
>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
>>> for w in s.split(">"):
...     if "<" in w:
...         print(w.split("<")[0])
...
this is a paragraph with
in between
and then there are cases ... where the
number ranges from 1-100
.
and there are many other lines in the txt files
with
such tags
str.replace() does fixed replacements. Use re.sub() instead.