
python - How to input a regex in string.replace?




The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use the re module:

    import re
    newline = re.sub(r"</?\[[0-9]+>", "", line)
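As a minimal sketch (assuming Python 3), here is that pattern applied to the sample line from the question, plus the question's glob loop rewritten around it. The helper names `strip_numbered_tags` and `clean_txt_files` are hypothetical, and the loop opens each file itself because the question's `reader` is never defined.

```python
import glob
import os
import re

# "<", an optional "/", a literal "[", one or more digits, then ">".
NUMBERED_TAG = re.compile(r"</?\[[0-9]+>")


def strip_numbered_tags(line):
    """Remove every <[N> and </[N> marker from a line (hypothetical helper)."""
    return NUMBERED_TAG.sub("", line)


def clean_txt_files(directory="."):
    """Yield (path, cleaned_line) for each line of each *.txt file in directory.

    Hypothetical helper mirroring the question's glob loop; each file is
    opened explicitly so there is something to iterate over.
    """
    for infile in glob.glob(os.path.join(directory, "*.txt")):
        with open(infile) as reader:
            for line in reader:
                yield infile, strip_numbered_tags(line)


sample = ("this is a paragraph with<[1> in between</[1> and then there are "
          "cases ... where the<[99> number ranges from 1-100</[99>. and there "
          "are many other lines in the txt files with<[3> such tags </[3>")
print(strip_numbered_tags(sample))
```

The space before `</[3>` in the sample is kept, so the printed line ends with a space; call `.rstrip()` on the result if that matters.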

I need some help declaring a regex. My inputs look like the following:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files with such tags

I've tried this:

    #!/usr/bin/python
    import os, sys, re, glob

    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        with open(infile) as reader:  # open each file so its lines can be iterated
            for line in reader:
                line2 = line.replace('<[1> ', '')
                line = line2.replace('</[1> ', '')
                line2 = line.replace('<[1>', '')
                line = line2.replace('</[1>', '')
                print(line.rstrip("\n"))

I've also tried this (but it seems I'm using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I don't want to hard-code the replacements for 1 to 99...
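A quick way to see why the second attempt changes nothing: `str.replace()` matches its first argument character for character, so `*` is just an asterisk, not a wildcard. The short line below is made up for illustration.

```python
line = "with<[1> in between</[1> and the<[42> rest</[42>"

# "<[*>" is searched for literally (with a real "*"), which never occurs,
# so the line comes back unchanged.
wildcard_attempt = line.replace("<[*>", "").replace("</[*>", "")

# Hard-coded replacements only remove the numbers you spell out; 42 survives.
hard_coded = line.replace("<[1>", "").replace("</[1>", "")

print(wildcard_attempt == line)  # True
print(hard_coded)
```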


This tested code snippet should do it:

    import re
    line = re.sub(r"</?\[\d+>", "", line)

Edit: Here is a commented version that explains how it works:

    line = re.sub(r"""(?x)  # Use free-spacing mode.
        <    # Match a literal '<'
        /?   # Optionally match a '/'
        \[   # Match a literal '['
        \d+  # Match one or more digits
        >    # Match a literal '>'
        """, "", line)
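To check that the commented version and the one-line pattern agree, a small sketch can compile both and try a few hand-picked strings (the test strings are made up for illustration). `(?x)` sits at the very start of the pattern because Python 3.11 and later reject global inline flags anywhere else.

```python
import re

# The same pattern written compactly and in free-spacing (verbose) mode.
compact = re.compile(r"</?\[\d+>")
verbose = re.compile(r"""(?x)  # free-spacing mode: whitespace and comments ignored
    <     # a literal '<'
    /?    # an optional '/'
    \[    # a literal '['
    \d+   # one or more digits
    >     # a literal '>'
    """)

samples = ["<[1>", "</[99>", "<[>", "<1>", "<//[3>"]
compact_hits = [bool(compact.fullmatch(s)) for s in samples]
verbose_hits = [bool(verbose.fullmatch(s)) for s in samples]
print(compact_hits)
print(verbose_hits)
```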

Regexes are fun! But I'd strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: "metacharacters" that must be escaped (i.e. preceded by a backslash), and the rules differ inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
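A few illustrative lines (assuming Python 3) showing what escaping means in practice: outside a character class `[` opens a class, a backslash makes it literal, `re.escape()` adds the backslashes for you, and inside a class characters such as `.` lose their special meaning.

```python
import re

tag = "<[1>"

# Outside a character class, "[" starts a class; "<[" alone is an
# unterminated class, so compiling it raises re.error.
try:
    re.compile("<[")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

# A backslash turns "[" into a literal character.
escaped = re.compile(r"<\[1>")

# re.escape() escapes the metacharacters in a literal string for you.
auto_escaped = re.compile(re.escape(tag))

# Inside a character class "." is literal; outside it matches any character.
in_class = re.findall(r"[.]", "a.b")
outside = re.findall(r".", "a.b")

print(unescaped_compiles, in_class, outside)
```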


The easiest way:

    import re
    txt = 'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>'
    out = re.sub("(<[^>]+>)", '', txt)
    print(out)
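One thing to keep in mind with this pattern: `<[^>]+>` removes anything from a `<` to the next `>`, not just the numbered markers. The illustrative string below (made up, with an HTML-style `<b>` tag) shows the difference from the narrower pattern used in the other answers; the capturing parentheses are dropped because `re.sub` does not need them when the replacement is empty.

```python
import re

text = "keep <b>bold</b> but drop<[7> this</[7>"

# Any "<...>" sequence is removed, including the HTML-style tags.
generic = re.sub(r"<[^>]+>", "", text)

# Only "<[N>" and "</[N>" markers are removed.
targeted = re.sub(r"</?\[\d+>", "", text)

print(generic)   # keep bold but drop this
print(targeted)  # keep <b>bold</b> but drop this
```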


I would do it like this (the regex is explained in the comments):

    import re

    # If you need to use the regex more than once, it is suggested to compile it.
    pattern = re.compile(r"</{0,}\[\d+>")
    # </{0,}\[\d+>
    #
    # Match the character "<" literally «<»
    # Match the character "/" literally «/{0,}»
    #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
    # Match the character "[" literally «\[»
    # Match a single digit 0..9 «\d+»
    #    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    # Match the character ">" literally «>»
    subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>"""
    result = pattern.sub("", subject)
    print(result)
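Note that `{0,}` is the same quantifier as `*`, so this pattern accepts any number of slashes (for example `<//[5>`), whereas `/?` in the other answers accepts at most one. The sketch below, using made-up strings, checks that and also shows one compiled pattern being reused across several lines, which is the situation the comment about compiling refers to.

```python
import re

braces = re.compile(r"</{0,}\[\d+>")   # the pattern from this answer
star = re.compile(r"</*\[\d+>")        # equivalent spelling with "*"
optional = re.compile(r"</?\[\d+>")    # at most one "/"

odd_tag = "<//[5>"
print(bool(braces.fullmatch(odd_tag)), bool(star.fullmatch(odd_tag)),
      bool(optional.fullmatch(odd_tag)))  # True True False

# Reusing one compiled pattern across many lines.
lines = ["a<[1>b</[1>", "c<[2>d</[2>"]
cleaned = [braces.sub("", line) for line in lines]
print(cleaned)  # ['ab', 'cd']
```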

If you want to learn more about regular expressions, I recommend reading the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.


You don't have to use regular expressions (for your sample string):

    >>> s
    'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
    >>> for w in s.split(">"):
    ...     if "<" in w:
    ...         print(w.split("<")[0])
    ...
    this is a paragraph with
     in between
     and then there are cases ... where the
     number ranges from 1-100
    . 
    and there are many other lines in the txt files
    with
     such tags 
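The loop above only prints a chunk if it contains `<`, so text after the last tag, or a line with no tags at all, is silently dropped. A minimal variant (the function name `strip_tags_no_regex` is hypothetical) keeps that trailing text and joins the pieces back into one string; like the original, it removes any `<...>` tag, not just the numbered ones.

```python
def strip_tags_no_regex(s):
    """Remove <...> tags using only str.split (hypothetical helper)."""
    pieces = s.split(">")
    # Every piece except the last ended right before a ">", so keep only the
    # text that came before its opening "<".
    kept = [piece.split("<")[0] for piece in pieces[:-1]]
    # The last piece is the text after the final ">" (or the whole string if
    # there is no ">" at all); keep it as-is.
    kept.append(pieces[-1])
    return "".join(kept)


s = ("this is a paragraph with<[1> in between</[1> and then there are cases"
     " ... where the<[99> number ranges from 1-100</[99>. \nand there are many"
     " other lines in the txt files\nwith<[3> such tags </[3>\n")
print(strip_tags_no_regex(s))
print(strip_tags_no_regex("no tags here"))  # no tags here
```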


str.replace() does fixed replacements. Use re.sub() instead.