Find a string between two substrings


How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method looks like this:

    >>> start = 'asdf=5;'
    >>> end = '123jasd'
    >>> s = 'asdf=5;iwantthis123jasd'
    >>> print((s.split(start))[1].split(end)[0])
    iwantthis

However, this seems very inefficient and un-Pythonic. What is a better way to do something like this?

Forgot to mention: the string might not begin with start and finish with end. There may be more characters before and after.

In addition to Nikolaus Gradwohl's answer, I needed to get the version number (i.e. 0.0.2) between 'ui:' and '-' from the following file content (filename: docker-compose.yml):

    version: '3.1'
    services:
      ui:
        image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
        #network_mode: host
        ports:
          - 443:9999
        ulimits:
          nofile:test

and this is how it worked for me (Python script):

    import re

    # Read the whole file into one string
    with open('docker-compose.yml', 'r') as f:
        lines = f.read()

    # '.' does not match newlines, so the match stays on the image line
    result = re.search('ui:(.*)-', lines)
    print(result.group(1))

Result: 0.0.2
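The greedy `(.*)` in the pattern above runs to the last hyphen on the line, so a tag containing a second hyphen would change the result. Below is a minimal sketch of a stricter variant, assuming the version always consists of digits separated by dots. The helper name `extract_version` and the sample string are hypothetical, not part of the original answer.

```python
import re

def extract_version(text):
    # Assumption: the version is dot-separated digits directly after "ui:".
    match = re.search(r'ui:(\d+(?:\.\d+)*)-', text)
    return match.group(1) if match else None

# Hypothetical image line with an extra hyphen in the tag
sample = "image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1-hotfix"

greedy = re.search('ui:(.*)-', sample).group(1)
print(greedy)                   # greedy group stops at the last hyphen
print(extract_version(sample))  # digits and dots only
```

With the sample line, the greedy pattern yields `0.0.2-QA1`, while the stricter pattern yields `0.0.2`.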

Here is a function I wrote that returns a list of the string(s) found between string1 and string2.

    def GetListOfSubstrings(stringSubject, string1, string2):
        MyList = []
        intstart = 0
        strlength = len(stringSubject)
        continueloop = 1
        while intstart < strlength and continueloop == 1:
            intindex1 = stringSubject.find(string1, intstart)
            if intindex1 != -1:  # The substring was found, let's proceed
                intindex1 = intindex1 + len(string1)
                intindex2 = stringSubject.find(string2, intindex1)
                if intindex2 != -1:
                    subsequence = stringSubject[intindex1:intindex2]
                    MyList.append(subsequence)
                    intstart = intindex2 + len(string2)
                else:
                    continueloop = 0
            else:
                continueloop = 0
        return MyList

    # Usage example
    mystring = "s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring, "1", "y68")
    for x in range(0, len(List)):
        print(List[x])

output: (nothing is printed, since "y68" never occurs)

    mystring = "s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring, "1", "3")
    for x in range(0, len(List)):
        print(List[x])

output:

    2
    2
    2
    2

    mystring = "s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring, "1", "y")
    for x in range(0, len(List)):
        print(List[x])

output:

    23
    23o123pp123

Here is one way to do it:

    _, _, rest = s.partition(start)
    result, _, _ = rest.partition(end)
    print(result)

Another way, using a regexp:

    import re
    print(re.findall(re.escape(start) + "(.*)" + re.escape(end), s)[0])

or:

    print(re.search(re.escape(start) + "(.*)" + re.escape(end), s).group(1))

Parsing text with delimiters from different email platforms posed a larger version of this problem. They generally have a START and a STOP. Delimiter characters that double as wildcards kept choking the regex. The problem with split is mentioned here and elsewhere: the delimiter itself is consumed and disappears from the result. It occurred to me to use replace() to give split() something else to consume. Chunk of code:

    import logging

    # textIn is the input text to parse
    nuke = '~~~'
    start = '|*'
    stop = '*|'
    julien = (textIn.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke))
    keep = [chunk for chunk in julien if start in chunk and stop in chunk]
    logging.info('keep: %s', keep)
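To make the replace()/split() trick above concrete, here is a small self-contained sketch. The function name `keep_delimited` and the sample text are hypothetical, and it assumes the `nuke` marker never occurs in the input.

```python
def keep_delimited(text_in, start='|*', stop='*|', nuke='~~~'):
    # Assumption: `nuke` does not appear anywhere in text_in.
    # Put the marker before every start and after every stop, then split on
    # the marker so the delimiters themselves survive in the chunks.
    pieces = (text_in.replace(start, nuke + start)
                     .replace(stop, stop + nuke)
                     .split(nuke))
    return [chunk for chunk in pieces if start in chunk and stop in chunk]

# Hypothetical merge-tag style input
sample = "Hello |*FNAME*|, your code is |*CODE*|."
print(keep_delimited(sample))
```

For the sample text this prints `['|*FNAME*|', '|*CODE*|']`: the chunks keep their delimiters because split() consumed only the marker.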

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be changed as desired.

    import re

    s = 'asdf=5;iwantthis123jasd'
    start = 'asdf=5;'
    end = '123jasd'

    result = re.search('%s(.*)%s' % (start, end), s).group(1)
    print(result)
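Interpolating start and end straight into the pattern works for the values above, but any regex metacharacters in a delimiter would be read as pattern syntax rather than literal text. A minimal sketch that wraps both in `re.escape` follows; the function name `between_regex` is hypothetical.

```python
import re

def between_regex(s, start, end):
    # re.escape makes the delimiters match literally, even if they contain
    # characters such as ( ) * . or |
    pattern = '%s(.*)%s' % (re.escape(start), re.escape(end))
    match = re.search(pattern, s)
    return match.group(1) if match else None

print(between_regex('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))
print(between_regex('total=(42)', '(', ')'))
```

Returning None on a failed match is a design choice of this sketch; the original one-liner raises AttributeError in that case.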

This is essentially cji's answer (Jul 30 at 5:58). I changed the try/except structure to make it a bit clearer what was causing the exception.

    import sys

    def find_between(inputStr, firstSubstr, lastSubstr):
        """
        find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
            http://.com/questions/3368969/find-string-between-two-substrings
            above also has a func that does this FROM THE RIGHT
        """
        start, end = (-1, -1)
        try:
            start = inputStr.index(firstSubstr) + len(firstSubstr)
        except ValueError:
            print(' ValueError: ', "firstSubstr=%s - " % firstSubstr, sys.exc_info()[1])

        try:
            end = inputStr.index(lastSubstr, start)
        except ValueError:
            print(' ValueError: ', "lastSubstr=%s - " % lastSubstr, sys.exc_info()[1])

        return inputStr[start:end]

These solutions assume that the start string and the end string are different. Here is a solution I use for a whole file when the start and end flags are the same, assuming the file is read with readlines():

    def extractstring(line, flag='$'):
        if flag in line:  # $ is the flag
            dex1 = line.index(flag)
            subline = line[dex1+1:-1]  # leave out flag (+1) to end of line
            dex2 = subline.index(flag)
            string = subline[0:dex2].strip()  # does not include last flag, strip whitespace
            return string

Example:

    lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
             'afafoaltat $I GOT BETTER!$ derpity derp derp']
    for line in lines:
        string = extractstring(line, flag='$')
        print(string)

Gives:

    A NEWT?
    I GOT BETTER!

I posted this earlier as a code snippet on Daniweb:

    # picking up piece of string between separators
    # function using partition, like partition, but drops the separators
    def between(left, right, s):
        before, _, a = s.partition(left)
        a, _, after = a.partition(right)
        return before, a, after

    s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
    print(between('<a>', '</a>', s))
    print(between('(', ')', s))
    print(between("'", "'", s))

    # Output as originally recorded under Python 2 (Python 3 prints ö directly):
    # ('bla bla blaa ', 'data', " lsdjfasdj\xc3\xb6f (important notice) 'Daniweb forum' tcha tcha tchaa")
    # ('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
    # ('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')

This seems much more straightforward to me:

    import re

    s = 'asdf=5;iwantthis123jasd'
    x = re.search('iwantthis', s)
    print(s[x.start():x.end()])

My method would be to do something like:

    find index of start string in s => i
    find index of end string in s => j

    substring = substring(i+len(start) to j-1)
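The pseudocode above could be sketched in Python as follows. The function name `substring_between` is hypothetical, and returning None when a delimiter is missing is an assumption of this sketch, not something the answer specifies.

```python
def substring_between(s, start, end):
    i = s.find(start)
    if i == -1:
        return None          # start not found
    i += len(start)          # first character after the start delimiter
    j = s.find(end, i)       # look for end only after the start delimiter
    if j == -1:
        return None          # end not found
    # Slice ends are exclusive in Python, so s[i:j] covers i .. j-1
    return s[i:j]

print(substring_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))
```

Searching for end from position i onward also avoids picking up an occurrence of end that appears before start.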

To extract STRING, try:

    myString = '123STRINGabc'
    startString = '123'
    endString = 'abc'

    mySubString = myString[myString.find(startString)+len(startString):myString.find(endString)]

Just converting the OP's own solution into an answer:

    def find_between(s, start, end):
        return (s.split(start))[1].split(end)[0]

You can simply use this code or copy the function below. All neatly in one line.

    def substring(whole, sub1, sub2):
        return whole[whole.index(sub1) : whole.index(sub2)]

If you run the function as follows:

    print(substring("5+(5*2)+2", "(", ")"))

you will get the output:

    (5*2

rather than

    5*2

If you want the closing substring included at the end of the output, the code should look like this:

    return whole[whole.index(sub1) : whole.index(sub2) + 1]

But if you don't want the substrings at either end, the +1 must be on the first value:

    return whole[whole.index(sub1) + 1 : whole.index(sub2)]
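The `+ 1` offsets above fit only single-character delimiters. Below is a sketch that generalizes them with `len()`; the names `substring_exclusive` and `substring_inclusive` are hypothetical. Like the original, both raise ValueError when a delimiter is missing.

```python
def substring_exclusive(whole, sub1, sub2):
    # Skip the full length of sub1, then search for sub2 after it
    begin = whole.index(sub1) + len(sub1)
    finish = whole.index(sub2, begin)
    return whole[begin:finish]

def substring_inclusive(whole, sub1, sub2):
    # Keep both delimiters in the result
    begin = whole.index(sub1)
    finish = whole.index(sub2, begin + len(sub1)) + len(sub2)
    return whole[begin:finish]

print(substring_exclusive("5+(5*2)+2", "(", ")"))
print(substring_inclusive("5+(5*2)+2", "(", ")"))
print(substring_exclusive("a<b>text</b>c", "<b>", "</b>"))
```

For these inputs the three calls print `5*2`, `(5*2)` and `text`.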

    from timeit import timeit
    from re import search, DOTALL


    def partition_find(string, start, end):
        return string.partition(start)[2].rpartition(end)[0]


    def re_find(string, start, end):
        # applying re.escape to start and end would be safer
        return search(start + '(.*)' + end, string, DOTALL).group(1)


    def index_find(string, start, end):
        return string[string.find(start) + len(start):string.rfind(end)]


    # The wikitext of the "Alan Turing law" article from English Wikipedia
    # https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
    string = """..."""
    start = '==Proposals=='
    end = '==Rival bills=='

    assert index_find(string, start, end) \
        == partition_find(string, start, end) \
        == re_find(string, start, end)

    print('index_find', timeit(
        'index_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('partition_find', timeit(
        'partition_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('re_find', timeit(
        're_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

Result:

    index_find 0.35047444528454114
    partition_find 0.5327825636197754
    re_find 7.552149639286381

re_find was more than 20 times slower than index_find in this example (7.55 s versus 0.35 s for 100,000 calls).

    import re

    s = 'asdf=5;iwantthis123jasd'
    result = re.search('asdf=5;(.*)123jasd', s)
    print(result.group(1))

    s = "123123STRINGabcabc"

    def find_between(s, first, last):
        try:
            start = s.index(first) + len(first)
            end = s.index(last, start)
            return s[start:end]
        except ValueError:
            return ""

    def find_between_r(s, first, last):
        try:
            start = s.rindex(first) + len(first)
            end = s.rindex(last, start)
            return s[start:end]
        except ValueError:
            return ""

    print(find_between(s, "123", "abc"))
    print(find_between_r(s, "123", "abc"))

gives:

    123STRING
    STRINGabc

I thought it should be noted: depending on the behavior you need, you can mix index and rindex calls or go with one of the versions above (this is equivalent to the regex groups (.*) and (.*?)).
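A short sketch of that equivalence, using the same sample string as above. The variable names are hypothetical; the point is that index followed by index behaves like a lazy `(.*?)` group, while index followed by rindex behaves like a greedy `(.*)` group.

```python
import re

s = "123123STRINGabcabc"
first = s.index("123") + len("123")   # position right after the first "123"

# index + index: first start, first end after it (lazy)
lazy_slice = s[first:s.index("abc", first)]
lazy_regex = re.search("123(.*?)abc", s).group(1)

# index + rindex: first start, last end (greedy)
greedy_slice = s[first:s.rindex("abc")]
greedy_regex = re.search("123(.*)abc", s).group(1)

print(lazy_slice, lazy_regex)
print(greedy_slice, greedy_regex)
```

Both lazy variants give `123STRING` and both greedy variants give `123STRINGabc`.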

s[len(start):-len(end)]

    source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
    start_sep = '_'
    end_sep = '@df'
    result = []
    tmp = source.split(start_sep)
    for par in tmp:
        if end_sep in par:
            result.append(par.split(end_sep)[0])

    print(result)

should show: here0, here1, here2

A regular expression is better, but it requires importing the re module, and you may prefer to stick to plain string methods.
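For comparison, here is a sketch of the regular expression variant mentioned above, reusing the same sample data. It relies only on the `re` module from the standard library; the non-greedy group and the use of `re.escape` are choices of this sketch.

```python
import re

source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep = '_'
end_sep = '@df'

# Non-greedy group so each match stops at the nearest end separator
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)
print(result)
```

This prints `['here0', 'here1', 'here2']`, the same tokens the split-based loop collects.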

    start = 'asdf=5;'
    end = '123jasd'
    s = 'asdf=5;iwantthis123jasd'
    print(s[s.find(start)+len(start):s.rfind(end)])

    iwantthis