python - How to split a list of strings into sub-lists of strings by a specific string element


Here is another way that uses only standard list operations (no dependencies on other libraries!). First we find the split points, then we build the sublists around them. Note that the start of the list is handled as a special case by the -1 sentinel index:

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    indexes = [-1] + [i for i, x in enumerate(a) if x == '.']
    [a[indexes[i]+1:indexes[i+1]] for i in range(len(indexes)-1)]
    # => [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
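The index-based idea above can be wrapped in a reusable function. The sketch below is one possible generalization, not the original answer's code: the name `split_at_indexes` is hypothetical, and it assumes you also want to keep a trailing group when the list does not end with the separator, and to drop empty groups.

```python
def split_at_indexes(items, sep='.'):
    """Split items into sub-lists at every element equal to sep (hypothetical helper)."""
    # Sentinels at both ends: -1 before the first element, len(items) after the last.
    cuts = [-1] + [i for i, x in enumerate(items) if x == sep] + [len(items)]
    # Each group lies strictly between two neighbouring cut positions.
    groups = [items[start + 1:stop] for start, stop in zip(cuts, cuts[1:])]
    # Drop empty groups produced by leading, trailing, or consecutive separators.
    return [g for g in groups if g]


a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
print(split_at_indexes(a))
# [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
```

Adding the second sentinel means the first and last groups need no separate handling.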

I have a list of words like the one below. I want to split the list on '.'. Is there a better or more useful way to write this in Python 3?

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    result = []
    tmp = []
    for elm in a:
        # Compare by value with !=; "is not" tests identity and is unreliable for strings.
        if elm != '.':
            tmp.append(elm)
        else:
            result.append(tmp)
            tmp = []
    print(result)
    # result: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Update

Added test cases to make sure it is handled correctly.

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
    c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

    def split_list(list_data, split_word='.'):
        result = []
        sub_data = []
        for elm in list_data:
            # Compare by value with !=; "is not" tests identity and is unreliable for strings.
            if elm != split_word:
                sub_data.append(elm)
            else:
                # Only keep non-empty groups (handles a leading separator).
                if len(sub_data) != 0:
                    result.append(sub_data)
                sub_data = []
        # Keep the trailing group when the list does not end with the separator.
        if len(sub_data) != 0:
            result.append(sub_data)
        return result

    print(split_list(a))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
    print(split_list(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
    print(split_list(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

This answer requires installing a third-party library: iteration_utilities ¹. Its split function makes this task easy:

    >>> from iteration_utilities import split
    >>> a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    >>> list(filter(None, split(a, '.', eq=True)))
    [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Instead of using the eq parameter, you can also pass a custom function that decides where to split:

    >>> list(filter(None, split(a, lambda x: x == '.')))
    [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

If you want to keep the '.' elements, you can also use the keep_before argument:

    >>> list(filter(None, split(a, '.', eq=True, keep_before=True)))
    [['this', 'is', 'a', 'cat', '.'], ['hello', '.'], ['she', 'is', 'nice', '.']]

Note that the library only makes this more convenient; it is easy (see the other answers) to do this task without installing an additional library.

The filter can be removed if you do not expect '.' to appear at the beginning or end of the list you are splitting.

¹ I am the author of that library. It is available via pip or from the conda-forge channel with conda.

I couldn't resist, I just wanted to have some fun with this great question:

    import itertools

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
    c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

    def split_dots(lst):
        # Start positions: index 0 plus the position right after every '.'.
        dots = [0] + [i+1 for i, e in enumerate(lst) if e == '.']
        # From each start position, take elements until the next '.'.
        result = [list(itertools.takewhile(lambda x: x != '.', lst[dot:])) for dot in dots]
        # Drop the empty groups.
        return list(filter(lambda x: x, result))

    print(split_dots(a))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
    print(split_dots(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
    print(split_dots(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

You can do all of this as a "one-liner" using a list comprehension and the string methods join, split, and strip, with no additional libraries.

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
    c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

    In [5]: [i.strip().split(' ') for i in ' '.join(a).split('.') if len(i) > 0]
    Out[5]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

    In [8]: [i.strip().split(' ') for i in ' '.join(b).split('.') if len(i) > 0]
    Out[8]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

    In [9]: [i.strip().split(' ') for i in ' '.join(c).split('.') if len(i) > 0]
    Out[9]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

@Craig suggested a simpler version:

    [s.split() for s in ' '.join(a).split('.') if s]
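One caveat with the join/split one-liners, illustrated here with made-up data that is not part of the original question: because the list is flattened into a single string first, any element that itself contains a space or a dot gets broken apart. The sketch below compares the string round trip with a plain element-wise loop.

```python
# Hypothetical input: two elements contain a space or a dot.
words = ['New York', 'is', 'big', '.', 'v1.2', 'works', '.']

# String round trip: spaces and dots inside elements act as extra split points.
via_string = [s.split() for s in ' '.join(words).split('.') if s]
print(via_string)
# [['New', 'York', 'is', 'big'], ['v1'], ['2', 'works']]

# Element-wise loop: only whole elements equal to '.' split the list.
via_elements, current = [], []
for w in words:
    if w == '.':
        if current:
            via_elements.append(current)
        current = []
    else:
        current.append(w)
if current:
    via_elements.append(current)
print(via_elements)
# [['New York', 'is', 'big'], ['v1.2', 'works']]
```

For single-word tokens like the ones in the question, both approaches give the same result.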

You can rebuild the string using ' '.join and then use a regex:

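    import re

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    # Split on each '.' (with surrounding whitespace), then split each piece on whitespace.
    # all(b) drops the [''] group left over after the final '.'.
    new_s = [b for b in [re.split(r'\s', i) for i in re.split(r'\s*\.\s*', ' '.join(a))] if all(b)]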

Output:

    [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Using itertools.groupby

    from itertools import groupby

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
    # groupby yields runs of consecutive elements sharing a key; keep the runs that are not '.'.
    result = [list(g) for k, g in groupby(a, lambda x: x == '.') if not k]
    print(result)
    # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
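The groupby approach can also be checked against the other test lists used in this thread. The sketch below wraps it in a function; the name `split_by` and the `sep` parameter are illustrative choices, not part of the original answer.

```python
from itertools import groupby


def split_by(items, sep='.'):
    """Group consecutive non-separator elements into sub-lists (hypothetical helper)."""
    # groupby yields (key, group) pairs; the key is True for runs of separators.
    return [list(group)
            for is_sep, group in groupby(items, lambda x: x == sep)
            if not is_sep]


b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
print(split_by(b))
# [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_by(c))
# [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
```

Because groupby merges consecutive separators into a single run, leading, trailing, and repeated '.' elements never produce empty sub-lists, so no extra filtering step is needed.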