algorithm - quitar - Implementar la justificación de texto con programación dinámica

porque no puedo justificar un texto en word (4)

En caso de que tenga problemas para entender la idea central de la programación dinámica, aquí está mi opinión:

La programación dinámica está sacrificando esencialmente la complejidad del espacio por la complejidad del tiempo (pero el espacio extra que usas suele ser muy pequeño en comparación con el tiempo que guardas, haciendo que la programación dinámica merezca la pena si se implementa correctamente). Usted almacena los valores de cada llamada recursiva sobre la marcha (por ejemplo, en una matriz o un diccionario) para evitar el cómputo por segunda vez cuando se ejecuta la misma llamada recursiva en otra rama del árbol de recursión.

Y no, no tienes que usar recursividad. Aquí está mi implementación de la pregunta en la que estaba trabajando usando solo bucles. Seguí el TextAlignment.pdf vinculado por AlexSilva muy de cerca. Espero que esto te sea útil.

def length(wordLengths, i, j): return sum(wordLengths[i- 1:j]) + j - i + 1 def breakLine(text, L): # wl = lengths of words wl = [len(word) for word in text.split()] # n = number of words in the text n = len(wl) # total badness of a text l1 ... li m = dict() # initialization m[0] = 0 # auxiliary array s = dict() # the actual algorithm for i in range(1, n + 1): sums = dict() k = i while (length(wl, k, i) <= L and k > 0): sums[(L - length(wl, k, i))**3 + m[k - 1]] = k k -= 1 m[i] = min(sums) s[i] = sums[min(sums)] # actually do the splitting by working backwords line = 1 while n > 1: print("line " + str(line) + ": " + str(s[n]) + "->" + str(n)) n = s[n] - 1 line += 1

Estoy tratando de entender el concepto de programación dinámica, a través del curso en MIT OCW aquí . La explicación del video de OCW es genial y todo, pero siento que realmente no lo entiendo hasta que implementé la explicación en el código. Al implementar, me refiero a algunas notas de la nota de la conferencia aquí , particularmente a la página 3 de la nota.

El problema es que no tengo idea de cómo traducir parte de la notación matemática al código. Aquí hay una parte de la solución que implementé (y creo que se implementó correctamente):

import math paragraph = "Some long lorem ipsum text." words = paragraph.split(" ") # Count total length for all strings in a list of strings. # This function will be used by the badness function below. def total_length(str_arr): total = 0 for string in str_arr: total = total + len(string) total = total + len(str_arr) # spaces return total # Calculate the badness score for a word. # str_arr is assumed be send as word[i:j] as in the notes # we don''t make i and j as argument since it will require # global vars then. def badness(str_arr, page_width): line_len = total_length(str_arr) if line_len > page_width: return float(''nan'') else: return math.pow(page_width - line_len, 3)

Ahora la parte que no entiendo está en el punto 3 al 5 en las notas de la conferencia. Literalmente no entiendo y no sé por dónde empezar a implementarlos. Hasta ahora, he intentado iterar la lista de palabras y contar la maldad de cada supuestamente fin de línea, así:

def justifier(str_arr, page_width): paragraph = str_arr par_len = len(paragraph) result = [] # stores each line as list of strings for i in range(0, par_len): if i == (par_len - 1): result.append(paragraph) else: dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)] # Should I do a min(dag), get the index, and declares it as end of line?

Pero entonces, no sé cómo puedo continuar la función, y para ser honesto, no entiendo esta línea:

dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)]

y cómo devolveré el justifier como un int (ya que decidí almacenar el valor de retorno en el result , que es una lista. ¿Debo hacer otra función y volver a recurrir desde allí? ¿Debería haber alguna recursión?

¿Podrías por favor mostrarme qué hacer a continuación y explicarme cómo se trata de una programación dinámica? Realmente no puedo ver dónde está la recursión y cuál es el subproblema.

Gracias antes.

Acabo de ver la conferencia y el pensamiento pondría aquí todo lo que pudiera entender. He puesto el código en el formato similar al del interrogador. He usado la recursión aquí, como lo explicó la conferencia.
Punto # 3, define la recurrencia. Básicamente se trata de un fondo para acercarse, en el que se calcula antes un valor de la función correspondiente a una entrada más alta y luego se usa para calcular la entrada de menor valor.
La conferencia lo explica como:
DP (i) = min (DP (j) + maldad (i, j))
para j que varía de i + 1 a n.
Aquí, yo varío de n a 0 (de abajo hacia arriba).
Como DP (n) = 0,
DP (n-1) = DP (n) + maldad (n-1, n)
y luego calcula D (n-2) de D (n-1) y D (n) y saca un mínimo de ellos.
¡De esta forma puedes bajar hasta i = 0 y esa es la respuesta final de maldad!
En el punto # 4, como puede ver, hay dos bucles sucediendo aquí. Uno para i y el otro dentro de i para j.
Por lo tanto, cuando i = 0, j (max) = n, i = 1, j (max) = n-1, ... i = n, j (max) = 0.
por lo tanto, tiempo total = suma de estos = n (n + 1) / 2.
De ahí O (n ^ 2).
El punto n. ° 5 simplemente identifica la solución que DP [0]!
¡Espero que esto ayude!

import math justification_map = {} min_map = {} def total_length(str_arr): total = 0 for string in str_arr: total = total + len(string) total = total + len(str_arr) - 1 # spaces return total def badness(str_arr, page_width): line_len = total_length(str_arr) if line_len > page_width: return float(''nan'') else: return math.pow(page_width - line_len, 3) def justify(i, n, words, page_width): if i == n: return 0 ans = [] for j in range(i+1, n+1): #ans.append(justify(j, n, words, page_width)+ badness(words[i:j], page_width)) ans.append(justification_map[j]+ badness(words[i:j], page_width)) min_map[i] = ans.index(min(ans)) + 1 return min(ans) def main(): print "Enter page width" page_width = input() print "Enter text" paragraph = input() words = paragraph.split('' '') n = len(words) #justification_map[n] = 0 for i in reversed(range(n+1)): justification_map[i] = justify(i, n, words, page_width) print "Minimum badness achieved: ", justification_map[0] key = 0 while(key <n): key = key + min_map[key] print key if __name__ == ''__main__'': main()

Esto es lo que pienso según tu definición.

import math class Text(object): def __init__(self, words, width): self.words = words self.page_width = width self.str_arr = words self.memo = {} def total_length(self, str): total = 0 for string in str: total = total + len(string) total = total + len(str) # spaces return total def badness(self, str): line_len = self.total_length(str) if line_len > self.page_width: return float(''nan'') else: return math.pow(self.page_width - line_len, 3) def dp(self): n = len(self.str_arr) self.memo[n-1] = 0 return self.judge(0) def judge(self, i): if i in self.memo: return self.memo[i] self.memo[i] = float(''inf'') for j in range(i+1, len(self.str_arr)): bad = self.judge(j) + self.badness(self.str_arr[i:j]) if bad < self.memo[i]: self.memo[i] = bad return self.memo[i]

Para cualquier persona que aún esté interesada en esto: la clave es retroceder desde el final del texto (como se menciona aquí ). Si lo haces, simplemente compara elementos ya memorizados.

Diga, words es una lista de cadenas para envolver de acuerdo con el textwidth de textwidth . Luego, en la notación de la conferencia, la tarea se reduce a tres líneas de código:

import numpy as np textwidth = 80 DP = [0]*(len(words)+1) for i in range(len(words)-1,-1,-1): DP[i] = np.min([DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)])

Con:

def badness(line,textwidth): # Number of gaps length_line = len(line) - 1 for word in line: length_line += len(word) if length_line > textwidth: return float(''inf'') return ( textwidth - length_line )**3

Él menciona que uno puede agregar una segunda lista para realizar un seguimiento de las posiciones de ruptura. Puedes hacerlo modificando el código para:

DP = [0]*(len(words)+1) breaks = [0]*(len(words)+1) for i in range(len(words)-1,-1,-1): temp = [DP[j] + badness(words[i:j],args.textwidth) for j in range(i+1,len(words)+1)] index = np.argmin(temp) # Index plus position in upper list breaks[i] = index + i + 1 DP[i] = temp[index]

Para recuperar el texto, simplemente use la lista de posiciones de ruptura:

def reconstruct_text(words,breaks): lines = [] linebreaks = [] i = 0 while True: linebreaks.append(breaks[i]) i = breaks[i] if i == len(words): linebreaks.append(0) break for i in range( len(linebreaks) ): lines.append( '' ''.join( words[ linebreaks[i-1] : linebreaks[i] ] ).strip() ) return lines

Resultado: ( text = reconstruct_text(words,breaks) )

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

Uno podría estar tentado de agregar algunos espacios en blanco. Esto es bastante complicado (ya que uno podría proponer varias reglas estéticas) pero un intento ingenuo podría ser:

import re def spacing(text,textwidth,maxspace=4): for i in range(len(text)): length_line = len(text[i]) if length_line < textwidth: status_length = length_line whitespaces_remain = textwidth - status_length Nwhitespaces = text[i].count('' '') # If whitespaces (to add) per whitespace exeeds # maxspace, don''t do anything. if whitespaces_remain/Nwhitespaces > maxspace-1:pass else: text[i] = text[i].replace('' '','' ''*( 1 + int(whitespaces_remain/Nwhitespaces)) ) status_length = len(text[i]) # Periods have highest priority for whitespace insertion periods = text[i].split(''.'') # Can we add a whitespace behind each period? if len(periods) - 1 + status_length <= textwidth: text[i] = ''. ''.join(periods).strip() status_length = len(text[i]) whitespaces_remain = textwidth - status_length Nwords = len(text[i].split()) Ngaps = Nwords - 1 if whitespaces_remain != 0:factor = Ngaps / whitespaces_remain # List of whitespaces in line i gaps = re.findall(''/s+'', text[i]) temp = text[i].split() for k in range(Ngaps): temp[k] = ''''.join([temp[k],gaps[k]]) for j in range(whitespaces_remain): if status_length >= textwidth:pass else: replace = temp[int(factor*j)] replace = ''''.join([replace, " "]) temp[int(factor*j)] = replace text[i] = ''''.join(temp) return text

Lo que te da: ( text = spacing(text,textwidth) )