arrays - tamaño - tablas largas en latex

Expandir celdas de columna para cada celda de columna (8)

Tengo 3 conjuntos diferentes de datos (en diferentes columnas)

Animales (5 tipos diferentes) en la columna A
Frutas (1000 tipos diferentes) en la columna B
Países (10 tipos diferentes) en la columna C

Con estas 3 colecciones de datos, me gustaría recibir 5 × 1000 × 10 para un total de 50k elementos correspondientes en col. EFG (cada animal que corresponde con cada fruta y cada país).

Puede hacerse copiando y pegando valores manualmente, pero llevará años. ¿Hay alguna forma de automatizarlo por código VBA o

¿Existe alguna fórmula universal para conjuntos de datos ilimitados como el presentado anteriormente? Avíseme si algo no está claro.

Aquí hay un ejemplo más pequeño de datos y cómo deberían resultar los resultados:

Aquí hay una versión recursiva. Se supone que los datos no contienen pestañas internas ya que la función principal devuelve cadenas de producto que están delimitadas por tabuladores. El sub principal debe pasar un rango que consiste en los datos junto con la celda de la esquina superior izquierda del rango de salida. Esto probablemente podría modificarse un poco, pero es adecuado para fines de prueba.

ColumnProducts Range("A:C"), Range("E1")

Es la llamada que resuelve el problema de OP. Aquí está el código:

''the following function takes a collection of arrays of strings ''and returns a variant array of tab-delimited strings which ''comprise the (tab-delimited) cartesian products of ''the arrays in the collection Function CartesianProduct(ByVal Arrays As Collection) As Variant Dim i As Long, j As Long, k As Long, m As Long, n As Long Dim head As Variant Dim tail As Variant Dim product As Variant If Arrays.Count = 1 Then CartesianProduct = Arrays.Item(1) Exit Function Else head = Arrays.Item(1) Arrays.Remove 1 tail = CartesianProduct(Arrays) m = UBound(head) n = UBound(tail) ReDim product(1 To m * n) k = 1 For i = 1 To m For j = 1 To n product(k) = head(i) & vbTab & tail(j) k = k + 1 Next j Next i CartesianProduct = product End If End Function Sub ColumnProducts(data As Range, output As Range) Dim Arrays As New Collection Dim strings As Variant, product As Variant Dim i As Long, j As Long, n As Long, numRows As Long Dim col As Range, cell As Range Dim outRange As Range numRows = Range("A:A").Rows.Count For Each col In data.Columns n = col.EntireColumn.Cells(numRows).End(xlUp).Row i = col.Cells(1).Row ReDim strings(1 To n - i + 1) For j = 1 To n - i + 1 strings(j) = col.Cells(i + j - 1) Next j Arrays.Add strings Next col product = CartesianProduct(Arrays) n = UBound(product) Set outRange = Range(output, output.Offset(n - 1)) outRange.Value = Application.WorksheetFunction.Transpose(product) outRange.TextToColumns Destination:=output, DataType:=xlDelimited, Tab:=True End Sub

Aquí, mi enfoque para su problema.

Public Sub matchingCell() Dim animalRow, fruitRow, countryRow, checkRow, resultRow As Long Dim isExist As Boolean ''Set the start row animalRow = 2 resultRow = 2 ''Work with data sheet With Sheets("sheetname") ''Loop until animals column is blank Do While .Range("A" & animalRow) <> "" ''Set the start row fruitRow = 2 ''Loop until fruits column is blank Do While .Range("B" & fruitRow) <> "" ''Set the start row countryRow = 2 ''Loop until country column is blank Do While .Range("C" & countryRow) <> "" ''Set the start row checkRow = 2 ''Reset flag isExist = False ''Checking for duplicate row ''Loop all result row until D is blank Do While .Range("D" & checkRow) <> "" ''If duplicate row found If .Range("D" & checkRow) = .Range("A" & animalRow) And _ .Range("E" & checkRow) = .Range("B" & fruitRow) And _ .Range("F" & checkRow) = .Range("C" & countryRow) Then ''Set true for exist flag isExist = True End If checkRow = checkRow + 1 Loop ''If duplicate row not found If Not isExist Then .Range("D" & resultRow) = .Range("A" & animalRow) .Range("E" & resultRow) = .Range("B" & fruitRow) .Range("F" & resultRow) = .Range("C" & countryRow) ''Increase resultRow resultRow = resultRow + 1 End If ''Increase countryRow countryRow = countryRow + 1 Loop ''Increase fruitRow fruitRow = fruitRow + 1 Loop ''Increase fruitRow animalRow = animalRow + 1 Loop End With End Sub

Ya lo probé. Funciona bien Que tengas un buen día.

Bien, entonces solo quieres una lista de todas las combinaciones posibles. Esto es lo que haría:

Primero seleccione los datos sin procesar y elimine los duplicados, columna por columna.
Luego lea estas 3 columnas en 3 matrices separadas.
Calcule la longitud total de todas las matrices.
Luego, con un bucle, pegue el primer valor de la matriz del país tantas veces como haya combinaciones de animales y frutas, por lo que la longitud de esas matrices se multiplicó.
Dentro del ciclo, haga otro ciclo que publique todas las opciones de frutas. Con un número de filas duplicadas que es igual al número máximo de animales.
Luego pegue los animales sin duplicados siguiéndose hasta la última fila de la mesa.

Ejemplo clásico de una instrucción SQL de selección sin unión que devuelve el Producto cartesiano de todos los resultados combinados de las tablas enumeradas.

Solución de base de datos SQL

Simplemente importe Animals, Fruit, Country como tablas separadas en cualquier base de datos SQL como MS Access, SQLite, MySQL, etc. y enumere las tablas sin combinaciones, incluidas las combinaciones implícitas ( WHERE ) y explícitas ( JOIN ):

SELECT Animals.Animal, Fruits.Fruit, Countries.Country FROM Animals, Countries, Fruits;

Solución Excel

El mismo concepto con la ejecución de la instrucción SQL sin unión en VBA usando una conexión ODBC al libro de trabajo que contiene rangos de animales, países y frutas. Por ejemplo, cada grupo de datos está en su propia hoja de trabajo del mismo nombre.

Sub CrossJoinQuery() Dim conn As Object Dim rst As Object Dim sConn As String, strSQL As String Set conn = CreateObject("ADODB.Connection") Set rst = CreateObject("ADODB.Recordset") sConn = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _ & "DBQ=C:/Path To/Excel/Workbook.xlsx;" conn.Open sConn strSQL = "SELECT * FROM [Animals$A1:A3], [Fruits$A1:A3], [Countries$A1:A3] " rst.Open strSQL, conn Range("A1").CopyFromRecordset rst rst.Close conn.Close Set rst = Nothing Set conn = Nothing End Sub

En realidad, quiero modificar mi respuesta anterior. Pero, mi nueva respuesta es completamente diferente de la anterior. Porque, la respuesta anterior es para una columna específica y esta es para una columna universal. Después de contestar la respuesta anterior, el interrogador dice un nuevo requisito que quiere hacer en universal. Para una columna fija, podemos pensar en un bucle fijo y para una columna infinita, debemos pensar de otra manera. Entonces, yo también lo hago. Y los usuarios de SO también pueden ver las diferencias de código y creo que esto será útil para los principiantes.

Este nuevo código no es tan simple como el anterior. Si desea saber claramente sobre el código, sugerí depurar el código en línea por línea.

No te preocupes por el código. Ya lo probé paso a paso. A mí me funciona perfectamente. Si no es para ti, avísame. Una cosa es que este código puede causar un error para la fila en blanco (que no tiene datos). Porque, actualmente, no agregué verificar eso.

Aquí está mi enfoque universal para su problema:

Public Sub matchingCell() Dim startRawColumn, endRawColumn, startResultColumn, endResultColumn, startRow As Integer Dim index, row, column, containerIndex, tempIndex As Integer Dim columnCount, totalCount, timesCount, matchingCount, tempCount As Integer Dim isExist As Boolean Dim arrayContainer() As Variant ''Actually, even it is for universal, we need to know start column and end column of raw data. ''And also start row. And start column for write result. ''I set them for my test data. ''You need to modify them(startRawColumn, endRawColumn, startRow, startResultColumn). ''Set the start column and end column for raw data startRawColumn = 1 endRawColumn = 3 ''Set the start row for read data and write data startRow = 2 ''Set the start column for result data startResultColumn = 4 ''Get no of raw data column columnCount = endRawColumn - startRawColumn ''Set container index containerIndex = 0 ''Re-create array container for count of column ReDim arrayContainer(0 To columnCount) With Sheets("sheetname") ''Getting data from sheet ''Loop all column for getting data of each column For column = startRawColumn To endRawColumn Step 1 ''Create tempArray for column Dim tempArray() As Variant ''Reset startRow row = startRow ''Reset index index = 0 ''Here is one things. I looped until to blank. ''If you want anymore, you can modify the looping type. ''Don''t do any changes to main body of looping. ''Loop until the cell is blank Do While .Cells(row, column) <> "" ''Reset isExist flag isExist = False ''Remove checking for no data If index > 0 Then ''Loop previous data for duplicate checking For tempIndex = 0 To index - 1 Step 1 ''If found, set true to isExist and stop loop If tempArray(tempIndex) = .Cells(row, column) Then isExist = True Exit For End If Next tempIndex End If ''If there is no duplicate data, store data If Not isExist Then ''Reset tempArray ReDim Preserve tempArray(index) tempArray(index) = .Cells(row, column) ''Increase index index = index + 1 End If ''Increase row row = row + 1 Loop ''Store column with data arrayContainer(containerIndex) = tempArray ''Increase container index containerIndex = containerIndex + 1 Next column ''Now, we got all data column including data which has no duplicate ''Show result data on sheet ''Getting the result row count totalCount = 1 ''Get result row count For tempIndex = 0 To UBound(arrayContainer) Step 1 totalCount = totalCount * (UBound(arrayContainer(tempIndex)) + 1) Next tempIndex ''Reset timesCount timesCount = 1 ''Get the last column for result endResultColumn = startResultColumn + columnCount ''Loop array container For containerIndex = UBound(arrayContainer) To 0 Step -1 ''Getting the counts for looping If containerIndex = UBound(arrayContainer) Then duplicateCount = 1 timesCount = totalCount / (UBound(arrayContainer(containerIndex)) + 1) Else duplicateCount = duplicateCount * (UBound(arrayContainer(containerIndex + 1)) + 1) timesCount = timesCount / (UBound(arrayContainer(containerIndex)) + 1) End If ''Reset the start row row = startRow ''Loop timesCount For countIndex = 1 To timesCount Step 1 ''Loop data array For index = 0 To UBound(arrayContainer(containerIndex)) Step 1 ''Loop duplicateCount For tempIndex = 1 To duplicateCount Step 1 ''Write data to cell .Cells(row, endResultColumn) = arrayContainer(containerIndex)(index) ''Increase row row = row + 1 Next tempIndex Next index Next countIndex ''Increase result column index endResultColumn = endResultColumn - 1 Next containerIndex End With End Sub

Entiendo por universal , quieres que esto acomode cualquier número de columnas y cualquier número de entradas en cada una. Algunas matrices variantes deberían proporcionar las dimensiones necesarias para calcular los ciclos de repetición para cada valor.

Option Explicit Sub main() Call for_each_in_others(rDATA:=Worksheets("Sheet3").Range("A3"), bHDR:=True) End Sub Sub for_each_in_others(rDATA As Range, Optional bHDR As Boolean = False) Dim v As Long, w As Long Dim iINCROWS As Long, iMAXROWS As Long, sErrorRng As String Dim vVALs As Variant, vTMPs As Variant, vCOLs As Variant On Error GoTo bm_Safe_Exit appTGGL bTGGL:=False With rDATA.Parent With rDATA(1).CurrentRegion ''Debug.Print rDATA(1).Row - .Cells(1).Row With .Resize(.Rows.Count - (rDATA(1).Row - .Cells(1).Row), .Columns.Count).Offset(2, 0) sErrorRng = .Address(0, 0) vTMPs = .Value2 ReDim vCOLs(LBound(vTMPs, 2) To UBound(vTMPs, 2)) iMAXROWS = 1 ''On Error GoTo bm_Output_Exceeded For w = LBound(vTMPs, 2) To UBound(vTMPs, 2) vCOLs(w) = Application.CountA(.Columns(w)) iMAXROWS = iMAXROWS * vCOLs(w) Next w ''control excessive or no rows of output If iMAXROWS > Rows.Count Then GoTo bm_Output_Exceeded ElseIf .Columns.Count = 1 Or iMAXROWS = 0 Then GoTo bm_Nothing_To_Do End If On Error GoTo bm_Safe_Exit ReDim vVALs(LBound(vTMPs, 1) To iMAXROWS, LBound(vTMPs, 2) To UBound(vTMPs, 2)) iINCROWS = 1 For w = LBound(vVALs, 2) To UBound(vVALs, 2) iINCROWS = iINCROWS * vCOLs(w) For v = LBound(vVALs, 1) To UBound(vVALs, 1) vVALs(v, w) = vTMPs((Int(iINCROWS * ((v - 1) / UBound(vVALs, 1))) Mod vCOLs(w)) + 1, w) Next v Next w End With End With .Cells(2, UBound(vVALs, 2) + 2).Resize(1, UBound(vVALs, 2) + 2).EntireColumn.Delete If bHDR Then rDATA.Cells(1, 1).Offset(-1, 0).Resize(1, UBound(vVALs, 2)).Copy _ Destination:=rDATA.Cells(1, UBound(vVALs, 2) + 2).Offset(-1, 0) End If rDATA.Cells(1, UBound(vVALs, 2) + 2).Resize(UBound(vVALs, 1), UBound(vVALs, 2)) = vVALs End With GoTo bm_Safe_Exit bm_Nothing_To_Do: MsgBox "There is not enough data in " & sErrorRng & " to perform expansion." & Chr(10) & _ "This could be due to a single column of values or one or more blank column(s) of values." & _ Chr(10) & Chr(10) & "There is nothing to expand.", vbInformation, _ "Single or No Column of Raw Data" GoTo bm_Safe_Exit bm_Output_Exceeded: MsgBox "The number of expanded values created from " & sErrorRng & _ " (" & Format(iMAXROWS, "/> #, ##0") & " rows × " & UBound(vTMPs, 2) & _ " columns) exceeds the rows available (" & Format(Rows.Count, "#, ##0") & ") on this worksheet.", vbCritical, _ "Too Many Entries" bm_Safe_Exit: appTGGL End Sub Sub appTGGL(Optional bTGGL As Boolean = True) Application.EnableEvents = bTGGL Application.ScreenUpdating = bTGGL End Sub

Coloque las etiquetas de encabezado de columna en la fila 2 comenzando en la columna A y los datos directamente debajo de eso.

He agregado algunos controles de error para advertir de exceder el número de filas en una hoja de trabajo. Esto normalmente no es algo que probablemente sea una consideración, pero multiplicar el número de valores en un número indeterminado de columnas entre sí puede producir rápidamente una gran cantidad de resultados. No es imprevisible que supere las 1.048.576 filas.

Mi primer acercamiento a este problema fue similar al publicado por @Jeeped:

cargar columnas de entrada al conjunto y contar filas en cada columna
llenar matriz con todas las combinaciones
asignar matriz al rango de salida

Usando MicroTimer he calculado los tiempos promedio tomados por cada parte del algoritmo anterior. La parte 3. tomó 90% -93% del tiempo total de ejecución para datos de entrada más grandes.

A continuación se muestra mi intento de mejorar la velocidad de escritura de datos en la hoja de trabajo. He definido una constante iMinRSize=17 . Una vez que es posible llenar más de iMinRSize filas consecutivas con el mismo valor, el código deja de archivar la matriz y escribe directamente en el rango de la hoja de trabajo.

Sub CrossJoin(rSrc As Range, rTrg As Range) Dim vSrc() As Variant, vTrgPart() As Variant Dim iLengths() As Long Dim iCCnt As Integer, iRTrgCnt As Long, iRSrcCnt As Long Dim i As Integer, j As Long, k As Long, l As Long Dim iStep As Long Const iMinRSize As Long = 17 Dim iArrLastC As Integer On Error GoTo CleanUp Application.ScreenUpdating = False Application.EnableEvents = False vSrc = rSrc.Value2 iCCnt = UBound(vSrc, 2) iRSrcCnt = UBound(vSrc, 1) iRTrgCnt = 1 iArrLastC = 1 ReDim iLengths(1 To iCCnt) For i = 1 To iCCnt j = iRSrcCnt While (j > 0) And IsEmpty(vSrc(j, i)) j = j - 1 Wend iLengths(i) = j iRTrgCnt = iRTrgCnt * iLengths(i) If (iRTrgCnt < iMinRSize) And (iArrLastC < iCCnt) Then iArrLastC = iArrLastC + 1 Next i If (iRTrgCnt > 0) And (rTrg.row + iRTrgCnt - 1 <= rTrg.Parent.Rows.Count) Then ReDim vTrgPart(1 To iRTrgCnt, 1 To iArrLastC) iStep = 1 For i = 1 To iArrLastC k = 0 For j = 1 To iRTrgCnt Step iStep k = k + 1 If k > iLengths(i) Then k = 1 For l = j To j + iStep - 1 vTrgPart(l, i) = vSrc(k, i) Next l Next j iStep = iStep * iLengths(i) Next i rTrg.Resize(iRTrgCnt, iArrLastC) = vTrgPart For i = iArrLastC + 1 To iCCnt k = 0 For j = 1 To iRTrgCnt Step iStep k = k + 1 If k > iLengths(i) Then k = 1 rTrg.Resize(iStep).Offset(j - 1, i - 1).Value2 = vSrc(k, i) Next j iStep = iStep * iLengths(i) Next i End If CleanUp: Application.ScreenUpdating = True Application.EnableEvents = False End Sub Sub test() CrossJoin Range("a2:f10"), Range("k2") End Sub

Si establecemos iMinRSize en Rows.Count , todos los datos se escriben en la matriz. A continuación se muestran mis resultados de prueba de muestra:

El código funciona mejor si las columnas de entrada con el mayor número de filas son lo primero, pero no sería un gran problema modificar el código para clasificar las columnas y procesarlas en el orden correcto.

Puede hacer esto con fórmulas de hoja de trabajo. Si tiene rangos NAME''d: animales, frutas y países, el "truco" es generar índices en esa matriz para proporcionar todas las combinaciones.

Por ejemplo:

=CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)

generará una serie de números basada en 1 que se repite para las entradas de números en Fruits * Countries, lo que le da cuántas filas necesita para cada animal.

=MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1

generará una serie basada en 1 que repite cada fruta para el número de países.

=MOD(ROWS($1:1)-1,ROWS(Countries))+1))

Genera una secuencia repetitiva de 1..n donde n es el número de países.

Poniendo esto en fórmulas (con alguna comprobación de errores)

D3: =IFERROR(INDEX(Animals,CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)),"") E3: =IF(E3="","",INDEX(Fruits,MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1)) F3: =IF(E3="","",INDEX(Countries,MOD(ROWS($1:1)-1,ROWS(Countries))+1))