redshift que pricing español amazon-web-services amazon-redshift paraccel

amazon web services - que - Cómo medir el espacio de tabla en el disco en RedShift/ParAccel



redshift sql (4)

Agregando propietario y un filtro de esquema a la consulta anterior:

select cast(use.usename as varchar(50)) as owner, trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema, trim(a.name) as Table, b.mbytes, a.rows from (select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name ) as a join pg_class as pgc on pgc.oid = a.id left join pg_user use on (pgc.relowner = use.usesysid) join pg_namespace as pgn on pgn.oid = pgc.relnamespace -- leave out system schemas and pgn.nspowner > 1 join pg_database as pgdb on pgdb.oid = a.db_id join (select tbl, count as mbytes from stv_blocklist group by tbl ) b on a.id = b.tbl order by mbytes desc, a.db_id, a.name;

Tengo una mesa en RedShift. ¿Cómo puedo ver la cantidad de espacio en disco que utiliza?


Sé que esta pregunta es antigua y ya se aceptó una respuesta, pero debo señalar que la respuesta es incorrecta. Lo que la consulta genera como "mb" es en realidad el "número de bloques". La respuesta sería correcta solo si el tamaño del bloque es de 1 MB (que es el valor predeterminado).

Si el tamaño del bloque es diferente (en mi caso, por ejemplo, 256K), tiene que multiplicar el número de bloques por su tamaño en bytes. Sugiero el siguiente cambio en su consulta donde multiplico la cantidad de bloques por el tamaño de bloque en bytes (262144 bytes) y luego divido por (1024 * 1024) para generar el total en megabytes:

select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema, trim(a.name) as Table, b.mbytes as previous_wrong_value, (b.mbytes * 262144)::bigint/(1024*1024) as "Total MBytes", a.rows from ( select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name ) as a join pg_class as pgc on pgc.oid = a.id join pg_namespace as pgn on pgn.oid = pgc.relnamespace join pg_database as pgdb on pgdb.oid = a.db_id join ( select tbl, count(blocknum) as mbytes from stv_blocklist group by tbl ) b on a.id = b.tbl order by mbytes desc, a.db_id, a.name;


Solo pensé que me expandiría en esto ya que me enfrento a un problema de distribución desigual. He agregado algunos enlaces y campos para permitir el análisis del espacio por nodo y sector. También se agregan los valores máximos / mínimos y el número de valores por segmento para la columna 0.

select cast(use.usename as varchar(50)) as owner, trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema, trim(a.name) as Table, a.node, a.slice, b.mbytes, a.rows, a.num_values, a.minvalue, a.maxvalue from (select a.db_id, a.id, s.node, s.slice, a.name, d.num_values, d.minvalue, d.maxvalue, sum(rows) as rows from stv_tbl_perm a inner join stv_slices s on a.slice = s.slice inner join ( select tbl, slice, sum(num_values) as num_values, min(minvalue) as minvalue, max(maxvalue) as maxvalue from svv_diskusage where col = 0 group by 1, 2) d on a.id = d.tbl and a.slice = d.slice group by 1, 2, 3, 4, 5, 6, 7, 8 ) as a join pg_class as pgc on pgc.oid = a.id left join pg_user use on (pgc.relowner = use.usesysid) join pg_namespace as pgn on pgn.oid = pgc.relnamespace -- leave out system schemas and pgn.nspowner > 1 join pg_database as pgdb on pgdb.oid = a.db_id join (select tbl, slice, count(*) as mbytes from stv_blocklist group by tbl, slice ) b on a.id = b.tbl and a.slice = b.slice order by mbytes desc, a.db_id, a.name, a.node;


Utilice las consultas de esta presentación: http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices

Analizar el uso del espacio en disco para el clúster:

select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema, trim(a.name) as Table, b.mbytes, a.rows from ( select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name ) as a join pg_class as pgc on pgc.oid = a.id join pg_namespace as pgn on pgn.oid = pgc.relnamespace join pg_database as pgdb on pgdb.oid = a.db_id join ( select tbl, count(*) as mbytes from stv_blocklist group by tbl ) b on a.id = b.tbl order by mbytes desc, a.db_id, a.name;

Analizar la distribución de tablas entre nodos:

select slice, col, num_values, minvalue, maxvalue from svv_diskusage where name = ''__INSERT__TABLE__NAME__HERE__'' and col = 0 order by slice, col;