optimizar - mysql concatenar filas
Solim Dataimport de mysql muere cuando se elimina el lĂmite de consultas de MySQL (2)
Tengo una base de datos mysql con más de 400,000,000 de registros en una sola tabla.
La importación de Solr funciona bien cuando establezco un límite. Mi última prueba fue de 10,000,000 y funcionó con éxito en aproximadamente 5 minutos. Es necesario tener el conjunto de datos completo en Solr y cuando elimino el límite, solr informa que ha perdido la conexión a mysql.
Creo que me estoy encontrando con una configuración errónea o una limitación con mi configuración actual.
Aquí está el registro catalina.out de tomcat
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT id as solr_id, id,firstname, lastname, state from names Processing Document # 1
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT id as solr_id, id,firstname, lastname, state from names Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 353,135 milliseconds ago. The last packet sent successfully to the server was 353,135 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121)
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2243)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1999)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3504)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:490)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3198)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2366)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2789)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2758)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
... 13 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 17 bytes, read 12 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3119)
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2224)
... 24 more
Jun 16, 2013 4:51:37 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 16, 2013 4:51:37 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Registro de MySQL
130616 16:15:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130616 16:15:29 [Warning] option ''max_allowed_packet'': unsigned value 53687091200 adjusted to 1073741824
130616 16:15:29 InnoDB: Initializing buffer pool, size = 22.0G
130616 16:15:30 InnoDB: Completed initialization of buffer pool
130616 16:15:31 InnoDB: Started; log sequence number 49 1885610091
130616 16:15:31 [Note] Event Scheduler: Loaded 0 events
130616 16:15:31 [Note] /usr/libexec/mysqld: ready for connections.
Version: ''5.1.69'' socket: ''/var/lib/mysql/mysql.sock'' port: 3306 Source distribution
Tabla MySQL
CREATE TABLE `names` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`state` varchar(12) DEFAULT NULL,
`source` varchar(128) DEFAULT NULL,
`lastname` varchar(128) DEFAULT NULL,
`firstname` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `source-state` (`source`,`state`),
KEY `first-last` (`firstname`,`lastname`),
KEY `first-last-state` (`firstname`,`lastname`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=433924211 DEFAULT CHARSET=utf8;
Configuración de MySQL
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
wait_timeout=66600
net_read_timeout=66600
interactive_timeout=666666
max_allowed_packet=50G
max_connect_errors=666666
innodb_buffer_pool_size=22GB
innodb_flush_log_at_trx_commit = 2
thread_cache_size=4
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
Solr data-config.xml
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://127.0.0.1:3306/databasename?zeroDateTimeBehavior=convertToNull"
user="root"
password="password"
stream="true"/>
<document name="content">
<entity name="names" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit 10000000">
<field column="solr_id" name="solr_id" />
<field column="id" name="id" />
<field column="firstname" name="firstname" />
<field column="lastname" name="lastname" />
<field column="state" name="state" />
</entity>
</document>
</dataConfig>
Agregue batchSize = "- 1" a data-config.xml
Puede crear múltiples entidades raíz junto con la cláusula LIMIT
como:
<entity name="one" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit 0, 10000000">
<field column="solr_id" name="solr_id" />
<field column="id" name="id" />
<field column="firstname" name="firstname" />
<field column="lastname" name="lastname" />
<field column="state" name="state" />
</entity>
<entity name="two" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit 10000000, 20000000">
<field column="solr_id" name="solr_id" />
<field column="id" name="id" />
<field column="firstname" name="firstname" />
<field column="lastname" name="lastname" />
<field column="state" name="state" />
</entity>
hasta las filas requeridas
O
Puede pasar los parámetros de limit
y offset
como parámetros de request
como -
<entity name="one" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit ${dataimporter.request.startAt}, ${dataimporter.request.count}">
<field column="solr_id" name="solr_id" />
<field column="id" name="id" />
<field column="firstname" name="firstname" />
<field column="lastname" name="lastname" />
<field column="state" name="state" />
</entity>
El enlace de consulta para la segunda opción se vería como: http://<hostname>:<port>/solr/dataimport?command=full-import&startAt=0&count=10000000