registros optimizar numero limitar lentas funcion filas ejemplos consultas consulta concatenar campos mysql solr dataimporthandler large-data

optimizar - mysql concatenar filas



Solim Dataimport de mysql muere cuando se elimina el lĂ­mite de consultas de MySQL (2)

Tengo una base de datos mysql con más de 400,000,000 de registros en una sola tabla.

La importación de Solr funciona bien cuando establezco un límite. Mi última prueba fue de 10,000,000 y funcionó con éxito en aproximadamente 5 minutos. Es necesario tener el conjunto de datos completo en Solr y cuando elimino el límite, solr informa que ha perdido la conexión a mysql.

Creo que me estoy encontrando con una configuración errónea o una limitación con mi configuración actual.

Aquí está el registro catalina.out de tomcat

Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT id as solr_id, id,firstname, lastname, state from names Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225) ... 3 more Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT id as solr_id, id,firstname, lastname, state from names Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619) ... 5 more Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 353,135 milliseconds ago. The last packet sent successfully to the server was 353,135 milliseconds ago. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121) at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2243) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1999) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3504) at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:490) at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3198) at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2366) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2789) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2758) at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894) at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246) ... 13 more Caused by: java.io.EOFException: Can not read response from server. Expected to read 17 bytes, read 12 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3119) at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2224) ... 24 more Jun 16, 2013 4:51:37 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jun 16, 2013 4:51:37 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback

Registro de MySQL

130616 16:15:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql 130616 16:15:29 [Warning] option ''max_allowed_packet'': unsigned value 53687091200 adjusted to 1073741824 130616 16:15:29 InnoDB: Initializing buffer pool, size = 22.0G 130616 16:15:30 InnoDB: Completed initialization of buffer pool 130616 16:15:31 InnoDB: Started; log sequence number 49 1885610091 130616 16:15:31 [Note] Event Scheduler: Loaded 0 events 130616 16:15:31 [Note] /usr/libexec/mysqld: ready for connections. Version: ''5.1.69'' socket: ''/var/lib/mysql/mysql.sock'' port: 3306 Source distribution

Tabla MySQL

CREATE TABLE `names` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `state` varchar(12) DEFAULT NULL, `source` varchar(128) DEFAULT NULL, `lastname` varchar(128) DEFAULT NULL, `firstname` varchar(128) DEFAULT NULL, PRIMARY KEY (`id`), KEY `source-state` (`source`,`state`), KEY `first-last` (`firstname`,`lastname`), KEY `first-last-state` (`firstname`,`lastname`,`state`) ) ENGINE=InnoDB AUTO_INCREMENT=433924211 DEFAULT CHARSET=utf8;

Configuración de MySQL

[mysqld] datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock user=mysql # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0 wait_timeout=66600 net_read_timeout=66600 interactive_timeout=666666 max_allowed_packet=50G max_connect_errors=666666 innodb_buffer_pool_size=22GB innodb_flush_log_at_trx_commit = 2 thread_cache_size=4 [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid

Solr data-config.xml

<dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/databasename?zeroDateTimeBehavior=convertToNull" user="root" password="password" stream="true"/> <document name="content"> <entity name="names" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit 10000000"> <field column="solr_id" name="solr_id" /> <field column="id" name="id" /> <field column="firstname" name="firstname" /> <field column="lastname" name="lastname" /> <field column="state" name="state" /> </entity> </document> </dataConfig>



Puede crear múltiples entidades raíz junto con la cláusula LIMIT como:

<entity name="one" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit 0, 10000000"> <field column="solr_id" name="solr_id" /> <field column="id" name="id" /> <field column="firstname" name="firstname" /> <field column="lastname" name="lastname" /> <field column="state" name="state" /> </entity> <entity name="two" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit 10000000, 20000000"> <field column="solr_id" name="solr_id" /> <field column="id" name="id" /> <field column="firstname" name="firstname" /> <field column="lastname" name="lastname" /> <field column="state" name="state" /> </entity>

hasta las filas requeridas

O

Puede pasar los parámetros de limit y offset como parámetros de request como -

<entity name="one" pk="id" query="SELECT id as solr_id, id,firstname, lastname, state from names limit ${dataimporter.request.startAt}, ${dataimporter.request.count}"> <field column="solr_id" name="solr_id" /> <field column="id" name="id" /> <field column="firstname" name="firstname" /> <field column="lastname" name="lastname" /> <field column="state" name="state" /> </entity>

El enlace de consulta para la segunda opción se vería como: http://<hostname>:<port>/solr/dataimport?command=full-import&startAt=0&count=10000000