oak-chunk-update
NAME
oak-chunk-update: Perform long, non-blocking UPDATE/DELETE operation in auto managed small chunksSYNOPSIS
Delete rows from world.City where population is small:oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)"Same as above, provide fully qualified table names, use 100 rows chunk size:
oak-chunk-update --execute="DELETE FROM world.City WHERE Population < 10000000 AND OAK_CHUNK(world.City)" --chunk-size=100Same as above, sleep for 200 milliseconds between chunks, verbose:
oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)" --chunk-size=100 --sleep=200 --verboseSame as above, quit when no rows are affected:
oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)" --chunk-size=100 --sleep=200 --verbose --terminate-on-not-foundPerform a multi-table UPDATE operation, choose world.City as chunking table, sleep twice the query time:
oak-chunk-update --execute="UPDATE City, Country SET City.District = 'unknown' WHERE City.CountryCode = Country.Code AND Country.Continent = 'Africa' AND OAK_CHUNK(City)" --sleep-ratio=2Same as above, avoid using INFORMATION_SCHEMA by specifying chunk key:
oak-chunk-update --execute="UPDATE City, Country SET City.District = 'unknown' WHERE City.CountryCode = Country.Code AND Country.Continent = 'Africa' AND OAK_CHUNK(City)" --force-chunking-column=id:integerSame as above, specify constant start position and end position:
oak-chunk-update --execute="UPDATE City, Country SET City.District = 'unknown' WHERE City.CountryCode = Country.Code AND Country.Continent = 'Africa' AND OAK_CHUNK(City)" --force-chunking-column=id:integer --start-with=1074 --end-with=2990Same as above, specify query start position, skip locks:
oak-chunk-update --execute="UPDATE City, Country SET City.District = 'unknown' WHERE City.CountryCode = Country.Code AND Country.Continent = 'Africa' AND OAK_CHUNK(City)" --force-chunking-column=id:integer --start-with="SELECT MIN(id) FROM world.City WHERE District=''" --skip-lock-tablesProvide connection parameters. Prompt for password:
oak-chunk-update --user=root --ask-pass --socket=/tmp/mysql.sock --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)"Use a defaults file for parameters.
oak-chunk-update --defaults-file=/home/myuser/.my-oak.cnf --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)"
DESCRIPTION
This utility allows for splitting long running or non-indexed UPDATE/DELETE oprations, optionally multi-table ones. Long running updating queries are often used. Some examples:
- Purging old table records (e.g. purging old logs).
- Updating a column on a table scale.
- Deleting or updating a small number of rows, but with a non-indexed search condition.
- Inserting into one table aggregated values from other tables.
oak-chunk-update splits such long running tasks into small chunks. It also allows for sleep time between chunks. This allows for less lock time, better replication responsiveness (less lag) and less stress on system resources (CPU, IO). To perform, the utility uses a UNIQUE KEY on a given table, which is used for the splitting process. Note that the query may involve multiple tables (JOINed), in which case one of the tables must have a UNIQUE KEY. The utility requires, then:
- At least on of the tables participating in the UPDATE/DELETE query has a UNIQUE KEY.
- The query must indicate to the utility the table for which the UNIQUE KEY is used.
The query must include a hint in the form OAK_CHUNK(table_name) or OAK_CHUNK(database_name.table_name). See SYNOPSIS for examples. The table indicated in the OAK_CHUNK clause is the table which must contain a UNIQUE KEY, which is used for splitting the query. The utility rewrites the query by iteratively replacing the OAK_CHUNK(...) clause with appropriate values from the UNIQUE KEY. In case more than one UNIQUE KEY is available on the table, the utility chooses in the following order:
- If there's a PRIMARY KEY - this is the selected key
- A key for which the first column is non-textual is prefereable to a key for which the first column is textual
- A key with a smaller numeric data type takes precedance
- A key with fewer columns take precedance
The tool auto selects the chunking key by observing INFORMATION_SCHEMA. Reading from INFORMATION_SCHEMA is risky on large, busy servers. It is possible to instruct the tool to use a specific column, by adding --force-chunking-column.
Limiting the number of scanned rows is done via --start-with and --end-with. Neither, either or both can be specified, as constant or as a SELECT query returning a single integer value. In the latter case it is required that the chunking key is itself an integer. --terminate-on-not-found is another way of limiting scanned rows, by instructing the tool to quit the first time the query has no effect. This works well for monotonic columns, e.g. TIMESTAMP columns with increasing values.
OPTIONS
--ask-passPrompt for password.
-c CHUNK_SIZE, --chunk-size=CHUNK_SIZENumber of rows to act on in chunks (default: 1000). 0 means all rows updated in one operation The lower the number, the shorter any locks are held, but the more operations required and the more total running time. -d DATABASE, --database=DATABASE
Database name (required unless table is fully qualified)
--defaults-file=DEFAULTS_FILERead from MySQL configuration file. Overrides --user, --password, --socket, --port.
Configuration needs to be in the following format:
[client]
user=my_user
password=my_pass
socket=/tmp/mysql.sock
port=3306
Assuming chunking on numeric field (e.g. AUTO_INCREMENT), end chunking with this value. Either provide a constant or a query returning a single integer value.
-e EXECUTE_QUERY, --execute=EXECUTE_QUERYQuery to execute, which contains a chunk placeholder in the form of OAK_CHUNK(table_name) (required)
--force-chunking-column=FORCED_CHUNKING_COLUMNColumns to chunk by; avoids querying in INFORMATION_SCHEMA. Format: either column_name:type, where type is integer/text/temporal - for single column keys, or column1_name,column2_name,... for one or more column keys, with no type.
-H HOST, --host=HOSTMySQL host (default: localhost)
--no-log-binDo not log to binary log (actions will not replicate). This may be useful if the slave already finds it hard to replicate behind master. The utility may be spawned manually on slave machines, therefore utilizing more than one CPU core on those machines, making replication process faster due to parallelism.
-p PASSWORD, --password=PASSWORDMySQL password
-P PORT, --port=PORTTCP/IP port (default: 3306)
--print-progressShow number of affected rows during utility runtime
--sleep=SLEEP_MILLISNumber of milliseconds to sleep between chunks. Default: 0
--sleep-ratio=SLEEP_RATIORatio of sleep time to execution time. Default: 0
--skip-lock-tablesDo not issue a LOCK TABLES READ. May be required when using queries within --start-with or --end-with
--skip-retry-chunkAvoid retrying a chunk operation on error. Default: false
-S SOCKET, --socket=SOCKETMySQL socket file. Only applies when host is localhost
--start-with=START_WITHAssuming chunking on numeric field (e.g. AUTO_INCREMENT), start chunking from this value and onward. Either provide a constant or a query returning a single integer value.
--terminate-on-not-foundTerminate on first occurrence where chunking did not affect any rows (default: False)
-u USER, --user=USERMySQL user
-v, --verbosePrint user friendly messages
ENVIRONMENT
Requires MySQL 5.0 or newer, python 2.3 or newer. python-mysqldb must be installed in order to use this tool. You canapt-get install python-mysqldbor
yum install mysql-python
SEE ALSO
oak-repeat-query
LICENSE
This tool is released under the BSD license.Copyright (c) 2008 - 2010, Shlomi Noach All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the organization nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.