It happens.
You have a master -> slave MySQL replication setup and your master
crashes for one reason or another.
Your first concern is bringing the master back online as soon as
possible.
When you finally get it back up, you notice the slave has stopped,
complaining about a dreaded error that looks a bit like this:
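A representative example of what SHOW SLAVE STATUS would report in this situation (the error number, wording and file name vary with your MySQL version and setup, so treat this as illustrative):

```
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position'
Master_Log_File: mysql-bin.000002
```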
The binlog file got corrupted when the master database server crashed.
Restoring a backup won’t save you, nor will re-importing the database.
Simply skipping the corrupted binlog will result in a loss of data on
your slave, thereby compromising your data integrity.
So, what’s the solution?
Skip the binlog AND replace the missing data.
But how do you know what is missing? How do you fix it?
This is where percona-toolkit (formerly known as Maatkit) comes
into play.
One of its tools, pt-table-sync, lets you checksum the databases
you’re replicating, work out the differences and translate them into
actual SQL statements, ensuring that your master and slave are
identical.
Skip the binlog
When your master came back up, it rotated to the next binlog file.
On the slave, you want to skip the current (corrupted) binlog and
point replication at the first position of the next one.
Following the example of the error provided above:
STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000003', MASTER_LOG_POS = 4;
START SLAVE;
(Position 4, not 0, is the first valid position in a binary log: the
first four bytes are a magic header. Set both the file and the
position in a single CHANGE MASTER TO statement.)
At this point, your slave should already be catching up on its replication to the master; you can tell with a
SHOW SLAVE STATUS\G
Retrieve the missing data
Once your slave has caught up to the master, you’ll want to retrieve and insert the missing data.
Install required dependencies for percona-toolkit:
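On a Debian/Ubuntu system, that could look like the following (the package names are Debian's; adjust for your distribution):

```shell
# Perl modules percona-toolkit relies on: DBI, the MySQL driver,
# and Term::ReadKey for password prompts (Debian/Ubuntu names)
sudo apt-get install libdbi-perl libdbd-mysql-perl libterm-readkey-perl
```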
Download and install percona-toolkit:
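One way is Percona's generic tarball; the URL below is the download shortcut Percona has documented, but verify it against their current documentation:

```shell
# Fetch and build the toolkit from the generic tarball
wget percona.com/get/percona-toolkit.tar.gz
tar zxvf percona-toolkit.tar.gz
cd percona-toolkit-*
perl Makefile.PL
make
sudo make install
```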
You might have to refer to the documentation of percona-toolkit for the right usage depending on your scenario but your command should look a bit like this:
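A sketch of such a command, pointing pt-table-sync at the slave and letting it sync that slave to its master (host, user and password are placeholders):

```shell
# Compare the slave against its master and execute the statements
# needed to bring the slave back in sync
pt-table-sync --execute --sync-to-master h=slave_host,u=root,p=secret
```

--sync-to-master takes the slave's DSN and brings that slave's data in line with its master's.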
If you don’t want to commit changes to the database, and just want to see what is missing and which SQL statements pt-table-sync would run, you can do:
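Using --print instead of --execute makes pt-table-sync output the SQL statements without running them (host, user and password are placeholders):

```shell
# Only print the SQL statements that would be run, changing nothing
pt-table-sync --print --sync-to-master h=slave_host,u=root,p=secret
```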
And there you have it, all fixed!
pt-table-sync performance
Running pt-table-sync on a 260GB database with several hundred million
rows took a bit more than 2 hours on very high-end hardware.
It wasn’t taxing or heavy on the servers; I didn’t notice much
additional load.