It happens.
You have a Master -> Slave MySQL replication and your master crashes for a reason or another.

You’re worried about bringing the master back online as soon as possible.
When you finally get it back up, you notice the slave is stopped, complaining about a dreaded error that looks a bit like this:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000002' at 71185833, the last event read from '/var/log/mysql/mysql-bin.000002' at 71185833, the last byte read from '/var/log/mysql/mysql-bin.000002' at 71185852.

The binlog file got corrupted when the master database server crashed.

Restoring a backup won’t save you, nor will re-importing the database.
Simply skipping the corrupted binlog will result in a loss of data on your slave, therefore voiding your data integrity.

So, what’s the solution ?

Skip the binlog AND replace the missing data.

But how do you know what is missing ? How do you fix it ?
This is where percona-toolkit (formerly known as maatkit) comes into play.
A subset of the percona-toolkit, pt-table-sync, allows you to checksum the databases you’re replicating, make out the differences and translate them into actual SQL statements - therefore ensuring that your master and slave are identical.

Skip the binlog

When your master booted, it eventually moved to the next binlog file.
You want to go on the slave, skip the current (corrupted) binlog and move to the first position on the next binlog.

Following the example of the error provided above:

SLAVE STOP;  
CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000003';  
CHANGE MASTER TO MASTER_LOG_POS = 0;  
SLAVE START;  

At this point, your slave should already be catching up it’s replication to the master, you can tell with a

SHOW SLAVE STATUS\G

Retrieve the missing data

Once your slave has caught up to the master, you’ll want to retrieve and insert the missing data.

Install required dependencies for percona-toolkit:

apt-get install libio-socket-ssl-perl libnet-libidn-perl libnet-ssleay-perl

Download and install percona-toolkit:

wget http://www.percona.com/redir/downloads/percona-toolkit/LATEST/deb/percona-toolkit_2.2.5-2_all.deb dpkg -i percona-toolkit_2.2.5-2_all.deb

You might have to refer to the documentation of percona-toolkit for the right usage depending on your scenario but your command should look a bit like this:

pt-table-sync --execute --sync-to-master slave1

If you don’t want to commit changes to the database, you just want to see what is missing and what SQL statements table-sync would do, you can do:

pt-table-sync --sync-to-master slave1 --print

And there you have it, all fixed !

pt-table-sync performance

Running pt-table-sync on a 260GB database with several hundred million rows took a bit more than 2 hours on very high-end hardware.
It wasn’t taxing or heavy on the servers, I didn’t notice much load.


Comments

comments powered by Disqus