One of your slaves is corrupted. You can see this in SLOWLOG, where the same host keeps sending PSYNC from a new port about once per second:
127.0.0.1:6379[3]> slowlog get 2
1) 1) (integer) 36532
   2) (integer) 1548262888
   3) (integer) 531825
   4) 1) "PSYNC"
      2) "70fb0605bde9d8df4ce0384d52c7fa584c741f47"
      3) "52777391995395"
   5) "10.x.x.x:46335"
   6) ""
2) 1) (integer) 36531
   2) (integer) 1548262887
   3) (integer) 520081
   4) 1) "PSYNC"
      2) "70fb0605bde9d8df4ce0384d52c7fa584c741f47"
      3) "52777391995395"
   5) "10.x.x.x:34363"
   6) ""
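If the slow log is long, a small script can make the repeated PSYNC attempts easier to spot. The following is only a sketch: it assumes you have already copied the entries into Python tuples with the same six fields shown above (id, Unix timestamp, duration in microseconds, command arguments, client address, client name). The function name `psync_attempts_by_ip` is made up for this example.

```python
from collections import Counter


def psync_attempts_by_ip(entries):
    """Count PSYNC slow-log entries per client IP.

    Each entry is a tuple in the SLOWLOG GET field order:
    (id, unix_ts, duration_us, args, client_addr, client_name).
    """
    counts = Counter()
    for _id, _ts, _duration_us, args, client_addr, _name in entries:
        if args and args[0].upper() == "PSYNC":
            # Drop the ephemeral source port, keep only the host part.
            ip = client_addr.rsplit(":", 1)[0]
            counts[ip] += 1
    return counts


# The two entries from the SLOWLOG output above.
entries = [
    (36532, 1548262888, 531825,
     ["PSYNC", "70fb0605bde9d8df4ce0384d52c7fa584c741f47", "52777391995395"],
     "10.x.x.x:46335", ""),
    (36531, 1548262887, 520081,
     ["PSYNC", "70fb0605bde9d8df4ce0384d52c7fa584c741f47", "52777391995395"],
     "10.x.x.x:34363", ""),
]

print(psync_attempts_by_ip(entries))
```

A host that shows up here again and again with the same replication ID and offset is a likely candidate for the broken slave.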
On the master, the Redis command info replication shows that the master does not recognize that IP as a slave:
127.0.0.1:6379[3]> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.y.y.y,port=6379,state=online,offset=52777990926778,lag=1
master_replid:70fb0605bde9d8df4ce0384d52c7fa584c741f47
master_replid2:5da8c0503f5ad3cb498c5595163ad0ce8ad78a23
master_repl_offset:52777991869922
second_repl_offset:52765785565114
repl_backlog_active:1
repl_backlog_size:1000000000
repl_backlog_first_byte_offset:52776991869923
repl_backlog_histlen:1000000000
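The same check can be done on saved output instead of by eye. This is a minimal sketch that assumes the `info replication` text has been captured into a string; `parse_slaves` and `is_connected_slave` are hypothetical helper names, and the parsing relies only on the `slaveN:key=value,...` layout visible in the output above.

```python
def parse_slaves(info_text):
    """Return one dict per slaveN line of `info replication` output."""
    slaves = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Match slave0, slave1, ... but not other keys.
        if sep and key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            slaves.append(fields)
    return slaves


def is_connected_slave(info_text, ip):
    """True if the master lists `ip` as an online slave."""
    return any(
        s.get("ip") == ip and s.get("state") == "online"
        for s in parse_slaves(info_text)
    )


# Excerpt of the master's output shown above.
INFO_BEFORE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=10.y.y.y,port=6379,state=online,offset=52777990926778,lag=1
master_repl_offset:52777991869922
"""

print(is_connected_slave(INFO_BEFORE, "10.y.y.y"))  # True
print(is_connected_slave(INFO_BEFORE, "10.x.x.x"))  # False
```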
The solution is simple: SSH into that slave, stop Redis, remove the dump file (# rm /var/lib/redis/dump.rdb), and start Redis again. Then wait for replication to finish; the slave should reappear on the master:
127.0.0.1:6379[3]> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.y.y.y,port=6379,state=online,offset=52779449364787,lag=1
slave1:ip=10.x.x.x,port=6379,state=online,offset=52779449158904,lag=1
master_replid:70fb0605bde9d8df4ce0384d52c7fa584c741f47
master_replid2:5da8c0503f5ad3cb498c5595163ad0ce8ad78a23
master_repl_offset:52779449806011
second_repl_offset:52765785565114
repl_backlog_active:1
repl_backlog_size:1000000000
repl_backlog_first_byte_offset:52778449806012
repl_backlog_histlen:1000000000
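To judge whether the resynced slave has caught up, you can compare each slave's offset with master_repl_offset. The sketch below assumes the `info replication` text is available as a string; `replication_lag_bytes` is a name invented for this example, and the difference is only approximate because the offsets keep moving while the output is produced.

```python
def replication_lag_bytes(info_text):
    """Map each slave IP to master_repl_offset minus that slave's offset."""
    master_offset = None
    slave_offsets = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # section headers such as "# Replication"
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            slave_offsets[fields["ip"]] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found in INFO output")
    return {ip: master_offset - off for ip, off in slave_offsets.items()}


# Excerpt of the master's output after the fix, as shown above.
INFO_AFTER = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.y.y.y,port=6379,state=online,offset=52779449364787,lag=1
slave1:ip=10.x.x.x,port=6379,state=online,offset=52779449158904,lag=1
master_repl_offset:52779449806011
"""

print(replication_lag_bytes(INFO_AFTER))
# {'10.y.y.y': 441224, '10.x.x.x': 647107}
```

Both slaves are within a few hundred kilobytes of the master here, which is consistent with the lag=1 reported for each.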