记一次mysql无法启动的解决过程

正文

今天开发人员反馈一个问题，就是某一台开发环境机器的mysql无法启动了，但是如果这台服务器重启的话，mysql就好使，而第二天就会出现mysql死掉然后无法启动的情况。我使用service mysqld restart，命令行反馈如下的内容：

1
2
3

[root@iZ2373j9xivZ data]# service mysqld restart
MySQL server PID file could not be found!                  [FAILED]    
Starting MySQL........The server quit without updating PID file (/data/mysql/data/sock/mysql.pid).                                                         [FAILED]

打开错误日志看一下，里面是这么写的：

2018-03-10 00:26:24 25919 [Note] InnoDB: Using CPU crc32 instructions
2018-03-10 00:26:24 25919 [Note] InnoDB: Initializing buffer pool, size = 4.0G
InnoDB: mmap(549453824 bytes) failed; errno 12
2018-03-10 00:26:24 25919 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2018-03-10 00:26:24 25919 [ERROR] Plugin 'InnoDB' init function returned error.
2018-03-10 00:26:24 25919 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2018-03-10 00:26:24 25919 [ERROR] Unknown/unsupported storage engine: INNODB
2018-03-10 00:26:24 25919 [ERROR] Aborting

爆InnoDB: mmap(549453824 bytes) failed; errno 12，然后我就free -m查看一下，当前服务器的内存已经不够用了。

[root@iZ2373j9xivZ sock]# free -m
             total       used       free     shared    buffers     cached
Mem:          7869       7747        121          0         16         15
-/+ buffers/cache:       7716        152
Swap:            0          0          0

那么是什么在占用这台服务器的内存？使用ps aux | sort -k4nr |head -5 这个命令查找当前占用内存最大的五个进程一看，全是php-fpm，同时也发现服务器里面运行大量的php-fpm，在征得开发人员的同意之后，重启php-fpm进程，内存空出来很多。

此时再次service mysqld restart，发现mysql的错误日志改成如下的样子了：

2018-03-12 10:53:42 28238 [Note] InnoDB: Highest supported file format is Barracuda.
2018-03-12 10:53:42 28238 [Note] InnoDB: The log sequence numbers 16939991440 and 16939991440 in ibdata files do not match the log sequence number 16940121908 in the ib_logfiles!
2018-03-12 10:53:42 28238 [Note] InnoDB: Database was not shutdown normally!
2018-03-12 10:53:42 28238 [Note] InnoDB: Starting crash recovery.

这次变成了The log sequence numbers 16939991440 and 16939991440 in ibdata files do not match the log sequence number 16940121908 in the ib_logfiles!，我打开my.cnf，适当的调小了max_connections和innodb_buffer_pool_size，然后service mysqld restart的时候发现错误又变了：

2018-03-12 11:03:57 29190 [Note] InnoDB: 5.6.27 started; log sequence number 16940121918
2018-03-12 11:03:57 29190 [Note] Server hostname (bind-address): '*'; port: 3306
2018-03-12 11:03:57 29190 [Note] IPv6 is not available.
2018-03-12 11:03:57 29190 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
2018-03-12 11:03:57 29190 [Note] Server socket created on IP: '0.0.0.0'.
2018-03-12 11:03:57 29190 [ERROR] Can't start server : Bind on unix socket: Permission denied    
2018-03-12 11:03:57 29190 [ERROR] Do you already have another mysqld server running on socket: /data/mysql/data/sock/mysql.sock ?
2018-03-12 11:03:57 29190 [ERROR] Aborting

这就是文件权限问题了，我再次打开my.cnf发现里面的user填写的是mysql，那么把/data/mysql/data/sock/mysql.sock这一系列的文件的所属人都改成了mysql用户，这一次重启mysql就OK了。

为什么这个mysql会好好的突然自动死掉呢？我发现日志里面有这样的字样：InnoDB: Database was not shutdown normally!，于是我猜想很有可能是php-fpm这进程不断地增长，占用的内存太大，导致mysql被linux的内核杀死了。于是查看/var/log/message的文件，结合mysql的错误日志时间找到了如下的字样：

Mar 10 00:26:22 iZ2373j9xivZ kernel: Out of memory: Kill process 1883 (mysqld) score 53 or sacrifice child
Mar 10 00:26:22 iZ2373j9xivZ kernel: Killed process 1883, UID 501, (mysqld) total-vm:6849508kB, anon-rss:429368kB, file-rss:176kB
Mar 10 04:11:38 iZ2373j9xivZ kernel: php-fpm invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Mar 10 04:11:38 iZ2373j9xivZ kernel: php-fpm cpuset=/ mems_allowed=0
Mar 10 04:11:38 iZ2373j9xivZ kernel: Pid: 4375, comm: php-fpm Not tainted 2.6.32-431.23.3.el6.x86_64 #1

证据确凿，php-fpm的无休止增长导致服务器的可用内存变小，最后内核把mysql杀死，修改php-fpm的文件之后，暂时好了点…

参考资料

http://robinchen.me/tech/2016/03/14/tech-aliyun-centos-mysql-shutdown-itself-irregularly.html
http://www.wisedream.net/2017/12/20/traps/mysql-corrupt/