正文
今天开发人员反馈一个问题,就是某一台开发环境机器的mysql无法启动了,但是如果这台服务器重启的话,mysql就好使,而第二天就会出现mysql死掉然后无法启动的情况。我使用service mysqld restart
,命令行反馈如下的内容:
1
2
3[root@iZ2373j9xivZ data]# service mysqld restart
MySQL server PID file could not be found! [FAILED]
Starting MySQL........The server quit without updating PID file (/data/mysql/data/sock/mysql.pid). [FAILED]
打开错误日志看一下,里面是这么写的:
1
2
3
4
5
6
7
82018-03-10 00:26:24 25919 [Note] InnoDB: Using CPU crc32 instructions
2018-03-10 00:26:24 25919 [Note] InnoDB: Initializing buffer pool, size = 4.0G
InnoDB: mmap(549453824 bytes) failed; errno 12
2018-03-10 00:26:24 25919 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2018-03-10 00:26:24 25919 [ERROR] Plugin 'InnoDB' init function returned error.
2018-03-10 00:26:24 25919 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2018-03-10 00:26:24 25919 [ERROR] Unknown/unsupported storage engine: INNODB
2018-03-10 00:26:24 25919 [ERROR] Aborting
爆InnoDB: mmap(549453824 bytes) failed; errno 12
,然后我就free -m
查看一下,当前服务器的内存已经不够用了。
1
2
3
4
5[root@iZ2373j9xivZ sock]# free -m
total used free shared buffers cached
Mem: 7869 7747 121 0 16 15
-/+ buffers/cache: 7716 152
Swap: 0 0 0
那么是什么在占用这台服务器的内存?使用ps aux | sort -k4nr |head -5
这个命令查找当前占用内存最大的五个进程一看,全是php-fpm
,同时也发现服务器里面运行大量的php-fpm
,在征得开发人员的同意之后,重启php-fpm
进程,内存空出来很多。
此时再次service mysqld restart
,发现mysql的错误日志改成如下的样子了:
1
2
3
42018-03-12 10:53:42 28238 [Note] InnoDB: Highest supported file format is Barracuda.
2018-03-12 10:53:42 28238 [Note] InnoDB: The log sequence numbers 16939991440 and 16939991440 in ibdata files do not match the log sequence number 16940121908 in the ib_logfiles!
2018-03-12 10:53:42 28238 [Note] InnoDB: Database was not shutdown normally!
2018-03-12 10:53:42 28238 [Note] InnoDB: Starting crash recovery.
这次变成了The log sequence numbers 16939991440 and 16939991440 in ibdata files do not match the log sequence number 16940121908 in the ib_logfiles!
,我打开my.cnf
,适当的调小了max_connections
和innodb_buffer_pool_size
,然后service mysqld restart
的时候发现错误又变了:
1
2
3
4
5
6
7
82018-03-12 11:03:57 29190 [Note] InnoDB: 5.6.27 started; log sequence number 16940121918
2018-03-12 11:03:57 29190 [Note] Server hostname (bind-address): '*'; port: 3306
2018-03-12 11:03:57 29190 [Note] IPv6 is not available.
2018-03-12 11:03:57 29190 [Note] - '0.0.0.0' resolves to '0.0.0.0';
2018-03-12 11:03:57 29190 [Note] Server socket created on IP: '0.0.0.0'.
2018-03-12 11:03:57 29190 [ERROR] Can't start server : Bind on unix socket: Permission denied
2018-03-12 11:03:57 29190 [ERROR] Do you already have another mysqld server running on socket: /data/mysql/data/sock/mysql.sock ?
2018-03-12 11:03:57 29190 [ERROR] Aborting
这就是文件权限问题了,我再次打开my.cnf
发现里面的user
填写的是mysql
,那么把/data/mysql/data/sock/mysql.sock
这一系列的文件的所属人都改成了mysql
用户,这一次重启mysql就OK了。
为什么这个mysql
会好好的突然自动死掉呢?我发现日志里面有这样的字样:InnoDB: Database was not shutdown normally!
,于是我猜想很有可能是php-fpm
这进程不断地增长,占用的内存太大,导致mysql
被linux
的内核杀死了。于是查看/var/log/message
的文件,结合mysql的错误日志时间找到了如下的字样:
1
2
3
4
5Mar 10 00:26:22 iZ2373j9xivZ kernel: Out of memory: Kill process 1883 (mysqld) score 53 or sacrifice child
Mar 10 00:26:22 iZ2373j9xivZ kernel: Killed process 1883, UID 501, (mysqld) total-vm:6849508kB, anon-rss:429368kB, file-rss:176kB
Mar 10 04:11:38 iZ2373j9xivZ kernel: php-fpm invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Mar 10 04:11:38 iZ2373j9xivZ kernel: php-fpm cpuset=/ mems_allowed=0
Mar 10 04:11:38 iZ2373j9xivZ kernel: Pid: 4375, comm: php-fpm Not tainted 2.6.32-431.23.3.el6.x86_64 #1
证据确凿,php-fpm
的无休止增长导致服务器的可用内存变小,最后内核把mysql
杀死,修改php-fpm
的文件之后,暂时好了点…
参考资料
http://robinchen.me/tech/2016/03/14/tech-aliyun-centos-mysql-shutdown-itself-irregularly.html
http://www.wisedream.net/2017/12/20/traps/mysql-corrupt/