本帖最后由 zhaorong 于 2018-8-20 14:34 编辑
思路:
1、确定高负载的类型 htop,dstat命令看负载高是CPU还是IO
2、监控具体的sql语句,是insert update 还是 delete导致高负载
3、检查mysql日志
4、检查硬件问题
dstat
可以看到具体是哪个用户哪个进程占用了相关系统资源,当前CPU、内存谁在使用
htop
htop是top的增强版,更直观
tcpdump
抓取mysql包分析,一般抓3306端口的数据
- [root@cc ~]# tcpdump -i eth0 -A -s 3000 port 3306 > ~/sql.log
复制代码
然后使用awk,sort,wc 等命令进行分析
或者
- [root@cc ~]# tcpdump -i eth0 -s 0 -l -w - dst port 3306 | strings | perl -e '
- while(<>) { chomp; next if /^[^ ]+[ ]*$/;
- if(/^(SELECT|UPDATE|DELETE|INSERT|SET|COMMIT|ROLLBACK|CREATE|DROP|ALTER)/i) {
- if (defined $q) { print "$qn"; }
- $q=$_;
- } else {
- $_ =~ s/^[ t]+//; $q.=" $_";
- }
- }'
复制代码
就可以看出最繁忙的sql语句了
strace
查看系统调用是否有问题,进程是否堵塞,是否有Broken pipe
- [root@cc ~]# strace -p 26578
复制代码
pt-query-digest
分析mysql慢日志,查看哪些sql语句最耗时
- [root@cc ~]# pt-query-digest slow.logs
- # 390ms USER TIME, 10ms system TIME, 15.67M rss, 105.84M vsz
- # CURRENT DATE: Thu DEC 29 13:22:42 2014
- # Hostname: test
- # Files: slow.log
- # Overall: 776 total, 11 UNIQUE, 0.00 QPS, 0.00x concurrency _____________
- # TIME range: 2011-09-10 04:03:19 TO 2011-12-29 05:02:51
- # Attribute total MIN MAX avg 95% stddev median
- # ============ ======= ======= ======= ======= ======= ======= =======
- # EXEC TIME 5657s 2s 33s 7s 23s 6s 5s
- # LOCK TIME 33s 0 19s 43ms 98us 715ms 38us
- # ROWS sent 323.38k 0 107.36k 426.73 0.99 6.35k 0
- # ROWS examine 323.39k 0 107.36k 426.74 0 6.35k 0
- # Query SIZE 217.95k 38 562 287.61 420.77 81.78 284.79
复制代码
show processlist
查看系统到底在干什么
- mysql> show full processlist;
- +-----------+---------------+---------------------+---------------------+---------+------+---------------+---------------------------+
- | Id | User | Host | db | Command | Time | State | Info |
- +-----------+---------------+---------------------+---------------------+---------+------+---------------+---------------------------+
- | 184498848 | testdb_rr1356 | 10.11.211.120:61343 | testdb_rr1356_db121 | Sleep | 1384 | | NULL |
- | 184508740 | testdb_rr1356 | 10.11.211.120:11809 | testdb_rr1356_db121 | Sleep | 87 | | NULL |
- | 184509415 | testdb_rr1356 | 10.11.211.120:12760 | testdb_rr1356_db121 | Query | 0 | NULL | show full processlist |
- | 184509451 | testdb_rr1356 | 10.11.211.120:12804 | testdb_rr1356_db121 | Sleep | 10 | | NULL |
- | 184509528 | testdb_rr1356 | 10.11.211.120:12919 | testdb_rr1356_db121 | Query | 0 | freeing items | DESCRIBE test_channel |
复制代码
检查mysql配置参数是否有问题,引起大量的IO或者高CPU操作
- [code]innodb_flush_log_at_trx_commit 、innodb_buffer_pool_size 、key_buffer_size 等重要参数
- mysql> show variables like '%innodb%';
- +---------------------------------+----------------------------+
- | Variable_name | Value |
- +---------------------------------+----------------------------+
- | have_innodb | YES |
- | ignore_builtin_innodb | ON |
- | innodb_adaptive_flushing | ON |
- | innodb_adaptive_hash_index | ON |
- | innodb_additional_mem_pool_size | 2097152 |
- | innodb_autoextend_increment | 8 |
- | innodb_autoinc_lock_mode | 1 |
- | innodb_buffer_pool_size | 2013265920 |
- | innodb_change_buffering | inserts |
- | innodb_checksums | ON |
复制代码
通过show engine innodb status查看当前事务,内存使用
- mysql> show engine innodb status \G
- LATEST DETECTED DEADLOCK
- ------------------------
- 150731 10:36:50
- *** (1) TRANSACTION:
- TRANSACTION EBFBBEC, ACTIVE 0 sec, process no 20691, OS thread id 47345217033984 inserting
- mysql tables in use 1, locked 1
- LOCK WAIT 5 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 2
- MySQL thread id 143249904, query id 1286731854 10.135.21.120 tybuser2014 update
-
- #此处具体sql省略
-
- ----------------------
- BUFFER POOL AND MEMORY
- ----------------------
- Total memory allocated 2058485760; in additional pool allocated 0
- Dictionary memory allocated 819282
- Buffer pool size 122879
- Free buffers 97599
- Database pages 24313
- Old database pages 8954
- Modified db pages 7
- Pending reads 0
- Pending writes: LRU 0, flush list 0, single page 0
- Pages made young 6, not young 0
- 0.00 youngs/s, 0.00 non-youngs/s
- Pages read 1049, created 41853, written 30401604
- 0.00 reads/s, 0.00 creates/s, 1.75 writes/s
- Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
- Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
- LRU len: 24313, unzip_LRU len: 0
- I/O sum[45]:cur[0], unzip sum[0]:cur[0]
复制代码
最后通过zabbix或者cacti等监控来查看IO、CPU、MEMORY、磁盘等是否有异常
这样基本上就可以把问题找出来了
|