Since a Greenplum Database system is MPP, there are many things we can check when performance issues come up. Here are a few things a Greenplum Database administrator can check while troubleshooting.

  • If users are claiming a query is running longer than usual, check whether the query is waiting on locks or on a resource queue (see the example below).

  • Try to run the same query from the master instance and check whether it is really running longer. Sometimes it is the network between the master server and the application that adds time when sending the query result back to the application.
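
    • Example:

      A rough sketch of a lock check run from the master host. The database name here is just a placeholder, and column names differ between Greenplum versions (older releases use procpid/current_query instead of pid/query); if your version ships gp_toolkit, views such as gp_toolkit.gp_locks_on_resqueue are also handy for resource queue waits.

       $ psql -d mydb -c "
           -- sessions waiting on locks; NOT granted means the query is blocked
           SELECT a.pid, a.usename, l.locktype, l.mode, l.granted, a.query
           FROM   pg_locks l
           JOIN   pg_stat_activity a ON a.pid = l.pid
           WHERE  NOT l.granted;"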

  • Check for hardware issues by running the dcacheck utility if you are using an EMC DCA.

  • Check the load on the master and segment servers using the uptime utility. This is a quick way to check the load average. It does not display a lot of information, but it can give a high-level idea of where to take the investigation next.

    • Example:

      The result below displays load averages for the last 1 minute, 5 minutes, and 15 minutes respectively.

       $ uptime
        13:15  up  2:23, 2 users, load averages: 1.38 1.81 2.04
      

    Run uptime using gpssh on all hosts; if there is unexpected load on any server, do some hardware checks such as network and storage (see the example below).
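
    • Example:

      A quick sketch using gpssh to collect uptime from every host in one shot; the host file name is an assumption, so point it at your own file listing all master and segment hosts.

       # run uptime on every host listed in the host file
       $ gpssh -f hostfile_all 'uptime'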

Network:

  • Check for network issues using ethtool and ifconfig utilities.

    • Example:

      $ ethtool eth0
      Settings for eth0:
       Supported ports: [ TP ]
       Supported link modes:   10baseT/Half 10baseT/Full
                               100baseT/Half 100baseT/Full
                               1000baseT/Full
       Supported pause frame use: No
       Supports auto-negotiation: Yes
       Advertised link modes:  10baseT/Half 10baseT/Full
                               100baseT/Half 100baseT/Full
                               1000baseT/Full
       Advertised pause frame use: No
       Advertised auto-negotiation: Yes
       Speed: 10000Mb/s <===
       Duplex: Full <===
       Port: Twisted Pair
       PHYAD: 0
       Transceiver: internal
       Auto-negotiation: on
       MDI-X: off (auto)
       Supports Wake-on: umbg
       Wake-on: d
       Current message level: 0x00000007 (7)
           	       drv probe link
       Link detected: yes
      $
      $
      $ ifconfig eth0
      eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe9d:c7fc  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:9d:c7:fc  txqueuelen 1000  (Ethernet)
        RX packets 34686  bytes 32923726 (31.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0 <===
        TX packets 12508  bytes 6625616 (6.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0 <===
      

      In the ethtool output, check the duplex and speed: duplex should be Full and speed 10Gb/s (recommended for GPDB). In the ifconfig output, check whether the error and dropped packet counts are increasing by running it several times.

Hardware:

  • Check if any virtual drive cache policy has changed to ‘write through’ (see the example below).
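
    • Example:

      A hedged sketch for DCA-style hardware with an LSI MegaRAID controller; the MegaCli path and flags depend on what is installed on your servers, so treat this as an assumption and use your vendor's tool if it differs.

       # show the current cache policy of every virtual drive on every adapter
       [root@mdw ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -i 'cache policy'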

  • Check recent dmesg errors that can cause performance issues.

    • Example:

      [root@mdw ~]# dmesg | tail
      igb 0000:02:00.2: eth2: igb: eth2 NIC Link is Down
      bond1: making interface eth2 the new active one
      bond1: link status definitely down for interface eth2, disabling it
      bond1: now running without any active interface!
      igb 0000:02:00.2: eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
      igb 0000:02:00.1: eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
      bond1: link status definitely up for interface eth1, 1000 Mbps full duplex
      bond1: making interface eth1 the new active one
      bond1: first active interface up!
      bond1: link status definitely up for interface eth2, 1000 Mbps full duplex
    
  • Check virtual memory stats using vmstat 1. With the argument 1, it prints stats every second until you cancel it. Remember, the first line shows averages since boot rather than for the last second, and the stats are averaged across all CPUs. A minimal invocation is sketched after the column list below.

    • Columns to check:

    r: Number of processes running on a CPU or waiting for a turn. This provides a better signal than load averages for determining CPU saturation, as it does not include I/O.

    si,so: Swap-ins and swap-outs. If these are non-zero, you’re out of memory.

    us: User time on CPU.

    sy: System time on CPU.

    id: Idle time on CPU.
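
    • Example:

      A minimal sketch; the column values will of course differ on your system.

       # 1-second samples, 5 reports, then exit (the first report is the since-boot average)
       $ vmstat 1 5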

  • Check disk performance using iostat -xz 1

    • Example:

      [root@mdw ~]# iostat -xz 1
      Linux 2.6.32-642.15.1.el6.x86_64 (mdw.gphd.local)       01/08/2018      _x86_64_        (32 CPU)
    
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          	   0.40    0.00    0.18    0.00    0.00   99.42
    
      Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
      sdd               0.00     0.01    0.00    0.00     0.00     0.05   159.46     0.00    0.94    1.21    0.88   0.42   0.00
      sde               0.00     1.47    0.14    3.16    71.32   212.20    85.79     0.01    3.92   12.16    3.55   0.19   0.06
      sda               0.00     3.27    0.01    3.89     1.80    57.31    15.14     0.00    0.08    4.60    0.06   0.05   0.02
      sdc               0.00     0.00    0.00    0.00     0.00     0.00     8.77     0.00    1.41    1.41    0.00   1.41   0.00
      sdb               0.00     0.02    0.00    0.02     0.00     0.26    14.71     0.00    0.28    4.39    0.25   0.27   0.00
    
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.03    0.00    0.19    0.03    0.00   99.75
    
      Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.03    0.00    0.06    0.00    0.00   99.91
    
      Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
      sde               0.00     4.00    0.00   27.00     0.00  1624.00    60.15     0.01    0.30    0.00    0.30   0.04   0.10
    
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.00    0.00    0.09    0.00    0.00   99.91
    
      Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
      sda               0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    3.00    0.00    3.00   3.00   0.30
    
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.06    0.00    0.06    0.00    0.00   99.87
    
      Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.09    0.00    0.13    0.00    0.00   99.78
    
      Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
      sda               0.00    95.00    0.00  137.00     0.00  1856.00    13.55     0.01    0.07    0.00    0.07   0.07   0.90
    
    • Columns to check:

    r/s, w/s, rsec/s, wsec/s: These are the reads and writes issued to the device per second (newer sysstat versions report rkB/s and wkB/s instead of sectors).

    await: The average time for the I/O in milliseconds. A larger-than-expected value can be an indicator of a device problem.

    %util: Displays the device utilization.

  • Check free -m for used and free memory (see the example below).
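
    • Example:

      To compare memory on all hosts at once you can wrap it in gpssh; the host file name is an assumption, as before.

       # free and used memory in MB on every host
       $ gpssh -f hostfile_all 'free -m'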

  • Check the system activity report using sar 1, which displays stats for every second. This is my favorite.

    • Example:

      [root@mdw ~]# sar 1
      Linux 2.6.32-642.15.1.el6.x86_64 (mdw.gphd.local)       01/08/2018      _x86_64_        (32 CPU)
    
      05:48:28 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
      05:48:29 AM     all      0.03      0.00      0.34      0.00      0.00     99.62
      05:48:30 AM     all      1.06      0.00      0.53      0.00      0.00     98.41
      05:48:31 AM     all      0.41      0.00      0.34      0.00      0.00     99.25
      05:48:32 AM     all      0.03      0.00      0.34      0.00      0.00     99.62
      05:48:33 AM     all      0.16      0.00      0.38      0.00      0.00     99.47
      05:48:34 AM     all      1.31      0.00      0.50      0.00      0.00     98.19
      05:48:35 AM     all      0.16      0.00      0.94      0.00      0.00     98.90
    
    • Columns to check: