I have written a python program to run analyze parallel on postgres tables. This script will reduce lot of time compared to running analyze on database with analyze SQL command.

Progam

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
from multiprocessing import Pool, Value
from pg import DB
import re
import logging
import optparse

logging.basicConfig(format='%(asctime)s:%(levelname)s:%(message)s',level=logging.DEBUG)

parser = optparse.OptionParser()
parser.add_option("-d", "--database", dest = "database", action = "store", help = "Specify target database to analyze")
parser.add_option("-n", "--schema", dest = "schema", action = "store", help = "Specify schema to analyze")
parser.add_option("-p", "--parallel-processes", dest = "parallel", action = "store",default = 1,help = "Specify number of parallel-processes")
parser.add_option("--host", dest = "host", action = "store", default = 'localhost', help = "Specify the target host")
parser.add_option("--user-tables", dest = "usertables", action = "store_true", help = "Specify if you want to analyze only user table")

options, args = parser.parse_args()

# Getting command line options

if options.database:
    vDatabase = options.database
else:
    logging.error("database not supplied... exiting...")
    sys.exit()

con = DB(dbname='template1', host=options.host)
if vDatabase in con.get_databases():
    pass
else:
    logging.error("Database doesn't exists... exiting")
    sys.exit()
con.close()

vProcesses = int(options.parallel)
vHost = options.host
vSchema = options.schema

# Function to get list of table

def get_tables():
    db = DB(dbname = vDatabase, host = vHost)
    table_list = []
    if options.usertables:
        table_list = db.get_tables()
    else:
        table_list = db.get_tables('system')
    db.close()

    if vSchema:
        tables = []
        regex = "^" + vSchema + "\."
        for table in table_list:
            if re.match(regex, table, re.I):
                tables.append(table)
    else:
        tables = table_list
    return tables

counter = Value('i', 0)
total_tables = len(get_tables())

# Function to run analyze

def run_analyze(table):
    global counter
    db = DB(dbname = vDatabase, host = vHost)
    db.query('analyze %s' %table)
    with counter.get_lock():
        counter.value += 1
    if counter.value % 10 == 0 or counter.value == total_tables:
        logging.info(str(counter.value) + " tables completed out of " + str(total_tables) + " tables")
    db.close()

# For count
def init(args):
    ''' store the counter for later use '''
    global counter
    counter = args

# Forking new #n processes for run_analyze() Function

logging.info("Running analyze on " + str(total_tables) + " tables")
pool = Pool(initializer=init, initargs=(counter, ), processes=vProcesses)
pool.map(run_analyze, get_tables())

pool.close()  # worker processes will terminate when all work already assigned has completed.
pool.join()  # to wait for the worker processes to terminate.

Program help

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ ./analyzedb --help
Usage: analyzedb [options]

Options:
  -h, --help            show this help message and exit
  -d DATABASE, --database=DATABASE
                        Specify target database to analyze
  -n SCHEMA, --schema=SCHEMA
                        Specify schema to analyze
  -p PARALLEL, --parallel-processes=PARALLEL
                        Specify number of parallel-processes
  --host=HOST           Specify the target host
  --user-tables         Specify if you want to analyze only user table
  
 

-d, --database

This option is to specify the database to analyze. If database doesn’t exists in environment, this program exists immediately.

-n, --schema

This option is to specify the schema to analyze. We can specify only one schema at a time

-p, --parallel_processes

This option is to specify the number of parallel processes to run. The default value is 1.

--host

This option is to specify the target host. The default value is localhost

--user-tables

Specify this option if you want to analyze on user tables and not system tables. By default this program analyzes system tables.

Run log 1 (Complete database)

1
2
3
4
5
6
7
8
9
10
$ ./analyzedb -d jadhavy -p 10
2018-05-06 00:23:33,680:INFO:Running analyze on 76 tables
2018-05-06 00:23:33,799:INFO:10 tables completed out of 76 tables
2018-05-06 00:23:33,844:INFO:21 tables completed out of 76 tables
2018-05-06 00:23:33,886:INFO:30 tables completed out of 76 tables
2018-05-06 00:23:33,923:INFO:40 tables completed out of 76 tables
2018-05-06 00:23:33,972:INFO:50 tables completed out of 76 tables
2018-05-06 00:23:34,018:INFO:60 tables completed out of 76 tables
2018-05-06 00:23:34,062:INFO:70 tables completed out of 76 tables
2018-05-06 00:23:34,113:INFO:76 tables completed out of 76 tables

Run log 2 (Specific schema)

1
2
3
$ ./analyzedb -d jadhavy -p 10 -n test
2018-05-06 00:23:52,681:INFO:Running analyze on 6 tables
2018-05-06 00:23:52,755:INFO:6 tables completed out of 6 tables

Run log 3 (User tables only)

1
2
3
4
5
$ ./analyzedb -d jadhavy -p 10 --user-tables
2018-05-06 00:24:25,641:INFO:Running analyze on 14 tables
2018-05-06 00:24:25,737:INFO:10 tables completed out of 14 tables
2018-05-06 00:24:25,745:INFO:14 tables completed out of 14 tables
 

Download binary

You can download binary of this program from here.

If you have any suggestions or issue, let me know in comment box below :smile: