I just created external table in greenplum using Amazon S3 web service. So here I’m writing how I did it.
First we have to create two function to write to S3 and to read from S3 like below:
Creating Functions and Protocol
Functions
1
2
3
4
CREATE OR REPLACE FUNCTION read_from_s3 ()
RETURNS integer
AS '$libdir/gps3ext.so' , 's3_import'
LANGUAGE C STABLE ;
1
2
3
4
CREATE OR REPLACE FUNCTION read_from_s3 ()
RETURNS integer
AS '$libdir/gps3ext.so' , 's3_import'
LANGUAGE C STABLE ;
Then create protocol to access S3 from greenplum database.
Protocol
1
2
CREATE PROTOCOL s3
( writefunc = write_to_s3 , readfunc = read_from_s3 );
Creating Configuration File
You can create sample configuration file using gpcloudcheck utility like below and then you can edit it as per your setting.
1
gpcheckcloud -t > s3_config.conf
Configuration file Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
secret = "Your AWS Secret ID"
accessid = "Your AWS Access ID"
threadnum = 4
chunksize = 67108864
low_speed_limit = 10240
low_speed_time = 60
encryption = true
version = 1
proxy = ""
autocompress = true
verifycert = true
server_side_encryption = ""
# gpcheckcloud config
gpcheckcloud_newline = " \n "
Creating S3 bucket on AWS
You can follow this guide to to create S3 bucket.
Testing configuration using gpcheckcloud utility.
You can use command line options like below:
1
gpcheckcload -c "s3://S3_region_endpoint/bucket_name/[prefix] config=/path/to/config_file"
You can check your region endpoint here
Example
1
2
3
4
5
6
$ gpcheckcloud -c "s3://s3-ap-south-1.amazonaws.com/greenplumext config=/home/gpadmin/s3.conf"
File: Screen Shot 2017-01-23 at 1.14.21 AM.png, Size: 110279
File: sample_1.csv, Size: 43
File: sample_2.csv, Size: 43
Your configuration works well.
Once your configuration works, You can create external table with S3 location and start access S3 files for GPDB.
Creating and accessing S3 external tables
Example
1
2
3
CREATE EXTERNAL TABLE name_external ( Firstname text , Lastname text )
LOCATION ( 's3://s3-ap-south-1.amazonaws.com/greenplumext/ config=/home/gpadmin/s3_root_v1.conf' )
FORMAT 'csv' ;
1
2
3
4
5
6
7
SELECT * FROM name_external
WHERE firstname = 'Yogesh' ;
firstname | lastname
-----------+----------
Yogesh | Jadhav
Important Links