I just created external table in greenplum using Amazon S3 web service. So here I’m writing how I did it.

First we have to create two function to write to S3 and to read from S3 like below:

Creating Functions and Protocol

Functions

1
2
3
4
CREATE OR REPLACE FUNCTION read_from_s3()
RETURNS integer
AS  '$libdir/gps3ext.so', 's3_import'
LANGUAGE C STABLE;
1
2
3
4
CREATE OR REPLACE FUNCTION read_from_s3()
RETURNS integer
AS '$libdir/gps3ext.so', 's3_import'
LANGUAGE C STABLE;

Then create protocol to access S3 from greenplum database.

Protocol

1
2
CREATE PROTOCOL s3
(writefunc = write_to_s3, readfunc = read_from_s3);

Creating Configuration File

You can create sample configuration file using gpcloudcheck utility like below and then you can edit it as per your setting.

1
gpcheckcloud -t > s3_config.conf
Configuration file Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
  secret = "Your AWS Secret ID"
  accessid = "Your AWS Access ID"
  threadnum = 4
  chunksize = 67108864
  low_speed_limit = 10240
  low_speed_time = 60
  encryption = true
  version = 1
  proxy = ""
  autocompress = true
  verifycert = true
  server_side_encryption = ""
  # gpcheckcloud config
  gpcheckcloud_newline = "\n"
  

Creating S3 bucket on AWS

  • You can follow this guide to to create S3 bucket.

Testing configuration using gpcheckcloud utility.

You can use command line options like below:

1
gpcheckcload -c "s3://S3_region_endpoint/bucket_name/[prefix] config=/path/to/config_file"

You can check your region endpoint here

Example

1
2
3
4
5
6
$ gpcheckcloud -c "s3://s3-ap-south-1.amazonaws.com/greenplumext config=/home/gpadmin/s3.conf"
File: Screen Shot 2017-01-23 at 1.14.21 AM.png, Size: 110279
File: sample_1.csv, Size: 43
File: sample_2.csv, Size: 43

Your configuration works well.

Once your configuration works, You can create external table with S3 location and start access S3 files for GPDB.

Creating and accessing S3 external tables

Example

1
2
3
CREATE EXTERNAL TABLE name_external(Firstname text, Lastname text)
LOCATION ('s3://s3-ap-south-1.amazonaws.com/greenplumext/ config=/home/gpadmin/s3_root_v1.conf')
FORMAT 'csv';
1
2
3
4
5
6
7
SELECT * FROM name_external
WHERE firstname='Yogesh';                                                                                                

firstname | lastname
-----------+----------
 Yogesh    | Jadhav