Quickly set up a Greenplum environment on GCP

By Marcelo Costa

gcp_dbms_dev (4 Part Series)

1) Quickly set up an Oracle environment on GCP
2) Quickly set up a Greenplum environment on GCP
3) Quickly set up a PostgreSQL environment on GCP
4) Quickly set up a Hive environment on GCP

This quick-start guide is part of a series that shows how to set up relational databases on Google Cloud Platform for development and testing purposes.

This guide shows you how to create a Greenplum environment running inside your Google Cloud project.

Create a Compute Engine VM

Using Cloud Shell:

# Create the Greenplum GCE instance
gcloud compute instances create greenplum \
  --zone=us-central1-c \
  --machine-type=n1-standard-1 \
  --image-project=debian-cloud --boot-disk-size=10GB \
  --image=debian-9-stretch-v20190916 \
  --boot-disk-type=pd-standard \
  --boot-disk-device-name=greenplum \
  --scopes=cloud-platform
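
Instance creation takes a little while, so it can help to confirm that the VM has reached the RUNNING state before connecting to it. A minimal sketch, assuming the instance name and zone used in the command above; `instance_status` is a hypothetical helper name, not part of gcloud:

```shell
# Hypothetical helper: print the lifecycle status of a Compute Engine instance.
# The defaults match the instance name and zone used in this guide.
# Usage: instance_status [instance-name] [zone]
instance_status() {
  is_name="${1:-greenplum}"
  is_zone="${2:-us-central1-c}"
  gcloud compute instances describe "$is_name" \
    --zone="$is_zone" \
    --format='value(status)'
}
```

Running `instance_status` in Cloud Shell should print `RUNNING` once the VM has booted; other values such as `PROVISIONING` or `STAGING` mean it is still starting.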

Configure your VM with Greenplum

Using Cloud Shell:

# Connect to the greenplum VM
gcloud compute ssh --zone=us-central1-c greenplum

# Switch to the root user
sudo -s

# Install Docker
curl -sSL https://get.docker.com/ | sh

# Install Git
apt-get install git

# Install the PostgreSQL client
apt-get install postgresql-client

# Clone the official Greenplum repo
git clone https://github.com/greenplum-db/gpdb

# Go to the docker directory
cd gpdb/src/tools/docker/ubuntu16_ppa-persistent

# Build and run
docker build -t local/gpdb .
mkdir -p /tmp/gpdata/
docker run -d -p 5432:5432 -h dwgpdb -v /tmp/gpdata:/data local/gpdb
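
The container returns immediately from `docker run`, but the database inside it needs some time to initialise. Rather than watching the logs by hand, the wait can be scripted. A rough sketch, assuming the startup log eventually contains the text `GPDB starting process has completed`; `wait_for_gpdb` and its default retry count and delay are hypothetical choices for illustration:

```shell
# Hypothetical helper: poll a container's logs until the GPDB startup
# message appears, or give up after a number of attempts.
# Usage: wait_for_gpdb <container-id> [max-attempts] [seconds-between-attempts]
wait_for_gpdb() {
  wfg_container="$1"
  wfg_attempts="${2:-60}"
  wfg_delay="${3:-10}"
  wfg_i=0
  while [ "$wfg_i" -lt "$wfg_attempts" ]; do
    # The container's stdout and stderr are both merged before searching.
    if docker logs "$wfg_container" 2>&1 |
        grep -q "GPDB starting process has completed"; then
      echo "GPDB is up"
      return 0
    fi
    wfg_i=$((wfg_i + 1))
    sleep "$wfg_delay"
  done
  echo "Timed out waiting for GPDB" >&2
  return 1
}
```

For example, `wait_for_gpdb <GPDB_CONTAINER_ID>` checks every 10 seconds for up to 10 minutes and returns a non-zero exit code if the message never shows up.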

Load your Greenplum database with data

Using Cloud Shell:

# Verify GPDB is installed and started up successfully
docker ps -a
docker logs --follow <GPDB_CONTAINER_ID>

# Wait for the message to appear:
# ===> GPDB starting process has completed, check the result above   
# or try to connect
# Stop following the logs by pressing CTRL + C

# Log into the GPDB container
docker ps
docker exec -it <GPDB_CONTAINER_ID> bash

# Switch to the gpadmin user
su gpadmin

# Verify that the GPDB instance is running
gpstate

# Create a DB called messages_db
createdb messages_db

# Log into the new DB
psql messages_db

-- Create the tables and populate them with data
CREATE TABLE Users (uid INTEGER PRIMARY KEY,
                    name VARCHAR);
INSERT INTO Users
  SELECT generate_series, random()
  FROM generate_series(1, 100000);

CREATE TABLE Messages (mid INTEGER PRIMARY KEY,
 uid INTEGER REFERENCES Users(uid),
 ptime DATE, message VARCHAR);

INSERT INTO Messages
   SELECT generate_series,
          -- pick a uid between 1 and 100000 so it matches a row in Users
          1 + floor(random() * 100000)::int,
          date(now() - '1 hour'::INTERVAL * round(random()*24*30)),
          random()::text
   FROM generate_series(1, 100000);
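
To check that the load worked, a few queries against the two tables are enough. The sketch below wraps them in a shell function so they can be piped into `psql`; `sanity_check_sql` is a hypothetical name, and the queries assume the `Users` and `Messages` tables created above:

```shell
# Hypothetical helper: print SQL that sanity-checks the sample data.
sanity_check_sql() {
  cat <<'SQL'
-- Both tables should contain 100000 rows.
SELECT count(*) AS user_count FROM Users;
SELECT count(*) AS message_count FROM Messages;

-- Five users with the most messages.
SELECT u.uid, count(*) AS message_count
FROM Users u
JOIN Messages m ON m.uid = u.uid
GROUP BY u.uid
ORDER BY message_count DESC
LIMIT 5;
SQL
}
```

After leaving the `psql` prompt with `\q`, running `sanity_check_sql | psql messages_db` as the gpadmin user inside the container should print the two row counts followed by the top five users.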

And that's it!

If you run into difficulties, don’t hesitate to reach out. I would love to help you!

Posted on Jan 2 by Marcelo Costa (@mesmacosta), senior software engineer and Google Cloud certified architect and data engineer at ciandt.com