Creating an OpenShift Cluster with Terraform
Overview
There are many examples of how to create an OpenShift cluster in AWS. Most of them use CloudFormation to orchestrate the infrastructure and deploy the cluster. This post walks through how to do it with Terraform.
Note: The code displayed uses Terraform 0.12.x, but the concepts should still apply to Terraform 0.11.x.
This example will create a number of items outside of OpenShift that are the basics required to get up and running in an AWS VPC, specifically:
- VPC
- Client VPN endpoints
- Certificates for the endpoints
If these items are not required, you can skip the vpc module and continue on to the openshift module.
Terraform Deployment
The full code for this post can be found here. There are two branches of note: master, which contains an OpenShift deployment using the aws-iam-authenticator for AutoScaling (more on that later), and feature/terraform-standard-install. For now we will focus on the latter. To get started, download the code and check out the right branch:
git clone https://gitlab.com/kjanania/openshift-terraform.git
cd openshift-terraform
git checkout feature/terraform-standard-install
You can check the README for the full list of steps and some convenience scripts.
Setting Up Prerequisites
Terraform maintains a state file to keep track of what resources have been created and to determine which ones need to change to reach the desired state. This state file can be stored locally on the file system or remotely in locations such as Amazon S3. This demo stores the state file in Amazon S3 and uses DynamoDB for state locking.
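For reference, the backend block in the vpc module presumably looks something like the following (the key and region shown here are illustrative; the bucket name is deliberately omitted because it is passed in at init time, and the openshift module points at the terraform-openshift-lock table instead):
# Rough sketch of an S3 backend with DynamoDB state locking.
# The bucket is supplied later via -backend-config.
terraform {
  backend "s3" {
    key            = "vpc/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-vpc-lock"
  }
}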
We’ll start by creating the S3 bucket and DynamoDB tables:
aws dynamodb create-table --table-name terraform-vpc-lock --attribute-definitions AttributeName=LockID,AttributeType=S --key-schema AttributeName=LockID,KeyType=HASH --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
aws dynamodb create-table --table-name terraform-openshift-lock --attribute-definitions AttributeName=LockID,AttributeType=S --key-schema AttributeName=LockID,KeyType=HASH --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
aws s3 mb s3://<insert s3 bucket name here>
Next, we’ll need to upload certificates for use with the VPN. Certificates are provided in the repo for convenience, but they are not secure and are for testing purposes only! Create your own certificates as described in this document.
Upload the provided certificates or ones you have generated on your own:
SERVER_ARN=$(aws acm import-certificate --certificate file://terraform/vpc/certificates/server.crt --certificate-chain file://terraform/vpc/certificates/ca.crt --private-key file://terraform/vpc/certificates/server.key | jq -r '.CertificateArn')
aws acm add-tags-to-certificate --certificate-arn $SERVER_ARN --tags Key=Name,Value=vpn-server-cert
CLIENT_ARN=$(aws acm import-certificate --certificate file://terraform/vpc/certificates/client1.domain.tld.crt --certificate-chain file://terraform/vpc/certificates/ca.crt --private-key file://terraform/vpc/certificates/client1.domain.tld.key | jq -r '.CertificateArn')
aws acm add-tags-to-certificate --certificate-arn $CLIENT_ARN --tags Key=Name,Value=vpn-client-cert
The tags are important: they are what the Terraform code uses to look up the certificates and attach them to the Client VPN endpoints.
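I won’t reproduce the module’s exact lookup here, but one way to find a certificate by its Name tag in Terraform is the resourcegroupstaggingapi data source, which mirrors the CLI cleanup commands at the end of this post (it needs a fairly recent AWS provider, and the repo may do this differently):
# Hypothetical sketch: look up the server certificate by its Name tag.
data "aws_resourcegroupstaggingapi_resources" "vpn_server_cert" {
  resource_type_filters = ["acm:certificate"]

  tag_filter {
    key    = "Name"
    values = ["vpn-server-cert"]
  }
}

locals {
  vpn_server_cert_arn = [
    for m in data.aws_resourcegroupstaggingapi_resources.vpn_server_cert.resource_tag_mapping_list : m.resource_arn
  ][0]
}

# local.vpn_server_cert_arn can then be used as the server_certificate_arn of
# the aws_ec2_client_vpn_endpoint resource.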
If you do not have a key pair for EC2 instances, go ahead and create one now.
We can go ahead and initialize both the vpc module and the openshift module:
cd terraform/vpc
terraform init -backend-config="bucket=<insert s3 bucket name here>"
cd ../openshift
terraform init -backend-config="bucket=<insert s3 bucket name here>"
After this, we’ll need to create a .tfvars file to populate these three variables:
openshift-cluster-name = "my-cluster"
ec2-key-location = "~/.ssh/my-ec2-key.pem"
ec2-key-name = "my-ec2-key"
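For context, these variables are presumably declared in the modules along these lines (the descriptions are mine, not the repo’s):
# Illustrative declarations for the three required variables.
variable "openshift-cluster-name" {
  description = "Name of the cluster; also used in the kubernetes.io/cluster/<name> tags"
  type        = string
}

variable "ec2-key-location" {
  description = "Local path to the private key used to SSH into the instances"
  type        = string
}

variable "ec2-key-name" {
  description = "Name of the EC2 key pair assigned to the instances"
  type        = string
}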
Creating the VPC
Once we’ve created what we need for saving the state file to a backend, let’s create the VPC.
If you already have a VPC, you can skip this step, but you’ll need to add the tag
kubernetes.io/cluster/my-cluster = shared
to these existing resources (one way to apply the tag is sketched after the list):
- the VPC in which the cluster will be deployed
- the Subnets in which the cluster AutoScaling Groups will create instances
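If those resources live outside Terraform, one option is the aws_ec2_tag resource, which manages a single tag on an existing resource (it requires a reasonably recent AWS provider; tagging in the console or with aws ec2 create-tags works just as well). The IDs below are placeholders:
# Hypothetical sketch: attach the shared-cluster tag to a pre-existing VPC and
# subnet that Terraform does not otherwise manage.
resource "aws_ec2_tag" "cluster_vpc" {
  resource_id = "vpc-0123456789abcdef0"
  key         = "kubernetes.io/cluster/my-cluster"
  value       = "shared"
}

resource "aws_ec2_tag" "cluster_subnet" {
  resource_id = "subnet-0123456789abcdef0"
  key         = "kubernetes.io/cluster/my-cluster"
  value       = "shared"
}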
Run the following command to apply the Terraform plan:
terraform apply -var-file=my-vars.tfvars
This will create a VPC which you can connect to over VPN. If you’re on Fedora or a similar operating system, you can use the OpenVPN CLI to connect:
sudo openvpn --config client.config --cert certificates/client1.domain.tld.crt --key certificates/client1.domain.tld.key
Otherwise, connect using your preferred compatible client.
Creating the OpenShift Cluster
After you’ve connected over VPN (you need that connection to start the install on the bastion), we can start deploying the OpenShift cluster. Move to the openshift directory and start applying the Terraform plan:
terraform apply -var-file=my-vars.tfvars
This will create all the infrastructure required to create an OpenShift cluster:
- 3 t2.medium master-infra nodes
- 3 t2.medium compute nodes
- 1 t2.medium bastion node
Once the infrastructure is created, the Terraform plan will wait for the nodes to become ready using the trick described in this post.
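The details are in the linked post, but the general shape of that trick is a null_resource with a remote-exec provisioner that simply blocks until each instance answers over SSH. A minimal sketch, assuming counted aws_instance resources and the variables from earlier (the resource names are mine, not the repo’s):
# Hypothetical sketch: block until each master answers over SSH, hopping
# through the bastion since the nodes live in private subnets.
resource "null_resource" "wait_for_masters" {
  count = length(aws_instance.master)

  provisioner "remote-exec" {
    inline = ["echo 'node is up'"]

    connection {
      type         = "ssh"
      user         = "ec2-user"
      host         = aws_instance.master[count.index].private_ip
      private_key  = file(var.ec2-key-location)
      bastion_host = aws_instance.bastion.public_ip
    }
  }
}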
The deployment will run from the bastion node using Ansible. Since we’re using smaller hardware, some of the minimum requirement checks are disabled. If you update the variables to use larger nodes, you can re-enable those checks by commenting out these lines in the inventory file:
openshift_disable_check:
- memory_availability
Under the Hood
So what’s going on during the deployment? To make things easier to troubleshoot and manage, the deployment is broken up into several phases:
- Infrastructure deployment
  - Bastion
  - Masters
  - Workers
- OpenShift deployment
  - Wait for nodes to become ready
  - Prepare Ansible inventory file
  - Run OpenShift cluster install
During each infrastructure phase, the corresponding IAM roles, instance profiles, and security groups are created as well. Each component is broken out into its own module, and so is the cluster install. During the cluster install, an Ansible inventory file is dynamically generated with some reasonable defaults: the addresses of the nodes are fed into a template, and the rendered inventory drives the cluster installation.
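A minimal sketch of that inventory generation, assuming a templatefile-based approach (the actual template and variable names in the repo will differ):
# Hypothetical sketch: render the Ansible inventory from a template using the
# node addresses, then write it to disk so it can be copied to the bastion.
locals {
  inventory = templatefile("${path.module}/templates/inventory.tpl", {
    cluster_name  = var.openshift-cluster-name
    master_hosts  = aws_instance.master[*].private_dns
    compute_hosts = aws_instance.compute[*].private_dns
  })
}

resource "local_file" "inventory" {
  content  = local.inventory
  filename = "${path.module}/rendered/inventory.cfg"
}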
It Works!
After the Terraform plan is complete, you should be able to SSH to one of the master nodes and begin using the cluster:
[ec2-user@ip-10-0-1-8 ~]$ oc get nodes
NAME                         STATUS    ROLES          AGE       VERSION
ip-10-0-1-8.ec2.internal     Ready     infra,master   12m       v1.11.0+d4cacc0
ip-10-0-2-39.ec2.internal    Ready     compute        9m        v1.11.0+d4cacc0
ip-10-0-3-136.ec2.internal   Ready     infra,master   12m       v1.11.0+d4cacc0
ip-10-0-3-236.ec2.internal   Ready     compute        9m        v1.11.0+d4cacc0
ip-10-0-4-215.ec2.internal   Ready     compute        9m        v1.11.0+d4cacc0
ip-10-0-4-54.ec2.internal    Ready     infra,master   12m       v1.11.0+d4cacc0
[ec2-user@ip-10-0-1-8 ~]$ oc new-app https://github.com/openshift/ruby-hello-world.git
--> Found Docker image e42d0dc (16 months old) from Docker Hub for "centos/ruby-22-centos7"
Ruby 2.2
--------
Ruby 2.2 available as container is a base platform for building and running various Ruby 2.2 applications and frameworks. Ruby is the interpreted scripting language for quick and easy object-oriented programming. It has many features to process text files and to do system management tasks (as in Perl). It is simple, straight-forward, and extensible.
Tags: builder, ruby, ruby22
* An image stream tag will be created as "ruby-22-centos7:latest" that will track the source image
* A Docker build using source code from https://github.com/openshift/ruby-hello-world.git will be created
* The resulting image will be pushed to image stream tag "ruby-hello-world:latest"
* Every time "ruby-22-centos7:latest" changes a new build will be triggered
* This image will be deployed in deployment config "ruby-hello-world"
* Port 8080/tcp will be load balanced by service "ruby-hello-world"
* Other containers can access this service through the hostname "ruby-hello-world"
--> Creating resources ...
imagestream.image.openshift.io "ruby-22-centos7" created
imagestream.image.openshift.io "ruby-hello-world" created
buildconfig.build.openshift.io "ruby-hello-world" created
deploymentconfig.apps.openshift.io "ruby-hello-world" created
service "ruby-hello-world" created
--> Success
Build scheduled, use 'oc logs -f bc/ruby-hello-world' to track its progress.
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/ruby-hello-world'
Run 'oc status' to view your app.
Cleaning it Up
When it comes time to clean everything up, we’ll basically need to run the process in reverse. If you do not disconnect from your VPN session before destroying the vpc module, the destroy may get stuck, requiring that you manually fix the state lock (for example with terraform force-unlock).
First we’ll clean up the OpenShift cluster:
# CWD is <repo path>/terraform/openshift
terraform destroy -var-file=my-vars.tfvars
Next, we’ll disconnect from the VPN:
Thu Oct 10 15:11:50 2019 /sbin/ip route add 0.0.0.0/1 via 10.0.44.161
Thu Oct 10 15:11:50 2019 /sbin/ip route add 128.0.0.0/1 via 10.0.44.161
Thu Oct 10 15:11:50 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Thu Oct 10 15:11:50 2019 Initialization Sequence Completed
# Hit Ctrl+C to stop the connection
^CThu Oct 10 22:30:42 2019 event_wait : Interrupted system call (code=4)
Thu Oct 10 22:30:42 2019 /sbin/ip route del 18.211.133.7/32
Thu Oct 10 22:30:42 2019 /sbin/ip route del 0.0.0.0/1
Thu Oct 10 22:30:42 2019 /sbin/ip route del 128.0.0.0/1
Thu Oct 10 22:30:42 2019 Closing TUN/TAP interface
Thu Oct 10 22:30:42 2019 /sbin/ip addr del dev tun0 10.0.44.162/27
Thu Oct 10 22:30:42 2019 SIGINT[hard,] received, process exiting
Then we’ll destroy the vpc module:
# CWD is <repo path>/terraform/vpc
terraform destroy -var-file=my-vars.tfvars
And finally, we’ll clean up our state backends, locks, and certificates:
# Backends
aws s3 rb --force s3://<insert s3 bucket name here>
aws dynamodb delete-table --table-name terraform-vpc-lock
aws dynamodb delete-table --table-name terraform-openshift-lock
# Certificates
SERVER_ARN=$(aws resourcegroupstaggingapi get-resources --tag-filters Key=Name,Values=vpn-server-cert --resource-type-filters acm:certificate | jq -r '.ResourceTagMappingList[0].ResourceARN')
CLIENT_ARN=$(aws resourcegroupstaggingapi get-resources --tag-filters Key=Name,Values=vpn-client-cert --resource-type-filters acm:certificate | jq -r '.ResourceTagMappingList[0].ResourceARN')
aws acm delete-certificate --certificate-arn $SERVER_ARN
aws acm delete-certificate --certificate-arn $CLIENT_ARN
And that’s that!