NGINX in TKG Cluster in vSphere Workload Management – Road to Kubeflow in vSphere

Welcome to the third part of this series. The goal of the series is to finally deploy Kubeflow in vSphere; this post covers just one of the steps on the way there.

Don’t forget to catch up if you missed how we set up Workload Management (the supervisor cluster), or check out this brief differentiation between a TKG Cluster and the Workload Management supervisor cluster.

What you’re going to learn

We’ll set up a TKG cluster within vSphere 7, inside the supervisor cluster created by Workload Management. For this to work we need to create a content library, authenticate with the kubectl vSphere plugin, and use TKG to obtain all necessary information. Finally, we’ll deploy NGINX. I’ll also give you an example of how to get past the Pod Security Policy that is enabled on TKG Clusters.

Necessary for completion:

For a successful NGINX Deployment you’ll need the vSphere admin account; the admin role alone is NOT enough. Strictly speaking the account isn’t required for CREATING the TKG cluster, but without it you can’t access some crucial parameters and would have to guess them. Once you have obtained this information, each configured user can later create TKG clusters on their own.

Additionally, make sure to download this TKG CLI binary. This is also technically not necessary, but it makes getting at the required information much easier.

Breaking down this tutorial

All steps for your TKG Cluster and the NGINX Deployment:

  1. Subscribe the content library from VMware (for obtaining OVF Templates)
  2. Create a namespace, where you want to deploy your TKG Cluster
  3. Create a TKG Cluster
  4. Create NGINX Deployment

1. Subscribing the content library from VMware

In the first post I explained that the TKG Cluster is realized as VMs, i.e. the Kubernetes control plane and worker nodes are VMs. Therefore we need OVF templates so TKG can set up these VMs. This step is only needed if you want to use TKG; otherwise you can skip it.

  1. Connect to vSphere Client and select Content Libraries via the Menu
setting up the content library step 1
  2. Create a new Content Library
setting up the content library step 2
  3. Choose a name and the vCenter Server on which to store the library, then click Next
setting up the content library step 3
  4. Subscribe to the following URL:
https://wp-content.vmware.com/v2/latest/lib.json

Make sure you select “immediately”; otherwise TKG won’t find the OVF templates, and Kubernetes will throw an error when you apply the TKG Cluster .yaml file!

setting up the content library step 4
  5. Ignore this warning. The URL is from VMware, and checking it in a browser shows no certificate problems. Click Yes!
setting up the content library step 5
  6. Select the Storage Location and continue to the last configuration step.
setting up the content library step 6
  7. One last check. Pressing FINISH creates the Content Library.
setting up the content library step 7
  8. Choose the cluster in which your Workload Management is configured. Under Configuration > Namespaces > General you’ll set the Content Library; this is only needed for TKG.
setting up the content library step 8
  9. The last step! Select your freshly created Content Library and finish with OK.
setting up the content library step 9

2. Creating a namespace

A namespace in Workload Management is not exactly the same as a namespace in Kubernetes. You can’t create namespaces directly in the supervisor cluster; Workload Management doesn’t care about your admin privileges and will simply deny the request. Creating a namespace in Workload Management, however, does create a corresponding namespace in the supervisor cluster.
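If you want to see this for yourself later (once kubectl is connected to the supervisor cluster, which we do in step 3 of this post), a quick, hypothetical check looks like this:

# Hypothetical check: with kubectl pointed at the supervisor cluster context,
# creating a namespace directly is denied, no matter which privileges you hold.
kubectl create namespace direct-test
# Expect a denied/forbidden style response instead of "namespace/direct-test created".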

  1. Select Workload Management in the vSphere Client via the Menu
create namespace step 1
  2. Select Namespaces, followed by Create Namespace
create namespace step 2
  3. Choose the ESXi cluster in which you deployed Workload Management, enter a namespace name and remember it
create namespace step 3
  4. Add a Permission so someone can deploy into this namespace. This step can be skipped if you connect to the cluster as a vsphere.local Administrator.
create namespace step 4
  5. Select an identity source, e.g. your domain. The identity source and the username are required later when connecting to the cluster!
create namespace step 5
  6. Finally, configure the Storage
create namespace step 6
  7. Provide any VM Storage Policy you may already have configured. I’ve chosen the Storage Policy that is used for Workload Management.
create namespace step 7

Congratulations, the namespace is created and configured.

3. Create the TKG Cluster

Now we need the TKG CLI: download it to your work machine and make sure it’s executable.
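As a rough sketch (assuming a Linux workstation; the file name below is a placeholder for whatever you actually downloaded), making the CLI usable could look like this:

# Hypothetical install of the downloaded TKG CLI binary on Linux;
# replace the file name with the one you actually downloaded.
chmod +x tkg-linux-amd64-v1.x
sudo mv tkg-linux-amd64-v1.x /usr/local/bin/tkg
# quick sanity check (assuming the CLI exposes a version command)
tkg version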

kubectl and the corresponding kubectl-vsphere plugin will be downloaded in the next few steps, as they’re given to us conveniently by Workload Management.

kubectl communicates with any cluster – based on information stored in the following file:

~/.kube/config

VMware provides a cli binary: kubectl-vsphere. This binary checks authentication and, based on authorization, configures your ~/.kube/config file.

After that it’s mostly regular kubectl. Only for connecting to the TKG cluster will we need the kubectl-vsphere binary again.
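If you’re curious what the plugin writes there, plain kubectl can show you the contexts it adds:

# List all contexts stored in ~/.kube/config (including those added by kubectl-vsphere)
kubectl config get-contexts
# Show which context kubectl is currently talking to
kubectl config current-context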

Screenshots are sparse from here on, since the console commands have to be pasted anyway.

  1. We left off at the namespace view (if you can’t find it again, don’t worry: Menu > Workload Management > Namespace > <your namespace>)
create tkg cluster step 1
  2. Make sure to remember the IP of the website that you’re currently accessing (it’s the IP of the supervisor cluster)
create tkg cluster step 2
  3. You could download the binaries manually; I prefer using wget and writing my own scripts, so I copy the download link.
create tkg cluster step 3
  4. Unpack the archive, move the binaries to your binary location and remove the leftovers
create tkg cluster step 4

Remember to change the IP to that of your supervisor cluster.

{
  # download the CLI tools bundle from the supervisor cluster
  # (add --no-check-certificate if your machine doesn't trust the certificate)
  wget https://<IP-of-supervisor-cluster>/wcp/plugin/linux-amd64/vsphere-plugin.zip
  unzip vsphere-plugin.zip
  # move kubectl and the kubectl-vsphere plugin into your PATH
  sudo mv bin/* /usr/local/bin
  # clean up
  rm -r bin
  rm vsphere-plugin.zip
}
  5. Now use the vSphere plugin and connect as the vSphere administrator (the command is sketched below).
create tkg cluster step 5
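A sketch of the login command; the SSO account is an assumption, use whatever administrator account fits your environment, and add --insecure-skip-tls-verify only if your machine doesn’t trust the supervisor cluster’s certificate.

# Log in to the supervisor cluster as the vSphere administrator (example values)
kubectl vsphere login --server=<IP-of-supervisor-cluster> \
  --vsphere-username administrator@vsphere.local \
  --insecure-skip-tls-verify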
  6. The next commands are all done on the supervisor cluster.
kubectl config use-context <supervisor-ip> #10.4.10.1
  7. Now configure the TKG binary
tkg add mc <supervisor-ip> #10.4.10.1
tkg set mc <supervisor-ip> #10.4.10.1
  8. Obtain the configuration parameters. We need the following:

  • storageclass (where we can store what)
  • virtualmachineclasses (how many resources to allocate to the worker and master nodes of the TKG cluster)
  • virtualmachineimage (which image to use to deploy the worker and master nodes)

Save the circled information!

create tkg cluster get storageclass
create tkg cluster get virtualmachineimages
create tkg cluster get virtual machine classes with extra information
kubectl get storageclass
kubectl get virtualmachineimages
kubectl get virtualmachineclasses
kubectl get virtualmachineclasses -o jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.spec.hardware.cpus}{'\t'}{.spec.hardware.memory}{'\n'}{end}"
  9. Now create some environment variables. The SERVICE_DOMAIN must be cluster.local or Kubeflow will run into trouble later! Replace the storage classes with the ones you found. The service/cluster CIDRs are internal to the TKG Cluster; choose whatever you like.
export CONTROL_PLANE_STORAGE_CLASS=gold-tanzu-kubernetes-storage-policy
export WORKER_STORAGE_CLASS=gold-tanzu-kubernetes-storage-policy
export DEFAULT_STORAGE_CLASS=gold-tanzu-kubernetes-storage-policy
export STORAGE_CLASSES=
export SERVICE_DOMAIN=cluster.local #don't change this! kubeflow needs it.
export CONTROL_PLANE_VM_CLASS=guaranteed-medium
export WORKER_VM_CLASS=guaranteed-xlarge
export SERVICE_CIDR=100.64.0.0/13
export CLUSTER_CIDR=100.96.0.0/11
  10. Create the TKG cluster after you’ve exported all these variables.
    This command generates a .yaml file and saves it as tkg-cluster-creation-small.yaml (a sketch of its contents follows after this list). Exchange your cluster name (kubeflow-tkg), the namespace (kubeflow-ns) and the kubernetes-version (v1.17.8+vmware.1-tkg.1.5417466). You can find the Kubernetes version with the command from step 8 (kubectl get virtualmachineimages).
tkg config cluster kubeflow-tkg --plan=dev \
--namespace=kubeflow-ns \
--kubernetes-version=v1.17.8+vmware.1-tkg.1.5417466 \
--controlplane-machine-count=1 \
--worker-machine-count=3 > tkg-cluster-creation-small.yaml
  11. Apply the yaml
kubectl apply -f tkg-cluster-creation-small.yaml
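For reference, the generated file is a TanzuKubernetesCluster manifest. Roughly, and only as a sketch of the run.tanzu.vmware.com/v1alpha1 API with the example values from above (your file may differ in detail), it contains something like this:

# Rough sketch of the generated tkg-cluster-creation-small.yaml; not a verbatim copy.
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: kubeflow-tkg
  namespace: kubeflow-ns
spec:
  distribution:
    version: v1.17.8+vmware.1-tkg.1.5417466
  topology:
    controlPlane:
      count: 1
      class: guaranteed-medium
      storageClass: gold-tanzu-kubernetes-storage-policy
    workers:
      count: 3
      class: guaranteed-xlarge
      storageClass: gold-tanzu-kubernetes-storage-policy
  settings:
    network:
      services:
        cidrBlocks: ["100.64.0.0/13"]
      pods:
        cidrBlocks: ["100.96.0.0/11"]
      serviceDomain: cluster.local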

Monitor Cluster creation

Now that the cluster creation has been kicked off, we’ll check on it in vSphere. Once again we open the namespace via Menu > Workload Management > Namespace > <your-namespace>.

monitor cluster creation

Note down the name (1) of the namespace and the TKG Cluster name (4) to proceed.
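If you’d rather watch from the console, the same progress is visible from the supervisor cluster context (a quick sketch, using the example names from above and assuming your kubectl context still points at the supervisor cluster):

# Watch the TKG cluster object itself; it should eventually report a running phase
kubectl get tanzukubernetescluster -n kubeflow-ns
# Watch the control plane / worker machines and their VMs being created
kubectl get machines -n kubeflow-ns
kubectl get virtualmachines -n kubeflow-ns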

When the cluster is green, meaning everything is up and running, you should proceed.

4. Create an NGINX Deployment

Creating a Pod in a TKG Cluster isn’t hard; it works out of the box. Creating a Deployment, on the other hand, does not: we hit the Pod Security Policy configured by VMware, because Deployments create their Pods through Service Accounts. The Deployment will be created, but it will never finish. We’ll have to fix this issue again when deploying Kubeflow, so don’t stop reading now.
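If you want to see the symptom for yourself before applying the fix below, a small, hypothetical check could look like this:

# Hypothetical illustration of the problem: a Deployment in a namespace without a PSP binding
kubectl create deployment nginx-blocked --image=nginx
# Deployment and ReplicaSet exist, but no Pods ever show up
kubectl get deployments,replicasets,pods
# The ReplicaSet's events should point at the Pod Security Policy as the reason
kubectl describe replicaset -l app=nginx-blocked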

  1. Let us connect to the freshly created TKG Cluster with the kubectl-vsphere plugin and the following command.
kubectl vsphere login --server=<supervisor-ip> \
--vsphere-username <your-configured-user>@<your-domain> \
--tanzu-kubernetes-cluster-namespace=<your-namespace> \
--tanzu-kubernetes-cluster-name=<your-TKG-Cluster-name>

The supervisor-ip is the same as the last one! It’ll handle authentication and authorization for you!

console input and output for logging into the TKG Cluster

When you see this output message you can proceed; kubectl is already configured to use the correct context. It’s still possible to switch out of the TKG Cluster back to the supervisor cluster, but we don’t need that here. Checking cluster health is free and fast, so type the following command for one last check.

kubectl get nodes
checking node health of the TKG cluster VMs
Create a Role Binding between Service Accounts and a new namespace

A full-blown explanation of Pod Security Policies (PSP) is not my goal here. We’ll take the PSP that is already defined and attach it, via a RoleBinding, to all Service Accounts (SA) of a new namespace (ns).

  1. Obtain all available PSP
kubectl get psp
obtaining the PSP name
  2. Create a new namespace and save the name for later
kubectl create namespace test
  3. Create a RoleBinding between the SAs and the PSP in the freshly created test namespace. Pay special attention to the namespace, the ClusterRole name and the subjects group!
cat << EOF > test-rb.yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rb-all-sa_ns-test
  namespace: test
roleRef:
  # ClusterRole shipped by VMware that allows use of the privileged PSP
  kind: ClusterRole
  name: psp:vmware-system-privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
  # all service accounts of the namespace "test"
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:test
EOF

kubectl apply -f test-rb.yaml

rm test-rb.yaml
  4. With that RoleBinding every SA in the namespace test has the privileged PSP attached. Proceed with your deployment (note the --namespace flag; the binding only covers the test namespace).
kubectl run nginx --image=nginx --namespace=test
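Note that newer kubectl versions create a bare Pod with kubectl run instead of a Deployment. To explicitly exercise the Deployment/ServiceAccount path, a sketch like the following works as well (the deployment name is just an example):

# Create an actual Deployment in the bound namespace and verify it comes up
kubectl create deployment nginx-deploy --image=nginx --namespace=test
kubectl get deployments,replicasets,pods -n test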

Final thoughts

This was a rather long post with many potential pitfalls, which may or may not occur for you. I’m going to write a dedicated post about some of the errors I’ve hit. Let’s be honest: this isn’t a script, and you may forget something or skip over a sentence.

Additionally, I’m not giving any security advice in this post, especially regarding Pod Security Policies; please research this topic yourself!
