
SadServers: Bilbao: Basic Kubernetes Problems


Summary


This is my writeup for the Bilbao scenario from SadServers.

Scenario


Description

There’s a Kubernetes Deployment with an Nginx pod and a Load Balancer declared in the manifest.yml file. The pod is not coming up. Fix it so that you can access the Nginx container through the Load Balancer.

There’s no “sudo” (root) access.

Test: Running curl 10.43.216.196 returns the default Nginx Welcome page.

See /home/admin/agent/check.sh for the test that “Check My Solution” runs.

Time to Solve: 10 minutes.

Solution


First, I used curl to check whether the Load Balancer was already serving anything. It didn’t show anything good: curl reported the port as closed.

curl 10.43.216.196

Checked the pods using the following command.

kubectl get pods

It looks like the nginx-deployment-67699598cc-zrj6f pod isn’t ready and has been in this state for 268 days. That’s some significant pending time; pending time is the new uptime flex these days, I suppose.

NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-67699598cc-zrj6f   0/1     Pending   0          268d
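
As a side note, the wide output adds a NODE column, and for a Pending pod it shows <none> because the scheduler hasn’t placed it anywhere yet. I didn’t capture that output at the time, but this is the variant I’d reach for:

kubectl get pods -o wide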

Checked the description for the pod using the following command.

kubectl describe pods nginx-deployment-67699598cc-zrj6f

I’m pretty new to Kubernetes, so not a whole lot jumped out at me. But the memory settings for this pod did: 2000Mi seems a little high, so I figured I might change that down the road.

The FailedScheduling events also jumped out, so I read up a little on affinity and node selectors and ran with the theory that the issue was related to that.

Name:             nginx-deployment-67699598cc-zrj6f
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=nginx
                  pod-template-hash=67699598cc
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/nginx-deployment-67699598cc
Containers:
  nginx:
    Image:      localhost:5000/nginx
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     100m
      memory:  2000Mi
    Requests:
      cpu:        100m
      memory:     2000Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mslhc (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kube-api-access-mslhc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              disk=ssd
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  268d                   default-scheduler  0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  2m41s (x2 over 7m41s)  default-scheduler  0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
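
The events say one node didn’t match the pod’s node affinity/selector and the other carries an untolerated unreachable taint. A natural follow-up, which I’d do here if I were retracing my steps, is listing the labels the nodes actually carry to confirm that none of them has disk=ssd yet:

kubectl get nodes --show-labels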

Checked the home directory for the manifest.yml file using the following command.

ls -lah ./manifest.yml

Looks like it’s there, and the user I’m connected as can edit the file.

total 44K
-rw-r--r-- 1 admin admin  755 Jan 17  2024 manifest.yml

This is the full output from reading the manifest.yml file with the cat(1) command. It looks like all of the configuration for the pod we’ve deployed comes from this file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: localhost:5000/nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            memory: 2000Mi
            cpu: 100m
          requests:
            cpu: 100m
            memory: 2000Mi
      nodeSelector:
        disk: ssd

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  clusterIP: 10.43.216.196
  type: LoadBalancer
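
The Service half of the manifest pins the clusterIP to 10.43.216.196, which is the same address the check curls. If you want to confirm the Service is already in place before changing anything, listing it should be enough (output not captured here):

kubectl get service nginx-service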

Labeled node1 with disk=ssd using the following command. The nodeSelector section indicates that the pod expects a node carrying the label disk with a value of ssd, which is one way to indicate which node a pod should be assigned to. I’m not sure whether node1 actually has an SSD in it, but if it did and the workload needed one, assigning the pod there would make sense.

kubectl label nodes node1 disk=ssd

The node node1 has been successfully labeled.

node/node1 labeled
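
To double-check that the label landed where the nodeSelector expects it, a label-filtered listing should return node1. I’m adding this for completeness rather than from captured output:

kubectl get nodes -l disk=ssd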

Another thing I did was edit the manifest.yml file, setting both the memory limit and the memory request down from 2000Mi to 200Mi.

spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: localhost:5000/nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            memory: 200Mi
            cpu: 100m
          requests:
            cpu: 100m
            memory: 200Mi
      nodeSelector:
        disk: ssd
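
The reasoning here is that the scheduler compares a pod’s memory request against each node’s allocatable memory, so an oversized request can keep a pod Pending even once the label matches. I didn’t verify the node’s capacity at the time, but describing it would show the Allocatable section and what has already been requested:

kubectl describe node node1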

Applied the new manifest to the deployment using the following command.

kubectl apply -f manifest.yml 

Looks like it updated successfully.

deployment.apps/nginx-deployment configured
service/nginx-service unchanged
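
Before curling again, waiting on the rollout is a handy way to make sure the new pod actually comes up; this command blocks until the Deployment reports the rollout is complete:

kubectl rollout status deployment/nginx-deployment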

Checked the server using the curl command and it worked as expected.

curl 10.43.216.196

As shown below, this is the default Nginx welcome page, so it looks like the server is up and running.

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Checked my work using the agent’s check.sh script.

agent/check.sh

Looks like everything is OK and this scenario is complete.

OK

Conclusion


This is my first real exposure to Kubernetes in any shape or form. But considering how long it took me to get through this challenge, I think it’s time to set up a cluster of my own, just to go through the process and get a grasp of what that all entails. One of my goals is to learn a little more about container networking, so I’ll start with the basics of getting it all set up and dive a little deeper later.