Prometheus & Grafana Project on Kubernetes Voting-app

We will set up observability for our voting application so that it runs smoothly and any issues can be quickly identified and resolved. Check out the first part of this project: Automated Deployment of Scalable Applications on AWS EC2 with Kubernetes and Argo CD.

Observability consists of:

- Monitoring: This involves keeping an eye on various metrics, such as CPU usage, to ensure the application is performing well. For example, we can track how much CPU is being used at any given time.
- Logging: This allows us to record and review error logs. For instance, if users are having trouble signing in, we can check the logs to find out why the sign-in process is failing.
- Tracing: This helps us understand the path and sequence of events that lead to an error, making it easier to pinpoint and fix problems.
- Alerting: This feature enables us to send notifications, such as emails, when certain conditions are met. For example, if CPU usage reaches 80%, we can receive an email alert so we can take action.

For monitoring and alerting, we use Prometheus, which is a powerful tool for collecting and managing these metrics. (Logging and tracing would need additional tools, such as Loki or Jaeger; this setup focuses on metrics.)

To visualize the data and create dashboards, we use Grafana, which allows us to build interactive and informative displays of our application's performance metrics.

Steps to create observability setup:

  1. Create an EC2 instance (type t2.medium, 15 GB EBS storage)

    "Don't forget to update the system and install Docker." P.S.: Also add the KIND cluster.

     sudo apt-get update -y
     sudo apt-get install docker.io -y
     sudo usermod -aG docker $USER && newgrp docker

     # Install KIND for the K8s cluster (pick the latest release for your architecture)
     curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.23.0/kind-linux-amd64
     chmod +x ./kind
     sudo mv ./kind /usr/local/bin/kind
    
  2. Clone the repository onto your EC2 machine

     git clone https://github.com/LondheShubham153/k8s-kind-voting-app.git 
     # I'm using trainwithshubham github repo for the project
    

Note: If you are reading through the project and following the steps, you can check out Shubham’s video for a smooth process: https://www.youtube.com/watch?v=DXZUunEeHqM and the repo URL: https://github.com/LondheShubham153/k8s-kind-voting-app/

  3. Now create a KIND cluster

      kind create cluster --config=config.yml --name=my-cluster
    
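The `--config=config.yml` flag refers to the KIND config file from the cloned repo. As a rough sketch (the node roles below are illustrative, not necessarily the repo's exact file), a multi-node config can be written like this:

```shell
# Sketch of a KIND cluster config; the repo's actual config.yml may differ.
cat <<'EOF' > config.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF

# Then create the cluster from it:
# kind create cluster --config=config.yml --name=my-cluster
```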

  4. Check that the cluster is ready

      kubectl get nodes
    

  5. Install Argo CD on the Kubernetes cluster: the quick way

     kubectl create namespace argocd
     kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
     kubectl get svc -n argocd
    
     #Expose Argo CD server using NodePort
     kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "NodePort"}}'
    
     #Forward ports to access Argo CD server
     kubectl port-forward -n argocd service/argocd-server 8443:443 &
    
     #GET Secret password for argo CD to login
     kubectl get secret -n argocd argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d && echo
    

    Access in the browser: https://localhost:8443 (the Argo CD server uses a self-signed certificate, so accept the browser warning)

  6. Setting up Argo CD: for automated deployment of the voting app on the KIND cluster

    Click on “Create Application”

    Fields to fill in Application

    #GENERAL:

    Application Name: voting-app

    Project Name: default

    Sync Policy: Automatic (check "Prune Resources" and "Self Heal")

    #SOURCE:

    Repository URL: https://github.com/LondheShubham153/k8s-kind-voting-app.git

    Revision: main (branch name)

    Path: k8s-specifications (this is where all our YAML manifests are placed)

    #DESTINATION:

    Cluster URL: https://kubernetes.default.svc

    Namespace: default

    We have successfully created the Argo CD setup for our Kubernetes cluster.

    To confirm that the pods are created: kubectl get pods
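The UI fields above map one-to-one onto a declarative Argo CD `Application` manifest. As a sketch (the values are taken from the form fields in the steps above), you could create the same app from the command line:

```shell
# Declarative equivalent of the Argo CD UI form (values from the steps above).
cat <<'EOF' > argocd-app.yml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: voting-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/LondheShubham153/k8s-kind-voting-app.git
    targetRevision: main
    path: k8s-specifications
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

# Apply with: kubectl apply -f argocd-app.yml
```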

    Setup Prometheus:

    The pre-requisite for Prometheus is Helm.

    What is Prometheus?

    Prometheus is a powerful time series database designed for monitoring and alerting. It collects and stores its metrics as time-stamped data, making it ideal for tracking and analyzing the performance of applications and infrastructure over time.

    Prometheus is widely used in cloud-native environments due to its robust querying capabilities and integration with other tools, such as Grafana, for visualizing data.

    Installing helm:

     curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
     chmod 700 get_helm.sh
     ./get_helm.sh
    

    Installing Prometheus:

     helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
     helm repo add stable https://charts.helm.sh/stable
     helm repo update
     kubectl create namespace monitoring
     helm install kind-prometheus prometheus-community/kube-prometheus-stack \
       --namespace monitoring \
       --set prometheus.service.nodePort=30000 \
       --set prometheus.service.type=NodePort \
       --set grafana.service.nodePort=31000 \
       --set grafana.service.type=NodePort \
       --set alertmanager.service.nodePort=32000 \
       --set alertmanager.service.type=NodePort \
       --set prometheus-node-exporter.service.nodePort=32001 \
       --set prometheus-node-exporter.service.type=NodePort
     kubectl get svc -n monitoring
     kubectl get namespace
    

     kubectl port-forward svc/kind-prometheus-kube-prome-prometheus -n monitoring 9090:9090 --address=0.0.0.0 &
     kubectl port-forward svc/kind-prometheus-grafana -n monitoring 31000:80 --address=0.0.0.0 &
    

    Open Prometheus: localhost:9090

    Check Status > Targets: if the targets are up, Prometheus is receiving data.

    Prometheus has an Expression section, where we run PromQL queries to fetch data from the TSDB (Time Series Database).

    PromQL queries we are using:

     sum (rate (container_cpu_usage_seconds_total{namespace="default"}[1m])) / sum (machine_cpu_cores) * 100
     # Tells how much CPU is being used by our voting-app, which runs in the default namespace
    
     #More queries you can try 
     sum (container_memory_usage_bytes{namespace="default"}) by (pod)
    
     sum(rate(container_network_receive_bytes_total{namespace="default"}[5m])) by (pod)
     sum(rate(container_network_transmit_bytes_total{namespace="default"}[5m])) by (pod)
    
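Earlier we noted that alerting can notify us when CPU usage reaches 80%. With the kube-prometheus-stack, such a rule can be defined as a PrometheusRule resource. This is a hedged sketch: the alert name and threshold are illustrative, and the `release: kind-prometheus` label is an assumption based on the Helm release name used above (the stack's Prometheus only picks up rules matching its rule selector).

```shell
# Sketch of a PrometheusRule that fires when default-namespace CPU usage exceeds 80%.
cat <<'EOF' > cpu-alert.yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: voting-app-cpu-alert
  namespace: monitoring
  labels:
    release: kind-prometheus   # assumption: matches the chart's ruleSelector
spec:
  groups:
    - name: voting-app.rules
      rules:
        - alert: HighCpuUsage
          expr: sum(rate(container_cpu_usage_seconds_total{namespace="default"}[1m])) / sum(machine_cpu_cores) * 100 > 80
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Voting app CPU usage above 80%"
EOF

# Apply with: kubectl apply -f cpu-alert.yml
# Routing the alert to email is then configured in Alertmanager.
```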

    Open Grafana: localhost:31000

    Use credentials: username: admin
    password: prom-operator (the chart's default initial password)

    Create a user: go to Administration > Users and access > Users > Create user.

    With users, you can assign access roles to the dashboards: Viewer, Editor, or Admin.

    Note: When you install Grafana and Prometheus with the Helm chart, you don't need to configure data sources in Grafana for Prometheus and Alertmanager; they are already there. See below.

    Click on "Add visualization" in Grafana and select the Prometheus data source.

    Now see the highlighted ("yellow marked") areas: they show our configuration.

    To see various types of dashboards: on the right side, click on "Search options" and a "Suggestions" section will appear. Select the dashboard you like from there:

    Here, you can see the control plane of our Kubernetes cluster below:

    This is how you can create fun dashboards using Grafana and enjoy a practical approach to monitoring your application's usage and more.

    Below you can see our node-exporter visuals in Grafana.

    We have had a lot of fun so far, but do you want to learn how to create a complete Kubernetes dashboard?

    Step 1: Search on Google for "Grafana Dashboards".

    Step 2: Search for "kubernetes dashboard" > Copy ID to clipboard.

    Step 3: Open Grafana, choose Import dashboard, enter the ID, and click "Load" > "Import".

    This is how you can add a dashboard template for Kubernetes. The project is now complete.
    Don't forget to check out part 1, with the detailed implementation of the project with Argo CD, and then proceed to this article for the observability project.

    Some major troubleshooting issues I faced in this project:

    1. Installed and Managed Prometheus & Grafana with Helm

    • Used helm install and helm upgrade --install commands to deploy the Kube-Prometheus Stack with a custom values.yml file.

    • Checked running pods using kubectl get pods -n monitoring.

    • Edited and restarted the prometheus-kube-state-metrics deployment.
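The custom values.yml mentioned above is not shown in this article. As a hedged sketch, a values file equivalent to the `--set` flags used during the install might look like this (the keys mirror those flags; the actual file used may have differed):

```shell
# Sketch of a values.yml equivalent to the --set flags used for the Helm install.
cat <<'EOF' > values.yml
prometheus:
  service:
    type: NodePort
    nodePort: 30000
grafana:
  service:
    type: NodePort
    nodePort: 31000
alertmanager:
  service:
    type: NodePort
    nodePort: 32000
prometheus-node-exporter:
  service:
    type: NodePort
    nodePort: 32001
EOF

# Then: helm upgrade --install kind-prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yml
```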

2. Troubleshot Image Pull Issues

  • Pulled the missing image manually using:

      docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
    
  • Deleted problematic pods to trigger re-creation.

3. Checked Kubernetes Services and Ports

  • Listed services in the monitoring namespace using:

      kubectl get svc -n monitoring
    
  • Attempted to port-forward services like Grafana and Prometheus but faced issues with incorrect ports.

  • Used ps aux | grep "kubectl port-forward" and sudo ss -tuln to check which ports were already in use.

4. Fixed Port-Forwarding Issues

  • Tried multiple port-forward commands with different services and ports:

      kubectl port-forward svc/prometheus-grafana -n monitoring 31000:80 --address=0.0.0.0 &
    
  • Experimented with different port numbers (3000, 31000, 31100, 31200).

5. Retrieved and Reset Grafana Admin Password

  • Extracted the Grafana admin password using:

      kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
    
  • Reset the password multiple times with:

      kubectl patch secret prometheus-grafana -n monitoring -p '{"data":{"admin-password":"'$(echo -n "newpassword" | base64)'"}}'
    
  • Deleted the Grafana pod to apply the new password.

6. Verified and Restarted Services

  • Listed services again using kubectl get svc -n monitoring to check running status.

  • Restarted port-forwarding after changing the password.