
KB: Kube-proxy vs CNI Plugin

kube-proxy is not considered a CNI plugin in Kubernetes or AKS (Azure Kubernetes Service). It serves a different purpose within the Kubernetes networking stack. Let me clarify the roles of kube-proxy and a CNI plugin.

kube-proxy:

Role: kube-proxy manages the network rules that allow communication between Kubernetes services and pods. It sets up the networking rules (e.g., iptables, IPVS, or eBPF) to enable service discovery and routing within the cluster.

Key Responsibilities:
- Implements Kubernetes Service networking.
- Forwards traffic from a service's ClusterIP to the appropriate pod(s) backing the service.
- Handles load balancing for traffic directed to services.

Scope: It operates at the service level, not the pod-to-pod network level.

CNI Plugin:

Role: A CNI plugin is responsible for setting up the pod network. It ensures that all pods across the cluster can communicate with each other and with the host network.

Key Responsibilities:
- Assigns IP addresses to pods.
- Configures ...
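To make the division of labor concrete, here is a minimal sketch using the official kubernetes Python client. It assumes a cluster reachable via a local kubeconfig; the service name "my-service" and namespace "default" are hypothetical placeholders. The Service's ClusterIP is the virtual IP kube-proxy programs rules for, while the endpoint pod IPs it prints were assigned by the CNI plugin.

    from kubernetes import client, config

    # Load credentials from the local kubeconfig (~/.kube/config).
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # The Service's ClusterIP: kube-proxy programs iptables/IPVS/eBPF
    # rules that route traffic sent to this virtual IP.
    svc = v1.read_namespaced_service(name="my-service", namespace="default")
    print("ClusterIP:", svc.spec.cluster_ip)

    # The Endpoints behind the Service: the pod IPs kube-proxy
    # load-balances to. These pod IPs were assigned by the CNI plugin.
    eps = v1.read_namespaced_endpoints(name="my-service", namespace="default")
    for subset in eps.subsets or []:
        for addr in subset.addresses or []:
            print("Pod endpoint:", addr.ip)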

KB: Data ELT vs ETL

ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are both data integration processes used for data management, typically within data warehouses, but they differ in the order of steps and their specific use cases.

ETL (Extract, Transform, Load)

Order: Data is first extracted from source systems, then transformed (cleaned, formatted, aggregated, etc.), and finally loaded into the target data warehouse or data store.

Transformation Location: Data is transformed in a staging area or in an ETL tool before it reaches the target.

Use Case: Traditional data warehouses, where transformations are needed to standardize data before storage. Often suitable for environments with limited storage or compute power.

Pros: Ensures data is clean and structured upon arrival in the target system. Good for legacy systems where transformations need to happen outside the data warehouse.

Cons: Can be time-consuming and resource-intensive, especially for large datasets. Transformation...
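The difference in ordering is easy to see in code. Below is a minimal, self-contained Python sketch that uses the standard-library sqlite3 module as a stand-in for a data warehouse; the table names, column names, and sample rows are hypothetical.

    import sqlite3

    rows = [("alice", " 42 "), ("bob", "17")]  # hypothetical raw source data

    con = sqlite3.connect(":memory:")  # stand-in for the target warehouse
    con.execute("CREATE TABLE users_etl (name TEXT, score INTEGER)")
    con.execute("CREATE TABLE users_raw (name TEXT, score TEXT)")

    # ETL: transform outside the warehouse, then load the clean result.
    clean = [(name.strip().upper(), int(score.strip())) for name, score in rows]
    con.executemany("INSERT INTO users_etl VALUES (?, ?)", clean)

    # ELT: load the raw data as-is first, then transform inside the
    # warehouse using its own SQL engine.
    con.executemany("INSERT INTO users_raw VALUES (?, ?)", rows)
    con.execute("""
        CREATE TABLE users_elt AS
        SELECT UPPER(TRIM(name)) AS name,
               CAST(TRIM(score) AS INTEGER) AS score
        FROM users_raw
    """)

    # Both paths arrive at the same clean rows, e.g. ('ALICE', 42).
    print(list(con.execute("SELECT * FROM users_etl")))
    print(list(con.execute("SELECT * FROM users_elt")))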

KB: Airflow on AKS

[Image: Airflow on AKS deployment architecture]
To deploy Apache Airflow to Azure Kubernetes Service (AKS), you can follow these general steps based on the architecture in the image. This guide includes high-level steps and common practices for integrating with components such as Redis, PostgreSQL, and monitoring tools like Dynatrace.

Prerequisites
- Azure Kubernetes Service (AKS): Set up an AKS cluster and ensure you have access.
- Azure PostgreSQL: For the metadata database.
- Azure File Share: For log storage.
- Redis: For Airflow's task queue.
- GitLab: For CI/CD pipeline integration and version control.
- Monitoring Tool (e.g., Dynatrace): Set up for observability.

Step-by-Step Deployment Guide

1. Set Up AKS Cluster
Create an AKS cluster using the Azure CLI or the Azure portal. Ensure the cluster has sufficient nodes for the Airflow components (webserver, scheduler, workers). Set up a network load balancer if needed for on-premises or other integrations.

2. Deploy Airflow on AKS using Helm
Helm provides an easy way to deploy Airflow on Ku...
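As a rough illustration of how these pieces wire together, here is a minimal Python sketch that assembles the core Airflow settings (as environment variables) pointing the metadata database at Azure PostgreSQL and the Celery task queue at Redis. The host names and credentials are hypothetical placeholders; in a real deployment these values would be supplied through the Airflow Helm chart's values and Kubernetes secrets rather than hard-coded.

    import os

    # Hypothetical endpoints; substitute your Azure PostgreSQL and Redis hosts.
    pg_host = "mypg.postgres.database.azure.com"
    redis_host = "myredis.redis.cache.windows.net"

    env = {
        # Metadata database on Azure PostgreSQL.
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN":
            f"postgresql+psycopg2://airflow:PASSWORD@{pg_host}:5432/airflow",
        # Celery task queue backed by Redis.
        "AIRFLOW__CELERY__BROKER_URL": f"redis://{redis_host}:6379/0",
        # Celery task results stored back in PostgreSQL.
        "AIRFLOW__CELERY__RESULT_BACKEND":
            f"db+postgresql://airflow:PASSWORD@{pg_host}:5432/airflow",
    }
    os.environ.update(env)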

KB: Airflow DAG

An Airflow DAG (Directed Acyclic Graph) is the central component in Apache Airflow that defines a workflow or pipeline. It represents the sequence and dependencies of tasks to be executed. Each DAG is a Python script that contains the structure of the workflow, detailing the tasks (operators) and their dependencies.

Here's a breakdown of the main elements of an Airflow DAG:

1. Directed Acyclic Graph (DAG)
- Directed: It has a clear order where each task points to the next task.
- Acyclic: It cannot have loops, so tasks cannot depend on each other in a circular way.
- Graph: It represents tasks (nodes) and dependencies (edges) as a graph structure.

2. Tasks in a DAG
Tasks are individual steps in a DAG and are represented by operators (like PythonOperator, BashOperator, etc.). Each task can perform specific actions, such as running scripts, transferring data, or checking conditions. Tasks are defined in the DAG and assigned dependencies to set the order in w...
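A minimal sketch of such a DAG file follows, assuming Airflow 2.4 or later (for the schedule argument); the dag_id, task ids, and task logic are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def process():
        # Placeholder task logic.
        print("processing data")

    with DAG(
        dag_id="example_kb_dag",           # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                 # run once per day
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        transform = PythonOperator(task_id="transform", python_callable=process)

        # Dependencies (the edges of the graph): transform runs after extract.
        extract >> transform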

KB: Airflow Operators

In Apache Airflow, operators define the tasks that perform specific actions in a workflow or Directed Acyclic Graph (DAG). Each operator encapsulates a particular action or job, such as running a script, transferring files, or interacting with external systems. Airflow provides various types of operators, which can be broadly categorized as follows (a short sketch follows the list):

1. Basic Operators
- PythonOperator: Executes Python functions directly.
- BashOperator: Runs Bash commands.
- DummyOperator: Used as a placeholder or for creating dependencies.

2. Transfer Operators
Used to transfer data between different systems or databases. Examples: S3ToRedshiftOperator (transfers data from S3 to Redshift), MySqlToS3Operator, S3ToSnowflakeOperator.

3. Sensors
Special operators that wait for a specific condition to become true. Examples: FileSensor (waits for a file to be present), ExternalTaskSensor (waits for another task to complete), TimeDeltaSensor (waits for a specified time).

4. Hooks
Not exactly operators, but hooks allow op...
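Here is a minimal sketch combining a sensor with a basic operator, assuming Airflow 2.x; the DAG id, file path, and commands are hypothetical, and FileSensor relies on a filesystem connection (the default fs_default) being defined.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.sensors.filesystem import FileSensor

    with DAG(
        dag_id="example_operators_dag",   # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,                    # triggered manually
        catchup=False,
    ) as dag:
        # Sensor: polls until the file appears before anything downstream runs.
        wait_for_file = FileSensor(
            task_id="wait_for_file",
            filepath="/tmp/incoming/data.csv",  # hypothetical path
            poke_interval=30,                   # check every 30 seconds
        )

        # Basic operator: acts on the file once the sensor succeeds.
        load = BashOperator(task_id="load", bash_command="echo loading data")

        wait_for_file >> load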

KB: Process Manager (Supervisord/Tini)

The container process manager, often referred to as an "init system" (such as tini or supervisord), is a lightweight and essential component within containerized environments. Acting as the first process (PID 1) in a container, its primary responsibility is to monitor and manage all subsequent processes, ensuring that any child processes are properly started, monitored, and restarted in case of failure. This process manager is critical for maintaining container stability, as it handles important tasks like reaping zombie processes, forwarding system signals, and keeping resource consumption efficient.

However, it is not considered best practice to run multiple processes within a single container or pod; each pod/container should have a primary init process focused on a single responsibility, in line with the principles of containerization such as process isolation and microservices architecture.

References:
https://docs.docker.com/engine/containers/multi-serv...
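To make the PID 1 duties concrete, here is a heavily simplified Python sketch of what an init process like tini does: it spawns the container's main workload, forwards termination signals to it, and reaps any exited children so zombies cannot accumulate. It assumes a POSIX system and Python 3.9+ (for os.waitstatus_to_exitcode); the child command is a hypothetical placeholder, and a real init system handles many more edge cases.

    import os
    import signal
    import subprocess
    import sys

    # Spawn the container's main workload (hypothetical command).
    child = subprocess.Popen(["sleep", "300"])

    def forward(signum, frame):
        # Forward termination signals to the child, as tini does.
        child.send_signal(signum)

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    # Reap every child that exits so no zombie processes linger.
    while True:
        try:
            pid, status = os.wait()
        except ChildProcessError:
            break  # no children left to reap
        if pid == child.pid:
            # Exit with the main workload's exit code, like tini.
            sys.exit(os.waitstatus_to_exitcode(status))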

KB: Kubectl EXPLAIN

Describe fields and structure of various resources. This command describes the fields associated with each supported API resource. Fields are identified via a simple JSONPath identifier: <type>.<fieldName>[.<fieldName>]

Behind the scenes, kubectl just made an API request to my Kubernetes cluster, grabbed the current Swagger documentation of the API version running in the cluster, and output the documentation and object types.

    kubectl explain deployment
    kubectl explain deployment --recursive
    kubectl explain deployment.spec.strategy

References:
https://kubernetes.io/docs/reference/kubectl/generated/kubectl_explain/
https://blog.heptio.com/kubectl-explain-heptioprotip-ee883992a243