Posts

Showing posts from 2024

KB: BDD (Behavior-Driven Development) Pattern

Behavior-Driven Development (BDD) is a collaborative approach to software development that emphasizes communication between developers, testers, and business stakeholders. It bridges the gap between technical and non-technical team members by using plain language to describe the behavior of the software. BDD focuses on creating a clear, shared understanding of how a system should behave, enabling the creation of software that aligns closely with business goals.

Core Principles of BDD
- Collaboration: Encourages close collaboration between technical and non-technical stakeholders.
- User-Focused: Focuses on delivering features that bring value to the end user.
- Shared Language: Uses a ubiquitous language to describe software behavior in plain English.
- Living Documentation: BDD artifacts double as documentation that stays current with the software.

BDD Workflow
- Discovery Phase: Team members work together to understand and define the desired behavior of the system. User stories and examp...
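To make the Given/When/Then style concrete, here is a minimal sketch in Python; the Account class and the scenario are hypothetical, and real teams typically pair a Gherkin .feature file with a framework such as Cucumber, behave, or pytest-bdd:

# Scenario (in BDD's plain language):
#   Given an account with a balance of 100
#   When the user withdraws 40
#   Then the remaining balance is 60

class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount

def test_withdraw_less_than_balance():
    account = Account(balance=100)   # Given
    account.withdraw(40)             # When
    assert account.balance == 60     # Then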

The Rise and Fall of VMware: A Hypervisor's Tale of Triumph and Turmoil

 Once upon a time, in the ever-evolving world of technology, there was an extraordinary hypervisor called ESXi , crafted by the visionary team at VMware. It was a masterpiece of innovation, transforming data centers and delighting IT professionals everywhere. ESXi was efficient, reliable, and beloved, serving countless happy customers who relied on it to power their businesses. It seemed unstoppable, the jewel in VMware’s crown. But the tech world is not without its storms. As ESXi flourished and VMware thrived, a powerful corporation named Broadcom set its sights on the company. Broadcom was not content to simply compete; they wanted to own the crown. They decided the best way to assert control was through acquisition. And so, Broadcom made a bold move: they announced their plan to acquire VMware. At first, the industry watched in suspense. Surely, this would bring even more resources and innovation to VMware's beloved products. But Broadcom had other plans. Once the deal was in ...

KB: Kube-proxy vs CNI Plugin

kube-proxy is not considered a CNI plugin in Kubernetes or AKS (Azure Kubernetes Service). It serves a different purpose within the Kubernetes networking stack. Let me clarify the roles of kube-proxy and a CNI plugin.

kube-proxy
- Role: kube-proxy manages network rules that allow communication between Kubernetes services and pods. It sets up the networking rules (e.g., iptables, IPVS, or eBPF) to enable service discovery and routing within the cluster.
- Key Responsibilities: Implements Kubernetes Service networking. Forwards traffic from a service's ClusterIP to the appropriate pod(s) backing the service. Handles load balancing for traffic directed to services.
- Scope: It operates at the service level, not the pod-to-pod network level.

CNI Plugin
- Role: A CNI plugin is responsible for setting up the pod network. It ensures that all pods across the cluster can communicate with each other and with the host network.
- Key Responsibilities: Assigns IP addresses to pods. Configures ...
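As an illustration of the service level kube-proxy operates at, here is a minimal ClusterIP Service manifest (name, label, and ports are placeholders). kube-proxy programs the node-level rules that forward traffic hitting this Service's ClusterIP to the Pods selected by app: web, while the CNI plugin is what gave those Pods routable IPs in the first place:

apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical Service name
spec:
  selector:
    app: web           # traffic is forwarded to Pods carrying this label
  ports:
  - port: 80           # the Service (ClusterIP) port
    targetPort: 8080   # the container port on the backing Pods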

KB: Data ELT vs ETL

ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are both data integration processes used for data management, typically within data warehouses, but they differ in the order of steps and their specific use cases.

ETL (Extract, Transform, Load)
- Order: Data is first extracted from source systems, then transformed (cleaned, formatted, aggregated, etc.), and finally loaded into the target data warehouse or data store.
- Transformation Location: Data is transformed in a staging area or in an ETL tool before it reaches the target.
- Use Case: Traditional data warehouses, where transformations are needed to standardize data before storage. Often suitable for environments with limited storage or compute power.
- Pros: Ensures data is clean and structured upon arrival in the target system. Good for legacy systems where transformations need to happen outside the data warehouse.
- Cons: Can be time-consuming and resource-intensive, especially for large datasets. Transformation...
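A minimal Python sketch of the two orderings, using an in-memory SQLite database as a stand-in for the warehouse and a hypothetical extract_rows() source:

import sqlite3

def extract_rows():
    # Hypothetical source system: raw records with inconsistent casing.
    return [("alice", "NY"), ("BOB", "ca")]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, state TEXT)")

# ETL: transform in the pipeline first, then load the clean rows.
clean = [(name.title(), state.upper()) for name, state in extract_rows()]
db.executemany("INSERT INTO users VALUES (?, ?)", clean)

# ELT: load the raw rows as-is, then transform inside the warehouse with SQL.
db.execute("CREATE TABLE users_raw (name TEXT, state TEXT)")
db.executemany("INSERT INTO users_raw VALUES (?, ?)", extract_rows())
db.execute("""INSERT INTO users
              SELECT upper(substr(name, 1, 1)) || lower(substr(name, 2)),
                     upper(state)
              FROM users_raw""")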

KB: Airflow on AKS

To deploy Apache Airflow to Azure Kubernetes Service (AKS), you can follow these general steps based on the architecture in the image. This guide includes high-level steps and common practices to integrate with components such as Redis, PostgreSQL, and monitoring tools like Dynatrace.

Prerequisites
- Azure Kubernetes Service (AKS): Set up an AKS cluster and ensure you have access.
- Azure PostgreSQL: For the metadata database.
- Azure File Share: For log storage.
- Redis: For Airflow's task queue.
- GitLab: For CI/CD pipeline integration and version control.
- Monitoring Tool (e.g., Dynatrace): Set up for observability.

Step-by-Step Deployment Guide
1. Set Up AKS Cluster
- Create an AKS cluster using Azure CLI or the Azure portal.
- Ensure the cluster has sufficient nodes for Airflow components (webserver, scheduler, workers).
- Set up a network load balancer if needed for on-premises or other integrations.
2. Deploy Airflow on AKS using Helm
- Helm provides an easy way to deploy Airflow on Ku...
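As a sketch of step 2, the official Apache Airflow Helm chart can be installed along these lines (the release and namespace names are placeholders, and production settings such as the PostgreSQL connection and Redis configuration would go in a custom values file):

helm repo add apache-airflow https://airflow.apache.org
helm repo update
helm upgrade --install airflow apache-airflow/airflow \
    --namespace airflow --create-namespace \
    --values my-values.yaml   # hypothetical custom values file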

KB: Airflow DAG

An Airflow DAG (Directed Acyclic Graph) is the central component in Apache Airflow that defines a workflow or pipeline. It represents the sequence and dependencies of tasks to be executed. Each DAG is a Python script that contains the structure of the workflow, detailing the tasks (operators) and their dependencies. Here's a breakdown of the main elements of an Airflow DAG:

1. Directed Acyclic Graph (DAG)
- Directed: It has a clear order where each task points to the next task.
- Acyclic: It cannot have loops, so tasks cannot depend on each other in a circular way.
- Graph: It represents tasks (nodes) and dependencies (edges) as a graph structure.

2. Tasks in a DAG
- Tasks are individual steps in a DAG and are represented by operators (like PythonOperator, BashOperator, etc.).
- Each task can perform specific actions, such as running scripts, transferring data, or checking conditions.
- Tasks are defined in the DAG and assigned dependencies to set the order in w...
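A minimal sketch of what such a Python script looks like, assuming Airflow 2.x; the DAG id, task names, and schedule are illustrative:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def extract():
    print("extracting...")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+ spelling; older releases use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = BashOperator(task_id="load", bash_command="echo loading")
    extract_task >> load_task   # directed edge: extract must finish before load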

KB: Airflow Operators

In Apache Airflow, operators are tasks that perform specific actions in a workflow or Directed Acyclic Graph (DAG). Each operator defines a particular action or job, such as running a script, transferring files, or interacting with external systems. Airflow provides various types of operators, which can be categorized broadly as follows (see the sensor sketch after this list):

1. Basic Operators
- PythonOperator: Executes Python functions directly.
- BashOperator: Runs Bash commands.
- DummyOperator: Used as a placeholder or for creating dependencies.

2. Transfer Operators
- Used to transfer data between different systems or databases.
- Examples: S3ToRedshiftOperator (transfers data from S3 to Redshift), MySqlToS3Operator, S3ToSnowflakeOperator.

3. Sensors
- Special operators that wait for a specific condition to be true.
- Examples: FileSensor (waits for a file to be present), ExternalTaskSensor (waits for another task to complete), TimeDeltaSensor (waits for a specified time).

4. Hooks
- Not exactly operators, but hooks allow op...
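For instance, a sensor gating a downstream operator might look like this sketch (assuming a recent Airflow 2.x; the file path is hypothetical, and FileSensor resolves it via its filesystem connection, fs_default by default):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="wait_then_process",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/incoming/report.csv",  # hypothetical path
        poke_interval=60,                      # re-check every 60 seconds
    )
    process = BashOperator(
        task_id="process",
        bash_command="echo processing report.csv",
    )
    wait_for_file >> process   # the operator runs only once the file appears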

KB: Process Manager (Supervisord/Tini)

The container process manager, often referred to as an "init system" (such as tini or supervisord), is a lightweight and essential component within containerized environments. Acting as the first process (PID 1) in a container, its primary responsibility is to monitor and manage all subsequent processes, ensuring that any child processes are properly started, monitored, and restarted in case of failure. This process manager is critical for maintaining container stability, as it handles important tasks like reaping zombie processes, forwarding system signals, and ensuring that resource consumption remains efficient.

However, it is not considered best practice to run multiple processes within a single container; each container should run one primary process focused on a single responsibility, in line with the principles of containerization such as process isolation and microservices architecture.

References:
https://docs.docker.com/engine/containers/multi-serv...
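A minimal sketch of wiring tini in as PID 1 in a Dockerfile (the base image and application command are placeholders; docker run --init achieves much the same without changing the image):

FROM python:3.12-slim
# Install tini, a tiny init that reaps zombies and forwards signals.
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
# Run tini as PID 1; "--" separates tini's options from the child command.
ENTRYPOINT ["/usr/bin/tini", "--"]
# The single application process that tini supervises.
CMD ["python", "app.py"]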

KB: Kubectl EXPLAIN

Describe fields and structure of various resources. This command describes the fields associated with each supported API resource. Fields are identified via a simple JSONPath identifier. Behind the scenes, kubectl just made an API request to my Kubernetes cluster, grabbed the current Swagger documentation of the API version running in the cluster, and output the documentation and object types.

kubectl explain deployment
kubectl explain deployment --recursive
kubectl explain deployment.spec.strategy

References:
https://kubernetes.io/docs/reference/kubectl/generated/kubectl_explain/
https://blog.heptio.com/kubectl-explain-heptioprotip-ee883992a243

KB: Kubernetes ReplicaSet

A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. Usually, you define a Deployment and let that Deployment manage ReplicaSets automatically.

The reason the Selector is added to the spec of a Kubernetes ReplicaSet is to allow it to manage Pods that match a specified label, regardless of whether the ReplicaSet itself created them. This is particularly useful because it enables the ReplicaSet to manage Pods that might have been created manually or by another process (assuming they don't already have another controller managing them). As long as the Pods have labels that match the ReplicaSet's Selector, the ReplicaSet controller will treat them as its own and manage their lifecycle, ensuring the desired number of replicas is running. This behavior helps the controller maintain the correct state, but it also means care should be taken when using labels to avoid conflicting control over Pods by different controllers. Wh...
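A minimal illustrative manifest showing the selector/label pairing described above (name and image are placeholders):

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web          # any matching Pod without another owner gets adopted
  template:
    metadata:
      labels:
        app: web        # must satisfy the selector above
    spec:
      containers:
      - name: nginx
        image: nginx:1.25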

KB: Kubectl auto-completion setup

kubectl completion

Synopsis: Output shell completion code for the specified shell (bash, zsh, fish, or powershell). The shell code must be evaluated to provide interactive completion of kubectl commands. This can be done by sourcing it from the .bash_profile. Detailed instructions on how to do this are available here:
- macOS: https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/#enable-shell-autocompletion
- Linux: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#enable-shell-autocompletion
- Windows: https://kubernetes.io/docs/tasks/tools/install-kubectl-windows/#enable-shell-autocompletion

Note for zsh users: zsh completions are only supported in versions of zsh >= 5.2.

kubectl completion SHELL

ref: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_completion/
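For example, on bash the output can be evaluated for the current session or persisted for future ones (see the platform docs linked above for prerequisites such as the bash-completion package):

# Current session only:
source <(kubectl completion bash)
# Persist for future sessions (bash):
echo 'source <(kubectl completion bash)' >> ~/.bash_profile
# zsh equivalent:
echo 'source <(kubectl completion zsh)' >> ~/.zshrc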

KB: JSONPath Query

JSON root = $. If you have jq installed, you can use jq -c 'paths' to see a compact list of all the element paths.

The $ symbol is often used to represent the root element in JSON data when using tools or languages that support JSONPath, which is a query language for JSON data similar to XPath for XML. Here's why and how it's used:

Why Use $ in JSONPath?
- Root Reference: The $ symbol in JSONPath represents the root object or array. It's a way to anchor your query to the very beginning of the JSON structure.
- Navigation: From the $ root, you can navigate through the JSON structure to access nested elements or values by specifying keys or indices.

Function | Description | Example | Result
text | the plain text | kind is {.kind} | kind is List
@ | the current object | {@} | the same as input
. or [] | child operator | {.kind}, {['kind']} or {['name\.type']} | List
.. | recursive descent | {..name} | 127.0.0.1 127.0.0.2 myself e2e
* | wildcard. Get all ob...
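A couple of concrete queries anchored at the root (with kubectl's JSONPath output the leading $ is optional; the resources returned are whatever is in your cluster):

# List the names of all pods via a JSONPath expression:
kubectl get pods -o jsonpath='{.items[*].metadata.name}'
# Same idea with jq: compactly print every path in a JSON document:
kubectl get pods -o json | jq -c 'paths'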

KB: LLM Vectors vs Embeddings

The sequence is from "Text" to "Tokens," then "Vectors," and finally "Embeddings." In data processing and machine learning, the creation or extraction of a vector typically precedes the embedding process. Initially, data is converted into a vector form, which is a numerical representation; embeddings are then generated from these vectors. Embeddings are lower-dimensional representations that capture the relationships and features of the data, making it easier to use in various machine learning models. This generalized concept reflects the idea that vectors serve as the foundation upon which embeddings are built.

When a model is trained, the initial vectorized values (those random vectors assigned at the beginning) don't exist separately after the embeddings are created. Here's how it works:

Training Process
- Initial Vectors: These are the starting points, just random numbers. They exist at the beginning of the training process but are not ...
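A minimal numpy sketch of that lifecycle (the vocabulary, dimension, and "gradient step" are all illustrative): token ids index rows of a matrix that starts out random; training mutates those same rows in place, so the initial random vectors are overwritten rather than kept alongside the learned embeddings:

import numpy as np

vocab = {"cat": 0, "dog": 1, "car": 2}   # token -> id
dim = 4                                  # embedding dimension
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), dim))  # initial random vectors

def embed(token):
    # Lookup is just row indexing into the embedding matrix.
    return embeddings[vocab[token]]

# Stand-in for one gradient update during training: the row for "cat"
# is modified in place, replacing its initial random values.
embeddings[vocab["cat"]] -= 0.1 * np.ones(dim)
print(embed("cat"))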

KB: SQL DDL/DML

DDL (Data Definition Language) provides the ability to define, create, and modify database objects such as tables, views, indexes, and users. DML (Data Manipulation Language) allows for manipulating data in a database, such as inserting, updating, and deleting records. In SQL, DDL and DML are two categories of commands used for different purposes:

Data Definition Language (DDL)
DDL commands are used to define and manage the database schema, which includes creating, altering, and deleting database objects such as tables, indexes, and views. Common DDL commands include:

CREATE: Used to create database objects like tables, indexes, views, etc. Example:

CREATE TABLE Employees (
    EmployeeID int PRIMARY KEY,
    FirstName varchar(255),
    LastName varchar(255),
    BirthDate date
);

ALTER: Used to modify an existing database object. Example:

ALTER TABLE Employees ADD COLUMN Salary decimal(10, 2);

DROP: Used to...
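For contrast with the DDL above, a few illustrative DML statements against the same hypothetical Employees table:

INSERT INTO Employees (EmployeeID, FirstName, LastName, BirthDate)
VALUES (1, 'Ada', 'Lovelace', '1815-12-10');

UPDATE Employees SET Salary = 95000.00 WHERE EmployeeID = 1;

DELETE FROM Employees WHERE EmployeeID = 1;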

KB: LoadBalancers vs Ingress

Kubernetes Ingress and Service LoadBalancer both handle external access to applications running in a Kubernetes cluster, but they operate in different ways and are used in different scenarios. Here's a comparison to understand their differences:

Kubernetes Service LoadBalancer
- Purpose: Exposes a single Service to external traffic by creating a load balancer.
- Operation: Directly creates a cloud provider load balancer (e.g., AWS ELB, GCP LB). Maps a single Service to the external load balancer, which routes traffic to the Service's Pods. Provides a single external IP address for the Service.
- Use Case: Suitable for simple use cases where a single Service needs to be exposed to external traffic.
- Complexity: Less complex setup, easy to configure, ideal for straightforward scenarios.
- Example: Exposing a single web application to the internet.

Kubernetes Ingress
- Purpose: Manages external access to multiple Services, typically HTTP and HTTPS, providing load balancing, SSL termination...
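For the simple case, exposing one Service looks like this sketch (name, label, and ports are placeholders); the cloud provider provisions the external load balancer and assigns the external IP:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer   # asks the cloud provider for an external load balancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080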

KB: Ingress

Ingress has two components:

- Ingress Controller
  The controller runs as pods in the cluster (e.g., Nginx).
  The controller pods are deployed using a Deployment.
  The default route is defined in that Deployment.
  There is NO "kind: IngressController"; there is only "kind: Ingress" once the controller has been deployed in the cluster.

- Ingress Resource
  The resource is used to route the traffic to the correct backend/service.

References:
https://overcast.blog/kubernetes-ingress-controllers-and-ingress-resources-a-practical-guide-7a709dec3e4b
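A minimal illustrative Ingress resource routing one host path to a backend Service (the host, Service name, and ingress class are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx       # which controller should honor this resource
  rules:
  - host: app.example.com       # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web           # hypothetical backend Service
            port:
              number: 80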

KB: RMM vs DEX (Remote Monitoring and Management vs Digital Employee Experience)

Digital Employee Experience (DEX) and Remote Monitoring and Management (RMM) tools serve different purposes and cater to distinct aspects of IT and employee management.

Digital Employee Experience (DEX)
Focus: Enhancing the overall experience of employees with their digital workplace tools and environment.
Key Features:
- User Experience Monitoring: Tracks how employees interact with software and hardware, identifying issues impacting productivity.
- Performance Analytics: Provides insights into the performance of applications from the end user's perspective.
- Feedback Mechanisms: Allows employees to report issues and give feedback on their digital tools and environment.
- Employee Well-being: Monitors factors that affect employee satisfaction and well-being, such as system performance and usability.
- Proactive Support: Identifies potential issues before they become significant problems, enabling proactive IT support.
Goals: Improve employee satisfaction and productivity. Ensure a sm...