Ismat Samadov

© 2026 Ismat Samadov

Tag

Data Engineering

20 articles

15 min read

Change Data Capture Replaced Our Entire ETL Pipeline — Debezium, Postgres, and the Death of Batch

Our 15-minute batch ETL caused a billing incident. Debezium reading the Postgres WAL replaced the entire pipeline. CDC setup, consumer patterns, and production gotchas.

Data Engineering, PostgreSQL, Backend, Architecture
13 min read

ClickHouse Processes 1 Billion Rows Per Second on a Single Node — OLAP for Engineers Who Hate Complexity

A 47-second Postgres query took 120ms on ClickHouse. Columnar storage, vectorized execution, and why your analytics belong in OLAP.

Database, Data Engineering, Performance, Analytics
18 min read

Polars vs DuckDB vs Pandas: The 2026 Decision Guide

Polars is 8.7x faster than pandas. DuckDB is 9.4x faster. Both handle larger-than-RAM data. Here's when to use each — with benchmarks.

Python, Data Engineering, Performance, Analytics
15 min read

Build a RAG Chatbot in 30 Minutes with LangChain and Neon PostgreSQL

Build a RAG chatbot with LangChain, OpenAI embeddings, and Neon PostgreSQL. pgvector, no Pinecone, full Python code, 30 minutes.

AI, Python, LLM, SQL, Data Engineering
13 min read

dbt Is Not Enough: SQLMesh, Coalesce, and the Next Wave of Data Transformation

SQLMesh is 9x faster than dbt, with free dev environments. Fivetran-dbt merger raises lock-in concerns. Coalesce offers visual SQL. Decision framework.

Data Engineering, SQL, Open Source, Developer Tools
13 min read

A Brief History of SQL: The Language That Refuses to Die

SQL was born in 1973 at IBM, survived the NoSQL rebellion, and is now used by 55.6% of all developers. Here's how.

Data Engineering, Opinion, SQL
13 min read

DataGovOps: Why Governance-as-Code Is the 2026 Data Engineering Mandate

Poor data quality costs $12.9M/year per enterprise. DataGovOps automates governance in CI/CD. EU AI Act makes it mandatory by August 2026.

Data Engineering, Compliance, DevOps, Open Source
13 min read

Event-Driven Architecture in 2026: From 'Should We Stream?' to 'How Do We Unify?'

IBM paid $11B for Confluent. 90% of enterprises adopt EDA. Kafka 4.0, Flink 2.0, and the Streamhouse vision are reshaping data infrastructure.

Architecture, Data Engineering, Backend, Infrastructure
15 min read

PostgreSQL Is the Only Database You Need in 2026

PostgreSQL won the Stack Overflow triple crown 3 years straight. With JSONB, pgvector, PostGIS, and full-text search, it replaces 5 databases.

SQL, Data Engineering, Opinion, Backend
18 min read

AI Agents Are the New Microservices: Everyone Wants Them, Almost Nobody Ships Them

The market says $200B by 2034. The data says 95% of agent projects fail before production. Here is what actually works.

AI, Data Engineering, LLM, Opinion
15 min read

Graph RAG: The $7 Knowledge Graph That Beats Standard RAG by 2x (Sometimes)

When Graph RAG doubles retrieval accuracy and when it wastes your money. Benchmarks, costs, frameworks, and a decision framework.

AI, Data Engineering, LLM, Opinion, Python
13 min read

ML Engineering: The 87% Failure Rate and How to Beat It

Nearly 87% of ML projects never reach production. The failures aren't about models — they're about engineering.

Career, Data Engineering, ML, Python
11 min read

From 14 Browser Tabs to 10,000 Jobs: How I Turned Web Scraping Into a Startup

Four years of building Azerbaijan's biggest job aggregator as a solo founder on $25/month infrastructure.

Bootstrapping, Data Engineering, Python, SaaS, Startup, Web Scraping
14 min read

Graph Database vs Vector Database: One Finds Similar Things, the Other Finds Connected Things

Graph databases find connections. Vector databases find similarities. When to use which, real benchmarks, and why PostgreSQL might replace both.

AI, Data Engineering, LLM, Opinion, SQL
13 min read

RAG Is Not As Simple As They Tell You

RAG tutorials teach the easy 20%. Here are the five production problems they skip — and how to actually solve them.

AI, Data Engineering, LLM, Python
14 min read

Correlated vs Non-Correlated Subqueries: The SQL Concept That Breaks Production at 2 AM

The real difference between correlated and non-correlated subqueries, with benchmarks, optimizer behavior, and the NOT IN NULL trap.

Career, Data Engineering, Opinion, SQL
14 min read

I Replaced My Entire ETL Pipeline with SQL — Here's How

How I killed a 2,400-line Python ETL pipeline and replaced it with 300 lines of SQL using CTEs, materialized views, and pg_cron.

SQL, Data Engineering, PostgreSQL, Python
15 min read

Building Data Pipelines in 2026: Airflow vs Dagster vs Prefect

Honest comparison of Airflow, Dagster, and Prefect for data pipelines in 2026. Code examples, pricing, and what I actually use.

Python, Data Engineering, DevOps, Backend
16 min read

Vector Databases Are Overhyped — When You Actually Need One

Most teams don't need Pinecone. pgvector benchmarks, decision framework, and when dedicated vector DBs actually make sense.

AI, Database, PostgreSQL, Data Engineering
16 min read

Web Scraping in 2026 — What Still Works After AI

What actually works for web scraping in 2026: tools, stealth browsers, AI extractors, anti-detection, and the legal reality.

Python, Web Scraping, Automation, Data Engineering