In April 2023, I had 14 browser tabs open. Every morning. Same routine: check hellojob.az, then vakansiya.az, then banker.az, then boss.az, then scroll through LinkedIn, then start over because I forgot which ones I'd already seen. I was a fraud analyst at Kapital Bank in Baku, and the job search process in Azerbaijan was broken in a way that drove me insane.
Three years later, birjob.com aggregates 10,000+ active job listings from 99 sources, runs on $25/month of infrastructure, and is fed by my own army of 128 Python scrapers organized across a dedicated GitHub organization. I built every line of it alone.
This is the story of how a web scraping hobby turned into Azerbaijan's most comprehensive job platform. It's also a story about making every mistake in the book first.
The Numbers That Matter
Before the narrative, here's what birjob.com looks like today:
| Metric | Value |
|---|---|
| Active job listings | 10,000+ |
| Job sources scraped | 99 |
| Candidate profiles scraped | 30,000+ |
| Database tables | 45+ |
| Database indexes | 50+ |
| Monthly infrastructure cost | ~$25 |
| Scraper runtime (full pipeline) | 3-4 minutes |
| Throughput | 35.7 requests/second |
| Blog articles published | 459 |
| VC funding raised | $0 |
The global job aggregation market hit $7.4 billion in 2025 and is projected to reach $16.2 billion by 2032. Azerbaijan's labor force is about 5 million people, with roughly 5.6% unemployment. Every one of those job seekers faces the same problem I had: fragmentation across dozens of local boards with no unified search.
Year 0: The Email Scraper Era (2022-2023)
It started with FOMO. Not startup FOMO. Job FOMO. The fear that the perfect vacancy was sitting on some obscure Azerbaijani job board I hadn't checked that day.
So I did what any developer would do. I wrote a script.
The first version was embarrassing in hindsight. A Python script that scraped about 70 job boards and dumped everything into an email. No database. No deduplication. Just raw listings, keyword-filtered, blasted to my inbox every morning.
The problems were immediate:
- 200 listings per email. Nobody reads that.
- Massive duplicates. The same job posted on 5 boards appeared 5 times.
- No state tracking. I couldn't tell which jobs were new vs. ones I'd already seen yesterday.
- No search. Want to filter by company? Scroll through the email.
But here's what I didn't expect: I started sharing these emails with friends. They loved it. Not because the product was good -- it was terrible -- but because the problem was real. Everyone in Baku was doing the same 14-tab morning ritual.
That was the signal. Not a good product. A good problem.
Year 1: Building 91 Scrapers Before Launch (2023)
I rebuilt everything. Proper database (PostgreSQL). Proper deduplication. A real web interface instead of email dumps. I wanted to aggregate every job board in Azerbaijan before launching.
So I built 91 scrapers.
Each one targeting a different website. Each one living in its own repository under the analisto GitHub organization. 128 repos total now, but the core scraping fleet was 91 at launch.
Here's what each scraper looks like in practice:
```python
import aiohttp

# normalize_title is a shared helper: it lowercases, strips whitespace and
# special characters so the same title looks identical across sources.
async def scrape_jobs(session: aiohttp.ClientSession) -> list[dict]:
    url = "https://target-site.az/api/vacancies"
    async with session.get(url) as response:
        data = await response.json()
    return [
        {
            "title": normalize_title(item["title"]),
            "company": item["company"],
            "apply_link": item["url"],
            "source": "target_site_az",
        }
        for item in data["results"]
    ]
```
Simple, right? Multiply that by 91. Then maintain all of them. Then watch half of them break within the first month.
I'm not exaggerating. Half broke within the first month after launch. CSS selectors changed. APIs got versioned. Sites added Cloudflare protection. One site migrated from REST to GraphQL overnight.
This was the first hard lesson: each scraper is an ongoing maintenance commitment, not a one-time build.
The Technical Decisions That Saved Me
When you're a solo founder running $25/month infrastructure, every architectural decision matters. Here are the ones that actually made a difference.
Why Not Airflow or Kafka?
I process 2,000-7,000 jobs per run. That's CSV-scale data. Airflow is for complex DAGs with branching dependencies. Kafka is for millions of events per second. Using either would be like driving a semi truck to pick up groceries.
My stack:
| Component | Tool | Cost |
|---|---|---|
| Orchestration | GitHub Actions (cron at 08:00 UTC) | Free (2,000 min/month) |
| Database | PostgreSQL on Neon | ~$5/month |
| Frontend | Next.js 14 on Vercel | $20/month (Pro) |
| CDN | Cloudflare | Free |
| Object Storage | AWS S3 | ~$0.50/month |
| Email | Resend | Free tier |
| Error Tracking | Sentry | Free tier |
| AI Analytics | Google Gemini 2.5 Flash | Free tier |
| Notifications | Telegram Bot | Free |
Total: ~$25/month. That's not frugality as a lifestyle choice. That's necessity. Azerbaijan's startup ecosystem ranks 74th globally, and total VC investment across all startups in the country was $2.6 million in 2025. Not per startup. Total. For context, a single seed round in San Francisco can be 10x that.
When you're building in a VC desert, $25/month infrastructure isn't cute. It's survival.
Why Not Playwright?
This was a big one. The instinct is to throw a headless browser at every "dynamic" website. But here's what I learned: 95% of sites that look dynamic have hidden JSON APIs.
Open DevTools. Click the Network tab. Filter by XHR/Fetch. Reload the page. That API endpoint sitting right there will return clean JSON in 200ms. Playwright would take 5-30 seconds for the same data, after loading JavaScript, rendering the DOM, and downloading images you don't need.
I only use Playwright for 2 out of 99 scrapers. The rest hit APIs directly. The speed difference is staggering:
| Approach | Speed |
|---|---|
| Synchronous requests | 2.2 requests/second |
| Async with aiohttp | 35.7 requests/second |
| Playwright | ~0.1-0.3 pages/second |
That 16x improvement from going async is the difference between a 3-minute pipeline and a 45-minute one.
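To make the concurrency concrete, here is a minimal sketch of the fan-out pattern, assuming each scraper module exposes a `scrape_jobs(session)` coroutine like the one shown earlier. The names are illustrative, not the actual birjob code:

```python
import asyncio
import aiohttp

async def run_all(scrapers: list) -> list[dict]:
    """Run every scraper coroutine concurrently over one shared HTTP session."""
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(
            *(scraper(session) for scraper in scrapers),
            return_exceptions=True,  # one broken site should not kill the whole run
        )
    jobs: list[dict] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # in practice: log it and move on; scrapers break constantly
        jobs.extend(result)
    return jobs
```

The `return_exceptions=True` flag is the important part: with 99 sources, something is always failing, and the pipeline has to finish anyway.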
The Three-Level Deduplication System
Duplicates were the bane of the email scraper era. The same job posted on boss.az, hellojob.az, and linkedin.com would appear three times. Users hated it.
I built a three-level system:
Level 1: Python-level normalization. Before anything hits the database, normalize job titles (strip extra spaces, lowercase, remove special characters) and deduplicate within each batch using a company + title hash.
Level 2: Database UPSERT. A unique constraint on apply_link means the database rejects true duplicates. On conflict, it updates last_seen_at instead of inserting a new row.
Level 3: Cross-source hash matching. The same job posted on different boards gets different URLs but identical content. An MD5 hash of the normalized title + company catches 85%+ of these cross-posted duplicates.
I originally tried embedding-based similarity matching with vector comparisons. Fancy. Expensive. Slow. The MD5 approach catches nearly as many duplicates at a fraction of the cost.
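A minimal sketch of what Levels 1 through 3 boil down to, using hypothetical helper names rather than the actual birjob code:

```python
import hashlib
import re

def normalize_title(title: str) -> str:
    """Level 1: lowercase, drop special characters, collapse whitespace."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def content_hash(title: str, company: str) -> str:
    """Level 3: the same job cross-posted on different boards hashes identically."""
    key = f"{normalize_title(title)}|{company.strip().lower()}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Level 2: the unique constraint on apply_link plus an UPSERT keeps true
# duplicates out and refreshes last_seen_at for jobs already stored.
UPSERT_SQL = """
    INSERT INTO jobs (title, company, apply_link, source, content_hash)
    VALUES (%s, %s, %s, %s, %s)
    ON CONFLICT (apply_link)
    DO UPDATE SET last_seen_at = now();
"""
```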
The Database Disaster
Every developer has that story. Mine happened when a scraper bug triggered a hard delete that wiped 3,000 jobs from the database in seconds.
No soft delete flag. No backup recovery. Just gone.
The fix was simple but painful to learn: soft deletes everywhere. Every record now has an is_active flag and a deleted_at timestamp. When a scraper bugs out (and they will), the damage is reversible.
The schema now looks something like:
```sql
CREATE TABLE jobs (
    id            SERIAL PRIMARY KEY,
    title         TEXT NOT NULL,
    company       TEXT,
    apply_link    TEXT UNIQUE,
    source        TEXT NOT NULL,
    content_hash  TEXT,
    is_active     BOOLEAN DEFAULT true,
    first_seen_at TIMESTAMP DEFAULT now(),
    last_seen_at  TIMESTAMP DEFAULT now(),
    deleted_at    TIMESTAMP
);

CREATE INDEX idx_jobs_active ON jobs(is_active) WHERE is_active = true;
CREATE INDEX idx_jobs_source ON jobs(source);
CREATE INDEX idx_jobs_hash ON jobs(content_hash);
```
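With that schema, a scraper never issues a DELETE. A minimal sketch of the pattern, assuming a psycopg2 connection and a hypothetical seven-day staleness window:

```python
import psycopg2  # assumed driver; any PostgreSQL client works the same way

SOFT_DELETE_SQL = """
    UPDATE jobs
    SET is_active = false,
        deleted_at = now()
    WHERE source = %s
      AND last_seen_at < now() - interval '7 days';
"""

def retire_stale_jobs(conn, source: str) -> int:
    """Mark jobs from one source inactive instead of deleting them -- fully reversible."""
    with conn.cursor() as cur:
        cur.execute(SOFT_DELETE_SQL, (source,))
        conn.commit()
        return cur.rowcount
```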
45 tables. 50+ indexes. All running on a $5/month Neon instance. PostgreSQL is absurdly capable at this scale. (If you're building data systems and considering RAG, read my piece on why RAG is harder than it sounds.)
The Maintenance Tax
Building scrapers is fun. Maintaining them is not.
Here's the reality of running 99 scrapers as a solo developer:
| Failure Type | How Often |
|---|---|
| CSS selector changes | 2-3 times/month |
| API endpoint changes | 1-2 times/month |
| Cloudflare/IP blocking | Ongoing (15 scrapers currently disabled) |
| Site goes completely offline | Once per quarter |
| REST to GraphQL migration | Rare but devastating |
| Rate limiting (429 errors) | Weekly |
I spend 3-5 hours per week just fixing broken scrapers. That's not product development. That's janitorial work. But it's the cost of doing business when your product is built on top of other people's websites.
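The only way to keep that tax manageable is finding out about breakage before users do. A minimal sketch of the kind of check I mean, assuming a Telegram bot token and chat id in environment variables; the threshold and names are illustrative, not the actual birjob monitoring code:

```python
import os
import aiohttp

async def alert_if_empty(session: aiohttp.ClientSession, source: str, jobs: list[dict]) -> None:
    """A scraper that suddenly returns zero jobs usually means a changed selector or a new Cloudflare wall."""
    if jobs:
        return
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = {"chat_id": chat_id, "text": f"Scraper '{source}' returned 0 jobs -- check it."}
    async with session.post(url, json=payload) as response:
        response.raise_for_status()
```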
15 scrapers are currently disabled because the sites added Cloudflare protection that blocks GitHub Actions' IP ranges. That's roughly 15% of my sources, just gone, because one company decided to upgrade their anti-bot defenses on a Tuesday afternoon.
If I were starting over, I'd start with 20 sources instead of 91. Each scraper is a long-term liability, not an asset.
The SEO Machine
Here's something most technical founders miss: having the best product doesn't matter if nobody can find it.
I learned this the hard way. BirJob had thousands of jobs but almost no organic traffic for the first six months. So I built a content machine.
459 blog articles on birjob.com/blog. Career guides, salary comparisons, job market analysis. Written in both Azerbaijani and English to capture local search traffic.
But that wasn't enough. I needed backlinks.
So I built reklamyeri.az -- a digital marketing services platform. (I wrote a separate article about how reklamyeri.az works as a backlink engine.) Not because I wanted to be in the marketing business, but because a second domain linking to birjob.com creates authentic backlink signals that Google respects.
I also used Medium strategically. Articles like "How I Built Azerbaijan's Biggest Job Platform on $0 Infrastructure" serve a dual purpose: they tell the founder story (which builds brand), and they link back to birjob.com (which builds SEO authority).
The compounding effect of content + backlinks + a real product with daily-updated data is powerful. Google likes sites that change frequently and have genuine utility. A job board that updates 10,000+ listings daily is exactly that.
The Solo Founder Reality
36.3% of new startups in 2025 were founded by solo founders, up from 23.7% in 2019. The solo founder model is growing fast, especially in bootstrapped companies where 38% have solo founders.
What nobody tells you about being a solo founder:
You're not just a developer anymore. You're a developer, product manager, designer, marketer, support team, content writer, SEO specialist, and accountant. The code is maybe 30% of the work.
The loneliest moment isn't building. It's when something breaks at 2am and there's nobody to page. No co-founder to share the panic with. Just you and a stack trace.
"I can build it" is a trap. I spent years building features nobody asked for. The real skill isn't coding. It's figuring out what to build. As I wrote in my startup journey reflection: "Can a single developer -- gifted at building but inexperienced at selling -- start several products and eventually discover a business that pays? Yes, but only if you learn to replace the 'I can build it' faith with 'people will pay for this' validation early and repeatedly."
ScrapingBee, probably the closest comparison to my journey, took 5 years and multiple failed products before hitting $1M ARR. They eventually sold for eight figures. The timeline for scraping-based businesses is long.
What I'd Do Differently
If I started birjob.com today with everything I know:
1. Start with 20 scrapers, not 91. Cover the top job boards and prove demand before committing to the long tail. Each scraper is a maintenance liability.
2. Build the audience before the product. Those 459 blog articles should have come first. SEO compounds over months. Starting content early means organic traffic is waiting when the product launches.
3. Charge something from day one. Even AZN 5/month. Not for revenue -- for signal. Paying users tell you what matters. Free users tell you what's nice to have.
4. Don't overbuild the deduplication system. I started with vector embeddings and similarity matching. An MD5 hash does 85% of the work at 1% of the complexity. Start simple.
5. Accept that 15% of your scrapers will be broken at any given time. Plan for it. Communicate it. Users don't need every source -- they need the best sources to be reliable.
Building in Azerbaijan's Startup Desert
I want to be honest about the context. Azerbaijan's startup ecosystem is real but young.
The country ranks 74th globally for startups. Baku specifically ranks 297th among cities. Total venture capital investment across all Azerbaijani startups in 2025 was $2.6 million -- split among 22 startups.
For comparison, Armenia ranks 54th globally and offers 0% corporate tax for software companies. Kazakhstan has 1,500+ startups with $130 million in venture investments.
Azerbaijan's Civil Code lacks mechanisms supporting venture capital and business angel funding. There's no legal framework for SAFEs or convertible notes -- the basic instruments that power Silicon Valley fundraising.
But here's the thing: constraints create clarity. When there's no VC to chase, you build things that make money. When there's no startup ecosystem to lean on, you learn everything yourself. When your infrastructure budget is $25/month, you choose tools that actually work instead of tools that look good on a pitch deck.
The government is trying. KOBIA has certified 247 SMEs as startups since 2021. Grants totaling 2.5 million manats have gone to 130 startups. Three VC funds launched between 2022 and 2024. An AI Strategy for 2025-2028 was approved.
But the reality for most Azerbaijani founders in 2026 is still: bootstrap or don't start.
What I Actually Think
Four years in, here's my honest take.
Web scraping is the best business model nobody respects. It sounds unglamorous. "You just... copy data from websites?" Yes. And that data, cleaned, deduplicated, and presented well, is more valuable than most SaaS products I've seen funded for millions.
ScrapingBee bootstrapped to $5M ARR and an eight-figure exit. ScraperAPI grew to $400K/month. The data extraction market isn't small -- it's just quiet.
Solo founding is underrated for local-market products. If I had a co-founder, we'd have argued about whether to target Azerbaijan or go global. Going global with a job aggregator means competing with Indeed, LinkedIn, and Glassdoor. Going local means I have no real competition and deep understanding of the market. A solo founder with local knowledge can move faster than a well-funded team without it.
$25/month infrastructure is not a limitation -- it's a competitive advantage. My burn rate is essentially zero. I can't run out of money. I can't be pressured by investors to pivot. I can experiment for years because the cost of being wrong is a weekend of debugging, not a board meeting.
Azerbaijan will have its breakout startup within 5 years. The infrastructure is being built. The talent pool is growing. The government is (slowly) catching up on regulation. Someone building in Baku right now, probably on $25/month of free-tier infrastructure, will build something that puts the ecosystem on the map.
Maybe it'll be birjob. Maybe it won't. But I know this: the problem is real, the market is growing, and I'm not stopping.
Sources
- Research and Markets -- Online Job Search Platform Market
- World Bank -- Azerbaijan Labor Force Data
- Trading Economics -- Azerbaijan Labor Statistics
- StartupBlink -- Azerbaijan Startup Ecosystem
- Caliber.az -- What's Holding Back Azerbaijan's Startups
- Carta -- Solo Founders Report 2025
- SaaStr -- Solo Founder Statistics
- ScrapingBee -- Journey to $1M ARR
- Failory -- ScraperAPI Interview
- BirJob -- Solo Data Pipeline: 99 Sources