The unfiltered account of week one — from buying the domain to a working Hindi-English voice agent in 7 days. Every decision, every mistake, every API key.
I've been a product designer for 4 years. I know how to design software. I do not know how to build it. At least, I didn't before this week.
Here's exactly what happened in week one of building DeskSathi — the AI voice receptionist I'm building for Indian businesses.
Started with desksathi.com — tried about 40 variations of "vaani" and every good domain was taken. DeskSathi came from an hour of frustration with Claude, who kept suggesting names that were already registered.
Set up 6 accounts: Twilio (phone), Deepgram (speech to text), ElevenLabs (text to speech), Anthropic (the AI brain), Supabase (database), Railway (server). Sign-up cost: ₹0, since all six have free tiers.
Twilio's free US number works immediately. Indian numbers require KYC, which takes 48 hours. Start that process on Day 1, not Day 3.
Deployed n8n on Railway. It's a drag-and-drop workflow tool that acts as the conductor between all the APIs. Connected Twilio to n8n: when someone calls the number, n8n wakes up.
Day 3 was the hardest: WebSockets. Audio streaming from Twilio to Deepgram in real time. This part required actual code — Claude wrote it, I pasted it, it didn't work, Claude debugged it, I pasted again. Took 6 hours instead of the planned 3.
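For anyone wondering what the audio streaming step looks like, here is a minimal sketch of one piece of it, not the actual DeskSathi code. Twilio Media Streams sends JSON messages over the WebSocket, and only messages with `event: "media"` carry audio, as base64-encoded mu-law at 8 kHz. The helper name `extractAudio` is made up for illustration.

```javascript
// Sketch: turn one Twilio Media Streams WebSocket message into raw audio bytes.
// extractAudio is a hypothetical helper name, not from the original project.
function extractAudio(rawMessage) {
  let msg;
  try {
    msg = JSON.parse(rawMessage);
  } catch {
    return null; // ignore anything that is not valid JSON
  }
  // "connected", "start", "stop" and "mark" events carry no audio payload
  if (msg.event !== "media" || !msg.media || typeof msg.media.payload !== "string") {
    return null;
  }
  // The payload is base64 text; decode it to the underlying mu-law bytes
  return Buffer.from(msg.media.payload, "base64");
}
```

In a real server, each non-null buffer would be forwarded to the speech-to-text socket, which has to be told the audio is mu-law at 8000 Hz. The reconnects, buffering, and error handling that ate the 6 hours are not shown here.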
Day 4: Claude API responding to transcribed speech. Asked it "mujhe appointment chahiye" and it replied in text. No voice yet — just text on a screen. Still exciting.
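As a rough illustration of this step, the sketch below builds the JSON body for a call to Anthropic's Messages API from a transcript. The helper name `buildMessagesRequest`, the `max_tokens` value, and the shape of `history` are assumptions for the example. The model name is left to the caller because the post doesn't say which one was used.

```javascript
// Sketch: build the request body for Anthropic's Messages API.
// buildMessagesRequest is a hypothetical helper, not the project's actual code.
function buildMessagesRequest({ model, systemPrompt, history, transcript }) {
  return {
    model,           // supplied by the caller; no specific model is assumed
    max_tokens: 200, // assumption: short replies suit a phone call
    system: systemPrompt,
    // Earlier turns first, then the caller's latest transcribed sentence
    messages: [...history, { role: "user", content: transcript }],
  };
}
```

The body would then be sent as a POST to `https://api.anthropic.com/v1/messages` with the `x-api-key` and `anthropic-version` headers. The reply text comes back in the response's `content` array.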
Day 5: ElevenLabs speaking Claude's text back to the caller. This was the moment. I called the Twilio number, said something in Hindi, and heard a real voice reply. The system prompt said it was the receptionist at "Smile Dental, Indirapuram." It actually sounded like a receptionist.
"Namaste, Smile Dental mein aapka swagat hai. Kaise help kar sakti hoon?" — in a real human-sounding Hindi voice, responding to what I said. I recorded a Loom immediately.
Ran 30 test calls. Most worked. Some broke in embarrassing ways: the AI offered an appointment time that the clinic "doesn't have", and when I asked again, it invented a different one. Fixed with better prompt engineering.
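The post doesn't spell out what the prompt fix was, so the following is only one plausible shape for it: list the real slots in the system prompt, then double-check the reply in code. The slot list, the function names, and the `HH:MM` time format are all assumptions made for this sketch.

```javascript
// Hypothetical slot list; a real system would read this from the clinic's calendar.
const OPEN_SLOTS = ["10:00", "11:30", "16:00"];

// Build a system prompt that spells out the only times the agent may offer
function buildSystemPrompt(clinicName, slots) {
  return [
    `You are the receptionist at ${clinicName}.`,
    `Only offer these appointment times: ${slots.join(", ")}.`,
    "If the caller asks for any other time, say it is unavailable and offer one from the list.",
    "Never invent a time that is not in the list.",
  ].join("\n");
}

// Extra guard: return any HH:MM times in the reply that are not real slots
function findInventedTimes(reply, slots) {
  const mentioned = reply.match(/\b\d{1,2}:\d{2}\b/g) || [];
  return mentioned.filter((time) => !slots.includes(time));
}
```

If `findInventedTimes` returns anything, the reply can be regenerated instead of being spoken to the caller. A prompt alone reduces made-up times but doesn't guarantee they never appear, which is the reason for checking in code as well.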
Added barge-in handling (when the caller interrupts, the AI stops talking). Added Google Calendar booking. Recorded a 4-minute Loom showing the full flow: call → Hindi conversation → Google Calendar appointment → WhatsApp confirmation.
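Barge-in can be sketched like this, assuming a bidirectional Twilio Media Stream where the server plays audio back over the same WebSocket. Twilio accepts a `clear` message that drops any audio it has buffered but not yet played. The class name, method names, and the `send` callback are invented for illustration.

```javascript
// Sketch: stop the agent's audio when the caller starts talking over it.
// BargeInController is a hypothetical name, not from the original project.
class BargeInController {
  constructor(streamSid, send) {
    this.streamSid = streamSid; // stream ID from Twilio's "start" event
    this.send = send;           // function that writes a string to the Twilio WebSocket
    this.agentSpeaking = false;
  }

  agentStartedSpeaking() {
    this.agentSpeaking = true;
  }

  agentFinishedSpeaking() {
    this.agentSpeaking = false;
  }

  // Call this when the speech-to-text service reports caller speech.
  // Returns true if playback was interrupted.
  callerSpoke() {
    if (!this.agentSpeaking) return false;
    // Ask Twilio to discard the audio it has queued for playback
    this.send(JSON.stringify({ event: "clear", streamSid: this.streamSid }));
    this.agentSpeaking = false;
    return true;
  }
}
```

A fuller version would also cancel the in-flight text-to-speech request, so no new audio arrives after the clear.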
1. Claude Code is the unlock. I'm a designer, not a developer. Without Claude Code writing the Node.js, I'd have taken 6 weeks for what took 7 days.
2. Audio streaming is harder than everything else. Every other part was API calls. Audio streaming in real time is a different beast. Budget more time.
3. The demo IS the product at this stage. One 4-minute Loom showing a real booking is worth 40 slides of pitch deck.
4. Total spend: ₹3,200. That covers the domain (₹1,700), the Twilio number ($1.15), and API testing credits. Everything else was free tier.