AI-powered travel negotiation agents that find local vendors, call them, and negotiate prices in Hindi/Hinglish
DesiYatra is a sophisticated multi-agent AI system designed to automate travel service negotiations in India. The system finds local vendors (taxi drivers, hotels, tour guides), makes phone calls in Hindi/Hinglish, and negotiates the best prices on your behalf.
- ๐ค Multi-Agent Orchestration: 4 specialized agents (Scout, Safety, Bargainer, Munshi) working in harmony.
- ๐ฃ๏ธ Voice-Native Negotiation: Conducts real phone calls in Hindi/Hinglish with natural Indian accents.
- ๐ง Smart Bargaining: Uses Vector Search to retrieve negotiation tactics and adapts to vendor psychology.
- ๐ก๏ธ Safety First: Vets vendors against a fraud database and analyzes call transcripts in real-time.
- โก High Performance: Parallel vendor discovery and async execution for fast results.
- ๐ Full Observability: Real-time session tracking with Redis and persistent storage in Supabase.
- User Request: User submits a trip request with destination, budget, party size, and requirements list.
- Scout Agent (The Finder):
- Parallel Search: Executes Google Maps Grounding, Google Search Grounding, JustDial (simulated), and IndiaMart (simulated) concurrently using
ParallelAgent. - Deduplication: Merges results and removes duplicates based on phone numbers.
- Market Rate Calculation: Automatically calculates market rate from vendor quotes.
- Result: A deduplicated list of 10-15 potential vendors with phone numbers and estimated market rate.
- Parallel Search: Executes Google Maps Grounding, Google Search Grounding, JustDial (simulated), and IndiaMart (simulated) concurrently using
- Safety Officer (The Vetter):
- Queries BigQuery data warehouse for vendor history.
- Uses custom
SafetyDecisionPlannerfor nuanced risk assessment. - Assigns risk scores and filters unsafe vendors.
- Result: A filtered list of "Safe" vendors to contact.
- Bargainer Agent (The Negotiator):
- Call Initiation: Creates Twilio call with immediate WebSocket connection via Media Streams.
- Hybrid Voice System:
- TTS: Sarvam AI Bulbul (Hindi, natural Indian accent) via streaming WebSocket.
- STT: Google Speech-to-Text with
phone_callmodel optimized forhi-IN.
- Negotiation Brain: LLM-powered decision engine that:
- Retrieves tactics from Vector Knowledge Base (semantic search).
- Adapts to vendor psychology (stubborn/flexible profiles).
- Uses custom
NegotiationPlannerfor strategic price decisions. - Speaks in natural Hindi/Hinglish with filler words to mask latency.
- State Management: Firestore for persistent call state, round tracking, and conversation history.
- Result: User receives confirmed bookings with negotiated prices or best available offers.
graph TD
User[User Request] --> Munshi[Munshi Orchestrator]
subgraph "Phase 1: Discovery"
Munshi --> Scout[Scout Agent]
Scout -->|Parallel Search| GMaps[Google Maps]
Scout -->|Parallel Search| GSearch[Google Search]
Scout -->|Parallel Search| JustDial[JustDial Sim]
end
subgraph "Phase 2: Safety"
Scout -->|Vendor List| Safety[Safety Officer]
Safety -->|Check History| DB[(Supabase)]
Safety -->|Risk Score| SafeList[Safe Vendors]
end
subgraph "Phase 3: Negotiation"
SafeList --> Bargainer[Bargainer Agent]
Bargainer -->|Voice Call| Twilio[Twilio]
Bargainer -->|TTS/STT| Sarvam[Sarvam AI]
Bargainer -->|Strategy| VectorDB[(Vertex AI Vector Search)]
end
Bargainer -->|Deal Confirmed| Result[Final Booking]
| Component | Technology | Purpose |
|---|---|---|
| Backend | Python 3.12 | Core application |
| AI Framework | Google ADK | Multi-agent orchestration |
| LLM | Google Gemini 2.5 Pro/Flash | Agent reasoning |
| Planners | Custom Domain Planners | Negotiation, vendor selection, safety decisions |
| Vector Search | Vertex AI Matching Engine | Semantic knowledge base search |
| Embeddings | text-embedding-004 | 768-dimensional vectors |
| API Server | FastAPI | REST API endpoints |
| Database | Supabase (PostgreSQL) | Vendor & trip data |
| Cache/State | Redis | Session state & pub/sub |
| Telephony | Twilio Media Streams | Voice calls with WebSocket streaming |
| Voice (TTS) | Sarvam AI Bulbul | Hindi text-to-speech (streaming) |
| Voice (STT) | Google Speech-to-Text | Hindi speech-to-text (phone_call model) |
| Search | Google Grounding API | Vendor discovery |
| Package Manager | uv | Fast Python package management |
- WebSocket Endpoint (
/twilio/stream/{call_id}): Handles bidirectional audio streaming - Immediate Connection: WebSocket connects when call answers, greeting plays via stream
- Hybrid Voice: Integrates Sarvam TTS + Google STT
- ParallelAgent: Runs 4 searches concurrently (Google Maps, Google Search, JustDial, IndiaMart)
- Deduplication: Removes duplicate vendors by phone number
- Market Rate: Calculates average from vendor quotes
- LlmAgent: Uses custom
SafetyDecisionPlanner - BigQuery Integration: Queries vendor history
- Risk Scoring: Green/Yellow/Red classification
- Streaming Negotiator (
streaming_negotiator.py): Async loop with 6-round max - Negotiation Brain (
negotiation_brain.py): LLM prompt engineering for Hindi/Hinglish - Voice Handler (
google_stt_voice.py):- Sarvam TTS: MP3 โ ffmpeg โ mulaw โ Twilio
- Google STT: mulaw โ base64 โ Google Cloud
- Atomic Tools (
atomic_tools.py):initiate_call,send_message,accept_deal,end_call
- NegotiationPlanner: Strategic price decisions based on vendor psychology
- VendorSelectionPlanner: Ranks vendors by rating/distance
- SafetyDecisionPlanner: Nuanced risk assessment
- Firestore: Call state, conversation history, round tracking
- Redis: Session queues (optional)
- Supabase: Vendor database, trip records
- Docker & Docker Compose (Required)
- ngrok (Required for Twilio webhooks)
- gcloud CLI (Required for GCP setup) - Install Guide
# 1. Clone repository
git clone https://github.com/VardhmanSurana/DesiYatra
cd DesiYatra
# 2. Automated GCP Setup (One-time)
python scripts/setup_gcp.py
# This will:
# - Enable required GCP APIs (Vertex AI, Firestore, Speech-to-Text, BigQuery)
# - Create service account with permissions
# - Generate gcp-credentials.json
# - Update .env file automatically
# 3. Add remaining API keys to .env
# - SARVAM_API_KEY (from https://www.sarvam.ai/)
# - TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER
# - SUPABASE_URL, SUPABASE_KEY
# 4. Start all services
docker-compose up --build
# 5. In another terminal, start ngrok
ngrok http 8000
# Copy the https:// URL and update WEBHOOK_BASE_URL in your .env file
# 6. Restart the container to apply webhook URL
docker-compose restart agents
# API will be available at http://localhost:8000# Prerequisites: Python 3.12+, uv, ffmpeg
# 1. Clone repository
git clone https://github.com/VardhmanSurana/DesiYatra
cd DesiYatra
# 2. Install dependencies (using uv)
uv pip install -r requirements.txt
# 3. Setup Environment
cp .env.example .env
# 4. Configure API Keys in .env
# Required:
# - GOOGLE_API_KEY (Gemini)
# - TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER
# - SUPABASE_URL, SUPABASE_KEY
# - SARVAM_API_KEY (For Hindi Voice)
# - GOOGLE_CLOUD_PROJECT (For Firestore and other GCP services)
# 5. Start Infrastructure Services (Redis & PostgreSQL)
docker-compose up -d redis postgres
# 6. Run the Application (in development mode, with auto-reload)
fastapi dev agents/main.py
# 7. Start ngrok to expose your local server
ngrok http 8000
# Copy the https:// URL and update WEBHOOK_BASE_URL in your .env fileUser Request: "I need a taxi for 4 people from Manali Bus Stand to Solang Valley tomorrow morning. Budget is โน2500."
Agent Action:
- Scout: Finds "Himalayan Cabs" (+91...) and "Manali Taxi Union" (+91...). Calculates market rate is ~โน2800.
- Safety: Vets vendors. "Himalayan Cabs" has 4.5 stars and no fraud history -> SAFE.
- Bargainer: Calls "Himalayan Cabs".
- Vendor: "Sir, โน3500 lagega." (It will cost โน3500)
- Agent: "Bhaiya, market rate toh โน2800 chal raha hai. Hum regular aate hain." (Brother, market rate is โน2800. We come regularly.)
- Vendor: "Chalo โน3000 de dena." (Okay, give โน3000.)
- Agent: "โน2900 final karte hain, abhi book kar dijiye." (Let's finalize at โน2900, book it now.)
- Vendor: "Theek hai sir." (Okay sir.)
- Result: Booking confirmed at โน2900 (saved โน600).
.venv/bin/python -m pytest tests/ -v# Test Agent Architecture (Scout + Bargainer Structure)
.venv/bin/python -m pytest tests/test_refactored_agents.py -v
# Test Bargainer Logic (Negotiation State Machine) - New full test
.venv/bin/python -m pytest tests/test_bargainer_full.py -v
# Test Database Integration
.venv/bin/python -m pytest tests/test_database.py -v
# Run live test call (as vendor)
.venv/bin/python scripts/live_test_call.py- Twilio Error 11200: Ensure
WEBHOOK_BASE_URLin.envmatches your ngrok URL. - Redis Connection Error: Ensure
docker-compose up -d redis postgresis running. - Firestore 403 Error: Ensure Google Cloud Project ID is set,
GOOGLE_APPLICATION_CREDENTIALSpoints to a valid JSON key, and Cloud Firestore API is enabled for your project. - Sarvam TTS Failure: Verify
SARVAM_API_KEYis valid. Ensurelog_levelisDEBUGfor detailed errors. ModuleNotFoundErrorduring tests: Ensure you run tests via.venv/bin/python -m pytest ...to use the correct Python environment.
# Google AI
GOOGLE_API_KEY=your_google_api_key
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcp-credentials.json
# Sarvam AI (Hindi TTS/STT)
SARVAM_API_KEY=your_sarvam_api_key
# Twilio
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890
# Supabase
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
# Webhooks
WEBHOOK_BASE_URL=https://your-ngrok-url.ngrok-free.dev# Redis (Local development via Docker)
# REDIS_URL=redis://localhost:6379
# Vector Search (Vertex AI) - For production knowledge base
# Leave commented to use mock mode during development
# VECTOR_INDEX_ENDPOINT_ID=projects/PROJECT/locations/REGION/indexEndpoints/ID
# VECTOR_DEPLOYED_INDEX_ID=your_deployed_index_id
# Application Settings
LOG_LEVEL=DEBUG # Set to DEBUG for detailed logs during development
AGENT_NAME=DesiYatraWhen creating trip requests, you must provide:
destination: Where user wants to gobudget_max: User's maximum budgetparty_size: Number of people traveling (NEW - required for accurate quotes)category: Service type (taxi, hotel, etc.)vendor_type: Specific type of vendor (e.g., "Taxi", "Hotel", "Restaurant")requirements: List of specific needs (e.g.,["one-way trip", "2 days stay", "AC seat reserved"])
Note: market_rate is now automatically calculated by Scout agent - no need to provide it!
Current: Sarvam AI Bulbul (Streaming via WebSocket)
- Approach: Real-time streaming via Twilio Media Streams
- Process:
- WebSocket connects immediately when call answers
- Greeting plays via streaming TTS (no pre-generated files)
- Subsequent responses stream in real-time
- Audio Pipeline: MP3 โ ffmpeg conversion โ mulaw โ Twilio
- Latency Optimization: LLM instructed to use Hindi filler words (e.g., "เคนเคพเค", "เคเฅ") to mask generation latency
- Voice:
hitesh(male) ormanisha(female) - Format: mulaw (Twilio compatible, 8000Hz, 8-bit, mono)
- Model: bulbul:v2
- Languages: 11 Indian languages supported
- Quality: Natural-sounding speech with authentic Indian accents
Current: Google Speech-to-Text (Streaming via Twilio Media Streams)
- Model:
phone_call(optimized for telephony audio) - Language:
hi-IN(Hindi-India) for improved Hinglish recognition - Audio Format: mulaw 8kHz from Twilio โ base64 decoded โ Google STT
- Buffering: 5-second audio buffer for better transcription accuracy
The negotiation agent uses Vertex AI Vector Search for semantic knowledge base queries. This enables finding relevant negotiation tactics based on meaning, not just keywords.
By default, the system uses mock mode with 3 pre-loaded tactics:
- Stubborn vendor handling
- Long-distance trip negotiation
- Trust building phrases
No setup required - works out of the box!
To deploy real Vector Search:
# 1. Set up Google Cloud credentials
export GOOGLE_CLOUD_PROJECT=your-project-id
gcloud auth login
# 2. Enable required APIs
gcloud services enable aiplatform.googleapis.com
# 3. Run deployment script
python scripts/setup_vector_search.py
# 4. Add IDs to .env (script will output these)
VECTOR_INDEX_ENDPOINT_ID=projects/.../indexEndpoints/...
VECTOR_DEPLOYED_INDEX_ID=your_deployed_index_idInitial tactics include:
- Stubborn vendor strategies
- High quote handling
- Group discount negotiation
- Rejection handling
- Trust building
- Closing tactics
Cost: ~$75/month for 1-2 replicas (can scale to zero when not in use)
- Native Google Grounding - Maps + Search integration
- Parallel-Sequential Scout - 4x faster vendor discovery using
ParallelAgent - Streaming Negotiation - Real-time voice conversation via WebSocket
- Atomic Tools - Composable, testable negotiation actions
- Type Safety - Pydantic schemas for all agent outputs
- Custom Domain Planners - Replaced BuiltInPlanner with specialized logic:
- NegotiationPlanner: Strategic price decisions
- VendorSelectionPlanner: Intelligent vendor ranking
- SafetyDecisionPlanner: Nuanced risk assessment
- Automatic Market Rate Calculation - No hardcoded values
- Vector Search Integration - Semantic knowledge base (Vertex AI)
- Party Size Awareness - Accurate quotes for groups
- Persistent Sessions - Firestore for call state, round tracking, and conversation history
- Async Execution - Concurrent operations with semaphores
- Immediate WebSocket Connection - Connects parallel to first message for zero latency
- Hybrid Voice System - Sarvam TTS + Google STT for optimal quality
- Robust Negotiation Brain - Validates context fields and includes specific refusal handling
- Perceived Latency Optimization - LLM instructed to use Hindi filler words for faster audio streaming
- Requirements as List - Structured multi-requirement support (e.g.,
["one-way trip", "AC vehicle"])
- Scout Agent: ~4x faster (parallel searches)
- Reliability: 85% โ 99% (deterministic workflows)
- Negotiation Quality: Rule-based โ LLM-reasoned with custom planners
- Market Rate Accuracy: Hardcoded โ Calculated from real vendor data
- Knowledge Base: Keyword matching โ Semantic vector search
- Cost: 1 LLM call โ 3-6 calls per negotiation (higher intelligence)
- TTS Latency: Reduced significantly with streaming and filler words
- Call Setup: WebSocket connects immediately (parallel to greeting)
- Multi-vendor bidding (vendors compete)
- Human-in-the-loop escalation
- More search sources (Sulekha, UrbanClap)
- Multi-language support (Tamil, Telugu, Bengali)
- Video call negotiations
- AI-powered price prediction
- Blockchain-based deal verification
- Mobile app integration
Proprietary - All rights reserved
Built with โค๏ธ for Indian travelers
For issues or questions:
- Create an issue in the repository
- Contact: iamvardhmansurana2004@gmail.com
Made in India ๐ฎ๐ณ | For India ๐ฎ๐ณ
