
GitHub - NickTikhonov/shuo: sub-500ms latency phone agent orchestration

NickTikhonov

shuo 说

A voice agent framework in ~600 lines of Python.

python main.py +1234567890
🚀 Server starting on port 3040
✓  Ready https://mature-spaniel-physically.ngrok-free.app
📞 Calling +1234567890...
✓  Call initiated SID: CA094f2e...
🔌 WebSocket connected
▶  Stream started SID: MZ8a3b1f...
← Flux EndOfTurn "Hey, how's it going?"
◆ LISTENING → RESPONDING
→ Start Agent "Hey, how's it going?"
← Agent turn done
◆ RESPONDING → LISTENING

How it works

Two abstractions, one pure function:

  • Deepgram Flux: always-on STT + turn detection over a single WebSocket
  • Agent: self-contained LLM → TTS → Player pipeline, owns conversation history
  • process_event(state, event) → (state, actions): the entire state machine in ~30 lines

Everything streams. LLM tokens feed TTS as soon as they arrive, and TTS audio feeds Twilio as soon as it is synthesized. If the caller interrupts (barge-in), the agent cancels everything and clears the audio buffer instantly.
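To make the streaming and barge-in behavior concrete, here is a minimal sketch using asyncio. It is not the code in agent.py: fake_llm, fake_tts, Player, and respond are hypothetical stand-ins for the real LLM, TTS, and Twilio player, and cancellation is modeled as plain task cancellation.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_llm(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM: yields tokens one at a time."""
    for token in ["Doing ", "well, ", "thanks ", "for ", "asking."]:
        await asyncio.sleep(0.01)  # simulated per-token latency
        yield token


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for streaming TTS: one audio chunk per token."""
    async for token in tokens:
        yield token.encode("utf-8")


class Player:
    """Hypothetical stand-in for the outbound audio buffer."""

    def __init__(self) -> None:
        self.buffer: List[bytes] = []

    def send(self, chunk: bytes) -> None:
        self.buffer.append(chunk)

    def clear(self) -> None:
        self.buffer.clear()


async def respond(prompt: str, player: Player) -> None:
    """Forward each chunk as soon as it exists; on cancel, drop queued audio."""
    try:
        async for chunk in fake_tts(fake_llm(prompt)):
            player.send(chunk)
    except asyncio.CancelledError:
        player.clear()  # barge-in: nothing stale should keep playing
        raise


async def demo_barge_in() -> List[bytes]:
    player = Player()
    task = asyncio.create_task(respond("Hey, how's it going?", player))
    await asyncio.sleep(0.025)  # let a couple of chunks through
    task.cancel()               # the caller started talking
    try:
        await task
    except asyncio.CancelledError:
        pass
    return player.buffer


async def demo_full_turn() -> bytes:
    player = Player()
    await respond("Hey, how's it going?", player)
    return b"".join(player.buffer)
```

Running asyncio.run(demo_barge_in()) leaves the buffer empty, while asyncio.run(demo_full_turn()) collects the whole reply. Because each stage is an async generator, cancelling the one task unwinds the whole chain.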

LISTENING ──EndOfTurn──→ RESPONDING ──Done──→ LISTENING
    ↑                        │
    └────StartOfTurn─────────┘  (barge-in)
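The diagram above can be written as a pure function over immutable values. The sketch below is one possible shape, not the contents of state.py: the event names EndOfTurn and StartOfTurn come from the diagram, while Phase, State, AgentDone, StartAgent, and CancelAgent are assumed names chosen for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Tuple, Union


class Phase(Enum):
    LISTENING = "LISTENING"
    RESPONDING = "RESPONDING"


# Events (fields are assumptions)
@dataclass(frozen=True)
class EndOfTurn:
    transcript: str


@dataclass(frozen=True)
class StartOfTurn:
    pass


@dataclass(frozen=True)
class AgentDone:
    pass


# Actions (hypothetical names)
@dataclass(frozen=True)
class StartAgent:
    transcript: str


@dataclass(frozen=True)
class CancelAgent:
    pass


Event = Union[EndOfTurn, StartOfTurn, AgentDone]
Action = Union[StartAgent, CancelAgent]


@dataclass(frozen=True)
class State:
    phase: Phase = Phase.LISTENING


def process_event(state: State, event: Event) -> Tuple[State, Tuple[Action, ...]]:
    """Return the next state and the side effects to run; never perform them here."""
    if state.phase is Phase.LISTENING and isinstance(event, EndOfTurn):
        return replace(state, phase=Phase.RESPONDING), (StartAgent(event.transcript),)
    if state.phase is Phase.RESPONDING and isinstance(event, StartOfTurn):
        # Barge-in: stop the agent and go back to listening.
        return replace(state, phase=Phase.LISTENING), (CancelAgent(),)
    if state.phase is Phase.RESPONDING and isinstance(event, AgentDone):
        return replace(state, phase=Phase.LISTENING), ()
    return state, ()  # any other combination is ignored
```

Since the function only returns data, it can be tested without a phone call, a WebSocket, or any API key: feed it a state and an event, then compare the result.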

Project structure

shuo/
  types.py              # Immutable state, events, actions
  state.py              # Pure state machine (~30 lines)
  conversation.py       # Main event loop
  agent.py              # LLM → TTS → Player pipeline
  log.py                # Colored logging
  server.py             # FastAPI endpoints
  services/
    flux.py             # Deepgram Flux (STT + turns)
    llm.py              # OpenAI GPT-4o-mini streaming
    tts.py              # ElevenLabs WebSocket streaming
    tts_pool.py         # TTS connection pool (warm spares)
    player.py           # Audio playback to Twilio
    twilio_client.py    # Outbound calls + message parsing
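The "warm spares" idea noted for tts_pool.py can be illustrated roughly as follows. This is a generic sketch, not the repository's implementation: WarmPool, its methods, and the connect callback are all hypothetical, and a real pool would also need to close connections and handle connect failures.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Generic, Set, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Opens `size` connections ahead of time so acquire() rarely waits."""

    def __init__(self, connect: Callable[[], Awaitable[T]], size: int = 2) -> None:
        self._connect = connect
        self._size = size
        self._spares: "asyncio.Queue[T]" = asyncio.Queue()
        self._refills: Set["asyncio.Task[None]"] = set()

    @property
    def spares(self) -> int:
        return self._spares.qsize()

    async def start(self) -> None:
        # Open all spares concurrently before the first turn.
        conns = await asyncio.gather(*(self._connect() for _ in range(self._size)))
        for conn in conns:
            self._spares.put_nowait(conn)

    async def _refill(self) -> None:
        self._spares.put_nowait(await self._connect())

    async def acquire(self) -> T:
        conn = await self._spares.get()
        # Replace the spare in the background; keep a reference so the
        # task is not garbage-collected before it finishes.
        task = asyncio.create_task(self._refill())
        self._refills.add(task)
        task.add_done_callback(self._refills.discard)
        return conn


async def demo():
    ids = itertools.count()

    async def connect() -> int:
        await asyncio.sleep(0.01)  # simulated handshake latency
        return next(ids)

    pool = WarmPool(connect, size=2)
    await pool.start()
    first = await pool.acquire()
    second = await pool.acquire()
    await asyncio.sleep(0.05)  # give the background refills time to land
    return first, second, pool.spares
```

The point of the design is that the handshake cost is paid between turns rather than at the start of a response, which matters when the latency budget is a few hundred milliseconds.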

Setup

Requires Python 3.9+, ngrok, and API keys for Twilio, Deepgram, OpenAI, and ElevenLabs.

pip install -r requirements.txt
cp .env.example .env   # fill in your keys
ngrok http 3040        # in another terminal
python main.py +1234567890
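A quick pre-flight check can catch an unfilled key before a call is placed. The variable names below are guesses based on each provider's usual conventions, not names confirmed by this project; use the ones listed in .env.example. The sketch also assumes the values are already exported into the process environment.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical names; replace with the keys that .env.example actually lists.
REQUIRED_KEYS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


def missing_keys(env: Mapping[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def check_or_exit() -> None:
    """Exit with a readable message if any key is missing from os.environ."""
    missing = missing_keys(os.environ)
    if missing:
        raise SystemExit("Missing keys: " + ", ".join(missing))
```

Calling check_or_exit() at startup turns a confusing mid-call authentication failure into an immediate, named error.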

Tests

python -m pytest tests/ -v   # runs in ~0.03s

License

MIT