Voice Input
Speak to your AI coding agent instead of typing. Styrby captures audio on the mobile app, sends it to a transcription service you configure, then delivers the transcribed text to your agent as a normal message. Included on Pro and Growth. Mobile only.
Styrby does not include a transcription service and never processes or stores your audio. Audio goes directly from your phone to whichever transcription endpoint you configure. Your code conversations stay private.
Overview
| Property | Value |
|---|---|
| Tier requirement | Pro and Growth |
| Platform | iOS and Android (mobile only) |
| Audio storage | Never stored by Styrby |
| Audio routing | Phone to your transcription endpoint directly |
| Transcription service | Provided by you (not included in Styrby) |
Requirements
To use Voice Input you need two things:
- A transcription endpoint that accepts audio and returns text. The endpoint must accept multipart/form-data POST requests with a file field and return a text response. The OpenAI Whisper API format is the default.
- An API key or access token for that service, if required.
The recommended service is the OpenAI Whisper API at $0.006 per minute. A two-minute voice command costs less than two cents. Self-hosted alternatives are also supported for teams with stricter privacy requirements.
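As a sketch of that contract, here is what a compatible request looks like against the OpenAI Whisper API. The environment variable and file name are placeholders; any endpoint that accepts this request shape and returns plain text should work as a drop-in.

```shell
# Sketch of the expected endpoint contract, using the OpenAI Whisper API format.
# OPENAI_API_KEY and voice-command.m4a are placeholders for illustration.
curl -s https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@voice-command.m4a \
  -F model=whisper-1 \
  -F response_format=text
```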
Setting Up with OpenAI Whisper
OpenAI's Whisper API is the easiest way to get started. It requires an OpenAI account and a few minutes of setup.
- Sign in at platform.openai.com
- Go to API Keys in the left sidebar
- Click Create new secret key, name it "Styrby Voice", and copy the key
- Open the Styrby mobile app and go to Settings > Voice Input
- Set Transcription Endpoint to:
https://api.openai.com/v1/audio/transcriptions
- Set API Key to the key you copied in step 3
- Set Model to whisper-1
- Tap Save and then Test Microphone to confirm transcription works
The API key is stored securely in your device keychain. It is never sent to Styrby servers.
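If you want to confirm the key is valid before entering it in the app, a quick check from any terminal against the standard OpenAI models endpoint:

```shell
# List available models: a valid key returns JSON, a revoked or
# mistyped key returns a 401 error response.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```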
Self-Hosted Transcription
If you prefer not to send audio to a third-party cloud service, you can run a local Whisper server. Two popular options:
Whisper.cpp (fastest on Apple Silicon)
Whisper.cpp is a C++ port of OpenAI Whisper that runs a local HTTP server. It is the fastest self-hosted option on Apple Silicon Macs.
```shell
# Clone and build
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# Download a model (base is fast and accurate enough for voice commands)
bash ./models/download-ggml-model.sh base.en

# Start the server on port 8080
./server -m models/ggml-base.en.bin --port 8080
```
Then in Styrby mobile Settings > Voice Input, set:
- Transcription Endpoint: http://your-mac-ip:8080/inference
- API Key: leave blank (no auth required for local server)
Your phone and Mac must be on the same network for a local server to be reachable from the mobile app.
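Before pointing the phone at the server, it can help to smoke-test it from the Mac itself. A sketch, assuming a short sample.wav in the current directory (the file name is a placeholder):

```shell
# Find the Mac's LAN address to use as your-mac-ip (en0 is typically Wi-Fi).
ipconfig getifaddr en0

# Send a sample file to the local whisper.cpp server; it should
# respond with the plain-text transcript.
curl -s http://localhost:8080/inference \
  -F file=@sample.wav \
  -F response_format=text
```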
Faster Whisper (Docker)
Faster Whisper uses CTranslate2 for significantly faster inference than the original Whisper model, and is easy to run with Docker.
```shell
# Run Faster Whisper with the whisper-asr-webservice image
docker run -d -p 9000:9000 -e ASR_MODEL=base onerahmet/openai-whisper-asr-webservice:latest
```
In Styrby mobile Settings > Voice Input, set the endpoint to:
http://your-server-ip:9000/asr
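To smoke-test the container before configuring the app, you can POST a sample file directly. The field and query names below follow the whisper-asr-webservice README (verify against your image version); sample.wav is a placeholder:

```shell
# POST a sample file to the /asr endpoint; output=txt requests plain text.
curl -s "http://localhost:9000/asr?output=txt" \
  -F audio_file=@sample.wav
```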
Configuration Options
All voice settings are under Settings > Voice Input in the Styrby mobile app.
| Setting | Options | Notes |
|---|---|---|
| Input mode | Hold to talk / Toggle | Hold to talk: press and hold the mic button while speaking, release to transcribe. Toggle: tap once to start, tap again to stop. |
| Transcription endpoint | URL string | Must accept multipart/form-data with a file field and return a text response. |
| API key | String | Sent as Authorization: Bearer <key>. Leave blank for unauthenticated local servers. |
| Model | String | Passed as the model field in the form data. Use whisper-1 for the OpenAI API. |
| Language | Auto / ISO 639-1 code | Auto-detect is recommended. Set a specific language code (e.g., en) to improve accuracy for short technical phrases. |
| Request timeout | 5 to 30 seconds | Increase if transcription requests fail on slow connections or for longer recordings. |
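Taken together, the settings in the table map onto a request roughly like the following. This is a sketch, not Styrby's exact client behavior, and all values shown are placeholders:

```shell
# Hypothetical request assembled from the table's settings; all values are
# placeholders. --max-time corresponds to the Request timeout setting,
# language=en to the Language setting (omit it for auto-detect), and the
# Authorization header to the API key setting.
curl -s --max-time 15 \
  -H "Authorization: Bearer $API_KEY" \
  -F file=@recording.m4a \
  -F model=whisper-1 \
  -F language=en \
  https://api.openai.com/v1/audio/transcriptions
```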
Troubleshooting
No transcription returned
- Confirm the endpoint URL is correct and reachable from your phone. Open a browser on the same network and navigate to the URL to check connectivity.
- Check that the API key is valid. For OpenAI, verify the key has not been revoked in the API Keys dashboard.
- Confirm microphone permission is granted to the Styrby app in your phone's system settings (Settings > Styrby > Microphone).
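A quick reachability check from a laptop on the same network can rule out connectivity problems before you dig into app settings (substitute your actual endpoint URL):

```shell
# Prints only the HTTP status code. Any status (even 405 for a GET on a
# POST-only endpoint) means the server is reachable; 000 means it is not.
curl -s -o /dev/null -w "%{http_code}\n" http://your-server-ip:9000/asr
```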
Garbled or inaccurate transcription
- Speak clearly and at a moderate pace. Background noise significantly reduces accuracy.
- Set a specific language code in Settings > Voice Input > Language instead of using auto-detect. Technical jargon like variable names and framework names transcribes more accurately with a known language hint.
- If using a self-hosted model, try a larger model variant (e.g., small or medium instead of base).
Request timeout
- Increase the timeout in Settings > Voice Input > Request Timeout. A 10-second timeout is usually enough for cloud services. Self-hosted servers on first load may need up to 20 seconds while the model warms up.
- For self-hosted servers, confirm the server is running and not sleeping. Containers on low-memory hosts sometimes get killed under load.
Voice Input option not available
Voice Input requires the Pro or Growth tier. If the setting is grayed out, upgrade your plan from Settings > Plan in the mobile app or from the Pricing page on the web dashboard.