Voice Input
Speak to your AI coding agent instead of typing. Styrby captures audio on the mobile app, sends it to a transcription service you configure, then delivers the transcribed text to your agent as a normal message. Included on Pro and Growth. Mobile only.
Styrby does not include a transcription service and never processes or stores your audio. Audio goes directly from your phone to whichever transcription endpoint you configure. Your code conversations stay private.
Overview
| Property | Value |
|---|---|
| Tier requirement | Pro and Growth |
| Platform | iOS and Android (mobile only) |
| Audio storage | Never stored by Styrby |
| Audio routing | Phone to your transcription endpoint directly |
| Transcription service | Provided by you (not included in Styrby) |
Requirements
To use Voice Input you need two things:
- A transcription endpoint that accepts audio and returns text. The endpoint must accept multipart/form-data POST requests with a file field and return a text response. The OpenAI Whisper API format is the default.
- An API key or access token for that service, if required.
The recommended service is the OpenAI Whisper API at $0.006 per minute. A two-minute voice command costs less than two cents. Self-hosted alternatives are also supported for teams with stricter privacy requirements.
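As a sketch of that contract, here is what a compatible request looks like against the OpenAI Whisper API. The environment variable and file name are placeholders; any endpoint that accepts this request shape and returns plain text should work as a drop-in.

```shell
# Sketch of the expected endpoint contract, using the OpenAI Whisper API format.
# OPENAI_API_KEY and voice-command.m4a are placeholders for illustration.
curl -s https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@voice-command.m4a \
  -F model=whisper-1 \
  -F response_format=text
```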
Setting Up with OpenAI Whisper
OpenAI's Whisper API is the easiest way to get started. It requires an OpenAI account and a few minutes of setup.
- Sign in at platform.openai.com
- Go to API Keys in the left sidebar
- Click Create new secret key, name it "Styrby Voice", and copy the key
- Open the Styrby mobile app and go to Settings > Voice Input
- Set Transcription Endpoint to:
https://api.openai.com/v1/audio/transcriptions
- Set API Key to the key you copied in step 3
- Set Model to whisper-1
- Tap Save and then Test Microphone to confirm transcription works
The API key is stored securely in your device keychain. It is never sent to Styrby servers.
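If you want to confirm the key is valid before entering it in the app, a quick check from any terminal against the standard OpenAI models endpoint:

```shell
# List available models: a valid key returns JSON, a revoked or
# mistyped key returns a 401 error response.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```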
Self-Hosted Transcription
If you prefer not to send audio to a third-party cloud service, you can run a local Whisper server. Two popular options:
Whisper.cpp (fastest on Apple Silicon)
Whisper.cpp is a C++ port of OpenAI Whisper that runs a local HTTP server. It is the fastest self-hosted option on Apple Silicon Macs.
```shell
# Clone and build
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# Download a model (base is fast and accurate enough for voice commands)
bash ./models/download-ggml-model.sh base.en

# Start the server on port 8080
./server -m models/ggml-base.en.bin --port 8080
```
Then in Styrby mobile Settings > Voice Input, set:
- Transcription Endpoint: http://your-mac-ip:8080/inference
- API Key: leave blank (no auth required for local server)
Your phone and Mac must be on the same network for a local server to be reachable from the mobile app.
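Before pointing the phone at the server, it can help to smoke-test it from the Mac itself. A sketch, assuming a short sample.wav in the current directory (the file name is a placeholder):

```shell
# Find the Mac's LAN address to use as your-mac-ip (en0 is typically Wi-Fi).
ipconfig getifaddr en0

# Send a sample file to the local whisper.cpp server; it should
# respond with the plain-text transcript.
curl -s http://localhost:8080/inference \
  -F file=@sample.wav \
  -F response_format=text
```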
Faster Whisper (Docker)
Faster Whisper uses CTranslate2 for significantly faster inference than the original Whisper model, and is easy to run with Docker.
```shell
# Run Faster Whisper with the whisper-asr-webservice image
docker run -d -p 9000:9000 -e ASR_MODEL=base onerahmet/openai-whisper-asr-webservice:latest
```
In Styrby mobile Settings > Voice Input, set the endpoint to:
http://your-server-ip:9000/asr
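To smoke-test the container before configuring the app, you can POST a sample file directly. The field and query names below follow the whisper-asr-webservice README (verify against your image version); sample.wav is a placeholder:

```shell
# POST a sample file to the /asr endpoint; output=txt requests plain text.
curl -s "http://localhost:9000/asr?output=txt" \
  -F audio_file=@sample.wav
```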
Configuration Options
All voice settings are under Settings > Voice Input in the Styrby mobile app.
| Setting | Options | Notes |
|---|---|---|
| Input mode | Hold to talk / Toggle | Hold to talk: press and hold the mic button while speaking, release to transcribe. Toggle: tap once to start, tap again to stop. |
| Transcription endpoint | URL string | Must accept multipart/form-data with a file field and return a text response. |
| API key | String | Sent as Authorization: Bearer <key>. Leave blank for unauthenticated local servers. |
| Model | String | Passed as the model field in the form data. Use whisper-1 for the OpenAI API. |
| Language | Auto / ISO 639-1 code | Auto-detect is recommended. Set a specific language code (e.g., en) to improve accuracy for short technical phrases. |
| Request timeout | 5 to 30 seconds | Increase if transcription requests fail on slow connections or for longer recordings. |
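Taken together, the settings in the table map onto a request roughly like the following. This is a sketch, not Styrby's exact client behavior, and all values shown are placeholders:

```shell
# Hypothetical request assembled from the table's settings; all values are
# placeholders. --max-time corresponds to the Request timeout setting,
# language=en to the Language setting (omit it for auto-detect), and the
# Authorization header to the API key setting.
curl -s --max-time 15 \
  -H "Authorization: Bearer $API_KEY" \
  -F file=@recording.m4a \
  -F model=whisper-1 \
  -F language=en \
  https://api.openai.com/v1/audio/transcriptions
```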
Troubleshooting
No transcription returned
- Confirm the endpoint URL is correct and reachable from your phone. Open a browser on the same network and navigate to the URL to check connectivity.
- Check that the API key is valid. For OpenAI, verify the key has not been revoked in the API Keys dashboard.
- Confirm microphone permission is granted to the Styrby app in your phone's system settings (Settings > Styrby > Microphone).
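A quick reachability check from a laptop on the same network can rule out connectivity problems before you dig into app settings (substitute your actual endpoint URL):

```shell
# Prints only the HTTP status code. Any status (even 405 for a GET on a
# POST-only endpoint) means the server is reachable; 000 means it is not.
curl -s -o /dev/null -w "%{http_code}\n" http://your-server-ip:9000/asr
```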
Garbled or inaccurate transcription
- Speak clearly and at a moderate pace. Background noise significantly reduces accuracy.
- Set a specific language code in Settings > Voice Input > Language instead of using auto-detect. Technical jargon like variable names and framework names transcribes more accurately with a known language hint.
- If using a self-hosted model, try a larger model variant (e.g., small or medium instead of base).
Request timeout
- Increase the timeout in Settings > Voice Input > Request Timeout. A 10-second timeout is usually enough for cloud services. Self-hosted servers on first load may need up to 20 seconds while the model warms up.
- For self-hosted servers, confirm the server is running and not sleeping. Containers on low-memory hosts sometimes get killed under load.
Voice Input option not available
Voice Input requires the Pro or Growth tier. If the setting is grayed out, upgrade your plan from Settings > Plan in the mobile app or from the Pricing page on the web dashboard.