# Calliope

**A voice transcription app for macOS that runs entirely on your machine. MLX-accelerated and free.**

Calliope lives in your menu bar and turns speech into text in any application. Press a hotkey, speak, and your words appear wherever your cursor is. No cloud services, no API keys, no subscriptions: once the model has been downloaded, transcription runs entirely on-device using Apple Silicon acceleration.

## Features

- **Menu bar native** — Runs quietly in the macOS menu bar, always one hotkey away
- **Universal text input** — Types transcribed text directly into any focused application via Quartz events or clipboard paste
- **On-device transcription** — Powered by OpenAI Whisper models via `mlx-whisper`, natively accelerated on Apple Silicon with no MPS/PyTorch overhead
- **Auto-stop on silence** — Recording stops automatically after a configurable period of silence, so you don't have to press the hotkey again
- **LLM post-processing** — Optional grammar and punctuation correction using local MLX language models
- **Live waveform overlay** — Floating visual feedback showing audio levels during recording and a pulsing indicator during transcription
- **Dual hotkey modes** — Push-to-talk (hold to record) and toggle (tap to start/stop), both fully configurable
- **Multi-language support** — Transcribe in English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, Dutch, Russian, or auto-detect
- **Context prompting** — Provide domain-specific vocabulary to improve transcription accuracy for technical or specialized content
- **Interactive setup wizard** — Rich terminal UI that walks through microphone selection, hotkey configuration, model download, and permission checks on first run
- **Configurable models** — Choose from multiple Whisper model sizes to balance speed and accuracy, from `whisper-base` to `whisper-large-v3`

## Installation

```bash
git clone https://github.com/yourname/calliope.git
cd calliope
pip install -e .
```

### Requirements

- macOS on Apple Silicon (M1 or later)
- Python 3.10+
- Accessibility permission (for typing into other apps)
- Microphone permission (for audio capture)

## Usage

```bash
calliope                # Launch (runs setup wizard on first run)
calliope setup          # Re-run the setup wizard
calliope --debug        # Launch with verbose logging
calliope --device 2 --model mlx-community/whisper-large-v3  # Override config for this session
calliope --version      # Print version
```

## Hotkeys

| Mode | Default | Behavior |
|------|---------|----------|
| Push-to-talk | `Ctrl+Shift` (hold) | Records while held, transcribes on release |
| Toggle | `Ctrl+Space` | Tap to start recording, tap again to stop and transcribe |

Both hotkeys can be changed through the setup wizard (`calliope setup`) or by editing the `hotkeys` section of the config file.
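
For readers editing the config by hand, here is a minimal sketch of how a spec such as `ctrl+space` could be split into modifiers and a key. The `parse_hotkey` helper and the `MODIFIERS` set are hypothetical, written only for illustration. Calliope's actual parser and its list of accepted key names are not documented here.

```python
from __future__ import annotations

# Hypothetical set of modifier names; the names Calliope accepts may differ.
MODIFIERS = {"ctrl", "shift", "alt", "cmd"}


def parse_hotkey(spec: str) -> tuple[frozenset[str], str | None]:
    """Split a spec like 'ctrl+space' into (modifiers, key).

    The key is None for modifier-only combinations such as 'ctrl+shift'.
    """
    parts = [p.strip().lower() for p in spec.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty hotkey spec")
    modifiers = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    if len(keys) > 1:
        raise ValueError(f"more than one non-modifier key in {spec!r}")
    return modifiers, (keys[0] if keys else None)


ptt = parse_hotkey("ctrl+shift")     # modifier-only combination
toggle = parse_hotkey("ctrl+space")  # modifier plus a regular key
```

Under this reading, the default push-to-talk binding has no regular key at all, which is why it is handled as a separate case.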

## Configuration

All settings are stored in `~/.config/calliope/config.yaml`:

```yaml
device: null                        # Microphone index (null = system default)
model: mlx-community/whisper-large-v3-turbo
language: auto                      # Language code or "auto" for detection
hotkeys:
  ptt: ctrl+shift
  toggle: ctrl+space
context: ""                         # Domain-specific terms to improve accuracy
typing_mode: char                   # "char" (keystroke simulation) or "clipboard" (Cmd+V paste)
typing_delay: 0.005                 # Seconds between keystrokes in char mode
max_recording_seconds: 300          # Maximum recording duration
silence_threshold: 0.005            # RMS energy below which audio is considered silence
auto_stop_silence: true             # Automatically stop recording after sustained silence
silence_timeout_seconds: 1.5        # Seconds of silence before auto-stop triggers
notifications: true                 # macOS notification banners
postprocessing:
  enabled: false                    # LLM grammar/punctuation correction
  model: null                       # Active MLX model
  system_prompt: "..."              # Custom post-processing instructions
debug: false
```
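
The three silence settings (`silence_threshold`, `auto_stop_silence`, `silence_timeout_seconds`) can be read as one rule: stop once the RMS energy has stayed below the threshold for the full timeout. The sketch below illustrates that rule. It is not Calliope's actual code, and it assumes audio arrives as frames of float samples normalized to the range -1 to 1. The function names are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def should_auto_stop(frames, frame_seconds,
                     silence_threshold=0.005,
                     silence_timeout_seconds=1.5):
    """Return True as soon as a continuous run of silent frames reaches the timeout.

    Any frame at or above the threshold resets the run.
    """
    silent_frames = 0
    for frame in frames:
        if rms(frame) < silence_threshold:
            silent_frames += 1
            if silent_frames * frame_seconds >= silence_timeout_seconds:
                return True
        else:
            silent_frames = 0  # speech resumed, start counting again
    return False
```

Raising `silence_threshold` makes the rule more tolerant of background noise. Raising `silence_timeout_seconds` allows longer pauses between sentences.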

CLI flags such as `--device` and `--model` override the corresponding config values for the current session only.
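
One way to picture the override is as an overlay of CLI values on the loaded config. This is a sketch under assumptions: `effective_settings` is a hypothetical helper, and treating `None` as "flag not given" is a convention chosen for the example, not something taken from Calliope's source.

```python
def effective_settings(file_config: dict, cli_overrides: dict) -> dict:
    """Overlay CLI values on the file config without mutating either input.

    A CLI value of None is treated as "flag not given" and leaves the
    file value in place.
    """
    merged = dict(file_config)
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged


# Values as they might be loaded from config.yaml
file_config = {
    "device": None,
    "model": "mlx-community/whisper-large-v3-turbo",
    "debug": False,
}
# Equivalent of: calliope --device 2 --model mlx-community/whisper-large-v3
cli = {"device": 2, "model": "mlx-community/whisper-large-v3", "debug": None}

settings = effective_settings(file_config, cli)
```

Because the merge works on a copy, the values loaded from the file stay unchanged.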

## Available Models

All models are sourced from Hugging Face and run natively via `mlx-whisper` on Apple Silicon.

| Model | Size | Speed | Accuracy |
|-------|------|-------|----------|
| `mlx-community/whisper-base` | ~150 MB | Fastest | Basic |
| `mlx-community/whisper-small` | ~500 MB | Fast | Good |
| `mlx-community/whisper-medium` | ~1.5 GB | Moderate | Better |
| `mlx-community/whisper-large-v3-turbo` (default) | ~1.6 GB | Fast | High |
| `mlx-community/whisper-large-v3` | ~3 GB | Slower | Highest |
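
To try a model outside Calliope, `mlx-whisper` can be called directly. The sketch below assumes that the `model`, `language`, and `context` config values map onto the `path_or_hf_repo`, `language`, and `initial_prompt` arguments of `mlx_whisper.transcribe`. That mapping is a guess at how Calliope wires things up, and both helper functions are hypothetical.

```python
def build_transcribe_kwargs(model, language="auto", context=""):
    """Translate Calliope-style settings into keyword arguments for mlx-whisper."""
    kwargs = {"path_or_hf_repo": model}
    if language != "auto":
        # Omitting the language lets Whisper detect it from the audio.
        kwargs["language"] = language
    if context:
        # Whisper uses the initial prompt as a hint for vocabulary and style.
        kwargs["initial_prompt"] = context
    return kwargs


def transcribe_file(path, model, language="auto", context=""):
    """Transcribe an audio file and return the recognized text."""
    import mlx_whisper  # imported lazily: only installable on Apple Silicon

    result = mlx_whisper.transcribe(
        path, **build_transcribe_kwargs(model, language, context)
    )
    return result["text"]
```

For example, `transcribe_file("memo.wav", "mlx-community/whisper-base")` would download the model on first use and return the transcript as a string.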

## Troubleshooting

**"Status: Model load failed"**

Check that you have enough free disk space and RAM for the selected model (see the sizes under Available Models), or switch to a smaller model. Run with `--debug` for detailed error logs.

**No text appears after transcription**

Confirm that Accessibility permission is granted in System Settings > Privacy & Security > Accessibility. Restart Calliope after granting it.

**Wrong microphone selected**

Run `calliope setup` to choose a different input device, or set the `device` index in the config file. Use `python -m sounddevice` to list available devices.
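
The device list also includes output-only devices. To see only the indices that can record, a short script along these lines may help. It assumes that the `device` value in the config is a `sounddevice` index, and both function names are hypothetical.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices with at least one input channel."""
    return [
        (index, dev["name"])
        for index, dev in enumerate(devices)
        if dev["max_input_channels"] > 0
    ]


def list_microphones():
    """Query the system's audio devices and keep only those that can record."""
    import sounddevice as sd  # imported lazily so the filter works without it

    return input_devices(sd.query_devices())
```

Calling `list_microphones()` returns pairs such as `(2, "USB Microphone")`; the first element is the number to use for `device` or `--device`.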

**Hotkeys not responding**

Ensure no other application is capturing the same key combination. Reconfigure hotkeys via `calliope setup`.

## License

TBD