Initial commit: Calliope voice-to-text macOS menu bar app

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
syntaxbullet
2026-02-17 15:08:53 +01:00
commit 7cbf2d04a9
15 changed files with 1431 additions and 0 deletions

22
.gitignore vendored Normal file
View File

@@ -0,0 +1,22 @@
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg-info/
*.egg
dist/
build/
.eggs/
*.whl
.venv/
venv/
env/
.env
*.log
.DS_Store
*.swp
*.swo
*~
.idea/
.vscode/
*.iml

42
CLAUDE.md Normal file
View File

@@ -0,0 +1,42 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What is Calliope?
A macOS menu bar app for local voice-to-text. Users press a hotkey, speak, and transcribed text is typed into the focused app. Runs entirely offline using Whisper models via Hugging Face Transformers + PyTorch.
## Setup & Running
```bash
pip install -e . # Install in dev mode
calliope # Launch (runs setup wizard on first run)
calliope setup # Re-run setup wizard
calliope --debug # Launch with debug logging
calliope --device 2 --model openai/whisper-large-v3 # Override config
```
No test suite or linter is configured yet.
## Architecture
**Entry point:** `calliope/cli.py` → Click CLI → `calliope/app.py:main()`
**Data flow:** Hotkey press → Record audio → Transcribe with Whisper → Type into focused app
Key modules in `calliope/`:
- **app.py** — `CalliopeApp(rumps.App)`: main orchestrator, manages menu bar UI and coordinates all components
- **recorder.py** — Audio capture via `sounddevice` at 16kHz mono float32, with chunk consolidation
- **transcriber.py** — Whisper STT using HF `transformers.pipeline("automatic-speech-recognition")`
- **hotkeys.py** — `HotkeyListener` using `pynput`: supports push-to-talk (Ctrl+Shift hold) and toggle (Ctrl+Space) modes
- **typer.py** — Outputs text via Quartz CGEvents (character mode) or clipboard paste (Cmd+V)
- **overlay.py** — `WaveformOverlay`: floating NSPanel with scrolling waveform during recording, pulsing dots during transcription
- **setup_wizard.py** — Rich-based interactive first-run config (mic, hotkeys, model download)
- **config.py** — Loads/saves YAML config at `~/.config/calliope/config.yaml`
## Platform Constraints
- **macOS only** — uses `pyobjc` bindings (Quartz, AppKit, AVFoundation, ApplicationServices)
- **MPS (Apple Silicon):** must use float32, not float16 (causes garbled Whisper output)
- Requires Accessibility and Microphone permissions in macOS System Settings

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Calliope Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

86
README.md Normal file
View File

@@ -0,0 +1,86 @@
# Calliope
Voice-to-text for macOS — speak and type into any app.
Calliope sits in your menu bar, listens when you hold a hotkey, transcribes your speech with Whisper, and types the result into whatever app is focused. No cloud, no API keys — everything runs locally on your Mac.
## Installation
```bash
git clone https://github.com/yourname/calliope.git
cd calliope
pip install -e .
```
## Usage
```bash
# First run — launches the setup wizard, then starts the app
calliope
# Re-run the setup wizard
calliope setup
# Launch with overrides
calliope --device 2 --model openai/whisper-large-v3 --debug
# Print version
calliope --version
```
## Hotkeys
| Action | Default | Description |
|--------|---------|-------------|
| Push-to-talk | `Ctrl+Shift` (hold) | Records while held, transcribes on release |
| Toggle | `Ctrl+Space` | Start/stop recording |
Hotkeys are configurable via the setup wizard or `~/.config/calliope/config.yaml`.
## Permissions
Calliope needs two macOS permissions:
- **Accessibility** — to type text into other apps (System Settings > Privacy & Security > Accessibility)
- **Microphone** — to record audio (System Settings > Privacy & Security > Microphone)
The setup wizard checks for these and can open System Settings for you.
## Configuration
Config lives at `~/.config/calliope/config.yaml`:
```yaml
device: null # sounddevice index; null = system default
model: distil-whisper/distil-large-v3
hotkeys:
ptt: ctrl+shift
toggle: ctrl+space
context: "" # domain-specific terms to help Whisper
debug: false
```
CLI flags override config values for that session.
## Troubleshooting
**"Status: Model load failed"**
Check that you have enough disk space and RAM. The default model needs ~1.5 GB. Run with `--debug` for detailed logs.
**No text appears after transcribing**
Make sure Accessibility permission is granted. Restart Calliope after granting it.
**Wrong microphone**
Run `calliope setup` to pick a different input device, or set `device` in the config file. Use `python -m sounddevice` to list devices.
**Hotkeys not working**
Ensure no other app is capturing the same key combo. Customize hotkeys via `calliope setup`.
## Remaining TODOs
- LICENSE file
- Unit tests
- CI/CD pipeline
- Homebrew formula
- `.app` bundle for drag-and-drop install
- Changelog

0
calliope/__init__.py Normal file
View File

302
calliope/app.py Normal file
View File

@@ -0,0 +1,302 @@
"""Calliope — Voice-to-text macOS menu bar app."""
import logging
import os
import threading
import time
from typing import Any
# Disable tokenizers parallelism to avoid leaked semaphore warnings on shutdown.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
# Run offline — models are downloaded during setup, no need to hit HuggingFace on every launch.
os.environ.setdefault("HF_HUB_OFFLINE", "1")
import rumps
from calliope import config as config_mod
from calliope.recorder import Recorder
from calliope.transcriber import Transcriber
from calliope.typer import type_text, type_text_clipboard
from calliope.hotkeys import HotkeyListener
from calliope.overlay import WaveformOverlay
log = logging.getLogger(__name__)
class CalliopeApp(rumps.App):
def __init__(self, cfg: dict[str, Any] | None = None):
super().__init__("Calliope", title="\U0001f3a4", quit_button=None) # 🎤
if cfg is None:
cfg = config_mod.load()
self.cfg = cfg
self.overlay = WaveformOverlay()
self.recorder = Recorder(device=cfg.get("device"))
self.transcriber = Transcriber(
model=cfg.get("model", "distil-whisper/distil-large-v3"),
)
self.transcriber.context = cfg.get("context", "")
self.transcriber.language = cfg.get("language", "auto")
self._recording = False
self._rec_lock = threading.Lock()
self._rec_start_time: float | None = None
self._rec_timer: rumps.Timer | None = None
self.status_item = rumps.MenuItem("Status: Loading model...")
self.status_item.set_callback(None)
self.toggle_item = rumps.MenuItem("Start Recording", callback=self._on_toggle_click)
self.context_item = rumps.MenuItem("Set Whisper Context...", callback=self._on_set_context)
# Language submenu
self._lang_menu = rumps.MenuItem("Language")
current_lang = cfg.get("language", "auto")
for display_name, code in config_mod.LANGUAGES.items():
prefix = "\u2713 " if code == current_lang else " "
item = rumps.MenuItem(f"{prefix}{display_name}", callback=self._on_language_select)
self._lang_menu.add(item)
# Model submenu
self._model_menu = rumps.MenuItem("Model")
current_model = cfg.get("model", "distil-whisper/distil-large-v3")
for model_id in config_mod.MODELS:
short = model_id.split("/")[-1]
prefix = "\u2713 " if model_id == current_model else " "
item = rumps.MenuItem(f"{prefix}{short}", callback=self._on_model_select)
self._model_menu.add(item)
quit_item = rumps.MenuItem("Quit Calliope", callback=self._on_quit)
self.menu = [
self.status_item,
None,
self.toggle_item,
self.context_item,
self._lang_menu,
self._model_menu,
None,
quit_item,
]
hotkey_cfg = cfg.get("hotkeys", {})
self.hotkeys = HotkeyListener(
on_push_to_talk_start=self._start_recording,
on_push_to_talk_stop=self._stop_and_transcribe,
on_toggle=self._toggle_recording,
ptt_combo=hotkey_cfg.get("ptt", "ctrl+shift"),
toggle_combo=hotkey_cfg.get("toggle", "ctrl+space"),
)
# Load model in background
threading.Thread(target=self._load_model, daemon=True).start()
def _load_model(self) -> None:
try:
self.transcriber.load()
self.status_item.title = "Status: Ready"
self.hotkeys.start()
log.info("Model loaded, hotkeys active")
except Exception:
log.error("Failed to load model", exc_info=True)
self.status_item.title = "Status: Model load failed"
try:
rumps.notification("Calliope", "Error", "Failed to load Whisper model. Check logs.")
except RuntimeError:
pass
@staticmethod
def _activate_app():
"""Temporarily become a regular app so dialog text fields receive focus."""
from AppKit import NSApplication, NSApplicationActivationPolicyRegular
app = NSApplication.sharedApplication()
app.setActivationPolicy_(NSApplicationActivationPolicyRegular)
app.activateIgnoringOtherApps_(True)
@staticmethod
def _deactivate_app():
"""Revert to accessory app (no Dock icon)."""
from AppKit import NSApplication, NSApplicationActivationPolicyAccessory
NSApplication.sharedApplication().setActivationPolicy_(NSApplicationActivationPolicyAccessory)
def _on_set_context(self, sender) -> None:
self._activate_app()
response = rumps.Window(
message="Provide context to help Whisper with domain-specific terms, "
"names, or jargon. For example:\n\n"
"\"Meeting about Kubernetes, gRPC, and the Istio service mesh.\"",
title="Set Whisper Context",
default_text=self.transcriber.context,
ok="Save",
cancel="Clear",
dimensions=(320, 120),
).run()
if response.clicked == 1: # Save
self.transcriber.context = response.text.strip()
else: # Clear
self.transcriber.context = ""
self._deactivate_app()
ctx = self.transcriber.context
self.context_item.title = f"Set Whisper Context... ({ctx[:20]}...)" if ctx else "Set Whisper Context..."
def _on_language_select(self, sender) -> None:
display_name = sender.title.strip().lstrip("\u2713").strip()
code = config_mod.LANGUAGES.get(display_name, "auto")
self.transcriber.language = code
# Update checkmarks
for item in self._lang_menu.values():
name = item.title.strip().lstrip("\u2713").strip()
item.title = f"\u2713 {name}" if config_mod.LANGUAGES.get(name) == code else f" {name}"
self.cfg["language"] = code
config_mod.save(self.cfg)
log.info("Language set to %s (%s)", display_name, code)
def _on_model_select(self, sender) -> None:
short_name = sender.title.strip().lstrip("\u2713").strip()
# Find full model ID
model_id = None
for m in config_mod.MODELS:
if m.split("/")[-1] == short_name:
model_id = m
break
if model_id is None or model_id == self.transcriber.model:
return
# Update checkmarks
for item in self._model_menu.values():
name = item.title.strip().lstrip("\u2713").strip()
item.title = f"\u2713 {name}" if name == short_name else f" {name}"
self.cfg["model"] = model_id
config_mod.save(self.cfg)
self.status_item.title = "Status: Loading model..."
self.hotkeys.stop()
self._release_transcriber()
self.transcriber = Transcriber(model=model_id)
self.transcriber.context = self.cfg.get("context", "")
self.transcriber.language = self.cfg.get("language", "auto")
threading.Thread(target=self._load_model, daemon=True).start()
log.info("Switching model to %s", model_id)
def _release_transcriber(self) -> None:
"""Free the current Whisper model to reclaim GPU memory."""
if self.transcriber is not None:
self.transcriber._pipe = None
self.transcriber._tokenizer = None
import torch
if torch.backends.mps.is_available():
torch.mps.empty_cache()
def _on_toggle_click(self, sender) -> None:
self._toggle_recording()
def _toggle_recording(self) -> None:
if self._recording:
self._stop_and_transcribe()
else:
self._start_recording()
def _start_recording(self) -> None:
with self._rec_lock:
if self._recording:
return
self._recording = True
self._rec_start_time = time.time()
self.title = "\U0001f534 0:00" # 🔴
self.toggle_item.title = "Stop Recording"
self.status_item.title = "Status: Recording..."
self.recorder.on_audio = self.overlay.push_samples
try:
self.recorder.start()
except Exception:
log.error("Failed to start recording", exc_info=True)
with self._rec_lock:
self._recording = False
self.title = "\U0001f3a4" # 🎤
self.toggle_item.title = "Start Recording"
self.status_item.title = "Status: Mic error (check device)"
try:
rumps.notification("Calliope", "", "Microphone unavailable — check audio device")
except RuntimeError:
pass
return
self.overlay.show()
self._rec_timer = rumps.Timer(self._update_rec_duration, 1)
self._rec_timer.start()
try:
rumps.notification("Calliope", "", "Recording started")
except RuntimeError:
pass # Info.plist missing CFBundleIdentifier
log.info("Recording started")
def _stop_and_transcribe(self) -> None:
with self._rec_lock:
if not self._recording:
return
self._recording = False
if self._rec_timer:
self._rec_timer.stop()
self._rec_timer = None
duration = int(time.time() - self._rec_start_time) if self._rec_start_time else 0
self._rec_start_time = None
self.title = "\U0001f3a4" # 🎤
self.toggle_item.title = "Start Recording"
self.status_item.title = "Status: Transcribing..."
self.overlay.show_transcribing()
audio = self.recorder.stop()
try:
rumps.notification("Calliope", "", f"Recording stopped ({duration}s)")
except RuntimeError:
pass
log.info("Recording stopped, %d samples", audio.size)
threading.Thread(target=self._transcribe_and_type, args=(audio,), daemon=True).start()
def _update_rec_duration(self, timer) -> None:
if self._rec_start_time is None:
return
elapsed = int(time.time() - self._rec_start_time)
minutes, seconds = divmod(elapsed, 60)
self.title = f"\U0001f534 {minutes}:{seconds:02d}"
def _transcribe_and_type(self, audio) -> None:
try:
text = self.transcriber.transcribe(audio)
if text:
def _do_type():
try:
if self.cfg.get("typing_mode", "char") == "clipboard":
type_text_clipboard(text)
else:
type_text(text)
print(f"\n[Calliope] {text}")
log.info("Typed %d chars", len(text))
except Exception:
log.error("Typing failed", exc_info=True)
from PyObjCTools.AppHelper import callAfter
callAfter(_do_type)
self.overlay.hide()
self.status_item.title = "Status: Ready"
except Exception:
log.error("Transcription failed", exc_info=True)
self.overlay.hide()
self.status_item.title = "Status: Ready"
try:
rumps.notification("Calliope", "Error", "Transcription failed. Check logs.")
except RuntimeError:
pass
def _on_quit(self, sender) -> None:
self.hotkeys.stop()
self.recorder.stop()
# Stop overlay timers synchronously to avoid retain cycles on quit.
self.overlay.cleanup()
rumps.quit_application()
def main():
from calliope.cli import cli
cli()
if __name__ == "__main__":
main()

55
calliope/cli.py Normal file
View File

@@ -0,0 +1,55 @@
"""CLI entry point using click."""
import logging
import click
from calliope import config
@click.group(invoke_without_command=True)
@click.option("--device", type=int, default=None, help="Audio input device index.")
@click.option("--model", type=str, default=None, help="Whisper model name.")
@click.option("--context", type=str, default=None, help="Transcription context prompt.")
@click.option("--debug", is_flag=True, default=False, help="Enable debug logging.")
@click.version_option(package_name="calliope")
@click.pass_context
def cli(ctx, device, model, context, debug):
"""Calliope — Voice-to-text for macOS."""
level = logging.DEBUG if debug else logging.INFO
logging.basicConfig(
level=level,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
cfg = config.load()
# CLI flags override config
if device is not None:
cfg["device"] = device
if model is not None:
cfg["model"] = model
if context is not None:
cfg["context"] = context
if debug:
cfg["debug"] = True
ctx.ensure_object(dict)
ctx.obj["cfg"] = cfg
if ctx.invoked_subcommand is None:
# First run → wizard, then launch
if not config.exists():
from calliope.setup_wizard import run as run_wizard
cfg = run_wizard()
ctx.obj["cfg"] = cfg
from calliope.app import CalliopeApp
CalliopeApp(cfg).run()
@cli.command()
def setup():
"""Re-run the setup wizard."""
from calliope.setup_wizard import run as run_wizard
run_wizard()

85
calliope/config.py Normal file
View File

@@ -0,0 +1,85 @@
"""Persistent YAML config at ~/.config/calliope/config.yaml."""
import logging
from pathlib import Path
from typing import Any
import yaml
log = logging.getLogger(__name__)
CONFIG_DIR = Path.home() / ".config" / "calliope"
CONFIG_PATH = CONFIG_DIR / "config.yaml"
DEFAULTS: dict[str, Any] = {
"device": None, # sounddevice index; None = system default
"model": "distil-whisper/distil-large-v3",
"language": "auto",
"hotkeys": {
"ptt": "ctrl+shift",
"toggle": "ctrl+space",
},
"context": "",
"debug": False,
"typing_mode": "char", # "char" or "clipboard"
}
LANGUAGES: dict[str, str] = {
"Auto": "auto",
"English": "en",
"Spanish": "es",
"French": "fr",
"German": "de",
"Japanese": "ja",
"Chinese": "zh",
"Korean": "ko",
"Portuguese": "pt",
"Italian": "it",
"Dutch": "nl",
"Russian": "ru",
}
MODELS: list[str] = [
"distil-whisper/distil-large-v3",
"openai/whisper-large-v3",
"openai/whisper-base",
"openai/whisper-small",
"openai/whisper-medium",
]
def _deep_merge(base: dict, override: dict) -> dict:
"""Recursively merge override into base, returning a new dict."""
result = dict(base)
for key, value in override.items():
if key in result and isinstance(result[key], dict) and isinstance(value, dict):
result[key] = _deep_merge(result[key], value)
else:
result[key] = value
return result
def load() -> dict[str, Any]:
"""Load config from disk, falling back to defaults."""
cfg = dict(DEFAULTS)
if CONFIG_PATH.exists():
try:
with open(CONFIG_PATH) as f:
saved = yaml.safe_load(f) or {}
cfg = _deep_merge(cfg, saved)
log.debug("Loaded config from %s", CONFIG_PATH)
except Exception:
log.warning("Failed to read config; using defaults", exc_info=True)
return cfg
def save(cfg: dict[str, Any]) -> None:
"""Write config to disk."""
CONFIG_DIR.mkdir(parents=True, exist_ok=True)
with open(CONFIG_PATH, "w") as f:
yaml.safe_dump(cfg, f, default_flow_style=False)
log.info("Config saved to %s", CONFIG_PATH)
def exists() -> bool:
return CONFIG_PATH.exists()

106
calliope/hotkeys.py Normal file
View File

@@ -0,0 +1,106 @@
"""Global hotkey listener using pynput."""
import logging
from typing import Callable
from pynput import keyboard
log = logging.getLogger(__name__)
# Maps string names to pynput keys
_KEY_MAP: dict[str, keyboard.Key] = {
"ctrl": keyboard.Key.ctrl,
"shift": keyboard.Key.shift,
"alt": keyboard.Key.alt,
"cmd": keyboard.Key.cmd,
"space": keyboard.Key.space,
}
def _parse_combo(combo: str) -> set[keyboard.Key]:
"""Parse 'ctrl+shift' into a set of pynput keys."""
keys: set[keyboard.Key] = set()
for part in combo.lower().split("+"):
part = part.strip()
if part in _KEY_MAP:
keys.add(_KEY_MAP[part])
else:
log.warning("Unknown key in combo: %s", part)
return keys
class HotkeyListener:
def __init__(
self,
on_push_to_talk_start: Callable,
on_push_to_talk_stop: Callable,
on_toggle: Callable,
ptt_combo: str = "ctrl+shift",
toggle_combo: str = "ctrl+space",
):
self._on_ptt_start = on_push_to_talk_start
self._on_ptt_stop = on_push_to_talk_stop
self._on_toggle = on_toggle
self._listener: keyboard.Listener | None = None
self._pressed: set = set()
self._ptt_active = False
self._toggle_active = False
self._ptt_keys = _parse_combo(ptt_combo)
self._toggle_keys = _parse_combo(toggle_combo)
log.debug("PTT keys: %s, Toggle keys: %s", self._ptt_keys, self._toggle_keys)
def start(self) -> None:
self._pressed.clear()
self._ptt_active = False
self._toggle_active = False
self._listener = keyboard.Listener(
on_press=self._on_press,
on_release=self._on_release,
)
self._listener.daemon = True
self._listener.start()
def stop(self) -> None:
if self._listener is not None:
try:
self._listener.stop()
except Exception:
pass
self._listener = None
self._pressed.clear()
self._ptt_active = False
self._toggle_active = False
def _normalize(self, key) -> keyboard.Key | keyboard.KeyCode:
if hasattr(key, "value") and hasattr(key.value, "vk"):
vk = key.value.vk
if vk in (0x3B, 0x3E):
return keyboard.Key.ctrl
if vk in (0x38, 0x3C):
return keyboard.Key.shift
return key
def _on_press(self, key) -> None:
key = self._normalize(key)
self._pressed.add(key)
if self._ptt_keys.issubset(self._pressed) and not self._ptt_active:
self._ptt_active = True
self._on_ptt_start()
if self._toggle_keys.issubset(self._pressed) and not self._toggle_active:
self._toggle_active = True
self._on_toggle()
def _on_release(self, key) -> None:
key = self._normalize(key)
if self._ptt_active and key in self._ptt_keys:
self._ptt_active = False
self._on_ptt_stop()
if key in self._toggle_keys:
self._toggle_active = False
self._pressed.discard(key)

313
calliope/overlay.py Normal file
View File

@@ -0,0 +1,313 @@
"""Floating waveform overlay shown during recording."""
import logging
import time
from collections import deque
from enum import Enum, auto
import numpy as np
from AppKit import (
NSPanel,
NSView,
NSColor,
NSBezierPath,
NSTimer,
NSScreen,
NSWindowStyleMaskBorderless,
NSWindowStyleMaskNonactivatingPanel,
NSFloatingWindowLevel,
NSStatusWindowLevel,
NSBackingStoreBuffered,
NSApp,
NSFont,
NSFontAttributeName,
NSForegroundColorAttributeName,
NSMakePoint,
)
from Foundation import NSMakeRect
from objc import super as objc_super
from PyObjCTools.AppHelper import callAfter
log = logging.getLogger(__name__)
WIDTH = 360
HEIGHT = 80
NUM_BARS = 150 # number of amplitude samples visible at once
FPS = 30
# Fade animation
FADE_DURATION = 0.2 # seconds
FADE_STEPS = int(FADE_DURATION * FPS)
class OverlayMode(Enum):
RECORDING = auto()
TRANSCRIBING = auto()
class WaveformView(NSView):
"""Custom NSView that draws a scrolling waveform or transcribing indicator."""
amplitudes: deque
mode: OverlayMode
_pulse_start: float
_fade_step: int
_fade_direction: int
_fade_timer: object
_on_fade_complete: object
def initWithFrame_(self, frame):
self = objc_super(WaveformView, self).initWithFrame_(frame)
if self is None:
return None
self.amplitudes = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
self.mode = OverlayMode.RECORDING
self._pulse_start = time.monotonic()
self._fade_step = 0
self._fade_direction = 0
self._fade_timer = None
self._on_fade_complete = None
return self
def drawRect_(self, rect):
# Dark translucent rounded-rect background
bg = NSColor.colorWithCalibratedRed_green_blue_alpha_(0.1, 0.1, 0.1, 0.85)
bg.setFill()
path = NSBezierPath.bezierPathWithRoundedRect_xRadius_yRadius_(
self.bounds(), 12, 12
)
path.fill()
# Subtle border
border = NSColor.colorWithCalibratedRed_green_blue_alpha_(1.0, 1.0, 1.0, 0.12)
border.setStroke()
border_path = NSBezierPath.bezierPathWithRoundedRect_xRadius_yRadius_(
self.bounds(), 12, 12
)
border_path.setLineWidth_(1.0)
border_path.stroke()
if self.mode == OverlayMode.RECORDING:
self._draw_waveform()
elif self.mode == OverlayMode.TRANSCRIBING:
self._draw_transcribing()
def _draw_waveform(self):
color = NSColor.colorWithCalibratedRed_green_blue_alpha_(0.4, 0.75, 0.5, 0.9)
color.setStroke()
bounds = self.bounds()
w = bounds.size.width
h = bounds.size.height
mid_y = h / 2
padding = 10
draw_w = w - 2 * padding
draw_h = (h - 2 * padding) / 2
amps = list(self.amplitudes)
if not amps:
return
step = draw_w / max(len(amps) - 1, 1)
for sign in (1, -1):
line = NSBezierPath.bezierPath()
line.setLineWidth_(1.5)
for i, a in enumerate(amps):
x = padding + i * step
y_off = a * draw_h * sign
if i == 0:
line.moveToPoint_((x, mid_y + y_off))
else:
line.lineToPoint_((x, mid_y + y_off))
line.stroke()
self._draw_label("calliope recording...")
def _draw_transcribing(self):
bounds = self.bounds()
w = bounds.size.width
h = bounds.size.height
mid_y = h / 2
# Pulsing dots animation
elapsed = time.monotonic() - self._pulse_start
num_dots = 3
dot_radius = 5.0
dot_spacing = 20.0
total_w = (num_dots - 1) * dot_spacing
start_x = (w - total_w) / 2
for i in range(num_dots):
# Staggered sine pulse for each dot
phase = elapsed * 3.0 - i * 0.6
alpha = 0.3 + 0.7 * max(0.0, (1.0 + np.sin(phase)) / 2.0)
color = NSColor.colorWithCalibratedRed_green_blue_alpha_(
0.4, 0.75, 0.5, alpha
)
color.setFill()
x = start_x + i * dot_spacing
dot = NSBezierPath.bezierPathWithOvalInRect_(
NSMakeRect(x - dot_radius, mid_y - dot_radius + 6,
dot_radius * 2, dot_radius * 2)
)
dot.fill()
self._draw_label("transcribing...")
def _draw_label(self, text: str):
from Foundation import NSString, NSDictionary
bounds = self.bounds()
w = bounds.size.width
label = NSString.stringWithString_(text)
attrs = NSDictionary.dictionaryWithObjects_forKeys_(
[
NSFont.systemFontOfSize_(11),
NSColor.colorWithCalibratedRed_green_blue_alpha_(1.0, 1.0, 1.0, 0.5),
],
[NSFontAttributeName, NSForegroundColorAttributeName],
)
label_size = label.sizeWithAttributes_(attrs)
label_x = (w - label_size.width) / 2
label.drawAtPoint_withAttributes_(NSMakePoint(label_x, 4), attrs)
def refresh_(self, timer):
self.setNeedsDisplay_(True)
def fadeTick_(self, timer):
self._fade_step += 1
progress = min(self._fade_step / FADE_STEPS, 1.0)
if self._fade_direction == 1:
alpha = progress
else:
alpha = 1.0 - progress
self.window().setAlphaValue_(alpha)
if progress >= 1.0:
self.stopFade()
if self._fade_direction == -1 and self._on_fade_complete:
self._on_fade_complete()
def stopFade(self):
if self._fade_timer is not None:
self._fade_timer.invalidate()
self._fade_timer = None
def startFade_onComplete_(self, direction, on_complete):
self.stopFade()
self._fade_direction = direction
self._fade_step = 0
self._on_fade_complete = on_complete
self._fade_timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
FADE_DURATION / FADE_STEPS, self, b"fadeTick:", None, True
)
class WaveformOverlay:
"""Floating translucent window showing a live scrolling waveform."""
def __init__(self):
self._panel: NSPanel | None = None
self._view: WaveformView | None = None
self._timer: NSTimer | None = None
def _ensure_panel(self):
if self._panel is not None:
return
screen = NSScreen.mainScreen()
screen_frame = screen.frame()
x = (screen_frame.size.width - WIDTH) / 2
y = screen_frame.size.height - HEIGHT - 40 # near top, below menu bar
rect = NSMakeRect(x, y, WIDTH, HEIGHT)
style = NSWindowStyleMaskBorderless | NSWindowStyleMaskNonactivatingPanel
panel = NSPanel.alloc().initWithContentRect_styleMask_backing_defer_(
rect, style, NSBackingStoreBuffered, False
)
panel.setLevel_(NSStatusWindowLevel)
panel.setOpaque_(False)
panel.setBackgroundColor_(NSColor.clearColor())
panel.setHasShadow_(True)
panel.setIgnoresMouseEvents_(True)
panel.setCollectionBehavior_(1 << 4) # NSWindowCollectionBehaviorCanJoinAllSpaces
view = WaveformView.alloc().initWithFrame_(NSMakeRect(0, 0, WIDTH, HEIGHT))
panel.setContentView_(view)
self._panel = panel
self._view = view
def show(self):
callAfter(self._show_on_main)
def hide(self):
callAfter(self._hide_on_main)
def show_transcribing(self):
"""Switch overlay to transcribing state (pulsing dots)."""
callAfter(self._show_transcribing_on_main)
def _show_on_main(self):
self._ensure_panel()
self._view.stopFade()
self._view.mode = OverlayMode.RECORDING
self._view.amplitudes = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
self._panel.setAlphaValue_(0.0)
self._panel.orderFront_(None)
self._start_timer()
self._view.startFade_onComplete_(1, None)
log.debug("Overlay shown")
def _show_transcribing_on_main(self):
self._ensure_panel()
self._view.stopFade()
self._view.mode = OverlayMode.TRANSCRIBING
self._view._pulse_start = time.monotonic()
# If panel is already visible, just switch mode; otherwise show it
if self._panel.alphaValue() < 0.01:
self._panel.setAlphaValue_(0.0)
self._panel.orderFront_(None)
self._view.startFade_onComplete_(1, None)
self._start_timer()
log.debug("Overlay switched to transcribing")
def _hide_on_main(self):
if self._view is None or self._panel is None:
return
def on_fade_out():
self._stop_timer()
self._panel.orderOut_(None)
self._view.startFade_onComplete_(-1, on_fade_out)
log.debug("Overlay hiding")
def cleanup(self):
"""Synchronously stop all timers and hide. Call before quit."""
self._stop_timer()
if self._view is not None:
self._view.stopFade()
if self._panel is not None:
self._panel.orderOut_(None)
def push_samples(self, chunk: np.ndarray):
"""Called from audio callback with a new chunk of float32 samples."""
rms = float(np.sqrt(np.mean(chunk ** 2)))
# Clamp to [0, 1] with some headroom
amplitude = min(rms * 5.0, 1.0)
if self._view is not None:
self._view.amplitudes.append(amplitude)
def _start_timer(self):
self._stop_timer()
self._timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
1.0 / FPS, self._view, b"refresh:", None, True
)
def _stop_timer(self):
if self._timer is not None:
self._timer.invalidate()
self._timer = None

88
calliope/recorder.py Normal file
View File

@@ -0,0 +1,88 @@
"""Audio recording using sounddevice."""
import logging
import numpy as np
import sounddevice as sd
import threading
log = logging.getLogger(__name__)
_CONSOLIDATE_EVERY = 100
class Recorder:
SAMPLE_RATE = 16_000
CHANNELS = 1
def __init__(self, device: int | None = None, on_audio=None):
self._device = device
self._chunks: list[np.ndarray] = []
self._stream: sd.InputStream | None = None
self._lock = threading.Lock()
self._chunk_count = 0
self.on_audio = on_audio
@property
def is_recording(self) -> bool:
return self._stream is not None and self._stream.active
def start(self) -> None:
with self._lock:
self._chunks = []
self._chunk_count = 0
try:
self._stream = sd.InputStream(
samplerate=self.SAMPLE_RATE,
channels=self.CHANNELS,
dtype="float32",
device=self._device,
callback=self._callback,
)
self._stream.start()
log.debug("Recording stream started (device=%s)", self._device)
except sd.PortAudioError:
log.error("Failed to open audio device %s", self._device, exc_info=True)
if self._stream is not None:
try:
self._stream.close()
except Exception:
pass
self._stream = None
raise
def stop(self) -> np.ndarray:
# Stop stream first — guarantees no more callbacks after this returns.
if self._stream is not None:
self._stream.stop()
self._stream.close()
self._stream = None
with self._lock:
if not self._chunks:
return np.zeros(0, dtype=np.float32)
audio = np.concatenate(self._chunks).flatten()
self._chunks = []
return audio
def get_audio_so_far(self) -> np.ndarray:
"""Return a copy of all audio recorded so far without stopping the stream."""
with self._lock:
if not self._chunks:
return np.zeros(0, dtype=np.float32)
return np.concatenate(self._chunks).flatten()
def _callback(self, indata: np.ndarray, frames, time_info, status) -> None:
if status:
log.warning("Audio stream status: %s", status)
chunk = indata[:, 0].copy() if indata.ndim > 1 else indata.copy()
with self._lock:
self._chunks.append(chunk)
self._chunk_count += 1
if self._chunk_count % _CONSOLIDATE_EVERY == 0:
self._chunks = [np.concatenate(self._chunks).flatten()]
if self.on_audio is not None:
try:
self.on_audio(chunk)
except Exception:
log.error("Error in on_audio callback", exc_info=True)

147
calliope/setup_wizard.py Normal file
View File

@@ -0,0 +1,147 @@
"""First-run setup wizard — Rich TUI."""
import subprocess
import sys
import sounddevice as sd
from rich.console import Console
from rich.panel import Panel
from rich.progress import Progress
from rich.prompt import Confirm, IntPrompt, Prompt
from rich.table import Table
from calliope import config
console = Console()
def run() -> dict:
"""Run the interactive setup wizard and return the final config."""
cfg = dict(config.DEFAULTS)
# ── Welcome ──────────────────────────────────────────────────────
console.print(
Panel.fit(
"[bold magenta]Calliope[/bold magenta]\n"
"Voice-to-text for macOS — speak and type into any app.\n\n"
"This wizard will walk you through first-time setup.",
border_style="magenta",
)
)
# ── Permission checks ────────────────────────────────────────────
console.print("\n[bold]Permission checks[/bold]")
_check_accessibility()
_check_microphone()
console.print()
# ── Mic selection ────────────────────────────────────────────────
console.print("[bold]Microphone selection[/bold]")
devices = sd.query_devices()
table = Table(show_header=True)
table.add_column("#", style="cyan", width=4)
table.add_column("Device")
table.add_column("Inputs", justify="right")
input_indices: list[int] = []
for i, d in enumerate(devices):
if d["max_input_channels"] > 0:
input_indices.append(i)
marker = " (default)" if i == sd.default.device[0] else ""
table.add_row(str(i), f"{d['name']}{marker}", str(d["max_input_channels"]))
console.print(table)
default_dev = sd.default.device[0]
choice = Prompt.ask(
"Device index",
default=str(default_dev) if default_dev is not None else str(input_indices[0]),
)
cfg["device"] = int(choice) if choice else None
# ── Hotkey config ────────────────────────────────────────────────
console.print("\n[bold]Hotkey configuration[/bold]")
console.print(f" Push-to-talk : [cyan]{cfg['hotkeys']['ptt']}[/cyan]")
console.print(f" Toggle : [cyan]{cfg['hotkeys']['toggle']}[/cyan]")
if Confirm.ask("Keep defaults?", default=True):
pass
else:
cfg["hotkeys"]["ptt"] = Prompt.ask("Push-to-talk combo", default=cfg["hotkeys"]["ptt"])
cfg["hotkeys"]["toggle"] = Prompt.ask("Toggle combo", default=cfg["hotkeys"]["toggle"])
# ── Model download ───────────────────────────────────────────────
console.print("\n[bold]Model download[/bold]")
console.print(f" Default model: [cyan]{cfg['model']}[/cyan]")
if not Confirm.ask("Use default model?", default=True):
cfg["model"] = Prompt.ask("Whisper model")
console.print(f"Downloading [cyan]{cfg['model']}[/cyan] (this may take a while)...")
from calliope.transcriber import Transcriber
transcriber = Transcriber(model=cfg["model"])
with Progress() as progress:
task = progress.add_task("Loading model...", total=None)
transcriber.load()
progress.update(task, completed=100, total=100)
console.print("[green]Model ready.[/green]")
# ── Validation ───────────────────────────────────────────────────
if Confirm.ask("\nRecord a short test clip to verify everything works?", default=True):
console.print("Recording for 3 seconds...")
from calliope.recorder import Recorder
import time
rec = Recorder(device=cfg["device"])
rec.start()
time.sleep(3)
audio = rec.stop()
console.print("Transcribing...")
text = transcriber.transcribe(audio)
console.print(f"[green]Result:[/green] {text or '(no speech detected)'}")
# ── Save ─────────────────────────────────────────────────────────
config.save(cfg)
console.print(f"\n[green]Config saved to {config.CONFIG_PATH}[/green]")
console.print("Run [bold]calliope[/bold] to start. Enjoy! 🎤\n")
return cfg
def _check_accessibility() -> None:
try:
import ApplicationServices
trusted = ApplicationServices.AXIsProcessTrusted()
except Exception:
trusted = None
if trusted:
console.print(" [green]✓[/green] Accessibility access granted")
else:
console.print(" [red]✗[/red] Accessibility access — required for typing")
console.print(" Open: System Settings → Privacy & Security → Accessibility")
if Confirm.ask(" Open System Settings?", default=False):
subprocess.run(
["open", "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"],
check=False,
)
def _check_microphone() -> None:
try:
import AVFoundation
status = AVFoundation.AVCaptureDevice.authorizationStatusForMediaType_(
AVFoundation.AVMediaTypeAudio
)
granted = status == 3 # AVAuthorizationStatusAuthorized
except Exception:
granted = None
if granted:
console.print(" [green]✓[/green] Microphone access granted")
else:
console.print(" [red]✗[/red] Microphone access — required for recording")
console.print(" Open: System Settings → Privacy & Security → Microphone")
if Confirm.ask(" Open System Settings?", default=False):
subprocess.run(
["open", "x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone"],
check=False,
)

73
calliope/transcriber.py Normal file
View File

@@ -0,0 +1,73 @@
"""Whisper transcription using transformers pipeline on MPS."""
import logging
import numpy as np
import torch
from transformers import pipeline
log = logging.getLogger(__name__)
class Transcriber:
def __init__(self, model: str = "distil-whisper/distil-large-v3"):
self.model = model
self._pipe = None
self._tokenizer = None
self.context: str = ""
self.language: str = "auto"
def load(self) -> None:
from transformers import AutoTokenizer
device = "mps" if torch.backends.mps.is_available() else "cpu"
# Use float32 on MPS — float16 produces garbled output on Apple Silicon.
dtype = torch.float32 if device == "mps" else torch.float16
log.info("Loading model %s on %s (dtype=%s)", self.model, device, dtype)
try:
self._pipe = pipeline(
"automatic-speech-recognition",
model=self.model,
torch_dtype=dtype,
device=device,
)
self._tokenizer = AutoTokenizer.from_pretrained(self.model)
log.info("Model loaded successfully")
except Exception:
log.error("Failed to load model %s", self.model, exc_info=True)
raise
def transcribe(self, audio: np.ndarray) -> str:
if self._pipe is None:
self.load()
if audio.size == 0:
return ""
# Skip audio that's too short (<1s) or too quiet — Whisper hallucinates
# punctuation like "!" on silence/noise.
duration = audio.size / 16_000
energy = float(np.sqrt(np.mean(audio ** 2)))
log.debug("Audio: %.1fs, RMS energy: %.6f", duration, energy)
if duration < 1.0 or energy < 0.005:
log.debug("Audio too short or too quiet, skipping transcription")
return ""
generate_kwargs = {}
if self.context:
prompt_ids = self._tokenizer.get_prompt_ids(self.context)
generate_kwargs["prompt_ids"] = prompt_ids
pipe_kwargs = {
"batch_size": 4,
"return_timestamps": True,
"generate_kwargs": generate_kwargs,
}
if self.language != "auto":
pipe_kwargs["generate_kwargs"]["language"] = self.language
pipe_kwargs["generate_kwargs"]["task"] = "transcribe"
result = self._pipe(
{"raw": audio, "sampling_rate": 16_000},
**pipe_kwargs,
)
return result["text"].strip()

64
calliope/typer.py Normal file
View File

@@ -0,0 +1,64 @@
"""Type text into the focused field using Quartz CGEvents."""
import logging
import subprocess
import time
import Quartz
log = logging.getLogger(__name__)
def type_text(text: str) -> None:
"""Simulate typing text into the currently focused text field."""
for char in text:
_type_char(char)
time.sleep(0.005)
def type_text_clipboard(text: str) -> None:
"""Type text by copying to clipboard and pasting with Cmd+V.
Saves and restores the previous clipboard contents.
"""
# Save current clipboard
try:
prev = subprocess.run(
["pbpaste"], capture_output=True, text=True, timeout=2,
).stdout
except Exception:
prev = None
# Copy text to clipboard
subprocess.run(["pbcopy"], input=text, text=True, timeout=2)
# Paste with Cmd+V
_cmd_v()
time.sleep(0.05)
# Restore previous clipboard
if prev is not None:
subprocess.run(["pbcopy"], input=prev, text=True, timeout=2)
def _cmd_v() -> None:
"""Simulate Cmd+V keypress."""
# 'v' keycode is 9
event_down = Quartz.CGEventCreateKeyboardEvent(None, 9, True)
event_up = Quartz.CGEventCreateKeyboardEvent(None, 9, False)
Quartz.CGEventSetFlags(event_down, Quartz.kCGEventFlagMaskCommand)
Quartz.CGEventSetFlags(event_up, Quartz.kCGEventFlagMaskCommand)
Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_down)
Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_up)
def _type_char(char: str) -> None:
"""Type a single unicode character via CGEvents."""
event_down = Quartz.CGEventCreateKeyboardEvent(None, 0, True)
event_up = Quartz.CGEventCreateKeyboardEvent(None, 0, False)
Quartz.CGEventKeyboardSetUnicodeString(event_down, len(char), char)
Quartz.CGEventKeyboardSetUnicodeString(event_up, len(char), char)
Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_down)
Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_up)

27
pyproject.toml Normal file
View File

@@ -0,0 +1,27 @@
[build-system]
requires = ["setuptools>=68.0"]
build-backend = "setuptools.build_meta"
[project]
name = "calliope"
version = "0.1.0"
description = "Voice-to-text for macOS — speak and type into any app"
requires-python = ">=3.10"
dependencies = [
"rumps>=0.4.0",
"sounddevice>=0.4.6",
"numpy>=1.24.0",
"torch>=2.0.0",
"transformers>=4.36.0",
"accelerate>=0.25.0",
"pynput>=1.7.6",
"pyobjc-framework-Quartz>=9.0",
"pyobjc-framework-Cocoa>=9.0",
"pyobjc-framework-AVFoundation>=9.0",
"rich>=13.0.0",
"click>=8.1.0",
"pyyaml>=6.0",
]
[project.scripts]
calliope = "calliope.app:main"