Initial commit: Calliope voice-to-text macOS menu bar app

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 15:08:53 +01:00
commit 7cbf2d04a9
15 changed files with 1431 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,22 @@
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 *.egg-info/
 *.egg
 dist/
 build/
 .eggs/
 *.whl
 .venv/
 venv/
 env/
 .env
 *.log
 .DS_Store
 *.swp
 *.swo
 *~
 .idea/
 .vscode/
 *.iml
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,42 @@
 # CLAUDE.md
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 ## What is Calliope?
 A macOS menu bar app for local voice-to-text. Users press a hotkey, speak, and transcribed text is typed into the focused app. Runs entirely offline using Whisper models via Hugging Face Transformers + PyTorch.
 ## Setup & Running
 ```bash
 pip install -e .        # Install in dev mode
 calliope                # Launch (runs setup wizard on first run)
 calliope setup          # Re-run setup wizard
 calliope --debug        # Launch with debug logging
 calliope --device 2 --model openai/whisper-large-v3  # Override config
 ```
 No test suite or linter is configured yet.
 ## Architecture
 **Entry point:** `calliope/cli.py` → Click CLI → `calliope/app.py:main()`
 **Data flow:** Hotkey press → Record audio → Transcribe with Whisper → Type into focused app
 Key modules in `calliope/`:
 - **app.py** — `CalliopeApp(rumps.App)`: main orchestrator, manages menu bar UI and coordinates all components
 - **recorder.py** — Audio capture via `sounddevice` at 16kHz mono float32, with chunk consolidation
 - **transcriber.py** — Whisper STT using HF `transformers.pipeline("automatic-speech-recognition")`
 - **hotkeys.py** — `HotkeyListener` using `pynput`: supports push-to-talk (Ctrl+Shift hold) and toggle (Ctrl+Space) modes
 - **typer.py** — Outputs text via Quartz CGEvents (character mode) or clipboard paste (Cmd+V)
 - **overlay.py** — `WaveformOverlay`: floating NSPanel with scrolling waveform during recording, pulsing dots during transcription
 - **setup_wizard.py** — Rich-based interactive first-run config (mic, hotkeys, model download)
 - **config.py** — Loads/saves YAML config at `~/.config/calliope/config.yaml`
 ## Platform Constraints
 - **macOS only** — uses `pyobjc` bindings (Quartz, AppKit, AVFoundation, ApplicationServices)
 - **MPS (Apple Silicon):** must use float32; float16 produces garbled Whisper output
 - Requires Accessibility and Microphone permissions in macOS System Settings
--- a/LICENSE
+++ b/LICENSE
@@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2026 Calliope Contributors
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,86 @@
 # Calliope
 Voice-to-text for macOS — speak and type into any app.
 Calliope sits in your menu bar, listens when you hold a hotkey, transcribes your speech with Whisper, and types the result into whatever app is focused. No cloud, no API keys — everything runs locally on your Mac.
 ## Installation
 ```bash
 git clone https://github.com/yourname/calliope.git
 cd calliope
 pip install -e .
 ```
 ## Usage
 ```bash
 # First run — launches the setup wizard, then starts the app
 calliope
 # Re-run the setup wizard
 calliope setup
 # Launch with overrides
 calliope --device 2 --model openai/whisper-large-v3 --debug
 # Print version
 calliope --version
 ```
 ## Hotkeys
 | Action | Default | Description |
 |--------|---------|-------------|
 | Push-to-talk | `Ctrl+Shift` (hold) | Records while held, transcribes on release |
 | Toggle | `Ctrl+Space` | Start/stop recording |
 Hotkeys are configurable via the setup wizard or `~/.config/calliope/config.yaml`.
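A combo string such as `ctrl+shift` can be read as a set of key names that must all be held at once. The sketch below illustrates that idea in plain Python. The names `parse_combo` and `combo_active` are hypothetical helpers for this README and are not part of Calliope's API; the real listener works with `pynput` key objects rather than strings.

```python
def parse_combo(combo: str) -> frozenset[str]:
    """Split a combo such as 'ctrl+shift' into a set of lowercase key names."""
    return frozenset(part.strip() for part in combo.lower().split("+") if part.strip())


def combo_active(combo_keys: frozenset[str], pressed: set[str]) -> bool:
    """True when every key of a non-empty combo is currently pressed."""
    # An empty combo is a subset of any set, so it is rejected explicitly.
    return bool(combo_keys) and combo_keys <= pressed


ptt = parse_combo("Ctrl + Shift")
print(sorted(ptt))
print(combo_active(ptt, {"ctrl", "shift", "a"}))
print(combo_active(ptt, {"ctrl"}))
```

Treating the combo as a set means the order in which the keys go down does not matter, only that all of them are held.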
 ## Permissions
 Calliope needs two macOS permissions:
 - **Accessibility** — to type text into other apps (System Settings > Privacy & Security > Accessibility)
 - **Microphone** — to record audio (System Settings > Privacy & Security > Microphone)
 The setup wizard checks for these and can open System Settings for you.
 ## Configuration
 Config lives at `~/.config/calliope/config.yaml`:
 ```yaml
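 device: null          # sounddevice index; null = system default
 model: distil-whisper/distil-large-v3
 language: auto        # or a language code such as en, es, fr, de
 hotkeys:
  ptt: ctrl+shift
  toggle: ctrl+space
 context: ""           # domain-specific terms to help Whisper
 debug: false
 typing_mode: char     # "char" or "clipboard"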
 ```
 CLI flags override config values for that session.
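The override rule can be sketched in a few lines of Python. This is an illustration only: `apply_cli_overrides` is a hypothetical helper, not a function in the Calliope codebase, and it assumes that a flag the user did not pass arrives as `None`.

```python
from typing import Any


def apply_cli_overrides(cfg: dict[str, Any], **flags: Any) -> dict[str, Any]:
    """Return a copy of cfg in which every flag that is not None wins."""
    merged = dict(cfg)  # shallow copy, so the loaded config is left untouched
    for key, value in flags.items():
        if value is not None:
            merged[key] = value
    return merged


# Values as they might be loaded from config.yaml (illustrative).
file_cfg = {"device": None, "model": "distil-whisper/distil-large-v3", "debug": False}

# Equivalent of `calliope --device 2` with no --model flag.
session_cfg = apply_cli_overrides(file_cfg, device=2, model=None)
print(session_cfg)
```

Because the result is a copy, the overrides last for the session and the values read from the file stay as they were.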
 ## Troubleshooting
 **"Status: Model load failed"**
 Check that you have enough disk space and RAM. The default model needs ~1.5 GB. Run with `--debug` for detailed logs.
 **No text appears after transcribing**
 Make sure Accessibility permission is granted. Restart Calliope after granting it.
 **Wrong microphone**
 Run `calliope setup` to pick a different input device, or set `device` in the config file. Use `python -m sounddevice` to list devices.
 **Hotkeys not working**
 Ensure no other app is capturing the same key combo. Customize hotkeys via `calliope setup`.
 ## Remaining TODOs
 - LICENSE file
 - Unit tests
 - CI/CD pipeline
 - Homebrew formula
 - `.app` bundle for drag-and-drop install
 - Changelog
--- a/calliope/__init__.py
+++ b/calliope/__init__.py
--- a/calliope/app.py
+++ b/calliope/app.py
@@ -0,0 +1,302 @@
 """Calliope — Voice-to-text macOS menu bar app."""
 import logging
 import os
 import threading
 import time
 from typing import Any
 # Disable tokenizers parallelism to avoid leaked semaphore warnings on shutdown.
 os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
 # Run offline — models are downloaded during setup, no need to hit HuggingFace on every launch.
 os.environ.setdefault("HF_HUB_OFFLINE", "1")
 import rumps
 from calliope import config as config_mod
 from calliope.recorder import Recorder
 from calliope.transcriber import Transcriber
 from calliope.typer import type_text, type_text_clipboard
 from calliope.hotkeys import HotkeyListener
 from calliope.overlay import WaveformOverlay
 log = logging.getLogger(__name__)
 class CalliopeApp(rumps.App):
    def __init__(self, cfg: dict[str, Any] | None = None):
        super().__init__("Calliope", title="\U0001f3a4", quit_button=None)  # 🎤
        if cfg is None:
            cfg = config_mod.load()
        self.cfg = cfg
        self.overlay = WaveformOverlay()
        self.recorder = Recorder(device=cfg.get("device"))
        self.transcriber = Transcriber(
            model=cfg.get("model", "distil-whisper/distil-large-v3"),
        )
        self.transcriber.context = cfg.get("context", "")
        self.transcriber.language = cfg.get("language", "auto")
        self._recording = False
        self._rec_lock = threading.Lock()
        self._rec_start_time: float | None = None
        self._rec_timer: rumps.Timer | None = None
        self.status_item = rumps.MenuItem("Status: Loading model...")
        self.status_item.set_callback(None)
        self.toggle_item = rumps.MenuItem("Start Recording", callback=self._on_toggle_click)
        self.context_item = rumps.MenuItem("Set Whisper Context...", callback=self._on_set_context)
        # Language submenu
        self._lang_menu = rumps.MenuItem("Language")
        current_lang = cfg.get("language", "auto")
        for display_name, code in config_mod.LANGUAGES.items():
            prefix = "\u2713 " if code == current_lang else "   "
            item = rumps.MenuItem(f"{prefix}{display_name}", callback=self._on_language_select)
            self._lang_menu.add(item)
        # Model submenu
        self._model_menu = rumps.MenuItem("Model")
        current_model = cfg.get("model", "distil-whisper/distil-large-v3")
        for model_id in config_mod.MODELS:
            short = model_id.split("/")[-1]
            prefix = "\u2713 " if model_id == current_model else "   "
            item = rumps.MenuItem(f"{prefix}{short}", callback=self._on_model_select)
            self._model_menu.add(item)
        quit_item = rumps.MenuItem("Quit Calliope", callback=self._on_quit)
        self.menu = [
            self.status_item,
            None,
            self.toggle_item,
            self.context_item,
            self._lang_menu,
            self._model_menu,
            None,
            quit_item,
        ]
        hotkey_cfg = cfg.get("hotkeys", {})
        self.hotkeys = HotkeyListener(
            on_push_to_talk_start=self._start_recording,
            on_push_to_talk_stop=self._stop_and_transcribe,
            on_toggle=self._toggle_recording,
            ptt_combo=hotkey_cfg.get("ptt", "ctrl+shift"),
            toggle_combo=hotkey_cfg.get("toggle", "ctrl+space"),
        )
        # Load model in background
        threading.Thread(target=self._load_model, daemon=True).start()
    def _load_model(self) -> None:
        try:
            self.transcriber.load()
            self.status_item.title = "Status: Ready"
            self.hotkeys.start()
            log.info("Model loaded, hotkeys active")
        except Exception:
            log.error("Failed to load model", exc_info=True)
            self.status_item.title = "Status: Model load failed"
            try:
                rumps.notification("Calliope", "Error", "Failed to load Whisper model. Check logs.")
            except RuntimeError:
                pass
    @staticmethod
    def _activate_app():
        """Temporarily become a regular app so dialog text fields receive focus."""
        from AppKit import NSApplication, NSApplicationActivationPolicyRegular
        app = NSApplication.sharedApplication()
        app.setActivationPolicy_(NSApplicationActivationPolicyRegular)
        app.activateIgnoringOtherApps_(True)
    @staticmethod
    def _deactivate_app():
        """Revert to accessory app (no Dock icon)."""
        from AppKit import NSApplication, NSApplicationActivationPolicyAccessory
        NSApplication.sharedApplication().setActivationPolicy_(NSApplicationActivationPolicyAccessory)
    def _on_set_context(self, sender) -> None:
        self._activate_app()
        response = rumps.Window(
            message="Provide context to help Whisper with domain-specific terms, "
            "names, or jargon. For example:\n\n"
            "\"Meeting about Kubernetes, gRPC, and the Istio service mesh.\"",
            title="Set Whisper Context",
            default_text=self.transcriber.context,
            ok="Save",
            cancel="Clear",
            dimensions=(320, 120),
        ).run()
        if response.clicked == 1:  # Save
            self.transcriber.context = response.text.strip()
        else:  # Clear
            self.transcriber.context = ""
        self._deactivate_app()
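        ctx = self.transcriber.context
        # Keep cfg in sync so a later model switch re-applies this context.
        self.cfg["context"] = ctx
        if ctx:
            # Only append an ellipsis when the preview is actually truncated.
            preview = ctx if len(ctx) <= 20 else f"{ctx[:20]}..."
            self.context_item.title = f"Set Whisper Context... ({preview})"
        else:
            self.context_item.title = "Set Whisper Context..."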
    def _on_language_select(self, sender) -> None:
        display_name = sender.title.strip().lstrip("\u2713").strip()
        code = config_mod.LANGUAGES.get(display_name, "auto")
        self.transcriber.language = code
        # Update checkmarks
        for item in self._lang_menu.values():
            name = item.title.strip().lstrip("\u2713").strip()
            item.title = f"\u2713 {name}" if config_mod.LANGUAGES.get(name) == code else f"   {name}"
        self.cfg["language"] = code
        config_mod.save(self.cfg)
        log.info("Language set to %s (%s)", display_name, code)
    def _on_model_select(self, sender) -> None:
        short_name = sender.title.strip().lstrip("\u2713").strip()
        # Find full model ID
        model_id = None
        for m in config_mod.MODELS:
            if m.split("/")[-1] == short_name:
                model_id = m
                break
        if model_id is None or model_id == self.transcriber.model:
            return
        # Update checkmarks
        for item in self._model_menu.values():
            name = item.title.strip().lstrip("\u2713").strip()
            item.title = f"\u2713 {name}" if name == short_name else f"   {name}"
        self.cfg["model"] = model_id
        config_mod.save(self.cfg)
        self.status_item.title = "Status: Loading model..."
        self.hotkeys.stop()
        self._release_transcriber()
        self.transcriber = Transcriber(model=model_id)
        self.transcriber.context = self.cfg.get("context", "")
        self.transcriber.language = self.cfg.get("language", "auto")
        threading.Thread(target=self._load_model, daemon=True).start()
        log.info("Switching model to %s", model_id)
    def _release_transcriber(self) -> None:
        """Free the current Whisper model to reclaim GPU memory."""
        if self.transcriber is not None:
            self.transcriber._pipe = None
            self.transcriber._tokenizer = None
        import torch
        if torch.backends.mps.is_available():
            torch.mps.empty_cache()
    def _on_toggle_click(self, sender) -> None:
        self._toggle_recording()
    def _toggle_recording(self) -> None:
        if self._recording:
            self._stop_and_transcribe()
        else:
            self._start_recording()
    def _start_recording(self) -> None:
        with self._rec_lock:
            if self._recording:
                return
            self._recording = True
        self._rec_start_time = time.time()
        self.title = "\U0001f534 0:00"  # 🔴
        self.toggle_item.title = "Stop Recording"
        self.status_item.title = "Status: Recording..."
        self.recorder.on_audio = self.overlay.push_samples
        try:
            self.recorder.start()
        except Exception:
            log.error("Failed to start recording", exc_info=True)
            with self._rec_lock:
                self._recording = False
            self.title = "\U0001f3a4"  # 🎤
            self.toggle_item.title = "Start Recording"
            self.status_item.title = "Status: Mic error (check device)"
            try:
                rumps.notification("Calliope", "", "Microphone unavailable — check audio device")
            except RuntimeError:
                pass
            return
        self.overlay.show()
        self._rec_timer = rumps.Timer(self._update_rec_duration, 1)
        self._rec_timer.start()
        try:
            rumps.notification("Calliope", "", "Recording started")
        except RuntimeError:
            pass  # Info.plist missing CFBundleIdentifier
        log.info("Recording started")
    def _stop_and_transcribe(self) -> None:
        with self._rec_lock:
            if not self._recording:
                return
            self._recording = False
        if self._rec_timer:
            self._rec_timer.stop()
            self._rec_timer = None
        duration = int(time.time() - self._rec_start_time) if self._rec_start_time else 0
        self._rec_start_time = None
        self.title = "\U0001f3a4"  # 🎤
        self.toggle_item.title = "Start Recording"
        self.status_item.title = "Status: Transcribing..."
        self.overlay.show_transcribing()
        audio = self.recorder.stop()
        try:
            rumps.notification("Calliope", "", f"Recording stopped ({duration}s)")
        except RuntimeError:
            pass
        log.info("Recording stopped, %d samples", audio.size)
        threading.Thread(target=self._transcribe_and_type, args=(audio,), daemon=True).start()
    def _update_rec_duration(self, timer) -> None:
        if self._rec_start_time is None:
            return
        elapsed = int(time.time() - self._rec_start_time)
        minutes, seconds = divmod(elapsed, 60)
        self.title = f"\U0001f534 {minutes}:{seconds:02d}"
    def _transcribe_and_type(self, audio) -> None:
        try:
            text = self.transcriber.transcribe(audio)
            if text:
                def _do_type():
                    try:
                        if self.cfg.get("typing_mode", "char") == "clipboard":
                            type_text_clipboard(text)
                        else:
                            type_text(text)
                        print(f"\n[Calliope] {text}")
                        log.info("Typed %d chars", len(text))
                    except Exception:
                        log.error("Typing failed", exc_info=True)
                from PyObjCTools.AppHelper import callAfter
                callAfter(_do_type)
            self.overlay.hide()
            self.status_item.title = "Status: Ready"
        except Exception:
            log.error("Transcription failed", exc_info=True)
            self.overlay.hide()
            self.status_item.title = "Status: Ready"
            try:
                rumps.notification("Calliope", "Error", "Transcription failed. Check logs.")
            except RuntimeError:
                pass
    def _on_quit(self, sender) -> None:
        self.hotkeys.stop()
        self.recorder.stop()
        # Stop overlay timers synchronously to avoid retain cycles on quit.
        self.overlay.cleanup()
        rumps.quit_application()
 def main():
    from calliope.cli import cli
    cli()
 if __name__ == "__main__":
    main()
--- a/calliope/cli.py
+++ b/calliope/cli.py
@@ -0,0 +1,55 @@
 """CLI entry point using click."""
 import logging
 import click
 from calliope import config
 @click.group(invoke_without_command=True)
 @click.option("--device", type=int, default=None, help="Audio input device index.")
 @click.option("--model", type=str, default=None, help="Whisper model name.")
 @click.option("--context", type=str, default=None, help="Transcription context prompt.")
 @click.option("--debug", is_flag=True, default=False, help="Enable debug logging.")
 @click.version_option(package_name="calliope")
 @click.pass_context
 def cli(ctx, device, model, context, debug):
    """Calliope — Voice-to-text for macOS."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    )
    cfg = config.load()
    # CLI flags override config
    if device is not None:
        cfg["device"] = device
    if model is not None:
        cfg["model"] = model
    if context is not None:
        cfg["context"] = context
    if debug:
        cfg["debug"] = True
    ctx.ensure_object(dict)
    ctx.obj["cfg"] = cfg
    if ctx.invoked_subcommand is None:
        # First run → wizard, then launch
        if not config.exists():
            from calliope.setup_wizard import run as run_wizard
            cfg = run_wizard()
            ctx.obj["cfg"] = cfg
        from calliope.app import CalliopeApp
        CalliopeApp(cfg).run()
 @cli.command()
 def setup():
    """Re-run the setup wizard."""
    from calliope.setup_wizard import run as run_wizard
    run_wizard()
--- a/calliope/config.py
+++ b/calliope/config.py
@@ -0,0 +1,85 @@
 """Persistent YAML config at ~/.config/calliope/config.yaml."""
 import logging
 from pathlib import Path
 from typing import Any
 import yaml
 log = logging.getLogger(__name__)
 CONFIG_DIR = Path.home() / ".config" / "calliope"
 CONFIG_PATH = CONFIG_DIR / "config.yaml"
 DEFAULTS: dict[str, Any] = {
    "device": None,  # sounddevice index; None = system default
    "model": "distil-whisper/distil-large-v3",
    "language": "auto",
    "hotkeys": {
        "ptt": "ctrl+shift",
        "toggle": "ctrl+space",
    },
    "context": "",
    "debug": False,
    "typing_mode": "char",  # "char" or "clipboard"
 }
 LANGUAGES: dict[str, str] = {
    "Auto": "auto",
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Japanese": "ja",
    "Chinese": "zh",
    "Korean": "ko",
    "Portuguese": "pt",
    "Italian": "it",
    "Dutch": "nl",
    "Russian": "ru",
 }
 MODELS: list[str] = [
    "distil-whisper/distil-large-v3",
    "openai/whisper-large-v3",
    "openai/whisper-base",
    "openai/whisper-small",
    "openai/whisper-medium",
 ]
 def _deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, returning a new dict."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = _deep_merge(result[key], value)
        else:
            result[key] = value
    return result
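 def load() -> dict[str, Any]:
    """Load config from disk, falling back to defaults."""
    import copy  # local import keeps this function self-contained
    # Deep-copy so that mutating nested values (e.g. cfg["hotkeys"]) never alters DEFAULTS.
    cfg = copy.deepcopy(DEFAULTS)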
    if CONFIG_PATH.exists():
        try:
            with open(CONFIG_PATH) as f:
                saved = yaml.safe_load(f) or {}
            cfg = _deep_merge(cfg, saved)
            log.debug("Loaded config from %s", CONFIG_PATH)
        except Exception:
            log.warning("Failed to read config; using defaults", exc_info=True)
    return cfg
 def save(cfg: dict[str, Any]) -> None:
    """Write config to disk."""
    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    with open(CONFIG_PATH, "w") as f:
        yaml.safe_dump(cfg, f, default_flow_style=False)
    log.info("Config saved to %s", CONFIG_PATH)
 def exists() -> bool:
    return CONFIG_PATH.exists()
--- a/calliope/hotkeys.py
+++ b/calliope/hotkeys.py
@@ -0,0 +1,106 @@
 """Global hotkey listener using pynput."""
 import logging
 from typing import Callable
 from pynput import keyboard
 log = logging.getLogger(__name__)
 # Maps string names to pynput keys
 _KEY_MAP: dict[str, keyboard.Key] = {
    "ctrl": keyboard.Key.ctrl,
    "shift": keyboard.Key.shift,
    "alt": keyboard.Key.alt,
    "cmd": keyboard.Key.cmd,
    "space": keyboard.Key.space,
 }
 def _parse_combo(combo: str) -> set[keyboard.Key]:
    """Parse 'ctrl+shift' into a set of pynput keys."""
    keys: set[keyboard.Key] = set()
    for part in combo.lower().split("+"):
        part = part.strip()
        if part in _KEY_MAP:
            keys.add(_KEY_MAP[part])
        else:
            log.warning("Unknown key in combo: %s", part)
    return keys
 class HotkeyListener:
    def __init__(
        self,
        on_push_to_talk_start: Callable,
        on_push_to_talk_stop: Callable,
        on_toggle: Callable,
        ptt_combo: str = "ctrl+shift",
        toggle_combo: str = "ctrl+space",
    ):
        self._on_ptt_start = on_push_to_talk_start
        self._on_ptt_stop = on_push_to_talk_stop
        self._on_toggle = on_toggle
        self._listener: keyboard.Listener | None = None
        self._pressed: set = set()
        self._ptt_active = False
        self._toggle_active = False
        self._ptt_keys = _parse_combo(ptt_combo)
        self._toggle_keys = _parse_combo(toggle_combo)
        log.debug("PTT keys: %s, Toggle keys: %s", self._ptt_keys, self._toggle_keys)
    def start(self) -> None:
        self._pressed.clear()
        self._ptt_active = False
        self._toggle_active = False
        self._listener = keyboard.Listener(
            on_press=self._on_press,
            on_release=self._on_release,
        )
        self._listener.daemon = True
        self._listener.start()
    def stop(self) -> None:
        if self._listener is not None:
            try:
                self._listener.stop()
            except Exception:
                pass
            self._listener = None
        self._pressed.clear()
        self._ptt_active = False
        self._toggle_active = False
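    def _normalize(self, key) -> keyboard.Key | keyboard.KeyCode:
        """Collapse left/right modifier variants onto the generic key used in combos."""
        if hasattr(key, "value") and hasattr(key.value, "vk"):
            vk = key.value.vk
            # macOS virtual key codes for the left/right modifier pairs.
            if vk in (0x3B, 0x3E):  # Control, Right Control
                return keyboard.Key.ctrl
            if vk in (0x38, 0x3C):  # Shift, Right Shift
                return keyboard.Key.shift
            if vk in (0x3A, 0x3D):  # Option, Right Option
                return keyboard.Key.alt
            if vk in (0x37, 0x36):  # Command, Right Command
                return keyboard.Key.cmd
        return key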
    def _on_press(self, key) -> None:
        key = self._normalize(key)
        self._pressed.add(key)
        # An empty combo (all key names unknown) is a subset of any set, so require a non-empty one.
        if self._ptt_keys and self._ptt_keys.issubset(self._pressed) and not self._ptt_active:
            self._ptt_active = True
            self._on_ptt_start()
        if self._toggle_keys and self._toggle_keys.issubset(self._pressed) and not self._toggle_active:
            self._toggle_active = True
            self._on_toggle()
    def _on_release(self, key) -> None:
        key = self._normalize(key)
        if self._ptt_active and key in self._ptt_keys:
            self._ptt_active = False
            self._on_ptt_stop()
        if key in self._toggle_keys:
            self._toggle_active = False
        self._pressed.discard(key)
--- a/calliope/overlay.py
+++ b/calliope/overlay.py
@@ -0,0 +1,313 @@
 """Floating waveform overlay shown during recording."""
 import logging
 import time
 from collections import deque
 from enum import Enum, auto
 import numpy as np
 from AppKit import (
    NSPanel,
    NSView,
    NSColor,
    NSBezierPath,
    NSTimer,
    NSScreen,
    NSWindowStyleMaskBorderless,
    NSWindowStyleMaskNonactivatingPanel,
    NSFloatingWindowLevel,
    NSStatusWindowLevel,
    NSBackingStoreBuffered,
    NSApp,
    NSFont,
    NSFontAttributeName,
    NSForegroundColorAttributeName,
    NSMakePoint,
 )
 from Foundation import NSMakeRect
 from objc import super as objc_super
 from PyObjCTools.AppHelper import callAfter
 log = logging.getLogger(__name__)
 WIDTH = 360
 HEIGHT = 80
 NUM_BARS = 150  # number of amplitude samples visible at once
 FPS = 30
 # Fade animation
 FADE_DURATION = 0.2  # seconds
 FADE_STEPS = int(FADE_DURATION * FPS)
 class OverlayMode(Enum):
    RECORDING = auto()
    TRANSCRIBING = auto()
 class WaveformView(NSView):
    """Custom NSView that draws a scrolling waveform or transcribing indicator."""
    amplitudes: deque
    mode: OverlayMode
    _pulse_start: float
    _fade_step: int
    _fade_direction: int
    _fade_timer: object
    _on_fade_complete: object
    def initWithFrame_(self, frame):
        self = objc_super(WaveformView, self).initWithFrame_(frame)
        if self is None:
            return None
        self.amplitudes = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
        self.mode = OverlayMode.RECORDING
        self._pulse_start = time.monotonic()
        self._fade_step = 0
        self._fade_direction = 0
        self._fade_timer = None
        self._on_fade_complete = None
        return self
    def drawRect_(self, rect):
        # Dark translucent rounded-rect background
        bg = NSColor.colorWithCalibratedRed_green_blue_alpha_(0.1, 0.1, 0.1, 0.85)
        bg.setFill()
        path = NSBezierPath.bezierPathWithRoundedRect_xRadius_yRadius_(
            self.bounds(), 12, 12
        )
        path.fill()
        # Subtle border
        border = NSColor.colorWithCalibratedRed_green_blue_alpha_(1.0, 1.0, 1.0, 0.12)
        border.setStroke()
        border_path = NSBezierPath.bezierPathWithRoundedRect_xRadius_yRadius_(
            self.bounds(), 12, 12
        )
        border_path.setLineWidth_(1.0)
        border_path.stroke()
        if self.mode == OverlayMode.RECORDING:
            self._draw_waveform()
        elif self.mode == OverlayMode.TRANSCRIBING:
            self._draw_transcribing()
    def _draw_waveform(self):
        color = NSColor.colorWithCalibratedRed_green_blue_alpha_(0.4, 0.75, 0.5, 0.9)
        color.setStroke()
        bounds = self.bounds()
        w = bounds.size.width
        h = bounds.size.height
        mid_y = h / 2
        padding = 10
        draw_w = w - 2 * padding
        draw_h = (h - 2 * padding) / 2
        amps = list(self.amplitudes)
        if not amps:
            return
        step = draw_w / max(len(amps) - 1, 1)
        for sign in (1, -1):
            line = NSBezierPath.bezierPath()
            line.setLineWidth_(1.5)
            for i, a in enumerate(amps):
                x = padding + i * step
                y_off = a * draw_h * sign
                if i == 0:
                    line.moveToPoint_((x, mid_y + y_off))
                else:
                    line.lineToPoint_((x, mid_y + y_off))
            line.stroke()
        self._draw_label("calliope recording...")
    def _draw_transcribing(self):
        bounds = self.bounds()
        w = bounds.size.width
        h = bounds.size.height
        mid_y = h / 2
        # Pulsing dots animation
        elapsed = time.monotonic() - self._pulse_start
        num_dots = 3
        dot_radius = 5.0
        dot_spacing = 20.0
        total_w = (num_dots - 1) * dot_spacing
        start_x = (w - total_w) / 2
        for i in range(num_dots):
            # Staggered sine pulse for each dot
            phase = elapsed * 3.0 - i * 0.6
            alpha = 0.3 + 0.7 * max(0.0, (1.0 + np.sin(phase)) / 2.0)
            color = NSColor.colorWithCalibratedRed_green_blue_alpha_(
                0.4, 0.75, 0.5, alpha
            )
            color.setFill()
            x = start_x + i * dot_spacing
            dot = NSBezierPath.bezierPathWithOvalInRect_(
                NSMakeRect(x - dot_radius, mid_y - dot_radius + 6,
                           dot_radius * 2, dot_radius * 2)
            )
            dot.fill()
        self._draw_label("transcribing...")
    def _draw_label(self, text: str):
        from Foundation import NSString, NSDictionary
        bounds = self.bounds()
        w = bounds.size.width
        label = NSString.stringWithString_(text)
        attrs = NSDictionary.dictionaryWithObjects_forKeys_(
            [
                NSFont.systemFontOfSize_(11),
                NSColor.colorWithCalibratedRed_green_blue_alpha_(1.0, 1.0, 1.0, 0.5),
            ],
            [NSFontAttributeName, NSForegroundColorAttributeName],
        )
        label_size = label.sizeWithAttributes_(attrs)
        label_x = (w - label_size.width) / 2
        label.drawAtPoint_withAttributes_(NSMakePoint(label_x, 4), attrs)
    def refresh_(self, timer):
        self.setNeedsDisplay_(True)
    def fadeTick_(self, timer):
        self._fade_step += 1
        progress = min(self._fade_step / FADE_STEPS, 1.0)
        if self._fade_direction == 1:
            alpha = progress
        else:
            alpha = 1.0 - progress
        self.window().setAlphaValue_(alpha)
        if progress >= 1.0:
            self.stopFade()
            if self._fade_direction == -1 and self._on_fade_complete:
                self._on_fade_complete()
    def stopFade(self):
        if self._fade_timer is not None:
            self._fade_timer.invalidate()
            self._fade_timer = None
    def startFade_onComplete_(self, direction, on_complete):
        self.stopFade()
        self._fade_direction = direction
        self._fade_step = 0
        self._on_fade_complete = on_complete
        self._fade_timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            FADE_DURATION / FADE_STEPS, self, b"fadeTick:", None, True
        )
 class WaveformOverlay:
    """Floating translucent window showing a live scrolling waveform."""
    def __init__(self):
        self._panel: NSPanel | None = None
        self._view: WaveformView | None = None
        self._timer: NSTimer | None = None
    def _ensure_panel(self):
        if self._panel is not None:
            return
        screen = NSScreen.mainScreen()
        screen_frame = screen.frame()
        x = (screen_frame.size.width - WIDTH) / 2
        y = screen_frame.size.height - HEIGHT - 40  # near top, below menu bar
        rect = NSMakeRect(x, y, WIDTH, HEIGHT)
        style = NSWindowStyleMaskBorderless | NSWindowStyleMaskNonactivatingPanel
        panel = NSPanel.alloc().initWithContentRect_styleMask_backing_defer_(
            rect, style, NSBackingStoreBuffered, False
        )
        panel.setLevel_(NSStatusWindowLevel)
        panel.setOpaque_(False)
        panel.setBackgroundColor_(NSColor.clearColor())
        panel.setHasShadow_(True)
        panel.setIgnoresMouseEvents_(True)
        panel.setCollectionBehavior_(1 << 0)  # NSWindowCollectionBehaviorCanJoinAllSpaces
        view = WaveformView.alloc().initWithFrame_(NSMakeRect(0, 0, WIDTH, HEIGHT))
        panel.setContentView_(view)
        self._panel = panel
        self._view = view
    def show(self):
        callAfter(self._show_on_main)
    def hide(self):
        callAfter(self._hide_on_main)
    def show_transcribing(self):
        """Switch overlay to transcribing state (pulsing dots)."""
        callAfter(self._show_transcribing_on_main)
    def _show_on_main(self):
        self._ensure_panel()
        self._view.stopFade()
        self._view.mode = OverlayMode.RECORDING
        self._view.amplitudes = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
        self._panel.setAlphaValue_(0.0)
        self._panel.orderFront_(None)
        self._start_timer()
        self._view.startFade_onComplete_(1, None)
        log.debug("Overlay shown")
    def _show_transcribing_on_main(self):
        self._ensure_panel()
        self._view.stopFade()
        self._view.mode = OverlayMode.TRANSCRIBING
        self._view._pulse_start = time.monotonic()
        # If panel is already visible, just switch mode; otherwise show it
        if self._panel.alphaValue() < 0.01:
            self._panel.setAlphaValue_(0.0)
            self._panel.orderFront_(None)
            self._view.startFade_onComplete_(1, None)
        self._start_timer()
        log.debug("Overlay switched to transcribing")
    def _hide_on_main(self):
        if self._view is None or self._panel is None:
            return
        def on_fade_out():
            self._stop_timer()
            self._panel.orderOut_(None)
        self._view.startFade_onComplete_(-1, on_fade_out)
        log.debug("Overlay hiding")
    def cleanup(self):
        """Synchronously stop all timers and hide. Call before quit."""
        self._stop_timer()
        if self._view is not None:
            self._view.stopFade()
        if self._panel is not None:
            self._panel.orderOut_(None)
    def push_samples(self, chunk: np.ndarray):
        """Called from audio callback with a new chunk of float32 samples."""
        rms = float(np.sqrt(np.mean(chunk ** 2)))
        # Clamp to [0, 1] with some headroom
        amplitude = min(rms * 5.0, 1.0)
        if self._view is not None:
            self._view.amplitudes.append(amplitude)
    def _start_timer(self):
        self._stop_timer()
        self._timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            1.0 / FPS, self._view, b"refresh:", None, True
        )
    def _stop_timer(self):
        if self._timer is not None:
            self._timer.invalidate()
            self._timer = None
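Review note on `push_samples`: the level meter reduces each audio chunk to one RMS value, applies a fixed gain of 5, and clamps the result to 1.0 before appending it to a fixed-length deque. The sketch below reproduces that mapping with the standard library only, so it can be checked without NumPy or AppKit. `rms_amplitude` and the `NUM_BARS = 8` value are illustrative assumptions; the real constant is defined elsewhere in `overlay.py`.

```python
import math
from collections import deque

NUM_BARS = 8   # hypothetical value; the real constant lives in overlay.py
GAIN = 5.0     # same headroom factor used by push_samples


def rms_amplitude(chunk, gain=GAIN):
    """Map a chunk of float samples in [-1, 1] to a bar height in [0, 1]."""
    if not chunk:
        # An empty chunk has no energy; avoid dividing by zero.
        return 0.0
    rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
    return min(rms * gain, 1.0)


# A fixed-length deque drops the oldest bar as each new one arrives,
# which is what produces the scrolling effect.
bars = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
for chunk in ([0.0] * 160, [0.1] * 160, [0.5] * 160):
    bars.append(rms_amplitude(chunk))
```

Unlike the original, this sketch returns 0.0 for an empty chunk instead of propagating NaN.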
--- a/calliope/recorder.py
+++ b/calliope/recorder.py
@@ -0,0 +1,88 @@
 """Audio recording using sounddevice."""
 import logging
 import numpy as np
 import sounddevice as sd
 import threading
 log = logging.getLogger(__name__)
 _CONSOLIDATE_EVERY = 100
 class Recorder:
    SAMPLE_RATE = 16_000
    CHANNELS = 1
    def __init__(self, device: int | None = None, on_audio=None):
        self._device = device
        self._chunks: list[np.ndarray] = []
        self._stream: sd.InputStream | None = None
        self._lock = threading.Lock()
        self._chunk_count = 0
        self.on_audio = on_audio
    @property
    def is_recording(self) -> bool:
        return self._stream is not None and self._stream.active
    def start(self) -> None:
        with self._lock:
            self._chunks = []
            self._chunk_count = 0
            try:
                self._stream = sd.InputStream(
                    samplerate=self.SAMPLE_RATE,
                    channels=self.CHANNELS,
                    dtype="float32",
                    device=self._device,
                    callback=self._callback,
                )
                self._stream.start()
                log.debug("Recording stream started (device=%s)", self._device)
            except sd.PortAudioError:
                log.error("Failed to open audio device %s", self._device, exc_info=True)
                if self._stream is not None:
                    try:
                        self._stream.close()
                    except Exception:
                        pass
                self._stream = None
                raise
    def stop(self) -> np.ndarray:
        # Stop stream first — guarantees no more callbacks after this returns.
        if self._stream is not None:
            self._stream.stop()
            self._stream.close()
            self._stream = None
        with self._lock:
            if not self._chunks:
                return np.zeros(0, dtype=np.float32)
            audio = np.concatenate(self._chunks).flatten()
            self._chunks = []
            return audio
    def get_audio_so_far(self) -> np.ndarray:
        """Return a copy of all audio recorded so far without stopping the stream."""
        with self._lock:
            if not self._chunks:
                return np.zeros(0, dtype=np.float32)
            return np.concatenate(self._chunks).flatten()
    def _callback(self, indata: np.ndarray, frames, time_info, status) -> None:
        if status:
            log.warning("Audio stream status: %s", status)
        chunk = indata[:, 0].copy() if indata.ndim > 1 else indata.copy()
        with self._lock:
            self._chunks.append(chunk)
            self._chunk_count += 1
            if self._chunk_count % _CONSOLIDATE_EVERY == 0:
                self._chunks = [np.concatenate(self._chunks).flatten()]
        if self.on_audio is not None:
            try:
                self.on_audio(chunk)
            except Exception:
                log.error("Error in on_audio callback", exc_info=True)
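Review note on chunk consolidation: `_callback` merges the accumulated chunks into a single array every `_CONSOLIDATE_EVERY` callbacks, which keeps the list short during long recordings. A minimal sketch of the same idea follows, using plain lists instead of NumPy arrays and omitting the lock. `ChunkBuffer` and its method names are hypothetical and not part of the repository.

```python
CONSOLIDATE_EVERY = 100  # mirrors _CONSOLIDATE_EVERY in recorder.py


class ChunkBuffer:
    """Accumulate sample chunks, merging them periodically (illustrative only)."""

    def __init__(self, consolidate_every=CONSOLIDATE_EVERY):
        self._chunks = []
        self._count = 0
        self._every = consolidate_every

    def append(self, chunk):
        self._chunks.append(list(chunk))
        self._count += 1
        if self._count % self._every == 0:
            # Collapse everything received so far into one chunk.
            merged = [s for c in self._chunks for s in c]
            self._chunks = [merged]

    def snapshot(self):
        """Return all samples so far as one flat list, without clearing."""
        return [s for c in self._chunks for s in c]


buf = ChunkBuffer(consolidate_every=3)
for i in range(4):
    buf.append([float(i)] * 2)
```

With `consolidate_every=3`, the first three chunks are merged on the third append, so after the fourth append the buffer holds two chunks while `snapshot()` still returns all eight samples in order.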
--- a/calliope/setup_wizard.py
+++ b/calliope/setup_wizard.py
@@ -0,0 +1,147 @@
 """First-run setup wizard — Rich TUI."""
 import subprocess
 import sys
 import sounddevice as sd
 from rich.console import Console
 from rich.panel import Panel
 from rich.progress import Progress
 from rich.prompt import Confirm, IntPrompt, Prompt
 from rich.table import Table
 from calliope import config
 console = Console()
 def run() -> dict:
    """Run the interactive setup wizard and return the final config."""
    cfg = dict(config.DEFAULTS)
    # ── Welcome ──────────────────────────────────────────────────────
    console.print(
        Panel.fit(
            "[bold magenta]Calliope[/bold magenta]\n"
            "Voice-to-text for macOS — speak and type into any app.\n\n"
            "This wizard will walk you through first-time setup.",
            border_style="magenta",
        )
    )
    # ── Permission checks ────────────────────────────────────────────
    console.print("\n[bold]Permission checks[/bold]")
    _check_accessibility()
    _check_microphone()
    console.print()
    # ── Mic selection ────────────────────────────────────────────────
    console.print("[bold]Microphone selection[/bold]")
    devices = sd.query_devices()
    table = Table(show_header=True)
    table.add_column("#", style="cyan", width=4)
    table.add_column("Device")
    table.add_column("Inputs", justify="right")
    input_indices: list[int] = []
    for i, d in enumerate(devices):
        if d["max_input_channels"] > 0:
            input_indices.append(i)
            marker = " (default)" if i == sd.default.device[0] else ""
            table.add_row(str(i), f"{d['name']}{marker}", str(d["max_input_channels"]))
    console.print(table)
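    if not input_indices:
        console.print("[red]No input devices found.[/red]")
        sys.exit(1)
    default_dev = sd.default.device[0]
    # sounddevice reports -1 when no default input device is set.
    if default_dev not in input_indices:
        default_dev = input_indices[0]
    cfg["device"] = IntPrompt.ask(
        "Device index",
        choices=[str(i) for i in input_indices],
        default=default_dev,
    )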
    # ── Hotkey config ────────────────────────────────────────────────
    console.print("\n[bold]Hotkey configuration[/bold]")
    console.print(f"  Push-to-talk : [cyan]{cfg['hotkeys']['ptt']}[/cyan]")
    console.print(f"  Toggle       : [cyan]{cfg['hotkeys']['toggle']}[/cyan]")
    if not Confirm.ask("Keep defaults?", default=True):
        cfg["hotkeys"]["ptt"] = Prompt.ask("Push-to-talk combo", default=cfg["hotkeys"]["ptt"])
        cfg["hotkeys"]["toggle"] = Prompt.ask("Toggle combo", default=cfg["hotkeys"]["toggle"])
    # ── Model download ───────────────────────────────────────────────
    console.print("\n[bold]Model download[/bold]")
    console.print(f"  Default model: [cyan]{cfg['model']}[/cyan]")
    if not Confirm.ask("Use default model?", default=True):
        cfg["model"] = Prompt.ask("Whisper model")
    console.print(f"Downloading [cyan]{cfg['model']}[/cyan] (this may take a while)...")
    from calliope.transcriber import Transcriber
    transcriber = Transcriber(model=cfg["model"])
    with Progress() as progress:
        task = progress.add_task("Loading model...", total=None)
        transcriber.load()
        progress.update(task, completed=100, total=100)
    console.print("[green]Model ready.[/green]")
    # ── Validation ───────────────────────────────────────────────────
    if Confirm.ask("\nRecord a short test clip to verify everything works?", default=True):
        console.print("Recording for 3 seconds...")
        from calliope.recorder import Recorder
        import time
        rec = Recorder(device=cfg["device"])
        rec.start()
        time.sleep(3)
        audio = rec.stop()
        console.print("Transcribing...")
        text = transcriber.transcribe(audio)
        console.print(f"[green]Result:[/green] {text or '(no speech detected)'}")
    # ── Save ─────────────────────────────────────────────────────────
    config.save(cfg)
    console.print(f"\n[green]Config saved to {config.CONFIG_PATH}[/green]")
    console.print("Run [bold]calliope[/bold] to start. Enjoy! 🎤\n")
    return cfg
 def _check_accessibility() -> None:
    try:
        import ApplicationServices
        trusted = ApplicationServices.AXIsProcessTrusted()
    except Exception:
        trusted = None
    if trusted:
        console.print("  [green]✓[/green] Accessibility access granted")
    else:
        console.print("  [red]✗[/red] Accessibility access — required for typing")
        console.print("    Open: System Settings → Privacy & Security → Accessibility")
        if Confirm.ask("    Open System Settings?", default=False):
            subprocess.run(
                ["open", "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"],
                check=False,
            )
 def _check_microphone() -> None:
    try:
        import AVFoundation
        status = AVFoundation.AVCaptureDevice.authorizationStatusForMediaType_(
            AVFoundation.AVMediaTypeAudio
        )
        granted = status == 3  # AVAuthorizationStatusAuthorized
    except Exception:
        granted = None
    if granted:
        console.print("  [green]✓[/green] Microphone access granted")
    else:
        console.print("  [red]✗[/red] Microphone access — required for recording")
        console.print("    Open: System Settings → Privacy & Security → Microphone")
        if Confirm.ask("    Open System Settings?", default=False):
            subprocess.run(
                ["open", "x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone"],
                check=False,
            )
--- a/calliope/transcriber.py
+++ b/calliope/transcriber.py
@@ -0,0 +1,73 @@
 """Whisper transcription using transformers pipeline on MPS."""
 import logging
 import numpy as np
 import torch
 from transformers import pipeline
 log = logging.getLogger(__name__)
 class Transcriber:
    def __init__(self, model: str = "distil-whisper/distil-large-v3"):
        self.model = model
        self._pipe = None
        self._tokenizer = None
        self.context: str = ""
        self.language: str = "auto"
    def load(self) -> None:
        from transformers import AutoTokenizer
        device = "mps" if torch.backends.mps.is_available() else "cpu"
        # Use float32 everywhere: float16 produces garbled output on Apple Silicon
        # (MPS), and half precision is slow or unsupported for many ops on CPU.
        dtype = torch.float32
        log.info("Loading model %s on %s (dtype=%s)", self.model, device, dtype)
        try:
            self._pipe = pipeline(
                "automatic-speech-recognition",
                model=self.model,
                torch_dtype=dtype,
                device=device,
            )
            self._tokenizer = AutoTokenizer.from_pretrained(self.model)
            log.info("Model loaded successfully")
        except Exception:
            log.error("Failed to load model %s", self.model, exc_info=True)
            raise
    def transcribe(self, audio: np.ndarray) -> str:
        if self._pipe is None:
            self.load()
        if audio.size == 0:
            return ""
        # Skip audio that's too short (<1s) or too quiet — Whisper hallucinates
        # punctuation like "!" on silence/noise.
        duration = audio.size / 16_000
        energy = float(np.sqrt(np.mean(audio ** 2)))
        log.debug("Audio: %.1fs, RMS energy: %.6f", duration, energy)
        if duration < 1.0 or energy < 0.005:
            log.debug("Audio too short or too quiet, skipping transcription")
            return ""
        generate_kwargs = {}
        if self.context:
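            # generate() expects prompt_ids as a torch tensor on the model's device.
            prompt_ids = self._tokenizer.get_prompt_ids(self.context, return_tensors="pt")
            generate_kwargs["prompt_ids"] = prompt_ids.to(self._pipe.device)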
        pipe_kwargs = {
            "batch_size": 4,
            "return_timestamps": True,
            "generate_kwargs": generate_kwargs,
        }
        if self.language != "auto":
            pipe_kwargs["generate_kwargs"]["language"] = self.language
            pipe_kwargs["generate_kwargs"]["task"] = "transcribe"
        result = self._pipe(
            {"raw": audio, "sampling_rate": 16_000},
            **pipe_kwargs,
        )
        return result["text"].strip()
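Review note on the silence gate: `transcribe` skips audio shorter than 1 s or with RMS energy below 0.005, because Whisper tends to hallucinate punctuation on silence. The sketch below isolates that check as a pure function so the thresholds can be exercised without loading a model. `should_transcribe` and the constant names are assumptions made for illustration; the thresholds are the ones used in the diff.

```python
import math

SAMPLE_RATE = 16_000   # Hz, matches Recorder.SAMPLE_RATE
MIN_DURATION_S = 1.0   # threshold used in transcribe()
MIN_RMS = 0.005        # threshold used in transcribe()


def should_transcribe(samples, sample_rate=SAMPLE_RATE):
    """Return True if the clip is long and loud enough to send to Whisper."""
    if not samples:
        return False
    duration = len(samples) / sample_rate
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration >= MIN_DURATION_S and rms >= MIN_RMS


half_second = [0.1] * 8_000      # loud enough, but too short
near_silence = [0.001] * 16_000  # long enough, but too quiet
speech_like = [0.1] * 16_000     # passes both checks
```

Keeping the gate separate from the pipeline call would also make it straightforward to tune the thresholds per microphone.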
--- a/calliope/typer.py
+++ b/calliope/typer.py
@@ -0,0 +1,64 @@
 """Type text into the focused field using Quartz CGEvents."""
 import logging
 import subprocess
 import time
 import Quartz
 log = logging.getLogger(__name__)
 def type_text(text: str) -> None:
    """Simulate typing text into the currently focused text field."""
    for char in text:
        _type_char(char)
        time.sleep(0.005)
 def type_text_clipboard(text: str) -> None:
    """Type text by copying to clipboard and pasting with Cmd+V.
    Saves and restores the previous clipboard contents.
    """
    # Save current clipboard
    try:
        prev = subprocess.run(
            ["pbpaste"], capture_output=True, text=True, timeout=2,
        ).stdout
    except Exception:
        prev = None
    # Copy text to clipboard
    subprocess.run(["pbcopy"], input=text, text=True, timeout=2)
    # Paste with Cmd+V
    _cmd_v()
    time.sleep(0.05)
    # Restore previous clipboard
    if prev is not None:
        subprocess.run(["pbcopy"], input=prev, text=True, timeout=2)
 def _cmd_v() -> None:
    """Simulate Cmd+V keypress."""
    # 'v' keycode is 9
    event_down = Quartz.CGEventCreateKeyboardEvent(None, 9, True)
    event_up = Quartz.CGEventCreateKeyboardEvent(None, 9, False)
    Quartz.CGEventSetFlags(event_down, Quartz.kCGEventFlagMaskCommand)
    Quartz.CGEventSetFlags(event_up, Quartz.kCGEventFlagMaskCommand)
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_down)
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_up)
 def _type_char(char: str) -> None:
    """Type a single unicode character via CGEvents."""
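    # The length argument counts UTF-16 code units, so characters outside the
    # BMP (e.g. emoji) need 2 units even though len(char) == 1.
    n_units = len(char.encode("utf-16-le")) // 2
    event_down = Quartz.CGEventCreateKeyboardEvent(None, 0, True)
    event_up = Quartz.CGEventCreateKeyboardEvent(None, 0, False)
    Quartz.CGEventKeyboardSetUnicodeString(event_down, n_units, char)
    Quartz.CGEventKeyboardSetUnicodeString(event_up, n_units, char)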
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_down)
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_up)
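Review note on string lengths in `_type_char`: `CGEventKeyboardSetUnicodeString` takes its length in UTF-16 code units, whereas Python's `len()` counts code points. The two differ for characters outside the Basic Multilingual Plane, such as emoji. The helper below is a hypothetical illustration of how to compute the UTF-16 length with the standard library; `utf16_units` is not part of the repository.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids the BOM
    # that plain "utf-16" would prepend.
    return len(text.encode("utf-16-le")) // 2


ascii_char = "a"
accented = "\u00e9"        # precomposed e-acute, inside the BMP
microphone = "\U0001F3A4"  # microphone emoji, outside the BMP
```

Here `len(microphone)` is 1 while `utf16_units(microphone)` is 2, which is the case where passing `len(char)` would truncate the character to a lone surrogate.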
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,27 @@
 [build-system]
 requires = ["setuptools>=68.0"]
 build-backend = "setuptools.build_meta"
 [project]
 name = "calliope"
 version = "0.1.0"
 description = "Voice-to-text for macOS — speak and type into any app"
 requires-python = ">=3.10"
 dependencies = [
    "rumps>=0.4.0",
    "sounddevice>=0.4.6",
    "numpy>=1.24.0",
    "torch>=2.0.0",
    "transformers>=4.36.0",
    "accelerate>=0.25.0",
    "pynput>=1.7.6",
    "pyobjc-framework-Quartz>=9.0",
    "pyobjc-framework-Cocoa>=9.0",
    "pyobjc-framework-AVFoundation>=9.0",
    "rich>=13.0.0",
    "click>=8.1.0",
    "pyyaml>=6.0",
 ]
 [project.scripts]
 calliope = "calliope.app:main"