Initial commit: Calliope voice-to-text macOS menu bar app
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
22
.gitignore
vendored
Normal file
@@ -0,0 +1,22 @@
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg-info/
*.egg
dist/
build/
.eggs/
*.whl
.venv/
venv/
env/
.env
*.log
.DS_Store
*.swp
*.swo
*~
.idea/
.vscode/
*.iml
42
CLAUDE.md
Normal file
@@ -0,0 +1,42 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What is Calliope?

A macOS menu bar app for local voice-to-text. Users press a hotkey, speak, and transcribed text is typed into the focused app. Runs entirely offline (after the initial model download during setup) using Whisper models via Hugging Face Transformers + PyTorch.

## Setup & Running

```bash
pip install -e .  # Install in dev mode
calliope  # Launch (runs setup wizard on first run)
calliope setup  # Re-run setup wizard
calliope --debug  # Launch with debug logging
calliope --device 2 --model openai/whisper-large-v3  # Override config
```

No test suite or linter is configured yet.

## Architecture

**Entry point:** `calliope/cli.py:cli()` (Click group) → `CalliopeApp(cfg).run()` in `calliope/app.py`. `calliope/app.py:main()` simply delegates to `cli()`.

**Data flow:** Hotkey press → Record audio → Transcribe with Whisper → Type into focused app

Key modules in `calliope/`:

- **app.py** — `CalliopeApp(rumps.App)`: main orchestrator, manages menu bar UI and coordinates all components
- **recorder.py** — Audio capture via `sounddevice` at 16kHz mono float32, with chunk consolidation
- **transcriber.py** — Whisper STT using HF `transformers.pipeline("automatic-speech-recognition")`
- **hotkeys.py** — `HotkeyListener` using `pynput`: supports push-to-talk (Ctrl+Shift hold) and toggle (Ctrl+Space) modes
- **typer.py** — Outputs text via Quartz CGEvents (character mode) or clipboard paste (Cmd+V)
- **overlay.py** — `WaveformOverlay`: floating NSPanel with scrolling waveform during recording, pulsing dots during transcription
- **setup_wizard.py** — Rich-based interactive first-run config (mic, hotkeys, model download)
- **config.py** — Loads/saves YAML config at `~/.config/calliope/config.yaml`

## Platform Constraints

- **macOS only** — uses `pyobjc` bindings (Quartz, AppKit, AVFoundation, ApplicationServices)
- **MPS (Apple Silicon):** must use float32; float16 causes garbled Whisper output
- Requires Accessibility and Microphone permissions in macOS System Settings
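
The recorder bullet in the Architecture section mentions 16 kHz mono float32 capture with chunk consolidation. `recorder.py` itself is not part of this excerpt, so the following is only a minimal sketch of what consolidation could look like, using the standard-library `array` module in place of whatever buffer type the real recorder uses. The names `consolidate` and `duration_seconds` are hypothetical.

```python
from array import array

SAMPLE_RATE = 16_000  # Hz, the rate named in the recorder bullet


def consolidate(chunks):
    """Join per-callback float32 chunks into one contiguous buffer (hypothetical helper)."""
    buffer = array("f")
    for chunk in chunks:
        buffer.extend(chunk)
    return buffer


def duration_seconds(buffer):
    """Length of a mono buffer in seconds at SAMPLE_RATE."""
    return len(buffer) / SAMPLE_RATE


# Three silent chunks of 1600 samples each, i.e. 100 ms per chunk at 16 kHz.
chunks = [array("f", [0.0] * 1600) for _ in range(3)]
audio = consolidate(chunks)
```

Consolidating once when recording stops keeps the audio callback cheap, because the callback only has to append a chunk to a list.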
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Calliope Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
86
README.md
Normal file
@@ -0,0 +1,86 @@
# Calliope

Voice-to-text for macOS — speak and type into any app.

Calliope sits in your menu bar, listens when you hold a hotkey, transcribes your speech with Whisper, and types the result into whatever app is focused. No cloud, no API keys — everything runs locally on your Mac.

## Installation

```bash
git clone https://github.com/yourname/calliope.git
cd calliope
pip install -e .
```

## Usage

```bash
# First run — launches the setup wizard, then starts the app
calliope

# Re-run the setup wizard
calliope setup

# Launch with overrides
calliope --device 2 --model openai/whisper-large-v3 --debug

# Print version
calliope --version
```

## Hotkeys

| Action | Default | Description |
|--------|---------|-------------|
| Push-to-talk | `Ctrl+Shift` (hold) | Records while held, transcribes on release |
| Toggle | `Ctrl+Space` | Start/stop recording |

Hotkeys are configurable via the setup wizard or `~/.config/calliope/config.yaml`.

## Permissions

Calliope needs two macOS permissions:

- **Accessibility** — to type text into other apps (System Settings > Privacy & Security > Accessibility)
- **Microphone** — to record audio (System Settings > Privacy & Security > Microphone)

The setup wizard checks for these and can open System Settings for you.

## Configuration

Config lives at `~/.config/calliope/config.yaml`:

```yaml
device: null  # sounddevice index; null = system default
model: distil-whisper/distil-large-v3
language: auto  # or a language code such as en
hotkeys:
  ptt: ctrl+shift
  toggle: ctrl+space
context: ""  # domain-specific terms to help Whisper
debug: false
typing_mode: char  # "char" or "clipboard"
```

CLI flags override config values for that session.

## Troubleshooting

**"Status: Model load failed"**
Check that you have enough disk space and RAM. The default model needs ~1.5 GB. Run with `--debug` for detailed logs.

**No text appears after transcribing**
Make sure Accessibility permission is granted. Restart Calliope after granting it.

**Wrong microphone**
Run `calliope setup` to pick a different input device, or set `device` in the config file. Use `python -m sounddevice` to list devices.

**Hotkeys not working**
Ensure no other app is capturing the same key combo. Customize hotkeys via `calliope setup`.

## Remaining TODOs

- Unit tests
- CI/CD pipeline
- Homebrew formula
- `.app` bundle for drag-and-drop install
- Changelog
0
calliope/__init__.py
Normal file
302
calliope/app.py
Normal file
@@ -0,0 +1,302 @@
"""Calliope — Voice-to-text macOS menu bar app."""

import logging
import os
import threading
import time
from typing import Any

# Disable tokenizers parallelism to avoid leaked semaphore warnings on shutdown.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
# Run offline — models are downloaded during setup, no need to hit HuggingFace on every launch.
os.environ.setdefault("HF_HUB_OFFLINE", "1")

import rumps

from calliope import config as config_mod
from calliope.recorder import Recorder
from calliope.transcriber import Transcriber
from calliope.typer import type_text, type_text_clipboard
from calliope.hotkeys import HotkeyListener
from calliope.overlay import WaveformOverlay

log = logging.getLogger(__name__)


class CalliopeApp(rumps.App):
    def __init__(self, cfg: dict[str, Any] | None = None):
        super().__init__("Calliope", title="\U0001f3a4", quit_button=None)  # 🎤

        if cfg is None:
            cfg = config_mod.load()

        self.cfg = cfg
        self.overlay = WaveformOverlay()
        self.recorder = Recorder(device=cfg.get("device"))
        self.transcriber = Transcriber(
            model=cfg.get("model", "distil-whisper/distil-large-v3"),
        )
        self.transcriber.context = cfg.get("context", "")
        self.transcriber.language = cfg.get("language", "auto")

        self._recording = False
        self._rec_lock = threading.Lock()
        self._rec_start_time: float | None = None
        self._rec_timer: rumps.Timer | None = None

        self.status_item = rumps.MenuItem("Status: Loading model...")
        self.status_item.set_callback(None)
        self.toggle_item = rumps.MenuItem("Start Recording", callback=self._on_toggle_click)
        self.context_item = rumps.MenuItem("Set Whisper Context...", callback=self._on_set_context)

        # Language submenu
        self._lang_menu = rumps.MenuItem("Language")
        current_lang = cfg.get("language", "auto")
        for display_name, code in config_mod.LANGUAGES.items():
            prefix = "\u2713 " if code == current_lang else "  "
            item = rumps.MenuItem(f"{prefix}{display_name}", callback=self._on_language_select)
            self._lang_menu.add(item)

        # Model submenu
        self._model_menu = rumps.MenuItem("Model")
        current_model = cfg.get("model", "distil-whisper/distil-large-v3")
        for model_id in config_mod.MODELS:
            short = model_id.split("/")[-1]
            prefix = "\u2713 " if model_id == current_model else "  "
            item = rumps.MenuItem(f"{prefix}{short}", callback=self._on_model_select)
            self._model_menu.add(item)

        quit_item = rumps.MenuItem("Quit Calliope", callback=self._on_quit)

        self.menu = [
            self.status_item,
            None,
            self.toggle_item,
            self.context_item,
            self._lang_menu,
            self._model_menu,
            None,
            quit_item,
        ]

        hotkey_cfg = cfg.get("hotkeys", {})
        self.hotkeys = HotkeyListener(
            on_push_to_talk_start=self._start_recording,
            on_push_to_talk_stop=self._stop_and_transcribe,
            on_toggle=self._toggle_recording,
            ptt_combo=hotkey_cfg.get("ptt", "ctrl+shift"),
            toggle_combo=hotkey_cfg.get("toggle", "ctrl+space"),
        )

        # Load model in background
        threading.Thread(target=self._load_model, daemon=True).start()

    def _load_model(self) -> None:
        """Load the Whisper model, then enable hotkeys (runs on a worker thread)."""
        try:
            self.transcriber.load()
            self.status_item.title = "Status: Ready"
            self.hotkeys.start()
            log.info("Model loaded, hotkeys active")
        except Exception:
            log.error("Failed to load model", exc_info=True)
            self.status_item.title = "Status: Model load failed"
            try:
                rumps.notification("Calliope", "Error", "Failed to load Whisper model. Check logs.")
            except RuntimeError:
                pass

    @staticmethod
    def _activate_app():
        """Temporarily become a regular app so dialog text fields receive focus."""
        from AppKit import NSApplication, NSApplicationActivationPolicyRegular
        app = NSApplication.sharedApplication()
        app.setActivationPolicy_(NSApplicationActivationPolicyRegular)
        app.activateIgnoringOtherApps_(True)

    @staticmethod
    def _deactivate_app():
        """Revert to accessory app (no Dock icon)."""
        from AppKit import NSApplication, NSApplicationActivationPolicyAccessory
        NSApplication.sharedApplication().setActivationPolicy_(NSApplicationActivationPolicyAccessory)

    def _on_set_context(self, sender) -> None:
        self._activate_app()
        response = rumps.Window(
            message="Provide context to help Whisper with domain-specific terms, "
            "names, or jargon. For example:\n\n"
            "\"Meeting about Kubernetes, gRPC, and the Istio service mesh.\"",
            title="Set Whisper Context",
            default_text=self.transcriber.context,
            ok="Save",
            cancel="Clear",
            dimensions=(320, 120),
        ).run()
        if response.clicked == 1:  # Save
            self.transcriber.context = response.text.strip()
        else:  # Clear
            self.transcriber.context = ""
        self._deactivate_app()
        ctx = self.transcriber.context
        # Persist the context so it survives a model switch, which re-reads it from cfg.
        self.cfg["context"] = ctx
        config_mod.save(self.cfg)
        # Only add an ellipsis to the preview when the context was actually truncated.
        preview = ctx if len(ctx) <= 20 else f"{ctx[:20]}..."
        self.context_item.title = f"Set Whisper Context... ({preview})" if ctx else "Set Whisper Context..."

    def _on_language_select(self, sender) -> None:
        display_name = sender.title.strip().lstrip("\u2713").strip()
        code = config_mod.LANGUAGES.get(display_name, "auto")
        self.transcriber.language = code
        # Update checkmarks
        for item in self._lang_menu.values():
            name = item.title.strip().lstrip("\u2713").strip()
            item.title = f"\u2713 {name}" if config_mod.LANGUAGES.get(name) == code else f"  {name}"
        self.cfg["language"] = code
        config_mod.save(self.cfg)
        log.info("Language set to %s (%s)", display_name, code)

    def _on_model_select(self, sender) -> None:
        short_name = sender.title.strip().lstrip("\u2713").strip()
        # Find full model ID
        model_id = None
        for m in config_mod.MODELS:
            if m.split("/")[-1] == short_name:
                model_id = m
                break
        if model_id is None or model_id == self.transcriber.model:
            return
        # Update checkmarks
        for item in self._model_menu.values():
            name = item.title.strip().lstrip("\u2713").strip()
            item.title = f"\u2713 {name}" if name == short_name else f"  {name}"
        self.cfg["model"] = model_id
        config_mod.save(self.cfg)
        self.status_item.title = "Status: Loading model..."
        # Hotkeys stay disabled until _load_model() restarts them for the new model.
        self.hotkeys.stop()
        self._release_transcriber()
        self.transcriber = Transcriber(model=model_id)
        self.transcriber.context = self.cfg.get("context", "")
        self.transcriber.language = self.cfg.get("language", "auto")
        threading.Thread(target=self._load_model, daemon=True).start()
        log.info("Switching model to %s", model_id)

    def _release_transcriber(self) -> None:
        """Free the current Whisper model to reclaim GPU memory."""
        if self.transcriber is not None:
            self.transcriber._pipe = None
            self.transcriber._tokenizer = None
        import torch
        if torch.backends.mps.is_available():
            torch.mps.empty_cache()

    def _on_toggle_click(self, sender) -> None:
        self._toggle_recording()

    def _toggle_recording(self) -> None:
        if self._recording:
            self._stop_and_transcribe()
        else:
            self._start_recording()

    def _start_recording(self) -> None:
        # The lock makes start idempotent when the hotkey and the menu item fire together.
        with self._rec_lock:
            if self._recording:
                return
            self._recording = True
        self._rec_start_time = time.time()
        self.title = "\U0001f534 0:00"  # 🔴
        self.toggle_item.title = "Stop Recording"
        self.status_item.title = "Status: Recording..."
        self.recorder.on_audio = self.overlay.push_samples
        try:
            self.recorder.start()
        except Exception:
            log.error("Failed to start recording", exc_info=True)
            with self._rec_lock:
                self._recording = False
            self.title = "\U0001f3a4"  # 🎤
            self.toggle_item.title = "Start Recording"
            self.status_item.title = "Status: Mic error (check device)"
            try:
                rumps.notification("Calliope", "", "Microphone unavailable — check audio device")
            except RuntimeError:
                pass
            return
        self.overlay.show()
        self._rec_timer = rumps.Timer(self._update_rec_duration, 1)
        self._rec_timer.start()
        try:
            rumps.notification("Calliope", "", "Recording started")
        except RuntimeError:
            pass  # Info.plist missing CFBundleIdentifier
        log.info("Recording started")

    def _stop_and_transcribe(self) -> None:
        with self._rec_lock:
            if not self._recording:
                return
            self._recording = False
        if self._rec_timer:
            self._rec_timer.stop()
            self._rec_timer = None
        duration = int(time.time() - self._rec_start_time) if self._rec_start_time else 0
        self._rec_start_time = None
        self.title = "\U0001f3a4"  # 🎤
        self.toggle_item.title = "Start Recording"
        self.status_item.title = "Status: Transcribing..."
        self.overlay.show_transcribing()

        audio = self.recorder.stop()
        try:
            rumps.notification("Calliope", "", f"Recording stopped ({duration}s)")
        except RuntimeError:
            pass
        log.info("Recording stopped, %d samples", audio.size)
        # Transcribe on a worker thread so the menu bar stays responsive.
        threading.Thread(target=self._transcribe_and_type, args=(audio,), daemon=True).start()

    def _update_rec_duration(self, timer) -> None:
        if self._rec_start_time is None:
            return
        elapsed = int(time.time() - self._rec_start_time)
        minutes, seconds = divmod(elapsed, 60)
        self.title = f"\U0001f534 {minutes}:{seconds:02d}"

    def _transcribe_and_type(self, audio) -> None:
        try:
            text = self.transcriber.transcribe(audio)
            if text:
                def _do_type():
                    try:
                        if self.cfg.get("typing_mode", "char") == "clipboard":
                            type_text_clipboard(text)
                        else:
                            type_text(text)
                        print(f"\n[Calliope] {text}")
                        log.info("Typed %d chars", len(text))
                    except Exception:
                        log.error("Typing failed", exc_info=True)
                # Typing is scheduled on the main thread via callAfter.
                from PyObjCTools.AppHelper import callAfter
                callAfter(_do_type)
            self.overlay.hide()
            self.status_item.title = "Status: Ready"
        except Exception:
            log.error("Transcription failed", exc_info=True)
            self.overlay.hide()
            self.status_item.title = "Status: Ready"
            try:
                rumps.notification("Calliope", "Error", "Transcription failed. Check logs.")
            except RuntimeError:
                pass

    def _on_quit(self, sender) -> None:
        self.hotkeys.stop()
        self.recorder.stop()
        # Stop overlay timers synchronously to avoid retain cycles on quit.
        self.overlay.cleanup()
        rumps.quit_application()


def main():
    from calliope.cli import cli
    cli()


if __name__ == "__main__":
    main()
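
The recording methods in `calliope/app.py` above rely on two small ideas: a lock-guarded flag that makes start and stop idempotent, and `divmod` to format the elapsed time in the menu bar title. The following is a minimal stand-alone sketch of both, with no rumps or AppKit dependency. `RecordingState` and `format_elapsed` are hypothetical names chosen for illustration, not part of the commit.

```python
import threading


class RecordingState:
    """Hypothetical stand-in for the _recording flag and _rec_lock pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._recording = False

    def try_start(self) -> bool:
        """Return True only for the caller that actually flips the flag on."""
        with self._lock:
            if self._recording:
                return False
            self._recording = True
            return True

    def try_stop(self) -> bool:
        """Return True only for the caller that actually flips the flag off."""
        with self._lock:
            if not self._recording:
                return False
            self._recording = False
            return True


def format_elapsed(elapsed_seconds: int) -> str:
    """Format whole seconds as M:SS, as the menu bar timer does."""
    minutes, seconds = divmod(elapsed_seconds, 60)
    return f"{minutes}:{seconds:02d}"


state = RecordingState()
first_start = state.try_start()   # the hotkey thread wins
second_start = state.try_start()  # a simultaneous menu click is ignored
```

Keeping the check and the write inside one `with` block is what prevents a hotkey callback and a menu click from both starting the recorder.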
55
calliope/cli.py
Normal file
@@ -0,0 +1,55 @@
"""CLI entry point using click."""

import logging

import click

from calliope import config


@click.group(invoke_without_command=True)
@click.option("--device", type=int, default=None, help="Audio input device index.")
@click.option("--model", type=str, default=None, help="Whisper model name.")
@click.option("--context", type=str, default=None, help="Transcription context prompt.")
@click.option("--debug", is_flag=True, default=False, help="Enable debug logging.")
@click.version_option(package_name="calliope")
@click.pass_context
def cli(ctx, device, model, context, debug):
    """Calliope — Voice-to-text for macOS."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    )

    cfg = config.load()

    # CLI flags override config; a value of None means the flag was not passed.
    overrides = {"device": device, "model": model, "context": context}
    overrides = {key: value for key, value in overrides.items() if value is not None}
    if debug:
        overrides["debug"] = True
    cfg.update(overrides)

    ctx.ensure_object(dict)
    ctx.obj["cfg"] = cfg

    if ctx.invoked_subcommand is None:
        # First run → wizard, then launch
        if not config.exists():
            from calliope.setup_wizard import run as run_wizard
            cfg = run_wizard()
            # Re-apply the flags so they are not lost when the wizard returns a fresh config.
            cfg.update(overrides)
            ctx.obj["cfg"] = cfg

        from calliope.app import CalliopeApp
        CalliopeApp(cfg).run()


@cli.command()
def setup():
    """Re-run the setup wizard."""
    from calliope.setup_wizard import run as run_wizard
    run_wizard()
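
The flag handling in `cli()` above treats `None` as "flag not passed", so only flags the user actually supplied replace values loaded from the config file. The following is a small sketch of that precedence rule without Click. `apply_overrides` is a hypothetical helper, and the sample values are taken from the defaults shown in this commit.

```python
from typing import Any


def apply_overrides(cfg: dict[str, Any], **flags: Any) -> dict[str, Any]:
    """Return a copy of cfg with every flag that is not None applied on top."""
    merged = dict(cfg)
    merged.update({name: value for name, value in flags.items() if value is not None})
    return merged


# Values as they might come from the config file.
file_cfg = {"device": None, "model": "distil-whisper/distil-large-v3", "context": ""}

# Equivalent of `calliope --device 2`: only the device is overridden.
session_cfg = apply_overrides(file_cfg, device=2, model=None, context=None)
```

Returning a copy keeps the file-backed values intact, which matches the README's statement that overrides apply to the current session only.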
85
calliope/config.py
Normal file
@@ -0,0 +1,85 @@
"""Persistent YAML config at ~/.config/calliope/config.yaml."""

import copy
import logging
from pathlib import Path
from typing import Any

import yaml

log = logging.getLogger(__name__)

CONFIG_DIR = Path.home() / ".config" / "calliope"
CONFIG_PATH = CONFIG_DIR / "config.yaml"

DEFAULTS: dict[str, Any] = {
    "device": None,  # sounddevice index; None = system default
    "model": "distil-whisper/distil-large-v3",
    "language": "auto",
    "hotkeys": {
        "ptt": "ctrl+shift",
        "toggle": "ctrl+space",
    },
    "context": "",
    "debug": False,
    "typing_mode": "char",  # "char" or "clipboard"
}

# Menu display name -> Whisper language code.
LANGUAGES: dict[str, str] = {
    "Auto": "auto",
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Japanese": "ja",
    "Chinese": "zh",
    "Korean": "ko",
    "Portuguese": "pt",
    "Italian": "it",
    "Dutch": "nl",
    "Russian": "ru",
}

MODELS: list[str] = [
    "distil-whisper/distil-large-v3",
    "openai/whisper-large-v3",
    "openai/whisper-base",
    "openai/whisper-small",
    "openai/whisper-medium",
]


def _deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, returning a new dict."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = _deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def load() -> dict[str, Any]:
    """Load config from disk, falling back to defaults."""
    # Deep copy so callers that mutate nested values (e.g. cfg["hotkeys"]) never alter DEFAULTS.
    cfg = copy.deepcopy(DEFAULTS)
    if CONFIG_PATH.exists():
        try:
            with open(CONFIG_PATH, encoding="utf-8") as f:
                saved = yaml.safe_load(f) or {}
            cfg = _deep_merge(cfg, saved)
            log.debug("Loaded config from %s", CONFIG_PATH)
        except Exception:
            log.warning("Failed to read config; using defaults", exc_info=True)
    return cfg


def save(cfg: dict[str, Any]) -> None:
    """Write config to disk."""
    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    with open(CONFIG_PATH, "w", encoding="utf-8") as f:
        yaml.safe_dump(cfg, f, default_flow_style=False)
    log.info("Config saved to %s", CONFIG_PATH)


def exists() -> bool:
    """Return True if a config file has already been written."""
    return CONFIG_PATH.exists()
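
The recursive merge in `calliope/config.py` above is what lets a saved file override a single nested key such as `hotkeys.toggle` while keeping the other defaults. The sketch below reproduces the merge as a stand-alone function (named `deep_merge` here, with a trimmed-down `DEFAULTS`) and applies it to a hypothetical saved config. It also illustrates one property of this kind of merge: sub-dictionaries that the override does not mention are shared with the base rather than copied, so a caller that needs an independent result has to deep-copy the base first.

```python
from typing import Any

# Trimmed-down defaults, for illustration only.
DEFAULTS: dict[str, Any] = {
    "model": "distil-whisper/distil-large-v3",
    "hotkeys": {"ptt": "ctrl+shift", "toggle": "ctrl+space"},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, returning a new top-level dict."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


# Hypothetical saved file that changes only the toggle hotkey.
saved = {"hotkeys": {"toggle": "ctrl+alt"}}
merged = deep_merge(DEFAULTS, saved)

# With an empty override, the nested dict is the very same object as in DEFAULTS.
shared = deep_merge(DEFAULTS, {})["hotkeys"] is DEFAULTS["hotkeys"]
```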
106
calliope/hotkeys.py
Normal file
@@ -0,0 +1,106 @@
"""Global hotkey listener using pynput."""

import logging
from typing import Callable

from pynput import keyboard

log = logging.getLogger(__name__)

# Maps string names to pynput keys
_KEY_MAP: dict[str, keyboard.Key] = {
    "ctrl": keyboard.Key.ctrl,
    "shift": keyboard.Key.shift,
    "alt": keyboard.Key.alt,
    "cmd": keyboard.Key.cmd,
    "space": keyboard.Key.space,
}


def _parse_combo(combo: str) -> set[keyboard.Key]:
    """Parse 'ctrl+shift' into a set of pynput keys."""
    keys: set[keyboard.Key] = set()
    for part in combo.lower().split("+"):
        part = part.strip()
        if part in _KEY_MAP:
            keys.add(_KEY_MAP[part])
        else:
            log.warning("Unknown key in combo: %s", part)
    return keys


class HotkeyListener:
    def __init__(
        self,
        on_push_to_talk_start: Callable,
        on_push_to_talk_stop: Callable,
        on_toggle: Callable,
        ptt_combo: str = "ctrl+shift",
        toggle_combo: str = "ctrl+space",
    ):
        self._on_ptt_start = on_push_to_talk_start
        self._on_ptt_stop = on_push_to_talk_stop
        self._on_toggle = on_toggle
        self._listener: keyboard.Listener | None = None
        self._pressed: set = set()
        self._ptt_active = False
        self._toggle_active = False

        self._ptt_keys = _parse_combo(ptt_combo)
        self._toggle_keys = _parse_combo(toggle_combo)
        log.debug("PTT keys: %s, Toggle keys: %s", self._ptt_keys, self._toggle_keys)

    def start(self) -> None:
        self._pressed.clear()
        self._ptt_active = False
        self._toggle_active = False
        self._listener = keyboard.Listener(
            on_press=self._on_press,
            on_release=self._on_release,
        )
        self._listener.daemon = True
        self._listener.start()

    def stop(self) -> None:
        if self._listener is not None:
            try:
                self._listener.stop()
            except Exception:
                pass
            self._listener = None
        self._pressed.clear()
        self._ptt_active = False
        self._toggle_active = False

    def _normalize(self, key) -> keyboard.Key | keyboard.KeyCode:
        """Map left/right modifier variants to the generic key, by macOS virtual key code."""
        if hasattr(key, "value") and hasattr(key.value, "vk"):
            vk = key.value.vk
            if vk in (0x3B, 0x3E):  # left / right Control
                return keyboard.Key.ctrl
            if vk in (0x38, 0x3C):  # left / right Shift
                return keyboard.Key.shift
            if vk in (0x3A, 0x3D):  # left / right Option
                return keyboard.Key.alt
            if vk in (0x37, 0x36):  # left / right Command
                return keyboard.Key.cmd
        return key

    def _on_press(self, key) -> None:
        key = self._normalize(key)
        self._pressed.add(key)

        # An empty key set (e.g. a combo of unknown names) is a subset of everything,
        # so require a non-empty set before treating the combo as pressed.
        if self._ptt_keys and self._ptt_keys.issubset(self._pressed) and not self._ptt_active:
            self._ptt_active = True
            self._on_ptt_start()

        if self._toggle_keys and self._toggle_keys.issubset(self._pressed) and not self._toggle_active:
            self._toggle_active = True
            self._on_toggle()

    def _on_release(self, key) -> None:
        key = self._normalize(key)

        # Releasing any key of the push-to-talk combo ends the recording.
        if self._ptt_active and key in self._ptt_keys:
            self._ptt_active = False
            self._on_ptt_stop()

        if key in self._toggle_keys:
            self._toggle_active = False

        self._pressed.discard(key)
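
The press/release bookkeeping in `calliope/hotkeys.py` above can be hard to follow with the pynput callbacks in the way. The following is a minimal sketch of the same idea using plain strings as keys, so it runs without pynput or a keyboard. `ComboTracker` is a hypothetical class written for illustration. It fires once when every key of the combo is held and ends when any key of the combo is released.

```python
class ComboTracker:
    """Hypothetical, string-keyed version of the combo bookkeeping."""

    def __init__(self, combo: str) -> None:
        self.keys = {part.strip() for part in combo.lower().split("+") if part.strip()}
        self.pressed: set[str] = set()
        self.active = False

    def press(self, key: str) -> bool:
        """Record a key press; return True when the combo has just become active."""
        self.pressed.add(key)
        # An empty combo must never match, otherwise every key press would trigger it.
        if self.keys and self.keys.issubset(self.pressed) and not self.active:
            self.active = True
            return True
        return False

    def release(self, key: str) -> bool:
        """Record a key release; return True when an active combo has just ended."""
        ended = self.active and key in self.keys
        if ended:
            self.active = False
        self.pressed.discard(key)
        return ended


tracker = ComboTracker("ctrl+shift")
events = [
    tracker.press("ctrl"),     # combo incomplete
    tracker.press("shift"),    # combo complete: push-to-talk starts
    tracker.press("a"),        # extra key while active: no second start
    tracker.release("shift"),  # one combo key released: push-to-talk stops
    tracker.release("ctrl"),   # already stopped
]
```

The `active` flag is what stops key auto-repeat, which delivers repeated press events while a key is held, from starting a second recording.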
313
calliope/overlay.py
Normal file
@@ -0,0 +1,313 @@
|
||||
"""Floating waveform overlay shown during recording."""
|
||||
|
||||
import logging
|
||||
import time
|
||||
from collections import deque
|
||||
from enum import Enum, auto
|
||||
|
||||
import numpy as np
|
||||
|
||||
from AppKit import (
|
||||
NSPanel,
|
||||
NSView,
|
||||
NSColor,
|
||||
NSBezierPath,
|
||||
NSTimer,
|
||||
NSScreen,
|
||||
NSWindowStyleMaskBorderless,
|
||||
NSWindowStyleMaskNonactivatingPanel,
|
||||
NSFloatingWindowLevel,
|
||||
NSStatusWindowLevel,
|
||||
NSBackingStoreBuffered,
|
||||
NSApp,
|
||||
NSFont,
|
||||
NSFontAttributeName,
|
||||
NSForegroundColorAttributeName,
|
||||
NSMakePoint,
|
||||
)
|
||||
from Foundation import NSMakeRect
|
||||
from objc import super as objc_super
|
||||
from PyObjCTools.AppHelper import callAfter
|
||||
|
||||
log = logging.getLogger(__name__)
|
||||
|
||||
WIDTH = 360
|
||||
HEIGHT = 80
|
||||
NUM_BARS = 150 # number of amplitude samples visible at once
|
||||
FPS = 30
|
||||
|
||||
# Fade animation
|
||||
FADE_DURATION = 0.2 # seconds
|
||||
FADE_STEPS = int(FADE_DURATION * FPS)
|
||||
|
||||
|
||||
class OverlayMode(Enum):
|
||||
RECORDING = auto()
|
||||
TRANSCRIBING = auto()
|
||||
|
||||
|
||||
class WaveformView(NSView):
|
||||
"""Custom NSView that draws a scrolling waveform or transcribing indicator."""
|
||||
|
||||
amplitudes: deque
|
||||
mode: OverlayMode
|
||||
_pulse_start: float
|
||||
_fade_step: int
|
||||
_fade_direction: int
|
||||
_fade_timer: object
|
||||
_on_fade_complete: object
|
||||
|
||||
def initWithFrame_(self, frame):
|
||||
self = objc_super(WaveformView, self).initWithFrame_(frame)
|
||||
if self is None:
|
||||
return None
|
||||
self.amplitudes = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
|
||||
self.mode = OverlayMode.RECORDING
|
||||
self._pulse_start = time.monotonic()
|
||||
self._fade_step = 0
|
||||
self._fade_direction = 0
|
||||
self._fade_timer = None
|
||||
self._on_fade_complete = None
|
||||
return self
|
||||
|
||||
def drawRect_(self, rect):
|
||||
# Dark translucent rounded-rect background
|
||||
bg = NSColor.colorWithCalibratedRed_green_blue_alpha_(0.1, 0.1, 0.1, 0.85)
|
||||
bg.setFill()
|
||||
path = NSBezierPath.bezierPathWithRoundedRect_xRadius_yRadius_(
|
||||
self.bounds(), 12, 12
|
||||
)
|
||||
path.fill()
|
||||
|
||||
# Subtle border
|
||||
border = NSColor.colorWithCalibratedRed_green_blue_alpha_(1.0, 1.0, 1.0, 0.12)
|
||||
border.setStroke()
|
||||
border_path = NSBezierPath.bezierPathWithRoundedRect_xRadius_yRadius_(
|
||||
self.bounds(), 12, 12
|
||||
)
|
||||
border_path.setLineWidth_(1.0)
|
||||
border_path.stroke()
|
||||
|
||||
if self.mode == OverlayMode.RECORDING:
|
||||
self._draw_waveform()
|
||||
elif self.mode == OverlayMode.TRANSCRIBING:
|
||||
self._draw_transcribing()
|
||||
|
||||
def _draw_waveform(self):
|
||||
color = NSColor.colorWithCalibratedRed_green_blue_alpha_(0.4, 0.75, 0.5, 0.9)
|
||||
color.setStroke()
|
||||
|
||||
bounds = self.bounds()
|
||||
w = bounds.size.width
|
||||
h = bounds.size.height
|
||||
mid_y = h / 2
|
||||
padding = 10
|
||||
draw_w = w - 2 * padding
|
||||
draw_h = (h - 2 * padding) / 2
|
||||
|
||||
amps = list(self.amplitudes)
|
||||
if not amps:
|
||||
return
|
||||
|
||||
step = draw_w / max(len(amps) - 1, 1)
|
||||
|
||||
for sign in (1, -1):
|
||||
line = NSBezierPath.bezierPath()
|
||||
line.setLineWidth_(1.5)
|
||||
for i, a in enumerate(amps):
|
||||
x = padding + i * step
|
||||
y_off = a * draw_h * sign
|
||||
if i == 0:
|
||||
line.moveToPoint_((x, mid_y + y_off))
|
||||
else:
|
||||
line.lineToPoint_((x, mid_y + y_off))
|
||||
line.stroke()
|
||||
|
||||
self._draw_label("calliope recording...")
|
||||
|
||||
def _draw_transcribing(self):
|
||||
bounds = self.bounds()
|
||||
w = bounds.size.width
|
||||
h = bounds.size.height
|
||||
mid_y = h / 2
|
||||
|
||||
# Pulsing dots animation
|
||||
elapsed = time.monotonic() - self._pulse_start
|
||||
num_dots = 3
|
||||
dot_radius = 5.0
|
||||
dot_spacing = 20.0
|
||||
total_w = (num_dots - 1) * dot_spacing
|
||||
start_x = (w - total_w) / 2
|
||||
|
||||
for i in range(num_dots):
|
||||
# Staggered sine pulse for each dot
|
||||
phase = elapsed * 3.0 - i * 0.6
|
||||
alpha = 0.3 + 0.7 * max(0.0, (1.0 + np.sin(phase)) / 2.0)
|
||||
color = NSColor.colorWithCalibratedRed_green_blue_alpha_(
|
||||
0.4, 0.75, 0.5, alpha
|
||||
)
|
||||
color.setFill()
|
||||
x = start_x + i * dot_spacing
|
||||
dot = NSBezierPath.bezierPathWithOvalInRect_(
|
||||
NSMakeRect(x - dot_radius, mid_y - dot_radius + 6,
|
||||
dot_radius * 2, dot_radius * 2)
|
||||
)
|
||||
dot.fill()
|
||||
|
||||
self._draw_label("transcribing...")
|
||||
|
||||
def _draw_label(self, text: str):
|
||||
from Foundation import NSString, NSDictionary
|
||||
|
||||
bounds = self.bounds()
|
||||
w = bounds.size.width
|
||||
label = NSString.stringWithString_(text)
|
||||
attrs = NSDictionary.dictionaryWithObjects_forKeys_(
|
||||
[
|
||||
NSFont.systemFontOfSize_(11),
|
||||
NSColor.colorWithCalibratedRed_green_blue_alpha_(1.0, 1.0, 1.0, 0.5),
|
||||
],
|
||||
[NSFontAttributeName, NSForegroundColorAttributeName],
|
||||
)
|
||||
label_size = label.sizeWithAttributes_(attrs)
|
||||
label_x = (w - label_size.width) / 2
|
||||
label.drawAtPoint_withAttributes_(NSMakePoint(label_x, 4), attrs)
|
||||
|
||||
    def refresh_(self, timer):
        self.setNeedsDisplay_(True)

    def fadeTick_(self, timer):
        self._fade_step += 1
        progress = min(self._fade_step / FADE_STEPS, 1.0)

        if self._fade_direction == 1:
            alpha = progress
        else:
            alpha = 1.0 - progress

        self.window().setAlphaValue_(alpha)

        if progress >= 1.0:
            self.stopFade()
            if self._fade_direction == -1 and self._on_fade_complete:
                self._on_fade_complete()

    def stopFade(self):
        if self._fade_timer is not None:
            self._fade_timer.invalidate()
            self._fade_timer = None

    def startFade_onComplete_(self, direction, on_complete):
        self.stopFade()
        self._fade_direction = direction
        self._fade_step = 0
        self._on_fade_complete = on_complete
        self._fade_timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            FADE_DURATION / FADE_STEPS, self, b"fadeTick:", None, True
        )


class WaveformOverlay:
    """Floating translucent window showing a live scrolling waveform."""

    def __init__(self):
        self._panel: NSPanel | None = None
        self._view: WaveformView | None = None
        self._timer: NSTimer | None = None

    def _ensure_panel(self):
        if self._panel is not None:
            return

        screen = NSScreen.mainScreen()
        screen_frame = screen.frame()
        x = (screen_frame.size.width - WIDTH) / 2
        y = screen_frame.size.height - HEIGHT - 40  # near top, below menu bar

        rect = NSMakeRect(x, y, WIDTH, HEIGHT)
        style = NSWindowStyleMaskBorderless | NSWindowStyleMaskNonactivatingPanel
        panel = NSPanel.alloc().initWithContentRect_styleMask_backing_defer_(
            rect, style, NSBackingStoreBuffered, False
        )
        panel.setLevel_(NSStatusWindowLevel)
        panel.setOpaque_(False)
        panel.setBackgroundColor_(NSColor.clearColor())
        panel.setHasShadow_(True)
        panel.setIgnoresMouseEvents_(True)
        # NSWindowCollectionBehaviorCanJoinAllSpaces is 1 << 0
        # (1 << 4 is NSWindowCollectionBehaviorStationary).
        panel.setCollectionBehavior_(1 << 0)

        view = WaveformView.alloc().initWithFrame_(NSMakeRect(0, 0, WIDTH, HEIGHT))
        panel.setContentView_(view)

        self._panel = panel
        self._view = view

    def show(self):
        callAfter(self._show_on_main)

    def hide(self):
        callAfter(self._hide_on_main)

    def show_transcribing(self):
        """Switch overlay to transcribing state (pulsing dots)."""
        callAfter(self._show_transcribing_on_main)

    def _show_on_main(self):
        self._ensure_panel()
        self._view.stopFade()
        self._view.mode = OverlayMode.RECORDING
        self._view.amplitudes = deque([0.0] * NUM_BARS, maxlen=NUM_BARS)
        self._panel.setAlphaValue_(0.0)
        self._panel.orderFront_(None)
        self._start_timer()
        self._view.startFade_onComplete_(1, None)
        log.debug("Overlay shown")

    def _show_transcribing_on_main(self):
        self._ensure_panel()
        self._view.stopFade()
        self._view.mode = OverlayMode.TRANSCRIBING
        self._view._pulse_start = time.monotonic()
        # If panel is already visible, just switch mode; otherwise show it
        if self._panel.alphaValue() < 0.01:
            self._panel.setAlphaValue_(0.0)
            self._panel.orderFront_(None)
            self._view.startFade_onComplete_(1, None)
        self._start_timer()
        log.debug("Overlay switched to transcribing")

    def _hide_on_main(self):
        if self._view is None or self._panel is None:
            return

        def on_fade_out():
            self._stop_timer()
            self._panel.orderOut_(None)

        self._view.startFade_onComplete_(-1, on_fade_out)
        log.debug("Overlay hiding")

    def cleanup(self):
        """Synchronously stop all timers and hide. Call before quit."""
        self._stop_timer()
        if self._view is not None:
            self._view.stopFade()
        if self._panel is not None:
            self._panel.orderOut_(None)

    def push_samples(self, chunk: np.ndarray):
        """Called from audio callback with a new chunk of float32 samples."""
        rms = float(np.sqrt(np.mean(chunk ** 2)))
        # Clamp to [0, 1] with some headroom
        amplitude = min(rms * 5.0, 1.0)
        if self._view is not None:
            self._view.amplitudes.append(amplitude)

    def _start_timer(self):
        self._stop_timer()
        self._timer = NSTimer.scheduledTimerWithTimeInterval_target_selector_userInfo_repeats_(
            1.0 / FPS, self._view, b"refresh:", None, True
        )

    def _stop_timer(self):
        if self._timer is not None:
            self._timer.invalidate()
            self._timer = None
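The fade and pulse arithmetic in the overlay can be checked without AppKit. The sketch below is a dependency-free restatement of the two formulas, not code from the repository: `fade_alpha` and `dot_alpha` are hypothetical helper names, and `math.sin` stands in for `np.sin`.

```python
import math


def fade_alpha(step: int, total_steps: int, direction: int) -> float:
    """Window alpha after `step` ticks; direction 1 fades in, -1 fades out."""
    progress = min(step / total_steps, 1.0)
    return progress if direction == 1 else 1.0 - progress


def dot_alpha(elapsed: float, index: int) -> float:
    """Alpha of dot `index` after `elapsed` seconds.

    Each dot lags the previous one by 0.6 radians, and the sine wave is
    rescaled from [-1, 1] to [0.3, 1.0] so a dot never fully disappears.
    """
    phase = elapsed * 3.0 - index * 0.6
    return 0.3 + 0.7 * (1.0 + math.sin(phase)) / 2.0
```

Keeping the math in plain functions like these would make it testable in isolation, at the cost of one extra call per frame.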
88
calliope/recorder.py
Normal file
@@ -0,0 +1,88 @@
"""Audio recording using sounddevice."""

import logging
import threading

import numpy as np
import sounddevice as sd

log = logging.getLogger(__name__)

_CONSOLIDATE_EVERY = 100


class Recorder:
    SAMPLE_RATE = 16_000
    CHANNELS = 1

    def __init__(self, device: int | None = None, on_audio=None):
        self._device = device
        self._chunks: list[np.ndarray] = []
        self._stream: sd.InputStream | None = None
        self._lock = threading.Lock()
        self._chunk_count = 0
        self.on_audio = on_audio

    @property
    def is_recording(self) -> bool:
        return self._stream is not None and self._stream.active

    def start(self) -> None:
        with self._lock:
            self._chunks = []
            self._chunk_count = 0
        try:
            self._stream = sd.InputStream(
                samplerate=self.SAMPLE_RATE,
                channels=self.CHANNELS,
                dtype="float32",
                device=self._device,
                callback=self._callback,
            )
            self._stream.start()
            log.debug("Recording stream started (device=%s)", self._device)
        except sd.PortAudioError:
            log.error("Failed to open audio device %s", self._device, exc_info=True)
            if self._stream is not None:
                try:
                    self._stream.close()
                except Exception:
                    pass
                self._stream = None
            raise

    def stop(self) -> np.ndarray:
        # Stop the stream first; this guarantees no more callbacks after it returns.
        if self._stream is not None:
            self._stream.stop()
            self._stream.close()
            self._stream = None
        with self._lock:
            if not self._chunks:
                return np.zeros(0, dtype=np.float32)
            audio = np.concatenate(self._chunks).flatten()
            self._chunks = []
        return audio

    def get_audio_so_far(self) -> np.ndarray:
        """Return a copy of all audio recorded so far without stopping the stream."""
        with self._lock:
            if not self._chunks:
                return np.zeros(0, dtype=np.float32)
            return np.concatenate(self._chunks).flatten()

    def _callback(self, indata: np.ndarray, frames, time_info, status) -> None:
        if status:
            log.warning("Audio stream status: %s", status)
        chunk = indata[:, 0].copy() if indata.ndim > 1 else indata.copy()
        with self._lock:
            self._chunks.append(chunk)
            self._chunk_count += 1
            if self._chunk_count % _CONSOLIDATE_EVERY == 0:
                self._chunks = [np.concatenate(self._chunks).flatten()]

        if self.on_audio is not None:
            try:
                self.on_audio(chunk)
            except Exception:
                log.error("Error in on_audio callback", exc_info=True)
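The chunk consolidation in `_callback` can be illustrated on its own. The sketch below is a simplified stand-in that uses plain Python lists instead of NumPy arrays and leaves out the lock; `ChunkBuffer` is a hypothetical name, not a class in the repository.

```python
class ChunkBuffer:
    """Collects chunks and merges them every `consolidate_every` appends."""

    def __init__(self, consolidate_every: int = 100):
        self._chunks: list[list[float]] = []
        self._count = 0
        self._every = consolidate_every

    def append(self, chunk: list[float]) -> None:
        self._chunks.append(list(chunk))
        self._count += 1
        if self._count % self._every == 0:
            # Merge everything gathered so far into a single chunk so the
            # list of pending chunks stays short during long recordings.
            self._chunks = [[s for c in self._chunks for s in c]]

    def num_chunks(self) -> int:
        return len(self._chunks)

    def samples(self) -> list[float]:
        """Return all samples in arrival order."""
        return [s for c in self._chunks for s in c]
```

For example, with `consolidate_every=3` the buffer holds two chunks after two appends, collapses to one on the third, and grows again from there. Sample order is unaffected by consolidation.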
147
calliope/setup_wizard.py
Normal file
@@ -0,0 +1,147 @@
"""First-run setup wizard (Rich TUI)."""

import subprocess

import sounddevice as sd
from rich.console import Console
from rich.panel import Panel
from rich.progress import Progress
from rich.prompt import Confirm, Prompt
from rich.table import Table

from calliope import config

console = Console()


def run() -> dict:
    """Run the interactive setup wizard and return the final config."""
    cfg = dict(config.DEFAULTS)

    # ── Welcome ──────────────────────────────────────────────────────
    console.print(
        Panel.fit(
            "[bold magenta]Calliope[/bold magenta]\n"
            "Voice-to-text for macOS — speak and type into any app.\n\n"
            "This wizard will walk you through first-time setup.",
            border_style="magenta",
        )
    )

    # ── Permission checks ────────────────────────────────────────────
    console.print("\n[bold]Permission checks[/bold]")
    _check_accessibility()
    _check_microphone()
    console.print()

    # ── Mic selection ────────────────────────────────────────────────
    console.print("[bold]Microphone selection[/bold]")
    devices = sd.query_devices()
    table = Table(show_header=True)
    table.add_column("#", style="cyan", width=4)
    table.add_column("Device")
    table.add_column("Inputs", justify="right")

    input_indices: list[int] = []
    for i, d in enumerate(devices):
        if d["max_input_channels"] > 0:
            input_indices.append(i)
            marker = " (default)" if i == sd.default.device[0] else ""
            table.add_row(str(i), f"{d['name']}{marker}", str(d["max_input_channels"]))

    console.print(table)
    default_dev = sd.default.device[0]
    # PortAudio reports -1 when there is no default input device; fall back to
    # the first listed input, or to no default at all if none were found.
    if default_dev is None or default_dev < 0:
        default_dev = input_indices[0] if input_indices else None
    choice = Prompt.ask(
        "Device index",
        default=str(default_dev) if default_dev is not None else None,
    )
    cfg["device"] = int(choice) if choice else None

    # ── Hotkey config ────────────────────────────────────────────────
    console.print("\n[bold]Hotkey configuration[/bold]")
    console.print(f"  Push-to-talk : [cyan]{cfg['hotkeys']['ptt']}[/cyan]")
    console.print(f"  Toggle       : [cyan]{cfg['hotkeys']['toggle']}[/cyan]")
    if not Confirm.ask("Keep defaults?", default=True):
        cfg["hotkeys"]["ptt"] = Prompt.ask("Push-to-talk combo", default=cfg["hotkeys"]["ptt"])
        cfg["hotkeys"]["toggle"] = Prompt.ask("Toggle combo", default=cfg["hotkeys"]["toggle"])

    # ── Model download ───────────────────────────────────────────────
    console.print("\n[bold]Model download[/bold]")
    console.print(f"  Default model: [cyan]{cfg['model']}[/cyan]")
    if not Confirm.ask("Use default model?", default=True):
        cfg["model"] = Prompt.ask("Whisper model")
    console.print(f"Downloading [cyan]{cfg['model']}[/cyan] (this may take a while)...")

    from calliope.transcriber import Transcriber

    transcriber = Transcriber(model=cfg["model"])
    with Progress() as progress:
        task = progress.add_task("Loading model...", total=None)
        transcriber.load()
        progress.update(task, completed=100, total=100)

    console.print("[green]Model ready.[/green]")

    # ── Validation ───────────────────────────────────────────────────
    if Confirm.ask("\nRecord a short test clip to verify everything works?", default=True):
        console.print("Recording for 3 seconds...")
        from calliope.recorder import Recorder
        import time

        rec = Recorder(device=cfg["device"])
        rec.start()
        time.sleep(3)
        audio = rec.stop()
        console.print("Transcribing...")
        text = transcriber.transcribe(audio)
        console.print(f"[green]Result:[/green] {text or '(no speech detected)'}")

    # ── Save ─────────────────────────────────────────────────────────
    config.save(cfg)
    console.print(f"\n[green]Config saved to {config.CONFIG_PATH}[/green]")
    console.print("Run [bold]calliope[/bold] to start. Enjoy! 🎤\n")
    return cfg


def _check_accessibility() -> None:
    try:
        import ApplicationServices
        trusted = ApplicationServices.AXIsProcessTrusted()
    except Exception:
        trusted = None

    if trusted:
        console.print("  [green]✓[/green] Accessibility access granted")
    else:
        console.print("  [red]✗[/red] Accessibility access — required for typing")
        console.print("    Open: System Settings → Privacy & Security → Accessibility")
        if Confirm.ask("    Open System Settings?", default=False):
            subprocess.run(
                ["open", "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"],
                check=False,
            )


def _check_microphone() -> None:
    try:
        import AVFoundation
        status = AVFoundation.AVCaptureDevice.authorizationStatusForMediaType_(
            AVFoundation.AVMediaTypeAudio
        )
        granted = status == 3  # AVAuthorizationStatusAuthorized
    except Exception:
        granted = None

    if granted:
        console.print("  [green]✓[/green] Microphone access granted")
    else:
        console.print("  [red]✗[/red] Microphone access — required for recording")
        console.print("    Open: System Settings → Privacy & Security → Microphone")
        if Confirm.ask("    Open System Settings?", default=False):
            subprocess.run(
                ["open", "x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone"],
                check=False,
            )
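The wizard's microphone step has to pick a sensible default device index before prompting. The sketch below shows one way to make that choice defensively. It assumes that the audio backend may report a negative index (such as -1) when no default input exists; `pick_default_input` is a hypothetical helper, not part of the repository.

```python
def pick_default_input(default_index, input_indices):
    """Return a usable default input device index, or None if there is none.

    `default_index` is whatever the audio backend reports as the default
    input (possibly None or negative); `input_indices` lists the indices of
    devices that have at least one input channel.
    """
    if default_index is not None and default_index in input_indices:
        return default_index
    # Fall back to the first real input device, if any were found.
    return input_indices[0] if input_indices else None
```

Checking membership in `input_indices` also covers the case where the reported default is an output-only device.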
73
calliope/transcriber.py
Normal file
@@ -0,0 +1,73 @@
"""Whisper transcription using transformers pipeline on MPS."""

import logging

import numpy as np
import torch
from transformers import pipeline

log = logging.getLogger(__name__)


class Transcriber:
    def __init__(self, model: str = "distil-whisper/distil-large-v3"):
        self.model = model
        self._pipe = None
        self._tokenizer = None
        self.context: str = ""
        self.language: str = "auto"

    def load(self) -> None:
        from transformers import AutoTokenizer

        device = "mps" if torch.backends.mps.is_available() else "cpu"
        # Use float32 on both devices: float16 produces garbled output on MPS
        # (Apple Silicon), and half precision is slow or unsupported for many
        # CPU ops.
        dtype = torch.float32
        log.info("Loading model %s on %s (dtype=%s)", self.model, device, dtype)
        try:
            self._pipe = pipeline(
                "automatic-speech-recognition",
                model=self.model,
                torch_dtype=dtype,
                device=device,
            )
            self._tokenizer = AutoTokenizer.from_pretrained(self.model)
            log.info("Model loaded successfully")
        except Exception:
            log.error("Failed to load model %s", self.model, exc_info=True)
            raise

    def transcribe(self, audio: np.ndarray) -> str:
        if self._pipe is None:
            self.load()
        if audio.size == 0:
            return ""

        # Skip audio that is too short (<1s) or too quiet: Whisper hallucinates
        # punctuation like "!" on silence/noise.
        duration = audio.size / 16_000
        energy = float(np.sqrt(np.mean(audio ** 2)))
        log.debug("Audio: %.1fs, RMS energy: %.6f", duration, energy)
        if duration < 1.0 or energy < 0.005:
            log.debug("Audio too short or too quiet, skipping transcription")
            return ""

        generate_kwargs = {}
        if self.context:
            prompt_ids = self._tokenizer.get_prompt_ids(self.context)
            generate_kwargs["prompt_ids"] = prompt_ids

        pipe_kwargs = {
            "batch_size": 4,
            "return_timestamps": True,
            "generate_kwargs": generate_kwargs,
        }
        if self.language != "auto":
            pipe_kwargs["generate_kwargs"]["language"] = self.language
            pipe_kwargs["generate_kwargs"]["task"] = "transcribe"

        result = self._pipe(
            {"raw": audio, "sampling_rate": 16_000},
            **pipe_kwargs,
        )
        return result["text"].strip()
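The short-or-quiet gate in `transcribe` is easy to reason about in isolation. The sketch below restates it as a pure function over a list of float samples, using the same 16 kHz rate and the same thresholds (1.0 s, RMS 0.005). `should_skip` is a hypothetical name, and `math` replaces NumPy so the example has no dependencies.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, matching the recorder


def should_skip(samples, min_seconds: float = 1.0, min_rms: float = 0.005) -> bool:
    """Return True when a clip is too short or too quiet to transcribe."""
    if not samples:
        return True
    duration = len(samples) / SAMPLE_RATE
    # Root-mean-square energy of the clip.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration < min_seconds or rms < min_rms
```

Under these thresholds, one second of digital silence is skipped while one second at a constant amplitude of 0.1 passes through. Whether 0.005 suits a particular microphone is something to tune empirically.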
64
calliope/typer.py
Normal file
@@ -0,0 +1,64 @@
"""Type text into the focused field using Quartz CGEvents."""

import logging
import subprocess
import time

import Quartz

log = logging.getLogger(__name__)


def type_text(text: str) -> None:
    """Simulate typing text into the currently focused text field."""
    for char in text:
        _type_char(char)
        time.sleep(0.005)


def type_text_clipboard(text: str) -> None:
    """Type text by copying to clipboard and pasting with Cmd+V.

    Saves and restores the previous clipboard contents.
    """
    # Save current clipboard
    try:
        prev = subprocess.run(
            ["pbpaste"], capture_output=True, text=True, timeout=2,
        ).stdout
    except Exception:
        prev = None

    # Copy text to clipboard
    subprocess.run(["pbcopy"], input=text, text=True, timeout=2)

    # Paste with Cmd+V
    _cmd_v()
    time.sleep(0.05)

    # Restore previous clipboard
    if prev is not None:
        subprocess.run(["pbcopy"], input=prev, text=True, timeout=2)


def _cmd_v() -> None:
    """Simulate Cmd+V keypress."""
    # 'v' keycode is 9
    event_down = Quartz.CGEventCreateKeyboardEvent(None, 9, True)
    event_up = Quartz.CGEventCreateKeyboardEvent(None, 9, False)
    Quartz.CGEventSetFlags(event_down, Quartz.kCGEventFlagMaskCommand)
    Quartz.CGEventSetFlags(event_up, Quartz.kCGEventFlagMaskCommand)
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_down)
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_up)


def _type_char(char: str) -> None:
    """Type a single unicode character via CGEvents."""
    event_down = Quartz.CGEventCreateKeyboardEvent(None, 0, True)
    event_up = Quartz.CGEventCreateKeyboardEvent(None, 0, False)

    Quartz.CGEventKeyboardSetUnicodeString(event_down, len(char), char)
    Quartz.CGEventKeyboardSetUnicodeString(event_up, len(char), char)

    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_down)
    Quartz.CGEventPost(Quartz.kCGAnnotatedSessionEventTap, event_up)
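One detail that may be worth checking in `_type_char` is the length argument. Python's `len()` counts code points, while macOS keyboard events describe strings in UTF-16 units, so characters outside the Basic Multilingual Plane (most emoji) occupy two units. Whether PyObjC compensates for this automatically is not verified here. The sketch below only shows how the two counts differ; `utf16_units` is a hypothetical helper.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2
```

For ASCII and accented Latin letters the result equals `len(text)`. The two counts diverge only for code points above U+FFFF.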
27
pyproject.toml
Normal file
@@ -0,0 +1,27 @@
[build-system]
requires = ["setuptools>=68.0"]
build-backend = "setuptools.build_meta"

[project]
name = "calliope"
version = "0.1.0"
description = "Voice-to-text for macOS — speak and type into any app"
requires-python = ">=3.10"
dependencies = [
    "rumps>=0.4.0",
    "sounddevice>=0.4.6",
    "numpy>=1.24.0",
    "torch>=2.0.0",
    "transformers>=4.36.0",
    "accelerate>=0.25.0",
    "pynput>=1.7.6",
    "pyobjc-framework-Quartz>=9.0",
    "pyobjc-framework-Cocoa>=9.0",
    "pyobjc-framework-AVFoundation>=9.0",
    "rich>=13.0.0",
    "click>=8.1.0",
    "pyyaml>=6.0",
]

[project.scripts]
calliope = "calliope.app:main"