Abstract: Vision-language models (VLMs) offer flexible object detection through natural language prompts but suffer from performance variability depending on prompt phrasing. In this paper, we ...
Pocket TTS delivers high-quality text-to-speech on standard CPUs. No GPU, no cloud APIs. It is the first local TTS with voice ...