What We Shipped: Geta.Team v2.0.16

This release is all about making voice calls actually usable and giving the File Explorer some overdue muscle. Here's what changed in v2.0.16.

Voice Calls: Custom Instructions, Speed Control, and a Big Latency Fix

Three things that were either missing or broken in voice calls are now fixed.

Custom Voice Instructions (Sur-prompt)

You can now inject custom rules into any AI employee's voice behavior without rewriting the entire system prompt. Add instructions like "always speak formally," "ask for the order number first," or "never discuss pricing" — and they'll apply to both inbound and outbound calls.

The field is a simple textarea in both the phone number assignment panel and the in-chat voice settings. Your rules get injected as an ADDITIONAL RULES FROM EMPLOYER block on top of the base prompt. Simple, non-destructive, and instantly effective.
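In code, that layering can be sketched as a simple append (function name hypothetical; the actual prompt assembly lives server-side):

```javascript
// Hypothetical sketch of the sur-prompt layering: the employer's custom
// rules are appended as a clearly delimited block, leaving the base system
// prompt itself untouched.
function buildVoicePrompt(basePrompt, customInstructions) {
  // No custom rules? The base prompt passes through unchanged.
  if (!customInstructions || !customInstructions.trim()) return basePrompt;
  return basePrompt +
    '\n\nADDITIONAL RULES FROM EMPLOYER:\n' +
    customInstructions.trim();
}
```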

TTS Speech Speed

ElevenLabs voices now have an adjustable speed slider (0.7x to 1.2x). If your AI employee talks too fast for your customers or too slow for your workflow, you can dial it in. The setting lives right next to the voice selector.
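Under the hood this is just a bounded numeric setting; a minimal sketch of the clamping the slider implies (helper name hypothetical):

```javascript
// Hypothetical clamp for the speed setting sent to the TTS provider,
// constrained to the 0.7x–1.2x range exposed by the slider.
function clampSpeed(value) {
  const v = Number(value);
  if (Number.isNaN(v)) return 1.0; // fall back to normal speed
  return Math.min(1.2, Math.max(0.7, v));
}
```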

Latency: From 2.5 Seconds to Under 1 Second

This was the biggest fix in the release. Voice calls had a noticeable lag between the user finishing a sentence and the AI starting to respond. The culprit was twofold:

  1. Text buffering was waiting for the full LLM stream to complete before sending anything to TTS. We reverted to immediate chunk-by-chunk streaming — the AI starts speaking as soon as the first sentence is ready.
  2. The model itself was slow. We added detailed timestamps throughout the voice pipeline and found that API connect time was the bottleneck. Switching from gemini-3-flash-preview (1,350–2,550ms connect) to gemini-3.1-flash-lite-preview (580–825ms connect) cut latency roughly in half.
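The chunk-by-chunk streaming in point 1 can be sketched as a small sentence buffer (names and regex are illustrative, not the actual implementation): each LLM chunk is appended, and every complete sentence is flushed to TTS the moment its terminator arrives.

```javascript
// Illustrative sketch of chunk-by-chunk streaming: flush each complete
// sentence to TTS as soon as it ends, instead of waiting for the full
// LLM stream.
function makeSentenceStreamer(sendToTTS) {
  let buffer = '';
  return {
    push(chunk) {
      buffer += chunk;
      let match;
      // Flush every complete sentence ending in ".", "!" or "?".
      while ((match = buffer.match(/^([\s\S]*?[.!?])\s*/))) {
        sendToTTS(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    flush() {
      // Send whatever trails after the last terminator when the stream ends.
      if (buffer.trim()) sendToTTS(buffer.trim());
      buffer = '';
    },
  };
}
```

The first `sendToTTS` call fires as soon as the first sentence completes, which is what lets the AI start speaking while the rest of the response is still being generated.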

The result: voice conversations now feel significantly more natural. The AI responds in under a second instead of making you wait through an awkward silence.

Tool Overuse Fixed

Gemini was calling system tools for simple questions — asking "who is this?" would trigger a hostnamectl command. The fix: tool usage rules were moved from the bottom of the prompt (where Gemini ignored them) to the top, inside the critical directive block. The AI now only uses tools when actually needed.

Voice Filler Removed

The "Un instant" ("one moment") filler phrase that played during tool execution has been permanently disabled. It was more annoying than helpful.

File Explorer: Folder Downloads and Selective Downloads

Two features that make the File Explorer actually useful for bulk operations.

Folder Download as ZIP

Click the download icon on any folder and it streams the entire contents as a ZIP archive directly to your browser. Uses the archiver package on the backend — no temporary files, no waiting for compression to finish before the download starts.

Selective Download

Hit the checkbox toggle in the navigation bar and the File Explorer enters selection mode. Every file and folder gets a checkbox. Click a folder checkbox and it recursively selects everything inside. A floating action bar shows your selection count and a download button that bundles everything into a single ZIP.
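The recursive folder selection boils down to a short tree walk; a sketch, assuming a hypothetical node shape with `path` and optional `children`:

```javascript
// Sketch of recursive checkbox selection (node shape hypothetical):
// ticking a folder adds its own path and, recursively, every
// descendant path to the selection set.
function selectRecursive(node, selected = new Set()) {
  selected.add(node.path);
  for (const child of node.children || []) {
    selectRecursive(child, selected);
  }
  return selected;
}
```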

This is particularly useful when your AI employee has generated a bunch of files across different folders and you want to grab specific outputs without downloading everything.

Other Fixes

Custom LLM Validation

The validateCustomLLM() function was making a test API call that frequently failed due to model name formatting or provider quirks — even when the LLM worked perfectly fine. The error was being displayed in the chat on the first message, which was confusing. Validation errors are now logged server-side only and no longer surface to users.

Mobile Voice Dictation

Voice-to-text on mobile was producing duplicated text ("salut salut salut comment salut comment ca"). The root cause: continuous=true combined with mobile-specific segment tracking caused interim results to overlap with final results.

The fix was a full rewrite of the speech recognition handler — continuous set to false with auto-restart, proper event.resultIndex tracking, and a unified handler that works identically on mobile and desktop. No more platform-specific code paths.
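The deduplication hinges on `event.resultIndex`: iterating from there instead of index 0 means segments that were already finalized are never re-processed. A sketch of the idea (state shape and function name hypothetical, fed with SpeechRecognition-style result events):

```javascript
// Sketch of a unified result handler: start at event.resultIndex so
// already-finalized segments are never re-appended, which is what caused
// duplicates like "salut salut salut".
function handleResult(state, event) {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      state.final += transcript + ' '; // commit the finalized segment once
      state.interim = '';              // drop any stale interim text
    } else {
      state.interim = transcript;      // interim text replaces, never appends
    }
  }
  return state.final + state.interim;  // what the input field should display
}
```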

Want to test the most advanced AI employees? Try it here: https://Geta.Team