v2.2.4 + v2.2.5: Your Voice Agent Just Got Hands, Then Learned To Pick The Right Tool

v2.2.4 is the second voice release this week, and it is mostly about something the voice agent could not do before: actually use the skills it has installed.

Until today, a phone conversation with your AI employee was limited to a fixed set of built-in tools: ending the call, transferring it, looking up a contact, sending an email, telling time. Anything beyond that (booking an appointment in your scheduler, generating a quick mockup, querying a custom database, running a memory search) was simply unreachable. Even if the agent had a skill installed for exactly that task, the voice layer could not get to it.

That gap closes today.

New: phone agents can use any of your skills mid-call

Every skill installed in your agent's .skills/ folder is now reachable from a live phone call. The agent can call its appointment booker, search its memory, generate an image, draft a document, run a custom workflow, whatever lives in its toolkit. You ask, the agent reaches for the skill, the result comes back in voice.

A concrete example: an employee with a custom appointment-booking skill ("novo") can now take a call from a client, hear them say "I'd like to book for next Tuesday morning," check availability through the skill, propose three slots, confirm one, and hang up with the appointment created. Without v2.2.4 the agent would have politely apologized and offered to email back later.

The new tool is called run_skill. It is narrow on purpose: the agent only fires it when you explicitly ask for a feature that maps to one of its installed skills. It does not fire on greetings or small talk. The agent self-corrects if it asks for a skill that does not exist (the error reply tells it which ones are available).
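
To make the self-correction concrete, here is a minimal sketch of how such a dispatcher could behave. It is not the actual handler: the registry dictionary, the ToolReply type, and the wording of the error are assumptions made for illustration. Only the tool name run_skill and the idea of an error reply that lists the available skills come from the release.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical registry: skill name -> callable taking an argument string.
SkillFn = Callable[[str], str]


@dataclass
class ToolReply:
    ok: bool
    message: str


def run_skill(registry: Dict[str, SkillFn], name: str, args: str) -> ToolReply:
    """Dispatch to an installed skill, or return an error listing the valid names."""
    skill = registry.get(name)
    if skill is None:
        # The error text carries the inventory so the caller can retry correctly.
        available = ", ".join(sorted(registry)) or "(none installed)"
        return ToolReply(False, f"Unknown skill '{name}'. Available skills: {available}")
    return ToolReply(True, skill(args))


# Stand-in skills; the real ones would run whatever lives in .skills/.
registry: Dict[str, SkillFn] = {
    "novo": lambda args: f"booked: {args}",
    "memory-db": lambda args: f"searched: {args}",
}

print(run_skill(registry, "calendar", "tuesday").message)
print(run_skill(registry, "novo", "tuesday 9am").message)
```

Returning the inventory inside the error keeps the recovery to a single extra turn: the agent does not need a second lookup to learn which names are valid.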

This unlocks a category of phone calls that simply could not happen before.

Fixed: the voice agent now tells the right time

A user outside UTC would hear the wrong time: on a 5pm call in Montreal, the agent would confidently say it was 9pm. The voice handler was reading the container's UTC clock and completely ignoring the timezone we already had on file for the employee.

Fixed. The agent now reads time in the employee's configured timezone. A Montreal-based user on a 5pm call hears "it's 5pm," not "it's 9pm." Same for every timezone we know about.
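
The difference is easy to reproduce. The sketch below is an illustration rather than the voice handler itself: the function name spoken_time and the sample date are assumptions, and America/Toronto is used because it is the IANA zone that covers Montreal.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def spoken_time(now_utc: datetime, employee_tz: str) -> str:
    """Format the time in the employee's timezone instead of container UTC."""
    local = now_utc.astimezone(ZoneInfo(employee_tz))
    hour = local.hour % 12 or 12  # 12-hour clock, with 0 shown as 12
    suffix = "am" if local.hour < 12 else "pm"
    return f"{hour}:{local.minute:02d}{suffix}"


# 21:00 UTC on a summer day, when Montreal is four hours behind UTC.
instant = datetime(2025, 7, 15, 21, 0, tzinfo=timezone.utc)

print(spoken_time(instant, "UTC"))              # what the old behavior amounted to
print(spoken_time(instant, "America/Toronto"))  # the employee's configured zone
```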

Fixed: the voice agent stops hanging up on "ok" and "c'est bien"

Two calls earlier today ended abruptly while the user was still in the middle of a sentence. The shape was the same: the user said something like "Non non, c'est bien c'est bien. Euh..." ("No no, that's fine, that's fine. Uh...") and the agent interpreted "c'est bien" as a signal to end the call. Click. Dial tone.

The agent's instruction for when to hang up was too loose. It said something like "use this tool when the conversation feels complete." Gemini Live, very politely, interpreted agreement filler as completeness.

The instruction is now explicit. Only fire the hang-up tool on actual goodbye phrases ("au revoir," "bye," "talk to you later," "I'm going to hang up," and similar in six languages). Never fire on "ok," "d'accord," "c'est bien," "ça marche," "merci," "perfect," or filler words like "euh" and "hm." And on anything ambiguous, ask first: "shall we hang up?"

No more cut-off calls.

Fixed: voice agents now know their full personality on every call

The voice system prompt was running through a compression pass that summarized your employee's full profile down to about 500 tokens before injecting it. The summary kept the personality outline but sometimes dropped specific skills, employer preferences, or workflow rules.

v2.2.4 removes the compression entirely. The voice agent now receives the full profile, verbatim, on every call. The trade is a slightly larger system prompt (roughly 3-5k tokens more). The benefit is that the voice agent finally knows everything the chat agent knows. Which is also what made the new run_skill tool work cleanly the first time: the agent could see in its profile which skills were installed.

Fixed: memory search no longer rebuilds its index on every recall

A diagnostic this morning showed that the semantic search index in memory-db was being rebuilt from scratch on every recall instead of being maintained on write. Functional, but slow: a search on 1,800 stored memories took noticeably longer than it should have.

The index now updates the moment a memory is written, deleted, or modified. Searches are immediate. New employees get this automatically; existing employees pick it up on their next skill resync.
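
The pattern is worth a small sketch. The class below is not memory-db's code: it swaps the semantic index for a plain keyword index so the example stays self-contained, and every name in it (MemoryIndex, write, delete, recall) is hypothetical. What it does show is the shape of the fix, with index maintenance moved into the write path so that recall is a lookup and nothing more.

```python
from collections import defaultdict
from typing import Dict, Set


class MemoryIndex:
    """Keyword index kept in sync on every write, so recall never rebuilds it."""

    def __init__(self) -> None:
        self.memories: Dict[int, str] = {}
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def _unindex(self, memory_id: int) -> None:
        # Remove the old text's words from the index (no-op for unknown ids).
        for word in set(self.memories.get(memory_id, "").lower().split()):
            self.postings[word].discard(memory_id)

    def write(self, memory_id: int, text: str) -> None:
        """Insert or modify a memory; the index is updated in the same step."""
        self._unindex(memory_id)
        self.memories[memory_id] = text
        for word in set(text.lower().split()):
            self.postings[word].add(memory_id)

    def delete(self, memory_id: int) -> None:
        self._unindex(memory_id)
        self.memories.pop(memory_id, None)

    def recall(self, word: str) -> Set[int]:
        """Lookup only; nothing is rebuilt here."""
        return set(self.postings.get(word.lower(), set()))


idx = MemoryIndex()
idx.write(1, "Client prefers Tuesday mornings")
idx.write(2, "Invoice sent Tuesday")
idx.write(2, "Invoice sent Friday")  # a modification updates the index immediately
print(idx.recall("tuesday"))
```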

Why v2.2.4 matters

v2.2.3 made voice calls feel natural. v2.2.4 makes them useful.

The shift from "the voice agent can chat" to "the voice agent can do" is the bigger story. A phone call to your AI employee can now book an appointment, search a customer record, draft a follow-up document, or run any other workflow you have given the employee a skill for. Without the user needing to know what a skill is, where it lives, or how to ask for it. They just talk.


v2.2.5 (later the same day): the voice agent learns to pick the right tool, and gets a notepad

A few hours after shipping v2.2.4, a real call exposed the next gap. A voice agent saw a request that should have mapped to an installed skill, but it had no idea which skill, and no way to read the skill's documentation to figure out the arguments. It guessed, got the arguments wrong, and the call ended with an awkward "I'm sorry, that didn't work."

v2.2.5 closes that loop. The voice agent can now discover the skills it has, read each skill's documentation on demand, and create the scratch files some skills require as input. Plus one chat-side CSS fix for a bug that was hiding the right-panel toolbar whenever a message contained a long code block.

New: voice agents now figure out which skill to use on their own

Two new voice tools landed: skills_list and skill_details.

The agent uses them like a person would. Caller says "fais-moi un mockup d'une page de login" ("make me a mockup of a login page"). The agent has dozens of skills installed and does not memorize their CLI syntax, so it calls skills_list, gets the inventory, spots ui-sketcher, calls skill_details("ui-sketcher") to read the usage examples, then fires run_skill with the correct arguments. The whole chain happens in under a second of conversation time, in the same turn.

The same flow works for any skill: web publishing, document generation, image editing, custom workflows. The agent does the lookup; you do not have to teach it the syntax of every tool you install.
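
The chain can be sketched in a few lines. This is an illustration under assumptions, not the shipped tools: SKILL_DOCS is an in-memory stand-in for the installed skills, its usage strings are placeholders rather than real CLI syntax, and the pick and build_args callables stand in for the model's own judgment.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for installed skills: name -> documentation text.
# The usage strings are placeholders, not the real skills' CLI syntax.
SKILL_DOCS: Dict[str, str] = {
    "document-processor": "<usage examples for document-processor>",
    "ui-sketcher": "<usage examples for ui-sketcher>",
}


def skills_list() -> List[str]:
    """Inventory of installed skill names."""
    return sorted(SKILL_DOCS)


def skill_details(name: str) -> str:
    """Documentation for one skill, read on demand."""
    return SKILL_DOCS[name]


def resolve_and_run(pick: Callable[[List[str]], str],
                    build_args: Callable[[str], str]) -> List[str]:
    """Walk the discovery chain and return the trace of tool calls made."""
    trace = ["skills_list"]
    name = pick(skills_list())               # the model spots a matching skill
    trace.append(f"skill_details({name})")
    args = build_args(skill_details(name))   # the model turns docs into arguments
    trace.append(f"run_skill({name}, {args})")
    return trace


trace = resolve_and_run(
    pick=lambda names: next(n for n in names if "sketch" in n),
    build_args=lambda docs: "login-page",
)
print(trace)
```

The point of the ordering is that nothing about a skill has to be known before the call starts: the inventory comes first, the documentation second, and the arguments are only built once the documentation has been read.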

Why this matters: in v2.2.4 you had to pre-load the agent's profile with skill names and example invocations for run_skill to work well. In v2.2.5 the agent reads the skill's own docs at call time. A skill you install tomorrow is reachable the moment it is installed.

New: voice agents can create scratch files mid-call

Some skills need a file as input. A document processor wants a markdown file to convert. An ASCII-GIF renderer wants a JSON frames file. A CV builder wants the candidate's content written somewhere before the PDF step.

In chat mode the agent has a Write tool and can drop a file anywhere before invoking a CLI. In voice, until this release, it could not. So a perfectly good skill would fail because the input file did not exist.

v2.2.5 exposes four file tools to the voice layer: write_file, read_file, list_files, search_files. They are sandboxed (only /opt/app/data/ and /tmp/ are writable, and path traversal is blocked at the handler level). The tool descriptions are tight enough that Gemini Live does not fire them on greetings or small talk: they only fire when the agent is genuinely staging a file for a downstream skill.
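
For readers curious what handler-level traversal blocking can look like, here is a minimal sketch. It is an assumption about the general technique, not the shipped handler: the helper names is_writable and write_file (as a Python function) and the allowed_roots parameter are illustrative. The idea is to resolve the path fully before comparing it against the allowed roots, so that ".." segments and symlinks cannot escape the sandbox.

```python
import os
from typing import Sequence

# The two writable roots named in the release notes.
SANDBOX_ROOTS = ("/opt/app/data", "/tmp")


def is_writable(path: str, allowed_roots: Sequence[str] = SANDBOX_ROOTS) -> bool:
    """True only if the fully resolved path sits inside one of the allowed roots."""
    real = os.path.realpath(path)  # collapses ".." and follows symlinks
    for root in allowed_roots:
        real_root = os.path.realpath(root)
        try:
            if os.path.commonpath([real, real_root]) == real_root:
                return True
        except ValueError:
            # Paths that cannot be compared (e.g. different drives) are rejected.
            continue
    return False


def write_file(path: str, content: str,
               allowed_roots: Sequence[str] = SANDBOX_ROOTS) -> None:
    """Write a scratch file, refusing any path outside the sandbox."""
    if not is_writable(path, allowed_roots):
        raise PermissionError(f"write outside sandbox: {path}")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)


print(is_writable("/opt/app/data/letter.pdf"))  # inside an allowed root
print(is_writable("/tmp/../etc/passwd"))        # traversal attempt
```

Comparing resolved paths rather than raw strings is the design choice that matters here: a prefix check on the unresolved string would accept "/tmp/../etc/passwd" because it starts with "/tmp".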

End-to-end example, from one of this morning's test calls:

Caller: "fais-moi une lettre de motivation et envoie-la par email à Jean" ("write me a cover letter and email it to Jean")
Agent:  drafts the letter in its head
Agent:  write_file("/tmp/letter.md", "<full content>")
Agent:  skill_details("document-processor")
Agent:  run_skill("document-processor", "convert --input /tmp/letter.md --output /opt/app/data/letter.pdf")
Agent:  send_email(to="jean@...", subject="...", attachments=["/opt/app/data/letter.pdf"])

Four tool calls, one continuous voice turn, no human edits. That was not possible yesterday.

Fixed: long code blocks in chat no longer hide the right-panel toolbar

A specific kind of assistant message (a code block containing one very long unbroken line, the sort of thing you get when you paste a JSON-ish description field) was extending the chat container past the right panel's edge and visually pushing the FileExplorer toolbar (Refresh, FolderPlus, FilePlus, Upload, CheckSquare) out of view.

The visual symptom looked exactly like the v2.2.1 chat-wipe bug we fixed two days ago, which sent us briefly chasing the wrong root cause. Different mechanism: the v2.2.1 bug was a backend race condition; this one was pure CSS. A monospace line-rendering <div> had whiteSpace: 'pre' hard-coded, overriding the parent's wrapping rule. We switched it to pre-wrap plus overflowWrap: 'anywhere', so a long line can break at any character if it has to.

Refresh and the toolbar stays put, no matter what code lands in the conversation.

Why v2.2.5 matters

v2.2.4 gave the voice agent hands. v2.2.5 teaches it to find the right tool for the job and to lay out its work before reaching for it.

The compound effect is what is interesting. A voice call can now: discover a skill it has never seen named in conversation, read its documentation, draft a file the skill needs, invoke the skill with the right arguments, take the output, and pass it to another skill (or an email tool) without the caller ever needing to know any of those steps existed.

Hard refresh to pick up the new version tag.

Want to test the most advanced AI employees? Try it here: https://Geta.Team
