Table of contents
- The Four Pillars of Hermes
- Memory: `MEMORY.md` and `USER.md`
- Skills: Progressive Disclosure, Not Bloat
- Tools and Toolsets
- Context Awareness and `@` References
- Checkpoints and Rollback
- Automation: Scheduled Tasks, Subagents, Hooks, Batch
- Voice, Vision, Image Generation
- Provider Routing and Fallbacks
- API Server and IDE Integration
- Personality and Plugins
- Next Steps
The Four Pillars of Hermes
Hermes stops being 'yet another CLI chatbot' once you understand its four interlocking systems: persistent memory, on-demand skills, toolsets, and voice/multimodal I/O. Each one is useful alone, but they're designed to compose — memory tells the agent who you are, skills tell it how you work, toolsets give it the ability to act, and voice takes the whole thing off the keyboard.
Primary source: the feature overview at https://hermes-agent.nousresearch.com/docs/user-guide/features/overview. Use that page for the authoritative, always-current list.
Memory: `MEMORY.md` and `USER.md`
Hermes stores long-term state in plain-text files — `MEMORY.md` for facts about the system and project, `USER.md` for facts about you. They're human-readable, human-editable, and deliberately bounded: when they fill up, Hermes consolidates entries rather than letting them grow without limit.
Memory is intentionally small. The design is: store the things that a future session would want to know but can't rediscover from the code — preferences, environmental quirks, past incidents, constraints. Don't store things that are already in the repo.
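As a concrete illustration, a `MEMORY.md` following that principle might look like the sketch below. The headings and entries are invented for this example — they show the *kind* of content worth storing, not a layout taken from Hermes's docs.

```markdown
# MEMORY.md (illustrative)

## Environment
- CI runs on Node 20; local dev is Node 22 — watch for `fetch` behaviour differences.

## Incidents
- 2024-11: `npm run migrate` wiped the seed data once; back up `dev.db` first.

## Constraints
- Never push directly to `main`; all changes go through PRs.
```

Note what's absent: nothing here is rediscoverable from the repo itself.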
Skills: Progressive Disclosure, Not Bloat
Where memory stores facts, skills store procedures. A skill is a structured document — matching the agentskills.io open standard — that the agent loads on demand when the relevant scenario fires. The on-demand part is the key: if you install forty skills, your context window isn't forty skills lighter, because only the ones the agent is currently using are in the prompt.
That is the single biggest difference between 'lots of instructions in one giant system prompt' and a real skill system. Progressive loading means your context stays small and relevant, even with a large installed library.
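A sketch of what such a skill document can look like — the frontmatter fields shown here are assumptions based on common skill formats; consult the agentskills.io standard for the authoritative field set:

```markdown
---
name: release-checklist
description: Use when the user asks to cut or prepare a release.
---

# Release checklist
1. Ensure `main` is green in CI.
2. Bump the version and update the changelog.
3. Tag, push, and verify the artefact build.
```

The short `description` is what the agent sees up front; the full procedure only enters the context when the scenario actually fires.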
Tools and Toolsets
Tools are the functions the agent can call — web search, terminal execution, file editing, delegation. Toolsets are logical groupings of tools that can be enabled or disabled per-platform. The messaging gateway uses this heavily: you might let a Slack bot read files but not run terminal commands, while the CLI gets the full set.
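The gating idea can be sketched in a few lines of Python. Everything here — the toolset names, the per-platform policy table, the function — is invented for illustration, not Hermes's internal API:

```python
# Hypothetical toolset groupings; names invented for this sketch.
TOOLSETS = {
    "files": {"read_file", "edit_file"},
    "terminal": {"run_command"},
    "web": {"web_search", "fetch_url"},
}

# Per-platform policy: the Slack bot gets read-oriented toolsets,
# while the CLI gets everything.
PLATFORM_TOOLSETS = {
    "slack": {"files", "web"},
    "cli": {"files", "web", "terminal"},
}

def allowed_tools(platform: str) -> set[str]:
    """Union of all tools in the toolsets enabled for a platform."""
    enabled = PLATFORM_TOOLSETS.get(platform, set())
    return set().union(*(TOOLSETS[name] for name in enabled))
```

With this shape, disabling terminal access for a platform is a one-line policy change rather than a per-tool audit.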
Context Awareness and `@` References
Hermes automatically discovers and loads any of `.hermes.md`, `AGENTS.md`, `CLAUDE.md`, or `SOUL.md` that sit alongside the code it's working on. No configuration, no flag — if the file is there, it shapes behaviour. This is how you get consistent conventions across sessions without re-explaining them every time.
Inline, inside a conversation, the `@` symbol injects a file, folder or URL directly into the context. The agent expands the reference and integrates it — so `@src/auth/` drops the whole auth directory into the prompt without you hand-copying files.
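A typical prompt mixing both kinds of reference (the path and URL are hypothetical):

```
> Summarise the error handling in @src/server.ts and compare it
> with the approach described at @https://example.com/error-guide
```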
Checkpoints and Rollback
Before it modifies files, Hermes automatically snapshots the working directory. If a change turns out to be wrong, `/rollback` reverts to the last checkpoint. This is a safety net you don't notice until you need it — and then it's the feature you'd never give up.
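The mechanism is easy to picture as a naive copy-based snapshot — a conceptual sketch only, not how Hermes actually implements checkpoints:

```python
import shutil
import tempfile
from pathlib import Path

class Checkpointer:
    """Conceptual checkpoint/rollback: snapshot a directory, restore it later."""

    def __init__(self, workdir: Path):
        self.workdir = workdir
        self.snapshot: Path | None = None

    def checkpoint(self) -> None:
        """Snapshot the working directory before any modification."""
        self.snapshot = Path(tempfile.mkdtemp(prefix="ckpt-"))
        shutil.copytree(self.workdir, self.snapshot, dirs_exist_ok=True)

    def rollback(self) -> None:
        """Restore the working directory to the last checkpoint."""
        if self.snapshot is None:
            raise RuntimeError("no checkpoint to roll back to")
        shutil.rmtree(self.workdir)
        shutil.copytree(self.snapshot, self.workdir)
```

A real implementation would be incremental rather than a full copy, but the contract is the same: snapshot before mutation, restore on demand.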
Automation: Scheduled Tasks, Subagents, Hooks, Batch
A cluster of features turns Hermes from a reactive chat tool into a background worker:
- Scheduled Tasks — recurring jobs described in natural language, run on a cron.
- Subagent Delegation — spawn isolated child agents with restricted permissions, up to three concurrent, for parallel investigation.
- Event Hooks — customisation points at lifecycle moments (tool call, session start, etc.) for logging, alerts, or interception.
- Batch Processing — run hundreds or thousands of prompts programmatically and emit structured trajectory data for analysis.
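The batch-processing idea can be sketched in Python — the trajectory record shape below is an assumption for illustration, not Hermes's actual output format:

```python
import json

def run_batch(prompts, model_fn):
    """Run each prompt through a model function, collecting one
    structured trajectory record per prompt."""
    trajectories = []
    for i, prompt in enumerate(prompts):
        response = model_fn(prompt)
        trajectories.append({
            "id": i,
            "prompt": prompt,
            "response": response,
            "steps": [{"role": "user", "content": prompt},
                      {"role": "assistant", "content": response}],
        })
    return trajectories

def to_jsonl(trajectories):
    """Serialise trajectories as JSON Lines for downstream analysis."""
    return "\n".join(json.dumps(t) for t in trajectories)
```

The point of structured output is that analysis becomes a data problem: grep the JSONL, load it into pandas, diff runs.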
Voice, Vision, Image Generation
Voice mode gives full bidirectional voice interaction across CLI and messaging platforms — mic in, spoken responses out. Five TTS providers are supported, including the free Edge TTS default, so you can try it with zero spend.
Vision works via clipboard image paste (Ctrl+V in the CLI), which is perfect for dropping screenshots and diagrams in without file management. Image generation is wired to FAL.ai with eight models, and browser automation runs against Browserbase, Browser Use, or a local Chrome via CDP.
Provider Routing and Fallbacks
Hermes treats the LLM as a pluggable component. Provider Routing lets you pick which provider handles each request — optimising for cost, latency or quality — and Fallback Providers automatically fail over when the primary is down. Credential Pools rotate between multiple keys to spread load or survive a revoked key.
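The fallback pattern itself is simple enough to sketch — the provider names and call signature here are invented for illustration:

```python
def complete_with_fallback(prompt, providers):
    """Try providers in priority order; fail over when one raises.

    providers: ordered list of (name, callable) pairs.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider failure triggers failover
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```

Credential pools extend the same loop one level down: rotate keys within a provider before giving up and moving to the next one.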
API Server and IDE Integration
Two mature exit points out of the CLI are worth noting. The API server exposes Hermes as an OpenAI-compatible HTTP endpoint — plug it into Open WebUI, LobeChat or any OpenAI-compatible client. And ACP support wires Hermes directly into VS Code, Zed and JetBrains editors, so you get agent behaviour inside your IDE without leaving the editor.
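Because the endpoint is OpenAI-compatible, any standard chat-completions client works against it. A minimal stdlib-only sketch — the base URL and model name are assumptions, so check your server's configuration:

```python
import json
import urllib.request

def build_chat_request(prompt, model="hermes"):
    """Build a standard /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST a chat request to an OpenAI-compatible server and
    return the first choice's message content."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Anything that speaks this payload shape — Open WebUI, LobeChat, the official `openai` client with a custom `base_url` — can be pointed at the server the same way.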
Personality and Plugins
`SOUL.md` defines the agent's identity — tone, preferences, non-negotiables. Presets (helpful, concise, kawaii and others) are available per-session. Plugins let you extend Hermes without touching core code — custom tools, hooks, memory providers, alternative context engines — discovered from standard locations like `~/.hermes/plugins/`.
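An illustrative `SOUL.md` — the sections and wording are invented for this example, not a prescribed schema:

```markdown
# SOUL.md (illustrative)

## Tone
Concise and direct, no filler. Dry humour is fine; exclamation marks are not.

## Non-negotiables
- Ask before running destructive commands.
- Cite file paths when referencing code.
```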
Next Steps
Lesson 4 on messaging platforms shows how to expose all of this to your team via Slack or Telegram. Lesson 5 on integrations goes deeper on MCP, web search, and voice backends. If you're sold on the memory/skills model and want to go hands-on, SetupClaw's managed service applies the same 'stabilise the baseline first' principle while handling the security hardening for you.