Configuration schema

Read this if you want to tune OMem beyond what the wizard set and need to know not just what a field is, but what each value means and when to pick it. The decision points (how qmd searches, which LLM provider to use, how images are handled) get a full explanation here; the mechanical knobs get a compact table.

Where it lives, and how to read it

Your config is a single YAML file at:

~/.config/omem/config.yaml

(or $XDG_CONFIG_HOME/omem/config.yaml if you’ve set that). OMem writes a default skeleton on first run and migrates it forward automatically on load.
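The lookup rule above can be sketched in a few lines of Python. This is only an illustration of the documented path convention, not OMem's actual loader: the helper name `omem_config_path` is hypothetical, and treating an empty `XDG_CONFIG_HOME` as unset is an assumption.

```python
import os
from pathlib import Path


def omem_config_path(env=None):
    """Return the config file location described above (hypothetical helper).

    Uses $XDG_CONFIG_HOME/omem/config.yaml when the variable is set and
    non-empty (assumption: empty counts as unset), otherwise falls back to
    ~/.config/omem/config.yaml.
    """
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    base = Path(xdg) if xdg else Path.home() / ".config"
    return base / "omem" / "config.yaml"


print(omem_config_path({}))                              # default location
print(omem_config_path({"XDG_CONFIG_HOME": "/tmp/xdg"}))  # XDG override
```

Passing the environment as a plain dict keeps the sketch easy to try without touching your real shell variables.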

There is no `omem config schema` command. To inspect the effective config (validated, with defaults filled in), use:

omem config show              # effective config (YAML)
omem config show --json       # same, as JSON
omem config show --raw        # the file verbatim (useful if it won't parse)
omem config get llm.curator.model              # read one value
omem config set kinds.mail.scope.time_window.since 6m_ago   # write one value

`config set` takes JSON for non-string values, for example: `omem config set kinds.file.scope.exclude_patterns '["~$*", "*.tmp"]'`.
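If you generate these commands from a script, the quoting is the part that tends to go wrong. The sketch below builds the command line in Python; `config_set_command` is a hypothetical helper, and it only assumes the rule stated above (strings pass through, everything else is JSON).

```python
import json
import shlex


def config_set_command(key, value):
    """Build an `omem config set` command line (hypothetical helper).

    Strings are passed through unchanged; every other value is
    JSON-encoded. shlex.quote stops the shell from expanding characters
    such as `$` and `*` inside the value.
    """
    arg = value if isinstance(value, str) else json.dumps(value)
    return " ".join(["omem", "config", "set", shlex.quote(key), shlex.quote(arg)])


print(config_set_command("kinds.file.scope.exclude_patterns", ["~$*", "*.tmp"]))
print(config_set_command("plugins.qmd.query_mode", "no-rerank"))
```

The first call reproduces the single-quoted JSON list from the example above; the second shows that a plain string value needs no quoting at all.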

Browse the schema

The schema is one nested tree. Every field has a type, a default, and a description of what it controls, and the two high-risk fields are marked. The entry for `data.root`, for example, reads:

data.root

HIGH RISK — changing this strands existing data. Read the note before editing.

type: str
default: "~/.local/share/omem"

Internal cache: the content-addressed raw/ archive + the SQLite db. You never browse this by hand. Changing it orphans every cached item — re-ingest would start from scratch.

The rest of this page walks the fields section by section. The ones with real choices to make are explained in full; the mechanical ones are tabled.

The decision points

These are the handful of fields where the value genuinely changes how OMem behaves. They’re worth understanding before you touch them.

`plugins.qmd.query_mode` — how qmd searches

Only relevant if you’ve enabled the qmd index. It picks which retrieval paths qmd runs, trading speed for precision. The numbers below were measured on a small test set (4 documents); the gap between cold and warm matters because the embedding and reranker models load lazily on first use.

| Value | What runs | Speed | When to pick it |
| --- | --- | --- | --- |
| `bm25` | FTS5 keyword search only: no LLM, no vectors. | ~2 s | You want qmd’s plumbing but essentially keyword behavior; fastest possible. |
| `vector` | Vector similarity only (no keyword path). | ~85 s cold, ~3–5 s warm | Rarely the best pick: dropping the keyword path lowers recall vs `no-rerank`. |
| `no-rerank` | Hybrid: BM25 + vector + LLM query expansion, reranker off. | ~4 s warm | The recommended everyday mode. Precision is essentially the same as `full` at a fraction of the cost. |
| `full` (default) | `no-rerank` + a cross-encoder reranker (qwen3-reranker). | ~365 s cold, ~5–10 s warm | Maximum precision when you can absorb the reranker’s cold-start; on lighter hardware the reranker is the expensive part. |

The practical advice: `full` is the default because it’s the most precise, but `no-rerank` is the one most people should run day to day. It keeps the hybrid recall (the part that finds cross-language and semantic matches) and drops only the reranker, which adds latency for a modest precision gain. Set it with:

omem config set plugins.qmd.query_mode no-rerank

`llm.curator.provider` / `llm.vlm.provider` — where the LLM calls go

OMem never ships a model; you point it at one. The curator (text → wiki) and the VLM (image → description) are configured independently and can use different providers. Four are supported:

| Value | What it is | You need |
| --- | --- | --- |
| `anthropic-oauth` (default) | Your Claude Pro / Max subscription via OAuth. No API key to manage. | A Claude subscription; auth is handled in `omem setup`. |
| `anthropic-api` | The Anthropic API directly, billed per token. | An `ANTHROPIC_API_KEY`. |
| `openai-compat` | Any OpenAI-compatible endpoint: OpenAI, Chinese model platforms (Alibaba Bailian / ByteDance Volcengine Ark / Kimi / Zhipu GLM / MiniMax / DeepSeek), a local server, or a hosted gateway. | A `base_url` + an API key. |
| `openai-chatgpt-oauth` | A ChatGPT Plus / Pro subscription via the Codex backend. | A ChatGPT subscription; auth in `omem setup`. |

The `openai-compat` path is the escape hatch. It’s how you run a local model (point `base_url` at your local server), use a Chinese model platform with an OpenAI-compatible endpoint (Bailian, Volcengine Ark, Kimi, Zhipu GLM, MiniMax, DeepSeek, and others; concrete `base_url` values and model IDs are in Configure the LLM provider), or reach any other provider that speaks the OpenAI API. The matching fields are `base_url`, `api_key_env`, and `api_key_keychain`.

`curator.mode.<format>` — rewrite, or keep verbatim

For each format, the curator runs in one of two modes:

| Value | What it does | Best for |
| --- | --- | --- |
| `llm-full` | The LLM rewrites the document into a clean wiki page (and writes the abstract + tags). | Formats where structure is messy and a rewrite genuinely helps: `pdf`, `docx`, `pptx`, `xlsx`, `html` (the defaults). |
| `frontmatter-only` | The body is copied byte-for-byte from the parsed source; the LLM only writes the abstract + tags. | Already-clean formats: `md`, `txt` (the defaults). Cheaper, and preserves exact structure. |

There’s a counter-intuitive upside to `frontmatter-only`: because the body isn’t rewritten, exact numbers and wording survive. A full rewrite can quietly round 11.3% to 11%, while a verbatim copy can’t. If you find the curator is smoothing away important details on a format, switching that format to `frontmatter-only` fixes it.

omem config set curator.mode.html frontmatter-only   # stop rewriting HTML bodies

`curator.cross_language_auto_promote` (default `true`) is the safety net: if a source’s language differs from `output.language`, OMem forces `llm-full` so you don’t end up with a Chinese body under an English abstract.
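To see how the per-format mode and the promote switch interact, here is a small Python sketch of the decision. It is a reading of the rule described above, not OMem's code: `effective_mode` is a hypothetical name, and the handling of `output.language: auto` (the curator follows the source, so nothing is promoted) is an assumption.

```python
# Per-format defaults as listed in the table above.
DEFAULT_MODES = {
    "pdf": "llm-full", "docx": "llm-full", "pptx": "llm-full",
    "xlsx": "llm-full", "html": "llm-full",
    "md": "frontmatter-only", "txt": "frontmatter-only",
}


def effective_mode(fmt, source_lang, output_lang="auto",
                   modes=DEFAULT_MODES, auto_promote=True):
    """Resolve the curator mode for one document (illustrative logic).

    Assumption: with output_lang == "auto" the output follows the source
    language, so the two never differ and no promotion happens.
    """
    differs = output_lang != "auto" and source_lang != output_lang
    if auto_promote and differs:
        return "llm-full"  # forced rewrite so body and abstract share a language
    return modes[fmt]


print(effective_mode("md", source_lang="zh", output_lang="en"))
print(effective_mode("md", source_lang="en", output_lang="en"))
```

Under these assumptions a Chinese Markdown note written into an English vault is promoted to `llm-full`, while an English note keeps its verbatim body.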

`parser.images.<format>` — how images inside a document are handled

Each format decides what happens to pictures embedded in it:

| Value | What happens | Cost |
| --- | --- | --- |
| `ocr` | Run OCR (RapidOCR), extracting text from the image. | Cheap; great for text-in-images. |
| `vlm` | Send the image to a vision model, which writes a description. | An LLM call per qualifying image; best for charts, screenshots, diagrams. |
| `off` | Extract the image to `assets/` but don’t describe it. | Free; the image isn’t searchable. |

The defaults are deliberate: `pdf` defaults to `ocr` (PDFs are mostly text, so per-page vision would burn tokens for little gain), while `pptx` / `docx` / `xlsx` / `standalone` / `mail` / `calendar` default to `vlm` (their images are usually content). A common tweak is setting `parser.images.mail` to `off` to skip describing email signature art:

omem config set parser.images.mail off

See file formats for which images actually clear the “worth describing” filter.

`*concurrency` — how many things run at once

Four knobs control parallelism. The two `global_concurrency` caps are the safety valves: they bound how many provider calls happen simultaneously no matter how high the worker pools go.

| Field | Default | Controls |
| --- | --- | --- |
| `llm.global_concurrency` | `4` | Hard cap on simultaneous LLM calls across the whole process. `4` stays under Pro/Max OAuth’s ~4–8 RPM; raise it on high-quota endpoints. |
| `parser.vlm_global_concurrency` | `3` | Hard cap on simultaneous vision-model calls. |
| `ingest.curate_concurrency` | `2` | Worker pool for items within one kind. |
| `ingest.kind_concurrency` | `2` | Worker pool for kinds in parallel (file + mail + calendar at once). |

These defaults are a conservative baseline tuned to run safely on an ordinary Mac (an 8 GB Mac mini included), not a ceiling. The two worker pools multiply: kind × curate is how many items sit in the pipeline at once (here 2 × 2 = 4), which is the main driver of memory use. The two `global_concurrency` caps bound the actual provider calls. Raise the values if you have a roomy machine or a high-quota API key (`omem config set llm.global_concurrency 16`); lower the globals if you hit rate limits; set any of them to `1` for strictly serial debugging. All four have env-var overrides (e.g. `OMEM_LLM_GLOBAL_CONCURRENCY`).
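The arithmetic is easy to check before you change a value. The Python sketch below is illustrative only and is not OMem's scheduler: `pipeline_budget` is a hypothetical helper, it assumes the env var holds a plain integer, and it assumes each in-flight item makes at most one LLM call at a time.

```python
import os


def pipeline_budget(kind=2, curate=2, llm_global=4, env=None):
    """Return (items in the pipeline at once, max simultaneous LLM calls).

    Illustrative arithmetic for the knobs above, using their defaults.
    """
    env = os.environ if env is None else env
    # Env-var override named in the text; parsing it as an int is an assumption.
    llm_global = int(env.get("OMEM_LLM_GLOBAL_CONCURRENCY", llm_global))
    in_flight = kind * curate  # the two worker pools multiply
    # Assumption: one LLM call per in-flight item, bounded by the global cap.
    return in_flight, min(in_flight, llm_global)


print(pipeline_budget(env={}))                   # defaults: 2 x 2 items, cap 4
print(pipeline_budget(kind=3, curate=4, env={}))  # more items, same call cap
```

Raising only the worker pools grows memory use without adding provider throughput; the global cap has to move too before more calls actually run in parallel.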

`data.root` / `data.wiki_path` — ⚠ the two you don’t hand-edit

| Field | Default | What it is |
| --- | --- | --- |
| `data.root` | `"~/.local/share/omem"` | Internal cache: the content-addressed `raw/` archive + the SQLite db. You never browse it by hand. |
| `data.wiki_path` | `"~/omem/wiki"` | The user-visible vault of Markdown pages. |

The rest, by section

The remaining fields are mechanical — set once, rarely revisited.

Top level

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `config_version` | int | `1` | Schema version pin for migrations. |
| `active_index` | `"fts5" \| "qmd"` | `"fts5"` | The live index backend. Switch it with `omem plugin enable`, not by hand. |
| `output.language` | `"auto" \| "zh" \| "en"` | `"auto"` | Language the curator writes in. `auto` follows the source document; `zh`/`en` force it. |

`llm` (beyond provider)

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `llm.curator.model` | str | `""` | Model name. Empty by design; the wizard fills it. |
| `llm.curator.base_url` | str \| null | `null` | Endpoint override for `openai-compat`. |
| `llm.curator.api_key_env` | str \| null | `null` | Env var name holding the API key. |
| `llm.curator.api_key_keychain` | str \| null | `null` | Keychain account holding the key (service `omem-llm`). Wins over `api_key_env`. |
| `llm.curator.fallback_model` | str \| null | `"claude-opus-4-8"` | OAuth-only fallback when a prompt exceeds the threshold below. |
| `llm.curator.fallback_token_threshold` | int | `180000` | Token count that triggers the fallback model. |
| `llm.curator.max_output_tokens` | int \| null | `null` | Hard cap on output tokens; `null` uses the provider/model default. |
| `llm.vlm.*` | - | - | Same fields as `llm.curator.*`, for the vision model. |
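The precedence between the two key fields can be pictured as follows. This Python sketch only illustrates the "keychain wins over `api_key_env`" rule from the table; `resolve_api_key` and the `keychain_lookup(service, account)` callable are hypothetical stand-ins, and what OMem does when a keychain entry is missing is not covered here.

```python
import os


def resolve_api_key(cfg, keychain_lookup, env=None):
    """Pick the API key source for one provider block (illustrative).

    `cfg` is a dict holding the `api_key_keychain` / `api_key_env` fields.
    `keychain_lookup(service, account)` is a hypothetical callable standing
    in for real keychain access.
    """
    env = os.environ if env is None else env
    account = cfg.get("api_key_keychain")
    if account:  # keychain wins when both fields are set
        return keychain_lookup("omem-llm", account)
    var = cfg.get("api_key_env")
    if var:
        return env.get(var)
    return None  # neither field configured


# Fake keychain for demonstration purposes only.
fake_keychain = lambda service, account: f"{service}:{account}"
cfg = {"api_key_keychain": "work", "api_key_env": "MY_LLM_KEY"}
print(resolve_api_key(cfg, fake_keychain, {"MY_LLM_KEY": "from-env"}))
```

With both fields set, the fake keychain value is returned and the env var is never read.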

`parser` (beyond images)

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `parser.max_images_per_doc` | int 1–10000 | `200` | Per-document OCR/VLM image cap; extras are extracted but not described (logged, never silent). |
| `parser.ocr_subprocess_batch` | int 0–1000 | `20` | Restart the OCR worker every N images to bound memory. `0` disables isolation (debug only). |
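A quick way to reason about these two limits together is to work out how one document's images would be split. The Python sketch below is an illustration under the defaults in the table, not OMem's parser; `plan_ocr_batches` is a hypothetical helper.

```python
def plan_ocr_batches(image_count, max_images_per_doc=200, ocr_subprocess_batch=20):
    """Split one document's images into OCR worker batches (illustrative).

    Returns (batch_sizes, skipped). Images past the per-document cap count
    as skipped: extracted but not described. A batch size of 0 means no
    worker isolation, so everything runs in a single batch.
    """
    described = min(image_count, max_images_per_doc)
    skipped = image_count - described
    if described == 0:
        return [], skipped
    if ocr_subprocess_batch == 0:
        return [described], skipped
    full, rest = divmod(described, ocr_subprocess_batch)
    return [ocr_subprocess_batch] * full + ([rest] if rest else []), skipped


print(plan_ocr_batches(45))   # three worker runs, nothing skipped
print(plan_ocr_batches(250))  # capped at 200 described images
```

A 250-image document therefore costs ten worker restarts and leaves 50 images extracted but undescribed, which is the case the log message warns about.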

`ingest` (beyond concurrency)

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `ingest.formats.{pdf,docx,pptx,xlsx,md,txt,html,image}` | bool | `true` | Per-format master switch, applied across every source. Set `false` to never ingest that format, attachments included. |

`schedule`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `schedule.interval_minutes` | int | `0` | The interval `omem install` uses; it reads the value from here, so you don’t repeat it as a flag. Set by the wizard; `0` disables auto-ingest. |

`setup`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `setup.wizard_language` | `"zh" \| "en"` | `"zh"` | Onboarding UI language. English users switch at the wizard’s first step. |

`plugins.qmd` (beyond query_mode)

Present only after `omem plugin install qmd`; `null` otherwise.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `plugins.qmd.subprocess_timeout_sec` | int | `600` | Per-call timeout. `600` covers a warm full-mode run; raise to `3600` for a cold full-mode rebuild. Env: `OMEM_QMD_SUBPROCESS_TIMEOUT_SEC`. |
| `plugins.qmd.index_name` | str | `"omem"` | qmd collection namespace (isolates OMem’s index from other qmd users). |
| `plugins.qmd.executable_path` | str | (filled at install) | Path to the `qmd` binary. |

`kinds.{file,mail,calendar,loop}`

Every kind ships with `enabled: false`, so a clean install ingests nothing until the wizard turns a kind on.

`kinds.file`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `kinds.file.enabled` | bool | `false` | |
| `kinds.file.source` | str | `"local-files"` | |
| `kinds.file.source_config.roots` | list[str] | `[]` | Folders to ingest. The wizard auto-detects OneDrive / iCloud / Dropbox / Documents. |
| `kinds.file.scope.max_file_size_mb` | int | `50` | Skip files larger than this. |
| `kinds.file.scope.exclude_patterns` | list[str] | `["~$", ".DS_Store", "node_modules/*"]` | Glob excludes. |
| `kinds.file.scope.failed_quarantine_cap` | int | `200` | Cap on the retried-failure list size. |
| `kinds.file.tombstone_mode` | `full_sweep \| skip` | `"full_sweep"` | `full_sweep` re-checks for deletions every run; `skip` disables it for very large, slow-disk corpora. |
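To get a feel for how the size limit and the glob excludes filter a folder, here is a rough Python sketch. It is not OMem's matcher: `should_skip` is a hypothetical helper, and matching each pattern against the bare file name with `fnmatch` semantics is an assumption (OMem may match against full paths or use a different glob dialect). The patterns are the ones from the `config set` example earlier on this page.

```python
from fnmatch import fnmatchcase


def should_skip(name, size_mb, exclude_patterns, max_file_size_mb=50):
    """Decide whether a file would be filtered out (illustrative only).

    Assumption: patterns are matched case-sensitively against the file
    name alone, not the full path.
    """
    if size_mb > max_file_size_mb:
        return True  # larger than kinds.file.scope.max_file_size_mb
    return any(fnmatchcase(name, pattern) for pattern in exclude_patterns)


patterns = ["~$*", "*.tmp"]
for name, size in [("~$report.docx", 1), ("notes.tmp", 1),
                   ("report.docx", 1), ("video-notes.pdf", 120)]:
    print(name, should_skip(name, size, patterns))
```

Office lock files and temp files are dropped by the patterns, the 120 MB PDF is dropped by the size limit, and only `report.docx` would be ingested.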

`kinds.mail`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `kinds.mail.enabled` | bool | `false` | |
| `kinds.mail.source` | str | `"mail-app"` | Apple Mail’s local store. Outlook sources are v1.5+. |
| `kinds.mail.source_config.accounts` | list[str] | `[]` | Accounts to ingest. |
| `kinds.mail.scope.time_window.since` | time-str | `"3m_ago"` | How far back to reach. Grammar below. |
| `kinds.mail.scope.time_window.until` | time-str \| null | `null` | Upper bound; `null` = no upper bound. |
| `kinds.mail.scope.folders` | list[str] | `["inbox", "sent"]` | Lowercase semantic folder names. |
| `kinds.mail.scope.max_messages_per_account` | int | `5000` | |
| `kinds.mail.scope.include_attachments` | bool | `true` | |
| `kinds.mail.scope.max_attachment_size_mb` | int | `50` | |
| `kinds.mail.tombstone_mode` | `full_sweep \| skip` | `"full_sweep"` | |

`kinds.calendar`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `kinds.calendar.enabled` | bool | `false` | |
| `kinds.calendar.source` | str | `"calendar-app"` | Apple Calendar’s local store, not Outlook Web. |
| `kinds.calendar.scope.time_window.since` | time-str | `"3m_ago"` | |
| `kinds.calendar.scope.time_window.until` | time-str | `"3m_from_now"` | Symmetric window: recent past + near future. |
| `kinds.calendar.scope.include_recurring_instances` | bool | `true` | Expand each occurrence of a recurring event. |
| `kinds.calendar.scope.max_events_per_account` | int | `5000` | |
| `kinds.calendar.scope.calendars` | list[str] \| null | `null` | Sub-calendar whitelist; `null` = all sub-calendars. |
| `kinds.calendar.scope.include_attachments` | bool | `true` | |
| `kinds.calendar.tombstone_mode` | `full_sweep \| skip` | `"full_sweep"` | |

`kinds.loop`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `kinds.loop.enabled` | bool | `false` | |
| `kinds.loop.source` | str | `"loop-resolver"` | |
| `kinds.loop.source_config.chromium_profile_dir` | str | `"~/.config/omem/sessions/browser-profile"` | Persisted browser profile for SSO. |
| `kinds.loop.scope.max_fetch_concurrency` | int | `2` | Concurrent Loop page fetches. |

What’s next

CLI commands — the commands that read this config.
File formats — how parser.images.*, ingest.formats.*, and curator.mode.* combine per format.
