Configuration schema

Read this if you want to tune OMem beyond what the wizard set, and you want to know not just what a field is but what each value means and when to pick it. The decision points — how qmd searches, which LLM provider, how images are handled — get a full explanation here; the mechanical knobs get a compact table.

Your config is a single YAML file at:

~/.config/omem/config.yaml

(or $XDG_CONFIG_HOME/omem/config.yaml if you’ve set that). OMem writes a default skeleton on first run and migrates it forward automatically on load.
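As a rough illustration of that lookup rule, the Python sketch below resolves the expected path. The function name omem_config_path is hypothetical, and treating an empty XDG_CONFIG_HOME as unset is an assumption borrowed from the XDG Base Directory convention, not something this page confirms about OMem.

```python
import os
from pathlib import Path


def omem_config_path(env=None):
    """Return the expected config file location (illustrative sketch only)."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    # Assumption: an empty XDG_CONFIG_HOME is treated the same as an unset one.
    base = Path(xdg) if xdg else Path.home() / ".config"
    return base / "omem" / "config.yaml"
```

Passing a dict instead of the real environment makes the rule easy to check without touching your shell.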

There is no omem config schema command. To inspect the effective config — validated, with defaults filled in — use:

omem config show # effective config (YAML)
omem config show --json # same, as JSON
omem config show --raw # the file verbatim (useful if it won't parse)
omem config get llm.curator.model # read one value
omem config set kinds.mail.scope.time_window.since 6m_ago # write one value

config set takes JSON for non-string values: omem config set kinds.file.scope.exclude_patterns '["~$*", "*.tmp"]'.
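If you script these calls, one way to get the quoting right is to let a JSON encoder and a shell quoter do it. The sketch below is illustrative only: config_set_command is a hypothetical helper, not part of OMem, and it builds the command line without running it.

```python
import json
import shlex


def config_set_command(key, value):
    """Build an `omem config set` command line for a key/value pair.

    Strings pass through unchanged; every other value is JSON-encoded,
    following the rule stated above.
    """
    arg = value if isinstance(value, str) else json.dumps(value)
    # shlex.join quotes each argument so the shell passes it through intact.
    return shlex.join(["omem", "config", "set", key, arg])


print(config_set_command("kinds.file.scope.exclude_patterns", ["~$*", "*.tmp"]))
print(config_set_command("plugins.qmd.query_mode", "no-rerank"))
```

The first call reproduces the single-quoted JSON list shown above; the second leaves the plain string unquoted.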

The schema is one nested tree. Expand a section and click a field to see its type, default, and what it controls — the two high-risk fields are marked:

data.root
HIGH RISK: changing this strands existing data. Read the note before editing.
type: str
default: "~/.local/share/omem"

Internal cache: the content-addressed raw/ archive + the SQLite db. You never browse this by hand. Changing it orphans every cached item; re-ingest would start from scratch.

The rest of this page walks the fields section by section. The ones with real choices to make are explained in full; the mechanical ones are tabled.


These are the handful of fields where the value genuinely changes how OMem behaves. They’re worth understanding before you touch them.

plugins.qmd.query_mode — how qmd searches

Only relevant if you’ve enabled the qmd index. It picks which retrieval paths qmd runs, trading speed for precision. The numbers below are measured on a single small page (4 documents); the gap between cold and warm matters because the embedding and reranker models load lazily on first use.

| Value | What runs | Speed | When to pick it |
| --- | --- | --- | --- |
| bm25 | FTS5 keyword search only: no LLM, no vectors. | ~2 s | You want qmd’s plumbing but essentially keyword behavior; fastest possible. |
| vector | Vector similarity only (no keyword path). | ~85 s cold, ~3–5 s warm | Rarely the best pick: dropping the keyword path lowers recall vs no-rerank. |
| no-rerank | Hybrid: BM25 + vector + LLM query expansion, reranker off. | ~4 s warm | The recommended everyday mode. Precision is essentially the same as full at a fraction of the cost. |
| full (default) | no-rerank + a cross-encoder reranker (qwen3-reranker). | ~365 s cold, ~5–10 s warm | Maximum precision when you can absorb the reranker’s cold-start; on lighter hardware the reranker is the expensive part. |

The practical advice: full is the default because it’s the most precise, but no-rerank is the one most people should run day to day — it keeps the hybrid recall (the part that finds cross-language and semantic matches) and only drops the reranker, which adds latency for a modest precision gain. Set it with:

omem config set plugins.qmd.query_mode no-rerank

llm.curator.provider / llm.vlm.provider — where the LLM calls go

OMem never ships a model; you point it at one. The curator (text → wiki) and the VLM (image → description) are configured independently and can use different providers. Four are supported:

| Value | What it is | You need |
| --- | --- | --- |
| anthropic-oauth (default) | Your Claude Pro / Max subscription via OAuth. No API key to manage. | A Claude subscription; auth is handled in omem setup. |
| anthropic-api | The Anthropic API directly, billed per token. | An ANTHROPIC_API_KEY. |
| openai-compat | Any OpenAI-compatible endpoint: OpenAI, a local server, or a hosted gateway. | A base_url + an API key. |
| openai-chatgpt-oauth | A ChatGPT Plus / Pro subscription via the Codex backend. | A ChatGPT subscription; auth in omem setup. |

The openai-compat path is the escape hatch: it’s how you run a local model (point base_url at your local server) or any provider that speaks the OpenAI API. The matching fields are base_url, api_key_env, and api_key_keychain.
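The field table later on this page notes that api_key_keychain wins over api_key_env. Below is a minimal sketch of that precedence, assuming the provider block is available as a plain dict. The function name, the returned tuple shape, the endpoint URL, and the env var name are all hypothetical, not OMem's internals.

```python
def api_key_source(provider_cfg):
    """Return which configured field would supply the API key, or None."""
    keychain = provider_cfg.get("api_key_keychain")
    if keychain:
        # Keychain takes precedence when both fields are set.
        return ("keychain", keychain)
    env_name = provider_cfg.get("api_key_env")
    if env_name:
        return ("env", env_name)
    return None


# Hypothetical openai-compat block pointing at a local server.
local = {
    "provider": "openai-compat",
    "base_url": "http://localhost:8000/v1",  # assumed local endpoint
    "api_key_env": "LOCAL_LLM_API_KEY",      # assumed env var name
    "api_key_keychain": None,
}
print(api_key_source(local))  # ('env', 'LOCAL_LLM_API_KEY')
```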

curator.mode.<format> — rewrite, or keep verbatim

For each format, the curator runs in one of two modes:

| Value | What it does | Best for |
| --- | --- | --- |
| llm-full | The LLM rewrites the document into a clean wiki page (and writes the abstract + tags). | Formats where structure is messy and a rewrite genuinely helps: pdf, docx, pptx, xlsx, html (the defaults). |
| frontmatter-only | The body is copied byte-for-byte from the parsed source; the LLM only writes the abstract + tags. | Already-clean formats: md, txt (the defaults). Cheaper, and preserves exact structure. |

There’s a counter-intuitive upside to frontmatter-only: because the body isn’t rewritten, exact numbers and wording survive — a full rewrite can quietly round 11.3% to 11%, while a verbatim copy can’t. If you find the curator is smoothing important details on a format, switching it to frontmatter-only is a fix.

omem config set curator.mode.html frontmatter-only # stop rewriting HTML bodies

curator.cross_language_auto_promote (default true) is the safety net: if a source’s language differs from output.language, OMem forces llm-full so you don’t end up with a Chinese body under an English abstract.
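The promotion rule can be read as a small decision function. This is a sketch of the behavior as described here, not OMem's actual code: the function name is hypothetical, and the handling of output.language = "auto" (no promotion, since the output follows the source) is an assumption.

```python
def effective_curator_mode(configured_mode, source_language, output_language,
                           cross_language_auto_promote=True):
    """Return the mode the curator would use for one document (illustrative)."""
    # Assumption: with output.language = "auto" the output follows the source,
    # so there is never a language mismatch to correct.
    if output_language == "auto":
        return configured_mode
    if cross_language_auto_promote and source_language != output_language:
        # A verbatim body would be in the wrong language, so force a rewrite.
        return "llm-full"
    return configured_mode


print(effective_curator_mode("frontmatter-only", "zh", "en"))  # llm-full
print(effective_curator_mode("frontmatter-only", "en", "en"))  # frontmatter-only
```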

parser.images.<format> — how images inside a document are handled

Each format decides what happens to pictures embedded in it:

| Value | What happens | Cost |
| --- | --- | --- |
| ocr | Run OCR (RapidOCR), extracting text from the image. | Cheap; great for text-in-images. |
| vlm | Send the image to a vision model, which writes a description. | An LLM call per qualifying image; best for charts, screenshots, diagrams. |
| off | Extract the image to assets/ but don’t describe it. | Free; the image isn’t searchable. |

The defaults are deliberate: pdf defaults to ocr (PDFs are mostly text, so per-page vision would burn tokens for little gain), while pptx / docx / xlsx / standalone / mail / calendar default to vlm (their images are usually content). A common tweak is parser.images.mail off to skip describing email signature art:

omem config set parser.images.mail off

See file formats for which images actually clear the “worth describing” filter.

*concurrency — how many things run at once

Four knobs control parallelism. The two global_concurrency caps (llm.global_concurrency and parser.vlm_global_concurrency) are the safety valves: they bound how many provider calls happen simultaneously no matter how high the worker pools go:

| Field | Default | Controls |
| --- | --- | --- |
| llm.global_concurrency | 4 | Hard cap on simultaneous LLM calls across the whole process. 4 stays under Pro/Max OAuth’s ~4–8 RPM; raise it on high-quota endpoints. |
| parser.vlm_global_concurrency | 3 | Hard cap on simultaneous vision-model calls. |
| ingest.curate_concurrency | 2 | Worker pool for items within one kind. |
| ingest.kind_concurrency | 2 | Worker pool for kinds in parallel (file + mail + calendar at once). |

These defaults are a conservative baseline tuned to run safely on an ordinary Mac (an 8 GB Mac mini included), not a ceiling. The two worker pools multiply: ingest.kind_concurrency × ingest.curate_concurrency is how many items sit in the pipeline at once (2 × 2 = 4 by default), which is the main driver of memory use. The two global_concurrency caps bound the actual provider calls. Raise the values if you have a roomy machine or a high-quota API key (omem config set llm.global_concurrency 16); lower the globals if you hit rate limits; set any of the four to 1 for strictly serial debugging. All four have env-var overrides (e.g. OMEM_LLM_GLOBAL_CONCURRENCY).
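To make the arithmetic concrete, here is a small sketch of how the pools and the LLM cap interact. The function name is hypothetical, and the second number assumes each in-flight item makes at most one LLM call at a time, which this page does not state.

```python
def pipeline_bounds(kind_concurrency=2, curate_concurrency=2,
                    llm_global_concurrency=4):
    """Return (items in the pipeline at once, upper bound on concurrent LLM calls)."""
    # The two worker pools multiply.
    items_in_pipeline = kind_concurrency * curate_concurrency
    # Assumption: each in-flight item makes at most one LLM call at a time,
    # so the global cap only bites when it is below the pipeline size.
    max_llm_calls = min(items_in_pipeline, llm_global_concurrency)
    return items_in_pipeline, max_llm_calls


print(pipeline_bounds())           # (4, 4)
print(pipeline_bounds(4, 4, 4))    # (16, 4)
print(pipeline_bounds(4, 4, 16))   # (16, 16)
```

The middle case shows why the global cap matters: widening the pools to 4 × 4 raises memory pressure to 16 items, but provider calls stay at 4 until the cap is raised too.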

data.root / data.wiki_path — ⚠ the two you don’t hand-edit

| Field | Default | What it is |
| --- | --- | --- |
| data.root | "~/.local/share/omem" | Internal cache: the content-addressed raw/ archive + the SQLite db. You never browse it by hand. |
| data.wiki_path | "~/omem/wiki" | The user-visible vault of Markdown pages. |

The remaining fields are mechanical — set once, rarely revisited.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| config_version | int | 1 | Schema version pin for migrations. |
| active_index | "fts5" \| "qmd" | "fts5" | The live index backend. Switch it with omem plugin enable, not by hand. |
| output.language | "auto" \| "zh" \| "en" | "auto" | Language the curator writes in. auto follows the source document; zh/en force it. |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| llm.curator.model | str | "" | Model name. Empty by design: the wizard fills it. |
| llm.curator.base_url | str \| null | null | Endpoint override for openai-compat. |
| llm.curator.api_key_env | str \| null | null | Env var name holding the API key. |
| llm.curator.api_key_keychain | str \| null | null | Keychain account holding the key (service omem-llm). Wins over api_key_env. |
| llm.curator.fallback_model | str \| null | "claude-opus-4-8" | OAuth-only fallback when a prompt exceeds the threshold below. |
| llm.curator.fallback_token_threshold | int | 180000 | Token count that triggers the fallback model. |
| llm.curator.max_output_tokens | int \| null | null | Hard cap on output tokens; null uses the provider/model default. |
| llm.vlm.* | | | Same fields as llm.curator.*, for the vision model. |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| parser.max_images_per_doc | int 1–10000 | 200 | Per-document OCR/VLM image cap; extras are extracted but not described (logged, never silent). |
| parser.ocr_subprocess_batch | int 0–1000 | 20 | Restart the OCR worker every N images to bound memory. 0 disables isolation (debug only). |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| ingest.formats.{pdf,docx,pptx,xlsx,md,txt,html,image} | bool | true | Per-format master switch, applied across every source. Set false to never ingest that format, attachments included. |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| schedule.interval_minutes | int | 0 | The interval omem install uses. Set by the wizard; 0 disables auto-ingest. omem install reads this, so you don’t repeat it as a flag. |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| setup.wizard_language | "zh" \| "en" | "zh" | Onboarding UI language. English users switch at the wizard’s first step. |

The plugins.qmd block is present only after omem plugin install qmd; it is null otherwise.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| plugins.qmd.subprocess_timeout_sec | int | 600 | Per-call timeout. 600 covers a warm full-mode run; raise to 3600 for a cold full-mode rebuild. Env: OMEM_QMD_SUBPROCESS_TIMEOUT_SEC. |
| plugins.qmd.index_name | str | "omem" | qmd collection namespace (isolates OMem’s index from other qmd users). |
| plugins.qmd.executable_path | str | (filled at install) | Path to the qmd binary. |

Every kind ships enabled: false — a clean install ingests nothing until the wizard turns a kind on.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| kinds.file.enabled | bool | false | |
| kinds.file.source | str | "local-files" | |
| kinds.file.source_config.roots | list[str] | [] | Folders to ingest. The wizard auto-detects OneDrive / iCloud / Dropbox / Documents. |
| kinds.file.scope.max_file_size_mb | int | 50 | Skip files larger than this. |
| kinds.file.scope.exclude_patterns | list[str] | ["~$*", ".DS_Store", "node_modules/**"] | Glob excludes. |
| kinds.file.scope.failed_quarantine_cap | int | 200 | Cap on the retried-failure list size. |
| kinds.file.tombstone_mode | full_sweep \| skip | "full_sweep" | full_sweep re-checks for deletions every run; skip disables it for very large, slow-disk corpora. |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| kinds.mail.enabled | bool | false | |
| kinds.mail.source | str | "mail-app" | Apple Mail’s local store. Outlook sources are v1.5+. |
| kinds.mail.source_config.accounts | list[str] | [] | Accounts to ingest. |
| kinds.mail.scope.time_window.since | time-str | "3m_ago" | How far back to reach. Grammar below. |
| kinds.mail.scope.time_window.until | time-str \| null | null | Upper bound; null = no upper bound. |
| kinds.mail.scope.folders | list[str] | ["inbox", "sent"] | Lowercase semantic folder names. |
| kinds.mail.scope.max_messages_per_account | int | 5000 | |
| kinds.mail.scope.include_attachments | bool | true | |
| kinds.mail.scope.max_attachment_size_mb | int | 50 | |
| kinds.mail.tombstone_mode | full_sweep \| skip | "full_sweep" | |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| kinds.calendar.enabled | bool | false | |
| kinds.calendar.source | str | "calendar-app" | Apple Calendar’s local store, not Outlook Web. |
| kinds.calendar.scope.time_window.since | time-str | "3m_ago" | |
| kinds.calendar.scope.time_window.until | time-str | "3m_from_now" | Symmetric window: recent past + near future. |
| kinds.calendar.scope.include_recurring_instances | bool | true | Expand each occurrence of a recurring event. |
| kinds.calendar.scope.max_events_per_account | int | 5000 | |
| kinds.calendar.scope.calendars | list[str] \| null | null | Sub-calendar whitelist; null = all sub-calendars. |
| kinds.calendar.scope.include_attachments | bool | true | |
| kinds.calendar.tombstone_mode | full_sweep \| skip | "full_sweep" | |

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| kinds.loop.enabled | bool | false | |
| kinds.loop.source | str | "loop-resolver" | |
| kinds.loop.source_config.chromium_profile_dir | str | "~/.config/omem/sessions/browser-profile" | Persisted browser profile for SSO. |
| kinds.loop.scope.max_fetch_concurrency | int | 2 | Concurrent Loop page fetches. |

  • CLI commands: the commands that read this config.
  • File formats: how parser.images.*, ingest.formats.*, and curator.mode.* combine per format.