
Performance tuning

Read this if OMem works but feels slow, or you hit a provider rate limit, or ingest uses more memory than you’d like. Each section is a symptom → the knob that addresses it. (For the full field reference, see the config schema; this page is the “what do I actually change” view.)

Before tuning, confirm where the time goes — omem ingest watch shows each phase as it completes. A run that looks stuck is usually one of: a cold qmd model loading, a big PDF in OCR, or a slow Loop fetch — all expected, none worth “tuning.” Tune only a pattern of slowness, not a one-off.

Symptom: 429 rate limit errors during ingest; items failing and retrying.

Knob: Lower the global LLM cap and its vision counterpart. These are the settings that actually throttle API calls:

omem config set llm.global_concurrency 2 # default 4; lower further if still rate-limited
omem config set parser.vlm_global_concurrency 2

These two are cross-process ceilings on simultaneous LLM / vision calls. The per-kind worker pools (ingest.curate_concurrency, ingest.kind_concurrency) raise parallelism, but these global caps bound actual provider calls — so for rate limits, lower the globals, not the pools.
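To see why the global cap, rather than the pool size, limits provider calls, here is a minimal Python sketch. It is not OMem's implementation: `run_ingest`, `fake_llm_call`, and the constants are hypothetical, and a single in-process semaphore stands in for whatever cross-process mechanism OMem actually uses.

```python
import asyncio

GLOBAL_CAP = 2  # hypothetical stand-in for llm.global_concurrency
POOL_SIZE = 8   # hypothetical stand-in for a per-kind worker pool


async def run_ingest(n_items: int, pool_size: int, global_cap: int) -> int:
    """Process n_items with pool_size workers; return the peak number of
    simulated provider calls that were in flight at the same time."""
    gate = asyncio.Semaphore(global_cap)  # the "global cap"
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(n_items):
        queue.put_nowait(i)

    in_flight = 0
    peak = 0

    async def fake_llm_call() -> None:
        nonlocal in_flight, peak
        async with gate:  # workers wait here once the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated provider latency
            in_flight -= 1

    async def worker() -> None:
        # Each worker drains the queue until it is empty.
        while True:
            try:
                queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            await fake_llm_call()

    await asyncio.gather(*(worker() for _ in range(pool_size)))
    return peak


if __name__ == "__main__":
    peak = asyncio.run(run_ingest(20, POOL_SIZE, GLOBAL_CAP))
    print(f"pool={POOL_SIZE} cap={GLOBAL_CAP} peak in-flight={peak}")
```

In this model, eight workers never produce more than two simultaneous calls. Raising the pool changes nothing for the provider; only the cap does.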

Symptom: A full run takes longer than you’d like (not a single stuck item).

What’s actually happening: Most of the time is LLM curation, and that’s bounded by llm.global_concurrency. If your provider tolerates it, raising it speeds ingest up:

omem config set llm.global_concurrency 16 # if your provider allows it

But remember most re-runs are already cheap — the curation cache skips unchanged items entirely. If a re-run is slow, check omem ingest history: lots of recur or ingest means real work; lots of cache means it’s already fast and the wall-clock is just the unavoidable new items.
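As a rough illustration of why unchanged items cost nothing on a re-run, the sketch below models a cache keyed by content hash. How OMem actually keys its curation cache is not described here, so the SHA-256 key, the `simulate_run` function, and the in-memory dict are all assumptions; only the `cache` and `ingest` labels come from the history output mentioned above.

```python
import hashlib


def content_key(text: str) -> str:
    """Hash the item content; an unchanged item yields the same key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def simulate_run(items: dict[str, str], cache: dict[str, str]) -> dict[str, int]:
    """items maps item id -> content; cache maps item id -> last seen key.

    Returns how many items were served from cache versus freshly ingested.
    """
    counts = {"cache": 0, "ingest": 0}
    for item_id, text in items.items():
        key = content_key(text)
        if cache.get(item_id) == key:
            counts["cache"] += 1  # unchanged: skip the expensive step
            continue
        # The expensive LLM curation would happen here (omitted).
        cache[item_id] = key
        counts["ingest"] += 1
    return counts


if __name__ == "__main__":
    cache: dict[str, str] = {}
    print(simulate_run({"a": "x", "b": "y"}, cache))   # first run: all new
    print(simulate_run({"a": "x", "b": "y"}, cache))   # re-run: all cached
    print(simulate_run({"a": "x", "b": "y2"}, cache))  # one item changed
```

Under these assumptions, the wall-clock time of a re-run scales with the number of changed items, not with the size of the corpus.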

Symptom: omem query takes seconds, not milliseconds.

The default fts5 index answers in ~50 ms — if that’s slow, something else is wrong (check omem doctor). Slow queries almost always mean you’ve enabled qmd, whose modes trade speed for precision:

| Mode | Speed | Use when |
| --- | --- | --- |
| bm25 | fastest | you want near-keyword speed |
| no-rerank | fast (warm) | the everyday recommendation: semantic recall without the reranker latency |
| full | slowest | maximum precision, you can absorb the reranker |

omem config set plugins.qmd.query_mode no-rerank

The first qmd query after enabling is always slow (it downloads ~2.2 GB of models, once). That’s not a tuning problem — it’s a one-time cost. See retrieval.

Symptom: Memory climbs on image-heavy documents.

Knob: OCR runs in a subprocess that restarts every N images to bound memory. The default (20) is conservative; you rarely need to touch it, but if memory still climbs:

omem config set parser.ocr_subprocess_batch 10 # restart more often = lower peak

There’s also a per-document image cap (parser.max_images_per_doc, default 200) that skips the long tail rather than ballooning memory — anything skipped is logged, never silent.
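The two settings interact in a simple way, which the following sketch works through. It assumes the restart counter and the image cap both apply per document and that images past the cap are skipped; `ocr_plan` is a hypothetical helper, not part of OMem.

```python
import math


def ocr_plan(n_images: int, batch: int = 20, max_images: int = 200) -> dict[str, int]:
    """Estimate OCR work for one document.

    batch      ~ parser.ocr_subprocess_batch (images per subprocess lifetime)
    max_images ~ parser.max_images_per_doc (images beyond this are skipped)
    """
    if batch <= 0:
        raise ValueError("batch must be positive")
    processed = min(n_images, max_images)
    return {
        "processed": processed,
        "skipped": n_images - processed,
        # One subprocess lifetime per started batch of images.
        "subprocess_runs": math.ceil(processed / batch),
    }


if __name__ == "__main__":
    print(ocr_plan(350))            # defaults
    print(ocr_plan(350, batch=10))  # restart twice as often
```

Halving the batch size doubles the number of subprocess lifetimes for the same document, which is the trade: more restart overhead in exchange for a lower memory peak.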

Not worth tuning:

  • The worker pools (ingest.curate_concurrency / ingest.kind_concurrency): the global caps bound real cost anyway; raising the pools rarely helps and can worsen rate limits.
  • fts5: it’s already fast, and there’s nothing to tune. Quality comes from qmd, not from tuning fts5.
  • First-run cost: the first ingest and the first qmd query are slow by nature (seeing everything once, downloading models once). That’s not steady-state performance.