Performance tuning
Read this if OMem works but feels slow, or you hit a provider rate limit, or ingest uses more memory than you’d like. Each section is a symptom → the knob that addresses it. (For the full field reference, see the config schema; this page is the “what do I actually change” view.)
First: is it slow, or just working?
Before tuning, confirm where the time goes: omem ingest watch shows each phase as it completes. A run that looks stuck is usually one of: a cold qmd model loading, a big PDF in OCR, or a slow Loop fetch. All of these are expected, and none is worth “tuning.” Tune only a pattern of slowness, not a one-off.
Hitting provider rate limits
Symptom: 429 rate-limit errors during ingest; items failing and retrying.
Knob: Lower the global LLM and vision caps; these are the settings that actually throttle API calls:

    omem config set llm.global_concurrency 2          # default 4; lower further if still rate-limited
    omem config set parser.vlm_global_concurrency 2

These two are cross-process ceilings on simultaneous LLM / vision calls. The per-kind worker pools (ingest.curate_concurrency, ingest.kind_concurrency) raise parallelism, but the global caps bound actual provider calls. For rate limits, lower the globals, not the pools.
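Why a global cap wins over pool size can be seen in a small in-process sketch. This is not OMem's implementation: OMem's caps are described as cross-process, while this example uses a single-process asyncio.Semaphore as a stand-in, and the names run_ingest, peak_calls, and fake_llm_call are hypothetical.

```python
import asyncio


async def run_ingest(items: int, pool_size: int, global_cap: int) -> int:
    """Return the peak number of simultaneous simulated provider calls."""
    # Stand-in for a global cap such as llm.global_concurrency (assumption:
    # the real cap is cross-process; this one only covers a single process).
    gate = asyncio.Semaphore(global_cap)
    queue = asyncio.Queue()
    for i in range(items):
        queue.put_nowait(i)

    in_flight = 0
    peak = 0

    async def fake_llm_call() -> None:
        nonlocal in_flight, peak
        async with gate:  # every call must pass the global gate
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # pretend to wait on the provider
            in_flight -= 1

    async def worker() -> None:
        # Each pool worker drains the queue until it is empty.
        while True:
            try:
                queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            await fake_llm_call()

    await asyncio.gather(*(worker() for _ in range(pool_size)))
    return peak


def peak_calls(items: int, pool_size: int, global_cap: int) -> int:
    return asyncio.run(run_ingest(items, pool_size, global_cap))


if __name__ == "__main__":
    # Eight workers, but the gate admits only two calls at a time.
    print(peak_calls(items=20, pool_size=8, global_cap=2))
```

In this model, growing the pool from 8 to 80 workers leaves the peak at the cap, which is why the cap is the setting to change when the provider returns 429s.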
Ingest feels slow overall
Symptom: A full run takes longer than you’d like (not a single stuck item).
What’s actually happening: Most of the time is LLM curation, and that’s bounded by llm.global_concurrency. If your provider tolerates it, raising it speeds ingest up:
    omem config set llm.global_concurrency 16   # if your provider allows it

But remember that most re-runs are already cheap: the curation cache skips unchanged items entirely. If a re-run is slow, check omem ingest history: lots of recur or ingest means real work; lots of cache means it’s already fast and the wall-clock time is just the unavoidable new items.
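A rough back-of-the-envelope model shows why both the cache and the concurrency cap matter. It is only a sketch under stated assumptions: every non-cached item costs one call of the same duration, the provider never throttles, and the input is a plain list of status labels. That list is a hypothetical format, not the real output of omem ingest history, and summarize_run is an illustrative name.

```python
import math
from collections import Counter


def summarize_run(statuses, seconds_per_call, concurrency):
    """Estimate wall-clock time for a run from per-item status labels.

    statuses: list of labels such as "cache" or "ingest" (hypothetical input).
    seconds_per_call: assumed uniform duration of one curation call.
    concurrency: number of calls allowed in flight at once.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    counts = Counter(statuses)
    cached = counts.get("cache", 0)
    real_work = len(statuses) - cached  # anything not served from cache
    # Calls run in "waves" of at most `concurrency` items each.
    waves = math.ceil(real_work / concurrency)
    return {
        "cached": cached,
        "real_work": real_work,
        "estimated_seconds": waves * seconds_per_call,
    }


if __name__ == "__main__":
    run = ["cache"] * 90 + ["ingest"] * 10
    print(summarize_run(run, seconds_per_call=3.0, concurrency=4))
    print(summarize_run(run, seconds_per_call=3.0, concurrency=16))
```

With 90 of 100 items cached, the estimate drops from 9.0 to 3.0 seconds when the cap rises from 4 to 16; a fully cached run costs nothing in this model regardless of the cap.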
Queries feel slow
Symptom: omem query takes seconds, not milliseconds.
The default fts5 index answers in ~50 ms — if that’s slow, something else is wrong (check omem doctor). Slow queries almost always mean you’ve enabled qmd, whose modes trade speed for precision:
| Mode | Speed | Use when |
|---|---|---|
| bm25 | fastest | you want near-keyword speed |
| no-rerank | fast (warm) | the everyday recommendation: semantic recall without the reranker latency |
| full | slowest | maximum precision, you can absorb the reranker |
    omem config set plugins.qmd.query_mode no-rerank

The first qmd query after enabling is always slow (it downloads ~2.2 GB of models, once). That’s not a tuning problem; it’s a one-time cost. See retrieval.
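To tell a one-time cold start from steady-state latency, it can help to time the same command several times and compare the first run with the rest. The helper below is a generic sketch: time_command is a hypothetical name, and the demo times a trivial Python command as a stand-in, since the exact arguments of your omem query invocation are not assumed here.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run `argv` `runs` times.

    Returns (first_run_seconds, median_seconds_of_remaining_runs), so a slow
    first run can be separated from warm, steady-state latency.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to compare cold and warm")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,  # raise if the command fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations[0], statistics.median(durations[1:])


if __name__ == "__main__":
    # Stand-in command; replace the list with your own query invocation.
    cold, warm = time_command([sys.executable, "-c", "pass"])
    print(f"first run: {cold:.3f}s, warm median: {warm:.3f}s")
```

If only the first run is slow, that matches the one-time cost described above; if the warm median is also in seconds, the query mode is the more likely cause.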
Ingest uses too much memory
Symptom: Memory climbs on image-heavy documents.
Knob: OCR runs in a subprocess that restarts every N images to bound memory. The default (20) is conservative; you rarely need to touch it, but if memory still climbs:
    omem config set parser.ocr_subprocess_batch 10   # restart more often = lower peak

There’s also a per-document image cap (parser.max_images_per_doc, default 200) that skips the long tail rather than ballooning memory. Anything skipped is logged, never silent.
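The interaction between the batch size and the per-document cap can be sketched as simple arithmetic. This is an illustration, not OMem's code: plan_ocr_batches is a hypothetical name, the defaults of 20 and 200 are the ones quoted on this page, and the sketch assumes memory grows roughly with the number of images a single subprocess handles before it restarts.

```python
def plan_ocr_batches(image_count, batch_size=20, max_images=200):
    """Return (batches, skipped) for one document.

    batches: number of images handled per subprocess lifetime, in order.
    skipped: images beyond the per-document cap that are not processed.
    """
    if batch_size < 1 or max_images < 0 or image_count < 0:
        raise ValueError("sizes must be non-negative and batch_size >= 1")
    processed = min(image_count, max_images)  # apply the per-document cap
    skipped = image_count - processed
    batches = [
        min(batch_size, processed - start)
        for start in range(0, processed, batch_size)
    ]
    return batches, skipped


if __name__ == "__main__":
    print(plan_ocr_batches(45))                 # ([20, 20, 5], 0)
    print(plan_ocr_batches(45, batch_size=10))  # ([10, 10, 10, 10, 5], 0)
    print(plan_ocr_batches(250))                # ten batches of 20, 50 skipped
```

Halving the batch size doubles the number of restarts but halves the most images any one subprocess holds, which is the trade the knob offers.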
What’s probably not worth tuning
- The worker pools (curate_concurrency / kind_concurrency): the global caps bound real cost anyway; raising the pools rarely helps and can worsen rate limits.
- fts5: it’s already fast; there’s nothing to tune. Quality comes from qmd, not from tuning fts5.
- First-run cost: the first ingest and the first qmd query are slow by nature (seeing everything once, downloading models once). That’s not steady-state performance.
What’s next
- Configuration schema: every concurrency and timeout field.
- Observe ingest: find which phase is slow before you tune.