Pipeline Overview
- Query embedding — Generate 768D (full) and 256D (shortlist) embeddings via Gemini 2
- Matryoshka vector search — 256D shortlist pass narrows to 200 candidates, then 768D reranks for precision
- BM25 keyword search — SQLite FTS5 with Porter stemming for exact-match queries
- Immediate recall — 256D search over raw session messages for very recent context
- RRF fusion — Reciprocal Rank Fusion combines signals: 70% vector + 30% BM25
- Recency boost — Exponential decay (0.25 * exp(-age / 120s)) weights recent memories higher
- Temporal boosting — For time-based queries (“last week”), blends RRF (60%) with temporal relevance (40%)
- Deduplication — Content-level dedup and query echo filtering
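The BM25 keyword step listed above can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, assuming the interpreter's SQLite build includes FTS5; the table name `memories` and both helper functions are hypothetical and are not Cognis's actual schema or API.

```python
import sqlite3

def build_index(texts):
    """Create an in-memory FTS5 table that stems tokens with the Porter stemmer."""
    conn = sqlite3.connect(":memory:")
    # 'porter' wraps the default tokenizer, so "working" and "works" share a stem.
    conn.execute(
        "CREATE VIRTUAL TABLE memories USING fts5(content, tokenize='porter')"
    )
    conn.executemany(
        "INSERT INTO memories(content) VALUES (?)", [(t,) for t in texts]
    )
    return conn

def keyword_search(conn, query, limit=10):
    """Return matching rows, best BM25 match first.

    FTS5's bm25() yields lower values for better matches, so ascending
    order puts the strongest hit first. The query is passed as FTS5 query
    syntax, so punctuation in user input would need escaping in real use.
    """
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? "
        "ORDER BY bm25(memories) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Because of stemming, a query for "works" would match a stored memory containing "working", which is the kind of near-exact term match that complements the vector pass.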
Why Hybrid?
Vector search excels at semantic similarity (“Where does she work?” matches “Alice is employed at Google”) but misses exact terms. BM25 catches keyword matches that vectors may rank lower. RRF fusion gets the best of both without needing manual score calibration.
Matryoshka Embeddings
Cognis uses Gemini 2’s Matryoshka embeddings, which produce useful representations at multiple dimensions from a single embedding call:
- 256D — Fast candidate shortlisting. Used for the first-pass filter and immediate recall
- 768D — Full precision reranking. Used to re-score the shortlisted candidates
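The two-pass search can be sketched in plain Python. This is an illustration only: it assumes the 256D representation is the leading prefix of the 768D vector (the usual Matryoshka property), and the function names and the in-memory `doc_vecs` dictionary are hypothetical, not Cognis internals.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matryoshka_search(query_vec, doc_vecs, short_dim=256, shortlist_size=200, top_k=10):
    """Shortlist on a truncated prefix, then rerank the shortlist at full width.

    doc_vecs maps a document id to its full-width embedding.
    """
    q_short = query_vec[:short_dim]
    # Pass 1: cheap comparison on the first short_dim dimensions only.
    shortlist = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(q_short, doc_vecs[doc_id][:short_dim]),
        reverse=True,
    )[:shortlist_size]
    # Pass 2: full-width comparison, restricted to the shortlisted ids.
    reranked = sorted(
        shortlist,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return reranked[:top_k]
```

The trade-off is that a document missed by the coarse pass can never be recovered by the rerank, which is why the shortlist is kept fairly large relative to the final result count.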
RRF Fusion
Reciprocal Rank Fusion combines ranked lists without needing comparable scores. The default parameters are:
- vector_weight = 0.70
- bm25_weight = 0.30
- k = 10
These can be overridden through CognisConfig.
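As a rough illustration of the fusion step, the sketch below uses the common weighted RRF form, in which each list contributes weight / (k + rank) with 1-based ranks. Whether Cognis applies its weights in exactly this way is an assumption, and `rrf_fuse` is a hypothetical helper rather than part of the library.

```python
def rrf_fuse(vector_ranked, bm25_ranked, vector_weight=0.70, bm25_weight=0.30, k=10):
    """Fuse two ranked id lists into one list of (id, score), best first.

    Only rank positions are used, so the raw vector and BM25 scores never
    need to be on the same scale.
    """
    scores = {}
    for weight, ranked in ((vector_weight, vector_ranked), (bm25_weight, bm25_ranked)):
        for rank, doc_id in enumerate(ranked, start=1):
            # A document absent from a list simply gets no contribution from it.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with vector results `["a", "b", "c"]` and BM25 results `["c", "a"]`, document `c` overtakes `b` because it appears in both lists, even though `b` ranked higher in the vector list alone.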
Recency Boost
Recent memories get a score boost using exponential decay, 0.25 * exp(-age / 120s) as given in the overview. The boost is configured with recency_boost_weight and recency_half_life_seconds.
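The decay from the overview, 0.25 * exp(-age / 120s), can be written out directly. In this sketch the argument names `weight` and `time_constant_seconds` are hypothetical; how they correspond to recency_boost_weight and recency_half_life_seconds is an assumption, since the source does not spell out the mapping.

```python
import math

def recency_boost(age_seconds, weight=0.25, time_constant_seconds=120.0):
    """Boost for a memory that is age_seconds old.

    Equals `weight` for a brand-new memory and decays towards zero as the
    memory ages, following weight * exp(-age / time_constant).
    """
    return weight * math.exp(-age_seconds / time_constant_seconds)
```

Under this formula a memory that is 120 seconds old keeps about 37% of the maximum boost (a factor of 1/e), and after ten minutes the boost falls below 0.002, so it mainly affects content from the last few minutes.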
Temporal Query Detection
When the query contains time references (“last week”, “yesterday”, “in March”), Cognis detects this and adjusts scoring, blending the RRF score (60%) with temporal relevance (40%).
Search Scoping
- Extracted memories are searched globally across all sessions for the owner/agent
- Immediate recall (raw messages) is scoped to the current session only
- Both are merged, deduplicated, and scored together
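The merge and deduplication step might look something like the following sketch. The normalization rule (lowercasing and collapsing whitespace), the exact-match test for query echoes, and the `merge_results` helper are all assumptions made for illustration; the source does not describe how Cognis implements either filter.

```python
def merge_results(extracted, immediate, query):
    """Merge two lists of (content, score), dropping query echoes and duplicates.

    When the same content appears more than once, the highest-scoring copy wins.
    """
    def normalize(text):
        # Assumed content key: case-insensitive, whitespace-insensitive.
        return " ".join(text.lower().split())

    query_key = normalize(query)
    best = {}
    for content, score in list(extracted) + list(immediate):
        key = normalize(content)
        if key == query_key:
            continue  # query echo: the raw message is just the question itself
        if key not in best or score > best[key][1]:
            best[key] = (content, score)
    return sorted(best.values(), key=lambda item: item[1], reverse=True)
```

Echo filtering matters mostly for immediate recall, because the current session's raw messages include the user's own question, which would otherwise match itself almost perfectly.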
Tuning Tips
| Goal | Config Change |
|---|---|
| More keyword-sensitive | Increase bm25_weight, decrease vector_weight |
| Stricter results | Increase similarity_threshold (default: 0.3) |
| More candidates | Increase shortlist_size (default: 200) |
| Less recency bias | Decrease recency_boost_weight or increase recency_half_life_seconds |