12:05 am on Jun 25, 2026 | read the article | tags: medium
This is the next step in my series that started with Stop Searching by Coincidence. In my last post, I argued that hybrid search isn’t optional for e-commerce relevance. But making that theory boringly reliable in production? That’s another beast. This article is about that second step: choosing the engine that could carry it all.
My constraints were not generic, venture-backed «RAG startup» constraints. They were operationally specific and heavily informed by my day job, where I have experienced firsthand the «routine» operations involved in managing hundreds of millions of vectors grouped across thousands of collections, handling thousands of requests per minute with tight, double-digit millisecond latency targets. There, the SRE team leans heavily on Helm combined with FluxCD. For my project, I wanted a different path. First and foremost, I am paying for this infrastructure out of my own pocket, which breeds an immediate obsession with simplicity and resource control. I still love FluxCD for GitOps, but I deliberately choose to avoid Helm for the database tier here. I want maximum predictability and zero extra abstraction layers over my storage.
When you view the database landscape through that lens, the options narrow quickly. If all I wanted was raw capability, Vespa would have stayed in the final round longer. If all I wanted was a giant distributed vector platform, Milvus would have stayed there too. If all I wanted was managed convenience, Pinecone would be hard to ignore.
But I needed a serious vector database that delivers sparse vectors, native hybrid search primitives, predictable persistence, and a deployment model that feels close to «run the binary, mount the volume, wire the StatefulSet».
That is how I ended up choosing Qdrant. It provides first-class sparse vectors, server-side fusion (RRF/DBSF), robust snapshotting, and an elegant single-process model written in Rust that leaves less operational surface area than its multi-service competitors.
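RRF only looks at ranks. DBSF is the less familiar of the two fusion modes. Qdrant's documentation describes it as normalizing each sub-query's scores with mean ± 3 standard deviations as the limits, then summing the normalized scores per point. The sketch below approximates that idea on the client side and is not Qdrant's implementation. Three details are assumptions made for this sketch: using the population standard deviation, clamping outliers to [0, 1], and scoring a zero-variance list at 0.5.

```python
from collections import defaultdict
from statistics import mean, pstdev


def dbsf(result_lists: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Fuse several {point_id: raw_score} maps with distribution-based normalization."""
    fused: dict[str, float] = defaultdict(float)
    for scored in result_lists:
        if not scored:
            continue  # an empty sub-query contributes nothing
        values = list(scored.values())
        mu, sigma = mean(values), pstdev(values)  # population std (assumption)
        low, high = mu - 3 * sigma, mu + 3 * sigma
        for point_id, score in scored.items():
            if high == low:
                normalized = 0.5  # zero variance: place every score mid-range (assumption)
            else:
                # Map [low, high] onto [0, 1]; clamping outliers is an assumption
                normalized = min(max((score - low) / (high - low), 0.0), 1.0)
            fused[point_id] += normalized
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Raw dense similarities (roughly 0 to 1) and raw sparse scores (unbounded) live on very different scales. Per-list normalization is what lets them be added at all.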
But let’s be perfectly honest: Qdrant isn’t magic, and getting it to fit my architecture required tearing up my original playbook.
A lot of vector database comparisons quietly assume a single large corpus, one product team, and one clean retrieval stack. What I am building with my plugin is closer to a fleet problem: managing thousands of independent e-commerce sites.
I wasn’t starting from zero. The early versions of the architecture relied on a Redis + RediSearch setup. I liked it. It was fast, and it fit naturally with per-site allocation using a custom knapsack strategy across master/replica pairs. In fact, the current codebase still supports RediSearch. I deliberately kept it there because I’m considering offering a self-hosted «appliance» version down the road: a quick Docker Compose or Helm chart for people who want to index internal documents on their own iron.
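The allocation code itself isn't part of this article. As a hypothetical stand-in for that custom knapsack strategy, the sketch below uses first-fit decreasing bin packing. It sorts sites by estimated size, then places each one into the first cluster (or master/replica pair) that still has spare capacity. `Cluster`, the capacity unit, and `allocate_sites` are illustrative names, not SearchPixel's API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    capacity: int  # hypothetical unit, e.g. max vectors this cluster should hold
    sites: list[str] = field(default_factory=list)
    used: int = 0

    def fits(self, size: int) -> bool:
        return self.used + size <= self.capacity


def allocate_sites(site_sizes: dict[str, int], clusters: list[Cluster]) -> dict[str, str]:
    """First-fit decreasing: place the biggest sites first, each into the first
    cluster with enough spare capacity. Raises if a site fits nowhere."""
    placement: dict[str, str] = {}
    for site_id, size in sorted(site_sizes.items(), key=lambda item: item[1], reverse=True):
        target = next((c for c in clusters if c.fits(size)), None)
        if target is None:
            raise ValueError(f"no cluster has room for {site_id} ({size} vectors)")
        target.sites.append(site_id)
        target.used += size
        placement[site_id] = target.name
    return placement
```

First-fit decreasing is a greedy heuristic rather than an exact knapsack solver. It is fast, deterministic, and usually packs well enough when sites keep arriving one at a time.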
But for the multi-tenant SaaS scale I wanted, RediSearch hit three massive walls:
FT.HYBRID: its native hybrid capabilities felt rudimentary compared to dedicated engines. Experiment after experiment showed me that without deep, server-side hybrid reranking, the search results just weren’t matching user intent.
I also took a hard look at Vespa. I met one of the Vespa developers, and talking to them was actually what triggered my obsession with hybrid search. Vespa is a phenomenal engineering achievement with a brilliant phased-ranking model. But its self-managed architecture requires config servers and Apache ZooKeeper.
Between work deadlines and pulling overtime at my day job, my after-work capacity to manage infrastructure is a finite resource. I didn’t want to spend my nights managing ZooKeeper.
I needed a lean, boring, cloud-native operating model.
I evaluated the candidates against seven core criteria: performance headroom, true on-disk persistence, native sparse/dense hybrid support, Kubernetes simplicity, multi-tenant scalability, low operational cost, and structural alignment with how I already think about data placement.
| Candidate | Persistence Story | Hybrid / Sparse Support | Deployment Shape | Best Fit | My Read |
|---|---|---|---|---|---|
| Qdrant | Snapshots, mmap/on-disk HNSW. Shards ready immediately on target. | First-class sparse vectors; server-side RRF and DBSF. | Single binary/container. Built-in cluster mode. | Teams wanting serious vector search without platform bloat. | Chosen |
| Weaviate | Persistent storage and crash-tolerant writes. | Native BM25 + vector hybrid. | Kubernetes path is heavily Helm-first. | Teams wanting an all-in-one stack comfortable with Helm. | Good tech, wrong operational shape for me. |
| Milvus | Highly durable distributed layers, but broad dependency footprint. | Native BM25 and sparse vectors. | Requires an Operator/Helm; relies on etcd, Pulsar, and MinIO. | Massive enterprise teams with dedicated infra engineers. | Incredibly powerful, far too heavy for a lean project. |
| Vespa | Mature serving engine; self-managed requires ZooKeeper. | Peerless hybrid ranking flexibility. | Operator/Helm-based; substantial footprint. | Search-centric enterprises willing to operate heavy machinery. | Brilliant engineering, but a massive operational commitment. |
| Pinecone | Managed-first. “Pinecone Local” is just an emulator. | Excellent hybrid and sparse support. | SaaS-only; poor fit for self-hosted K8s constraints. | Teams optimizing for zero-ops over control. | Violates my requirement for infrastructure control. |

When I first started sketching out this project, I called it ThinkPixel. I liked the sound of it, but when I ran it by friends who build successful plugins, they gave me some blunt, necessary feedback: “Make it explicit. Put ‘Search’ in the name.” They were right. The project became SearchPixel, and Qdrant became its engine.
Qdrant won because it nailed the operational sweet spot. Its clustering model is built right into the core process. To spin up a cluster, you enable cluster mode and start the first peer with --uri set to its own peer-to-peer address. The other peers join by passing --bootstrap with the address of an existing peer. Cluster state is agreed on through Qdrant’s built-in Raft consensus, so conceptually it behaves like a single clustered storage process rather than a massive distributed ecosystem.
Furthermore, independent evaluations (like Reddit Engineering’s public write-up) suggest that while platforms like Milvus excel at decoupling ingestion from query loads at massive scale, Qdrant tends to win on raw, single-node query latency. For SearchPixel, a simpler architecture with blazing p99 latency was far more valuable than a sprawling ecosystem built for someone else’s scale.
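Other teams' benchmarks only go so far. If you want to check p99 on your own queries, a tiny harness like this one is enough to start. `run_query` is a hypothetical zero-argument callable, for example a lambda wrapping the `hybrid_search` function shown later in this post. The nearest-rank percentile used here is one of several definitions in common use.

```python
import math
import time
from typing import Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[min(max(rank, 1), len(ordered)) - 1]


def measure_latency(run_query: Callable[[], object], iterations: int = 1000,
                    warmup: int = 50) -> dict[str, float]:
    """Call run_query() repeatedly and report p50/p95/p99 latency in milliseconds."""
    for _ in range(warmup):
        run_query()  # warm connections and caches; these calls are not measured
    samples: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

Run it against a realistic mix of queries and tenants. A single hot query sitting in cache will flatter any engine.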
I do not want to oversell it: arriving at this setup came with a heavy dose of initial architectural frustration.
My original plan was simple: create one physical Qdrant collection per WordPress site. It mirrored my Redis mental model perfectly. Then I hit a wall in Qdrant’s production documentation:
Do not create thousands of tiny collections. It is an explicit anti-pattern that destroys performance and spikes metric cardinality.
I was incredibly frustrated. I almost walked away. But instead of abandoning the engine, I looked at how to adapt.
Instead of creating 100,000 separate collections, the correct pattern is to create a modest number of large, shared collections grouped across multiple independent Qdrant clusters. I use a payload index on site_id (marked with is_tenant: True) and force a filter on every single incoming query.
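The sketch below covers both halves of that pattern. It is written as plain REST request bodies so it doesn't depend on a particular client version.

* `tenant_index_request` builds the body for `PUT /collections/{collection_name}/index`. The Python client exposes the same flag through `models.KeywordIndexParams(type="keyword", is_tenant=True)`.
* `with_tenant_filter` is a hypothetical helper. It appends the mandatory `site_id` condition to whatever filter a caller passes in.

```python
import copy
from typing import Optional

TENANT_KEY = "site_id"


def tenant_index_request(field_name: str = TENANT_KEY) -> dict:
    """Body for PUT /collections/{collection_name}/index: a keyword index marked as the tenant key."""
    return {
        "field_name": field_name,
        "field_schema": {"type": "keyword", "is_tenant": True},
    }


def with_tenant_filter(query_filter: Optional[dict], site_id: str) -> dict:
    """Return a copy of a Qdrant filter with a mandatory site_id match added to `must`.
    Callers never have to remember the tenant condition themselves."""
    scoped = copy.deepcopy(query_filter) if query_filter else {}
    must = scoped.get("must") or []
    if isinstance(must, dict):
        must = [must]  # tolerate a single condition object instead of a list
    scoped["must"] = list(must) + [{"key": TENANT_KEY, "match": {"value": site_id}}]
    return scoped
```

Routing every query through one choke point like this makes "forgot the tenant filter" a code-review impossibility rather than a runtime surprise.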
By combining Qdrant’s payload filtering with my existing knapsack allocation logic at the application layer, I got the best of both worlds:
In my production stack, I handle hybrid search entirely on the server side using Qdrant’s Reciprocal Rank Fusion (RRF).
Below is a minimal, production-aligned Python example that runs a true hybrid search against Qdrant: it fuses a sparse (BM25-style) and dense (semantic) query using server-side Reciprocal Rank Fusion, while isolating results by site_id inside a shared multi-tenant collection:
from qdrant_client import QdrantClient, models


def hybrid_search(indexing_node: str, collection_name: str, dense_vector: list[float],
                  sparse_vector: dict[int, float], site_id: str, limit: int = 20) -> list[dict]:
    # Initialize the Qdrant client (connects to the node handling this site)
    client = QdrantClient(url=indexing_node)

    # Build the tenant filter once and apply it to every prefetch (multi-tenant isolation)
    tenant_filter = models.Filter(
        must=[models.FieldCondition(
            key="site_id",
            match=models.MatchValue(value=site_id),
        )]
    )

    # Perform a hybrid query with dense + sparse prefetch and server-side RRF fusion
    response = client.query_points(
        collection_name=collection_name,
        # Use `prefetch` to define multiple vector searches that will be fused
        prefetch=[
            models.Prefetch(
                query=dense_vector,
                using="dense",  # Named dense vector field
                filter=tenant_filter,  # Prefetch takes `filter`, not `query_filter`
                limit=limit,
            ),
            models.Prefetch(
                query=models.SparseVector(
                    indices=list(sparse_vector.keys()),  # Sparse token IDs (e.g. BM25 terms)
                    values=list(sparse_vector.values()),  # Corresponding weights
                ),
                using="sparse",  # Named sparse vector field
                filter=tenant_filter,
                limit=limit,
            ),
        ],
        # Merge dense + sparse candidates with Reciprocal Rank Fusion on the server
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        # Final result limit after fusion
        limit=limit,
        # Fetch only the payload fields we care about
        with_payload=["post_id", "text"],
    )

    # query_points returns a QueryResponse; the hits live in `.points`
    return [
        {
            "id": point.payload.get("post_id"),
            "text": point.payload.get("text"),
            "score": point.score,
        }
        for point in response.points
        if point.payload is not None
    ]
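`hybrid_search` expects a ready-made `sparse_vector`. One simple way to produce it uses BM25-style term-frequency saturation over hashed token IDs, with a plain regex tokenizer. `sparse_from_text`, the `k1`/`b`/`avg_doc_len` defaults, and the CRC32 hashing are illustrative choices, not SearchPixel's tokenizer. IDF is deliberately left out, because Qdrant can apply it server-side when the sparse vector is configured with the `idf` modifier.

```python
import re
import zlib
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def sparse_from_text(text: str, k1: float = 1.2, b: float = 0.75,
                     avg_doc_len: float = 100.0) -> dict[int, float]:
    """Hashed BM25-style term-frequency weights as {token_id: weight} (no IDF)."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return {}
    counts = Counter(tokens)
    doc_len = len(tokens)
    vector: dict[int, float] = {}
    for token, tf in counts.items():
        # crc32 is stable across processes (unlike hash()) and fits Qdrant's uint32 indices
        token_id = zlib.crc32(token.encode("utf-8"))
        # BM25 saturation: repeated terms help, with diminishing returns
        weight = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        vector[token_id] = vector.get(token_id, 0.0) + weight  # merge rare hash collisions
    return vector
```

Whatever encoder you pick, use the same one for indexed documents and incoming queries. Otherwise the token IDs simply won't line up.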
(Note: If you are doing rapid client-side experimentation or complex A/B testing with custom business weights, you can calculate fusion manually in your application code. But for predictable, low-latency production execution, offloading RRF entirely to the database layer is an absolute game-changer.)
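For that kind of experimentation, client-side fusion takes only a few lines. This sketch implements weighted RRF, score(p) = Σ w_i / (k + rank_i(p)). `weighted_rrf` and the per-retriever weights are hypothetical names used for illustration. The ranked ID lists would come from separate dense and sparse `query_points` calls.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: dict[str, list[str]], weights: dict[str, float],
                 k: int = 60, limit: int = 20) -> list[tuple[str, float]]:
    """Client-side Reciprocal Rank Fusion with per-retriever weights (ranks start at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in ranked_lists.items():
        weight = weights.get(name, 1.0)  # unlisted retrievers default to weight 1
        for rank, point_id in enumerate(ranked_ids, start=1):
            scores[point_id] += weight / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:limit]
```

Bumping the sparse weight is a quick way to test whether exact-term matches (SKUs, brand names) deserve more say for a given store.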
To keep operations lean, I bypass Helm charts and operators entirely. Because Qdrant doesn’t require an external cluster state manager, you can coordinate a resilient, self-bootstrapping 3-node cluster natively inside a vanilla Kubernetes StatefulSet.
Here is the exact configuration shape I use to ensure peers bootstrap automatically using the stateful pod network identity:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
spec:
  replicas: 3  # qdrant-0 starts the cluster; qdrant-1 and qdrant-2 bootstrap from it
  serviceName: qdrant  # Headless Service that gives each pod a stable DNS name (qdrant-N.qdrant)
  selector:
    matchLabels:
      app.kubernetes.io/name: qdrant
  template:
    metadata:
      labels:
        app.kubernetes.io/name: qdrant
    spec:
      initContainers:
        # Ensure volumes have correct ownership before the main container starts
        - name: ensure-dir-ownership
          image: docker.io/qdrant/qdrant:v1.13.4
          command: ["chown", "-R", "1000:2000", "/qdrant/storage", "/qdrant/snapshots", "/qdrant/snapshot-restoration"]
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage
            - name: qdrant-snapshots
              mountPath: /qdrant/snapshots
            - name: qdrant-snapshot-restoration
              mountPath: /qdrant/snapshot-restoration
      containers:
        - name: qdrant
          image: docker.io/qdrant/qdrant:v1.13.4
          command: ["/bin/bash", "-c"]
          args: ["./config/initialize.sh"]  # Custom script that starts or joins the cluster
          env:
            - name: QDRANT_INIT_FILE_PATH
              value: /qdrant/init/.qdrant-initialized  # Marker file signalling that initialization completed
            - name: QDRANT__CLUSTER__ENABLED
              value: "true"  # Enables cluster mode (could equally live in production.yaml)
          ports:
            - name: http
              containerPort: 6333
            - name: grpc
              containerPort: 6334
            - name: p2p
              containerPort: 6335  # Internal peer-to-peer port referenced by initialize.sh
          readinessProbe:
            httpGet:
              path: /readyz
              port: 6333  # Qdrant’s built-in readiness endpoint
          securityContext:
            runAsUser: 1000
            runAsGroup: 2000
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true  # Enforces a tight security profile
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage  # Persistent vector index data
            - name: qdrant-snapshots
              mountPath: /qdrant/snapshots  # Snapshot output directory
            - name: qdrant-snapshot-restoration
              mountPath: /qdrant/snapshot-restoration  # Where snapshots get restored from
            - name: qdrant-config
              mountPath: /qdrant/config/initialize.sh
              subPath: initialize.sh  # Your bootstrap script
            - name: qdrant-config
              mountPath: /qdrant/config/production.yaml
              subPath: production.yaml  # Optional override config
            - name: qdrant-init
              mountPath: /qdrant/init  # Writable dir for the init marker file
      volumes:
        - name: qdrant-config
          configMap:
            name: qdrant-config  # Provides both the init script and config file
            defaultMode: 0755  # initialize.sh must be executable to run as ./config/initialize.sh
        - name: qdrant-init
          emptyDir: {}  # Writable scratch space, since the root filesystem is read-only
  volumeClaimTemplates:
    - metadata:
        name: qdrant-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi
    - metadata:
        name: qdrant-snapshots
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi
    - metadata:
        name: qdrant-snapshot-restoration
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi
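Cluster mode in Qdrant doesn’t require an external consensus service: peers agree on cluster state through Qdrant’s built-in Raft consensus. Instead of an operator, I use a tiny initialize.sh script. It relies on StatefulSet DNS, which requires the headless Service named qdrant, and on a consistent peer URI strategy. Pod 0 becomes the initial peer, and all others bootstrap from it using --bootstrap.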
#!/bin/sh
# Extract the pod index from the StatefulSet hostname
# e.g. qdrant-0 → 0, qdrant-2 → 2
SET_INDEX=${HOSTNAME##*-}
# For the first pod (index 0), start the cluster and
# become the initial peer
# https://github.com/qdrant/qdrant/blob/master/tools/entrypoint.sh
if [ "$SET_INDEX" = "0" ]; then
  exec ./entrypoint.sh --uri 'http://qdrant-0.qdrant:6335'
else
  # For other pods, join the cluster by bootstrapping from pod 0
  exec ./entrypoint.sh \
    --bootstrap 'http://qdrant-0.qdrant:6335' \
    --uri "http://qdrant-$SET_INDEX.qdrant:6335"
fi
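Once the pods are up, it's worth confirming that the peers actually formed one cluster. Qdrant exposes cluster state at `GET /cluster` on the HTTP port, and the sketch below fetches it and flags obvious problems. The field names (`status`, `peers`, `raft_info.leader`, `raft_info.pending_operations`) are assumptions about the response shape, so verify them against your Qdrant version. `fetch_cluster_status` and `check_cluster` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_cluster_status(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/cluster from any peer's HTTP port (6333)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/cluster", timeout=timeout) as resp:
        return json.load(resp)


def check_cluster(status: dict, expected_peers: int) -> list[str]:
    """Return a list of problems; an empty list means the cluster looks healthy."""
    result = status.get("result", {})
    if result.get("status") != "enabled":
        return ["cluster mode is not enabled on this peer"]
    problems = []
    peers = result.get("peers", {})
    if len(peers) != expected_peers:
        problems.append(f"expected {expected_peers} peers, found {len(peers)}")
    raft = result.get("raft_info", {})
    if raft.get("leader") is None:
        problems.append("no Raft leader elected yet")
    pending = raft.get("pending_operations") or 0
    if pending > 0:
        problems.append(f"{pending} consensus operations still pending")
    return problems
```

Assuming port access via `kubectl port-forward pod/qdrant-0 6333`, a call like `check_cluster(fetch_cluster_status("http://localhost:6333"), expected_peers=3)` should come back empty once all three peers have joined.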
If you are currently evaluating vector search engines for a multi-tenant or multi-site workload, save yourself the abstract benchmark review sessions and follow a boring, systematic migration path:
Put site_id, category, and brand into payload indexes before you push data, to avoid paying an unindexed disk-access tax later.
Use indexing_threshold or m: 0 to bypass HNSW graph construction overhead during initial hydration. Re-enable indexing only once your initial data is in place.
In the next post in this series, I’ll give you a bird’s-eye view of the overall infrastructure architecture. I’ll break down the main architectural components, map out what lives where, and look at how I’m running this cluster on bare-metal virtual servers inside Hetzner. Spoiler alert: I chose it because I appreciate highly performant, remarkably affordable, European-based cloud providers when I’m bootstrapping on my own dime.
If you want to track how to take hybrid search out of the research lab and make it stay completely boring in production, hit the follow button and stay tuned. For now, Qdrant does exactly what I ask of it: stay fast, stay boring, and stay out of the way.