ublo
bogdan's (micro)blog

bogdan

bogdan » Moving from Stateless LLMs to Situated Intelligence

10:45 pm on Jun 10, 2026

We’ve spent the last few years treating LLMs as if their main advantage is that they know almost everything. They can explain quantum mechanics, debug a convoluted CSS grid layout, and rewrite Romanian manele (you have been warned!) lyrics in the voice of Constantin Noica. But there is a fundamental mismatch between how these models work and how we actually make decisions.

The basic LLM interaction is still mostly stateless. Even when products add chat history or file uploads, the model itself does not automatically maintain an inspectable, evolving model of your projects, stakeholders, failed attempts, beliefs, and outcomes. You end up re-explaining the same constraints, re-contextualizing the same stakeholders, and re-hashing the same history. It’s like trying to lead a project while suffering from short-term memory loss.

I built SecondContext to bridge that gap. It is a prototype for an LLM assistant that behaves more like a situated expert: a system that accumulates experience alongside you.

Rather than treating every interaction as a blank slate, SecondContext operates as a persistent cognitive layer. It stores structured memories about people, projects, beliefs, and outcomes. If I ask it to help me draft an infrastructure proposal review for Alex, it doesn’t just output generic corporate filler. It has context that Alex is competent but perpetually busy, that he responds better to a narrow, API-focused scope, and that my previous attempts worked only when I presented a specific technical constraint. The assistant doesn’t just draft the message; it suggests the strategy, warns me about the risks, and generates follow-up scenarios based on how these people have responded to me in the past.

The common engineering answer to this problem is RAG: Retrieval-Augmented Generation. RAG is useful, but most systems are optimized for retrieving facts from static documents. SecondContext uses retrieval too, but what it retrieves is different. It pulls back not only documents but accumulated work context: people, outcomes, preferences, failed strategies, uncertainty, and changing beliefs.

There is an obvious risk here: a memory system about people can become creepy or overconfident very quickly. That is why I think the important design principle is not just persistence, but inspectable persistence. SecondContext stores evidence, confidence, timestamps, and uncertainty; it distinguishes observations from interpretations; and it makes memories editable and deletable. A situated assistant should not secretly profile people. It should expose the assumptions it is using.
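A minimal sketch of what an inspectable memory record could look like. Every name here (`Memory`, `Kind`, `MemoryStore`, the field names, the sample evidence strings) is an illustrative assumption, not SecondContext's actual schema; the point is only that evidence, confidence, timestamps, and the observation/interpretation split can be plain, visible fields rather than hidden state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Kind(Enum):
    OBSERVATION = "observation"        # something directly seen or said
    INTERPRETATION = "interpretation"  # an inference drawn from observations


@dataclass
class Memory:
    # Hypothetical record shape: field names are assumptions for illustration.
    subject: str
    statement: str
    kind: Kind
    confidence: float  # 0.0 (guess) to 1.0 (certain)
    evidence: list[str] = field(default_factory=list)
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def explain(self) -> str:
        """Render the assumption and the evidence behind it in one line."""
        sources = "; ".join(self.evidence) or "no evidence recorded"
        return (
            f"[{self.kind.value}, confidence {self.confidence:.2f}] "
            f"{self.subject}: {self.statement} (evidence: {sources})"
        )


class MemoryStore:
    """In-memory stand-in for a real database; supports edit and delete."""

    def __init__(self) -> None:
        self._items: dict[int, Memory] = {}
        self._next_id = 1

    def add(self, memory: Memory) -> int:
        memory_id = self._next_id
        self._items[memory_id] = memory
        self._next_id += 1
        return memory_id

    def revise(self, memory_id: int, statement: str, confidence: float) -> None:
        item = self._items[memory_id]
        item.statement = statement
        item.confidence = confidence

    def delete(self, memory_id: int) -> None:
        del self._items[memory_id]

    def about(self, subject: str) -> list[Memory]:
        return [m for m in self._items.values() if m.subject == subject]


store = MemoryStore()
seen = store.add(Memory(
    "Alex", "replied within a day to the API-scoped proposal",
    Kind.OBSERVATION, 0.95, ["example email thread (made up for this sketch)"],
))
guess = store.add(Memory(
    "Alex", "responds better to a narrow, API-focused scope",
    Kind.INTERPRETATION, 0.60, ["inferred from two earlier exchanges"],
))
for memory in store.about("Alex"):
    print(memory.explain())
```

Keeping `kind` and `confidence` as explicit fields means a wrong interpretation can be revised or deleted without touching the observations it was inferred from.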

This architecture also aligns with the academic work around CoALA: Cognitive Architectures for Language Agents. I didn’t set out to build a formal cognitive architecture. I just wanted an assistant that remembered that Alex hates vague emails. But looking at the literature, the direction feels clear: useful agentic behavior requires a modular way to perceive, store, retrieve, act, and update. SecondContext is a practical, narrow-scoped implementation of these principles. It is a move toward building agents that aren’t just smarter, but more situated: able to function as persistent teammates rather than search engines trapped in chat boxes.

I’ve intentionally kept the stack boring: Go, Postgres, and Qdrant. No proprietary, un-debuggable decision layer. The goal is to keep the system inspectable and transparent. If the assistant gives a bad recommendation, I want to see exactly why it retrieved that specific memory, how it scored that strategy, and what evidence led to its current belief.

The current version is already a working MVP, with a baseline that supports memory ingest and search, hybrid retrieval, salience reranking, person/topic summaries, belief tracking, scenario generation, outcome feedback, and a debug view for comparing stateless versus memory-augmented responses.
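To make "salience reranking" concrete, here is one possible shape for it, under loud assumptions: the formula (similarity times confidence times an exponential recency decay), the 30-day half-life, and all names are invented for illustration and are not taken from the SecondContext code. The sketch returns the score breakdown alongside the total, which is the kind of thing a debug view could display.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields; a real system would carry more metadata.
    text: str
    similarity: float  # retrieval score in [0, 1]
    confidence: float  # stored confidence in [0, 1]
    age_days: float    # how long ago the memory was recorded


def salience(candidate: Candidate, half_life_days: float = 30.0) -> dict[str, float]:
    """Return the total score together with the parts that produced it."""
    # Recency halves every `half_life_days`; the half-life is an assumed knob.
    recency = 0.5 ** (candidate.age_days / half_life_days)
    return {
        "similarity": candidate.similarity,
        "confidence": candidate.confidence,
        "recency": recency,
        "score": candidate.similarity * candidate.confidence * recency,
    }


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least salient."""
    return sorted(candidates, key=lambda c: salience(c)["score"], reverse=True)


candidates = [
    Candidate("old note about vague emails", 0.9, 0.8, age_days=180),
    Candidate("recent note about API scope", 0.8, 0.8, age_days=3),
]
for candidate in rerank(candidates):
    print(candidate.text, salience(candidate))
```

A multiplicative score is only one option; a weighted sum would behave differently when one factor is near zero.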

This is still an experiment. It is narrow, early, and intentionally boring in its implementation. But it is testing a simple hypothesis: for recurring work, intelligence without memory is mostly a party trick. Intelligence with inspectable memory, feedback, and uncertainty can become a real tool.

You can find the architecture, demo, and code here: https://github.com/bdobrica/SecondContext



bogdan » Stop Searching by Coincidence: The Case for Hybrid WordPress Search

05:10 pm on May 17, 2026

I like WordPress. I’ve been using it long enough to know where it shines and where it very clearly doesn’t.

Search is one of those areas everyone quietly accepts as “good enough”, until the moment it actually matters. And when it does, you start noticing that WordPress search is not really search in the way users expect it to be. It’s closer to a polite filter. A LIKE query with a UI.

This article is the first in a series where I’ll document how I ended up building SearchPixel, a WordPress plugin backed by a separate search infrastructure that tries to move from string matching to meaning matching.

Before getting into embeddings, hybrid ranking, or architecture, I want to start with the uncomfortable part: why this problem exists at all.

Because if we don’t agree there’s a real problem here, everything else just looks like unnecessary complexity.

What WordPress search actually does

At its core, WordPress search is fairly simple. It takes the query string, splits it into words (loosely), generates an SQL query, then runs a set of LIKE '%term%' conditions over post title, content and excerpt to return whatever matches.
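The steps above can be approximated in a few lines. This is a sketch of the query shape only, not WordPress's actual code: the function name is made up, and real WordPress also handles quoted phrases, stopwords, escaping of `%` and `_`, and ordering.

```python
def build_like_search(query: str) -> tuple[str, list[str]]:
    """Build a parameterized LIKE query: every term must match some column."""
    columns = ("post_title", "post_excerpt", "post_content")
    clauses: list[str] = []
    params: list[str] = []
    for term in query.split():
        # Each term may appear in any one of the searched columns...
        any_column = " OR ".join(f"{column} LIKE %s" for column in columns)
        clauses.append(f"({any_column})")
        params.extend([f"%{term}%"] * len(columns))
    # ...but all terms must be found somewhere in the post.
    where = " AND ".join(clauses) if clauses else "1=1"
    sql = (
        "SELECT ID FROM wp_posts "
        f"WHERE post_status = 'publish' AND {where}"
    )
    return sql, params


sql, params = build_like_search("site slow")
print(sql)
print(params)
```

Nothing in that query knows what a word means. It only knows whether a run of characters occurs inside another run of characters.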

LIKE answers this question:

does this exact sequence of characters appear somewhere in this text?

Users, however, are usually asking something closer to:

which page on this site talks about the thing I’m thinking of?

Those two questions overlap sometimes. Often by accident.

Humans search by meaning. LIKE searches by coincidence.

Users rarely type what you wrote. They type half-remembered ideas, synonyms, typos, and vague descriptions. They type problems, not solutions.

Say you have a post titled:

“How to speed up WordPress with caching and CDN”

Users will search for something like: “site is slow”, “pages load slowly on mobile”, “optimize wordpress performance”, “cloudflare setup”, “cache plugin” or anything else vaguely related. Keyword search might do fine on “optimize wordpress performance”. It might get lucky with “cache plugin”. It will almost certainly miss “site is slow”. Not because the content isn’t relevant, but because relevance here is inferred from string overlap, not from meaning. And overlap is a fragile proxy.
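A small simulation makes the pattern visible. The post body below is invented for the example (only the title comes from the text above), and `like_match` is a stand-in that mimics "every term appears as a substring", ignoring case as MySQL's default collations do.

```python
def like_match(query: str, text: str) -> bool:
    """True when every query term occurs as a substring of the text."""
    haystack = text.lower()
    return all(term in haystack for term in query.lower().split())


# Title from the article; the body sentence is a made-up placeholder.
POST = (
    "How to speed up WordPress with caching and CDN. "
    "Install a cache plugin and put a CDN such as Cloudflare in front of "
    "your site to optimize WordPress performance."
)

QUERIES = [
    "site is slow",
    "pages load slowly on mobile",
    "optimize wordpress performance",
    "cloudflare setup",
    "cache plugin",
]

results = {query: like_match(query, POST) for query in QUERIES}
for query, hit in results.items():
    print(f"{query!r}: {'match' if hit else 'miss'}")
```

With this sample text, the only queries that match are the ones that happen to reuse the author's wording. Note that "cache" would not even match the title on its own, because "caching" does not contain the string "cache".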

Most improvements follow the same path. Start with better tokenization, weight titles higher, include tags and categories, then add fuzzy matching, stemming, and synonym lists. At some point, most people end up using an external search engine like Elasticsearch.

All of these help. A lot, actually. But they still rely on the same assumption:

relevance can be inferred from shared tokens

That assumption breaks in very predictable ways, mostly because human beings don’t coordinate their vocabulary with your content.

They will search for “cost” when your button says “price,” or “delivery” when your text says “shipping.” You can patch this by maintaining custom synonym dictionaries. It works, right up until it doesn’t, and you realize you’ve just built a brand-new maintenance problem.

Then, add human error to the mix. Combine fast typing with meme-generating mobile autocorrect, and your logs fill up with “aple,” “wordpres,” and “coudflare.” Keyword search doesn’t know what to do with a typo, so it just returns a blank page.

But the biggest breaking point is intent. If a user searches for “how to migrate” and your top article is titled “Moving between hosting providers”, a LIKE query treats them as entirely unrelated. They share no tokens.

By treating string overlap as a proxy for relevance, you aren’t actually matching intent; you’re just hoping for a linguistic coincidence. This gets worse quickly if you borrow idioms and expressions from other languages in your writing, turning “English” content into something only mostly English.

[Image: Even robots get confused by keyword search. Transitioning from exact character matches to conceptual relationships.]

What hybrid search changes

To fix this, we have to change the underlying math of how search works. We start with semantic search to change the representation. Instead of comparing words, it compares embeddings: vector representations of text that (roughly) encode meaning. Queries and documents that talk about similar things end up closer together in this space. So “site is slow” can retrieve content about caching and CDNs, even if those exact words never appear. It’s not magic. It’s just a different coordinate system.
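Here is a toy version of that different coordinate system. The three-dimensional vectors are invented by hand purely for illustration; real embeddings come from a model and have hundreds of dimensions. Only the comparison itself, cosine similarity, is the standard technique.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made vectors (assumptions, not model output). Imagine the axes as
# roughly "performance", "infrastructure", and "cooking".
query_vec = [0.9, 0.1, 0.0]    # "site is slow"
caching_vec = [0.8, 0.3, 0.1]  # post about caching and CDNs
recipe_vec = [0.0, 0.1, 0.9]   # unrelated post about recipes

print("caching post:", round(cosine_similarity(query_vec, caching_vec), 3))
print("recipe post: ", round(cosine_similarity(query_vec, recipe_vec), 3))
```

The query shares no words with the caching post, yet it lands much closer to it than to the recipe post, because closeness is measured between vectors rather than between strings.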

Keyword search asks: do these words overlap?
Semantic search asks: are these ideas related?

Both questions matter.

But semantic search has its own failure modes. It can be too fuzzy, too tolerant, and, most of all, surprisingly wrong in very confident ways. Exact matches still matter when you’re looking for error codes, version numbers, product names, quotes, or any other specific identifier. Semantic search can rank “kind of related” above “exactly what I asked for”. Which is frustrating.

So this isn’t a “keyword vs semantic” story. It’s a both story. Hence the hybrid.
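One common way to combine the two signals is reciprocal rank fusion (RRF), which merges ranked lists without needing their scores to be on the same scale. This is a generic sketch of that technique, not a description of how SearchPixel ranks results; the document IDs are made up and `k = 60` is simply the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query, best match first.
keyword_ranking = ["error-502-guide", "caching-cdn"]
semantic_ranking = ["caching-cdn", "hosting-migration", "error-502-guide"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

A document that both retrievers rank well rises to the top, while one that only a single retriever found still appears, just lower down.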

Why WordPress makes this harder than it sounds

There’s also an architectural reality check. WordPress is PHP, request–response, optimized for publishing and rendering pages. It’s not designed to run transformer models, compute embeddings, maintain vector indexes, perform semantic search, or keep latency predictable under load. Sure, you can force it, but I wouldn’t recommend it.

The shape that makes sense, in practice, looks like this. You get a WordPress plugin for integration, UI, and content selection, paired with an external service for embeddings, indexing, and retrieval, with a clean API boundary between them.
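As a rough illustration of what a clean API boundary can mean here, the sketch below models the messages that might cross it. Everything in it is hypothetical: the field names, the JSON shape, and the cap of 10 are assumptions for the example, not SearchPixel's real API. It is written in Python for brevity, even though the plugin side would be PHP.

```python
import json
from dataclasses import dataclass

MAX_RESULTS = 10  # hypothetical cap on results per query


@dataclass
class SearchRequest:
    """What the plugin side might send to the external service."""
    site_id: str
    query: str
    limit: int = MAX_RESULTS

    def to_json(self) -> str:
        # Clamp the limit so the caller cannot ask for an unbounded page.
        capped = min(max(self.limit, 1), MAX_RESULTS)
        return json.dumps(
            {"site_id": self.site_id, "query": self.query, "limit": capped}
        )


@dataclass
class SearchHit:
    """What comes back: only IDs and scores; WordPress renders the posts."""
    post_id: int
    score: float


def parse_response(body: str) -> list[SearchHit]:
    data = json.loads(body)
    hits = data["hits"][:MAX_RESULTS]
    return [SearchHit(int(hit["post_id"]), float(hit["score"])) for hit in hits]


request_body = SearchRequest("example-site", "site is slow", limit=500).to_json()
print(request_body)

sample_response = '{"hits": [{"post_id": 42, "score": 0.91}]}'
print(parse_response(sample_response))
```

Returning only post IDs and scores keeps rendering, permissions, and templates on the WordPress side, and leaves the service free to change its internals.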

That’s the direction SearchPixel took.

What SearchPixel is (today)

SearchPixel is a WordPress plugin plus a backend service. It indexes selected WordPress content (you choose what goes in), then uses a hybrid approach under the hood to retrieve a capped number of top results, which keeps things fast. All while trying to stay boring in production.

Right now it’s free while I iterate. If operating costs ever become significant, there will probably be a small fee attached because, as far as I know, GPUs don’t run on enthusiasm alone.

What’s next

In the next article, I’ll move from “this is broken” to “this is how I designed around it”:

  • architecture choices
  • what runs where
  • indexing trade-offs
  • what I limited on purpose
  • where latency actually comes from

For now, the short version is this:

WordPress search checks whether your content contains the words.
Semantic search checks whether your content contains the meaning.

Users usually come for meaning. So that’s where I started.

This is part one of an ongoing series building SearchPixel. If you want to catch the next post on architecture choices and indexing trade-offs, hit the Follow button so you don’t miss it.
