10:30 am on Dec 1, 2025 | tags: buggy
recently, i realized i have a special talent: whenever i rely on someone else’s something, the universe conspires to remind me why i usually build* things myself. so, yes, i’ve started writing my own LLM Gateway.
*here build = start and never finish (mean people say)
why? because i wanted to work on a personal project: an AI companion powered mostly by gemini nano banana (still the cutest model name ever), while also playing with some image-to-video stuff to generate animations between keyframes. nothing complicated, just the usual «relaxing weekend» kind of project that ends up consuming two months and part of your soul.
how it started
somewhere around february this year i added a tiny PoC gateway in one of our kubernetes clusters at work. just to see what’s possible, what breaks, what costs look like. i picked berriai’s litellm because:
or so i thought…
the PoC got traction fast, people started using it, and now i’m actually running two production LiteLLM instances. so this wasn’t just a toy experiment. it grew into a fairly important internal service.
and then the problems started.
the «incident»
prisma’s python client (yes, the Python one) thought it was a brilliant idea to install the latest stable Node.js at runtime.
i was happily watching anime on my flight to Tallinn for one of our team’s meetings when node 25 dropped. karpenter shuffled some pods. prisma wasn’t ready. our deployment exploded in the most beautiful, kubernetes-log-filling way, sending chills down my colleagues’ spines. sure, they patched it quickly, and yes, i eventually found a more permanent solution.
but while digging around, i realized the prisma python client (used under the hood by litellm) isn’t exactly actively maintained anymore, which made my personal «production red flag detector» start screaming. LiteLLM’s creators ignoring the issue definitely didn’t help.
latency, my beloved
red flag number two: overhead. we’re running LiteLLM on k8s with hpa, rds postgres, valkey, replication, HA. the whole cloud-enterprise-lego-set. and despite all that, the gateway added seconds of latency on top of upstream calls. with p95 occasionally touching 20 seconds.
i tweaked malloc. i tweaked omp. i tweaked environment variables i’m pretty sure i shouldn’t have touched without adult supervision. nothing changed.
cost tracking? it’s… there. existing in a philosophical sense. about as reliable as calorie counts on protein bars.
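to be concrete about what i mean by «reliable»: cost tracking should boil down to taking the token usage the provider reports and multiplying it by a pricing table you own, and refusing to guess when a model isn’t in it. a minimal sketch of that idea (not ThinkPixelLLMGW’s actual code; the model name and prices below are placeholders):

```go
// cost.go: a hedged sketch of per-request cost computation.
// model name and prices are placeholders; real prices change,
// so in practice this table would live in the database.
package main

import "fmt"

// Pricing holds per-million-token prices in USD for one model.
type Pricing struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

var prices = map[string]Pricing{
	"example-model": {InputPerMTok: 0.15, OutputPerMTok: 0.60}, // placeholder numbers
}

// Cost returns the USD cost of a single request, or false for unknown models.
func Cost(model string, inputTokens, outputTokens int) (float64, bool) {
	p, ok := prices[model]
	if !ok {
		return 0, false // unknown model: surface it instead of silently billing zero
	}
	return float64(inputTokens)/1e6*p.InputPerMTok +
		float64(outputTokens)/1e6*p.OutputPerMTok, true
}

func main() {
	c, _ := Cost("example-model", 1200, 350)
	fmt.Printf("$%.6f\n", c)
}
```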
i tried maximhq’s bifrost: it only proxies requests in its open-source version. same for traceloop’s hub. so, nothing that ticked all the boxes.
and, as usual, the moment annoyance crosses a certain threshold (in this case, one involving generating anime waifus), i start hacking.
the bigger picture: ThinkPixel
for about a year, i’ve been trying to ship ThinkPixel: a semantic search engine you can embed seamlessly into WooCommerce shops. it uses custom embedding models, qdrant as the vector store and BM42 hybrid search. and a good dose of stubbornness on my part.
it works, but not «public release» level yet. i’ll get there eventually.
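to make the «BM42 hybrid search» bit a little less hand-wavy: you run a dense semantic ranking and a sparse BM42-style ranking over the same catalog, then fuse the two result lists. the snippet below is only an illustration of that fusion idea, not ThinkPixel’s actual code (qdrant can do this server-side); it uses plain reciprocal rank fusion:

```go
// a rough illustration of the «hybrid» part: merge a dense (semantic)
// ranking with a sparse (BM42-style) ranking via reciprocal rank fusion.
package main

import (
	"fmt"
	"sort"
)

// rrf merges ranked lists of document ids; k dampens the influence
// of lower-ranked hits (60 is the common default).
func rrf(k float64, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for rank, id := range ranking {
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	dense := []string{"sku-12", "sku-7", "sku-3"}   // from embedding similarity
	sparse := []string{"sku-7", "sku-12", "sku-99"} // from BM42-style sparse match
	fmt.Println(rrf(60, dense, sparse))             // items ranked well by both rise to the top
}
```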
in my mind, ThinkPixel is the larger project: search, retrieval, intelligence that plugs into boring real-world small business ecommerce setups. for that, somewhere in the future i’ll need a reliable LLM layer. so ThinkPixelLLMGW naturally became a core component of that future. (until then, i just need it to animate anime elves, but that’s the side-story)
so:
introducing: ThinkPixelLLMGW
https://github.com/bdobrica/ThinkPixelLLMGW (a piece of the bigger ThinkPixel puzzle)
what i wanted here was something:
so i wrote it in Go (not a rust hater, just allergic to hype), backed it with postgres + redis/valkey, and started adding the features i actually need:
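to give a feel for the overall shape i’m going for, here is a boiled-down sketch. all names are illustrative, not the actual types in the repo: the gateway talks to one small provider interface, and a router picks the upstream per model, with routing rules living in postgres and cached in redis/valkey.

```go
// illustrative shape of the gateway core; not the repo’s real API.
package gateway

import (
	"context"
	"fmt"
)

type Message struct {
	Role    string
	Content string
}

type ChatRequest struct {
	Model    string
	Messages []Message
	Stream   bool
}

// ChatChunk is one streamed delta; token counts arrive on the final chunk.
type ChatChunk struct {
	Delta        string
	InputTokens  int
	OutputTokens int
}

// Provider is what every upstream (openai, gemini, ...) has to implement.
// streaming is just a channel of chunks, so the HTTP layer can flush
// them to the client as they arrive.
type Provider interface {
	Chat(ctx context.Context, req ChatRequest) (<-chan ChatChunk, error)
}

// Router maps model names to providers; in the real thing the mapping
// would come from postgres and be cached in redis/valkey.
type Router struct {
	providers map[string]Provider
}

func (r *Router) Chat(ctx context.Context, req ChatRequest) (<-chan ChatChunk, error) {
	p, ok := r.providers[req.Model]
	if !ok {
		return nil, fmt.Errorf("no provider configured for model %q", req.Model)
	}
	return p.Chat(ctx, req)
}
```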
current status
the project is actually in a pretty good place. according to myself, the MVP is complete: admin features are implemented, the openai provider works with streaming, the async billing and usage queue system is done, and the whole thing is surprisingly solid. i even wrote tests. dozens of them. i know, i’m shocked too (kudos to copilot for help).
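«async billing and usage queue» sounds fancier than it is: the request path just drops a usage event onto an in-memory queue and returns, and a background worker batches events and persists them. a hedged sketch of that pattern (illustrative only, not the repo’s actual code):

```go
// async usage queue sketch: the handler never waits on the database.
package main

import (
	"fmt"
	"time"
)

type UsageEvent struct {
	APIKey       string
	Model        string
	InputTokens  int
	OutputTokens int
	At           time.Time
}

func main() {
	events := make(chan UsageEvent, 1024) // request path only pushes here

	// background worker: flush every 2s or every 100 events, whichever comes first.
	go func() {
		batch := make([]UsageEvent, 0, 100)
		ticker := time.NewTicker(2 * time.Second)
		flush := func() {
			if len(batch) == 0 {
				return
			}
			// in the real thing this would be a single batched INSERT into postgres
			fmt.Printf("flushing %d usage rows\n", len(batch))
			batch = batch[:0]
		}
		for {
			select {
			case ev := <-events:
				batch = append(batch, ev)
				if len(batch) >= 100 {
					flush()
				}
			case <-ticker.C:
				flush()
			}
		}
	}()

	// what the HTTP handler would do once a completion finishes:
	events <- UsageEvent{APIKey: "demo", Model: "example-model", InputTokens: 1200, OutputTokens: 350, At: time.Now()}
	time.Sleep(3 * time.Second) // give the worker time to flush before exiting
}
```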
the full TODO / progress list is here. kept updated with AI. so bear with me. it’s long. like, romanian-bureaucracy long.
why am i posting this?
because i enjoy building things that solve my own frustrations. because gateways are boring… until they break. because vendor-neutral LLM infrastructure will matter more and more, especially with pricing randomness, model churn, and the growing zoo of providers.
and because maybe someone else has been annoyed by the same problems and wants something open-source, fast, predictable, and designed by someone who doesn’t think «production-ready» means «works in docker, on my mac».
ThinkPixelLLMGW is just one component in a larger thing i’ve been slowly carving out. if/when the original ThinkPixel semantic search finally ships, this gateway will already be there, quietly doing the unglamorous work of routing, tracking and keeping costs under control.
until then, i’ll keep adding features, and i’ll keep the repo public. feel free to star it, fork it, bash it, open issues, or just lurk.
sometimes the best things you build are the ones you started out of mild irritation.
disclaimer
as with all open-source projects, it works flawlessly on my cluster. your machine, cloud, cluster, or philosophical worldview may vary.
