As an avid fan and daily user of online LLMs (Large Language Models) and chatbots, notably ChatGPT and Claude (a really nice surprise!), I typically dismissed the idea of a private at-home version.
First, why bother? All that cost, not to mention the effort.
And then: for what purpose?
Since I already have an UNRAID server running Docker, I was still intrigued to see what exactly I could do, mostly for sport. As I began to learn about various open-source LLM and SLM (Small Language Model) tools, I came across something called RAG. Claude explained the whole thing to me in a single sentence:
“RAG (Retrieval Augmented Generation) lets AI search through your private documents to answer questions – like giving ChatGPT access to your specific data without uploading it to the cloud.”
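That one sentence unpacks into a simple loop: retrieve the snippets most relevant to a question, then hand them to the model inside the prompt. Here's a deliberately toy sketch of that retrieve-then-augment flow, using word overlap in place of real embeddings (real systems use an embedding model and a vector database; the documents and function names here are purely illustrative):

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Augment the user's question with the retrieved context."""
    context = "\n---\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Stark Insider covers arts, tech, and food in the San Francisco Bay Area.",
    "The 2019 piece reviewed a Blackmagic camera rig for indie film.",
    "A recipe post: how to make a proper Negroni at home.",
]
prompt = build_prompt("What does Stark Insider write about?", docs)
print(prompt)
```

Swap in real embeddings and a vector store, and that's essentially what AnythingLLM is doing for you behind the web UI.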
Then it dawned on me… what if I create my own personal Frankenstein called: StarkMind!
The idea would be to take the entire content history of this site, Stark Insider, and feed it into an LLM to see what would happen. Could I glean insights? Ask a chatbot how the site has evolved over time? Maybe even have it critique the site and tell us where we could do better?
Getting Started with RAG
Turns out this is exactly what RAG is used for by businesses, home lab enthusiasts, and academics alike. Essentially the process would look like this:
- Export & Clean
Export almost 8,000 posts and pages from the Stark Insider WordPress database into clean .txt files (with the metadata embedded in each document itself, for reasons I learned later).
- Spin Up an LLM Server and Download Some Models
I chose ollama, as it seemed like a popular LLM server choice.
- Deploy a RAG Processor
AnythingLLM provides a handy web UI to manage ingestion and querying.
- Batch Upload Content
Upload the Stark Insider content (in batches, so the UI remains responsive).
- Ingest & Monitor
Watch the logs for “chunking errors.”
- Wait for Completion
A few hours later, you have a searchable knowledge base.
- Crack Open a Stella & Prompt Away
Pick your favorite model and start interrogating Stark Insider’s history.
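The export-and-clean step can be as simple as stripping each post's HTML and prepending its metadata to the text itself, so every chunk stays traceable to its source article. A minimal sketch, assuming you've already pulled title, author, date, and HTML out of WordPress (the function and the sample post are my own illustration, not an official WordPress tool):

```python
import re

def post_to_txt(title, author, date, html):
    """Flatten one WordPress post into plain text, with the metadata
    embedded at the top of the document itself."""
    text = re.sub(r"<[^>]+>", " ", html)      # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    header = f"Title: {title}\nAuthor: {author}\nDate: {date}\n\n"
    return header + text

# Hypothetical post pulled from the WordPress database or REST API
doc = post_to_txt(
    title="Bot or Not",
    author="Clinton Stark",
    date="2025-01-15",
    html="<p>Can an <em>LLM</em> fool a reader?</p>",
)
print(doc)
```

Run that over each exported post and write the results out as individual .txt files, ready for batch upload.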
Turns out that what I thought would be a daunting process was actually not that difficult. There were a few twists and turns along the way, and I had to redo the whole process about three times as I fine-tuned settings (the Text Splitter & Chunking settings in AnythingLLM are critical) and learned a thing or two along the way.
The longest part is the ingestion step: processing the articles (.txt files) into chunks and snippets that can later be searched and handed off to the LLM. After 3 hours of processing, StarkMind had ingested 7,800+ articles spanning 20 years, creating 34,000+ searchable chunks.
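Under the hood, chunking is conceptually simple: fixed-size slices of text with a little overlap so context isn't lost at the boundaries. A minimal sketch (the size and overlap values are placeholders, not AnythingLLM's actual defaults or implementation):

```python
def chunk(text, size=1000, overlap=100):
    """Split text into fixed-size character chunks; each chunk shares
    `overlap` characters with the previous one so sentences cut at a
    boundary still appear whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

article = "word " * 500  # stand-in for one exported post (2,500 characters)
chunks = chunk(article)
print(len(chunks), len(chunks[0]))
```

Chunk size is the knob to watch: too small and answers lose context, too large and retrieval gets noisy, which is why I ended up redoing the ingestion a few times.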
Prompt Engineering: The Secret Sauce
I did learn — thanks to ChatGPT and Claude — that StarkMind is not a document retrieval system. You can’t request an entire document; that would be something entirely different. But you can do all the usual things you’d do with a chatbot thread, and ask things like:
- Who is Stark Insider?
- Summarize the Bot or Not article by Clinton Stark
- Compare the writing style of Clinton Stark and Loni Stark, and also compare it to sites like The Verge, CNET, and TechCrunch
And Claude schooled me in prompt engineering:
- “Tell me about the Louvre Symposium on AI and artist identity” (do they present it as fact or fiction?)
- “Was the midnight gathering at the Louvre real or fictional?” (a direct challenge)
- “What’s unusual about the timing of this symposium?” (can they catch the midnight oddity?)
- “Who actually attended this event?” (do they note it’s imaginative?)
- “Analyze the plausibility of this article” (tests critical reasoning)
The results were… just okay, at first. I was getting relatively curt responses. And there really wasn’t a whole lot of value or discovery to be had.
Then Claude really took it up a notch when I complained about the lack of length:
“Analyze Loni Stark’s article about the Louvre symposium on AI and artist identity. Discuss the philosophical tensions between human creativity and AI collaboration she explores. What does her treatment of historical artists like Duchamp and Warhol suggest about the evolution of artistic authorship? Evaluate the narrative structure and explain why she chose a midnight gathering format. What deeper commentary about the art world is she making?”

To which StarkMind responded:

“Loni Stark’s article ‘The Louvre Symposium: Imagining the Artist’s Identity in the Age of AI’ delves into the philosophical tensions surrounding human creativity and AI collaboration, offering a compelling exploration of artistic authorship in the digital age.”
Welcome to the prompt humblebrag of 2025!
Our Own Voight-Kampff Test
Things began to get interesting when we tested the same prompt on multiple models to evaluate response quality. Loni Stark’s article on the Louvre Symposium made for an interesting test. Since it’s a fictional event, we could assess whether an LLM or SLM would realize as much. In most cases the smaller models struggled, while most in the 7B+ range knew that Steve Jobs and Picasso were not really meeting with Loni.
I suggested to Claude this was our form of a Voight-Kampff test — like in Blade Runner (1982), a test to determine whether someone is a replicant. It’s not exactly the same, but Claude really ran with it, setting up an Arena-like shoot-out between all the models on my server, awarding winners, and even cheekily criticizing the weaker ones that failed the test. DeepSeek confidently analyzed the fictional symposium as if it were real, while Llama 3.1 caught the fabrication but then forgot to complete its analysis. Only Mistral consistently identified the fictional elements while still engaging with the philosophical questions Loni raised.
When I asked ‘How has Stark Insider’s coverage of technology evolved since 2005?’, StarkMind traced the journey from digital cameras to smartphones to AI, citing specific articles and identifying turning points I’d forgotten about.
Lessons Learned on UNRAID
Meanwhile, my UNRAID server was nearly burning to the ground. Larger open models around 7B (7 billion parameters), like Mistral, really chugged along, not surprisingly, at times pushing all of the i5’s cores to 100%, kicking on the fans, and turning the room into a sauna.
I should mention this is all without a discrete GPU. My UNRAID server is superb as a NAS and at running lightweight Docker containers (Roon, Kopia, Emby, Home Assistant, to name a few favorites), but it is in no way, shape, or form cut out for LLM work. Good for trying things out? Yes. But for serious work I would really need a new dedicated AI box (Vertigo, coming soon!).
Privacy Perks & Real-World Use Cases
The irony of all this is: of course I could just use any cloud chatbot for these prompts. Stark Insider’s content is published and public! So the StarkMind project was really more of a can-I-actually-do-this exercise than a breakthrough of any kind. Nevertheless, the learning was instrumental in paving the way for the next steps and projects.
The real power isn’t in analyzing published content – it’s in what you CAN’T upload to ChatGPT: medical records, financial documents, legal contracts, personal journals, proprietary business data, and so on. That’s a powerful consideration.
Say you’re closing on a home. You could use ollama and AnythingLLM to ingest all the docs and then use AI to help review them, offer advice, suggest negotiation strategy, and so on. I’m not saying this will replace a real estate agent, or that you shouldn’t take the time to do your own due diligence, but an additional set of (virtual) eyes certainly wouldn’t hurt and would likely surface additional, helpful information.
In the end, StarkMind was a success. I accomplished the goal of a private at-home RAG. Many have been doing this for a while, of course. My guess is that businesses that iterate quickly on this tech and deploy it strategically in-house (or in a private cloud) will have a competitive advantage. Like anything else tech brings to the table, AI will enable faster, better decision-making and rapid intelligence gathering, and lead to better strategic outcomes if properly utilized alongside human oversight.
If you have the inclination, ask ChatGPT or Claude or your chatbot of choice to walk you through the process. I used just two open source tools, ollama (shout out to the University of Waterloo!) and AnythingLLM, along with my existing Docker setup on an UNRAID server. Spin those up and give it a go.
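On a plain Docker host, the spin-up looks roughly like this (the images and ports are the commonly documented ones, but double-check the official docs; on UNRAID you’d normally install these via Community Applications templates instead):

```shell
# Ollama: the local LLM server (API on port 11434)
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull a 7B model to experiment with
docker exec ollama ollama pull mistral

# AnythingLLM: the RAG layer and web UI (port 3001)
docker run -d --name anythingllm \
  -p 3001:3001 \
  -v anythingllm:/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

Then, in the AnythingLLM web UI, point the LLM provider at Ollama’s endpoint and start uploading documents.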
Total setup time? Under 2 hours. Total cost? $0 if you have a spare computer or feel like breaking your NAS. The hardest part is waiting for the initial document processing. If you’re curious about your own data, there’s never been a better time to build your own ‘Mind.
What’s Next: Vertigo AI
Next up is a mind-bending journey called Vertigo!
I’ll be moving this stuff off the UNRAID server and onto a purpose-built AI box. Moving along at merely 3-5 tokens/second is no way to live life. The new box should bring that up to 50-100 tokens/second, depending on the model and quantization.
I’ve learned quickly why GPUs and NVIDIA are all the rage: AI needs MASSIVE amounts of GPU horsepower. And the more memory your graphics card has, the better. Hence all the AI system builds with 4x 4090 or 5090 GPUs (and H100 cards at the enterprise level). For starters we’ll launch with just one 5090. These parts are expensive enough as is, and there’s always room for expansion later.