Running LLMs Locally With Ollama

Why Run AI Locally?

Whenever I use cloud services like ChatGPT and Claude, one concern keeps nagging at me: I’m sending my code to external servers.

Personal projects, fine. But working with client code or company-confidential material? That’s uncomfortable. “We don’t use your data for training,” they say. But still.

So I decided to try running LLMs locally. Ollama seemed like the simplest option for installation and model management.

Setup Is Dead Simple

# macOS
brew install ollama

# Start the server (runs in the foreground; use a second terminal for the next command)
ollama serve

# Download and run a model
ollama run llama3.1

That’s it. Three lines and you have a local LLM running. No Docker configuration, no GPU driver hassle.

The model library is solid too. Llama 3.1, CodeLlama, Mistral, and Gemma are all open-source models you can download with a single ollama pull.
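Once the server is up, the same model can be called from code as well as from the terminal. The sketch below is a minimal example, assuming the server listens on Ollama’s default address (http://localhost:11434) and that llama3.1 has already been pulled. build_request and generate are illustrative helper names, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local address and its /api/generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.1"):
    """Build a POST request asking for one complete (non-streamed) reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    # With "stream": False the reply is a single JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

With the server running, something like generate("Explain what a Python generator is in two sentences.") should return the model’s answer as a string. Nothing leaves the machine, which is the whole point of the exercise.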

Real-World Usage

Code assistance: I tried CodeLlama. Simple function generation and code explanation work fine, but complex refactoring and architecture-level suggestions fall well short of GPT-4 or Claude.

Document summarization: I used Llama 3.1. English documents came out decent, but output in other languages is noticeably weaker than what cloud models produce.

Speed: I ran it on an M4 Mac Mini. 7B models run at an acceptable speed, but 70B-class models generate tokens too slowly to be practical. Memory is the bottleneck: on Apple silicon the GPU shares unified memory with the rest of the system, and a 70B model’s weights alone take tens of gigabytes even when quantized.
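A rough back-of-the-envelope calculation shows why model size matters so much. The sketch below estimates memory for the weights only, assuming every parameter is stored at the given bit width. It ignores the KV cache and runtime overhead, so real usage will be higher. The function name and the 4-bit figure (a common quantization level) are assumptions for illustration.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes.

    One billion parameters at 8 bits each is one gigabyte, so the
    estimate is simply params (in billions) * bits / 8.
    """
    return params_billions * bits_per_weight / 8


for label, params, bits in [
    ("7B at 4-bit", 7, 4),
    ("70B at 4-bit", 70, 4),
    ("70B at 16-bit", 70, 16),
]:
    print(f"{label}: about {weight_memory_gb(params, bits):.1f} GB")

# Prints:
# 7B at 4-bit: about 3.5 GB
# 70B at 4-bit: about 35.0 GB
# 70B at 16-bit: about 140.0 GB
```

A 7B model fits comfortably on most recent machines, while a 70B model needs tens of gigabytes before it has generated a single token.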

My Conclusion

After trying various configurations, here’s where I landed:

Local LLMs work well for:

Security-sensitive environments (proprietary code, confidential documents)
Offline usage requirements
Simple repetitive tasks (format conversion, basic classification)
Reducing API costs

Cloud LLMs win for:

Complex reasoning tasks
Non-English language processing
Access to current knowledge
Long context handling
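The two lists above can be read as a simple routing rule. The sketch below is one hypothetical way to encode it. The Task fields and choose_backend are names invented for illustration, and the ordering (hard constraints first, then cloud strengths, then cost) is an assumption about priority rather than something measured.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a job to hand to an LLM."""
    description: str
    confidential: bool = False            # proprietary code, confidential documents
    offline: bool = False                 # no network access available
    simple_repetitive: bool = False       # format conversion, basic classification
    complex_reasoning: bool = False
    non_english: bool = False
    needs_current_knowledge: bool = False
    long_context: bool = False


def choose_backend(task):
    """Return "local" or "cloud" for a task, following the lists above."""
    # Hard constraints: the data must not, or cannot, leave the machine.
    if task.confidential or task.offline:
        return "local"
    # Areas where cloud models are clearly stronger.
    if (task.complex_reasoning or task.non_english
            or task.needs_current_knowledge or task.long_context):
        return "cloud"
    # Cheap, repetitive work can stay local to save on API costs.
    if task.simple_repetitive:
        return "local"
    # Everything else goes to the cloud model by default.
    return "cloud"
```

For example, choose_backend(Task("refactor client repo", confidential=True, complex_reasoning=True)) returns "local": confidentiality overrides the quality gap, which matches how I treat client code.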

The conclusion is the predictable “it depends on the use case,” but actually using both clarified exactly where that boundary sits. Local LLMs don’t replace cloud LLMs — they complement them.

For daily work, Claude and GPT remain the primary tools, with Ollama reserved for security-sensitive tasks. Open-source models are improving rapidly, so this could change — but that’s where things stand today.