Running LLMs Locally With Ollama - It Depends on the Use Case
Running LLMs Locally With Ollama
Why Run AI Locally?
Using cloud services like ChatGPT and Claude, there’s always one nagging concern: sending my code to external servers.
Personal projects, fine. But working with client code or company-confidential material? That’s uncomfortable. “We don’t use your data for training,” they say. But still.
So I decided to try running LLMs locally. Ollama seemed like the simplest option for installation and model management.
Setup Is Dead Simple
# macOS
brew install ollama
# Start the service
ollama serve
# Download and run a model
ollama run llama3.1
That’s it. Three lines and you have a local LLM running. No Docker configuration, no GPU driver hassle.
The model library is solid too. Llama 3.1, CodeLlama, Mistral, Gemma — open-source models downloadable with a single ollama pull.
Real-World Usage
Code assistance — Tried CodeLlama. Simple function generation and code explanation work fine. But complex refactoring or architecture-level suggestions fall well short of GPT-4 or Claude.
Document summarization — Used Llama 3.1 for summarization. English documents came out decent. Non-English languages are noticeably weaker compared to cloud models.
Speed — Ran it on an M4 Mac Mini. 7B models have acceptable speed. But 70B-class models generate tokens too slowly to be practical. GPU memory is the bottleneck.
My Conclusion
After trying various configurations, here’s where I landed:
Local LLM works well for:
- Security-sensitive environments (proprietary code, confidential documents)
- Offline usage requirements
- Simple repetitive tasks (format conversion, basic classification)
- Reducing API costs
Cloud LLM wins for:
- Complex reasoning tasks
- Non-English language processing
- Access to current knowledge
- Long context handling
The conclusion is the predictable “it depends on the use case,” but actually using both clarified exactly where that boundary sits. Local LLMs don’t replace cloud LLMs — they complement them.
For daily work, Claude and GPT remain the primary tools, with Ollama reserved for security-sensitive tasks. Open-source models are improving rapidly, so this could change — but that’s where things stand today.