Local Coding LLM - Initial Setup
06.03.2026 09:32
Exploring options for running an AI coding assistant at home on hardware I already have. Initial research turned up a couple of interesting options to try out:
- llama.cpp - llama-server can run GGUF-format models, supports downloading directly from Hugging Face, and provides a basic chat interface built into the server. It can run inference directly on the CPU, but performs better with GPU access.
- LM Studio - can run models in various formats and comes with a nice GUI for browsing, downloading, and running models. Uses an available GPU by default (no custom compilation needed, unlike llama.cpp).
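Beyond the built-in chat page, llama-server also exposes an OpenAI-compatible HTTP API, which is what makes it usable from editors and scripts. A minimal sketch using only the standard library; the URL assumes llama-server's default port 8080 on localhost, and the prompt is just an illustration:

```python
import json
from urllib import request

# Assumption: llama-server is running locally on its default port (8080)
# and serving the OpenAI-compatible /v1/chat/completions endpoint.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """POST a prompt to the local server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(
        SERVER_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(ask("Write a Python one-liner that reverses a string."))
```

Since the API is OpenAI-compatible, most existing OpenAI client libraries and editor plugins should work against it by just pointing the base URL at the local server.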
Attempt 1
Executor: llama.cpp
Model: TeichAI/Qwen3-14B-Claude-4.5-Opus-High-Reasoning-Distill-GGUF:Q4_K_M
Hardware: Intel i7-4790K, 16GB DDR3-1600
Command:
meder@liquidwhite:bin$ ./llama-server -hf TeichAI/Qwen3-14B-Claude-4.5-Opus-High-Reasoning-Distill-GGUF:Q4_K_M --host 0.0.0.0
Result:
- Tokens: 10,492
- Time: 1h 41min 42s
- Throughput: 1.72 t/s average (started at ~3-4 t/s)
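A quick sanity check of the reported average from the raw numbers above (10,492 tokens over 1h 41min 42s):

```python
# Verify the average throughput for Attempt 1.
tokens = 10_492
seconds = 1 * 3600 + 41 * 60 + 42  # 6102 s total
tps = tokens / seconds
print(f"{tps:.2f} t/s")  # 1.72 t/s, matching the reported average
```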
Attempt 2
Executor: llama.cpp
Model: Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q4_K_M
Hardware: Intel i7-4790K, 16GB DDR3-1600
Command:
meder@liquidwhite:bin$ ./llama-server -hf Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q4_K_M --host 0.0.0.0
Result:
- Tokens: 446
- Time: 1min 4s
- Throughput: 6.94 t/s
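For scale, the smaller 7B coder model ran roughly four times faster than the 14B distill on the same CPU. A quick check from the two reported averages:

```python
# Compare the two attempts using the averages reported above.
attempt1_tps = 1.72  # 14B Q4_K_M distill, CPU-only
attempt2_tps = 6.94  # 7B Q4_K_M coder model, same hardware
speedup = attempt2_tps / attempt1_tps
print(f"{speedup:.1f}x faster")  # roughly 4x
```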