Quote from: Same on May 13, 2026, 10:07:22 (The number of CPU threads doesn't matter for running AI (aka inferencing) (4 threads pretty much tops-out a dual-channel PC))
Correction: more than 4 threads do help with MTP:
Using Qwen3.6-27B-UD-Q4_K_XL.gguf (download: huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF):
./llama.cpp/build/bin/llama-server --models-preset 'LLMs.ini' --threads 8 --models-max 1 --no-models-autoload -np 1 --no-warmup --chat-template-kwargs '{"preserve_thinking": true}' --no-mmproj --offline --no-mmap --spec-type draft-mtp --spec-draft-n-max 2
- 4 threads: ~8.7 tokens per second
- 6 threads: ~10.8 tokens per second
- 8 threads: ~12 tokens per second
- beyond 8 threads: tokens per second doesn't increase any further
Source: reddit/"PSA: Test your "threads" argument in llama.cpp (+80% performance in my case)"
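
To put those numbers in perspective, here is a quick sketch that turns the figures above into a speedup relative to the 4-thread run. The values are just the approximate ones from this post, and the names (measured, speedup_vs_baseline) are made up for illustration, so swap in your own measurements:

```python
# Approximate throughput from the post above (tokens per second),
# keyed by the value passed to --threads.
measured = {4: 8.7, 6: 10.8, 8: 12.0}


def speedup_vs_baseline(results, baseline_threads=4):
    """Return each thread count's throughput as a ratio of the baseline run."""
    base = results[baseline_threads]
    return {threads: tps / base for threads, tps in sorted(results.items())}


if __name__ == "__main__":
    for threads, ratio in speedup_vs_baseline(measured).items():
        gain_percent = (ratio - 1) * 100
        print(f"{threads} threads: {ratio:.2f}x ({gain_percent:+.0f}%)")
```

With the numbers from this post, going from 4 to 8 threads works out to roughly 38% more tokens per second, so it is worth testing on your own machine rather than assuming 4 is the ceiling.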