LLM Connect Issues

Ollama 500 Error

A 500 error from Ollama usually means the model is too large for your available memory.

# Check which models are installed
ollama list

# Check if a model is loaded and using GPU
ollama ps

# Test the model directly
ollama run qwen3.5:4b

If ollama ps shows 0% GPU, inference runs entirely on CPU and will be very slow.
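The GPU check can also be scripted. The sketch below assumes that `ollama ps` prints one header row followed by one row per loaded model, and that the processor column contains the word CPU whenever any part of the model runs on the CPU. Both are assumptions about the output format, and `check_gpu` is a hypothetical helper, not an Ollama command.

```shell
# Hypothetical helper: reads `ollama ps` output on stdin and lists every
# loaded model whose row mentions CPU (assumed to mean CPU inference).
# Returns 1 if any such model is found, 0 otherwise.
check_gpu() {
  awk '
    NR > 1 && /CPU/ { print $1 ": at least partly on CPU"; found = 1 }
    END { exit (found ? 1 : 0) }
  '
}

# Usage (requires a running Ollama instance):
#   ollama ps | check_gpu
```

A non-zero exit status means at least one loaded model is not running fully on the GPU.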

Fix: Pull a model small enough to fit in your available memory, for example:

ollama pull qwen3.5:4b

Then select the new model in Murmure's LLM Connect settings.

Murmure may not auto-detect Ollama in some configurations.

Fix: Manually set the Ollama URL in LLM Connect settings:

Some models wrap their output in quotes ("...") or include thinking tags (<think>...</think>).

Fix:

Use recommended models: Qwen 3.5, Ministral
Add to your system prompt: "Output only the result. No quotes, no thinking, no explanation."
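To see whether a particular model needs the prompt addition, you can run it once from the terminal and inspect the reply. This is a minimal sketch: `check_output` is a hypothetical helper, and the sample prompt in the usage comment is only an illustration.

```shell
# Hypothetical helper: reads a model reply on stdin and reports whether it
# contains a <think> tag or is wrapped in double quotes.
# Returns 1 if either problem is found, 0 if the reply looks clean.
check_output() {
  out=$(cat)
  case "$out" in
    *"<think>"*) echo "contains thinking tags"; return 1 ;;
    \"*\")       echo "wrapped in quotes"; return 1 ;;
  esac
  echo "clean"
}

# Usage (requires a running Ollama instance):
#   ollama run qwen3.5:4b "Fix the grammar: i has a apple" | check_output
```

If the helper reports a problem, the system prompt line above is the first thing to try.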

Common causes of slow responses:

No GPU: LLM inference on CPU is slow. Consider a smaller model or a machine with a GPU.
Model too large: If the model doesn't fit in VRAM, Ollama runs part or all of it on the CPU. Use ollama ps to check GPU usage.
First request: The first request after launch is slower because the model has to be loaded. Subsequent requests are faster.

For remote Ollama or OpenAI-compatible servers:

Verify the server is reachable: curl http://<server>:<port>/api/tags (this is the Ollama endpoint; OpenAI-compatible servers typically expose /v1/models instead)
Check firewall settings on both machines
Ensure the URL in Murmure includes the protocol (http:// or https://)
For Ollama, make sure OLLAMA_HOST=0.0.0.0 is set on the server to allow remote connections
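The protocol and reachability checks above can be combined into one small script. This is a sketch for Ollama servers only, since it queries `/api/tags`; `check_server` is a hypothetical helper and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: validates the URL scheme, then probes the Ollama
# /api/tags endpoint. Returns 2 for a missing protocol, 1 if unreachable.
check_server() {
  url="${1%/}"   # drop one trailing slash, if present
  case "$url" in
    http://*|https://*) ;;
    *) echo "URL must start with http:// or https://: $url" >&2; return 2 ;;
  esac
  if curl --silent --fail --max-time 5 "$url/api/tags" >/dev/null; then
    echo "reachable: $url"
  else
    echo "not reachable: $url" >&2
    return 1
  fi
}

# Usage (replace the placeholders with your server's address):
#   check_server "http://<server>:<port>"
```

If the server is reachable from its own machine but not from the machine running Murmure, the firewall or the `OLLAMA_HOST` setting is the likely cause.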

Proxy support

HTTP proxy for LLM Connect is not yet supported. If you need proxy support in an enterprise environment, please comment on #286.