I'm working on a Mac mini M4 Pro with 24 GB of memory. I want to configure sub-agents, each with its own LLM. They cannot run simultaneously because of the memory limit.
I asked an online LLM (Qwen 3.5) for advice, and it suggested setting up llama-server as a router, so that models are loaded and unloaded on demand as they are called. This works well in a browser, but so far I have been unable to set it up correctly in Pi.
I assume I should update models.json. If not that, then perhaps the Pi configuration itself.
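In case it helps, here is roughly what I tried in models.json. I'm guessing at the schema here (the provider name, field names, and model ids are my own placeholders, not verified against Pi's docs): the idea is that both sub-agent models point at the same local llama-server endpoint, and the server decides which model to load based on the `model` field of each request:

```json
{
  "providers": {
    "llama-local": {
      "baseUrl": "http://127.0.0.1:8080/v1",
      "api": "openai-compatible",
      "models": [
        { "id": "model-a-for-subagent-1" },
        { "id": "model-b-for-subagent-2" }
      ]
    }
  }
}
```

Is this the right general shape, or does model switching have to be configured somewhere else entirely?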
Can anyone point me in the right direction? Thanks!