Do you have any thoughts on how I can make this more obvious?
It's covered by the documentation for the individual plugins, but I want to make it as easy as possible for people to understand what's going on when they first start using the tool.
It's laziness on my part. I only read your blog post; had I clicked through to the tool, I'd have seen it clearly stated at the top. My apologies.
I'm very grateful for all the work and writing you are doing about LLMs.
Regarding your note about JSON mode with llama.cpp: I'm writing a wrapper for it in my katarismo project. It's basically the stdout suggestion from that comment, but it's working really well for me when I use it with pocketbase.
Perhaps something as simple as stating that it was first built around OpenAI models and later expanded to local models via plugins?
I've been meaning to ask you, have you seen/used MS Guidance[0] 'language' at all? I don't know if it's the right abstraction to interface as a plugin with what you've got in llm cli but there's a lot about Guidance that seems incredibly useful to local inference [token healing and acceleration especially].
Yeah, I looked at Guidance and I have to admit I don't fully get it - my main problem was that I can't look at one of their Handlebars templates and figure out exactly what LLM prompts it's going to fire and in what order they will be sent.
I'm much happier with a very thin wrapper where I can explicitly see exactly what prompts are processed when, and where prompts are assembled using very simple string manipulation.
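To illustrate what I mean by a thin wrapper (this is just a sketch of the idea, not the tool's actual code), the prompt assembly can be plain string manipulation, so you can print the exact text before it's sent:

```python
# Minimal sketch: assemble a prompt with simple string manipulation,
# so the final text sent to the model is fully visible and predictable.
def build_prompt(system, question, context=None):
    parts = []
    if system:
        parts.append(f"System: {system}")
    if context:
        parts.append(f"Context:\n{context}")
    parts.append(f"User: {question}")
    return "\n\n".join(parts)

prompt = build_prompt(
    system="Answer concisely.",
    question="What is token healing?",
)
print(prompt)  # inspect exactly what will be sent, and in what order
```

No templates, no hidden chaining: one function call, one string out.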
I'm thinking I may pull the OpenAI stuff out of LLM core and make that a plugin as well - that way it will be VERY obvious when you install the tool that you get to pick which LLMs you're going to work with.