Do you have any thoughts on how I can make this more obvious?
It's covered by the documentation for the individual plugins, but I want to make it as easy as possible for people to understand what's going on when they first start using the tool.
It's laziness on my part. I only read your blog post; had I clicked through to the tool, I'd have seen it clearly stated at the top. My apologies.
I'm very grateful for all the work and writing you are doing about LLMs.
Regarding your note about JSON mode with llama.cpp: I'm writing a wrapper for it in my katarismo project. It's basically the stdout suggestion from that comment, but it's working really well for me when I use it with pocketbase.
Perhaps something as simple as stating that it was first built around OpenAI models and later expanded to local models via plugins?
I've been meaning to ask you, have you seen/used MS Guidance[0] 'language' at all? I don't know if it's the right abstraction to interface as a plugin with what you've got in llm cli but there's a lot about Guidance that seems incredibly useful to local inference [token healing and acceleration especially].
Yeah, I looked at Guidance and I have to admit I don't fully get it - my main problem was that I can't look at one of their Handlebars templates and figure out exactly what LLM prompts it's going to fire and in what order they will be sent.
I'm much happier with a very thin wrapper where I can explicitly see exactly what prompts are processed when, and where prompts are assembled using very simple string manipulation.
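To illustrate what I mean by a thin wrapper (this is just a sketch of the idea, not the tool's actual code), the prompt assembly can be plain string manipulation, so you can print the exact text before it's sent:

```python
# Minimal sketch: assemble a prompt with simple string manipulation,
# so the final text sent to the model is fully visible and predictable.
def build_prompt(system, question, context=None):
    parts = []
    if system:
        parts.append(f"System: {system}")
    if context:
        parts.append(f"Context:\n{context}")
    parts.append(f"User: {question}")
    return "\n\n".join(parts)

prompt = build_prompt(
    system="Answer concisely.",
    question="What is token healing?",
)
print(prompt)  # inspect exactly what will be sent, and in what order
```

No templates, no hidden chaining: one function call, one string out.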
I'm thinking I may pull the OpenAI stuff out of LLM core and make that a plugin as well - that way it will be VERY obvious when you install the tool that you get to pick which LLMs you're going to work with.