Everything (most llms and modern embedding models) is a transformer so the archi...

Everything (most llms and modern embedding models) is a transformer so the architecture is very similar. Llama(2) is a Meta (facebook) developed transformer plus the training they did on it.

Ggml is a "framework" like pytorch etc (for the purposes of this discussion) that lets you code up the architecture of a model, load in the weights that were trained, and run inference with it. Llama.cpp is a project that I'd describe as using ggml to implement some specific AI model architectures.