GGML has been replaced by GGUF, and GGML no longer receives updates.
GPU offloading for GGUF/GGML has been available for quite a long time in Text Generation WebUI and works very well, but isn’t nearly as fast as GPTQ or the new AWQ format.
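
If you're not sure whether an older download is GGUF or a legacy GGML file, here is a minimal sketch that checks the file header. It assumes a GGUF file begins with the four ASCII bytes `GGUF`, as the GGUF format description states. The helper name `is_gguf` is made up for illustration and is not part of any library.

```python
# Magic bytes expected at the very start of a GGUF file (assumption: per the GGUF spec).
GGUF_MAGIC = b"GGUF"


def is_gguf(path):
    """Return True if the file at `path` starts with the GGUF magic bytes.

    Hypothetical helper: it only inspects the first four bytes and does not
    validate the rest of the file.
    """
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC
```

Calling `is_gguf("model.gguf")` on a real model file (the path here is only a placeholder) tells you whether it is in the current format. A `False` result doesn't identify which older format it is, only that the header isn't GGUF.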