Thank you, but this doesn't really answer OP's question or mine. Is NVLink required if you want to run an LLM that exceeds the memory of a single GPU? How do benchmarks compare with and without it?
I've heard that NVLink helps with training, but not so much with inferencing.