
> Branches are not the problem, GPUs handle those just fine now actually

Worth noting that's only kinda true. If every thread in a thread group takes the same branch, it's mostly fine. But a divergent branch is basically equivalent to every thread in the group executing both sides of the branch, with the writes masked off according to whether the condition was true for that thread. This can be incredibly slow, depending on how complex the branched code is.

Also, not all GPUs can optimize branches effectively; some of them always execute both sides and mask off the results.
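
To put rough numbers on that, here's a toy model in Python of one thread group running `if cond: A else: B` in lockstep. It's a sketch, not how any particular GPU works: `run_group`, the `can_skip_uniform` flag, and the cycle costs are all made up for illustration, and real hardware has scheduling and reconvergence details this ignores.

```python
def run_group(conds, then_cost, else_cost, can_skip_uniform=True):
    """Toy lockstep model of one thread group executing an if/else.

    conds: one boolean per thread (lane) in the group.
    then_cost / else_cost: hypothetical cycle counts for each side.
    can_skip_uniform: if False, model hardware that always runs both sides.
    Returns (total cycles, which side each lane's writes came from).
    """
    then_mask = [bool(c) for c in conds]
    else_mask = [not m for m in then_mask]
    cycles = 0
    results = [None] * len(conds)

    for label, mask, cost in (("then", then_mask, then_cost),
                              ("else", else_mask, else_cost)):
        # A side is skipped only when no lane needs it (and the HW can tell).
        if can_skip_uniform and not any(mask):
            continue
        # Every lane pays for this side, whether or not it is active.
        cycles += cost
        for lane, active in enumerate(mask):
            if active:  # masked-off lanes discard their writes
                results[lane] = label
    return cycles, results


uniform = [True] * 32                        # all lanes agree
divergent = [i % 2 == 0 for i in range(32)]  # lanes alternate

print(run_group(uniform, 100, 80)[0])    # only the "then" side is paid for
print(run_group(divergent, 100, 80)[0])  # both sides are paid for
print(run_group(uniform, 100, 80, can_skip_uniform=False)[0])
```

In this model the uniform group costs 100 cycles and the divergent one costs 180, the sum of both sides. With `can_skip_uniform=False` even the uniform group pays 180, which is the "always take both branches" case.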


