OK, so this might seem stupid, but I am just trying to get my head around it.
I have 32-bit registers (longword size).
Unsigned byte values range from 0 to 255; an unsigned longword ranges from 0 to 2^32 - 1 (4,294,967,295).
I want to add two unsigned 128-bit values.
So do I need to use 4 registers per value and add the lowest parts first, then the next lowest, and so on up to the highest part?
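
To make the question concrete, here is a rough sketch in C of what I think the limb-by-limb addition would look like. The u128 type and the u128_add name are made up for illustration, and I am assuming limb[0] holds the least significant 32 bits. Each step adds one pair of 32-bit parts plus the carry left over from the step below it.

```c
#include <stdint.h>

/* Hypothetical type: a 128-bit unsigned value stored as four 32-bit limbs.
   Assumption: limb[0] is the least significant part. */
typedef struct {
    uint32_t limb[4];
} u128;

/* Adds a and b into *out, lowest limb first.
   Returns the carry out of the highest limb (1 if the sum needs a 129th bit). */
static uint32_t u128_add(u128 *out, const u128 *a, const u128 *b)
{
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        /* Widen to 64 bits so the carry out of this limb is not lost. */
        uint64_t sum = (uint64_t)a->limb[i] + b->limb[i] + carry;
        out->limb[i] = (uint32_t)sum;     /* low 32 bits are the result limb */
        carry = (uint32_t)(sum >> 32);    /* bit 32 is the carry into the next limb */
    }
    return carry;
}
```

In assembly the 64-bit widening would not be needed, since the carry comes from the processor's carry flag: a plain add for the lowest pair, then an add-with-carry for each higher pair (for example ADDX on 68k or ADC on x86, if that is the kind of CPU in question).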