These are _much_ faster than the scalar equivalents on the Raspberry Pi that I tested on, often 3x to 4x as fast!