Some more recent compilers emit SSE aligned store instructions for the loop,
causing crashes if the destination buffer isn't aligned on a 32-bit boundary.
This would also crash on platforms like ARM that require aligned stores.
This fixes a crash inside SDL_FillRect that happens with the official x64 mingw
build.