Sylvain Becker
2fd4aee149
Deactivate some routines on MIPS because they are slower (Bug 4503)
2019-02-23 09:36:56 +01:00
Sylvain Becker
47fb781b94
BlitNtoN BlitNtoNKey: remove non-aligned word read/store (bpp 3<->4) (Bug 4503)
...
MIPS and (old) ARM don't allow word reads/writes when the address isn't
4-byte aligned, so just remove them.
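The alignment constraint can be illustrated with a minimal sketch. This is not the code from the commit; the helper names `load_u32_unaligned` and `load_rgb24` are hypothetical and only show two common ways to avoid dereferencing a misaligned `uint32_t` pointer when walking 3-byte pixels.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from an address that may not be
 * 4-byte aligned. Casting p to (const uint32_t *) and dereferencing it
 * is the pattern that strict-alignment CPUs reject; memcpy lets the
 * compiler emit whatever access sequence is safe on the target. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: assemble a 3-byte pixel one byte at a time.
 * The result is p[0] | p[1] << 8 | p[2] << 16 on any host endianness. */
static uint32_t load_rgb24(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```

With 3-byte pixels only every fourth pixel starts on a 4-byte boundary, so a byte-wise or `memcpy`-based access is the portable choice, possibly at some cost in speed.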
2019-02-22 09:30:45 +01:00
Sylvain Becker
90a075d75f
Fix windows build
2019-02-18 22:48:14 +01:00
Sylvain Becker
e9a7b6973a
Fix bug 4053: Blit issues on Big Endian CPU
2019-02-18 22:06:53 +01:00
Sylvain Becker
afd1b3dae4
Fix invalid memory access and optimise Blit_3or4_to_3or4__*
...
Fix invalid write at last pixel of the surface:
when surface has no padding (pitch == w * bpp) and bpp is 3
with Blit, no colorkey, and NO_ALPHA same or inverse rgb triplet
Optimise by using int32 access:
BGR24 -> ARGB8888 : faster x1.897875 (362405 -> 190953)
RGB24 -> ABGR8888 : faster x1.660416 (363304 -> 218803)
ABGR8888 -> RGB24 : faster x1.686319 (334962 -> 198635)
ARGB8888 -> BGR24 : faster x1.691868 (324524 -> 191814)
BGR24 -> RGB888 : faster x1.678459 (326811 -> 194709)
BGR888 -> RGB24 : faster x1.731772 (327724 -> 189242)
RGB24 -> BGR888 : faster x1.690989 (328916 -> 194511)
RGB888 -> BGR24 : faster x1.698333 (326175 -> 192056)
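The invalid write described above can be sketched as follows, assuming a 4-byte source pixel packed into a 3-byte destination pixel. The function `pack_row_4_to_3` is hypothetical and is not the routine from the commit; it copies raw bytes rather than permuting channels, to keep the focus on the last-pixel handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: keep the first 3 bytes of every 4-byte source
 * pixel and write them to a tightly packed 3-byte destination row. */
static void pack_row_4_to_3(const uint8_t *src, uint8_t *dst, size_t w)
{
    size_t i;
    assert(src != NULL && dst != NULL);
    if (w == 0) {
        return;
    }
    for (i = 0; i + 1 < w; ++i) {
        /* 4-byte store: the extra byte lands on the first byte of the
         * next destination pixel and is overwritten on the next pass. */
        memcpy(dst, src, 4);
        src += 4;
        dst += 3;
    }
    /* Last pixel: a 4-byte store here would write one byte past the
     * end of the row, so only 3 bytes are copied. */
    memcpy(dst, src, 3);
}
```

When rows have padding, the stray byte of a 4-byte store falls into the padding and is harmless, which is why the problem only shows up when `pitch == w * bpp`. The sketch treats the last pixel of every row conservatively rather than only the last pixel of the surface.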
2019-02-17 16:20:23 +01:00
Sylvain Becker
1aa2ad2fe8
Better naming for the blit permutation variables
2019-02-09 17:40:32 +01:00
Sylvain Becker
f6a2ae6007
Faster blit colorkey or not, applied to bpp: 3->4 and 4->3
...
===== BlitNtoNKey ========
ABGR8888 -> BGR24 : faster x3 (2168709 -> 562738)
ABGR8888 -> RGB24 : faster x3 (2165055 -> 567458)
ARGB8888 -> BGR24 : faster x3 (2169109 -> 564338)
ARGB8888 -> RGB24 : faster x3 (2165266 -> 567081)
BGR24 -> ABGR8888 : faster x3 (2997675 -> 891636)
BGR24 -> ARGB8888 : faster x3 (2985449 -> 892028)
BGR24 -> BGR888 : faster x3 (2961611 -> 891913)
BGR24 -> BGRA8888 : faster x3 (3116305 -> 891534)
BGR24 -> BGRX8888 : faster x3 (3179654 -> 896978)
BGR24 -> RGB888 : faster x3 (2968191 -> 895112)
BGR24 -> RGBA8888 : faster x3 (2998428 -> 893147)
BGR24 -> RGBX8888 : faster x3 (2976529 -> 914853)
BGR888 -> BGR24 : faster x3 (2161906 -> 563921)
BGR888 -> RGB24 : faster x3 (2168228 -> 566634)
BGRA8888 -> BGR24 : faster x4 (2270501 -> 561873)
BGRA8888 -> RGB24 : faster x3 (2163179 -> 567330)
BGRX8888 -> BGR24 : faster x3 (2162911 -> 562322)
BGRX8888 -> RGB24 : faster x3 (2169617 -> 570927)
RGB24 -> ABGR8888 : faster x3 (2977061 -> 925975)
RGB24 -> ARGB8888 : faster x3 (2978148 -> 923680)
RGB24 -> BGR888 : faster x3 (3001413 -> 935074)
RGB24 -> BGRA8888 : faster x3 (2959003 -> 924096)
RGB24 -> BGRX8888 : faster x3 (2965240 -> 927100)
RGB24 -> RGB888 : faster x3 (2983921 -> 926063)
RGB24 -> RGBA8888 : faster x3 (2963908 -> 925457)
RGB24 -> RGBX8888 : faster x3 (2967957 -> 931700)
RGB888 -> BGR24 : faster x3 (2173299 -> 563226)
RGB888 -> RGB24 : faster x3 (2218374 -> 566164)
RGBA8888 -> BGR24 : faster x3 (2166355 -> 561381)
RGBA8888 -> RGB24 : faster x3 (2170322 -> 566729)
RGBX8888 -> BGR24 : faster x3 (2168524 -> 564072)
RGBX8888 -> RGB24 : faster x3 (2163680 -> 566956)
===== BlitNtoN ========
BGR24 -> BGRA8888 : faster x3 (2458958 -> 797557)
BGR24 -> BGRX8888 : faster x3 (2486085 -> 797745)
BGR24 -> RGBA8888 : faster x3 (2422116 -> 797637)
BGR24 -> RGBX8888 : faster x3 (2454426 -> 799085)
BGRA8888 -> BGR24 : faster x4 (2468206 -> 524486)
BGRA8888 -> RGB24 : faster x4 (2463581 -> 525561)
BGRX8888 -> BGR24 : faster x4 (2583355 -> 524468)
BGRX8888 -> RGB24 : faster x4 (2477242 -> 524284)
RGB24 -> BGRA8888 : faster x2 (2453414 -> 818415)
RGB24 -> BGRX8888 : faster x3 (2414915 -> 800863)
RGB24 -> RGBA8888 : faster x3 (2461114 -> 798148)
RGB24 -> RGBX8888 : faster x3 (2400922 -> 799203)
RGBA8888 -> BGR24 : faster x4 (2494472 -> 526428)
RGBA8888 -> RGB24 : faster x4 (2462260 -> 526791)
RGBX8888 -> BGR24 : faster x4 (2541115 -> 524390)
RGBX8888 -> RGB24 : faster x4 (2469059 -> 525416)
2019-02-09 17:20:53 +01:00
Sylvain Becker
604b44f20f
Fix wrong access and simplify
2019-02-08 17:15:30 +01:00
Sylvain Becker
5ed30f844d
Some simplification of previous commit
2019-02-07 22:45:50 +01:00
Sylvain Becker
5fd228921c
Faster blit with CopyAlpha, no ColorKey
...
Applied to following formats:
ABGR8888 -> BGRA8888 : faster x3 (2727179 -> 704761)
ABGR8888 -> RGBA8888 : faster x3 (2707808 -> 705309)
ARGB8888 -> BGRA8888 : faster x3 (2745371 -> 712437)
ARGB8888 -> RGBA8888 : faster x3 (2746230 -> 705236)
BGRA8888 -> ABGR8888 : faster x3 (2745026 -> 707045)
BGRA8888 -> ARGB8888 : faster x3 (2752760 -> 727373)
BGRA8888 -> RGBA8888 : faster x3 (2769544 -> 704607)
RGBA8888 -> ABGR8888 : faster x3 (2725058 -> 706669)
RGBA8888 -> ARGB8888 : faster x3 (2704866 -> 707132)
RGBA8888 -> BGRA8888 : faster x3 (2710351 -> 704615)
2019-02-07 22:03:30 +01:00
Sylvain Becker
704e62bbf4
Code factorization of the pixel format permutation
2019-02-07 21:49:24 +01:00
Sylvain Becker
0a007a9bea
Fix wrong comment
2019-02-07 18:52:49 +01:00
Sylvain Becker
e5192384d0
Faster blit with no ColorKey
...
Applied to following formats:
ABGR8888 -> BGRX8888 : faster x5 (3177493 -> 630439)
ABGR8888 -> RGBX8888 : faster x5 (3178104 -> 628925)
ARGB8888 -> BGRX8888 : faster x4 (3141089 -> 629448)
ARGB8888 -> RGBX8888 : faster x5 (3216413 -> 630465)
BGR888 -> BGRA8888 : faster x4 (3145403 -> 637701)
BGR888 -> BGRX8888 : faster x4 (3142106 -> 630144)
BGR888 -> RGBA8888 : faster x4 (3202685 -> 649384)
BGR888 -> RGBX8888 : faster x4 (3170617 -> 658670)
BGRA8888 -> BGR888 : faster x4 (3203308 -> 657697)
BGRA8888 -> RGB888 : faster x5 (3201475 -> 631747)
BGRA8888 -> RGBX8888 : faster x5 (3274544 -> 630409)
BGRX8888 -> ABGR8888 : faster x4 (3149753 -> 638682)
BGRX8888 -> ARGB8888 : faster x5 (3164101 -> 631273)
BGRX8888 -> BGR888 : faster x4 (3144454 -> 630712)
BGRX8888 -> RGB888 : faster x4 (3160490 -> 638047)
BGRX8888 -> RGBA8888 : faster x5 (3308988 -> 631232)
BGRX8888 -> RGBX8888 : faster x5 (3216775 -> 638065)
RGB888 -> BGRA8888 : faster x4 (3143135 -> 655146)
RGB888 -> BGRX8888 : faster x4 (3141790 -> 653771)
RGB888 -> RGBA8888 : faster x5 (3214402 -> 637001)
RGB888 -> RGBX8888 : faster x4 (3143082 -> 630009)
RGBA8888 -> BGR888 : faster x3 (3157048 -> 920375)
RGBA8888 -> BGRX8888 : faster x5 (3196692 -> 632996)
RGBA8888 -> RGB888 : faster x4 (3141570 -> 652151)
RGBX8888 -> ABGR8888 : faster x5 (3175401 -> 631218)
RGBX8888 -> ARGB8888 : faster x4 (3144690 -> 639440)
RGBX8888 -> BGR888 : faster x4 (3144250 -> 630171)
RGBX8888 -> BGRA8888 : faster x5 (3220321 -> 630731)
RGBX8888 -> BGRX8888 : faster x4 (3178453 -> 637445)
RGBX8888 -> RGB888 : faster x5 (3203623 -> 632596)
2019-02-07 18:51:14 +01:00
Sylvain Becker
7372295ec9
Faster blit when using No Alpha or Set Alpha, + ColorKey
...
Applied to following formats:
ABGR8888 -> BGRX8888 : faster x4 (2794295 -> 610587)
ABGR8888 -> RGB888 : faster x4 (2835693 -> 615561)
ABGR8888 -> RGBX8888 : faster x4 (2880475 -> 610479)
ARGB8888 -> BGR888 : faster x4 (2802718 -> 610702)
ARGB8888 -> BGRX8888 : faster x4 (2792481 -> 606311)
ARGB8888 -> RGBX8888 : faster x4 (2821621 -> 624745)
BGR888 -> ARGB8888 : faster x4 (2791705 -> 637889)
BGR888 -> BGRA8888 : faster x4 (2793195 -> 652299)
BGR888 -> BGRX8888 : faster x4 (2800713 -> 609326)
BGR888 -> RGB888 : faster x4 (2812260 -> 610471)
BGR888 -> RGBA8888 : faster x4 (2792327 -> 629288)
BGR888 -> RGBX8888 : faster x4 (2799224 -> 607073)
BGRA8888 -> BGR888 : faster x4 (2800520 -> 606897)
BGRA8888 -> RGB888 : faster x4 (2825274 -> 616156)
BGRA8888 -> RGBX8888 : faster x4 (2812530 -> 610340)
BGRX8888 -> ABGR8888 : faster x4 (2793940 -> 628596)
BGRX8888 -> ARGB8888 : faster x4 (2822686 -> 638899)
BGRX8888 -> BGR888 : faster x4 (2818141 -> 613659)
BGRX8888 -> RGB888 : faster x4 (2929017 -> 611794)
BGRX8888 -> RGBA8888 : faster x4 (2799709 -> 629750)
BGRX8888 -> RGBX8888 : faster x4 (2911010 -> 605640)
RGB888 -> ABGR8888 : faster x4 (2800671 -> 631542)
RGB888 -> BGR888 : faster x4 (2802644 -> 604461)
RGB888 -> BGRA8888 : faster x4 (2801919 -> 628729)
RGB888 -> BGRX8888 : faster x4 (2938244 -> 604135)
RGB888 -> RGBA8888 : faster x4 (2912447 -> 642185)
RGB888 -> RGBX8888 : faster x4 (2831676 -> 634293)
RGBA8888 -> BGR888 : faster x4 (2928896 -> 614960)
RGBA8888 -> BGRX8888 : faster x4 (2821422 -> 608146)
RGBA8888 -> RGB888 : faster x4 (2825927 -> 617184)
RGBX8888 -> ABGR8888 : faster x4 (2803852 -> 654129)
RGBX8888 -> ARGB8888 : faster x4 (2923615 -> 642644)
RGBX8888 -> BGR888 : faster x4 (2806523 -> 610447)
RGBX8888 -> BGRA8888 : faster x4 (2813388 -> 630305)
RGBX8888 -> BGRX8888 : faster x4 (2800052 -> 607881)
RGBX8888 -> RGB888 : faster x4 (2807722 -> 610263)
2019-02-07 17:52:28 +01:00
Sylvain Becker
bb9a9080dc
Fix pointer warnings
2019-02-07 16:13:25 +01:00
Sylvain Becker
3543a44ae4
Faster blit when using CopyAlpha + ColorKey
...
Applied to following formats:
ABGR8888 -> ARGB8888 : faster x7 (3959672 -> 537227)
ABGR8888 -> BGRA8888 : faster x7 (4008716 -> 532064)
ABGR8888 -> RGBA8888 : faster x7 (3998576 -> 530964)
ARGB8888 -> ABGR8888 : faster x7 (3942420 -> 532503)
ARGB8888 -> BGRA8888 : faster x7 (3995382 -> 527722)
ARGB8888 -> RGBA8888 : faster x7 (4259330 -> 543033)
BGRA8888 -> ABGR8888 : faster x7 (4110411 -> 529402)
BGRA8888 -> ARGB8888 : faster x7 (4071906 -> 538393)
BGRA8888 -> RGBA8888 : faster x6 (4038320 -> 585141)
RGBA8888 -> ABGR8888 : faster x7 (3937018 -> 534127)
RGBA8888 -> ARGB8888 : faster x7 (3979577 -> 537810)
RGBA8888 -> BGRA8888 : faster x7 (3975656 -> 528355)
2019-02-07 15:12:17 +01:00
Sylvain Becker
7b8bac5958
Add fast paths in BlitNtoNKey
...
All following conversions are faster (with colorkey, but no blending).
(ratio isn't very accurate)
ABGR8888 -> BGR888 : faster x9 (2699035 -> 297425)
ARGB8888 -> RGB888 : faster x8 (2659266 -> 296137)
BGR24 -> BGR24 : faster x5 (2232482 -> 445897)
BGR24 -> RGB24 : faster x4 (2150023 -> 448576)
BGR888 -> ABGR8888 : faster x8 (2649957 -> 307595)
BGRA8888 -> BGRX8888 : faster x9 (2696041 -> 297596)
BGRX8888 -> BGRA8888 : faster x8 (2662011 -> 299463)
BGRX8888 -> BGRX8888 : faster x9 (2733346 -> 295045)
RGB24 -> BGR24 : faster x4 (2154551 -> 485262)
RGB24 -> RGB24 : faster x4 (2149878 -> 484870)
RGB888 -> ARGB8888 : faster x8 (2762877 -> 324946)
RGBA8888 -> RGBX8888 : faster x8 (2657855 -> 297753)
RGBX8888 -> RGBA8888 : faster x8 (2661360 -> 296655)
RGBX8888 -> RGBX8888 : faster x8 (2649287 -> 308268)
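A colorkey blit without blending can be sketched roughly as below for the case where source and destination share the same 32-bit layout. The function name `blit_row_colorkey32` and its parameters are assumptions for illustration; the actual fast paths in `BlitNtoNKey` may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copy one row of 32-bit pixels, skipping every
 * source pixel whose colour channels match the colorkey. rgbmask
 * selects the colour bits so the alpha/unused byte is ignored. */
static void blit_row_colorkey32(const uint32_t *src, uint32_t *dst,
                                size_t w, uint32_t ckey, uint32_t rgbmask)
{
    size_t i;
    assert(src != NULL && dst != NULL);
    ckey &= rgbmask;
    for (i = 0; i < w; ++i) {
        if ((src[i] & rgbmask) != ckey) {
            dst[i] = src[i]; /* opaque pixel: plain copy, no blending */
        }
    }
}
```

Because the two formats match, no per-channel unpacking is needed, which is plausibly where most of the gain over a generic per-pixel path comes from.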
2019-01-30 22:50:20 +01:00
Sylvain Becker
a052d81bdf
Add explicit unsigned int and char types in (for bug 4290)
2019-01-30 15:31:07 +01:00
Sylvain Becker
1128d57316
Fixed bug 4290 - add fastpaths for format conversion in BlitNtoN
...
All following conversions are faster (no colorkey, no blending).
(ratio isn't very accurate)
ABGR8888 -> ARGB8888 : faster x6 (2655837 -> 416607)
ABGR8888 -> BGR24 : faster x7 (2470117 -> 325693)
ABGR8888 -> RGB24 : faster x7 (2478107 -> 335445)
ABGR8888 -> RGB888 : faster x9 (3178524 -> 333859)
ARGB8888 -> ABGR8888 : faster x6 (2648366 -> 406977)
ARGB8888 -> BGR24 : faster x7 (2474978 -> 327819)
ARGB8888 -> BGR888 : faster x9 (3189072 -> 326710)
ARGB8888 -> RGB24 : faster x7 (2473689 -> 324729)
BGR24 -> ABGR8888 : faster x6 (2268763 -> 359946)
BGR24 -> ARGB8888 : faster x6 (2306393 -> 359213)
BGR24 -> BGR888 : faster x6 (2231141 -> 324195)
BGR24 -> RGB24 : faster x4 (1557835 -> 322033)
BGR24 -> RGB888 : faster x6 (2229854 -> 323849)
BGR888 -> ARGB8888 : faster x8 (3215202 -> 363137)
BGR888 -> BGR24 : faster x7 (2474775 -> 347916)
BGR888 -> RGB24 : faster x7 (2532783 -> 327354)
BGR888 -> RGB888 : faster x9 (3134634 -> 344987)
RGB24 -> ABGR8888 : faster x6 (2229486 -> 358919)
RGB24 -> ARGB8888 : faster x6 (2271587 -> 358521)
RGB24 -> BGR24 : faster x4 (1530913 -> 321149)
RGB24 -> BGR888 : faster x6 (2227284 -> 327453)
RGB24 -> RGB888 : faster x6 (2227125 -> 329061)
RGB888 -> ABGR8888 : faster x8 (3163292 -> 362445)
RGB888 -> BGR24 : faster x7 (2469489 -> 327127)
RGB888 -> BGR888 : faster x9 (3190526 -> 326022)
RGB888 -> RGB24 : faster x7 (2479084 -> 324982)
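The remark that the ratio "isn't very accurate" is consistent with the listed factors being the before/after quotient with the fractional part dropped: 3178524 / 333859 is about 9.52 and is listed as x9. That reading is inferred from the numbers, not stated in the commit, and the unit of the timings is not given. A small sketch with hypothetical helper names:

```c
#include <assert.h>

/* Hypothetical helper: integer speed-up factor, fractional part dropped. */
static unsigned long speedup_floor(unsigned long before, unsigned long after)
{
    assert(after != 0);
    return before / after;
}

/* Hypothetical helper: the same ratio without truncation. */
static double speedup_exact(unsigned long before, unsigned long after)
{
    assert(after != 0);
    return (double)before / (double)after;
}
```

For a comparison between formats, the untruncated ratio is the more informative figure, since x9 here covers anything from 9.0 up to just under 10.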
2019-01-30 15:23:33 +01:00
Sam Lantinga
5e13087b0f
Updated copyright for 2019
2019-01-04 22:01:14 -08:00
Sam Lantinga
6e35e42145
Working on bug 3921 - Add some Fastpath to BlitNtoNKey and BlitNtoNKeyCopyAlpha
...
Sylvain
I ran various benchmarks with clang 6.0.0 on Linux and ndk-r16b on Android (NDK_TOOLCHAIN_VERSION=clang).
- I still see a x10 speed factor.
- With duff_loops, it does not use vectorisation (but that doesn't seem to be a problem).
On Linux my patch is already at full speed at -O2, whereas the duff_loops need -O3 (200 ms at -O3, and 300 ms at -O2).
I realized that on Android I had a slight variation which fits best,
both on Linux with -O2 and -O3, and on Android with -O2/-O3 and armeabi-v7a/arm64.
Here's the patch.
2018-10-01 14:43:03 -07:00
Ozkan Sezer
922623e1b6
SDL_blit_N.c (BlitNtoNKeyCopyAlpha): fix -Wshadow warnings by adding _
...
suffix to the temp Pixel local in the DUFFS_LOOP.
SDL_blit.h (ASSEMBLE_RGB): add _ prefix to temp Pixel locals to avoid
any possible shadowings.
The warnings were like the following:
In file included from src/video/SDL_blit_N.c:26:0:
src/video/SDL_blit_N.c: In function 'BlitNtoNKeyCopyAlpha':
src/video/SDL_blit_N.c:2421:24: warning: declaration of 'Pixel' shadows a previous local [-Wshadow]
Uint32 Pixel = ((*src32 & rgbmask) == ckey) ? *dst32 : *src32;
^
src/video/SDL_blit.h:475:21: note: in definition of macro 'DUFFS_LOOP8'
case 0: do { pixel_copy_increment; /* fallthrough */ \
^
src/video/SDL_blit_N.c:2419:13: note: in expansion of macro 'DUFFS_LOOP'
DUFFS_LOOP(
^
src/video/SDL_blit_N.c:2399:12: warning: shadowed declaration is here [-Wshadow]
Uint32 Pixel;
^
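The shadowing problem and the underscore fix can be sketched with a stand-alone macro. `PACK_RGB888` and `pack_example` are hypothetical and are not the SDL macros named above; the point is only that a macro-local temporary should carry a name the caller is unlikely to use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: the temporary has a trailing underscore so it
 * cannot collide with a caller variable named Pixel. If the temporary
 * were also called Pixel, "(out) = Pixel" with out == Pixel would
 * assign the inner variable to itself and leave the caller's untouched. */
#define PACK_RGB888(out, r, g, b)                                   \
    do {                                                            \
        uint32_t pixel_tmp_ = ((uint32_t)(r) << 16) |               \
                              ((uint32_t)(g) << 8) | (uint32_t)(b); \
        (out) = pixel_tmp_;                                         \
    } while (0)

static uint32_t pack_example(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t Pixel = 0; /* the caller's own local, as in the warning */
    PACK_RGB888(Pixel, r, g, b);
    assert((Pixel >> 24) == 0); /* top byte stays clear */
    return Pixel;
}
```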
2018-10-01 21:29:11 +03:00
Sam Lantinga
7df0f4fdac
Fixed bug 4277 - warnings patch
...
Sylvain
Patch a few warnings when using:
-Wmissing-prototypes -Wdocumentation -Wdocumentation-unknown-command
They are automatically enabled with -Wall
2018-09-27 14:56:29 -07:00
Sam Lantinga
e3cc5b2c6b
Updated copyright for 2018
2018-01-03 10:03:25 -08:00
Philipp Wiesemann
e5d9b25d8c
Fixed comment style.
2017-02-26 21:20:39 +01:00
Sam Lantinga
45b774e3f7
Updated copyright for 2017
2017-01-01 18:33:28 -08:00
Sam Lantinga
818d1d3e80
Fixed bug 1646 - Warnings from clang with -Weverything
2016-11-15 01:30:08 -08:00
Sam Lantinga
39ba2ab835
Fixed NULL pointer dereference, thanks Ozkan Sezer
2016-10-22 17:53:03 -07:00
Sam Lantinga
5b14a943a8
Fixed bug 3466 - Can't build 2.0.5 on ppc64
...
/home/fedora/SDL2-2.0.5/src/video/SDL_blit_N.c: In function 'calc_swizzle32':
/home/fedora/SDL2-2.0.5/src/video/SDL_blit_N.c:127:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
const vector unsigned char plus = VECUINT8_LITERAL(0x00, 0x00, 0x00, 0x00,
^
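The error above comes from a declaration that follows a statement, which C90 does not allow. A generic sketch of the C90-compatible layout, unrelated to the AltiVec code in `calc_swizzle32` (the function `sum_of_squares` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* C90 style: every declaration precedes the first statement of its block. */
static int sum_of_squares(const int *values, int count)
{
    int i;
    int total = 0; /* declared and initialised before any statement */

    assert(values != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        int sq = values[i] * values[i]; /* allowed: start of a new block */
        total += sq;
    }
    return total;
}
```

Moving a declaration to the top of its block, or opening a nested block for it, is usually enough to satisfy `-Werror=declaration-after-statement`.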
2016-10-22 11:01:55 -07:00
Sam Lantinga
42065e785d
Updated copyright to 2016
2016-01-02 10:10:34 -08:00
Philipp Wiesemann
0e45984fa0
Fixed crash if initialization of EGL failed but was tried again later.
...
The internal function SDL_EGL_LoadLibrary() did not delete and remove a mostly
uninitialized data structure if loading the library first failed. A later try to
use EGL then skipped initialization and assumed it was previously successful
because the data structure now already existed. This led to at least one crash
in the internal function SDL_EGL_ChooseConfig() because a NULL pointer was
dereferenced to make a call to eglBindAPI().
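The failure mode described above can be sketched with stand-in types. Everything here (`fake_egl_data`, `fake_device`, `fake_load_library`, the `simulate_failure` flag) is hypothetical and does not reflect the real SDL structures; it only shows the pattern of discarding a partially initialised structure so that a later attempt starts from scratch.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_egl_data { int library_loaded; };
struct fake_device { struct fake_egl_data *egl_data; };

/* Hypothetical loader: returns 0 on success, -1 on failure.
 * simulate_failure stands in for the library failing to load. */
static int fake_load_library(struct fake_device *dev, int simulate_failure)
{
    assert(dev != NULL);
    if (dev->egl_data != NULL) {
        return 0; /* treated as already initialised */
    }
    dev->egl_data = calloc(1, sizeof *dev->egl_data);
    if (dev->egl_data == NULL) {
        return -1;
    }
    if (simulate_failure) {
        /* Without this cleanup, the early return above would report
         * success on the next call even though nothing was loaded. */
        free(dev->egl_data);
        dev->egl_data = NULL;
        return -1;
    }
    dev->egl_data->library_loaded = 1;
    return 0;
}
```

The existence of the structure doubles as the "already initialised" flag, so it has to be removed on every failure path.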
2015-06-21 17:33:46 +02:00