- high-level filters out duplicate render commands from the queue so
backends don't have to.
- Setting draw color is now a render command, so backends can put color
information into the vertex buffer to upload with everything else instead
of setting it with slower dynamic data later.
- backends can request that they always batch, even for legacy programs,
since the lowlevel API can deal with it (Metal, and eventually Vulkan
and such...)
- high-level makes sure the queue has at least one setdrawcolor and
setviewport command before any draw calls, so the backends don't ever have
to manage cases where this hasn't been explicitly set yet.
- backends allocating vertex buffer space can specify alignment, and the
high-level will keep track of gaps in the buffer between the last used
positions and the aligned data that can be used for later allocations
(Metal and such need to specify some constant data on 256 byte boundaries,
but we don't want to waste all that space we had to skip to meet alignment
requirements).
It now uses a growable linked list that keeps a pool of allocated items for
reuse, and reallocs the vertex array as necessary. Testsprite2 can scale to
20,000 (or more!) draws now without drama.
This moves all the rendering to a command list that is flushed to the GL as
necessary, making most common activities upload a single vertex buffer per
frame and dramatically reducing state changes. In pathological cases,
like Emscripten running on iOS's Safari, performance can go from a dozen
draw calls killing your performance to 1000 draw calls running smoothly.
This is work in progress, and not ready to ship. Among other things, it has
a hardcoded array that isn't checked for overflow. But the basic idea is
sound!
The only place angle is activated and causes effect is RenderCopyEx. All other
methods which use vertex shader, leave angle disabled and cause useless sin/cos
calculation in shader.
To get around shader's interface is changed to a vector that contains results
of sin and cos. To behave properly when disabled, cos value is set with offset
-1.0 making 0.0 default when deactivated.
As nice side effect it simplifies GLES2_UpdateVertexBuffer: All attributes are
vectors now.
Additional background:
* On RaspberryPi it gives a performace win for operations. Tested with
[1] numbers go down for 5-10% (not easy to estimate due to huge variation).
* SDL_RenderCopyEx was tested with [2]
* It works around left rotated display caused by low accuracy sin implemetation
in RaspberryPi/VC4 [3]
[1] https://github.com/schnitzeltony/sdl2box
[2] https://github.com/schnitzeltony/sdl2rendercopyex
[3] https://github.com/anholt/mesa/issues/110
Signed-off-by: Andreas M?ller <schnitzeltony@gmail.com>
duckgrease
SDL_RenderCopyEx blits wrong image (in some cases it's bunch of alternating horizontal lines, some cases it's image from the wrong coordinate, and in some cases it's just a bunch of garbled pixels), when the following conditions are met:
- Use software renderer.
- Enable either horizontal or vertical flip.
- source and destination rectangles must have same width and height, and must be smaller than the size of the texture.
- source rectangle's X and Y coordinates must be 0.
Sylvain
Patch a few warnings when using:
-Wmissing-prototypes -Wdocumentation -Wdocumentation-unknown-command
They are automatically enabled with -Wall
Anthony @ POW Games
SDL_CreateTextureFromSurface makes an internal call to SDL_GetColorKey which can return an error and spams the error log with "Surface doesn't have a colorkey" even though the original function didn't return an error.
Olli-Samuli Lehmus
If one creates a window with the SDL_WINDOW_FULLSCREEN_DESKTOP flag, and creates a render target with SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear"), and afterwards sets SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "nearest"), after minimizing the window, the scale quality hint is lost on the render target. Textures however do keep their interpolation modes.
SDL now builds with gcc 7.2 with the following command line options:
-Wall -pedantic-errors -Wno-deprecated-declarations -Wno-overlength-strings --std=c99
- Use a single buffer for various non-changing constants accessed by the GPU, instead of multiple buffers.
- Do the half-pixel offset for points and lines using a transform matrix so we don't need a malloc when rendering.
- Don't add a half-pixel offset for other primitives and textures. This matches D3D and GL render behaviour.
- Remove the half-texel texture coordinate offset since it's not needed now that there's no more half-pixel position offset when rendering a texture.
- Don't try to set texture usage on iOS 8 since it doesn't exist there.
Eric wing
There is a tiny bug in the new overscan code for the SDL_renderer.
In SDL_renderer.c, line 1265, the if check for SDL_strcasecmp with "direct3d" needs to be inverted.
Instead of:
if(SDL_strcasecmp("direct3d", SDL_GetCurrentVideoDriver())) {
It should be:
if(0 == SDL_strcasecmp("direct3d", SDL_GetCurrentVideoDriver())) {
This bug causes the "overscan" mode to pretty much be completely ignored in all cases and all things remain letterboxed (as before the feature).
Yuri K. Schlesner
When using texture filtering, there are filtering artifacts visible on the edges of scaled textures, where the texture filtering pulls in texels from the other side of the texture. Using clamping texture modes wouldn't completely fix this since source rectangles don't need to cover the whole texture. (See screenshot attached in next post.)
The opengl driver uses clamping on textures and so avoid this at least in the cases where the source rect is the whole texture. The direct3d driver does not and so has problems in every case. I'm not sure if it can actually completely be fixed, but at least enabling clamping for direct3d would be one step in the right direction.
This works better for games where there may be a bunch of simulation logic that needs to be run before the next rendering pass, and prevents blocking if the next drawable is busy.
This isn't complete, but is enough to run testsprite2. It's currently
Mac-only; with a little work to figure out how to properly glue in a Metal
layer to a UIView, this will likely work on iOS, too.
This is only wired up to the configure script right now, and disabled by
default. CMake and Xcode still need their bits filled in as appropriate.
New functions get and set the YUV colorspace conversion mode:
SDL_SetYUVConversionMode()
SDL_GetYUVConversionMode()
SDL_GetYUVConversionModeForResolution()
SDL_ConvertPixels() converts between all supported RGB and YUV formats, with SSE acceleration for converting from planar YUV formats (YV12, NV12, etc) to common RGB/RGBA formats.
Added a new test program, testyuv, to verify correctness and speed of YUV conversion functionality.
Carlos
Angle supports GLES3 but when using these functions (SDL_CreateWindow and SDL_CreateRenderer), defaults again to GLES2.0.
A current workaround (hack) to retrieve a GLES3.0 context with Angle is:
1) set
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
after InitSDL AND after calling SDL_CreateWindow (before SDL_CreateRenderer)
2) Comment lines 2032-2044 in SDL_render_gles2.c, funtion GLES2_CreateRenderer
window_flags = SDL_GetWindowFlags(window);
if (!(window_flags & SDL_WINDOW_OPENGL) ||
profile_mask != SDL_GL_CONTEXT_PROFILE_ES || major != RENDERER_CONTEXT_MAJOR || minor != RENDERER_CONTEXT_MINOR) {
changed_window = SDL_TRUE;
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, RENDERER_CONTEXT_MAJOR);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, RENDERER_CONTEXT_MINOR);
if (SDL_RecreateWindow(window, window_flags | SDL_WINDOW_OPENGL) < 0) {
goto error;
}
}
This retrives a GLES3 context as confirmed using glGetString(GL_VERSION). This should be fixed by modifying a few if's.
Sylvain
Few issues with YUV on SDL2 when using odd dimensions, and missing conversions from/back to YUV formats.
1) The big part is that SDL_ConvertPixels() does not convert to/from YUV in most cases. This now works with any format and also with odd dimensions,
by adding two internal functions SDL_ConvertPixels_YUV_to_ARGB8888 and SDL_ConvertPixels_ARGB8888_to_YUV (could it be XRGB888 ?).
The target format is hard coded to ARGB888 (which is the default in the internal of the software renderer).
In case of different YUV conversion, it will do an intermediate conversion to a ARGB8888 buffer.
SDL_ConvertPixels_YUV_to_ARGB8888 is somehow redundant with all the "Color*Dither*Mod*".
But it allows some completeness of SDL_ConvertPixels to handle all YUV format.
It also works with odd dimensions.
Moreover, I did some benchmark(SDL_ConvertPixel vs Color32DitherYV12Mod1X and Color32DitherYUY2Mod1X).
gcc-6.3 and clang-4.0. gcc performs better than clang. And, with gcc, SDL_ConvertPixels() performs better (20%) than the two C function Color32Dither*().
For instance, to convert 10 times a 3888x2592 image, it takes ~195 ms with SDL_ConvertPixels and ~235 ms with Color32Dither*().
Especially because of gcc vectorize feature that optimises all conversion loops (-ftree-loop-vectorize).
Nb: I put no image pitch for the YUV buffers. because it complexify a little bit the code and the API :
There would be some ambiguity when setting the pitch exactly to image width:
would it a be pitch of image width (for luma and chroma). or just contiguous data ? (could set pitch=0 for the later).
2) Small issues with odd dimensions:
If width "w" is odd, luma plane width is still "w" whereas chroma planes will be "(w + 1)/2". Almost the same for odd h.
Solution is to strategically substitute "w" by "(w+1)/2" at the good places ...
- In the repository, SDL_ConvertPixels() handles YUV only if yuv source format is exactly the same as YUV destination format.
It basically does a memcpy of pixels, but it's done incorrectly when width or height is odd (wrong size of chroma planes). This is fixed.
- SDL Renderers don't support odd width/height for YUV textures.
This is fixed for software, opengl, opengles2. (opengles 1 does not support it and fallback to software rendering).
This is *not* fixed for D3D and D3D11 ... (and others, psp ?)
Only *two* Dither function are fixed ... not sure if others are really used.
- This is not possible to create a NV12/NV12 texture with the software renderer, whereas other renderers allow it.
This is fixed, by using SDL_ConvertPixels underneath.
- It was not possible to SDL_UpdateTexture() of format NV12/NV21 with the software renderer. this is fixed.
Here's also two testcases:
- that do all combination of conversion.
- to test partial UpdateTexture
Anthony
This is what's making the software renderer crash with rotated destination rectangles of w or h = 0:
SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "2");
Martin Gerhardy
just for easier debugging issues in the own code...
SDL_CreateRenderer should maybe also use this macro
Ryan C. Gordon
I'll go one better: it should have an SDL_assert().
UX-admin
I am compiling with the Sun Studio 12 u2 compiler. There are multiple issues with the build, but this particular issue appears to be that it is illegal to declare a union of a struct of floats and a float. While GCC 4.8.1 does not flag this as an error, Sun Studio is much more standards compliant and strict, halting further compilation with an error.
Simon Hug
The bug is in the GL_ResetState and GLES_ResetState functions which get called after a new GL context is created. These functions set the cached current color to transparent black, but the GL specification says the initial color is opaque white.
The attached patch changes the values to 0xffffffff to reflect the initial state of the current color. Should the ResetState functions get called anywhere else in the future, this probably has to call the GL functions itself to ensure that the colors match.
Littlefighter19
When trying to mirror something on the PSP, I've stumbled upon the problem,
that using SDL_RenderCopyEx with SDL_FLIP_HORIZONTAL flips the image vertically, vise-versa SDL_FLIP_VERTICAL flips the image horizontally.
Proposed patch would be swapping the check in line 944 with the one in line 948 in SDL_render_psp.c
Daniel
SDL_RenderReadPixels with SDL_RENDERER_SOFTWARE reads pixels from wrong coordinates.
SW_RenderReadPixels adjusts the rect coordinates according to the viewport. But since this is already done by SDL_RenderReadPixels, the final rect has x2 bigger X and Y.
Eric wing
Hi, I think I found a bug when using SDL_WINDOW_ALLOW_HIGHDPI with SDL_RenderSetLogicalSize on iOS. I use SDL_RenderSetLogicalSize for all my stuff. I just tried turning on SDL_WINDOW_ALLOW_HIGHDPI on iOS and suddenly all my touch/mouse positions are really broken/far-off-the-mark.
I actually don't have a real retina device (still) so I'm seeing this using the iOS simulator with a 6plus template.
Attached is a simple test program that can reproduce the problem. It uses RenderSetLogicalSize and draws some moving happy faces (to show the boundaries/space of the LogicalSize and that it is working correctly for that part).
When you click/touch, it will draw one more happy face where your button point is.
If you comment out SDL_WINDOW_ALLOW_HIGHDPI, everything works as expected. But if you compile with it in, the mouse coordinates seem really far off the mark. (Face appears far up and to the left.)
Alex Szpakowski on the mailing list suggests the problem is
"I believe this is a bug in SDL_Render?s platform-agnostic mouse coordinate scaling code. It assumes the units of the mouse coordinates are always in pixels, which isn?t the case where high-DPI is involved (regardless of whether iOS is used) ? they?re actually in ?DPI independent? coordinates (which matches the window size, but not the renderer output size)."
Additionally, if this is correct, the Mac under Retina is also probably affected too and "as well as any other platform SDL adds high-dpi support for in the future".
felix
The functions in src/render/SDL_yuv_mmx.c contain the following inline assembly snippet:
/* tap dance to workaround the inability to use %%ebx at will... */
/* move one thing to the stack... */
"pushl $0\n" /* save a slot on the stack. */
"pushl %%ebx\n" /* save %%ebx. */
"movl %0, %%ebx\n" /* put the thing in ebx. */
"movl %%ebx,4(%%esp)\n" /* put the thing in the stack slot. */
"popl %%ebx\n" /* get back %%ebx (the PIC register). */
Here's how it ended up in a binary on my old laptop:
0xb5c17dbd <ColorRGBDitherYV12MMX1X+93>: push $0x0
0xb5c17dbf <ColorRGBDitherYV12MMX1X+95>: push %ebx
0xb5c17dc0 <ColorRGBDitherYV12MMX1X+96>: mov 0xc(%esp),%ebx
0xb5c17dc4 <ColorRGBDitherYV12MMX1X+100>: mov %ebx,0x4(%esp)
0xb5c17dc8 <ColorRGBDitherYV12MMX1X+104>: pop %ebx
Apparently the compiler, oblivious to the fact that the assembly snippet manipulates the %esp register, decided to refer to the operand via that same register instead of via %ebp (I believe -fomit-frame-pointer enables this). This causes %ebx to be loaded with the wrong value, which later leads to a null pointer dereference.
Recent GCC can use the %ebx register normally: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47602#c16>. There is even an explicit constraint "b" for allocating it.
Tom Seddon
GL_ActivateRenderer may call GL_UpdateViewport, which leaves the GL_PROJECTION matrix selected. But after GL_ResetState, the GL_MODELVIEW matrix is selected, suggesting that's the intended default state.
It seems at least like these should be consistent. Presumably GL_UpdateViewport should be doing a glMatrixMode(GL_MODELVIEW) before it finishes.
Richard Russell
Resuming from a suspended state results in a black screen. This only happens when using GLES 1.1 (GLES 2 resumes correctly) and when the render target has been changed using SDL_SetRenderTarget. This problem is new in 2.0.4.
The attached test case demonstrates the issue.
Sylvain Becker has apparently found a fix as follows:
"In the opengles leaf function (in 'src/render/opengles/SDL_render_gles.c'), it appears there is a call to 'GLES_ActivateRenderer' in 'GLES_SetRenderTarget', which is not present in opengles2. When commenting out this 'GLES_ActivateRenderer', it seems to resume fine".
This appears to fix the testcase perfectly, but I don't know whether it could have any undesirable side-effects.
felix
Here's a snippet of SDL_DestroyRenderer from hg revision 10746:7540ff5d0e0e:
SDL_Texture *texture = NULL;
SDL_Texture *nexttexture = NULL;
/* ... */
for (texture = renderer->textures; texture; texture = nexttexture) {
nexttexture = texture->next;
SDL_DestroyTexture(texture);
}
SDL_DestroyTexture removes the texture from the linked list pointed to by the renderer and ends up calling SDL_DestroyTextureInternal, which contains this:
if (texture->native) {
SDL_DestroyTexture(texture->native);
}
If it happens that texture->native is an alias of nexttexture two stack frames up, SDL_DestroyRenderer will end up trying to destroy an already freed texture. I've had this very situation happen in dosemu2.
Bug introduced in revision 10650:a8253d439914, which has a somewhat ironic description of "Fixed all known static analysis bugs"...
Simon Hug
The software renderer produces incorrect results when blending textures at an angle with certain blend modes. It seems that there were some edge cases that weren't considered when the SW_RenderCopyEx function was last changed. Or another bug possibly covered up the problem. (More on that in another bug report.)
Most of the issues come from the fact that the rotating function sets a black colorkey. This is problematic because black is most likely appearing in the surface and the final blit will ignore these pixels. Unless a colorkey is already set (the software renderer currently never sets one), it's very hard to find a free color. Of course it could scan over the whole image until one is found, but that seems inefficient.
The following blend modes have issues when drawn at an angle.
NONE: The black pixels get ignored, making them essentially transparent. This breaks the 'dstRGBA = srcRGBA' definition of the NONE blend mode.
MOD: Again, the black pixels get ignored. This also breaks the 'dstRGB = dstRGB * srcRGB' definition of the MOD blend mode, where black pixels would make the destination black as well. A white colorkey will work though, with some preparations.
BLEND: There are some issues when blending a texture with a translucent RGBA target texture. I - uh - forgot what the problem here exactly is.
This patch fixes the issues mentioned above. It mainly changes the code so it tries to do things without the colorkey and removes the automatic format conversion part from the SDLgfx_rotateSurface function. Getting the format right is something the caller has to do now and the required code has been added to the SW_RenderCopyEx function.
There's a small change to the SW_CreateTexture function. RLE encoding a surface with an alpha mask can be a lossy process. Depending on how the user uses the RGBA channels, this may be undesired. The change that surfaces with an alpha mask don't get encoded makes the software renderer consistent with the other renderers.
The SW_RenderCopyEx function now does these steps: Lock the source surface if necessary. Create a clone of the source by using the pixel buffer directly. Check the format and set a flag if a conversion is necessary. Check if scaling or cropping is necessary and set the flag for that as well. Check if color and alpha modulation has to be done before the rotate. Check if the source is an opaque surface. If not, it creates a mask surface that is necessary for the NONE blend mode. If any of the flags were set, a new surface is created and the source will be converted, scaled, cropped, and modulated. The rest of the function stays somewhat the same. The mask also needs to be rotated of course and then there is the NONE blend mode...
It's surprisingly hard to get the pixel from a rotated surface to the destination buffer without affecting the pixel outside the rotated area. I found a way to do this with three blits which is pretty hard on the performance. Perhaps someone has an idea how to do this faster?
As mentioned above, the SDLgfx_rotateSurface now only takes 8-bit paletted or 32-bit with alpha mask surfaces. It additionally sets the new surfaces up for the MOD blend mode.
I shortly tested the 8-bit path of SDLgfx_rotateSurface and it seemed to work so far. This path is not used by the software renderer anyway.
Sylvain 2016-11-07 08:49:34 UTC
when rotated +90 or -90, some transparent lines appears, though there is no Alpha or ColorKey.
if you set a dummy colorkey, it will remove the line ...
if you set a some alpha mod, the +90/-90 get transparent but not the 0/180 ...
Call SDL_GL_GetDrawableSize() directly because we may be in the initialization path and SDL_GetRendererOutputSize() will fail because the renderer magic isn't set up yet.
Daniel Gibson
Ok, I followed the simple approach of just making SDL_PIXELFORMAT_RGBA32 an alias of SDL_PIXELFORMAT_RGBA8888/SDL_PIXELFORMAT_ABGR8888, depending on endianess. And I did the same for SDL_PIXELFORMAT_ARGB32, .._BGRA, .._ABGR.
SDL_GetPixelFormatName() will of course return SDL_PIXELFORMAT_RGBA8888 (or SDL_PIXELFORMAT_ABGR8888) instead of SDL_PIXELFORMAT_RGBA32, but as long as that's mentioned in the docs it shouldn't be a problem.
D39 does not support negative viewport values which the current implementation relies on.
D3D11 does support negative viewport values so that will continue working.
Refer to Bug 2799.
Adam M.
When doing a rotated texture copy with the software renderer, where the angle is a multiple of 90 degrees, one or two edges of the image get cut off. This is because of the following line in sw_rotate.c:
if ((unsigned)dx < (unsigned)sw && (unsigned)dy < (unsigned)sh) {
which is effectively saying:
if (dx >= 0 && dx < src->w-1 && dy >= 0 && dy < src->h-1) {
As a result, it doesn't process pixels in the right column or bottom row of the source image (except when they're accessed as part of the bilinear filtering for nearby pixels). This causes it to look like the edges are cut off, and it's especially obvious with an exact multiple of 90 degrees.
Nitz
In function SDLgfx_rotateSurface:
rz_dst =
SDL_CreateRGBSurface(SDL_SWSURFACE, dstwidth, dstheight + GUARD_ROWS,
rz_src->format->Rmask, rz_src->format->Gmask,
rz_src->format->Bmask, rz_src->format->Amask);
Here rz_src get De-referenced without NULL check, which is risky.