mirror of
https://github.com/Relintai/pandemonium_engine_docs.git
synced 2025-01-08 15:09:50 +01:00
Cleanups.
parent
94c244f458
commit
a29c7f8dd0
@ -1,10 +1,8 @@
|
|||||||
|
|
||||||
|
|
||||||
General optimization tips
|
# General optimization tips
|
||||||
=========================
|
|
||||||
|
|
||||||
Introduction
|
### Introduction
|
||||||
~~~~~~~~~~~~
|
|
||||||
|
|
||||||
In an ideal world, computers would run at infinite speed. The only limit to
|
In an ideal world, computers would run at infinite speed. The only limit to
|
||||||
what we could achieve would be our imagination. However, in the real world, it's
|
what we could achieve would be our imagination. However, in the real world, it's
|
||||||
@ -22,16 +20,14 @@ To achieve the best results, we have two approaches:
|
|||||||
|
|
||||||
And preferably, we will use a blend of the two.
|
And preferably, we will use a blend of the two.
|
||||||
|
|
||||||
Smoke and mirrors
|
#### Smoke and mirrors
|
||||||
^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
Part of working smarter is recognizing that, in games, we can often get the
|
Part of working smarter is recognizing that, in games, we can often get the
|
||||||
player to believe they're in a world that is far more complex, interactive, and
|
player to believe they're in a world that is far more complex, interactive, and
|
||||||
graphically exciting than it really is. A good programmer is a magician, and
|
graphically exciting than it really is. A good programmer is a magician, and
|
||||||
should strive to learn the tricks of the trade while trying to invent new ones.
|
should strive to learn the tricks of the trade while trying to invent new ones.
|
||||||
|
|
||||||
The nature of slowness
|
#### The nature of slowness
|
||||||
^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
To the outside observer, performance problems are often lumped together.
|
To the outside observer, performance problems are often lumped together.
|
||||||
But in reality, there are several different kinds of performance problems:
|
But in reality, there are several different kinds of performance problems:
|
||||||
@ -45,8 +41,7 @@ But in reality, there are several different kinds of performance problems:
|
|||||||
|
|
||||||
Each of these is annoying to the user, but in different ways.
|
Each of these is annoying to the user, but in different ways.
|
||||||
|
|
||||||
Measuring performance
|
# Measuring performance
|
||||||
=====================
|
|
||||||
|
|
||||||
Probably the most important tool for optimization is the ability to measure
|
Probably the most important tool for optimization is the ability to measure
|
||||||
performance - to identify where bottlenecks are, and to measure the success of
|
performance - to identify where bottlenecks are, and to measure the success of
|
||||||
@ -66,8 +61,7 @@ Be very aware that the relative performance of different areas can vary on
|
|||||||
different hardware. It's often a good idea to measure timings on more than one
|
different hardware. It's often a good idea to measure timings on more than one
|
||||||
device. This is especially the case if you're targeting mobile devices.
|
device. This is especially the case if you're targeting mobile devices.
|
||||||
|
|
||||||
Limitations
|
### Limitations
|
||||||
~~~~~~~~~~~
|
|
||||||
|
|
||||||
CPU profilers are often the go-to method for measuring performance. However,
|
CPU profilers are often the go-to method for measuring performance. However,
|
||||||
they don't always tell the whole story.
|
they don't always tell the whole story.
|
||||||
@ -84,15 +78,13 @@ they don't always tell the whole story.
|
|||||||
As a result of these limitations, you often need to use detective work to find
|
As a result of these limitations, you often need to use detective work to find
|
||||||
out where bottlenecks are.
|
out where bottlenecks are.
|
||||||
|
|
||||||
Detective work
|
### Detective work
|
||||||
~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Detective work is a crucial skill for developers (both in terms of performance,
|
Detective work is a crucial skill for developers (both in terms of performance,
|
||||||
and also in terms of bug fixing). This can include hypothesis testing, and
|
and also in terms of bug fixing). This can include hypothesis testing, and
|
||||||
binary search.
|
binary search.
|
||||||
|
|
||||||
Hypothesis testing
|
#### Hypothesis testing
|
||||||
^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
Say, for example, that you believe sprites are slowing down your game.
|
Say, for example, that you believe sprites are slowing down your game.
|
||||||
You can test this hypothesis by:
|
You can test this hypothesis by:
|
||||||
@ -105,8 +97,7 @@ the performance drop?
|
|||||||
- You can test this by keeping everything the same, but changing the sprite
|
- You can test this by keeping everything the same, but changing the sprite
|
||||||
size, and measuring performance.
|
size, and measuring performance.
|
||||||
|
|
||||||
Binary search
|
#### Binary search
|
||||||
^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
If you know that frames are taking much longer than they should, but you're
|
If you know that frames are taking much longer than they should, but you're
|
||||||
not sure where the bottleneck lies, you could begin by commenting out
|
not sure where the bottleneck lies, you could begin by commenting out
|
||||||
@ -116,8 +107,7 @@ performance improved more or less than expected?
|
|||||||
Once you know which of the two halves contains the bottleneck, you can
|
Once you know which of the two halves contains the bottleneck, you can
|
||||||
repeat this process until you've pinned down the problematic area.
|
repeat this process until you've pinned down the problematic area.
|
||||||
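The halving procedure described above can be sketched in a few lines. This is only an illustration in plain Python, not engine code: `find_bottleneck`, `frame_time`, and the `costs` table are hypothetical names, and the measurement is simulated rather than taken from a running project. It also assumes that a single part dominates the frame time, so that the slower half is the one containing the bottleneck.

```python
def find_bottleneck(parts, frame_time):
    """Narrow down the most expensive part by repeatedly halving.

    parts: names of routines that can be switched on or off.
    frame_time: hypothetical callable that takes the list of enabled
                parts and returns the measured frame time.
    """
    candidates = list(parts)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Measure each half on its own and keep the slower one.
        if frame_time(first) >= frame_time(second):
            candidates = first
        else:
            candidates = second
    return candidates[0]


# Simulated per-frame costs in milliseconds (made-up numbers).
costs = {"ai": 1.0, "particles": 0.5, "pathfinding": 9.0, "ui": 0.3, "audio": 0.2}


def frame_time(enabled):
    # Stand-in for a real measurement: sum the cost of the enabled parts.
    return sum(costs[name] for name in enabled)


slowest = find_bottleneck(list(costs), frame_time)
print(slowest)
```

In a real project, each "measurement" would mean commenting out or disabling half of the routines and reading the frame time from a profiler or a manual timer.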
|
|
||||||
Profilers
|
# Profilers
|
||||||
=========
|
|
||||||
|
|
||||||
Profilers allow you to time your program while running it. Profilers then
|
Profilers allow you to time your program while running it. Profilers then
|
||||||
provide results telling you what percentage of time was spent in different
|
provide results telling you what percentage of time was spent in different
|
||||||
@ -130,8 +120,7 @@ and lead to slower performance.
|
|||||||
|
|
||||||
For more info about using Pandemonium's built-in profiler, see `doc_debugger_panel`.
|
For more info about using Pandemonium's built-in profiler, see `doc_debugger_panel`.
|
||||||
|
|
||||||
Principles
|
# Principles
|
||||||
==========
|
|
||||||
|
|
||||||
`Donald Knuth ( https://en.wikipedia.org/wiki/Donald_Knuth )` said:
|
`Donald Knuth ( https://en.wikipedia.org/wiki/Donald_Knuth )` said:
|
||||||
|
|
||||||
@ -160,8 +149,7 @@ One misleading aspect of the quote is that people tend to focus on the subquote
|
|||||||
optimization is (by definition) undesirable, performant software is the result
|
optimization is (by definition) undesirable, performant software is the result
|
||||||
of performant design.
|
of performant design.
|
||||||
|
|
||||||
Performant design
|
### Performant design
|
||||||
~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The danger with encouraging people to ignore optimization until necessary, is
|
The danger with encouraging people to ignore optimization until necessary, is
|
||||||
that it conveniently ignores that the most important time to consider
|
that it conveniently ignores that the most important time to consider
|
||||||
@ -175,8 +163,7 @@ general programming. A performant design, even without low-level optimization,
|
|||||||
will often run many times faster than a mediocre design with low-level
|
will often run many times faster than a mediocre design with low-level
|
||||||
optimization.
|
optimization.
|
||||||
|
|
||||||
Incremental design
|
### Incremental design
|
||||||
~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Of course, in practice, unless you have prior knowledge, you are unlikely to
|
Of course, in practice, unless you have prior knowledge, you are unlikely to
|
||||||
come up with the best design the first time. Instead, you'll often make a series
|
come up with the best design the first time. Instead, you'll often make a series
|
||||||
@ -192,8 +179,7 @@ to a resurgence in data-oriented design, which involves designing data
|
|||||||
structures and algorithms for *cache locality* of data and linear access, rather
|
structures and algorithms for *cache locality* of data and linear access, rather
|
||||||
than jumping around in memory.
|
than jumping around in memory.
|
||||||
|
|
||||||
The optimization process
|
### The optimization process
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Assuming we have a reasonable design, and taking our lessons from Knuth, our
|
Assuming we have a reasonable design, and taking our lessons from Knuth, our
|
||||||
first step in optimization should be to identify the biggest bottlenecks - the
|
first step in optimization should be to identify the biggest bottlenecks - the
|
||||||
@ -209,8 +195,7 @@ The process is thus:
|
|||||||
2. Optimize bottleneck.
|
2. Optimize bottleneck.
|
||||||
3. Return to step 1.
|
3. Return to step 1.
|
||||||
|
|
||||||
Optimizing bottlenecks
|
### Optimizing bottlenecks
|
||||||
~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Some profilers will even tell you which part of a function (which data accesses,
|
Some profilers will even tell you which part of a function (which data accesses,
|
||||||
calculations) are slowing things down.
|
calculations) are slowing things down.
|
||||||
@ -234,11 +219,9 @@ will increase speed, others may have a negative effect. Sometimes, a small
|
|||||||
positive effect will be outweighed by the negatives of more complex code, and
|
positive effect will be outweighed by the negatives of more complex code, and
|
||||||
you may choose to leave out that optimization.
|
you may choose to leave out that optimization.
|
||||||
|
|
||||||
Appendix
|
# Appendix
|
||||||
========
|
|
||||||
|
|
||||||
Bottleneck math
|
### Bottleneck math
|
||||||
~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The proverb *"a chain is only as strong as its weakest link"* applies directly to
|
The proverb *"a chain is only as strong as its weakest link"* applies directly to
|
||||||
performance optimization. If your project is spending 90% of the time in
|
performance optimization. If your project is spending 90% of the time in
|
||||||
|
@ -1,10 +1,8 @@
|
|||||||
|
|
||||||
|
|
||||||
CPU optimization
|
# CPU optimization
|
||||||
================
|
|
||||||
|
|
||||||
Measuring performance
|
# Measuring performance
|
||||||
=====================
|
|
||||||
|
|
||||||
We have to know where the "bottlenecks" are to know how to speed up our program.
|
We have to know where the "bottlenecks" are to know how to speed up our program.
|
||||||
Bottlenecks are the slowest parts of the program that limit the rate that
|
Bottlenecks are the slowest parts of the program that limit the rate that
|
||||||
@ -15,8 +13,7 @@ lead to small performance improvements.
|
|||||||
|
|
||||||
For the CPU, the easiest way to identify bottlenecks is to use a profiler.
|
For the CPU, the easiest way to identify bottlenecks is to use a profiler.
|
||||||
|
|
||||||
CPU profilers
|
# CPU profilers
|
||||||
=============
|
|
||||||
|
|
||||||
Profilers run alongside your program and take timing measurements to work out
|
Profilers run alongside your program and take timing measurements to work out
|
||||||
what proportion of time is spent in each function.
|
what proportion of time is spent in each function.
|
||||||
@ -28,9 +25,7 @@ slow down your project significantly.
|
|||||||
|
|
||||||
After profiling, you can look back at the results for a frame.
|
After profiling, you can look back at the results for a frame.
|
||||||
|
|
||||||
.. figure:: img/pandemonium_profiler.png)
|
![Screenshot of the Pandemonium profiler](img/pandemonium_profiler.png)
|
||||||
.. figure:: img/pandemonium_profiler.png)
|
|
||||||
:alt: Screenshot of the Pandemonium profiler
|
|
||||||
|
|
||||||
Results of a profile of one of the demo projects.
|
Results of a profile of one of the demo projects.
|
||||||
|
|
||||||
@ -49,8 +44,7 @@ you can usually increase speed by optimizing this area.
|
|||||||
For more info about using Pandemonium's built-in profiler, see
|
For more info about using Pandemonium's built-in profiler, see
|
||||||
`doc_debugger_panel`.
|
`doc_debugger_panel`.
|
||||||
|
|
||||||
External profilers
|
### External profilers
|
||||||
~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Although the Pandemonium IDE profiler is very convenient and useful, sometimes you
|
Although the Pandemonium IDE profiler is very convenient and useful, sometimes you
|
||||||
need more power, and the ability to profile the Pandemonium engine source code itself.
|
need more power, and the ability to profile the Pandemonium engine source code itself.
|
||||||
@ -70,8 +64,7 @@ Note:
|
|||||||
optimized. Bottlenecks are often in a different place in debug builds,
|
optimized. Bottlenecks are often in a different place in debug builds,
|
||||||
so you should profile release builds whenever possible.
|
so you should profile release builds whenever possible.
|
||||||
|
|
||||||
.. figure:: img/valgrind.png)
|
![Screenshot of Callgrind](img/valgrind.png)
|
||||||
:alt: Screenshot of Callgrind
|
|
||||||
|
|
||||||
Example results from Callgrind, which is part of Valgrind.
|
Example results from Callgrind, which is part of Valgrind.
|
||||||
|
|
||||||
@ -97,8 +90,7 @@ done in the graphics API. This specific profiling led to the development of 2D
|
|||||||
batching, which greatly speeds up 2D rendering by reducing bottlenecks in this
|
batching, which greatly speeds up 2D rendering by reducing bottlenecks in this
|
||||||
area.
|
area.
|
||||||
|
|
||||||
Manually timing functions
|
# Manually timing functions
|
||||||
=========================
|
|
||||||
|
|
||||||
Another handy technique, especially once you have identified the bottleneck
|
Another handy technique, especially once you have identified the bottleneck
|
||||||
using a profiler, is to manually time the function or area under test.
|
using a profiler, is to manually time the function or area under test.
|
||||||
@ -125,8 +117,7 @@ As you attempt to optimize functions, be sure to either repeatedly profile or
|
|||||||
time them as you go. This will give you crucial feedback as to whether the
|
time them as you go. This will give you crucial feedback as to whether the
|
||||||
optimization is working (or not).
|
optimization is working (or not).
|
||||||
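As a rough, engine-agnostic illustration of timing a function by hand, the sketch below averages many calls with Python's standard `time.perf_counter()`. The helper `average_time_us` and the workload `sum_of_squares` are hypothetical names chosen for this example. Inside the engine you would use its own clock functions instead.

```python
import time


def average_time_us(func, *args, repeats=1000):
    """Return the average wall-clock time of func(*args) in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1_000_000


def sum_of_squares(n):
    # Placeholder workload standing in for the function under test.
    return sum(i * i for i in range(n))


average_us = average_time_us(sum_of_squares, 1000)
print(f"sum_of_squares: {average_us:.1f} us per call")
```

Running the function many times and averaging reduces the influence of timer resolution and one-off fluctuations, which matters when comparing two versions of the same function.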
|
|
||||||
Caches
|
# Caches
|
||||||
======
|
|
||||||
|
|
||||||
CPU caches are something else to be particularly aware of, especially when
|
CPU caches are something else to be particularly aware of, especially when
|
||||||
comparing timing results of two different versions of a function. The results
|
comparing timing results of two different versions of a function. The results
|
||||||
@ -156,10 +147,9 @@ will be able to work as fast as possible.
|
|||||||
Pandemonium usually takes care of such low-level details for you. For example, the
|
Pandemonium usually takes care of such low-level details for you. For example, the
|
||||||
Server APIs make sure data is optimized for caching already for things like
|
Server APIs make sure data is optimized for caching already for things like
|
||||||
rendering and physics. Still, you should be especially aware of caching when
|
rendering and physics. Still, you should be especially aware of caching when
|
||||||
using `GDNative <toc-tutorials-gdnative )`.
|
using `GDNative ( toc-tutorials-gdnative )`.
|
||||||
|
|
||||||
Languages
|
# Languages
|
||||||
=========
|
|
||||||
|
|
||||||
Pandemonium supports a number of different languages, and it is worth bearing in mind
|
Pandemonium supports a number of different languages, and it is worth bearing in mind
|
||||||
that there are trade-offs involved. Some languages are designed for ease of use
|
that there are trade-offs involved. Some languages are designed for ease of use
|
||||||
@ -169,42 +159,37 @@ Built-in engine functions run at the same speed regardless of the scripting
|
|||||||
language you choose. If your project is making a lot of calculations in its own
|
language you choose. If your project is making a lot of calculations in its own
|
||||||
code, consider moving those calculations to a faster language.
|
code, consider moving those calculations to a faster language.
|
||||||
|
|
||||||
GDScript
|
### GDScript
|
||||||
~~~~~~~~
|
|
||||||
|
|
||||||
`GDScript <toc-learn-scripting-gdscript )` is designed to be easy to use and iterate,
|
`GDScript (toc-learn-scripting-gdscript )` is designed to be easy to use and iterate,
|
||||||
and is ideal for making many types of games. However, in this language, ease of
|
and is ideal for making many types of games. However, in this language, ease of
|
||||||
use is considered more important than performance. If you need to make heavy
|
use is considered more important than performance. If you need to make heavy
|
||||||
calculations, consider moving some of your project to one of the other
|
calculations, consider moving some of your project to one of the other
|
||||||
languages.
|
languages.
|
||||||
|
|
||||||
C#
|
### C#
|
||||||
~~
|
|
||||||
|
|
||||||
`C# <toc-learn-scripting-C# )` is popular and has first-class support in Pandemonium. It
|
`C# (toc-learn-scripting-C# )` is popular and has first-class support in Pandemonium. It
|
||||||
offers a good compromise between speed and ease of use. Beware of possible
|
offers a good compromise between speed and ease of use. Beware of possible
|
||||||
garbage collection pauses and leaks that can occur during gameplay, though. A
|
garbage collection pauses and leaks that can occur during gameplay, though. A
|
||||||
common approach to work around issues with garbage collection is to use *object
|
common approach to work around issues with garbage collection is to use *object
|
||||||
pooling*, which is outside the scope of this guide.
|
pooling*, which is outside the scope of this guide.
|
||||||
|
|
||||||
Other languages
|
### Other languages
|
||||||
~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Third parties provide support for several other languages, including `Rust
|
Third parties provide support for several other languages, including `Rust
|
||||||
( https://github.com/pandemonium-rust/pandemonium-rust )` and `Javascript
|
( https://github.com/pandemonium-rust/pandemonium-rust )` and `Javascript
|
||||||
( https://github.com/PandemoniumExplorer/ECMAScript )`.
|
( https://github.com/PandemoniumExplorer/ECMAScript )`.
|
||||||
|
|
||||||
C++
|
### C++
|
||||||
~~~
|
|
||||||
|
|
||||||
Pandemonium is written in C++. Using C++ will usually result in the fastest code.
|
Pandemonium is written in C++. Using C++ will usually result in the fastest code.
|
||||||
However, on a practical level, it is the most difficult to deploy to end users'
|
However, on a practical level, it is the most difficult to deploy to end users'
|
||||||
machines on different platforms. Options for using C++ include
|
machines on different platforms. Options for using C++ include
|
||||||
`GDNative <toc-tutorials-gdnative )` and
|
`GDNative (toc-tutorials-gdnative )` and
|
||||||
`custom modules ( doc_custom_modules_in_c++ )`.
|
`custom modules ( doc_custom_modules_in_c++ )`.
|
||||||
|
|
||||||
Threads
|
# Threads
|
||||||
=======
|
|
||||||
|
|
||||||
Consider using threads when making a lot of calculations that can run in
|
Consider using threads when making a lot of calculations that can run in
|
||||||
parallel to each other. Modern CPUs have multiple cores, each one capable of
|
parallel to each other. Modern CPUs have multiple cores, each one capable of
|
||||||
@ -222,8 +207,7 @@ debugger doesn't support setting up breakpoints in threads yet.
|
|||||||
|
|
||||||
For more information on threads, see `doc_using_multiple_threads`.
|
For more information on threads, see `doc_using_multiple_threads`.
|
||||||
|
|
||||||
SceneTree
|
# SceneTree
|
||||||
=========
|
|
||||||
|
|
||||||
Although Nodes are an incredibly powerful and versatile concept, be aware that
|
Although Nodes are an incredibly powerful and versatile concept, be aware that
|
||||||
every node has a cost. Built-in functions such as `process()` and
|
every node has a cost. Built-in functions such as `process()` and
|
||||||
@ -244,8 +228,7 @@ This can be very useful for adding and removing areas from a game, for example.
|
|||||||
You can avoid the SceneTree altogether by using Server APIs. For more
|
You can avoid the SceneTree altogether by using Server APIs. For more
|
||||||
information, see `doc_using_servers`.
|
information, see `doc_using_servers`.
|
||||||
|
|
||||||
Physics
|
# Physics
|
||||||
=======
|
|
||||||
|
|
||||||
In some situations, physics can end up becoming a bottleneck. This is
|
In some situations, physics can end up becoming a bottleneck. This is
|
||||||
particularly the case with complex worlds and large numbers of physics objects.
|
particularly the case with complex worlds and large numbers of physics objects.
|
||||||
|
@ -1,10 +1,8 @@
|
|||||||
|
|
||||||
|
|
||||||
GPU optimization
|
# GPU optimization
|
||||||
================
|
|
||||||
|
|
||||||
Introduction
|
### Introduction
|
||||||
~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The demand for new graphics features and progress almost guarantees that you
|
The demand for new graphics features and progress almost guarantees that you
|
||||||
will encounter graphics bottlenecks. Some of these can be on the CPU side, for
|
will encounter graphics bottlenecks. Some of these can be on the CPU side, for
|
||||||
@ -22,8 +20,7 @@ indirectly by changing the instructions you give to the GPU. Also, it may be
|
|||||||
more difficult to take measurements. In many cases, the only way of measuring
|
more difficult to take measurements. In many cases, the only way of measuring
|
||||||
performance is by examining changes in the time spent rendering each frame.
|
performance is by examining changes in the time spent rendering each frame.
|
||||||
|
|
||||||
Draw calls, state changes, and APIs
|
# Draw calls, state changes, and APIs
|
||||||
===================================
|
|
||||||
|
|
||||||
Note:
|
Note:
|
||||||
The following section is not relevant to end-users, but is useful to
|
The following section is not relevant to end-users, but is useful to
|
||||||
@ -42,8 +39,7 @@ reduce these instructions to a bare minimum and group together similar objects
|
|||||||
as much as possible so they can be rendered together, or with the minimum number
|
as much as possible so they can be rendered together, or with the minimum number
|
||||||
of these expensive state changes.
|
of these expensive state changes.
|
||||||
|
|
||||||
2D batching
|
### 2D batching
|
||||||
~~~~~~~~~~~
|
|
||||||
|
|
||||||
In 2D, the costs of treating each item individually can be prohibitively high -
|
In 2D, the costs of treating each item individually can be prohibitively high -
|
||||||
there can easily be thousands of them on the screen. This is why 2D *batching*
|
there can easily be thousands of them on the screen. This is why 2D *batching*
|
||||||
@ -54,8 +50,7 @@ to a minimum.
|
|||||||
|
|
||||||
For more information on 2D batching, see `doc_batching`.
|
For more information on 2D batching, see `doc_batching`.
|
||||||
|
|
||||||
3D batching
|
### 3D batching
|
||||||
~~~~~~~~~~~
|
|
||||||
|
|
||||||
In 3D, we still aim to minimize draw calls and state changes. However, it can be
|
In 3D, we still aim to minimize draw calls and state changes. However, it can be
|
||||||
more difficult to batch together several objects into a single draw call. 3D
|
more difficult to batch together several objects into a single draw call. 3D
|
||||||
@ -76,8 +71,7 @@ numbers of distant or low-poly objects.
|
|||||||
For more information on 3D specific optimizations, see
|
For more information on 3D specific optimizations, see
|
||||||
`doc_optimizing_3d_performance`.
|
`doc_optimizing_3d_performance`.
|
||||||
|
|
||||||
Reuse Shaders and Materials
|
### Reuse Shaders and Materials
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The Pandemonium renderer is a little different to what is out there. It's designed to
|
The Pandemonium renderer is a little different to what is out there. It's designed to
|
||||||
minimize GPU state changes as much as possible. `SpatialMaterial
|
minimize GPU state changes as much as possible. `SpatialMaterial
|
||||||
@ -99,8 +93,7 @@ If a scene has, for example, `20,000` objects with `20,000` different
|
|||||||
materials each, rendering will be slow. If the same scene has `20,000`
|
materials each, rendering will be slow. If the same scene has `20,000`
|
||||||
objects, but only uses `100` materials, rendering will be much faster.
|
objects, but only uses `100` materials, rendering will be much faster.
|
||||||
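To make the difference concrete, here is a deliberately simplified model in plain Python. It assumes the renderer sorts objects by material and only switches material when the next object uses a different one. The function `count_material_binds` is a hypothetical helper, and a real renderer sorts by more criteria than this.

```python
def count_material_binds(material_ids):
    """Count material switches when objects are drawn sorted by material."""
    binds = 0
    previous = None
    for material in sorted(material_ids):
        if material != previous:
            binds += 1
            previous = material
    return binds


# 20,000 objects, each with its own material.
unique = count_material_binds(range(20_000))

# 20,000 objects sharing 100 materials.
shared = count_material_binds(i % 100 for i in range(20_000))

print(unique, shared)
```

Under this model, the first scene needs one material switch per object, while the second needs only one per material, which is why sharing materials lets the renderer skip most state changes.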
|
|
||||||
Pixel cost versus vertex cost
|
# Pixel cost versus vertex cost
|
||||||
=============================
|
|
||||||
|
|
||||||
You may have heard that the lower the number of polygons in a model, the faster
|
You may have heard that the lower the number of polygons in a model, the faster
|
||||||
it will be rendered. This is *really* relative and depends on many factors.
|
it will be rendered. This is *really* relative and depends on many factors.
|
||||||
@ -155,8 +148,7 @@ Pay attention to the additional vertex processing required when using:
|
|||||||
- Morphs (shape keys)
|
- Morphs (shape keys)
|
||||||
- Vertex-lit objects (common on mobile)
|
- Vertex-lit objects (common on mobile)
|
||||||
|
|
||||||
Pixel/fragment shaders and fill rate
|
# Pixel/fragment shaders and fill rate
|
||||||
====================================
|
|
||||||
|
|
||||||
In contrast to vertex processing, the costs of fragment (per-pixel) shading have
|
In contrast to vertex processing, the costs of fragment (per-pixel) shading have
|
||||||
increased dramatically over the years. Screen resolutions have increased (the
|
increased dramatically over the years. Screen resolutions have increased (the
|
||||||
@ -182,8 +174,7 @@ amount of work the GPU has to do. You can do this by simplifying the shader
|
|||||||
**When targeting mobile devices, consider using the simplest possible shaders
|
**When targeting mobile devices, consider using the simplest possible shaders
|
||||||
you can reasonably afford to use.**
|
you can reasonably afford to use.**
|
||||||
|
|
||||||
Reading textures
|
### Reading textures
|
||||||
~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The other factor in fragment shaders is the cost of reading textures. Reading
|
The other factor in fragment shaders is the cost of reading textures. Reading
|
||||||
textures is an expensive operation, especially when reading from several
|
textures is an expensive operation, especially when reading from several
|
||||||
@ -195,8 +186,7 @@ mobiles.
|
|||||||
**If you use third-party shaders or write your own shaders, try to use
|
**If you use third-party shaders or write your own shaders, try to use
|
||||||
algorithms that require as few texture reads as possible.**
|
algorithms that require as few texture reads as possible.**
|
||||||
|
|
||||||
Texture compression
|
### Texture compression
|
||||||
~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
By default, Pandemonium compresses textures of 3D models when imported using video RAM
|
By default, Pandemonium compresses textures of 3D models when imported using video RAM
|
||||||
(VRAM) compression. Video RAM compression isn't as efficient in size as PNG or
|
(VRAM) compression. Video RAM compression isn't as efficient in size as PNG or
|
||||||
@ -222,8 +212,7 @@ Note:
|
|||||||
significantly due to their low resolution.
|
significantly due to their low resolution.
|
||||||
|
|
||||||
|
|
||||||
Post-processing and shadows
|
### Post-processing and shadows
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Post-processing effects and shadows can also be expensive in terms of fragment
|
Post-processing effects and shadows can also be expensive in terms of fragment
|
||||||
shading activity. Always test the impact of these on different hardware.
|
shading activity. Always test the impact of these on different hardware.
|
||||||
@ -234,8 +223,7 @@ performance of shadows is to turn shadows off for as many lights and objects as
|
|||||||
possible. Smaller or distant OmniLights/SpotLights can often have their shadows
|
possible. Smaller or distant OmniLights/SpotLights can often have their shadows
|
||||||
disabled with only a small visual impact.
|
disabled with only a small visual impact.
|
||||||
|
|
||||||
Transparency and blending
|
# Transparency and blending
|
||||||
=========================
|
|
||||||
|
|
||||||
Transparent objects present particular problems for rendering efficiency. Opaque
|
Transparent objects present particular problems for rendering efficiency. Opaque
|
||||||
objects (especially in 3D) can be essentially rendered in any order and the
|
objects (especially in 3D) can be essentially rendered in any order and the
|
||||||
@ -259,8 +247,7 @@ minimize these fill rate requirements, especially on mobile, where fill rate is
|
|||||||
very expensive. Indeed, in many situations, rendering more complex opaque
|
very expensive. Indeed, in many situations, rendering more complex opaque
|
||||||
geometry can end up being faster than using transparency to "cheat".
|
geometry can end up being faster than using transparency to "cheat".
|
||||||
|
|
||||||
Multi-platform advice
|
# Multi-platform advice
|
||||||
=====================
|
|
||||||
|
|
||||||
If you are aiming to release on multiple platforms, test *early* and test
|
If you are aiming to release on multiple platforms, test *early* and test
|
||||||
*often* on all your platforms, especially mobile. Developing a game on desktop
|
*often* on all your platforms, especially mobile. Developing a game on desktop
|
||||||
@ -271,8 +258,7 @@ add optional enhancements for more powerful platforms. For example, you may want
|
|||||||
to use the GLES2 backend for both desktop and mobile platforms where you target
|
to use the GLES2 backend for both desktop and mobile platforms where you target
|
||||||
both.
|
both.
|
||||||
|
|
||||||
Mobile/tiled renderers
|
# Mobile/tiled renderers
|
||||||
======================
|
|
||||||
|
|
||||||
As described above, GPUs on mobile devices work in dramatically different ways
|
As described above, GPUs on mobile devices work in dramatically different ways
|
||||||
from GPUs on desktop. Most mobile devices use tile renderers. Tile renderers
|
from GPUs on desktop. Most mobile devices use tile renderers. Tile renderers
|
||||||
|
@ -1,7 +1,6 @@
|
|||||||
|
|
||||||
|
|
||||||
Optimization using MultiMeshes
|
# Optimization using MultiMeshes
|
||||||
==============================
|
|
||||||
|
|
||||||
For a large number of instances (in the thousands) that need to be constantly processed
|
For a large number of instances (in the thousands) that need to be constantly processed
|
||||||
(and where a certain amount of control needs to be retained),
|
(and where a certain amount of control needs to be retained),
|
||||||
@ -11,8 +10,7 @@ When the amount of objects reach the hundreds of thousands or millions,
|
|||||||
none of these approaches are efficient anymore. Still, depending on the requirements, there
|
none of these approaches are efficient anymore. Still, depending on the requirements, there
|
||||||
is one more optimization possible.
|
is one more optimization possible.
|
||||||
|
|
||||||
MultiMeshes
|
## MultiMeshes
|
||||||
-----------
|
|
||||||
|
|
||||||
A `MultiMesh( MultiMesh )` is a single draw primitive that can draw up to millions
|
A `MultiMesh( MultiMesh )` is a single draw primitive that can draw up to millions
|
||||||
of objects in one go. It's extremely efficient because it uses the GPU hardware to do this
|
of objects in one go. It's extremely efficient because it uses the GPU hardware to do this
|
||||||
@ -44,8 +42,7 @@ controlled with the `MultiMesh.visible_instance_count`
|
|||||||
property. The typical workflow is to allocate the maximum amount of instances that will be used,
|
property. The typical workflow is to allocate the maximum amount of instances that will be used,
|
||||||
then change the amount visible depending on how many are currently needed.
|
then change the amount visible depending on how many are currently needed.
|
||||||
|
|
||||||
Multimesh example
|
## Multimesh example
|
||||||
-----------------
|
|
||||||
|
|
||||||
Here is an example of using a MultiMesh from code. Languages other than GDScript may be more
|
Here is an example of using a MultiMesh from code. Languages other than GDScript may be more
|
||||||
efficient for millions of objects, but for a few thousand, GDScript should be fine.
|
efficient for millions of objects, but for a few thousand, GDScript should be fine.
|
||||||
|
@ -1,10 +1,7 @@
|
|||||||
|
|
||||||
|
# Optimization using batching
|
||||||
|
|
||||||
Optimization using batching
|
### Introduction
|
||||||
===========================
|
|
||||||
|
|
||||||
Introduction
|
|
||||||
~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Game engines have to send a set of instructions to the GPU to tell the GPU what
|
Game engines have to send a set of instructions to the GPU to tell the GPU what
|
||||||
and where to draw. These instructions are sent using common instructions called
|
and where to draw. These instructions are sent using common instructions called
|
||||||
@ -16,8 +13,7 @@ of work for the user in the GPU driver at the cost of more expensive draw calls.
|
|||||||
As a result, applications can often be sped up by reducing the number of draw
|
As a result, applications can often be sped up by reducing the number of draw
|
||||||
calls.
|
calls.
|
||||||
|
|
||||||
Draw calls
|
#### Draw calls
|
||||||
^^^^^^^^^^
|
|
||||||
|
|
||||||
In 2D, we need to tell the GPU to render a series of primitives (rectangles,
|
In 2D, we need to tell the GPU to render a series of primitives (rectangles,
|
||||||
lines, polygons etc). The most obvious technique is to tell the GPU to render
|
lines, polygons etc). The most obvious technique is to tell the GPU to render
|
||||||
@ -41,8 +37,7 @@ automatically group together primitives wherever possible and send these batches
|
|||||||
on to the GPU. This can give an increase in rendering performance while
|
on to the GPU. This can give an increase in rendering performance while
|
||||||
requiring few (if any) changes to your Pandemonium project.
|
requiring few (if any) changes to your Pandemonium project.
|
||||||
|
|
||||||
How it works
|
### How it works
|
||||||
~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Instructions come into the renderer from your game in the form of a series of
|
Instructions come into the renderer from your game in the form of a series of
|
||||||
items, each of which can contain one or more commands. The items correspond to
|
items, each of which can contain one or more commands. The items correspond to
|
||||||
@ -56,8 +51,7 @@ The batcher uses two main techniques to group together primitives:
|
|||||||
- Consecutive items can be joined together.
|
- Consecutive items can be joined together.
|
||||||
- Consecutive commands within an item can be joined to form a batch.
|
- Consecutive commands within an item can be joined to form a batch.
|
||||||
|
|
||||||
Breaking batching
|
#### Breaking batching
|
||||||
^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
Batching can only take place if the items or commands are similar enough to be
|
Batching can only take place if the items or commands are similar enough to be
|
||||||
rendered in one draw call. Certain changes (or techniques), by necessity, prevent
|
rendered in one draw call. Certain changes (or techniques), by necessity, prevent
|
||||||
@ -75,8 +69,7 @@ Note:
|
|||||||
For example, if you draw a series of sprites each with a different texture,
|
For example, if you draw a series of sprites each with a different texture,
|
||||||
there is no way they can be batched.
|
there is no way they can be batched.
|
||||||
|
|
||||||
Determining the rendering order
|
#### Determining the rendering order
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
The question arises, if only similar items can be drawn together in a batch, why
|
The question arises, if only similar items can be drawn together in a batch, why
|
||||||
don't we look through all the items in a scene, group together all the similar
|
don't we look through all the items in a scene, group together all the similar
|
||||||
@ -104,8 +97,7 @@ Note:
|
|||||||
can improve performance in some cases. See the
|
can improve performance in some cases. See the
|
||||||
`doc_batching_diagnostics` section to help you make this decision.
|
`doc_batching_diagnostics` section to help you make this decision.
|
||||||
|
|
||||||
A trick
|
#### A trick
|
||||||
^^^^^^^
|
|
||||||
|
|
||||||
And now, a sleight of hand. Even though the idea of painter's order is that
|
And now, a sleight of hand. Even though the idea of painter's order is that
|
||||||
objects are rendered from back to front, consider 3 objects `A`, `B` and
|
objects are rendered from back to front, consider 3 objects `A`, `B` and
|
||||||
@ -129,8 +121,7 @@ drawn *on top* of each other. If we relax that assumption, i.e. if none of these
|
|||||||
3 objects are overlapping, there is *no need* to preserve painter's order. The
|
3 objects are overlapping, there is *no need* to preserve painter's order. The
|
||||||
rendered result will be the same. What if we could take advantage of this?
|
rendered result will be the same. What if we could take advantage of this?
|
||||||
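As a rough illustration of the idea above, the overlap check could be sketched as follows. `can_swap_draw_order` is a hypothetical helper, not the batcher's actual implementation, and it assumes each item is described by an axis-aligned `Rect2` bounding box.

```gdscript
# Hypothetical helper for illustration only; the engine's own overlap test
# is internal to the renderer and is not shown here.
# Assumes each item can be described by an axis-aligned bounding rectangle.
func can_swap_draw_order(bounds_a: Rect2, bounds_b: Rect2) -> bool:
    # If the rectangles do not intersect, neither item can cover the other,
    # so drawing them in either order produces the same pixels.
    return not bounds_a.intersects(bounds_b)
```

For instance, `Rect2(0, 0, 10, 10)` and `Rect2(20, 0, 10, 10)` do not intersect, so the two items could be drawn in either order, while `Rect2(0, 0, 10, 10)` and `Rect2(5, 5, 10, 10)` overlap and must keep painter's order.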
|
|
||||||
Item reordering
|
#### Item reordering
|
||||||
^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
![](img/overlap2.png)
|
![](img/overlap2.png)
|
||||||
|
|
||||||
@ -152,8 +143,7 @@ balance the costs and benefits in your project.
|
|||||||
Since the texture only changes once, we can render the above in only 2 draw
|
Since the texture only changes once, we can render the above in only 2 draw
|
||||||
calls.
|
calls.
|
||||||
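The relationship between texture changes and draw calls can be sketched with a small counting function. This is a simplified model, not the engine's batcher: `count_draw_calls` is a hypothetical name, textures are represented by plain strings, and only consecutive items with the same texture are assumed to be joinable.

```gdscript
# Hypothetical helper for illustration; not part of the engine API.
# Counts the draw calls a simple batcher would need if it may only join
# consecutive items that share the same texture.
func count_draw_calls(item_textures: Array) -> int:
    var draw_calls := 0
    var previous = null
    for texture in item_textures:
        # The first item, or any change of texture, starts a new batch.
        if draw_calls == 0 or texture != previous:
            draw_calls += 1
        previous = texture
    return draw_calls
```

With the items ordered as `["wood", "wood", "grass", "grass"]` the texture changes once and the function returns 2. With the same items interleaved as `["wood", "grass", "wood", "grass"]` it returns 4, which is why reordering non-overlapping items can pay off.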
|
|
||||||
Lights
|
### Lights
|
||||||
~~~~~~
|
|
||||||
|
|
||||||
Although the batching system's job is normally quite straightforward, it becomes
|
Although the batching system's job is normally quite straightforward, it becomes
|
||||||
considerably more complex when 2D lights are used. This is because lights are
|
considerably more complex when 2D lights are used. This is because lights are
|
||||||
@ -207,8 +197,7 @@ that in a real game, you might be drawing closer to 1,000 sprites.
|
|||||||
That is a 1000× decrease in draw calls, and should give a huge increase in
|
That is a 1000× decrease in draw calls, and should give a huge increase in
|
||||||
performance.
|
performance.
|
||||||
|
|
||||||
Overlap test
|
#### Overlap test
|
||||||
^^^^^^^^^^^^
|
|
||||||
|
|
||||||
However, as with the item reordering, things are not that simple. We must first
|
However, as with the item reordering, things are not that simple. We must first
|
||||||
perform the overlap test to determine whether we can join these primitives. This
|
perform the overlap test to determine whether we can join these primitives. This
|
||||||
@ -222,8 +211,7 @@ therefore shouldn't be joined). In practice, the decrease in draw calls may be
|
|||||||
less dramatic than in a perfect situation with no overlapping at all. However,
|
less dramatic than in a perfect situation with no overlapping at all. However,
|
||||||
performance is usually far higher than without this lighting optimization.
|
performance is usually far higher than without this lighting optimization.
|
||||||
|
|
||||||
Light scissoring
|
### Light scissoring
|
||||||
~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Batching can make it more difficult to cull out objects that are not affected or
|
Batching can make it more difficult to cull out objects that are not affected or
|
||||||
partially affected by a light. This can increase the fill rate requirements
|
partially affected by a light. This can increase the fill rate requirements
|
||||||
@ -257,14 +245,12 @@ The exact relationship is probably not necessary for users to worry about, but
|
|||||||
is included in the appendix out of interest:
|
is included in the appendix out of interest:
|
||||||
`doc_batching_light_scissoring_threshold_calculation`
|
`doc_batching_light_scissoring_threshold_calculation`
|
||||||
|
|
||||||
.. figure:: img/scissoring.png)
|
![Light scissoring example diagram](img/scissoring.png)
|
||||||
:alt: Light scissoring example diagram
|
|
||||||
|
|
||||||
Bottom right is a light, the red area is the pixels saved by the scissoring
|
Bottom right is a light, the red area is the pixels saved by the scissoring
|
||||||
operation. Only the intersection needs to be rendered.
|
operation. Only the intersection needs to be rendered.
|
||||||
|
|
||||||
Vertex baking
|
### Vertex baking
|
||||||
~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The GPU shader receives instructions on what to draw in 2 main ways:
|
The GPU shader receives instructions on what to draw in 2 main ways:
|
||||||
|
|
||||||
@ -290,8 +276,7 @@ In most cases, this works fine, but this shortcut breaks down if a shader expect
|
|||||||
these values to be available individually rather than combined. This can happen
|
these values to be available individually rather than combined. This can happen
|
||||||
in custom shaders.
|
in custom shaders.
|
||||||
|
|
||||||
Custom shaders
|
#### Custom shaders
|
||||||
^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
As a result of the limitation described above, certain operations in custom
|
As a result of the limitation described above, certain operations in custom
|
||||||
shaders will prevent vertex baking and therefore decrease the potential for
|
shaders will prevent vertex baking and therefore decrease the potential for
|
||||||
@ -301,8 +286,7 @@ currently apply:
|
|||||||
- Reading or writing `COLOR` or `MODULATE` disables vertex color baking.
|
- Reading or writing `COLOR` or `MODULATE` disables vertex color baking.
|
||||||
- Reading `VERTEX` disables vertex position baking.
|
- Reading `VERTEX` disables vertex position baking.
|
||||||
|
|
||||||
Project Settings
|
### Project Settings
|
||||||
~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
To fine-tune batching, a number of project settings are available. You can
|
To fine-tune batching, a number of project settings are available. You can
|
||||||
usually leave these at default during development, but it's a good idea to
|
usually leave these at default during development, but it's a good idea to
|
||||||
@ -311,8 +295,7 @@ tweaking parameters can often give considerable performance gains for very
|
|||||||
little effort. See the on-hover tooltips in the Project Settings for more
|
little effort. See the on-hover tooltips in the Project Settings for more
|
||||||
information.
|
information.
|
||||||
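When comparing builds, it can help to log which batching settings are actually in effect. The sketch below assumes the full setting path is the category name followed by the setting name (`rendering/batching/options/use_batching`); that path is an assumption based on the headings in this document, so the code checks that it exists before reading it.

```gdscript
extends Node

# Assumed setting path: the "rendering/batching/options" category plus the
# "use_batching" entry described in this document.
const USE_BATCHING_SETTING = "rendering/batching/options/use_batching"

func _ready():
    # has_setting() guards against the path not existing in this build.
    if ProjectSettings.has_setting(USE_BATCHING_SETTING):
        print("Batching enabled: ", ProjectSettings.get_setting(USE_BATCHING_SETTING))
    else:
        print("Setting not found: ", USE_BATCHING_SETTING)
```

Attaching this script to any node in the main scene prints the value once at startup, which makes it easier to confirm that a test run used the settings you intended.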
|
|
||||||
rendering/batching/options
|
#### rendering/batching/options
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- `use_batching
|
- `use_batching
|
||||||
` -
|
` -
|
||||||
@ -328,8 +311,7 @@ rendering/batching/options
|
|||||||
This is a faster way of drawing unbatchable rectangles. However, it may lead
|
This is a faster way of drawing unbatchable rectangles. However, it may lead
|
||||||
to flicker on some hardware so it's not recommended.
|
to flicker on some hardware so it's not recommended.
|
||||||
|
|
||||||
rendering/batching/parameters
|
#### rendering/batching/parameters
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- `max_join_item_commands` -
|
- `max_join_item_commands` -
|
||||||
One of the most important ways of achieving batching is to join suitable
|
One of the most important ways of achieving batching is to join suitable
|
||||||
@ -358,8 +340,7 @@ rendering/batching/parameters
|
|||||||
textures. The lookahead for the overlap test has a small cost, so the best
|
textures. The lookahead for the overlap test has a small cost, so the best
|
||||||
value may change per project.
|
value may change per project.
|
||||||
|
|
||||||
rendering/batching/lights
|
#### rendering/batching/lights
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- `scissor_area_threshold
|
- `scissor_area_threshold
|
||||||
` -
|
` -
|
||||||
@ -372,8 +353,7 @@ rendering/batching/lights
|
|||||||
costs and benefits may be project dependent, and hence the best value to use
|
costs and benefits may be project dependent, and hence the best value to use
|
||||||
here.
|
here.
|
||||||
|
|
||||||
rendering/batching/debug
|
#### rendering/batching/debug
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- `flash_batching
|
- `flash_batching
|
||||||
` -
|
` -
|
||||||
@ -387,8 +367,7 @@ rendering/batching/debug
|
|||||||
This will periodically print a diagnostic batching log to
|
This will periodically print a diagnostic batching log to
|
||||||
the Pandemonium IDE / console.
|
the Pandemonium IDE / console.
|
||||||
|
|
||||||
rendering/batching/precision
|
#### rendering/batching/precision
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- `uv_contract
|
- `uv_contract
|
||||||
` -
|
` -
|
||||||
@ -405,8 +384,7 @@ rendering/batching/precision
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
Diagnostics
|
### Diagnostics
|
||||||
~~~~~~~~~~~
|
|
||||||
|
|
||||||
Although you can change parameters and examine the effect on frame rate, this
|
Although you can change parameters and examine the effect on frame rate, this
|
||||||
can feel like working blindly, with no idea of what is going on under the hood.
|
can feel like working blindly, with no idea of what is going on under the hood.
|
||||||
@ -415,8 +393,7 @@ print out (to the IDE or console) a list of the batches that are being
|
|||||||
processed. This can help pinpoint situations where batching isn't occurring
|
processed. This can help pinpoint situations where batching isn't occurring
|
||||||
as intended, and help you fix these situations to get the best possible performance.
|
as intended, and help you fix these situations to get the best possible performance.
|
||||||
|
|
||||||
Reading a diagnostic
|
#### Reading a diagnostic
|
||||||
^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
```
|
```
|
||||||
canvas_begin FRAME 2604
|
canvas_begin FRAME 2604
|
||||||
@ -456,8 +433,7 @@ This is a typical diagnostic.
|
|||||||
- **batch D:** A default batch, containing everything else that is not currently
|
- **batch D:** A default batch, containing everything else that is not currently
|
||||||
batched.
|
batched.
|
||||||
|
|
||||||
Default batches
|
#### Default batches
|
||||||
^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
The second number following default batches is the number of commands in the
|
The second number following default batches is the number of commands in the
|
||||||
batch, and it is followed by a brief summary of the contents:
|
batch, and it is followed by a brief summary of the contents:
|
||||||
@ -479,19 +455,16 @@ batch, and it is followed by a brief summary of the contents:
|
|||||||
|
|
||||||
You may see "dummy" default batches containing no commands; you can ignore those.
|
You may see "dummy" default batches containing no commands; you can ignore those.
|
||||||
|
|
||||||
Frequently asked questions
|
### Frequently asked questions
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
I don't get a large performance increase when enabling batching.
|
#### I don't get a large performance increase when enabling batching.
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- Try the diagnostics, see how much batching is occurring, and whether it can be
|
- Try the diagnostics, see how much batching is occurring, and whether it can be
|
||||||
improved
|
improved
|
||||||
- Try changing batching parameters in the Project Settings.
|
- Try changing batching parameters in the Project Settings.
|
||||||
- Consider that batching may not be your bottleneck (see bottlenecks).
|
- Consider that batching may not be your bottleneck (see bottlenecks).
|
||||||
|
|
||||||
I get a decrease in performance with batching.
|
#### I get a decrease in performance with batching.
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- Try the steps described above to increase the number of batching opportunities.
|
- Try the steps described above to increase the number of batching opportunities.
|
||||||
- Try enabling `single_rect_fallback
|
- Try enabling `single_rect_fallback
|
||||||
@ -502,29 +475,24 @@ I get a decrease in performance with batching.
|
|||||||
- After trying the above, if your scene is still performing worse, consider
|
- After trying the above, if your scene is still performing worse, consider
|
||||||
turning off batching.
|
turning off batching.
|
||||||
|
|
||||||
I use custom shaders and the items are not batching.
|
#### I use custom shaders and the items are not batching.
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- Custom shaders can be problematic for batching, see the custom shaders section
|
- Custom shaders can be problematic for batching, see the custom shaders section
|
||||||
|
|
||||||
I am seeing line artifacts appear on certain hardware.
|
#### I am seeing line artifacts appear on certain hardware.
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- See the `uv_contract
|
- See the `uv_contract
|
||||||
`
|
`
|
||||||
project setting which can be used to solve this problem.
|
project setting which can be used to solve this problem.
|
||||||
|
|
||||||
I use a large number of textures, so few items are being batched.
|
#### I use a large number of textures, so few items are being batched.
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
- Consider using texture atlases. As well as allowing batching, these
|
- Consider using texture atlases. As well as allowing batching, these
|
||||||
reduce the need for state changes associated with changing textures.
|
reduce the need for state changes associated with changing textures.
|
||||||
|
|
||||||
Appendix
|
### Appendix
|
||||||
~~~~~~~~
|
|
||||||
|
|
||||||
Batched primitives
|
#### Batched primitives
|
||||||
^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
Not all primitives can be batched. Batching is not guaranteed either,
|
Not all primitives can be batched. Batching is not guaranteed either,
|
||||||
especially with primitives using an antialiased border. The following
|
especially with primitives using an antialiased border. The following
|
||||||
@ -541,8 +509,7 @@ See `doc_custom_drawing_in_2d` for more information.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
Light scissoring threshold calculation
|
#### Light scissoring threshold calculation
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
The actual proportion of screen pixel area used as the threshold is the
|
The actual proportion of screen pixel area used as the threshold is the
|
||||||
`scissor_area_threshold
|
`scissor_area_threshold
|
||||||
|
@ -1,21 +1,14 @@
|
|||||||
.. meta::
|
|
||||||
:keywords: optimization
|
|
||||||
|
|
||||||
|
# Optimizing 3D performance
|
||||||
|
|
||||||
|
# Culling
|
||||||
Optimizing 3D performance
|
|
||||||
=========================
|
|
||||||
|
|
||||||
Culling
|
|
||||||
=======
|
|
||||||
|
|
||||||
Pandemonium will automatically perform view frustum culling in order to prevent
|
Pandemonium will automatically perform view frustum culling in order to prevent
|
||||||
rendering objects that are outside the viewport. This works well for games that
|
rendering objects that are outside the viewport. This works well for games that
|
||||||
take place in a small area; however, things can quickly become problematic in
|
take place in a small area; however, things can quickly become problematic in
|
||||||
larger levels.
|
larger levels.
|
||||||
|
|
||||||
Occlusion culling
|
### Occlusion culling
|
||||||
~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Walking around a town for example, you may only be able to see a few buildings
|
Walking around a town for example, you may only be able to see a few buildings
|
||||||
in the street you are in, as well as the sky and a few birds flying overhead. As
|
in the street you are in, as well as the sky and a few birds flying overhead. As
|
||||||
@ -44,8 +37,7 @@ It is a very powerful technique for speeding up rendering. You can also use it t
|
|||||||
restrict physics or AI to the local area, and speed these up as well as
|
restrict physics or AI to the local area, and speed these up as well as
|
||||||
rendering.
|
rendering.
|
||||||
|
|
||||||
Portal Rendering
|
### Portal Rendering
|
||||||
~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
However, there is a much easier way to take advantage of occlusion. Pandemonium features
|
However, there is a much easier way to take advantage of occlusion. Pandemonium features
|
||||||
an advanced portal rendering system, which can perform occlusion culling from cameras and
|
an advanced portal rendering system, which can perform occlusion culling from cameras and
|
||||||
@ -62,15 +54,13 @@ Note:
|
|||||||
from seeing too far away, which would decrease performance due to the lost
|
from seeing too far away, which would decrease performance due to the lost
|
||||||
opportunities for occlusion culling.
|
opportunities for occlusion culling.
|
||||||
|
|
||||||
Other occlusion techniques
|
### Other occlusion techniques
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
As well as the portal system and manual methods, there are various other occlusion
|
As well as the portal system and manual methods, there are various other occlusion
|
||||||
techniques such as raster-based occlusion culling. Some of these may be available
|
techniques such as raster-based occlusion culling. Some of these may be available
|
||||||
through add-ons or may be available in core Pandemonium in the future.
|
through add-ons or may be available in core Pandemonium in the future.
|
||||||
|
|
||||||
Transparent objects
|
### Transparent objects
|
||||||
~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Pandemonium sorts objects by `Material` and `Shader
|
Pandemonium sorts objects by `Material` and `Shader
|
||||||
( Shader )` to improve performance. This, however, cannot be done with
|
( Shader )` to improve performance. This, however, cannot be done with
|
||||||
@ -83,8 +73,7 @@ with its own material.
|
|||||||
For more information, see the `GPU optimizations ( doc_gpu_optimization )`
|
For more information, see the `GPU optimizations ( doc_gpu_optimization )`
|
||||||
doc.
|
doc.
|
||||||
|
|
||||||
Level of detail (LOD)
|
# Level of detail (LOD)
|
||||||
=====================
|
|
||||||
|
|
||||||
In some situations, particularly at a distance, it can be a good idea to
|
In some situations, particularly at a distance, it can be a good idea to
|
||||||
**replace complex geometry with simpler versions**. The end user will probably
|
**replace complex geometry with simpler versions**. The end user will probably
|
||||||
@ -93,8 +82,7 @@ in the far distance. There are several strategies for replacing models at
|
|||||||
varying distance. You could use lower poly models, or use transparency to
|
varying distance. You could use lower poly models, or use transparency to
|
||||||
simulate more complex geometry.
|
simulate more complex geometry.
|
||||||
|
|
||||||
Billboards and imposters
|
### Billboards and imposters
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The simplest version of using transparency to deal with LOD is billboards. For
|
The simplest version of using transparency to deal with LOD is billboards. For
|
||||||
example, you can use a single transparent quad to represent a tree at distance.
|
example, you can use a single transparent quad to represent a tree at distance.
|
||||||
@ -113,8 +101,7 @@ the viewer a considerable distance for the angle of view to change
|
|||||||
significantly. This can be complex to get working, but may be worth it depending
|
significantly. This can be complex to get working, but may be worth it depending
|
||||||
on the type of project you are making.
|
on the type of project you are making.
|
||||||
|
|
||||||
Use instancing (MultiMesh)
|
### Use instancing (MultiMesh)
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
If several identical objects have to be drawn in the same place or nearby, try
|
If several identical objects have to be drawn in the same place or nearby, try
|
||||||
using `MultiMesh` instead. MultiMesh allows the drawing
|
using `MultiMesh` instead. MultiMesh allows the drawing
|
||||||
@ -124,8 +111,7 @@ identical objects.
|
|||||||
|
|
||||||
Also see the `Using MultiMesh ( doc_using_multimesh )` doc.
|
Also see the `Using MultiMesh ( doc_using_multimesh )` doc.
|
||||||
|
|
||||||
Bake lighting
|
# Bake lighting
|
||||||
=============
|
|
||||||
|
|
||||||
Lighting objects is one of the most costly rendering operations. Realtime
|
Lighting objects is one of the most costly rendering operations. Realtime
|
||||||
lighting, shadows (especially multiple lights), and GI are especially expensive.
|
lighting, shadows (especially multiple lights), and GI are especially expensive.
|
||||||
@ -139,15 +125,13 @@ In general, if several lights need to affect a scene, it's best to use
|
|||||||
`doc_baked_lightmaps`. Baking can also improve the scene quality by adding
|
`doc_baked_lightmaps`. Baking can also improve the scene quality by adding
|
||||||
indirect light bounces.
|
indirect light bounces.
|
||||||
|
|
||||||
Animation and skinning
|
# Animation and skinning
|
||||||
======================
|
|
||||||
|
|
||||||
Animation and vertex animation such as skinning and morphing can be very
|
Animation and vertex animation such as skinning and morphing can be very
|
||||||
expensive on some platforms. You may need to lower the polycount considerably
|
expensive on some platforms. You may need to lower the polycount considerably
|
||||||
for animated models or limit the number of them on screen at any one time.
|
for animated models or limit the number of them on screen at any one time.
|
||||||
|
|
||||||
Large worlds
|
# Large worlds
|
||||||
============
|
|
||||||
|
|
||||||
If you are making large worlds, there are different considerations than what you
|
If you are making large worlds, there are different considerations than what you
|
||||||
may be familiar with from smaller games.
|
may be familiar with from smaller games.
|
||||||
|
@ -1,7 +1,6 @@
|
|||||||
|
|
||||||
|
|
||||||
Optimization using Servers
|
# Optimization using Servers
|
||||||
==========================
|
|
||||||
|
|
||||||
Engines like Pandemonium provide increased ease of use thanks to their high level constructs and features.
|
Engines like Pandemonium provide increased ease of use thanks to their high level constructs and features.
|
||||||
Most of them are accessed and used via the `Scene System( doc_scene_tree )`. Using nodes and
|
Most of them are accessed and used via the `Scene System( doc_scene_tree )`. Using nodes and
|
||||||
@ -23,8 +22,7 @@ back to a more handcrafted, low level implementation of game code.
|
|||||||
|
|
||||||
Still, Pandemonium is designed to work around this problem.
|
Still, Pandemonium is designed to work around this problem.
|
||||||
|
|
||||||
Servers
|
## Servers
|
||||||
-------
|
|
||||||
|
|
||||||
One of the most interesting design decisions for Pandemonium is the fact that the whole scene system is
|
One of the most interesting design decisions for Pandemonium is the fact that the whole scene system is
|
||||||
*optional*. While it is not currently possible to compile it out, it can be completely bypassed.
|
*optional*. While it is not currently possible to compile it out, it can be completely bypassed.
|
||||||
@ -41,8 +39,7 @@ The most common servers are:
|
|||||||
Explore their APIs and you will realize that all the functions provided are low-level
|
Explore their APIs and you will realize that all the functions provided are low-level
|
||||||
implementations of everything Pandemonium allows you to do.
|
implementations of everything Pandemonium allows you to do.
|
||||||
|
|
||||||
RIDs
|
## RIDs
|
||||||
----
|
|
||||||
|
|
||||||
The key to using servers is understanding Resource ID (`RID`) objects. These are opaque
|
The key to using servers is understanding Resource ID (`RID`) objects. These are opaque
|
||||||
handles to the server implementation. They are allocated and freed manually. Almost every
|
handles to the server implementation. They are allocated and freed manually. Almost every
|
||||||
@ -83,8 +80,7 @@ Try exploring the nodes and resources you are familiar with and find the functio
|
|||||||
It is not advised to control RIDs from objects that already have a node associated. Instead, server
|
It is not advised to control RIDs from objects that already have a node associated. Instead, server
|
||||||
functions should always be used for creating and controlling new ones and interacting with the existing ones.
|
functions should always be used for creating and controlling new ones and interacting with the existing ones.
|
||||||
|
|
||||||
Creating a sprite
|
## Creating a sprite
|
||||||
-----------------
|
|
||||||
|
|
||||||
This is a simple example of how to create a sprite from code and move it using the low-level
|
This is a simple example of how to create a sprite from code and move it using the low-level
|
||||||
`CanvasItem` API.
|
`CanvasItem` API.
|
||||||
@ -127,8 +123,7 @@ gdscript GDScript
|
|||||||
VisualServer.canvas_item_clear(ci_rid)
|
VisualServer.canvas_item_clear(ci_rid)
|
||||||
```
|
```
|
||||||
|
|
||||||
Instantiating a Mesh into 3D space
|
## Instantiating a Mesh into 3D space
|
||||||
----------------------------------
|
|
||||||
|
|
||||||
The 3D APIs are different from the 2D ones, so the instantiation API must be used.
|
The 3D APIs are different from the 2D ones, so the instantiation API must be used.
|
||||||
|
|
||||||
@ -158,8 +153,7 @@ gdscript GDScript
|
|||||||
VisualServer.instance_set_transform(instance, xform)
|
VisualServer.instance_set_transform(instance, xform)
|
||||||
```
|
```
|
||||||
|
|
||||||
Creating a 2D RigidBody and moving a sprite with it
|
## Creating a 2D RigidBody and moving a sprite with it
|
||||||
---------------------------------------------------
|
|
||||||
|
|
||||||
This creates a `RigidBody2D` using the `Physics2DServer` API,
|
This creates a `RigidBody2D` using the `Physics2DServer` API,
|
||||||
and moves a `CanvasItem` when the body moves.
|
and moves a `CanvasItem` when the body moves.
|
||||||
@ -200,8 +194,7 @@ gdscript GDScript
|
|||||||
The 3D version should be very similar, as 2D and 3D physics servers are identical (using
|
The 3D version should be very similar, as 2D and 3D physics servers are identical (using
|
||||||
`RigidBody` and `PhysicsServer` respectively).
|
`RigidBody` and `PhysicsServer` respectively).
|
||||||
|
|
||||||
Getting data from the servers
|
## Getting data from the servers
|
||||||
-----------------------------
|
|
||||||
|
|
||||||
Try to **never** request any information from `VisualServer`, `PhysicsServer` or `Physics2DServer`
|
Try to **never** request any information from `VisualServer`, `PhysicsServer` or `Physics2DServer`
|
||||||
by calling functions unless you know what you are doing. These servers will often run asynchronously
|
by calling functions unless you know what you are doing. These servers will often run asynchronously
|
||||||
|
@ -1,10 +1,8 @@
|
|||||||
|
|
||||||
|
|
||||||
Using multiple threads
|
# Using multiple threads
|
||||||
======================
|
|
||||||
|
|
||||||
Threads
|
## Threads
|
||||||
-------
|
|
||||||
|
|
||||||
Threads allow simultaneous execution of code. It allows off-loading work
|
Threads allow simultaneous execution of code. It allows off-loading work
|
||||||
from the main thread.
|
from the main thread.
|
||||||
@ -21,8 +19,7 @@ Warning:
|
|||||||
Before using a built-in class in a thread, read `doc_thread_safe_apis`
|
Before using a built-in class in a thread, read `doc_thread_safe_apis`
|
||||||
first to check whether it can be safely used in a thread.
|
first to check whether it can be safely used in a thread.
|
||||||
|
|
||||||
Creating a Thread
|
## Creating a Thread
|
||||||
-----------------
|
|
||||||
|
|
||||||
Creating a thread is very simple, just use the following code:
|
Creating a thread is very simple, just use the following code:
|
||||||
|
|
||||||
@ -56,8 +53,7 @@ Even if the function has returned already, the thread must collect it, so call
|
|||||||
`Thread.wait_to_finish()( Thread_method_wait_to_finish )`, which will
|
`Thread.wait_to_finish()( Thread_method_wait_to_finish )`, which will
|
||||||
wait until the thread is done (if not done yet), then properly dispose of it.
|
wait until the thread is done (if not done yet), then properly dispose of it.
|
||||||
|
|
||||||
Mutexes
|
## Mutexes
|
||||||
-------
|
|
||||||
|
|
||||||
Accessing objects or data from multiple threads is not always supported (if you
|
Accessing objects or data from multiple threads is not always supported (if you
|
||||||
do it, it will cause unexpected behaviors or crashes). Read the
|
do it, it will cause unexpected behaviors or crashes). Read the
|
||||||
@ -111,8 +107,7 @@ gdscript GDScript
|
|||||||
print("Counter is: ", counter) # Should be 2.
|
print("Counter is: ", counter) # Should be 2.
|
||||||
```
|
```
|
||||||
|
|
||||||
Semaphores
|
## Semaphores
|
||||||
----------
|
|
||||||
|
|
||||||
Sometimes you want your thread to work *"on demand"*. In other words, tell it
|
Sometimes you want your thread to work *"on demand"*. In other words, tell it
|
||||||
when to work and let it suspend when it isn't doing anything.
|
when to work and let it suspend when it isn't doing anything.
|
||||||
|
@ -1,25 +1,21 @@
|
|||||||
|
|
||||||
|
|
||||||
Thread-safe APIs
|
# Thread-safe APIs
|
||||||
================
|
|
||||||
|
|
||||||
Threads
|
## Threads
|
||||||
-------
|
|
||||||
|
|
||||||
Threads are used to balance processing power across CPUs and cores.
|
Threads are used to balance processing power across CPUs and cores.
|
||||||
Pandemonium supports multithreading, but not in the whole engine.
|
Pandemonium supports multithreading, but not in the whole engine.
|
||||||
|
|
||||||
Below is a list of ways multithreading can be used in different areas of Pandemonium.
|
Below is a list of ways multithreading can be used in different areas of Pandemonium.
|
||||||
|
|
||||||
Global scope
|
## Global scope
|
||||||
------------
|
|
||||||
|
|
||||||
`Global Scope( @GlobalScope )` singletons are all thread-safe. Accessing servers from threads is supported (for VisualServer and Physics servers, ensure threaded or thread-safe operation is enabled in the project settings!).
|
`Global Scope( @GlobalScope )` singletons are all thread-safe. Accessing servers from threads is supported (for VisualServer and Physics servers, ensure threaded or thread-safe operation is enabled in the project settings!).
|
||||||
|
|
||||||
This makes them ideal for code that creates dozens of thousands of instances in servers and controls them from threads. Of course, it requires a bit more code, as this is used directly and not within the scene tree.
|
This makes them ideal for code that creates dozens of thousands of instances in servers and controls them from threads. Of course, it requires a bit more code, as this is used directly and not within the scene tree.
|
||||||
|
|
||||||
Scene tree
|
## Scene tree
|
||||||
----------
|
|
||||||
|
|
||||||
Interacting with the active scene tree is **NOT** thread-safe. Make sure to use mutexes when sending data between threads. If you want to call functions from a thread, the *call_deferred* function may be used:
|
Interacting with the active scene tree is **NOT** thread-safe. Make sure to use mutexes when sending data between threads. If you want to call functions from a thread, the *call_deferred* function may be used:
|
||||||
|
|
||||||
@ -49,8 +45,7 @@ you are doing and you are sure that a single resource is not being used or
|
|||||||
set in multiple ones. Otherwise, you are safer just using the servers API
|
set in multiple ones. Otherwise, you are safer just using the servers API
|
||||||
(which is fully thread-safe) directly and not touching scene or resources.
|
(which is fully thread-safe) directly and not touching scene or resources.
|
||||||
|
|
||||||
Rendering
|
## Rendering
|
||||||
---------
|
|
||||||
|
|
||||||
Instancing nodes that render anything in 2D or 3D (such as Sprite) is *not* thread-safe by default.
|
Instancing nodes that render anything in 2D or 3D (such as Sprite) is *not* thread-safe by default.
|
||||||
To make rendering thread-safe, set the **Rendering > Threads > Thread Model** project setting to **Multi-Threaded**.
|
To make rendering thread-safe, set the **Rendering > Threads > Thread Model** project setting to **Multi-Threaded**.
|
||||||
@ -58,12 +53,10 @@ To make rendering thread-safe, set the **Rendering > Threads > Thread Model** pr
|
|||||||
Note that the Multi-Threaded thread model has several known bugs, so it may not be usable
|
Note that the Multi-Threaded thread model has several known bugs, so it may not be usable
|
||||||
in all scenarios.
|
in all scenarios.
|
||||||
|
|
||||||
GDScript arrays, dictionaries
|
## GDScript arrays, dictionaries
|
||||||
-----------------------------
|
|
||||||
|
|
||||||
In GDScript, reading and writing elements from multiple threads is OK, but anything that changes the container size (resizing, adding or removing elements) requires locking a mutex.
|
In GDScript, reading and writing elements from multiple threads is OK, but anything that changes the container size (resizing, adding or removing elements) requires locking a mutex.
|
||||||
|
|
||||||
Resources
|
## Resources
|
||||||
---------
|
|
||||||
|
|
||||||
Modifying a unique resource from multiple threads is not supported. However handling references on multiple threads is supported, hence loading resources on a thread is as well - scenes, textures, meshes, etc - can be loaded and manipulated on a thread and then added to the active scene on the main thread. The limitation here is as described above, one must be careful not to load the same resource from multiple threads at once, therefore it is easiest to use **one** thread for loading and modifying resources, and then the main thread for adding them.
|
Modifying a unique resource from multiple threads is not supported. However handling references on multiple threads is supported, hence loading resources on a thread is as well - scenes, textures, meshes, etc - can be loaded and manipulated on a thread and then added to the active scene on the main thread. The limitation here is as described above, one must be careful not to load the same resource from multiple threads at once, therefore it is easiest to use **one** thread for loading and modifying resources, and then the main thread for adding them.
|
||||||
|
@ -1,7 +1,6 @@
|
|||||||
|
|
||||||
|
|
||||||
Animating thousands of fish with MultiMeshInstance
|
# Animating thousands of fish with MultiMeshInstance
|
||||||
==================================================
|
|
||||||
|
|
||||||
This tutorial explores a technique used in the game `ABZU ( https://www.gdcvault.com/play/1024409/Creating-the-Art-of-ABZ )`
|
This tutorial explores a technique used in the game `ABZU ( https://www.gdcvault.com/play/1024409/Creating-the-Art-of-ABZ )`
|
||||||
for rendering and animating thousands of fish using vertex animation and
|
for rendering and animating thousands of fish using vertex animation and
|
||||||
@ -14,8 +13,7 @@ can render thousands of animated objects, even on low end hardware.
|
|||||||
We will start by animating one fish. Then, we will see how to extend that animation to
|
We will start by animating one fish. Then, we will see how to extend that animation to
|
||||||
thousands of fish.
|
thousands of fish.
|
||||||
|
|
||||||
Animating one Fish
|
## Animating one Fish
|
||||||
------------------
|
|
||||||
|
|
||||||
We will start with a single fish. Load your fish model into a `MeshInstance`
|
We will start with a single fish. Load your fish model into a `MeshInstance`
|
||||||
and add a new `ShaderMaterial`.
|
and add a new `ShaderMaterial`.
|
||||||
@ -179,8 +177,7 @@ Putting the four motions together gives us the final animation.
|
|||||||
Go ahead and play with the uniforms in order to alter the swim cycle of the fish. You will
|
Go ahead and play with the uniforms in order to alter the swim cycle of the fish. You will
|
||||||
find that you can create a wide variety of swim styles using these four motions.
|
find that you can create a wide variety of swim styles using these four motions.
|
||||||
|
|
||||||
Making a school of fish
|
## Making a school of fish
|
||||||
-----------------------
|
|
||||||
|
|
||||||
Pandemonium makes it easy to render thousands of the same object using a MultiMeshInstance node.
|
Pandemonium makes it easy to render thousands of the same object using a MultiMeshInstance node.
|
||||||
|
|
||||||
@ -235,8 +232,7 @@ Notice how all the fish are all in the same position in their swim cycle? It mak
|
|||||||
robotic. The next step is to give each fish a different position in the swim cycle so the entire
|
robotic. The next step is to give each fish a different position in the swim cycle so the entire
|
||||||
school looks more organic.
|
school looks more organic.
|
||||||
|
|
||||||
Animating a school of fish
|
## Animating a school of fish
|
||||||
--------------------------
|
|
||||||
|
|
||||||
One of the benefits of animating the fish using `cos` functions is that they are animated with
|
One of the benefits of animating the fish using `cos` functions is that they are animated with
|
||||||
one parameter, `time`. In order to give each fish a unique position in the
|
one parameter, `time`. In order to give each fish a unique position in the
|
||||||
@ -246,6 +242,7 @@ We do that by adding the per-instance custom value `INSTANCE_CUSTOM` to `time`.
|
|||||||
|
|
||||||
```
|
```
|
||||||
float time = (TIME * time_scale) + (6.28318 * INSTANCE_CUSTOM.x);
|
float time = (TIME * time_scale) + (6.28318 * INSTANCE_CUSTOM.x);
|
||||||
|
```
|
||||||
|
|
||||||
Next, we need to pass a value into `INSTANCE_CUSTOM`. We do that by adding one line into
|
Next, we need to pass a value into `INSTANCE_CUSTOM`. We do that by adding one line into
|
||||||
the `for` loop from above. In the `for` loop we assign each instance a set of four
|
the `for` loop from above. In the `for` loop we assign each instance a set of four
|
||||||
|
@ -1,7 +1,6 @@
|
|||||||
|
|
||||||
|
|
||||||
Controlling thousands of fish with Particles
|
# Controlling thousands of fish with Particles
|
||||||
============================================
|
|
||||||
|
|
||||||
The problem with `MeshInstances` is that it is expensive to
|
The problem with `MeshInstances` is that it is expensive to
|
||||||
update their transform array. It is great for placing many static objects around the
|
update their transform array. It is great for placing many static objects around the
|
||||||
|