Cleanups.

Relintai 2024-05-02 20:12:45 +02:00
parent 94c244f458
commit a29c7f8dd0
12 changed files with 125 additions and 248 deletions

@ -1,10 +1,8 @@
General optimization tips
=========================
# General optimization tips
Introduction
~~~~~~~~~~~~
### Introduction
In an ideal world, computers would run at infinite speed. The only limit to
what we could achieve would be our imagination. However, in the real world, it's
@ -22,16 +20,14 @@ To achieve the best results, we have two approaches:
And preferably, we will use a blend of the two.
Smoke and mirrors
^^^^^^^^^^^^^^^^^
#### Smoke and mirrors
Part of working smarter is recognizing that, in games, we can often get the
player to believe they're in a world that is far more complex, interactive, and
graphically exciting than it really is. A good programmer is a magician, and
should strive to learn the tricks of the trade while trying to invent new ones.
The nature of slowness
^^^^^^^^^^^^^^^^^^^^^^
#### The nature of slowness
To the outside observer, performance problems are often lumped together.
But in reality, there are several different kinds of performance problems:
@ -45,8 +41,7 @@ But in reality, there are several different kinds of performance problems:
Each of these are annoying to the user, but in different ways.
Measuring performance
=====================
# Measuring performance
Probably the most important tool for optimization is the ability to measure
performance - to identify where bottlenecks are, and to measure the success of
@ -66,8 +61,7 @@ Be very aware that the relative performance of different areas can vary on
different hardware. It's often a good idea to measure timings on more than one
device. This is especially the case if you're targeting mobile devices.
Limitations
~~~~~~~~~~~
### Limitations
CPU profilers are often the go-to method for measuring performance. However,
they don't always tell the whole story.
@ -84,15 +78,13 @@ they don't always tell the whole story.
As a result of these limitations, you often need to use detective work to find
out where bottlenecks are.
Detective work
~~~~~~~~~~~~~~
### Detective work
Detective work is a crucial skill for developers (both in terms of performance,
and also in terms of bug fixing). This can include hypothesis testing, and
binary search.
Hypothesis testing
^^^^^^^^^^^^^^^^^^
#### Hypothesis testing
Say, for example, that you believe sprites are slowing down your game.
You can test this hypothesis by:
@ -105,8 +97,7 @@ the performance drop?
- You can test this by keeping everything the same, but changing the sprite
size, and measuring performance.
Binary search
^^^^^^^^^^^^^
#### Binary search
If you know that frames are taking much longer than they should, but you're
not sure where the bottleneck lies. You could begin by commenting out
@ -116,8 +107,7 @@ performance improved more or less than expected?
Once you know which of the two halves contains the bottleneck, you can
repeat this process until you've pinned down the problematic area.
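Here is the halving procedure above as a language-neutral Python sketch. `find_bottleneck` and the `frame_time_ms` callback are hypothetical. In a real project, "disabling" a section means commenting it out and measuring frame times by hand. The sketch assumes a single dominant bottleneck. If the cost is spread across several sections, comparing halves can point at the wrong one.

```python
def find_bottleneck(sections, frame_time_ms):
    """Bisect `sections` to find the one costing the most frame time.

    `frame_time_ms(enabled)` is a hypothetical callback returning the measured
    frame time with only the sections in `enabled` switched on (everything
    else "commented out"). Assumes a single dominant bottleneck.
    """
    baseline = frame_time_ms(sections)

    def saving(disabled):
        # How much faster the frame gets when `disabled` is switched off.
        enabled = [s for s in sections if s not in disabled]
        return baseline - frame_time_ms(enabled)

    candidates = list(sections)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half saved more time when it was disabled.
        candidates = first if saving(first) >= saving(second) else second
    return candidates[0]


# Simulated per-section costs in milliseconds (made-up numbers).
costs = {"ai": 2.0, "physics": 3.0, "particles": 12.0, "ui": 1.0, "audio": 0.5}
print(find_bottleneck(list(costs), lambda enabled: sum(costs[s] for s in enabled)))
# → particles
```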
Profilers
=========
# Profilers
Profilers allow you to time your program while running it. Profilers then
provide results telling you what percentage of time was spent in different
@ -130,8 +120,7 @@ and lead to slower performance.
For more info about using Pandemonium's built-in profiler, see `doc_debugger_panel`.
Principles
==========
# Principles
`Donald Knuth ( https://en.wikipedia.org/wiki/Donald_Knuth )` said:
@ -160,8 +149,7 @@ One misleading aspect of the quote is that people tend to focus on the subquote
optimization is (by definition) undesirable, performant software is the result
of performant design.
Performant design
~~~~~~~~~~~~~~~~~
### Performant design
The danger with encouraging people to ignore optimization until necessary, is
that it conveniently ignores that the most important time to consider
@ -175,8 +163,7 @@ general programming. A performant design, even without low-level optimization,
will often run many times faster than a mediocre design with low-level
optimization.
Incremental design
~~~~~~~~~~~~~~~~~~
### Incremental design
Of course, in practice, unless you have prior knowledge, you are unlikely to
come up with the best design the first time. Instead, you'll often make a series
@ -192,8 +179,7 @@ to a resurgence in data-oriented design, which involves designing data
structures and algorithms for *cache locality* of data and linear access, rather
than jumping around in memory.
The optimization process
~~~~~~~~~~~~~~~~~~~~~~~~
### The optimization process
Assuming we have a reasonable design, and taking our lessons from Knuth, our
first step in optimization should be to identify the biggest bottlenecks - the
@ -209,8 +195,7 @@ The process is thus:
2. Optimize bottleneck.
3. Return to step 1.
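Below is a minimal Python simulation of this loop. The section costs are made up, and each pass assumes a fixed speedup. `optimize_until_budget` is a hypothetical helper, not an engine API. The simulation shows why step 3 matters: once the largest cost is reduced, a different section can become the bottleneck.

```python
def optimize_until_budget(costs_ms, budget_ms, speedup=0.5, max_rounds=20):
    """Simulate the measure -> optimize -> re-measure loop.

    `speedup` is a made-up fraction of a section's cost removed per pass;
    real optimizations must be measured, not assumed.
    """
    costs = dict(costs_ms)
    history = []
    for _ in range(max_rounds):
        if sum(costs.values()) <= budget_ms:
            break
        worst = max(costs, key=costs.get)  # 1. identify the biggest bottleneck
        history.append(worst)
        costs[worst] *= 1.0 - speedup      # 2. optimize it (simulated)
        # 3. loop back and measure again: the bottleneck may have moved
    return costs, history


costs, history = optimize_until_budget(
    {"rendering": 10.0, "physics": 6.0, "scripts": 4.0}, budget_ms=12.0
)
print(history)  # → ['rendering', 'physics']
```

In this run, rendering is optimized first. After that, physics becomes the most expensive section.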
Optimizing bottlenecks
~~~~~~~~~~~~~~~~~~~~~~
### Optimizing bottlenecks
Some profilers will even tell you which part of a function (which data accesses,
calculations) are slowing things down.
@ -234,11 +219,9 @@ will increase speed, others may have a negative effect. Sometimes, a small
positive effect will be outweighed by the negatives of more complex code, and
you may choose to leave out that optimization.
Appendix
========
# Appendix
Bottleneck math
~~~~~~~~~~~~~~~
### Bottleneck math
The proverb *"a chain is only as strong as its weakest link"* applies directly to
performance optimization. If your project is spending 90% of the time in

@ -1,10 +1,8 @@
CPU optimization
================
# CPU optimization
Measuring performance
=====================
# Measuring performance
We have to know where the "bottlenecks" are to know how to speed up our program.
Bottlenecks are the slowest parts of the program that limit the rate that
@ -15,8 +13,7 @@ lead to small performance improvements.
For the CPU, the easiest way to identify bottlenecks is to use a profiler.
CPU profilers
=============
# CPU profilers
Profilers run alongside your program and take timing measurements to work out
what proportion of time is spent in each function.
@ -28,11 +25,9 @@ slow down your project significantly.
After profiling, you can look back at the results for a frame.
.. figure:: img/pandemonium_profiler.png)
.. figure:: img/pandemonium_profiler.png)
:alt: Screenshot of the Pandemonium profiler
![Screenshot of the Pandemonium profiler](img/pandemonium_profiler.png)
Results of a profile of one of the demo projects.
Results of a profile of one of the demo projects.
Note:
We can see the cost of built-in processes such as physics and audio,
@ -49,8 +44,7 @@ you can usually increase speed by optimizing this area.
For more info about using Pandemonium's built-in profiler, see
`doc_debugger_panel`.
External profilers
~~~~~~~~~~~~~~~~~~
### External profilers
Although the Pandemonium IDE profiler is very convenient and useful, sometimes you
need more power, and the ability to profile the Pandemonium engine source code itself.
@ -70,10 +64,9 @@ Note:
optimized. Bottlenecks are often in a different place in debug builds,
so you should profile release builds whenever possible.
.. figure:: img/valgrind.png)
:alt: Screenshot of Callgrind
![Screenshot of Callgrind](img/valgrind.png)
Example results from Callgrind, which is part of Valgrind.
Example results from Callgrind, which is part of Valgrind.
From the left, Callgrind is listing the percentage of time within a function and
its children (Inclusive), the percentage of time spent within the function
@ -97,8 +90,7 @@ done in the graphics API. This specific profiling led to the development of 2D
batching, which greatly speeds up 2D rendering by reducing bottlenecks in this
area.
Manually timing functions
=========================
# Manually timing functions
Another handy technique, especially once you have identified the bottleneck
using a profiler, is to manually time the function or area under test.
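As a language-neutral illustration, here is a Python sketch of manual timing using the standard library's `time.perf_counter_ns`. Inside the engine you would use its own timing functions instead. `time_function` is a hypothetical helper name.

```python
import time


def time_function(fn, *args, repeats=100):
    """Return the mean wall-clock duration of fn(*args) in microseconds.

    Repeating the call smooths out timer resolution and one-off effects,
    but the first runs may still be skewed by cold caches.
    """
    start = time.perf_counter_ns()
    for _ in range(repeats):
        fn(*args)
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / repeats / 1_000


print(f"{time_function(sorted, list(range(10_000, 0, -1))):.1f} us per call")
```

Time the same function before and after each change, on the same machine, so that the comparison is meaningful.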
@ -125,8 +117,7 @@ As you attempt to optimize functions, be sure to either repeatedly profile or
time them as you go. This will give you crucial feedback as to whether the
optimization is working (or not).
Caches
======
# Caches
CPU caches are something else to be particularly aware of, especially when
comparing timing results of two different versions of a function. The results
@ -156,10 +147,9 @@ will be able to work as fast as possible.
Pandemonium usually takes care of such low-level details for you. For example, the
Server APIs make sure data is optimized for caching already for things like
rendering and physics. Still, you should be especially aware of caching when
using `GDNative <toc-tutorials-gdnative )`.
using `GDNative ( toc-tutorials-gdnative )`.
Languages
=========
# Languages
Pandemonium supports a number of different languages, and it is worth bearing in mind
that there are trade-offs involved. Some languages are designed for ease of use
@ -169,42 +159,37 @@ Built-in engine functions run at the same speed regardless of the scripting
language you choose. If your project is making a lot of calculations in its own
code, consider moving those calculations to a faster language.
GDScript
~~~~~~~~
### GDScript
`GDScript <toc-learn-scripting-gdscript )` is designed to be easy to use and iterate,
`GDScript (toc-learn-scripting-gdscript )` is designed to be easy to use and iterate,
and is ideal for making many types of games. However, in this language, ease of
use is considered more important than performance. If you need to make heavy
calculations, consider moving some of your project to one of the other
languages.
C#
~~
### C#
`C# <toc-learn-scripting-C# )` is popular and has first-class support in Pandemonium.It
`C# (toc-learn-scripting-C# )` is popular and has first-class support in Pandemonium.It
offers a good compromise between speed and ease of use. Beware of possible
garbage collection pauses and leaks that can occur during gameplay, though. A
common approach to workaround issues with garbage collection is to use *object
pooling*, which is outside the scope of this guide.
Other languages
~~~~~~~~~~~~~~~
### Other languages
Third parties provide support for several other languages, including `Rust
( https://github.com/pandemonium-rust/pandemonium-rust )` and `Javascript
( https://github.com/PandemoniumExplorer/ECMAScript )`.
C++
~~~
### C++
Pandemonium is written in C++. Using C++ will usually result in the fastest code.
However, on a practical level, it is the most difficult to deploy to end users'
machines on different platforms. Options for using C++ include
`GDNative <toc-tutorials-gdnative )` and
`GDNative (toc-tutorials-gdnative )` and
`custom modules ( doc_custom_modules_in_c++ )`.
Threads
=======
# Threads
Consider using threads when making a lot of calculations that can run in
parallel to each other. Modern CPUs have multiple cores, each one capable of
@ -222,8 +207,7 @@ debugger doesn't support setting up breakpoints in threads yet.
For more information on threads, see `doc_using_multiple_threads`.
SceneTree
=========
# SceneTree
Although Nodes are an incredibly powerful and versatile concept, be aware that
every node has a cost. Built-in functions such as `process()` and
@ -244,8 +228,7 @@ This can be very useful for adding and removing areas from a game, for example.
You can avoid the SceneTree altogether by using Server APIs. For more
information, see `doc_using_servers`.
Physics
=======
# Physics
In some situations, physics can end up becoming a bottleneck. This is
particularly the case with complex worlds and large numbers of physics objects.

@ -1,10 +1,8 @@
GPU optimization
================
# GPU optimization
Introduction
~~~~~~~~~~~~
### Introduction
The demand for new graphics features and progress almost guarantees that you
will encounter graphics bottlenecks. Some of these can be on the CPU side, for
@ -22,8 +20,7 @@ indirectly by changing the instructions you give to the GPU. Also, it may be
more difficult to take measurements. In many cases, the only way of measuring
performance is by examining changes in the time spent rendering each frame.
Draw calls, state changes, and APIs
===================================
# Draw calls, state changes, and APIs
Note:
The following section is not relevant to end-users, but is useful to
@ -42,8 +39,7 @@ reduce these instructions to a bare minimum and group together similar objects
as much as possible so they can be rendered together, or with the minimum number
of these expensive state changes.
2D batching
~~~~~~~~~~~
### 2D batching
In 2D, the costs of treating each item individually can be prohibitively high -
there can easily be thousands of them on the screen. This is why 2D *batching*
@ -54,8 +50,7 @@ to a minimum.
For more information on 2D batching, see `doc_batching`.
3D batching
~~~~~~~~~~~
### 3D batching
In 3D, we still aim to minimize draw calls and state changes. However, it can be
more difficult to batch together several objects into a single draw call. 3D
@ -76,8 +71,7 @@ numbers of distant or low-poly objects.
For more information on 3D specific optimizations, see
`doc_optimizing_3d_performance`.
Reuse Shaders and Materials
~~~~~~~~~~~~~~~~~~~~~~~~~~~
### Reuse Shaders and Materials
The Pandemonium renderer is a little different to what is out there. It's designed to
minimize GPU state changes as much as possible. `SpatialMaterial
@ -99,8 +93,7 @@ If a scene has, for example, `20,000` objects with `20,000` different
materials each, rendering will be slow. If the same scene has `20,000`
objects, but only uses `100` materials, rendering will be much faster.
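A toy model makes the numbers above concrete. Assume opaque objects are drawn sorted by material. The bound material then changes once per distinct material, so the number of state changes tracks the number of materials rather than the number of objects. This Python sketch counts only material switches and ignores every other GPU cost.

```python
def count_material_switches(draw_order):
    """Count how often the bound material changes when drawing in this order."""
    switches = 0
    current = None
    for material in draw_order:
        if material != current:
            switches += 1
            current = material
    return switches


unique = list(range(20_000))                    # every object has its own material
shared = [i % 100 for i in range(20_000)]       # 100 materials reused
print(count_material_switches(sorted(unique)))  # → 20000
print(count_material_switches(sorted(shared)))  # → 100
```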
Pixel cost versus vertex cost
=============================
# Pixel cost versus vertex cost
You may have heard that the lower the number of polygons in a model, the faster
it will be rendered. This is *really* relative and depends on many factors.
@ -155,8 +148,7 @@ Pay attention to the additional vertex processing required when using:
- Morphs (shape keys)
- Vertex-lit objects (common on mobile)
Pixel/fragment shaders and fill rate
====================================
# Pixel/fragment shaders and fill rate
In contrast to vertex processing, the costs of fragment (per-pixel) shading have
increased dramatically over the years. Screen resolutions have increased (the
@ -182,8 +174,7 @@ amount of work the GPU has to do. You can do this by simplifying the shader
**When targeting mobile devices, consider using the simplest possible shaders
you can reasonably afford to use.**
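A back-of-the-envelope Python estimate shows why fragment cost scales with resolution. It multiplies pixels by an assumed average overdraw and frame rate. `fragments_per_second` is a hypothetical helper. The figures ignore hardware savings such as early depth rejection, so treat them as rough orders of magnitude.

```python
def fragments_per_second(width, height, overdraw, fps):
    """Rough count of fragment-shader invocations per second.

    `overdraw` is the average number of times each pixel is shaded per frame
    (1 means every pixel is written once).
    """
    return width * height * overdraw * fps


print(fragments_per_second(1920, 1080, 3, 60))  # → 373248000
print(fragments_per_second(3840, 2160, 3, 60))  # → 1492992000
```

Doubling the resolution on both axes quadruples the fragment work, even if the scene and its polygon count stay the same.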
Reading textures
~~~~~~~~~~~~~~~~
### Reading textures
The other factor in fragment shaders is the cost of reading textures. Reading
textures is an expensive operation, especially when reading from several
@ -195,8 +186,7 @@ mobiles.
**If you use third-party shaders or write your own shaders, try to use
algorithms that require as few texture reads as possible.**
Texture compression
~~~~~~~~~~~~~~~~~~~
### Texture compression
By default, Pandemonium compresses textures of 3D models when imported using video RAM
(VRAM) compression. Video RAM compression isn't as efficient in size as PNG or
@ -222,8 +212,7 @@ Note:
significantly due to their low resolution.
Post-processing and shadows
~~~~~~~~~~~~~~~~~~~~~~~~~~~
### Post-processing and shadows
Post-processing effects and shadows can also be expensive in terms of fragment
shading activity. Always test the impact of these on different hardware.
@ -234,8 +223,7 @@ performance of shadows is to turn shadows off for as many lights and objects as
possible. Smaller or distant OmniLights/SpotLights can often have their shadows
disabled with only a small visual impact.
Transparency and blending
=========================
# Transparency and blending
Transparent objects present particular problems for rendering efficiency. Opaque
objects (especially in 3D) can be essentially rendered in any order and the
@ -259,8 +247,7 @@ minimize these fill rate requirements, especially on mobile, where fill rate is
very expensive. Indeed, in many situations, rendering more complex opaque
geometry can end up being faster than using transparency to "cheat".
Multi-platform advice
=====================
# Multi-platform advice
If you are aiming to release on multiple platforms, test *early* and test
*often* on all your platforms, especially mobile. Developing a game on desktop
@ -271,8 +258,7 @@ add optional enhancements for more powerful platforms. For example, you may want
to use the GLES2 backend for both desktop and mobile platforms where you target
both.
Mobile/tiled renderers
======================
# Mobile/tiled renderers
As described above, GPUs on mobile devices work in dramatically different ways
from GPUs on desktop. Most mobile devices use tile renderers. Tile renderers

@ -1,7 +1,6 @@
Optimization using MultiMeshes
==============================
# Optimization using MultiMeshes
For large amount of instances (in the thousands), that need to be constantly processed
(and certain amount of control needs to be retained),
@ -11,8 +10,7 @@ When the amount of objects reach the hundreds of thousands or millions,
none of these approaches are efficient anymore. Still, depending on the requirements, there
is one more optimization possible.
MultiMeshes
-----------
## MultiMeshes
A `MultiMesh( MultiMesh )` is a single draw primitive that can draw up to millions
of objects in one go. It's extremely efficient because it uses the GPU hardware to do this
@ -44,8 +42,7 @@ controlled with the `MultiMesh.visible_instance_count`
property. The typical workflow is to allocate the maximum amount of instances that will be used,
then change the amount visible depending on how many are currently needed.
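A small hypothetical Python class models the allocate-once workflow. `InstancePool` is not an engine API. Its `visible` field only loosely mirrors `MultiMesh.visible_instance_count`. The point it illustrates is that changing the visible count never reallocates the instance buffer.

```python
class InstancePool:
    """Hypothetical model of 'allocate the maximum, vary the visible count'."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.transforms = [(0.0, 0.0)] * capacity  # allocated once, up front
        self.visible = 0

    def set_visible(self, count):
        # Clamp instead of reallocating: only the visible count changes.
        self.visible = max(0, min(count, self.capacity))

    def drawn(self):
        return self.transforms[: self.visible]


pool = InstancePool(capacity=10_000)
pool.set_visible(2_500)
print(len(pool.drawn()))  # → 2500
```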
Multimesh example
-----------------
## Multimesh example
Here is an example of using a MultiMesh from code. Languages other than GDScript may be more
efficient for millions of objects, but for a few thousands, GDScript should be fine.

@ -1,10 +1,7 @@
# Optimization using batching
Optimization using batching
===========================
Introduction
~~~~~~~~~~~~
### Introduction
Game engines have to send a set of instructions to the GPU to tell the GPU what
and where to draw. These instructions are sent using common instructions called
@ -16,8 +13,7 @@ of work for the user in the GPU driver at the cost of more expensive draw calls.
As a result, applications can often be sped up by reducing the number of draw
calls.
Draw calls
^^^^^^^^^^
#### Draw calls
In 2D, we need to tell the GPU to render a series of primitives (rectangles,
lines, polygons etc). The most obvious technique is to tell the GPU to render
@ -41,8 +37,7 @@ automatically group together primitives wherever possible and send these batches
on to the GPU. This can give an increase in rendering performance while
requiring few (if any) changes to your Pandemonium project.
How it works
~~~~~~~~~~~~
### How it works
Instructions come into the renderer from your game in the form of a series of
items, each of which can contain one or more commands. The items correspond to
@ -56,8 +51,7 @@ The batcher uses two main techniques to group together primitives:
- Consecutive items can be joined together.
- Consecutive commands within an item can be joined to form a batch.
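The first kind of joining can be sketched in Python with `itertools.groupby`. This is a simplified model, not the engine's batcher. Each primitive is reduced to a texture name, whereas real batching also depends on shaders, blend modes, lights and other state.

```python
from itertools import groupby


def batch_consecutive(textures):
    """Join runs of consecutive primitives that share a texture.

    Returns (texture, primitive_count) pairs; each pair is one draw call.
    """
    return [(texture, len(list(run))) for texture, run in groupby(textures)]


frame = ["grass", "grass", "grass", "wood", "wood", "grass"]
batches = batch_consecutive(frame)
print(batches)       # → [('grass', 3), ('wood', 2), ('grass', 1)]
print(len(batches))  # → 3 draw calls instead of 6
```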
Breaking batching
^^^^^^^^^^^^^^^^^
#### Breaking batching
Batching can only take place if the items or commands are similar enough to be
rendered in one draw call. Certain changes (or techniques), by necessity, prevent
@ -75,8 +69,7 @@ Note:
For example, if you draw a series of sprites each with a different texture,
there is no way they can be batched.
Determining the rendering order
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### Determining the rendering order
The question arises, if only similar items can be drawn together in a batch, why
don't we look through all the items in a scene, group together all the similar
@ -104,8 +97,7 @@ Note:
can improve performance in some cases. See the
`doc_batching_diagnostics` section to help you make this decision.
A trick
^^^^^^^
#### A trick
And now, a sleight of hand. Even though the idea of painter's order is that
objects are rendered from back to front, consider 3 objects `A`, `B` and
@ -129,8 +121,7 @@ drawn *on top* of each other. If we relax that assumption, i.e. if none of these
3 objects are overlapping, there is *no need* to preserve painter's order. The
rendered result will be the same. What if we could take advantage of this?
Item reordering
^^^^^^^^^^^^^^^
#### Item reordering
![](img/overlap2.png)
@ -152,8 +143,7 @@ balance the costs and benefits in your project.
Since the texture only changes once, we can render the above in only 2 draw
calls.
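Here is a simplified Python sketch of item reordering with an overlap test. An item may be pulled forward into the current batch only if it overlaps none of the items it jumps over. This keeps the output identical to strict painter's order. `Item`, `batch_with_reordering` and its `lookahead` parameter are hypothetical, and the engine's actual algorithm and defaults are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    texture: str
    rect: tuple  # (x, y, width, height) in screen space


def overlaps(a, b):
    """Axis-aligned rectangle intersection test."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def batch_with_reordering(items, lookahead=4):
    """Greedy batching that may pull a later item forward past others."""
    remaining = list(items)
    batches = []
    while remaining:
        batch = [remaining.pop(0)]
        skipped = []
        for candidate in remaining[:lookahead]:
            same_texture = candidate.texture == batch[0].texture
            # Jumping ahead is only safe if nothing skipped overlaps the candidate.
            if same_texture and not any(overlaps(candidate.rect, s.rect) for s in skipped):
                batch.append(candidate)
            else:
                skipped.append(candidate)
        remaining = skipped + remaining[lookahead:]
        batches.append([item.name for item in batch])
    return batches


a = Item("A", "wood", (0, 0, 10, 10))
b = Item("B", "grass", (20, 0, 10, 10))
c = Item("C", "wood", (40, 0, 10, 10))
print(batch_with_reordering([a, b, c]))  # → [['A', 'C'], ['B']]
```

If `C` is moved so that it overlaps `B`, it can no longer jump ahead, and the same three items need three draw calls.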
Lights
~~~~~~
### Lights
Although the batching system's job is normally quite straightforward, it becomes
considerably more complex when 2D lights are used. This is because lights are
@ -207,8 +197,7 @@ that in a real game, you might be drawing closer to 1,000 sprites.
That is a 1000× decrease in draw calls, and should give a huge increase in
performance.
Overlap test
^^^^^^^^^^^^
#### Overlap test
However, as with the item reordering, things are not that simple. We must first
perform the overlap test to determine whether we can join these primitives. This
@ -222,8 +211,7 @@ therefore shouldn't be joined). In practice, the decrease in draw calls may be
less dramatic than in a perfect situation with no overlapping at all. However,
performance is usually far higher than without this lighting optimization.
Light scissoring
~~~~~~~~~~~~~~~~
### Light scissoring
Batching can make it more difficult to cull out objects that are not affected or
partially affected by a light. This can increase the fill rate requirements
@ -257,14 +245,12 @@ The exact relationship is probably not necessary for users to worry about, but
is included in the appendix out of interest:
`doc_batching_light_scissoring_threshold_calculation`
.. figure:: img/scissoring.png)
:alt: Light scissoring example diagram
![Light scissoring example diagram](img/scissoring.png)
Bottom right is a light, the red area is the pixels saved by the scissoring
operation. Only the intersection needs to be rendered.
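The saving shown in the figure can be estimated with simple rectangle arithmetic. In this Python sketch, `scissor_decision` and its `threshold` argument (a fraction of screen area) are hypothetical. They illustrate the cost/benefit trade-off, not the engine's exact threshold calculation.

```python
def intersection(a, b):
    """Overlap of two (x, y, width, height) rectangles, or None if disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def scissor_decision(item_rect, light_rect, screen_w, screen_h, threshold):
    """Return (pixels_saved, use_scissor) for one item in one light pass."""
    item_area = item_rect[2] * item_rect[3]
    lit = intersection(item_rect, light_rect)
    lit_area = lit[2] * lit[3] if lit else 0
    saved = item_area - lit_area  # pixels the light pass no longer shades
    return saved, saved / (screen_w * screen_h) > threshold


print(scissor_decision((0, 0, 800, 600), (600, 400, 400, 400), 1920, 1080, 0.1))
# → (440000, True)
```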
Vertex baking
~~~~~~~~~~~~~
### Vertex baking
The GPU shader receives instructions on what to draw in 2 main ways:
@ -290,8 +276,7 @@ In most cases, this works fine, but this shortcut breaks down if a shader expect
these values to be available individually rather than combined. This can happen
in custom shaders.
Custom shaders
^^^^^^^^^^^^^^
#### Custom shaders
As a result of the limitation described above, certain operations in custom
shaders will prevent vertex baking and therefore decrease the potential for
@ -301,8 +286,7 @@ currently apply:
- Reading or writing `COLOR` or `MODULATE` disables vertex color baking.
- Reading `VERTEX` disables vertex position baking.
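The rules above can be turned into a rough Python check for shader source. `disabled_baking` is a hypothetical helper. It does a whole-word text search rather than parsing the shader, so it over-reports. For example, it also flags writes to `VERTEX` and names inside comments.

```python
import re


def disabled_baking(shader_code):
    """Heuristically list which vertex baking a custom shader would disable."""
    def mentions(name):
        # Whole-word match, so e.g. "VERTEX_COLOR" does not count as "COLOR".
        return re.search(rf"\b{re.escape(name)}\b", shader_code) is not None

    disabled = set()
    if mentions("COLOR") or mentions("MODULATE"):
        disabled.add("color")
    if mentions("VERTEX"):
        disabled.add("position")
    return disabled


print(disabled_baking("void fragment() { COLOR = texture(TEXTURE, UV); }"))
# → {'color'}
```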
Project Settings
~~~~~~~~~~~~~~~~
### Project Settings
To fine-tune batching, a number of project settings are available. You can
usually leave these at default during development, but it's a good idea to
@ -311,8 +295,7 @@ tweaking parameters can often give considerable performance gains for very
little effort. See the on-hover tooltips in the Project Settings for more
information.
rendering/batching/options
^^^^^^^^^^^^^^^^^^^^^^^^^^
#### rendering/batching/options
- `use_batching
` -
@ -328,8 +311,7 @@ rendering/batching/options
This is a faster way of drawing unbatchable rectangles. However, it may lead
to flicker on some hardware so it's not recommended.
rendering/batching/parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### rendering/batching/parameters
- `max_join_item_commands` -
One of the most important ways of achieving batching is to join suitable
@ -358,8 +340,7 @@ rendering/batching/parameters
textures. The lookahead for the overlap test has a small cost, so the best
value may change per project.
rendering/batching/lights
^^^^^^^^^^^^^^^^^^^^^^^^^
#### rendering/batching/lights
- `scissor_area_threshold
` -
@ -372,8 +353,7 @@ rendering/batching/lights
costs and benefits may be project dependent, and hence the best value to use
here.
rendering/batching/debug
^^^^^^^^^^^^^^^^^^^^^^^^
#### rendering/batching/debug
- `flash_batching
` -
@ -387,8 +367,7 @@ rendering/batching/debug
This will periodically print a diagnostic batching log to
the Pandemonium IDE / console.
rendering/batching/precision
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### rendering/batching/precision
- `uv_contract
` -
@ -405,8 +384,7 @@ rendering/batching/precision
Diagnostics
~~~~~~~~~~~
### Diagnostics
Although you can change parameters and examine the effect on frame rate, this
can feel like working blindly, with no idea of what is going on under the hood.
@ -415,8 +393,7 @@ print out (to the IDE or console) a list of the batches that are being
processed. This can help pinpoint situations where batching isn't occurring
as intended, and help you fix these situations to get the best possible performance.
Reading a diagnostic
^^^^^^^^^^^^^^^^^^^^
#### Reading a diagnostic
```
canvas_begin FRAME 2604
@ -456,8 +433,7 @@ This is a typical diagnostic.
- **batch D:** A default batch, containing everything else that is not currently
batched.
Default batches
^^^^^^^^^^^^^^^
#### Default batches
The second number following default batches is the number of commands in the
batch, and it is followed by a brief summary of the contents:
@ -479,19 +455,16 @@ batch, and it is followed by a brief summary of the contents:
You may see "dummy" default batches containing no commands; you can ignore those.
Frequently asked questions
~~~~~~~~~~~~~~~~~~~~~~~~~~
### Frequently asked questions
I don't get a large performance increase when enabling batching.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### I don't get a large performance increase when enabling batching.
- Try the diagnostics, see how much batching is occurring, and whether it can be
improved
- Try changing batching parameters in the Project Settings.
- Consider that batching may not be your bottleneck (see bottlenecks).
I get a decrease in performance with batching.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### I get a decrease in performance with batching.
- Try the steps described above to increase the number of batching opportunities.
- Try enabling `single_rect_fallback
@ -502,29 +475,24 @@ I get a decrease in performance with batching.
- After trying the above, if your scene is still performing worse, consider
turning off batching.
I use custom shaders and the items are not batching.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### I use custom shaders and the items are not batching.
- Custom shaders can be problematic for batching, see the custom shaders section
I am seeing line artifacts appear on certain hardware.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### I am seeing line artifacts appear on certain hardware.
- See the `uv_contract
`
project setting which can be used to solve this problem.
I use a large number of textures, so few items are being batched.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### I use a large number of textures, so few items are being batched.
- Consider using texture atlases. As well as allowing batching, these
reduce the need for state changes associated with changing textures.
Appendix
~~~~~~~~
### Appendix
Batched primitives
^^^^^^^^^^^^^^^^^^
#### Batched primitives
Not all primitives can be batched. Batching is not guaranteed either,
especially with primitives using an antialiased border. The following
@ -541,8 +509,7 @@ See `doc_custom_drawing_in_2d` for more information.
Light scissoring threshold calculation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#### Light scissoring threshold calculation
The actual proportion of screen pixel area used as the threshold is the
`scissor_area_threshold

@ -1,21 +1,14 @@
.. meta::
:keywords: optimization
# Optimizing 3D performance
Optimizing 3D performance
=========================
Culling
=======
# Culling
Pandemonium will automatically perform view frustum culling in order to prevent
rendering objects that are outside the viewport. This works well for games that
take place in a small area, however things can quickly become problematic in
larger levels.
Occlusion culling
~~~~~~~~~~~~~~~~~
### Occlusion culling
Walking around a town for example, you may only be able to see a few buildings
in the street you are in, as well as the sky and a few birds flying overhead. As
@ -44,8 +37,7 @@ It is a very powerful technique for speeding up rendering. You can also use it t
restrict physics or AI to the local area, and speed these up as well as
rendering.
Portal Rendering
~~~~~~~~~~~~~~~~
### Portal Rendering
However, there is a much easier way to take advantage of occlusion. Pandemonium features
an advanced portal rendering system, which can perform occlusion culling from cameras and
@ -62,15 +54,13 @@ Note:
from seeing too far away, which would decrease performance due to the lost
opportunies for occlusion culling.
Other occlusion techniques
~~~~~~~~~~~~~~~~~~~~~~~~~~
### Other occlusion techniques
As well as the portal system and manual methods, there are various other occlusion
techniques such as raster-based occlusion culling. Some of these may be available
through add-ons or may be available in core Pandemonium in the future.
Transparent objects
~~~~~~~~~~~~~~~~~~~
### Transparent objects
Pandemonium sorts objects by `Material` and `Shader
( Shader )` to improve performance. This, however, can not be done with
@ -83,8 +73,7 @@ with its own material.
For more information, see the `GPU optimizations ( doc_gpu_optimization )`
doc.
Level of detail (LOD)
=====================
# Level of detail (LOD)
In some situations, particularly at a distance, it can be a good idea to
**replace complex geometry with simpler versions**. The end user will probably
@ -93,8 +82,7 @@ in the far distance. There are several strategies for replacing models at
varying distance. You could use lower poly models, or use transparency to
simulate more complex geometry.
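Distance-based switching between model versions can be sketched with Python's `bisect` module. The threshold distances are made-up values, and `select_lod` is a hypothetical helper. Real systems usually add hysteresis so models do not flicker between levels at a boundary.

```python
from bisect import bisect_right


def select_lod(distance, thresholds):
    """Pick a level-of-detail index from the camera distance.

    `thresholds` are ascending switch distances; level 0 is the full-detail
    mesh and the last level could be a billboard or imposter.
    """
    return bisect_right(thresholds, distance)


switch_distances = [20.0, 50.0, 120.0]  # hypothetical values, in world units
for d in (10.0, 60.0, 500.0):
    print(d, select_lod(d, switch_distances))
```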
Billboards and imposters
~~~~~~~~~~~~~~~~~~~~~~~~
### Billboards and imposters
The simplest version of using transparency to deal with LOD is billboards. For
example, you can use a single transparent quad to represent a tree at distance.
@ -113,8 +101,7 @@ the viewer a considerable distance for the angle of view to change
significantly. This can be complex to get working, but may be worth it depending
on the type of project you are making.
Use instancing (MultiMesh)
~~~~~~~~~~~~~~~~~~~~~~~~~~
### Use instancing (MultiMesh)
If several identical objects have to be drawn in the same place or nearby, try
using `MultiMesh` instead. MultiMesh allows the drawing
@ -124,8 +111,7 @@ identical objects.
Also see the `Using MultiMesh ( doc_using_multimesh )` doc.
Bake lighting
=============
# Bake lighting
Lighting objects is one of the most costly rendering operations. Realtime
lighting, shadows (especially multiple lights), and GI are especially expensive.
@ -139,15 +125,13 @@ In general, if several lights need to affect a scene, it's best to use
`doc_baked_lightmaps`. Baking can also improve the scene quality by adding
indirect light bounces.
Animation and skinning
======================
# Animation and skinning
Animation and vertex animation such as skinning and morphing can be very
expensive on some platforms. You may need to lower the polycount considerably
for animated models or limit the number of them on screen at any one time.
Large worlds
============
# Large worlds
If you are making large worlds, there are different considerations than what you
may be familiar with from smaller games.

@ -1,7 +1,6 @@
Optimization using Servers
==========================
# Optimization using Servers
Engines like Pandemonium provide increased ease of use thanks to their high level constructs and features.
Most of them are accessed and used via the `Scene System( doc_scene_tree )`. Using nodes and
@ -23,8 +22,7 @@ back to a more handcrafted, low level implementation of game code.
Still, Pandemonium is designed to work around this problem.
Servers
-------
## Servers
One of the most interesting design decisions for Pandemonium is the fact that the whole scene system is
*optional*. While it is not currently possible to compile it out, it can be completely bypassed.
@ -41,8 +39,7 @@ The most common servers are:
Explore their APIs and you will realize that all the functions provided are low-level
implementations of everything Pandemonium allows you to do.
RIDs
----
## RIDs
The key to using servers is understanding Resource ID (`RID`) objects. These are opaque
handles to the server implementation. They are allocated and freed manually. Almost every
@ -83,8 +80,7 @@ Try exploring the nodes and resources you are familiar with and find the functio
It is not advised to control RIDs from objects that already have an associated node. Instead, server
functions should always be used for creating and controlling new ones and interacting with the existing ones.
Creating a sprite
-----------------
## Creating a sprite
This is a simple example of how to create a sprite from code and move it using the low-level
`CanvasItem` API.
@ -127,8 +123,7 @@ gdscript GDScript
VisualServer.canvas_item_clear(ci_rid)
```
Instantiating a Mesh into 3D space
----------------------------------
## Instantiating a Mesh into 3D space
The 3D APIs are different from the 2D ones, so the instantiation API must be used.
@ -158,8 +153,7 @@ gdscript GDScript
VisualServer.instance_set_transform(instance, xform)
```
Creating a 2D RigidBody and moving a sprite with it
---------------------------------------------------
## Creating a 2D RigidBody and moving a sprite with it
This creates a `RigidBody2D` using the `Physics2DServer` API,
and moves a `CanvasItem` when the body moves.
@ -200,8 +194,7 @@ gdscript GDScript
The 3D version should be very similar, as 2D and 3D physics servers are identical (using
`RigidBody` and `PhysicsServer` respectively).
Getting data from the servers
-----------------------------
## Getting data from the servers
Try to **never** request any information from `VisualServer`, `PhysicsServer` or `Physics2DServer`
by calling functions unless you know what you are doing. These servers will often run asynchronously

@ -1,10 +1,8 @@
Using multiple threads
======================
# Using multiple threads
Threads
-------
## Threads
Threads allow simultaneous execution of code. This allows off-loading work
from the main thread.
@ -21,8 +19,7 @@ Warning:
Before using a built-in class in a thread, read `doc_thread_safe_apis`
first to check whether it can be safely used in a thread.
Creating a Thread
-----------------
## Creating a Thread
Creating a thread is very simple; just use the following code:
@ -56,8 +53,7 @@ Even if the function has returned already, the thread must collect it, so call
`Thread.wait_to_finish()( Thread_method_wait_to_finish )`, which will
wait until the thread is done (if not done yet), then properly dispose of it.
Mutexes
-------
## Mutexes
Accessing objects or data from multiple threads is not always supported (if you
do it anyway, it may cause unexpected behavior or crashes). Read the
@ -111,8 +107,7 @@ gdscript GDScript
print("Counter is: ", counter) # Should be 2.
```
Semaphores
----------
## Semaphores
Sometimes you want your thread to work *"on demand"*. In other words, tell it
when to work and let it suspend when it isn't doing anything.

@ -1,25 +1,21 @@
Thread-safe APIs
================
# Thread-safe APIs
Threads
-------
## Threads
Threads are used to balance processing power across CPUs and cores.
Pandemonium supports multithreading, but not in the whole engine.
Below is a list of ways multithreading can be used in different areas of Pandemonium.
Global scope
------------
## Global scope
`Global Scope( @GlobalScope )` singletons are all thread-safe. Accessing servers from threads is supported (for VisualServer and Physics servers, ensure threaded or thread-safe operation is enabled in the project settings!).
This makes them ideal for code that creates tens of thousands of instances in servers and controls them from threads. Of course, it requires a bit more code, as the servers are used directly rather than through the scene tree.
Scene tree
----------
## Scene tree
Interacting with the active scene tree is **NOT** thread-safe. Make sure to use mutexes when sending data between threads. If you want to call functions from a thread, the *call_deferred* function may be used:
@ -49,8 +45,7 @@ you are doing and you are sure that a single resource is not being used or
set in multiple ones. Otherwise, you are safer just using the servers API
(which is fully thread-safe) directly and not touching scene or resources.
Rendering
---------
## Rendering
Instancing nodes that render anything in 2D or 3D (such as Sprite) is *not* thread-safe by default.
To make rendering thread-safe, set the **Rendering > Threads > Thread Model** project setting to **Multi-Threaded**.
@ -58,12 +53,10 @@ To make rendering thread-safe, set the **Rendering > Threads > Thread Model** pr
Note that the Multi-Threaded thread model has several known bugs, so it may not be usable
in all scenarios.
GDScript arrays, dictionaries
-----------------------------
## GDScript arrays, dictionaries
In GDScript, reading and writing elements from multiple threads is OK, but anything that changes the container size (resizing, adding or removing elements) requires locking a mutex.
Resources
---------
## Resources
Modifying a unique resource from multiple threads is not supported. However, handling references on multiple threads is supported, so loading resources on a thread is supported as well: scenes, textures, meshes, etc. can be loaded and manipulated on a thread and then added to the active scene on the main thread. The limitation is the one described above: be careful not to load the same resource from multiple threads at once. Therefore, it is easiest to use **one** thread for loading and modifying resources, and the main thread for adding them.

@ -1,7 +1,6 @@
Animating thousands of fish with MultiMeshInstance
==================================================
# Animating thousands of fish with MultiMeshInstance
This tutorial explores a technique used in the game [ABZU](https://www.gdcvault.com/play/1024409/Creating-the-Art-of-ABZ)
for rendering and animating thousands of fish using vertex animation and
@ -14,8 +13,7 @@ can render thousands of animated objects, even on low end hardware.
We will start by animating one fish. Then, we will see how to extend that animation to
thousands of fish.
Animating one Fish
------------------
## Animating one Fish
We will start with a single fish. Load your fish model into a `MeshInstance`
and add a new `ShaderMaterial`.
@ -179,8 +177,7 @@ Putting the four motions together gives us the final animation.
Go ahead and play with the uniforms in order to alter the swim cycle of the fish. You will
find that you can create a wide variety of swim styles using these four motions.
Making a school of fish
-----------------------
## Making a school of fish
Pandemonium makes it easy to render thousands of the same object using a MultiMeshInstance node.
@ -235,8 +232,7 @@ Notice how all the fish are all in the same position in their swim cycle? It mak
robotic. The next step is to give each fish a different position in the swim cycle so the entire
school looks more organic.
Animating a school of fish
--------------------------
## Animating a school of fish
One of the benefits of animating the fish using `cos` functions is that they are animated with
one parameter, `time`. In order to give each fish a unique position in the
@ -246,6 +242,7 @@ We do that by adding the per-instance custom value `INSTANCE_CUSTOM` to `time`.
```
float time = (TIME * time_scale) + (6.28318 * INSTANCE_CUSTOM.x);
```
Next, we need to pass a value into `INSTANCE_CUSTOM`. We do that by adding one line into
the `for` loop from above. In the `for` loop we assign each instance a set of four

@ -1,7 +1,6 @@
Controlling thousands of fish with Particles
============================================
# Controlling thousands of fish with Particles
The problem with `MeshInstances` is that it is expensive to
update their transform array. It is great for placing many static objects around the