Cleanups.

2025-05-03 22:17:59 +02:00 · 2024-05-02 20:12:45 +02:00 · 2024-05-02 20:12:45 +02:00 · a29c7f8dd0
commit a29c7f8dd0
parent 94c244f458
12 changed files with 125 additions and 248 deletions
--- a/03_usage/17_performance/02_general_optimization.md
+++ b/03_usage/17_performance/02_general_optimization.md
@ -1,10 +1,8 @@


-General optimization tips
-=========================
+# General optimization tips

-Introduction
-~~~~~~~~~~~~
+### Introduction

 In an ideal world, computers would run at infinite speed. The only limit to
 what we could achieve would be our imagination. However, in the real world, it's
@ -22,16 +20,14 @@ To achieve the best results, we have two approaches:

 And preferably, we will use a blend of the two.

-Smoke and mirrors
-^^^^^^^^^^^^^^^^^
+#### Smoke and mirrors

 Part of working smarter is recognizing that, in games, we can often get the
 player to believe they're in a world that is far more complex, interactive, and
 graphically exciting than it really is. A good programmer is a magician, and
 should strive to learn the tricks of the trade while trying to invent new ones.

-The nature of slowness
-^^^^^^^^^^^^^^^^^^^^^^
+#### The nature of slowness

 To the outside observer, performance problems are often lumped together.
 But in reality, there are several different kinds of performance problems:
@ -45,8 +41,7 @@ But in reality, there are several different kinds of performance problems:

 Each of these is annoying to the user, but in different ways.

-Measuring performance
-=====================
+# Measuring performance

 Probably the most important tool for optimization is the ability to measure
 performance - to identify where bottlenecks are, and to measure the success of
@ -66,8 +61,7 @@ Be very aware that the relative performance of different areas can vary on
 different hardware. It's often a good idea to measure timings on more than one
 device. This is especially the case if you're targeting mobile devices.

-Limitations
-~~~~~~~~~~~
+### Limitations

 CPU profilers are often the go-to method for measuring performance. However,
 they don't always tell the whole story.
@ -84,15 +78,13 @@ they don't always tell the whole story.
 As a result of these limitations, you often need to use detective work to find
 out where bottlenecks are.

-Detective work
-~~~~~~~~~~~~~~
+### Detective work

 Detective work is a crucial skill for developers (both in terms of performance,
 and also in terms of bug fixing). This can include hypothesis testing, and
 binary search.

-Hypothesis testing
-^^^^^^^^^^^^^^^^^^
+#### Hypothesis testing

 Say, for example, that you believe sprites are slowing down your game.
 You can test this hypothesis by:
@ -105,8 +97,7 @@ the performance drop?
 - You can test this by keeping everything the same, but changing the sprite
  size, and measuring performance.

-Binary search
-^^^^^^^^^^^^^
+#### Binary search

 If you know that frames are taking much longer than they should, but you're
 not sure where the bottleneck lies, you could begin by commenting out
@ -116,8 +107,7 @@ performance improved more or less than expected?
 Once you know which of the two halves contains the bottleneck, you can
 repeat this process until you've pinned down the problematic area.

-Profilers
-=========
+# Profilers

 Profilers allow you to time your program while running it. Profilers then
 provide results telling you what percentage of time was spent in different
@ -130,8 +120,7 @@ and lead to slower performance.

 For more info about using Pandemonium's built-in profiler, see `doc_debugger_panel`.

-Principles
-==========
+# Principles

 `Donald Knuth ( https://en.wikipedia.org/wiki/Donald_Knuth )` said:

@ -160,8 +149,7 @@ One misleading aspect of the quote is that people tend to focus on the subquote
 optimization is (by definition) undesirable, performant software is the result
 of performant design.

-Performant design
-~~~~~~~~~~~~~~~~~
+### Performant design

 The danger with encouraging people to ignore optimization until necessary is
 that it conveniently ignores that the most important time to consider
@ -175,8 +163,7 @@ general programming. A performant design, even without low-level optimization,
 will often run many times faster than a mediocre design with low-level
 optimization.

-Incremental design
-~~~~~~~~~~~~~~~~~~
+### Incremental design

 Of course, in practice, unless you have prior knowledge, you are unlikely to
 come up with the best design the first time. Instead, you'll often make a series
@ -192,8 +179,7 @@ to a resurgence in data-oriented design, which involves designing data
 structures and algorithms for *cache locality* of data and linear access, rather
 than jumping around in memory.

-The optimization process
-~~~~~~~~~~~~~~~~~~~~~~~~
+### The optimization process

 Assuming we have a reasonable design, and taking our lessons from Knuth, our
 first step in optimization should be to identify the biggest bottlenecks - the
@ -209,8 +195,7 @@ The process is thus:
 2. Optimize bottleneck.
 3. Return to step 1.

-Optimizing bottlenecks
-~~~~~~~~~~~~~~~~~~~~~~
+### Optimizing bottlenecks

 Some profilers will even tell you which part of a function (which data accesses,
 calculations) is slowing things down.
@ -234,11 +219,9 @@ will increase speed, others may have a negative effect. Sometimes, a small
 positive effect will be outweighed by the negatives of more complex code, and
 you may choose to leave out that optimization.

-Appendix
-========
+# Appendix

-Bottleneck math
-~~~~~~~~~~~~~~~
+### Bottleneck math

 The proverb *"a chain is only as strong as its weakest link"* applies directly to
 performance optimization. If your project is spending 90% of the time in
--- a/03_usage/17_performance/03_cpu_optimization.md
+++ b/03_usage/17_performance/03_cpu_optimization.md
@ -1,10 +1,8 @@


-CPU optimization
-================
+# CPU optimization

-Measuring performance
-=====================
+# Measuring performance

 We have to know where the "bottlenecks" are to know how to speed up our program.
 Bottlenecks are the slowest parts of the program that limit the rate that
@ -15,8 +13,7 @@ lead to small performance improvements.

 For the CPU, the easiest way to identify bottlenecks is to use a profiler.

-CPU profilers
-=============
+# CPU profilers

 Profilers run alongside your program and take timing measurements to work out
 what proportion of time is spent in each function.
@ -28,11 +25,9 @@ slow down your project significantly.

 After profiling, you can look back at the results for a frame.

-.. figure:: img/pandemonium_profiler.png)
-.. figure:: img/pandemonium_profiler.png)
-   :alt: Screenshot of the Pandemonium profiler
+![Screenshot of the Pandemonium profiler](img/pandemonium_profiler.png)

-   Results of a profile of one of the demo projects.
+Results of a profile of one of the demo projects.

 Note:
 We can see the cost of built-in processes such as physics and audio,
@ -49,8 +44,7 @@ you can usually increase speed by optimizing this area.
 For more info about using Pandemonium's built-in profiler, see
 `doc_debugger_panel`.

-External profilers
-~~~~~~~~~~~~~~~~~~
+### External profilers

 Although the Pandemonium IDE profiler is very convenient and useful, sometimes you
 need more power, and the ability to profile the Pandemonium engine source code itself.
@ -70,10 +64,9 @@ Note:
          optimized. Bottlenecks are often in a different place in debug builds,
          so you should profile release builds whenever possible.

-.. figure:: img/valgrind.png)
-   :alt: Screenshot of Callgrind
+![Screenshot of Callgrind](img/valgrind.png)

-   Example results from Callgrind, which is part of Valgrind.
+Example results from Callgrind, which is part of Valgrind.

 From the left, Callgrind is listing the percentage of time within a function and
 its children (Inclusive), the percentage of time spent within the function
@ -97,8 +90,7 @@ done in the graphics API. This specific profiling led to the development of 2D
 batching, which greatly speeds up 2D rendering by reducing bottlenecks in this
 area.

-Manually timing functions
-=========================
+# Manually timing functions

 Another handy technique, especially once you have identified the bottleneck
 using a profiler, is to manually time the function or area under test.
@ -125,8 +117,7 @@ As you attempt to optimize functions, be sure to either repeatedly profile or
 time them as you go. This will give you crucial feedback as to whether the
 optimization is working (or not).

-Caches
-======
+# Caches

 CPU caches are something else to be particularly aware of, especially when
 comparing timing results of two different versions of a function. The results
@ -156,10 +147,9 @@ will be able to work as fast as possible.
 Pandemonium usually takes care of such low-level details for you. For example, the
 Server APIs make sure data is optimized for caching already for things like
 rendering and physics. Still, you should be especially aware of caching when
-using `GDNative <toc-tutorials-gdnative )`.
+using `GDNative ( toc-tutorials-gdnative )`.

-Languages
-=========
+# Languages

 Pandemonium supports a number of different languages, and it is worth bearing in mind
 that there are trade-offs involved. Some languages are designed for ease of use
@ -169,42 +159,37 @@ Built-in engine functions run at the same speed regardless of the scripting
 language you choose. If your project is making a lot of calculations in its own
 code, consider moving those calculations to a faster language.

-GDScript
-~~~~~~~~
+### GDScript

-`GDScript <toc-learn-scripting-gdscript )` is designed to be easy to use and iterate,
+`GDScript (toc-learn-scripting-gdscript )` is designed to be easy to use and iterate,
 and is ideal for making many types of games. However, in this language, ease of
 use is considered more important than performance. If you need to make heavy
 calculations, consider moving some of your project to one of the other
 languages.

-C#
-~~
+### C#

-`C# <toc-learn-scripting-C# )` is popular and has first-class support in Pandemonium. It
+`C# (toc-learn-scripting-C# )` is popular and has first-class support in Pandemonium. It
 offers a good compromise between speed and ease of use. Beware of possible
 garbage collection pauses and leaks that can occur during gameplay, though. A
 common approach to work around issues with garbage collection is to use *object
 pooling*, which is outside the scope of this guide.

-Other languages
-~~~~~~~~~~~~~~~
+### Other languages

 Third parties provide support for several other languages, including `Rust
 ( https://github.com/pandemonium-rust/pandemonium-rust )` and `JavaScript
 ( https://github.com/PandemoniumExplorer/ECMAScript )`.

-C++
-~~~
+### C++

 Pandemonium is written in C++. Using C++ will usually result in the fastest code.
 However, on a practical level, it is the most difficult to deploy to end users'
 machines on different platforms. Options for using C++ include
-`GDNative <toc-tutorials-gdnative )` and
+`GDNative (toc-tutorials-gdnative )` and
 `custom modules ( doc_custom_modules_in_c++ )`.

-Threads
-=======
+# Threads

 Consider using threads when making a lot of calculations that can run in
 parallel to each other. Modern CPUs have multiple cores, each one capable of
@ -222,8 +207,7 @@ debugger doesn't support setting up breakpoints in threads yet.

 For more information on threads, see `doc_using_multiple_threads`.

-SceneTree
-=========
+# SceneTree

 Although Nodes are an incredibly powerful and versatile concept, be aware that
 every node has a cost. Built-in functions such as `process()` and
@ -244,8 +228,7 @@ This can be very useful for adding and removing areas from a game, for example.
 You can avoid the SceneTree altogether by using Server APIs. For more
 information, see `doc_using_servers`.

-Physics
-=======
+# Physics

 In some situations, physics can end up becoming a bottleneck. This is
 particularly the case with complex worlds and large numbers of physics objects.
--- a/03_usage/17_performance/04_gpu_optimization.md
+++ b/03_usage/17_performance/04_gpu_optimization.md
@ -1,10 +1,8 @@


-GPU optimization
-================
+# GPU optimization

-Introduction
-~~~~~~~~~~~~
+### Introduction

 The demand for new graphics features and progress almost guarantees that you
 will encounter graphics bottlenecks. Some of these can be on the CPU side, for
@ -22,8 +20,7 @@ indirectly by changing the instructions you give to the GPU. Also, it may be
 more difficult to take measurements. In many cases, the only way of measuring
 performance is by examining changes in the time spent rendering each frame.

-Draw calls, state changes, and APIs
-===================================
+# Draw calls, state changes, and APIs

 Note:
 The following section is not relevant to end-users, but is useful to
@ -42,8 +39,7 @@ reduce these instructions to a bare minimum and group together similar objects
 as much as possible so they can be rendered together, or with the minimum number
 of these expensive state changes.

-2D batching
-~~~~~~~~~~~
+### 2D batching

 In 2D, the costs of treating each item individually can be prohibitively high -
 there can easily be thousands of them on the screen. This is why 2D *batching*
@ -54,8 +50,7 @@ to a minimum.

 For more information on 2D batching, see `doc_batching`.

-3D batching
-~~~~~~~~~~~
+### 3D batching

 In 3D, we still aim to minimize draw calls and state changes. However, it can be
 more difficult to batch together several objects into a single draw call. 3D
@ -76,8 +71,7 @@ numbers of distant or low-poly objects.
 For more information on 3D specific optimizations, see
 `doc_optimizing_3d_performance`.

-Reuse Shaders and Materials
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+### Reuse Shaders and Materials

 The Pandemonium renderer is a little different to what is out there. It's designed to
 minimize GPU state changes as much as possible. `SpatialMaterial
@ -99,8 +93,7 @@ If a scene has, for example, `20,000` objects with `20,000` different
 materials each, rendering will be slow. If the same scene has `20,000`
 objects, but only uses `100` materials, rendering will be much faster.

-Pixel cost versus vertex cost
-=============================
+# Pixel cost versus vertex cost

 You may have heard that the lower the number of polygons in a model, the faster
 it will be rendered. This is *really* relative and depends on many factors.
@ -155,8 +148,7 @@ Pay attention to the additional vertex processing required when using:
 -  Morphs (shape keys)
 -  Vertex-lit objects (common on mobile)

-Pixel/fragment shaders and fill rate
-====================================
+# Pixel/fragment shaders and fill rate

 In contrast to vertex processing, the costs of fragment (per-pixel) shading have
 increased dramatically over the years. Screen resolutions have increased (the
@ -182,8 +174,7 @@ amount of work the GPU has to do. You can do this by simplifying the shader
 **When targeting mobile devices, consider using the simplest possible shaders
 you can reasonably afford to use.**

-Reading textures
-~~~~~~~~~~~~~~~~
+### Reading textures

 The other factor in fragment shaders is the cost of reading textures. Reading
 textures is an expensive operation, especially when reading from several
@ -195,8 +186,7 @@ mobiles.
 **If you use third-party shaders or write your own shaders, try to use
 algorithms that require as few texture reads as possible.**

-Texture compression
-~~~~~~~~~~~~~~~~~~~
+### Texture compression

 By default, Pandemonium compresses textures of 3D models when imported using video RAM
 (VRAM) compression. Video RAM compression isn't as efficient in size as PNG or
@ -222,8 +212,7 @@ Note:
   significantly due to their low resolution.


-Post-processing and shadows
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+### Post-processing and shadows

 Post-processing effects and shadows can also be expensive in terms of fragment
 shading activity. Always test the impact of these on different hardware.
@ -234,8 +223,7 @@ performance of shadows is to turn shadows off for as many lights and objects as
 possible. Smaller or distant OmniLights/SpotLights can often have their shadows
 disabled with only a small visual impact.

-Transparency and blending
-=========================
+# Transparency and blending

 Transparent objects present particular problems for rendering efficiency. Opaque
 objects (especially in 3D) can be essentially rendered in any order and the
@ -259,8 +247,7 @@ minimize these fill rate requirements, especially on mobile, where fill rate is
 very expensive. Indeed, in many situations, rendering more complex opaque
 geometry can end up being faster than using transparency to "cheat".

-Multi-platform advice
-=====================
+# Multi-platform advice

 If you are aiming to release on multiple platforms, test *early* and test
 *often* on all your platforms, especially mobile. Developing a game on desktop
@ -271,8 +258,7 @@ add optional enhancements for more powerful platforms. For example, you may want
 to use the GLES2 backend for both desktop and mobile platforms where you target
 both.

-Mobile/tiled renderers
-======================
+# Mobile/tiled renderers

 As described above, GPUs on mobile devices work in dramatically different ways
 from GPUs on desktop. Most mobile devices use tile renderers. Tile renderers
--- a/03_usage/17_performance/05_using_multimesh.md
+++ b/03_usage/17_performance/05_using_multimesh.md
@ -1,7 +1,6 @@


-Optimization using MultiMeshes
-==============================
+# Optimization using MultiMeshes

 For a large number of instances (in the thousands) that need to be constantly processed
 (and where a certain amount of control needs to be retained),
@ -11,8 +10,7 @@ When the amount of objects reach the hundreds of thousands or millions,
 none of these approaches are efficient anymore. Still, depending on the requirements, there
 is one more optimization possible.

-MultiMeshes
-----------
+## MultiMeshes

 A `MultiMesh( MultiMesh )` is a single draw primitive that can draw up to millions
 of objects in one go. It's extremely efficient because it uses the GPU hardware to do this
@ -44,8 +42,7 @@ controlled with the `MultiMesh.visible_instance_count`
 property. The typical workflow is to allocate the maximum amount of instances that will be used,
 then change the amount visible depending on how many are currently needed.

-Multimesh example
-----------------
+## Multimesh example

 Here is an example of using a MultiMesh from code. Languages other than GDScript may be more
 efficient for millions of objects, but for a few thousand, GDScript should be fine.
--- a/03_usage/17_performance/06_batching.md
+++ b/03_usage/17_performance/06_batching.md
@ -1,10 +1,7 @@

+# Optimization using batching

-Optimization using batching
-===========================
-
-Introduction
-~~~~~~~~~~~~
+### Introduction

 Game engines have to send a set of instructions to the GPU to tell the GPU what
 and where to draw. These instructions are sent using common instructions called
@ -16,8 +13,7 @@ of work for the user in the GPU driver at the cost of more expensive draw calls.
 As a result, applications can often be sped up by reducing the number of draw
 calls.

-Draw calls
-^^^^^^^^^^
+#### Draw calls

 In 2D, we need to tell the GPU to render a series of primitives (rectangles,
 lines, polygons etc). The most obvious technique is to tell the GPU to render
@ -41,8 +37,7 @@ automatically group together primitives wherever possible and send these batches
 on to the GPU. This can give an increase in rendering performance while
 requiring few (if any) changes to your Pandemonium project.

-How it works
-~~~~~~~~~~~~
+### How it works

 Instructions come into the renderer from your game in the form of a series of
 items, each of which can contain one or more commands. The items correspond to
@ -56,8 +51,7 @@ The batcher uses two main techniques to group together primitives:
 - Consecutive items can be joined together.
 - Consecutive commands within an item can be joined to form a batch.

-Breaking batching
-^^^^^^^^^^^^^^^^^
+#### Breaking batching

 Batching can only take place if the items or commands are similar enough to be
 rendered in one draw call. Certain changes (or techniques), by necessity, prevent
@ -75,8 +69,7 @@ Note:
    For example, if you draw a series of sprites each with a different texture,
    there is no way they can be batched.

-Determining the rendering order
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### Determining the rendering order

 The question arises, if only similar items can be drawn together in a batch, why
 don't we look through all the items in a scene, group together all the similar
@ -104,8 +97,7 @@ Note:
    can improve performance in some cases. See the
    `doc_batching_diagnostics` section to help you make this decision.

-A trick
-^^^^^^^
+#### A trick

 And now, a sleight of hand. Even though the idea of painter's order is that
 objects are rendered from back to front, consider 3 objects `A`, `B` and
@ -129,8 +121,7 @@ drawn *on top* of each other. If we relax that assumption, i.e. if none of these
 3 objects are overlapping, there is *no need* to preserve painter's order. The
 rendered result will be the same. What if we could take advantage of this?

-Item reordering
-^^^^^^^^^^^^^^^
+#### Item reordering

 ![](img/overlap2.png)

@ -152,8 +143,7 @@ balance the costs and benefits in your project.
 Since the texture only changes once, we can render the above in only 2 draw
 calls.

-Lights
-~~~~~~
+### Lights

 Although the batching system's job is normally quite straightforward, it becomes
 considerably more complex when 2D lights are used. This is because lights are
@ -207,8 +197,7 @@ that in a real game, you might be drawing closer to 1,000 sprites.
 That is a 1000× decrease in draw calls, and should give a huge increase in
 performance.

-Overlap test
-^^^^^^^^^^^^
+#### Overlap test

 However, as with the item reordering, things are not that simple. We must first
 perform the overlap test to determine whether we can join these primitives. This
@ -222,8 +211,7 @@ therefore shouldn't be joined). In practice, the decrease in draw calls may be
 less dramatic than in a perfect situation with no overlapping at all. However,
 performance is usually far higher than without this lighting optimization.

-Light scissoring
-~~~~~~~~~~~~~~~~
+### Light scissoring

 Batching can make it more difficult to cull out objects that are not affected or
 partially affected by a light. This can increase the fill rate requirements
@ -257,14 +245,12 @@ The exact relationship is probably not necessary for users to worry about, but
 is included in the appendix out of interest:
 `doc_batching_light_scissoring_threshold_calculation`

-.. figure:: img/scissoring.png)
-   :alt: Light scissoring example diagram
+![Light scissoring example diagram](img/scissoring.png) 

   Bottom right is a light, the red area is the pixels saved by the scissoring
   operation. Only the intersection needs to be rendered.

-Vertex baking
-~~~~~~~~~~~~~
+### Vertex baking

 The GPU shader receives instructions on what to draw in 2 main ways:

@ -290,8 +276,7 @@ In most cases, this works fine, but this shortcut breaks down if a shader expect
 these values to be available individually rather than combined. This can happen
 in custom shaders.

-Custom shaders
-^^^^^^^^^^^^^^
+#### Custom shaders

 As a result of the limitation described above, certain operations in custom
 shaders will prevent vertex baking and therefore decrease the potential for
@ -301,8 +286,7 @@ currently apply:
 - Reading or writing `COLOR` or `MODULATE` disables vertex color baking.
 - Reading `VERTEX` disables vertex position baking.

-Project Settings
-~~~~~~~~~~~~~~~~
+### Project Settings

 To fine-tune batching, a number of project settings are available. You can
 usually leave these at default during development, but it's a good idea to
@ -311,8 +295,7 @@ tweaking parameters can often give considerable performance gains for very
 little effort. See the on-hover tooltips in the Project Settings for more
 information.

-rendering/batching/options
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### rendering/batching/options

 - `use_batching
 ` -
@ -328,8 +311,7 @@ rendering/batching/options
  This is a faster way of drawing unbatchable rectangles. However, it may lead
  to flicker on some hardware so it's not recommended.

-rendering/batching/parameters
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### rendering/batching/parameters

 - `max_join_item_commands` -
  One of the most important ways of achieving batching is to join suitable
@ -358,8 +340,7 @@ rendering/batching/parameters
  textures. The lookahead for the overlap test has a small cost, so the best
  value may change per project.

-rendering/batching/lights
-^^^^^^^^^^^^^^^^^^^^^^^^^
+#### rendering/batching/lights

 - `scissor_area_threshold
 ` -
@ -372,8 +353,7 @@ rendering/batching/lights
  costs and benefits may be project dependent, and hence the best value to use
  here.

-rendering/batching/debug
-^^^^^^^^^^^^^^^^^^^^^^^^
+#### rendering/batching/debug

 - `flash_batching
 ` -
@ -387,8 +367,7 @@ rendering/batching/debug
  This will periodically print a diagnostic batching log to
  the Pandemonium IDE / console.

-rendering/batching/precision
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### rendering/batching/precision

 - `uv_contract
 ` -
@ -405,8 +384,7 @@ rendering/batching/precision



-Diagnostics
-~~~~~~~~~~~
+### Diagnostics

 Although you can change parameters and examine the effect on frame rate, this
 can feel like working blindly, with no idea of what is going on under the hood.
@ -415,8 +393,7 @@ print out (to the IDE or console) a list of the batches that are being
 processed. This can help pinpoint situations where batching isn't occurring
 as intended, and help you fix these situations to get the best possible performance.

-Reading a diagnostic
-^^^^^^^^^^^^^^^^^^^^
+#### Reading a diagnostic

 ```
    canvas_begin FRAME 2604
@ -456,8 +433,7 @@ This is a typical diagnostic.
 - **batch D:** A default batch, containing everything else that is not currently
  batched.

-Default batches
-^^^^^^^^^^^^^^^
+#### Default batches

 The second number following default batches is the number of commands in the
 batch, and it is followed by a brief summary of the contents:
@ -479,19 +455,16 @@ batch, and it is followed by a brief summary of the contents:

 You may see "dummy" default batches containing no commands; you can ignore those.

-Frequently asked questions
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+### Frequently asked questions

-I don't get a large performance increase when enabling batching.
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### I don't get a large performance increase when enabling batching.

 - Try the diagnostics, see how much batching is occurring, and whether it can be
  improved.
 - Try changing batching parameters in the Project Settings.
 - Consider that batching may not be your bottleneck (see bottlenecks).

-I get a decrease in performance with batching.
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### I get a decrease in performance with batching.

 - Try the steps described above to increase the number of batching opportunities.
 - Try enabling `single_rect_fallback
@ -502,29 +475,24 @@ I get a decrease in performance with batching.
 - After trying the above, if your scene is still performing worse, consider
  turning off batching.

-I use custom shaders and the items are not batching.
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### I use custom shaders and the items are not batching.

 - Custom shaders can be problematic for batching; see the custom shaders section.

-I am seeing line artifacts appear on certain hardware.
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### I am seeing line artifacts appear on certain hardware.

 - See the `uv_contract
 `
  project setting which can be used to solve this problem.

-I use a large number of textures, so few items are being batched.
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### I use a large number of textures, so few items are being batched.

 - Consider using texture atlases. As well as allowing batching, these
  reduce the need for state changes associated with changing textures.

-Appendix
-~~~~~~~~
+### Appendix

-Batched primitives
-^^^^^^^^^^^^^^^^^^
+#### Batched primitives

 Not all primitives can be batched. Batching is not guaranteed either,
 especially with primitives using an antialiased border. The following
@ -541,8 +509,7 @@ See `doc_custom_drawing_in_2d` for more information.



-Light scissoring threshold calculation
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#### Light scissoring threshold calculation

 The actual proportion of screen pixel area used as the threshold is the
 `scissor_area_threshold
--- a/03_usage/17_performance/07_optimizing_3d_performance.md
+++ b/03_usage/17_performance/07_optimizing_3d_performance.md
@ -1,21 +1,14 @@
-.. meta::
-    :keywords: optimization

+# Optimizing 3D performance

-
-Optimizing 3D performance
-=========================
-
-Culling
-=======
+# Culling

 Pandemonium will automatically perform view frustum culling in order to prevent
 rendering objects that are outside the viewport. This works well for games that
 take place in a small area; however, things can quickly become problematic in
 larger levels.

-Occlusion culling
-~~~~~~~~~~~~~~~~~
+### Occlusion culling

 Walking around a town for example, you may only be able to see a few buildings
 in the street you are in, as well as the sky and a few birds flying overhead. As
@ -44,8 +37,7 @@ It is a very powerful technique for speeding up rendering. You can also use it t
 restrict physics or AI to the local area, and speed these up as well as
 rendering.

-Portal Rendering
-~~~~~~~~~~~~~~~~
+### Portal Rendering

 However, there is a much easier way to take advantage of occlusion. Pandemonium features
 an advanced portal rendering system, which can perform occlusion culling from cameras and
@ -62,15 +54,13 @@ Note:
    from seeing too far away, which would decrease performance due to the lost
    opportunities for occlusion culling.

-Other occlusion techniques
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+### Other occlusion techniques

 As well as the portal system and manual methods, there are various other occlusion
 techniques such as raster-based occlusion culling. Some of these may be available
 through add-ons or may be available in core Pandemonium in the future.

-Transparent objects
-~~~~~~~~~~~~~~~~~~~
+### Transparent objects

 Pandemonium sorts objects by `Material` and `Shader
 ( Shader )` to improve performance. This, however, cannot be done with
@ -83,8 +73,7 @@ with its own material.
 For more information, see the `GPU optimizations ( doc_gpu_optimization )`
 doc.

-Level of detail (LOD)
-=====================
+# Level of detail (LOD)

 In some situations, particularly at a distance, it can be a good idea to
 **replace complex geometry with simpler versions**. The end user will probably
@ -93,8 +82,7 @@ in the far distance. There are several strategies for replacing models at
 varying distance. You could use lower poly models, or use transparency to
 simulate more complex geometry.

-Billboards and imposters
-~~~~~~~~~~~~~~~~~~~~~~~~
+### Billboards and imposters

 The simplest version of using transparency to deal with LOD is billboards. For
 example, you can use a single transparent quad to represent a tree at distance.
@ -113,8 +101,7 @@ the viewer a considerable distance for the angle of view to change
 significantly. This can be complex to get working, but may be worth it depending
 on the type of project you are making.

-Use instancing (MultiMesh)
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+### Use instancing (MultiMesh)

 If several identical objects have to be drawn in the same place or nearby, try
 using `MultiMesh` instead. MultiMesh allows the drawing
@ -124,8 +111,7 @@ identical objects.

 Also see the `Using MultiMesh ( doc_using_multimesh )` doc.

-Bake lighting
-=============
+# Bake lighting

 Lighting objects is one of the most costly rendering operations. Realtime
 lighting, shadows (especially multiple lights), and GI are especially expensive.
@ -139,15 +125,13 @@ In general, if several lights need to affect a scene, it's best to use
 `doc_baked_lightmaps`. Baking can also improve the scene quality by adding
 indirect light bounces.

-Animation and skinning
-======================
+# Animation and skinning

 Animation and vertex animation such as skinning and morphing can be very
 expensive on some platforms. You may need to lower the polycount considerably
 for animated models or limit the number of them on screen at any one time.

-Large worlds
-============
+# Large worlds

 If you are making large worlds, there are different considerations than what you
 may be familiar with from smaller games.
--- a/03_usage/17_performance/08_using_servers.md
+++ b/03_usage/17_performance/08_using_servers.md
@ -1,7 +1,6 @@


-Optimization using Servers
-==========================
+# Optimization using Servers

 Engines like Pandemonium provide increased ease of use thanks to their high-level constructs and features.
 Most of them are accessed and used via the `Scene System ( doc_scene_tree )`. Using nodes and
@ -23,8 +22,7 @@ back to a more handcrafted, low level implementation of game code.

 Still, Pandemonium is designed to work around this problem.

-Servers
-------
+## Servers

 One of the most interesting design decisions for Pandemonium is the fact that the whole scene system is
 *optional*. While it is not currently possible to compile it out, it can be completely bypassed.
@ -41,8 +39,7 @@ The most common servers are:
 Explore their APIs and you will realize that all the functions provided are low-level
 implementations of everything Pandemonium allows you to do.

-RIDs
----
+## RIDs

 The key to using servers is understanding Resource ID (`RID`) objects. These are opaque
 handles to the server implementation. They are allocated and freed manually. Almost every
@ -83,8 +80,7 @@ Try exploring the nodes and resources you are familiar with and find the functio
 It is not advised to control RIDs from objects that already have a node associated. Instead, server
 functions should always be used for creating and controlling new ones and interacting with the existing ones.

-Creating a sprite
-----------------
+## Creating a sprite

 This is a simple example of how to create a sprite from code and move it using the low-level
 `CanvasItem` API.
@ -127,8 +123,7 @@ gdscript GDScript
    VisualServer.canvas_item_clear(ci_rid)
 ```

-Instantiating a Mesh into 3D space
----------------------------------
+## Instantiating a Mesh into 3D space

 The 3D APIs are different from the 2D ones, so the instantiation API must be used.

@ -158,8 +153,7 @@ gdscript GDScript
        VisualServer.instance_set_transform(instance, xform)
 ```

-Creating a 2D RigidBody and moving a sprite with it
---------------------------------------------------
+## Creating a 2D RigidBody and moving a sprite with it

 This creates a `RigidBody2D` using the `Physics2DServer` API,
 and moves a `CanvasItem` when the body moves.
@ -200,8 +194,7 @@ gdscript GDScript
 The 3D version should be very similar, as 2D and 3D physics servers are identical (using
 `RigidBody` and `PhysicsServer` respectively).

-Getting data from the servers
-----------------------------
+## Getting data from the servers

 Try to **never** request any information from `VisualServer`, `PhysicsServer` or `Physics2DServer`
 by calling functions unless you know what you are doing. These servers will often run asynchronously
--- a/03_usage/17_performance/img/pandemonium_profiler.png
+++ b/03_usage/17_performance/img/pandemonium_profiler.png
--- a/03_usage/17_performance/threads/01_using_multiple_threads.md
+++ b/03_usage/17_performance/threads/01_using_multiple_threads.md
@ -1,10 +1,8 @@


-Using multiple threads
-======================
+# Using multiple threads

-Threads
-------
+## Threads

 Threads allow simultaneous execution of code. This allows off-loading work
 from the main thread.
@ -21,8 +19,7 @@ Warning:
    Before using a built-in class in a thread, read `doc_thread_safe_apis`
    first to check whether it can be safely used in a thread.

-Creating a Thread
-----------------
+## Creating a Thread

 Creating a thread is simple; just use the following code:

@ -56,8 +53,7 @@ Even if the function has returned already, the thread must collect it, so call
 `Thread.wait_to_finish()( Thread_method_wait_to_finish )`, which will
 wait until the thread is done (if not done yet), then properly dispose of it.

-Mutexes
-------
+## Mutexes

 Accessing objects or data from multiple threads is not always supported (doing
 so can cause unexpected behavior or crashes). Read the
@ -111,8 +107,7 @@ gdscript GDScript
        print("Counter is: ", counter) # Should be 2.
 ```

-Semaphores
----------
+## Semaphores

 Sometimes you want your thread to work *"on demand"*. In other words, tell it
 when to work and let it suspend when it isn't doing anything.
--- a/03_usage/17_performance/threads/02_thread_safe_apis.md
+++ b/03_usage/17_performance/threads/02_thread_safe_apis.md
@ -1,25 +1,21 @@


-Thread-safe APIs
-================
+# Thread-safe APIs

-Threads
-------
+## Threads

 Threads are used to balance processing power across CPUs and cores.
 Pandemonium supports multithreading, but not in the whole engine.

 Below is a list of ways multithreading can be used in different areas of Pandemonium.

-Global scope
------------
+## Global scope

 `Global Scope( @GlobalScope )` singletons are all thread-safe. Accessing servers from threads is supported (for VisualServer and Physics servers, ensure threaded or thread-safe operation is enabled in the project settings!).

 This makes them ideal for code that creates tens of thousands of instances in servers and controls them from threads. Of course, it requires a bit more code, as the servers are used directly and not within the scene tree.

-Scene tree
----------
+## Scene tree

 Interacting with the active scene tree is **NOT** thread-safe. Make sure to use mutexes when sending data between threads. If you want to call functions from a thread, the *call_deferred* function may be used:

@ -49,8 +45,7 @@ you are doing and you are sure that a single resource is not being used or
 set in multiple ones. Otherwise, you are safer just using the servers API
 (which is fully thread-safe) directly and not touching scene or resources.

-Rendering
---------
+## Rendering

 Instancing nodes that render anything in 2D or 3D (such as Sprite) is *not* thread-safe by default.
 To make rendering thread-safe, set the **Rendering > Threads > Thread Model** project setting to **Multi-Threaded**.
@ -58,12 +53,10 @@ To make rendering thread-safe, set the **Rendering > Threads > Thread Model** pr
 Note that the Multi-Threaded thread model has several known bugs, so it may not be usable
 in all scenarios.

-GDScript arrays, dictionaries
-----------------------------
+## GDScript arrays, dictionaries

 In GDScript, reading and writing elements from multiple threads is OK, but anything that changes the container size (resizing, adding or removing elements) requires locking a mutex.
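
 The rule above can be sketched as follows. This is a minimal illustration, not engine-provided code: it assumes the `Thread.start(instance, method, userdata)` signature, and the names `results`, `results_mutex` and `_worker` are hypothetical.

 ```gdscript
 extends Node

 var results = []               # Shared array, appended to from worker threads.
 var results_mutex = Mutex.new()
 var threads = []               # Only touched from the main thread.

 func _ready():
     for i in range(4):
         var thread = Thread.new()
         threads.append(thread)
         # Assumed signature: start(instance, method, userdata).
         thread.start(self, "_worker", i)

 func _worker(index):
     var value = index * index
     # append() changes the array size, so it must be guarded by the mutex.
     results_mutex.lock()
     results.append(value)
     results_mutex.unlock()

 func _exit_tree():
     # Every started thread has to be collected before it is disposed of.
     for thread in threads:
         thread.wait_to_finish()
 ```

 Only the call that resizes the array sits between `lock()` and `unlock()`; the computation itself stays outside the critical section, so the threads block each other as little as possible.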

-Resources
---------
+## Resources

 Modifying a unique resource from multiple threads is not supported. However, handling references on multiple threads is supported, and so is loading resources on a thread: scenes, textures, meshes, etc. can be loaded and manipulated on a thread and then added to the active scene on the main thread. The limitation, as described above, is that you must be careful not to load the same resource from multiple threads at once. It is therefore easiest to use **one** thread for loading and modifying resources, and then the main thread for adding them.
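
 A minimal sketch of the one-loader-thread pattern follows. The scene path `res://enemy.tscn` and the method names are hypothetical, and the sketch assumes the `Thread.start(instance, method, userdata)` signature and that `PackedScene.instance()` is available.

 ```gdscript
 extends Node

 var loader_thread = Thread.new()

 func _ready():
     # A single thread does all the loading. The path is a placeholder.
     loader_thread.start(self, "_load_scene", "res://enemy.tscn")

 func _load_scene(path):
     # Runs on the loader thread: load and instance the scene off the main thread.
     var scene = load(path)
     var instance = scene.instance()
     # Hand the result over to the main thread instead of touching the scene tree here.
     call_deferred("_on_scene_loaded", instance)

 func _on_scene_loaded(instance):
     # Runs on the main thread: collect the thread, then add the node to the active scene.
     loader_thread.wait_to_finish()
     add_child(instance)
 ```

 The loader thread never calls `add_child()` itself; it only prepares the instance and defers the scene tree change to the main thread.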
--- a/03_usage/17_performance/vertex_animation/01_animating_thousands_of_fish.md
+++ b/03_usage/17_performance/vertex_animation/01_animating_thousands_of_fish.md
@ -1,7 +1,6 @@


-Animating thousands of fish with MultiMeshInstance
-==================================================
+# Animating thousands of fish with MultiMeshInstance

 This tutorial explores a technique used in the game `ABZU ( https://www.gdcvault.com/play/1024409/Creating-the-Art-of-ABZ )`
 for rendering and animating thousands of fish using vertex animation and
@ -14,8 +13,7 @@ can render thousands of animated objects, even on low end hardware.
 We will start by animating one fish. Then, we will see how to extend that animation to
 thousands of fish.

-Animating one Fish
------------------
+## Animating one Fish

 We will start with a single fish. Load your fish model into a `MeshInstance`
 and add a new `ShaderMaterial`.
@ -179,8 +177,7 @@ Putting the four motions together gives us the final animation.
 Go ahead and play with the uniforms in order to alter the swim cycle of the fish. You will
 find that you can create a wide variety of swim styles using these four motions.

-Making a school of fish
-----------------------
+## Making a school of fish

 Pandemonium makes it easy to render thousands of the same object using a MultiMeshInstance node.

@ -235,8 +232,7 @@ Notice how all the fish are all in the same position in their swim cycle? It mak
 robotic. The next step is to give each fish a different position in the swim cycle so the entire
 school looks more organic.

-Animating a school of fish
--------------------------
+## Animating a school of fish

 One of the benefits of animating the fish using `cos` functions is that they are animated with
 one parameter, `time`. In order to give each fish a unique position in the
@ -246,6 +242,7 @@ We do that by adding the per-instance custom value `INSTANCE_CUSTOM` to `time`.

 ```
  float time = (TIME * time_scale) + (6.28318 * INSTANCE_CUSTOM.x);
+```

 Next, we need to pass a value into `INSTANCE_CUSTOM`. We do that by adding one line into
 the `for` loop from above. In the `for` loop we assign each instance a set of four
--- a/03_usage/17_performance/vertex_animation/02_controlling_thousands_of_fish.md
+++ b/03_usage/17_performance/vertex_animation/02_controlling_thousands_of_fish.md
@ -1,7 +1,6 @@


-Controlling thousands of fish with Particles
-============================================
+# Controlling thousands of fish with Particles

 The problem with `MeshInstances` is that it is expensive to
 update their transform array. It is great for placing many static objects around the