diff --git a/03_usage/17_performance/02_general_optimization.md b/03_usage/17_performance/02_general_optimization.md index bede07b..c0c0ab9 100644 --- a/03_usage/17_performance/02_general_optimization.md +++ b/03_usage/17_performance/02_general_optimization.md @@ -1,10 +1,8 @@ -General optimization tips -========================= +# General optimization tips -Introduction -~~~~~~~~~~~~ +### Introduction In an ideal world, computers would run at infinite speed. The only limit to what we could achieve would be our imagination. However, in the real world, it's @@ -22,16 +20,14 @@ To achieve the best results, we have two approaches: And preferably, we will use a blend of the two. -Smoke and mirrors -^^^^^^^^^^^^^^^^^ +#### Smoke and mirrors Part of working smarter is recognizing that, in games, we can often get the player to believe they're in a world that is far more complex, interactive, and graphically exciting than it really is. A good programmer is a magician, and should strive to learn the tricks of the trade while trying to invent new ones. -The nature of slowness -^^^^^^^^^^^^^^^^^^^^^^ +#### The nature of slowness To the outside observer, performance problems are often lumped together. But in reality, there are several different kinds of performance problems: @@ -45,8 +41,7 @@ But in reality, there are several different kinds of performance problems: Each of these are annoying to the user, but in different ways. -Measuring performance -===================== +# Measuring performance Probably the most important tool for optimization is the ability to measure performance - to identify where bottlenecks are, and to measure the success of @@ -66,8 +61,7 @@ Be very aware that the relative performance of different areas can vary on different hardware. It's often a good idea to measure timings on more than one device. This is especially the case if you're targeting mobile devices. -Limitations -~~~~~~~~~~~ +### Limitations CPU profilers are often the go-to method for measuring performance. However, they don't always tell the whole story. @@ -84,15 +78,13 @@ they don't always tell the whole story. As a result of these limitations, you often need to use detective work to find out where bottlenecks are. -Detective work -~~~~~~~~~~~~~~ +### Detective work Detective work is a crucial skill for developers (both in terms of performance, and also in terms of bug fixing). This can include hypothesis testing, and binary search. -Hypothesis testing -^^^^^^^^^^^^^^^^^^ +#### Hypothesis testing Say, for example, that you believe sprites are slowing down your game. You can test this hypothesis by: @@ -105,8 +97,7 @@ the performance drop? - You can test this by keeping everything the same, but changing the sprite size, and measuring performance. -Binary search -^^^^^^^^^^^^^ +#### Binary search If you know that frames are taking much longer than they should, but you're not sure where the bottleneck lies. You could begin by commenting out @@ -116,8 +107,7 @@ performance improved more or less than expected? Once you know which of the two halves contains the bottleneck, you can repeat this process until you've pinned down the problematic area. -Profilers -========= +# Profilers Profilers allow you to time your program while running it. Profilers then provide results telling you what percentage of time was spent in different @@ -130,8 +120,7 @@ and lead to slower performance. For more info about using Pandemonium's built-in profiler, see `doc_debugger_panel`. 
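To make the hypothesis-testing approach from the sections above concrete, here is a minimal GDScript sketch, assuming the Godot 3.x-style API that Pandemonium follows; the `Sprites` node path and the `ui_accept` input action are placeholders for whatever part of your scene you suspect is the bottleneck:

```
extends Node

# Point this at the node you suspect is the bottleneck (placeholder path).
onready var suspect = get_node("Sprites")

func _process(_delta):
    # Toggle the suspected area on and off while the game runs,
    # then compare the frame rate in both states.
    if Input.is_action_just_pressed("ui_accept"):
        suspect.visible = not suspect.visible
        print("Suspect visible: ", suspect.visible)

    # A rough but convenient measurement; let it settle for a few seconds.
    if Engine.get_frames_drawn() % 60 == 0:
        print("FPS: ", Engine.get_frames_per_second())
```

If hiding the suspect barely changes the frame rate, the bottleneck is probably elsewhere, and you can move on to the next hypothesis or to a binary search as described above.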
-Principles -========== +# Principles `Donald Knuth ( https://en.wikipedia.org/wiki/Donald_Knuth )` said: @@ -160,8 +149,7 @@ One misleading aspect of the quote is that people tend to focus on the subquote optimization is (by definition) undesirable, performant software is the result of performant design. -Performant design -~~~~~~~~~~~~~~~~~ +### Performant design The danger with encouraging people to ignore optimization until necessary, is that it conveniently ignores that the most important time to consider @@ -175,8 +163,7 @@ general programming. A performant design, even without low-level optimization, will often run many times faster than a mediocre design with low-level optimization. -Incremental design -~~~~~~~~~~~~~~~~~~ +### Incremental design Of course, in practice, unless you have prior knowledge, you are unlikely to come up with the best design the first time. Instead, you'll often make a series @@ -192,8 +179,7 @@ to a resurgence in data-oriented design, which involves designing data structures and algorithms for *cache locality* of data and linear access, rather than jumping around in memory. -The optimization process -~~~~~~~~~~~~~~~~~~~~~~~~ +### The optimization process Assuming we have a reasonable design, and taking our lessons from Knuth, our first step in optimization should be to identify the biggest bottlenecks - the @@ -209,8 +195,7 @@ The process is thus: 2. Optimize bottleneck. 3. Return to step 1. -Optimizing bottlenecks -~~~~~~~~~~~~~~~~~~~~~~ +### Optimizing bottlenecks Some profilers will even tell you which part of a function (which data accesses, calculations) are slowing things down. @@ -234,11 +219,9 @@ will increase speed, others may have a negative effect. Sometimes, a small positive effect will be outweighed by the negatives of more complex code, and you may choose to leave out that optimization. -Appendix -======== +# Appendix -Bottleneck math -~~~~~~~~~~~~~~~ +### Bottleneck math The proverb *"a chain is only as strong as its weakest link"* applies directly to performance optimization. If your project is spending 90% of the time in diff --git a/03_usage/17_performance/03_cpu_optimization.md b/03_usage/17_performance/03_cpu_optimization.md index 6f3419b..288f147 100644 --- a/03_usage/17_performance/03_cpu_optimization.md +++ b/03_usage/17_performance/03_cpu_optimization.md @@ -1,10 +1,8 @@ -CPU optimization -================ +# CPU optimization -Measuring performance -===================== +# Measuring performance We have to know where the "bottlenecks" are to know how to speed up our program. Bottlenecks are the slowest parts of the program that limit the rate that @@ -15,8 +13,7 @@ lead to small performance improvements. For the CPU, the easiest way to identify bottlenecks is to use a profiler. -CPU profilers -============= +# CPU profilers Profilers run alongside your program and take timing measurements to work out what proportion of time is spent in each function. @@ -28,11 +25,9 @@ slow down your project significantly. After profiling, you can look back at the results for a frame. -.. figure:: img/pandemonium_profiler.png) -.. figure:: img/pandemonium_profiler.png) - :alt: Screenshot of the Pandemonium profiler +![Screenshot of the Pandemonium profiler](img/pandemonium_profiler.png) - Results of a profile of one of the demo projects. +Results of a profile of one of the demo projects. Note: We can see the cost of built-in processes such as physics and audio, @@ -49,8 +44,7 @@ you can usually increase speed by optimizing this area. 
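Once the profiler points at a suspect, it is worth timing that area directly as a cross-check, and to verify later that an optimization actually helped. A minimal sketch, assuming Godot 3.x-style `OS.get_ticks_usec()`; `expensive_function()` is only a stand-in for whatever the profiler flagged:

```
extends Node

func expensive_function():
    # Placeholder workload so the sketch runs on its own;
    # replace it with the code the profiler flagged.
    var total = 0
    for i in range(100000):
        total += i
    return total

func _ready():
    var time_start = OS.get_ticks_usec()
    expensive_function()
    var time_taken = OS.get_ticks_usec() - time_start
    print("expensive_function() took ", time_taken, " microseconds")
```

Take several samples rather than one; individual measurements can be skewed by caching and background work, as discussed under *Manually timing functions* and *Caches* below.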
For more info about using Pandemonium's built-in profiler, see `doc_debugger_panel`. -External profilers -~~~~~~~~~~~~~~~~~~ +### External profilers Although the Pandemonium IDE profiler is very convenient and useful, sometimes you need more power, and the ability to profile the Pandemonium engine source code itself. @@ -70,10 +64,9 @@ Note: optimized. Bottlenecks are often in a different place in debug builds, so you should profile release builds whenever possible. -.. figure:: img/valgrind.png) - :alt: Screenshot of Callgrind +![Screenshot of Callgrind](img/valgrind.png) - Example results from Callgrind, which is part of Valgrind. +Example results from Callgrind, which is part of Valgrind. From the left, Callgrind is listing the percentage of time within a function and its children (Inclusive), the percentage of time spent within the function @@ -97,8 +90,7 @@ done in the graphics API. This specific profiling led to the development of 2D batching, which greatly speeds up 2D rendering by reducing bottlenecks in this area. -Manually timing functions -========================= +# Manually timing functions Another handy technique, especially once you have identified the bottleneck using a profiler, is to manually time the function or area under test. @@ -125,8 +117,7 @@ As you attempt to optimize functions, be sure to either repeatedly profile or time them as you go. This will give you crucial feedback as to whether the optimization is working (or not). -Caches -====== +# Caches CPU caches are something else to be particularly aware of, especially when comparing timing results of two different versions of a function. The results @@ -156,10 +147,9 @@ will be able to work as fast as possible. Pandemonium usually takes care of such low-level details for you. For example, the Server APIs make sure data is optimized for caching already for things like rendering and physics. Still, you should be especially aware of caching when -using `GDNative Threads > Thread Model** project setting to **Multi-Threaded**. @@ -58,12 +53,10 @@ To make rendering thread-safe, set the **Rendering > Threads > Thread Model** pr Note that the Multi-Threaded thread model has several known bugs, so it may not be usable in all scenarios. -GDScript arrays, dictionaries ------------------------------ +## GDScript arrays, dictionaries In GDScript, reading and writing elements from multiple threads is OK, but anything that changes the container size (resizing, adding or removing elements) requires locking a mutex. -Resources ---------- +## Resources Modifying a unique resource from multiple threads is not supported. However handling references on multiple threads is supported, hence loading resources on a thread is as well - scenes, textures, meshes, etc - can be loaded and manipulated on a thread and then added to the active scene on the main thread. The limitation here is as described above, one must be careful not to load the same resource from multiple threads at once, therefore it is easiest to use **one** thread for loading and modifying resources, and then the main thread for adding them. 
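As a concrete illustration of the pattern described above, here is a minimal sketch of loading a scene on a worker thread and handing it to the main thread, assuming the Godot 3.x-style `Thread` and `ResourceLoader` API that Pandemonium follows; the `res://enemies/enemy.tscn` path is a placeholder:

```
extends Node

var _thread = Thread.new()

func _ready():
    # The worker thread does the loading; the scene path is illustrative.
    _thread.start(self, "_load_scene", "res://enemies/enemy.tscn")

func _load_scene(path):
    # Only this thread touches the resource while it is being prepared.
    var packed = ResourceLoader.load(path)
    var instance = packed.instance()
    # Adding to the scene tree is not thread-safe, so defer it to the main thread.
    call_deferred("_finish_loading", instance)

func _finish_loading(instance):
    _thread.wait_to_finish()
    add_child(instance)
```

Keeping a single loader thread, as suggested above, also avoids two threads requesting the same resource at once.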
diff --git a/03_usage/17_performance/vertex_animation/01_animating_thousands_of_fish.md b/03_usage/17_performance/vertex_animation/01_animating_thousands_of_fish.md index 6ca3263..fc82365 100644 --- a/03_usage/17_performance/vertex_animation/01_animating_thousands_of_fish.md +++ b/03_usage/17_performance/vertex_animation/01_animating_thousands_of_fish.md @@ -1,7 +1,6 @@ -Animating thousands of fish with MultiMeshInstance -================================================== +# Animating thousands of fish with MultiMeshInstance This tutorial explores a technique used in the game `ABZU ( https://www.gdcvault.com/play/1024409/Creating-the-Art-of-ABZ )` for rendering and animating thousands of fish using vertex animation and @@ -14,8 +13,7 @@ can render thousands of animated objects, even on low end hardware. We will start by animating one fish. Then, we will see how to extend that animation to thousands of fish. -Animating one Fish ------------------- +## Animating one Fish We will start with a single fish. Load your fish model into a `MeshInstance` and add a new `ShaderMaterial`. @@ -179,8 +177,7 @@ Putting the four motions together gives us the final animation. Go ahead and play with the uniforms in order to alter the swim cycle of the fish. You will find that you can create a wide variety of swim styles using these four motions. -Making a school of fish ------------------------ +## Making a school of fish Pandemonium makes it easy to render thousands of the same object using a MultiMeshInstance node. @@ -235,8 +232,7 @@ Notice how all the fish are all in the same position in their swim cycle? It mak robotic. The next step is to give each fish a different position in the swim cycle so the entire school looks more organic. -Animating a school of fish --------------------------- +## Animating a school of fish One of the benefits of animating the fish using `cos` functions is that they are animated with one parameter, `time`. In order to give each fish a unique position in the @@ -246,6 +242,7 @@ We do that by adding the per-instance custom value `INSTANCE_CUSTOM` to `time`. ``` float time = (TIME * time_scale) + (6.28318 * INSTANCE_CUSTOM.x); +``` Next, we need to pass a value into `INSTANCE_CUSTOM`. We do that by adding one line into the `for` loop from above. In the `for` loop we assign each instance a set of four diff --git a/03_usage/17_performance/vertex_animation/02_controlling_thousands_of_fish.md b/03_usage/17_performance/vertex_animation/02_controlling_thousands_of_fish.md index 232129a..fb0e4e3 100644 --- a/03_usage/17_performance/vertex_animation/02_controlling_thousands_of_fish.md +++ b/03_usage/17_performance/vertex_animation/02_controlling_thousands_of_fish.md @@ -1,7 +1,6 @@ -Controlling thousands of fish with Particles -============================================ +# Controlling thousands of fish with Particles The problem with `MeshInstances` is that it is expensive to update their transform array. It is great for placing many static objects around the