The previous YIQ-based algorithm turned out to be both slow
and horribly inaccurate.
Another algorithm based on rotating the color value in the
RGB cube along the diagonal axis was also considered, which was
acceptable in terms of accuracy, and very fast.
In the end, I decided on an HSV-based one, because it is by far
the most accurate, while still being a tad faster than the
YIQ solution.
Algorithm source: gamedev.stackexchange.com/a/59808/24839
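To illustrate the HSV approach, here is a minimal CPU-side sketch in
C++. It is not the shader from this commit and not a copy of the
linked answer; the names Rgb, Hsv, rgbToHsv, hsvToRgb and hueShift
are made up for illustration. It uses the textbook RGB <-> HSV
conversion and simply adds the shift to the hue component.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical types for illustration; all components are in [0, 1].
struct Rgb { float r, g, b; };
struct Hsv { float h, s, v; };

Hsv rgbToHsv(Rgb c)
{
    float maxC = std::max({c.r, c.g, c.b});
    float minC = std::min({c.r, c.g, c.b});
    float delta = maxC - minC;

    Hsv out{0.0f, 0.0f, maxC};
    if (delta <= 0.0f)
        return out; // grey: hue is undefined, keep it at 0

    out.s = delta / maxC; // maxC > 0 here because delta > 0

    float h;
    if (maxC == c.r)
        h = (c.g - c.b) / delta;
    else if (maxC == c.g)
        h = 2.0f + (c.b - c.r) / delta;
    else
        h = 4.0f + (c.r - c.g) / delta;

    h /= 6.0f;
    if (h < 0.0f)
        h += 1.0f;
    out.h = h;
    return out;
}

Rgb hsvToRgb(Hsv c)
{
    float h6 = c.h * 6.0f;
    int sector = static_cast<int>(std::floor(h6)) % 6;
    float f = h6 - std::floor(h6);
    float p = c.v * (1.0f - c.s);
    float q = c.v * (1.0f - c.s * f);
    float t = c.v * (1.0f - c.s * (1.0f - f));

    switch (sector) {
    case 0:  return {c.v, t, p};
    case 1:  return {q, c.v, p};
    case 2:  return {p, c.v, t};
    case 3:  return {p, q, c.v};
    case 4:  return {t, p, c.v};
    default: return {c.v, p, q};
    }
}

// Shifts the hue by the given angle; saturation and value are untouched.
Rgb hueShift(Rgb c, float degrees)
{
    Hsv hsv = rgbToHsv(c);
    float h = hsv.h + degrees / 360.0f;
    hsv.h = h - std::floor(h); // wrap around, also for negative shifts
    return hsvToRgb(hsv);
}
```

Shifting pure red by 120 degrees should give pure green (up to float
rounding), and grey values pass through unchanged since their
saturation is zero.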
A very simple GPU time benchmark when shifting a 2048^2 bitmap:

         YIQ rot    RGB rot    HSV shift
radeon   13.4 ms    2.8 ms     11.4 ms
intel    13.0 ms    6.0 ms     10.5 ms

radeon: HD 3650 mobility
intel:  N3540 integrated (Baytrail)
However, hue shifting has never shown up as a bottleneck before,
so these numbers are mostly academic.
Using the kitchen sink plane shader for viewport effects, even
if only a few of them are active, incurs a great performance
loss on mobile. Split the rendering into multiple optional
passes instead, which additionally use the blending hardware for
faster mixing (lerping).
Also, don't mirror the PingPong textures if the viewport effect
covers the entire screen area anyway.
Don't globally set float precision to mediump: only fragment
shaders need that, and defining it for vertex shaders causes
tilemap cracks.
Also manually define low precision for variables that hold
color / alpha values.
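A rough sketch of the precision handling, assuming a GLSL ES target
and a loader that prepends a prelude to each shader before compiling
it. ShaderStage, buildShaderSource and kExampleFragmentBody are
hypothetical names, not mkxp code. The default precision line goes
into fragment shaders only (GLSL ES has no default float precision
there, while vertex shaders already default to highp), and color /
alpha variables are marked lowp by hand.

```cpp
#include <string>

// Hypothetical helper, for illustration only.
enum class ShaderStage { Vertex, Fragment };

// Prepends the default float precision, but only for fragment shaders.
// Vertex shaders are passed through untouched so positions keep their
// full default precision.
std::string buildShaderSource(ShaderStage stage, const std::string &body)
{
    std::string src;
    if (stage == ShaderStage::Fragment)
        src += "precision mediump float;\n";
    src += body;
    return src;
}

// Example fragment shader body (GLSL ES 1.00): values that hold
// color / alpha are explicitly declared lowp.
const char *const kExampleFragmentBody =
    "uniform sampler2D tex;\n"
    "uniform lowp float opacity;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    lowp vec4 color = texture2D(tex, v_texCoord);\n"
    "    color.a *= opacity;\n"
    "    gl_FragColor = color;\n"
    "}\n";
```

The string returned for ShaderStage::Fragment would then be handed to
glShaderSource as usual.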
Set up the active RGSS version at runtime. The desired version
can be specified via config or, by default, auto-detected from the
game files. This removes the need to build specifically for each
version, which should help packaging a lot.
This also greatly reduces the danger of introducing code that
wouldn't compile on all RGSS version paths (as certain code paths
were completely ifdef'd out).
This can be optimized more, e.g. by not compiling shaders that
aren't needed in the active version.
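As an illustration of the config-or-autodetect logic, a minimal
sketch follows. The function name resolveRgssVersion is hypothetical,
and detecting by archive extension is an assumption made for this
example (RPG Maker XP, VX and VX Ace ship their data as Game.rgssad,
Game.rgss2a and Game.rgss3a respectively); the actual detection may
look at other files.

```cpp
#include <string>
#include <vector>

// True if s ends with the given suffix.
static bool endsWith(const std::string &s, const std::string &suffix)
{
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: returns the RGSS version (1, 2 or 3) to use.
// A non-zero configured value always wins; otherwise the version is
// guessed from the file names found in the game directory.
// Returns 0 if nothing could be detected.
int resolveRgssVersion(int configured, const std::vector<std::string> &gameFiles)
{
    if (configured != 0)
        return configured;

    for (const std::string &name : gameFiles) {
        if (endsWith(name, ".rgssad"))
            return 1;
        if (endsWith(name, ".rgss2a"))
            return 2;
        if (endsWith(name, ".rgss3a"))
            return 3;
    }
    return 0;
}
```

Code that used to be ifdef'd per version would then branch on the
returned value at runtime instead.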
Previously, we would just stuff the entire tilemap vertex data
four times into the buffers, with only the autotile vertices
offset according to the animation frame. This meant we could
prepare the buffers once, and then just bind a different offset
for each animation frame without any shader changes, but it also
led to a huge amount of data being duplicated (and blowing up
the buffer sizes).
The new method only requires one buffer, and instead animates by
recognizing vertices belonging to autotiles in a custom vertex
shader, which offsets them on the fly according to the animation
index.
With giant tilemaps, this method would turn out to be a little
less efficient, but considering the Tilemap is planned to be
rewritten to only hold the range of tiles visible on the screen
in its buffers, the cost of the on-the-fly offsetting will become
negligible, while at the same time the amount of data we have to
send to the GPU every time the tilemap is updated is greatly
reduced; so a net win in the end.
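The on-the-fly offsetting can be pictured with a small CPU-side
emulation of what such a vertex shader might do per vertex. The
layout below is purely an assumption for illustration (autotiles in
the top-left region of the atlas, animation frames laid out side by
side); Vec2, AutotileLayout and animateTexCoord are hypothetical
names, not the actual shader.

```cpp
// Hypothetical types for illustration.
struct Vec2 { float x, y; };

struct AutotileLayout {
    float areaWidth;   // width of the atlas region holding frame 0 of the autotiles
    float areaHeight;  // height of that region
    float frameStride; // horizontal distance between consecutive animation frames
};

// Emulates the per-vertex work: a vertex whose texture coordinate lies
// inside the autotile region is shifted to the current animation frame,
// every other vertex passes through unchanged. Only the animation index
// changes between frames, so the vertex buffer itself stays untouched.
Vec2 animateTexCoord(Vec2 tex, const AutotileLayout &layout, int animIndex)
{
    bool isAutotile = tex.x <= layout.areaWidth && tex.y <= layout.areaHeight;
    if (isAutotile)
        tex.x += layout.frameStride * static_cast<float>(animIndex);
    return tex;
}
```

In a real shader the layout and the animation index would be
uniforms, so advancing the animation costs one uniform update
instead of binding a different buffer offset.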
GL entrypoint resolution is now done manually. This has a couple
of immediate benefits, such as not having to retrieve hundreds of
function pointers that we'll never use. It's also nice to have
an exact overview of all the entrypoints used by mkxp.
This change allows mkxp to run fine with core contexts, though
I'm not sure how relevant that is going to be in the future.
What's noteworthy is that _all_ entrypoints, even the ones core
in 1.1 and guaranteed to be in every libGL, are resolved
dynamically.
This has the added benefit of not having to link directly against
libGL anymore, which also cleans up the output of `ldd` quite
a bit (SDL2 loads most system deps dynamically at runtime).
GL headers are still required at build time.
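A minimal sketch of manual entrypoint resolution, under a few
assumptions: the names ProcLoader, GLEntrypoints, resolve and
loadEntrypoints are hypothetical, only two entrypoints are shown,
and the function pointer types are spelled out with plain C types
instead of the GL header typedefs (and without the calling
convention macro needed on Windows). With SDL2, a function such as
SDL_GL_GetProcAddress fits the loader signature once a context is
current; whether mkxp uses exactly that call is not stated here.

```cpp
#include <stdexcept>
#include <string>

// Any function that maps an entrypoint name to its address.
using ProcLoader = void *(*)(const char *name);

// Hypothetical table holding only the entrypoints we actually use.
struct GLEntrypoints {
    using ClearFn = void (*)(unsigned int mask); // glClear
    using GetErrorFn = unsigned int (*)();       // glGetError

    ClearFn Clear = nullptr;
    GetErrorFn GetError = nullptr;
};

// Looks up one entrypoint and fails loudly if it is missing.
template <typename Fn>
Fn resolve(ProcLoader loader, const char *name)
{
    void *ptr = loader(name);
    if (!ptr)
        throw std::runtime_error(std::string("missing GL entrypoint: ") + name);
    return reinterpret_cast<Fn>(ptr);
}

// Resolves everything dynamically, including functions that are core
// in GL 1.1, so nothing needs to be linked from libGL directly.
GLEntrypoints loadEntrypoints(ProcLoader loader)
{
    GLEntrypoints gl;
    gl.Clear = resolve<GLEntrypoints::ClearFn>(loader, "glClear");
    gl.GetError = resolve<GLEntrypoints::GetErrorFn>(loader, "glGetError");
    return gl;
}
```

Callers would then go through the table (gl.Clear(...)) rather than
the global gl* symbols, which is also what makes the list of used
entrypoints easy to audit.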
This looks like a pretty major change, but in reality,
80% of it is just renames of types and corresponding
methods.
The config parsing code has been completely replaced
with a boost::program_options based version. This
means that the config file format has changed slightly
(check out the updated README).
I still expect there to be bugs / unforeseen events.
Those should be fixed in follow-up commits.
Also, finally reverted to using pkg-config to
locate and link libruby. Yay for fewer hacks!
The general rule I'm aiming for is to <> include
system wide / installed paths / generally everything
that's outside the git managed source tree (this means
mruby paths too!), and "" include everything else,
ie. local mkxp headers.
The only current exceptions are the MRI headers, which
all have './' at their front so as not to clash with
system wide ruby headers. I'm leaving them be for now
until I can come up with a better general solution.
This implementation is also heaps better than the old
one as it doesn't use a (differently sized) aux texture,
meaning the Bitmap discards its old texture and acquires
one of the same size, making reuse through the TexPool a
lot more likely. It also saves on the aux texture blits
and binding switches.
As the setup / resource acquisition far outweighs the
actual rendering cost, operation time is relatively
constant no matter how many divisions are used.
The drawing is now completely shader based, which does away
with all usage of the deprecated matrix stack. This also allows
us to do things like simple translations and texture coordinate
translation directly instead of doing everything indirectly
through matrices.
Fixed vertex attributes ('vertexPointer()' etc.) are also
replaced with user-defined attribute arrays.