hue shader turns pure white pixels black on some GPUs. #84
No reviewers
Labels
No Label
RGSS accuracy
bug
compilation
discussion
documentation
duplicate
enhancement
invalid
performance issue
port request
question
ruby incompatibility
wontfix
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: MapleShrine/mkxp#84
Loading…
Reference in New Issue
No description provided.
Delete Branch "hue-shader-fix"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
As discussed in https://github.com/Ancurio/mkxp/pull/83 this change helped to prevent a graphical glitch on some GPUs where pure white pixels were turned black.
I am actually confident this issue is due to
atan
, which returns an undefined value if the denominator (in this caseI
) is zero. If you have access to a machine where the black pixels occur, you could verify it by changingfloat hue = atan (Q, I);
tofloat hue = atan (Q, I + 0.01);
. If that is the case, this commit is merely masking the real problem.I'd certainly prefer to avoid a branch if at all possible (unfortunately there result is not
NaN
orInf
which we could test for), but it might be necessary.Yeah atan was what I suspected to be the culprit as well, actually I think I should just mod the atan output, I think I might've forgotten to try that, when both Q and I are zero the angle actually doesn't matter, but it matters if its huge because of how sin and cos get very inaccurate with large numbers.
The only thing that baffles me is that multiplication by chroma doesn't counter that since it should be zero and atan should only be problematic if chroma is exactly zero. The only way this would make sense is if somehow the sqrt of two squared zeros gives you something slightly nonzero on my hardware.
And actually the OpenGL spec states the output is only undefined if both are zero, although there's some conflicting information on that particular issue out there.
If it's a NaN of INF, I think they can just trickle through everything, and the dot product on some implementations might just always return 0 if it detects one of them anywhere (since the results of most operations on those special values aren't defined I believe).
Link? The GLSL docs I linked state
https://www.opengl.org/registry/doc/GLSLangSpec.4.50.pdf#page=144
Oh great, conflicting information :D But I think this means it's still enough to just test x (since y/0 isn't defined there couldn't possibly be a defined atan for it).
Yeah it's definitely atan, adding something to x to make it nonzero has the same effect as changing the coefficients. I didn't know NaN or INF would be unaffected by clamp/mod
honestly I think it would be fine to fix this by adding a small number to
I
, the only range where it matters is in very low saturation where you couldn't tell the slight distortion to the color anyways and on top of that we work with limited precision so you won't have a perfect result either way.branches are way too expensive to make them worthwhile to fix this error. The only concern I have is that the ranges of Q and I include negative numbers so if you're very unlucky you'll still get an x of zero with some colors on some chipsets.
That's what I meant by "then this commit is only masking the real problem". We wouldn't be fixing the problem, just pushing it from the gray colors (zero sat) to the ones very close to it.
Is speed really that important though? Already in RMXP the docs state "This process is time-consuming" (as it's mostly only used during loading of image assets). The branch (which might not even be a true branch depending on how the glsl compiler treats ternary operators) may not be noticeable inbetween all the heavy vector operations.
Yeah I guess we can at least try it with the branch, the game I'm working on uses it to make critical damage flash so it's used every two frames in those instances, I'll see if it slows stuff too much, although it might be hard for me to tell on my very recent hardware, it would probably impact older hardware more.
Alright I'm happy with the performance of this, like you implied it'll likely calculate both branches and then just put the appropriate result in at the end so it should not even need to branch.
It would have sufficed to do something like
float hue = (I != 0.0) ? atan(Q, I) : 0.0;
, but this is fine too. Since you said your game uses it for critical damage numbers, the bitmaps are likely very small so the additional cost is further diminished.Yeah my reasoning for putting the ternary operator down there was that if it's down the road as far as possible it's less likely to cause an actual branch.
How so? Any references that I can read up on?
tbh I have nothing to base that on, but I'm generally pretty suspicious of how different drivers and GPUs build their pipelines and do their optimizations so I thought better safe than sorry, I think the GPU can spare the extra storage and move instructions for one float and one vec4 anyways so it wouldn't make much of a difference even if it doesn't help the GPU/driver to put the ternary operation at the end of the shader.
GLSL 1.30 also has a selecting
mix()
:But I guess this also doesn't make a noticeable difference.
@cremno Yep you're right, that could be used as an even stronger hint to the compiler. However all shaders in mkxp are 1.10 and I'd like to not break that for now.