hue shader turns pure white pixels black on some GPUs. #84

Merged
Daverball merged 1 commits from hue-shader-fix into master 2014-12-31 15:12:08 +00:00
Daverball commented 2014-12-27 13:36:40 +00:00 (Migrated from github.com)

As discussed in https://github.com/Ancurio/mkxp/pull/83 this change helped to prevent a graphical glitch on some GPUs where pure white pixels were turned black.

As discussed in https://github.com/Ancurio/mkxp/pull/83 this change helped to prevent a graphical glitch on some GPUs where pure white pixels were turned black.
Ancurio commented 2014-12-31 13:26:33 +00:00 (Migrated from github.com)

I am actually confident this issue is due to atan, which returns an undefined value if the denominator (in this case I) is zero. If you have access to a machine where the black pixels occur, you could verify it by changing float hue = atan (Q, I); to float hue = atan (Q, I + 0.01);. If that is the case, this commit is merely masking the real problem.

I'd certainly prefer to avoid a branch if at all possible (unfortunately there result is not NaN or Inf which we could test for), but it might be necessary.

I am actually confident this issue is due to `atan`, which returns an [undefined value](https://www.opengl.org/sdk/docs/man/html/atan.xhtml) if the denominator (in this case `I`) is zero. If you have access to a machine where the black pixels occur, you could verify it by changing `float hue = atan (Q, I);` to `float hue = atan (Q, I + 0.01);`. If that is the case, this commit is merely masking the real problem. I'd certainly prefer to avoid a branch if at all possible (unfortunately there result is not `NaN` or `Inf` which we could test for), but it might be necessary.
Daverball commented 2014-12-31 13:32:27 +00:00 (Migrated from github.com)

Yeah atan was what I suspected to be the culprit as well, actually I think I should just mod the atan output, I think I might've forgotten to try that, when both Q and I are zero the angle actually doesn't matter, but it matters if its huge because of how sin and cos get very inaccurate with large numbers.

The only thing that baffles me is that multiplication by chroma doesn't counter that since it should be zero and atan should only be problematic if chroma is exactly zero. The only way this would make sense is if somehow the sqrt of two squared zeros gives you something slightly nonzero on my hardware.

Yeah atan was what I suspected to be the culprit as well, actually I think I should just mod the atan output, I think I might've forgotten to try that, when both Q and I are zero the angle actually doesn't matter, but it matters if its huge because of how sin and cos get very inaccurate with large numbers. The only thing that baffles me is that multiplication by chroma doesn't counter that since it should be zero and atan should only be problematic if chroma is exactly zero. The only way this would make sense is if somehow the sqrt of two squared zeros gives you something slightly nonzero on my hardware.
Daverball commented 2014-12-31 13:35:20 +00:00 (Migrated from github.com)

And actually the OpenGL spec states the output is only undefined if both are zero, although there's some conflicting information on that particular issue out there.

And actually the OpenGL spec states the output is only undefined if both are zero, although there's some conflicting information on that particular issue out there.
Ancurio commented 2014-12-31 13:37:35 +00:00 (Migrated from github.com)

If it's a NaN of INF, I think they can just trickle through everything, and the dot product on some implementations might just always return 0 if it detects one of them anywhere (since the results of most operations on those special values aren't defined I believe).

If it's a NaN of INF, I think they can just trickle through everything, and the dot product on some implementations might just always return 0 if it detects one of them anywhere (since the results of most operations on those special values aren't defined I believe).
Ancurio commented 2014-12-31 13:38:34 +00:00 (Migrated from github.com)

And actually the OpenGL spec states the output is only undefined if both are zero, although there's some conflicting information on that particular issue out there.

Link? The GLSL docs I linked state

Results are undefined if x is zero.

> And actually the OpenGL spec states the output is only undefined if both are zero, although there's some conflicting information on that particular issue out there. Link? The GLSL docs I linked state > Results are undefined if x is zero.
Daverball commented 2014-12-31 13:40:12 +00:00 (Migrated from github.com)
https://www.opengl.org/registry/doc/GLSLangSpec.4.50.pdf#page=144
Ancurio commented 2014-12-31 13:43:11 +00:00 (Migrated from github.com)

Oh great, conflicting information :D But I think this means it's still enough to just test x (since y/0 isn't defined there couldn't possibly be a defined atan for it).

Oh great, conflicting information :D But I think this means it's still enough to just test x (since y/0 isn't defined there couldn't possibly be a defined atan for it).
Daverball commented 2014-12-31 13:53:52 +00:00 (Migrated from github.com)

Yeah it's definitely atan, adding something to x to make it nonzero has the same effect as changing the coefficients. I didn't know NaN or INF would be unaffected by clamp/mod

Yeah it's definitely atan, adding something to x to make it nonzero has the same effect as changing the coefficients. I didn't know NaN or INF would be unaffected by clamp/mod
Daverball commented 2014-12-31 14:17:21 +00:00 (Migrated from github.com)

honestly I think it would be fine to fix this by adding a small number to I, the only range where it matters is in very low saturation where you couldn't tell the slight distortion to the color anyways and on top of that we work with limited precision so you won't have a perfect result either way.

branches are way too expensive to make them worthwhile to fix this error. The only concern I have is that the ranges of Q and I include negative numbers so if you're very unlucky you'll still get an x of zero with some colors on some chipsets.

honestly I think it would be fine to fix this by adding a small number to `I`, the only range where it matters is in very low saturation where you couldn't tell the slight distortion to the color anyways and on top of that we work with limited precision so you won't have a perfect result either way. branches are way too expensive to make them worthwhile to fix this error. The only concern I have is that the ranges of Q and I include negative numbers so if you're very unlucky you'll still get an x of zero with some colors on some chipsets.
Ancurio commented 2014-12-31 14:38:42 +00:00 (Migrated from github.com)

The only concern I have is that the ranges of Q and I include negative numbers so if you're very unlucky you'll still get an x of zero with some colors on some chipsets.

That's what I meant by "then this commit is only masking the real problem". We wouldn't be fixing the problem, just pushing it from the gray colors (zero sat) to the ones very close to it.

Is speed really that important though? Already in RMXP the docs state "This process is time-consuming" (as it's mostly only used during loading of image assets). The branch (which might not even be a true branch depending on how the glsl compiler treats ternary operators) may not be noticeable inbetween all the heavy vector operations.

> The only concern I have is that the ranges of Q and I include negative numbers so if you're very unlucky you'll still get an x of zero with some colors on some chipsets. That's what I meant by "then this commit is only masking the real problem". We wouldn't be fixing the problem, just pushing it from the gray colors (zero sat) to the ones very close to it. Is speed really that important though? Already in RMXP the docs state "This process is time-consuming" (as it's mostly only used during loading of image assets). The branch (which might not even be a true branch depending on how the glsl compiler treats ternary operators) may not be noticeable inbetween all the heavy vector operations.
Daverball commented 2014-12-31 14:42:51 +00:00 (Migrated from github.com)

Yeah I guess we can at least try it with the branch, the game I'm working on uses it to make critical damage flash so it's used every two frames in those instances, I'll see if it slows stuff too much, although it might be hard for me to tell on my very recent hardware, it would probably impact older hardware more.

Yeah I guess we can at least try it with the branch, the game I'm working on uses it to make critical damage flash so it's used every two frames in those instances, I'll see if it slows stuff too much, although it might be hard for me to tell on my very recent hardware, it would probably impact older hardware more.
Daverball commented 2014-12-31 15:04:14 +00:00 (Migrated from github.com)

Alright I'm happy with the performance of this, like you implied it'll likely calculate both branches and then just put the appropriate result in at the end so it should not even need to branch.

Alright I'm happy with the performance of this, like you implied it'll likely calculate both branches and then just put the appropriate result in at the end so it should not even need to branch.
Ancurio commented 2014-12-31 15:12:00 +00:00 (Migrated from github.com)

It would have sufficed to do something like float hue = (I != 0.0) ? atan(Q, I) : 0.0;, but this is fine too. Since you said your game uses it for critical damage numbers, the bitmaps are likely very small so the additional cost is further diminished.

It would have sufficed to do something like `float hue = (I != 0.0) ? atan(Q, I) : 0.0;`, but this is fine too. Since you said your game uses it for critical damage numbers, the bitmaps are likely very small so the additional cost is further diminished.
Daverball commented 2014-12-31 15:43:34 +00:00 (Migrated from github.com)

Yeah my reasoning for putting the ternary operator down there was that if it's down the road as far as possible it's less likely to cause an actual branch.

Yeah my reasoning for putting the ternary operator down there was that if it's down the road as far as possible it's less likely to cause an actual branch.
Ancurio commented 2014-12-31 15:57:22 +00:00 (Migrated from github.com)

How so? Any references that I can read up on?

How so? Any references that I can read up on?
Daverball commented 2014-12-31 16:03:08 +00:00 (Migrated from github.com)

tbh I have nothing to base that on, but I'm generally pretty suspicious of how different drivers and GPUs build their pipelines and do their optimizations so I thought better safe than sorry, I think the GPU can spare the extra storage and move instructions for one float and one vec4 anyways so it wouldn't make much of a difference even if it doesn't help the GPU/driver to put the ternary operation at the end of the shader.

tbh I have nothing to base that on, but I'm generally pretty suspicious of how different drivers and GPUs build their pipelines and do their optimizations so I thought better safe than sorry, I think the GPU can spare the extra storage and move instructions for one float and one vec4 anyways so it wouldn't make much of a difference even if it doesn't help the GPU/driver to put the ternary operation at the end of the shader.
cremno commented 2015-01-01 05:54:57 +00:00 (Migrated from github.com)

GLSL 1.30 also has a selecting mix():

float hue = mix(atan(Q, I), 0.0, I == 0.0);

But I guess this also doesn't make a noticeable difference.

GLSL 1.30 also has a selecting `mix()`: ``` glsl float hue = mix(atan(Q, I), 0.0, I == 0.0); ``` But I guess this also doesn't make a noticeable difference.
Ancurio commented 2015-01-01 12:22:04 +00:00 (Migrated from github.com)

@cremno Yep you're right, that could be used as an even stronger hint to the compiler. However all shaders in mkxp are 1.10 and I'd like to not break that for now.

@cremno Yep you're right, that could be used as an even stronger hint to the compiler. However all shaders in mkxp are 1.10 and I'd like to not break that for now.
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: MapleShrine/mkxp#84
No description provided.