# In Browser, Computer Vision Based, Pong

To begin, I just couldn’t resist using an Atari-harkening font in this one. 🙂

This particular project evolved my GLSL skills and started teaching my brain how to work better on a GPU which, naturally, means I have a very nicely decorated whiteboard to my right at the moment. And I’ll be sharing it too, in a moment…

I’ve been thinking about doing a Computer Vision project for a little while now. I was inclined to do a desktop app and use OpenCV, but on doing some research into what goes into computer vision I realized, “Hey, I know the tools to make this work in browser.” So, I did.

My first task to get Computer Vision working was to take the RGB feed from my WebCam and convert it to HSV. I’ve done chroma keys before just using the raw RGB values, but with the recommendation that HSV might work better, I figured why not give it a try? This is where the GLSL came in, and what I had to write was an implementation of the following:

Let $$R =$$ the Red RGB component in the range 0.0 to 1.0

Let $$G =$$ the Green RGB component in the range 0.0 to 1.0

Let $$B =$$ the Blue RGB component in the range 0.0 to 1.0

Let $$Epsilon =$$ an inconsequential float value to prevent division by 0.

Let $$\Delta = max(R, G, B) - min(R, G, B)$$, the spread between the largest and smallest components.

The definition of $$max$$ is elided, but it just returns the maximum of the values it was passed in.

$$H =\begin{cases} \frac{G - B}{6\Delta + Epsilon} & max(R, G, B) = R \\ \frac{B - R}{6\Delta + Epsilon} & max(R, G, B) = G \\ \frac{R - G}{6\Delta + Epsilon} & max(R, G, B) = B\end{cases}$$

$$H$$ had to be accompanied with a related offset value:

$$H' =\begin{cases} 0 & max(R, G, B) = R \\ 1/3 & max(R, G, B) = G \\ 2/3 & max(R, G, B) = B\end{cases}$$

So my final result for Hue is actually $$H''=H+H'$$. This was because in the HSV colorspace, $$H$$ is represented as a circle going round from $$R \rightarrow G \rightarrow B \rightarrow R$$. So, in order to select the right hue, an offset needs to be applied to rotate the value round to the right place. (And since the subtraction can come out negative, a negative $$H''$$ gets 1 added to wrap it back onto the circle.)

$$S=\frac{\Delta}{max(R, G, B) + Epsilon}$$

And finally

$$V=max(R, G, B)$$
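Putting those pieces together, the whole conversion can be sketched as plain JavaScript for reference (the function and variable names here are my own; the real thing lives in the GLSL shader):

```javascript
// CPU-side sketch of the RGB→HSV formulas above.
// r, g, b are in 0.0–1.0; returns h, s, v in 0.0–1.0.
function rgbToHsv(r, g, b) {
  const epsilon = 1e-10;          // the inconsequential value preventing division by 0
  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const delta = max - min;

  let h;
  if (max === r)      h = (g - b) / (6 * delta + epsilon) + 0;      // H + H' for the red case
  else if (max === g) h = (b - r) / (6 * delta + epsilon) + 1 / 3;  // green case
  else                h = (r - g) / (6 * delta + epsilon) + 2 / 3;  // blue case
  if (h < 0) h += 1;              // wrap negative hues back round the circle

  const s = delta / (max + epsilon);
  const v = max;
  return { h, s, v };
}
```

A quick sanity check: pure red lands at $$H'' = 0$$, pure green at $$1/3$$, and pure blue at $$2/3$$, matching the offsets above.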

I found some other documentation about this calculation online and found it messy. That might be normal when discussing math like this, I don’t really know, but my engineering brain likes things broken down into their individual components. Hopefully, the above will help someone else out! 🙂 Now to calculate that on the GPU, making it time to live up to my promise of bringing the whiteboard in…

I ended up leaning heavily on mix() and step() in order to reduce the branching I had to write. Branching, as I understand it, is becoming less of a concern in GPU programs, but the article saying that branching implementations were improving was from 2011, and that’s too recent to assume all GPUs will behave nicely. Below those function notes, there’s a tracking of how my values were going to flow through the vectors so I could end up with the maximum in the first position, the two values I needed to subtract to get my $$H$$, and finally the offset $$H'$$ that went with the selected $$H$$ so I could calculate $$H''$$ properly. For the curious, the shuffling that’s done with mix() and step() is effectively a vectorized ternary operator.
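To illustrate that “vectorized ternary” trick, here are step() and mix() re-implemented as scalar JavaScript, used to pick a maximum without an if statement. This is just a sketch of the idea, not the project’s actual shader code:

```javascript
// step() and mix() as GLSL defines them, in scalar JavaScript.
const step = (edge, x) => (x < edge ? 0.0 : 1.0);       // 0 if x < edge, else 1
const mix = (a, b, t) => a * (1.0 - t) + b * t;         // linear blend of a and b

// Branchless select: pick b when b >= a, otherwise keep a.
// In the shader, the same pattern shuffles whole vectors at once.
function selectMax(a, b) {
  return mix(a, b, step(a, b));   // step(a, b) is 1.0 exactly when b >= a
}
```

The same pattern, applied to vectors instead of scalars, is how the shader moves the maximum component and its matching $$H'$$ offset into known positions without branching.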

The conversion of RGB to HSV, and writing the GLSL shader to do it, was most of the effort. But the shader wasn’t quite done yet. The final step was passing min and max values for each channel: H, S, and V. When all three were within the defined min and max, the shader needed to return white; when any of them were out of bounds, it had to return black. This way, the user can select an object in the real world, by its color, to function as a paddle. Now, I did look for and find shader implementations of RGB to HSV online; however, I opted to roll my own to make sure I understood what was going on, as this was pretty much the beating heart of this project.
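That thresholding step can be sketched in JavaScript like so (the {h, s, v} object shape and all names here are mine, not lifted from the project):

```javascript
// Return white when every channel sits inside its [min, max] band,
// black when any channel falls outside — the mask the paddle logic reads.
function maskPixel(hsv, lo, hi) {
  const inBand = (x, a, b) => x >= a && x <= b;
  const inside =
    inBand(hsv.h, lo.h, hi.h) &&
    inBand(hsv.s, lo.s, hi.s) &&
    inBand(hsv.v, lo.v, hi.v);
  return inside ? [255, 255, 255] : [0, 0, 0];  // white or black RGB
}
```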

On the JavaScript side of things, I did run into a problem. I wanted to use sliders so users could hone in on the color they wanted to select. However, HTML5 only allocates one thumb per slider. I know I could have used jQueryUI to get two thumbs, but I opted to stay away and handle the problem myself. It was also tempting to use three.js to handle the WebGL side of things, but I decided not to bring it in either. The changes weren’t too significant from my ANSI.WebGL project, so I saw no need to take on another dependency.
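For the curious, one way to fake a dual-thumb range with two single-thumb sliders is to clamp the thumbs against each other whenever one moves. This is a hypothetical sketch of that approach, not necessarily how the project wired its controls:

```javascript
// Keep a min/max slider pair consistent: whichever thumb just moved wins,
// and the other thumb gets pushed along so min never exceeds max.
function clampRange(minVal, maxVal, changed) {
  if (minVal > maxVal) {
    if (changed === "min") maxVal = minVal;
    else minVal = maxVal;
  }
  return { min: minVal, max: maxVal };
}
```

Hooked up to two overlapping `<input type="range">` elements’ change events, this gives the feel of a two-thumb slider without any library.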

The JavaScript I did write is pretty straightforward. There’s a ball (a div that’s been styled to look like a ball) that flies around, and when it hits a paddle it flies back the other way. A “hit” here is defined as the left or right side of the ball touching a white paddle pixel. When that happens, that column of pixels is scanned for connected white pixels to figure out the length of the paddle. If it’s above a threshold (provided to filter out artifacts), the calculation figures out where the ball hit in relation to the paddle, and the ball bounces away at an angle. The only “gotcha” I ran into with this part was that when I read the pixels from the buffer, I found them vertically inverted, which broke my hit detection! Thankfully, it was a simple fix to right them.
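The column scan described above can be sketched like this (the array layout and names are my own assumptions, not the project’s code):

```javascript
// Given one column of mask pixels (1 = white, 0 = black) and the row where
// the ball touched, measure the connected run of white pixels around that
// row; only runs past a threshold count as a real paddle, not an artifact.
function paddleRunLength(column, hitRow) {
  if (column[hitRow] !== 1) return 0;
  let top = hitRow, bottom = hitRow;
  while (top > 0 && column[top - 1] === 1) top--;
  while (bottom < column.length - 1 && column[bottom + 1] === 1) bottom++;
  return bottom - top + 1;        // length of the connected white run
}

function isPaddleHit(column, hitRow, threshold) {
  return paddleRunLength(column, hitRow) >= threshold;
}
```

As for the inversion gotcha: WebGL’s readPixels() returns rows bottom-to-top (the framebuffer origin is the bottom-left corner), so a screen-space row index generally needs flipping with y' = height − 1 − y before it lines up with the DOM.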

For much larger computer vision problems, I’d probably turn to something like OpenCV. Still, getting to understand why we sometimes do things like convert to the HSV color space is, I think, very important! Understanding how the tools you use actually work is, I hold, key to using them well. It also lets you answer a question I’ve mentioned before as being part of my process. 😉

You will need a recent version of either Firefox or Chrome in order to run this project, due to its need for getUserMedia() and WebGL.

https://github.com/dreambbqchild/CVPong

http://www.animetedstudios.com/webapps/CVPong/