Hard, hard user interfaces.

Having started out writing ActionScript in Macromedia Flash, I got into Human-Computer Interaction after watching Jeff Han's multi-touch work. That's the power of a really cool demo: it inspired the birth of the NUI Group community, which spawned thousands of makers all around the world building their own projector-camera multi-touch systems. It all became mainstream once Apple released the iPhone, and it also gave birth to my interest in human-centered engineering. I spent some ten years after that playing with sensors, haptics, and gestural interfaces across multiple input modalities, but nothing stuck the way multi-touch did, thanks to Apple's execution: everyone's phone and tablet interaction defaults to multi-touch (unless you have a visual impairment). PrimeSense evolved into the Kinect, then came the Wiimote and the Leap, but nothing stuck. For speech, it was Amazon's Alexa; the latest news is that it's going to lose Amazon $10B. Like I am writing this post through my keyboard a...

CV dilemma

Looks like there's a problem with OpenCV's documented parameterization of the coordinates used during camera calibration. Three distinct sources of information on the image-distortion formulae apparently give three non-equivalent descriptions of the parameters and equations involved:

(1) The authors of "Learning OpenCV" write regarding lens distortion:

xcorrected = x * ( 1 + k1 * r^2 + k2 * r^4 + k3 * r^6 ) + [ 2 * p1 * x * y + p2 * ( r^2 + 2 * x^2 ) ],

ycorrected = y * ( 1 + k1 * r^2 + k2 * r^4 + k3 * r^6 ) + [ p1 * ( r^2 + 2 * y^2 ) + 2 * p2 * x * y ],

where r = sqrt( x^2 + y^2 ).

Presumably, (x, y) are the pixel coordinates (in pixel units) in the uncorrected captured image corresponding to world points with coordinates (X, Y, Z), referenced to the camera frame, for which

xcorrected = fx * ( X / Z ) + cx and ycorrected = fy * ( Y / Z ) + cy,

where fx, fy, cx, and cy are the capturing camera's intrinsic parameters. Therefore, having (x, y) from a captured image, one can derive the desired points ( xcorrected, ycorrected ) to obtain an undistorted image of the captured world scene.
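To make (1) concrete, here is the book's formula taken literally as a per-point correction. All coefficient values below are hypothetical placeholders, not from any real calibration:

```python
# Hypothetical distortion coefficients (illustrative values only)
k1, k2, k3 = -0.28, 0.07, 0.0   # radial terms
p1, p2 = 1e-4, -1e-4            # tangential terms

def correct_point(x, y):
    """Apply the 'Learning OpenCV' correction formula from (1), as written."""
    r2 = x * x + y * y                                  # r^2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3      # radial factor
    x_c = x * radial + (2 * p1 * x * y + p2 * (r2 + 2 * x * x))
    y_c = y * radial + (p1 * (r2 + 2 * y * y) + 2 * p2 * x * y)
    return x_c, y_c
```

Note that at the origin the correction vanishes (radial factor is 1, tangential terms are 0), which is at least consistent with distortion being zero at the distortion center.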

However...

(2) The complication comes when we look at the OpenCV 2.0 C Reference entry under the "Camera Calibration and 3D Reconstruction" section. For ease of comparison, we begin with all world-point (X, Y, Z) coordinates expressed w.r.t. the camera's reference frame, just as in point (1). Consequently, the transformation matrix [ R | t ] is of no concern.

In the C reference, it is expressed that:

x' = X / Z,

y' = Y / Z,

x'' = x' * ( 1 + k1 * r'^2 + k2 * r'^4 + k3 * r'^6 ) + [ 2 * p1 * x' * y' + p2 * ( r'^2 + 2 * x'^2 ) ],

y'' = y' * ( 1 + k1 * r'^2 + k2 * r'^4 + k3 * r'^6 ) + [ p1 * ( r'^2 + 2 * y'^2 ) + 2 * p2 * x' * y' ],

where r' = sqrt( x'^2 + y'^2 ), and finally

u = fx * x'' + cx,

v = fy * y'' + cy.

One can see these expressions are not equivalent to the ones presented in (1), with the result that the two sets of corrected coordinates ( xcorrected, ycorrected ) and ( u, v ) aren't the same. Why the contradiction? It seems to me the first set makes more sense, as I can attach physical meaning to each and every x and y in there, while I find no physical meaning in x' = X / Z and y' = Y / Z when the camera focal length is not exactly 1. Moreover, one cannot calculate x' and y', since we don't know (X, Y, Z).

(3) Unfortunately, things get even murkier when we turn to Intel's OpenCV Library Reference Manual, section "Lens Distortion" (page 6-4), which states in part:

"Let ( u, v ) be true pixel image coordinates, that is, coordinates with ideal projection, and ( u~, v~ ) be corresponding real observed (distorted) image coordinates. Similarly, ( x, y ) are ideal (distortion-free) and ( x~, y~ ) are real (distorted) image physical coordinates. Taking into account two expansion terms gives the following:

x~ = x * ( 1 + k1 * r^2 + k2 * r^4 ) + [ 2 * p1 * x * y + p2 * ( r^2 + 2 * x^2 ) ]

y~ = y * ( 1 + k1 * r^2 + k2 * r^4 ) + [ 2 * p2 * x * y + p1 * ( r^2 + 2 * y^2 ) ],

where r = sqrt( x^2 + y^2 ). ...

"Because u~ = cx + fx * u and v~ = cy + fy * v , … the resultant system can be rewritten as follows:

u~ = u + ( u – cx ) * [ k1 * r^2 + k2 * r^4 + 2 * p1 * y + p2 * ( r^2 / x + 2 * x ) ]

v~ = v + ( v – cy ) * [ k1 * r^2 + k2 * r^4 + 2 * p2 * x + p1 * ( r^2 / y + 2 * y ) ]

The latter relations are used to undistort images from the camera."

Well, it would appear that the expressions involving x~ and y~ coincide with the two expressions given at the top of this writing involving xcorrected and ycorrected. However, x~ and y~ do not refer to corrected coordinates, according to the given description. I don't understand the distinction between the meaning of the coordinates ( x~, y~ ) and ( u~, v~ ), or for that matter, between the pairs ( x, y ) and ( u, v ). From their descriptions it appears their only distinction is that ( x~, y~ ) and ( x, y ) refer to 'physical' coordinates while ( u~, v~ ) and ( u, v ) do not. What is this distinction all about? Aren't they all physical coordinates? I'm lost! (from the OpenCV mailing list)
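For whatever it's worth, one reading that makes the pieces fit for me (and I may be wrong): the distortion polynomial is a forward model applied to ideal, undistorted normalized coordinates, so "correcting" an image means inverting that polynomial numerically, since it has no closed-form inverse. A sketch of that inversion by fixed-point iteration, again with made-up coefficients:

```python
# Made-up distortion coefficients (illustrative only)
k1, k2, k3 = -0.3, 0.1, 0.0
p1, p2 = 0.001, -0.001

def distort(x, y):
    """Forward distortion of ideal normalized coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    return (x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x),
            y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y)

def undistort(xd, yd, iters=20):
    """Invert distort() by fixed-point iteration: start at the distorted
    point and repeatedly strip off the current estimate's distortion."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```

Round-tripping a point through distort() and then undistort() recovers it to high precision, which is the behavior I'd expect from an undistortion routine built on the forward model of (2).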
