Wednesday, August 13, 2008

I write a lot of image processing code using the OpenCV library. I've been using this C API for years, and I've grown deeply familiar with its eccentricities. I know my way around it pretty well.

However, it is a C API, and hence rather low-level. You have to allocate and deallocate images and other objects explicitly -- lots of clerical overhead. The code is not pretty.

Naturally, one starts to wonder if there is a Better Way. And indeed, I've attempted several times to create some kind of object-oriented layer on top of OpenCV. But I always find that the layer I create ends up getting in the way, requires too much maintenance, and ends up causing more annoying resource management issues than it's worth.

So I have always, always gone back to using the C API.

My efforts to build this easier layer usually focused on creating an image class encapsulating the IplImage type used by OpenCV. In my first pass many years ago, I created the image class and then started adding all the image processing functions in OpenCV as methods. This is okay up to a point, and simplifies certain types of coding appreciably, but too many calls in the API don't map well to the object.method() paradigm.

I think the basic error I made in those previous efforts was making the central object the image. Last night I had the idea of making the central object an image operator. The result is extremely clear, brief, simple, and yet still performant code. Here's a snippet:

from_file("input.png")
>> smooth(CV_GAUSSIAN, 21)
>> rotate(45, 100, 100, 2)
>> rgb_to_grey
>> resize(2)
>> threshold(127, 255, CV_THRESH_BINARY)
>> erode(1)
>> dilate(1)
>> to_file("output.png")
;


Here's the equivalent C code:

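IplImage *img_from_file = cvLoadImage("input.png", CV_LOAD_IMAGE_COLOR);
cvSmooth(img_from_file, img_from_file, CV_GAUSSIAN, 21, 0, 0, 0);
double mat_stg[6];
CvMat rot_mat = cvMat(2, 3, CV_64F, mat_stg);
cv2DRotationMatrix(cvPoint2D32f(100, 100), 45, 2, &rot_mat); /* angle 45, scale 2 */
/* cvWarpAffine can't work in place, so it needs a destination image */
IplImage *img_rotated = cvCreateImage(cvGetSize(img_from_file), img_from_file->depth, img_from_file->nChannels);
cvWarpAffine(img_from_file, img_rotated, &rot_mat, CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS, cvScalarAll(0));
IplImage *img_grey = cvCreateImage(cvGetSize(img_rotated), IPL_DEPTH_8U, 1);
cvCvtColor(img_rotated, img_grey, CV_BGR2GRAY);
IplImage *big_grey_img = cvCreateImage(cvSize(img_grey->width*2, img_grey->height*2), IPL_DEPTH_8U, 1);
cvResize(img_grey, big_grey_img, CV_INTER_LINEAR);
cvThreshold(big_grey_img, big_grey_img, 127, 255, CV_THRESH_BINARY);
cvErode(big_grey_img, big_grey_img, 0, 1);
cvDilate(big_grey_img, big_grey_img, 0, 1);
cvSaveImage("output.png", big_grey_img);
/* cleanup */
cvReleaseImage(&img_from_file);
cvReleaseImage(&img_rotated);
cvReleaseImage(&img_grey);
cvReleaseImage(&big_grey_img);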



Which code would you rather read? Which code would you rather write?

It's probably apparent that what the nice code is doing is pushing an image through a pipeline of image-processing primitives.

Now pipelines are all well and good, but what if your processing problem can't be modelled with the simple topology of a pipeline? For instance, what if an operator has two image inputs? A good example is an operator that computes a weighted sum of two images. In that case, the code looks like this:

(from_file("input.png"), rotate) >> mix(.2, .5) >> to_file("output_mix.png");

Pretty slick, if I do say so myself.

Here's how it all works: for each operator we want to add, we create a class with a Process method, an Output method, and an operator() method. The Process method accepts the data it is to process, and the Output method returns the result of its operation. The overloaded operator() is where the user passes in parameters that affect its operation; you can think of them as "filter settings". Because operator() returns a reference to the operator itself, the call can sit directly inside a chain.

Here's an example of the Threshold operator:

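class Threshold
{
public:

    Threshold() : _img(0), _thresh(0), _max_val(0), _threshold_type(CV_THRESH_BINARY) {}
    ~Threshold() { if(_img) cvReleaseImage(&_img); }

    // filter settings, e.g. threshold(127, 255, CV_THRESH_BINARY);
    // returns *this so the call can sit inside a >> chain
    Threshold& operator()(double thresh, double max_val, int threshold_type)
    {
        _thresh = thresh;
        _max_val = max_val;
        _threshold_type = threshold_type;
        return *this;
    }

    // result of the last Process() call; still owned by this operator
    IplImage *Output() { return _img; }

    void Process(IplImage *img)
    {
        // helper from the full source (not shown); from its arguments it appears to
        // (re)allocate _img to img's size as an 8-bit, single-channel image
        _make_images_same_size(img, &_img, IPL_DEPTH_8U, 1);
        cvThreshold(img, _img, _thresh, _max_val, _threshold_type);
    }

protected:

    IplImage *_img;
    double _thresh, _max_val;
    int _threshold_type;
};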
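Since the Process/Output/operator() protocol has nothing OpenCV-specific in it, here's a minimal sketch of the same Threshold operator over a hypothetical `Image` type (a `std::vector<int>` of grey levels standing in for `IplImage`), so the protocol can be tried without OpenCV. The `Image` alias, the two-parameter `operator()`, and `threshold_example()` are illustrations, not part of the original code; the comparison mirrors the documented `CV_THRESH_BINARY` rule (max value where the pixel is above the threshold, 0 elsewhere).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for IplImage: a flat buffer of grey levels.
typedef std::vector<int> Image;

// Same protocol as the OpenCV version: operator() takes the settings,
// Process() consumes an input image, Output() exposes the result.
class Threshold
{
public:
    Threshold() : _thresh(0), _max_val(0) {}

    Threshold& operator()(int thresh, int max_val)
    {
        _thresh = thresh;
        _max_val = max_val;
        return *this;  // returning *this is what lets the call sit inside a >> chain
    }

    Image* Output() { return &_img; }

    void Process(Image* img)
    {
        // Mirrors CV_THRESH_BINARY: max_val where the pixel exceeds thresh, else 0.
        _img.resize(img->size());  // the vector owns its buffer, so no release call is needed
        for (Image::size_type i = 0; i < img->size(); ++i)
            _img[i] = (*img)[i] > _thresh ? _max_val : 0;
    }

private:
    Image _img;
    int _thresh, _max_val;
};

// Usage: configure, process, read back.
inline Image threshold_example()
{
    Threshold threshold;
    Image in = {10, 127, 128, 250};
    threshold(127, 255).Process(&in);
    return *threshold.Output();  // {0, 0, 255, 255}
}
```

Holding the result in a `std::vector` lets the destructor clean up on its own, which is one way around the explicit `cvReleaseImage` in the OpenCV version.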


I've used an overloaded operator>>() to hook up operators. It looks like this:

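template <class T_rhs, class T_lhs>
T_rhs& operator>>(T_lhs &lhs, T_rhs &rhs)
{
    // feed the left operator's result into the right operator
    rhs.Process(lhs.Output());
    // return the right operator so the next >> in the chain can consume its output
    return rhs;
}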
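Here's a small self-contained sketch of that connector driving a two-stage chain. `Source`, `Offset`, and `Scale` are hypothetical operators over a `std::vector<int>` stand-in for `IplImage`, written to the same protocol; only `operator>>` comes from the post.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<int> Image;  // hypothetical stand-in for IplImage

// Source: stores an image and hands it on via Output(); plays the role of from_file().
class Source
{
public:
    Source& operator()(const Image& img) { _img = img; return *this; }
    Image* Output() { return &_img; }
private:
    Image _img;
};

// Offset and Scale: hypothetical per-pixel operators following the same protocol.
class Offset
{
public:
    Offset() : _delta(0) {}
    Offset& operator()(int delta) { _delta = delta; return *this; }
    Image* Output() { return &_img; }
    void Process(Image* img)
    {
        _img = *img;
        for (Image::size_type i = 0; i < _img.size(); ++i) _img[i] += _delta;
    }
private:
    Image _img;
    int _delta;
};

class Scale
{
public:
    Scale() : _factor(1) {}
    Scale& operator()(int factor) { _factor = factor; return *this; }
    Image* Output() { return &_img; }
    void Process(Image* img)
    {
        _img = *img;
        for (Image::size_type i = 0; i < _img.size(); ++i) _img[i] *= _factor;
    }
private:
    Image _img;
    int _factor;
};

// The connector from the post: push the left operator's output into the right one.
template <class T_rhs, class T_lhs>
T_rhs& operator>>(T_lhs& lhs, T_rhs& rhs)
{
    rhs.Process(lhs.Output());
    return rhs;  // an lvalue, so the next >> in the chain can bind to it
}

// One instance per operator, as the chain syntax in the post implies.
Source source;
Offset offset;
Scale scale;

inline Image pipeline_example()
{
    Image in = {1, 2, 3};
    // Reads left to right: (source >> offset) >> scale
    return *(source(in) >> offset(10) >> scale(2)).Output();  // {22, 24, 26}
}
```

Because `>>` is left-associative and each call returns the right-hand operator by reference, `a >> b >> c` is `(a >> b) >> c`, and every stage hands its `Output()` to the next.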


That's all you need for a pipeline. To support multiple inputs, I had to do a bit more template legerdemain. To wit, I overloaded operator,() (I had to check that this was actually possible -- who overloads the comma operator, for heaven's sake?) and created a helper class that caches references to the two input arguments. That code looks like this:

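template <class T1, class T2>
struct ArgPair
{
    ArgPair(T1& arg1_, T2& arg2_) : arg1(arg1_), arg2(arg2_) {}

    // references to the two input operators, valid for the rest of the expression
    T1& arg1;
    T2& arg2;
};


// ArgPair is a template, so the return type needs its template arguments
template <class T1, class T2>
ArgPair<T1, T2> operator,(T1 &arg1, T2 &arg2)
{
    return ArgPair<T1, T2>(arg1, arg2);
}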


And then I had to create another overloaded operator>> that would accept a left-hand side of type ArgPair:

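template <class T1, class T2, class T_rhs>
T_rhs& operator>>(const ArgPair<T1, T2> &lhs, T_rhs &rhs)
{
    // lhs is the temporary returned by operator,() so it must bind to a const
    // reference; its members are references, so Output() is still callable
    rhs.Process(lhs.arg1.Output(), lhs.arg2.Output());
    return rhs;
}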
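To exercise the two-input path without OpenCV, here's a sketch that wires two sources into a hypothetical `Mix` operator computing `alpha*a + beta*b` per pixel over a `std::vector<double>` stand-in for `IplImage` (the same weighted sum OpenCV's `cvAddWeighted` computes, minus its `gamma` term). `Source`, `Mix`, and `mix_example()` are illustrative, and reading `mix(.2, .5)` as exactly these two weights is an assumption; `ArgPair` and `operator,` follow the post, with `operator>>` taking the temporary pair by const reference.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<double> Image;  // hypothetical stand-in for IplImage

class Source
{
public:
    Source& operator()(const Image& img) { _img = img; return *this; }
    Image* Output() { return &_img; }
private:
    Image _img;
};

// Mix: hypothetical two-input operator, out = alpha * a + beta * b per pixel.
class Mix
{
public:
    Mix() : _alpha(0), _beta(0) {}
    Mix& operator()(double alpha, double beta) { _alpha = alpha; _beta = beta; return *this; }
    Image* Output() { return &_img; }
    void Process(Image* a, Image* b)
    {
        _img.resize(a->size());  // assumes both inputs have the same size
        for (Image::size_type i = 0; i < a->size(); ++i)
            _img[i] = _alpha * (*a)[i] + _beta * (*b)[i];
    }
private:
    Image _img;
    double _alpha, _beta;
};

template <class T1, class T2>
struct ArgPair
{
    ArgPair(T1& arg1_, T2& arg2_) : arg1(arg1_), arg2(arg2_) {}
    T1& arg1;  // references only: the pair lives just for one expression
    T2& arg2;
};

template <class T1, class T2>
ArgPair<T1, T2> operator,(T1& arg1, T2& arg2)
{
    return ArgPair<T1, T2>(arg1, arg2);
}

// The ArgPair is a temporary, so it binds to a const reference.
// Its members are references, so arg1/arg2 are still non-const through it.
template <class T1, class T2, class T_rhs>
T_rhs& operator>>(const ArgPair<T1, T2>& lhs, T_rhs& rhs)
{
    rhs.Process(lhs.arg1.Output(), lhs.arg2.Output());
    return rhs;
}

Source src_a, src_b;
Mix mix;

inline Image mix_example()
{
    Image a = {2.0, 4.0};
    Image b = {4.0, 8.0};
    // weights chosen so the doubles compare exactly
    return *((src_a(a), src_b(b)) >> mix(0.5, 0.25)).Output();  // {2.0, 4.0}
}
```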


Voila. Works like a charm. Happiness blares from every street corner. ...Exccceeept: what if I want to pass more than two arguments to an operator? Three, or possibly four? Hm. Because the comma operator groups left to right, (a, b, c) is really ((a, b), c), so the extra inputs arrive as nested ArgPairs. It took a couple of kicks at the can to get this going, and it's not quite as brief as the code so far, but here's what I did:

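// (a, b) is a temporary ArgPair, which can't bind to T1&; without this overload the
// compiler falls back to the built-in comma and silently drops (a, b)
template <class T1, class T2, class T3>
ArgPair<ArgPair<T1, T2>, T3> operator,(const ArgPair<T1, T2> &lhs, T3 &arg3)
{
    return ArgPair<ArgPair<T1, T2>, T3>(const_cast<ArgPair<T1, T2>&>(lhs), arg3);
}

template <class T1, class T2, class T3, class T_rhs>
T_rhs& operator>>(const ArgPair<ArgPair<T1, T2>, T3> &lhs, T_rhs &rhs)
{
    rhs.Process(lhs.arg1.arg1.Output(), lhs.arg1.arg2.Output(), lhs.arg2.Output());
    return rhs;
}

template <class T1, class T2, class T3, class T4, class T_rhs>
T_rhs& operator>>(const ArgPair<ArgPair<ArgPair<T1, T2>, T3>, T4> &lhs, T_rhs &rhs)
{
    rhs.Process(lhs.arg1.arg1.arg1.Output(), lhs.arg1.arg1.arg2.Output(),
                lhs.arg1.arg2.Output(), lhs.arg2.Output());
    return rhs;
}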


... and so on, ad infinitum. Or until, say, 8 or 9. Sheesh, how many inputs do you need?
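Here's a self-contained sketch of the three-input path that compiles without OpenCV. Because `(a, b, c)` groups as `((a, b), c)`, a second `operator,` accepts the temporary `(a, b)` pair by const reference, and the three-input `operator>>` unpacks the nested pairs. The `Merge3` operator (interleaving three planes, loosely in the spirit of OpenCV's `cvMerge`), the `std::vector<int>` stand-in for `IplImage`, and `merge_example()` are hypothetical.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<int> Image;  // hypothetical stand-in for IplImage

class Source
{
public:
    Source& operator()(const Image& img) { _img = img; return *this; }
    Image* Output() { return &_img; }
private:
    Image _img;
};

// Merge3: hypothetical three-input operator interleaving three planes pixel by pixel.
class Merge3
{
public:
    Merge3& operator()() { return *this; }
    Image* Output() { return &_img; }
    void Process(Image* c0, Image* c1, Image* c2)
    {
        _img.clear();
        for (Image::size_type i = 0; i < c0->size(); ++i)
        {
            _img.push_back((*c0)[i]);
            _img.push_back((*c1)[i]);
            _img.push_back((*c2)[i]);
        }
    }
private:
    Image _img;
};

template <class T1, class T2>
struct ArgPair
{
    ArgPair(T1& arg1_, T2& arg2_) : arg1(arg1_), arg2(arg2_) {}
    T1& arg1;
    T2& arg2;
};

// First comma: two operators -> ArgPair.
template <class T1, class T2>
ArgPair<T1, T2> operator,(T1& arg1, T2& arg2)
{
    return ArgPair<T1, T2>(arg1, arg2);
}

// Later commas: the left operand is a temporary ArgPair. Without this overload it
// could not bind to T1&, so the built-in comma would discard (a, b) and pass only c on.
template <class T1, class T2, class T3>
ArgPair<ArgPair<T1, T2>, T3> operator,(const ArgPair<T1, T2>& lhs, T3& arg3)
{
    return ArgPair<ArgPair<T1, T2>, T3>(const_cast<ArgPair<T1, T2>&>(lhs), arg3);
}

// Three inputs: unpack the nested pairs in source order.
template <class T1, class T2, class T3, class T_rhs>
T_rhs& operator>>(const ArgPair<ArgPair<T1, T2>, T3>& lhs, T_rhs& rhs)
{
    rhs.Process(lhs.arg1.arg1.Output(), lhs.arg1.arg2.Output(), lhs.arg2.Output());
    return rhs;
}

Source red, green, blue;
Merge3 merge3;

inline Image merge_example()
{
    Image r = {1, 2}, g = {3, 4}, b = {5, 6};
    return *((red(r), green(g), blue(b)) >> merge3()).Output();  // {1, 3, 5, 2, 4, 6}
}
```

The `const_cast` is tolerable only because the pair it refers to is a non-const temporary that lives until the end of the full expression, i.e. the whole `>>` chain; keeping an `ArgPair` past that statement would leave dangling references.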

So I think I may have found my way out of the OpenCV C API wilderness. I have a couple of computer vision projects lined up, and I am going to use this approach from start to finish. I just love that you get so much in the way of abstraction and simplicity in return for a very small amount of infrastructure code.

Here's the code. It's very preliminary, but there just isn't all that much to it, so you should be able to start hacking and adding operators right away. Also note that you don't have to use this approach for image processing -- it'll work for anything.

3 Comments:

Full Infinity said...

That sounds like something suitable for a concatenative language such as Factor.

Come to #concatenative on freenode if you want to learn even more.

8:21 PM
The Method Artist said...

I've been watching Factor with interest, and I think stack languages are pretty cool. In fact, Forth was the first language I learned after BASIC.

8:49 PM
gleb bahmutov said...

Good thinking. Check out the CImg library, a template-based wrapper for images and image operations. I have used OpenCV in research, and it is a pain, but for most image operations CImg is very convenient.

9:49 PM  
