3D Shape Detection with HALCON 13 and Nerian SP1


In this video tutorial I’ll give a brief overview of how HALCON 13 can be used together with the SP1 stereo vision system to perform 3D shape detection. HALCON 13 features many improvements for 3D image processing, but most importantly its GenTL interface now supports 3D point clouds, which greatly simplifies the integration with the SP1.

For this tutorial I’m going to use our Karmin stereo camera, which has been mounted on a tripod and is looking down vertically on a table where we have placed a cylindrical can. The Karmin camera is attached to our SP1 stereo vision system, which does all the stereo image processing, and the SP1 itself is connected through gigabit ethernet to a computer running HALCON. Right here I’m running HALCON on Linux, but everything that I’m going to show also applies to HALCON for Windows.
We start by verifying that the GenTL producer is installed correctly and that the computer is receiving image data from the SP1. We do so by opening a new image acquisition and having a look at the available interfaces. Right now we still have all possible interfaces in this interface list, because we haven’t yet run interface detection. By clicking the auto detect interfaces button, most of these interfaces should disappear, but we should still have the GenTL interface, which we are going to select.

If we switch over to the connection tab, we can see a list of detected devices. These are actually all virtual devices for one and the same physical SP1 that is connected to this computer, and each of them provides a different type of data. The first named device is the disparity device, which provides the disparity map, i.e. the inverse depth map. Then we have the left camera image, the 3D point cloud and the right camera image. Finally, there is the first entry, which does not carry any additional name. This is a device providing a multi-part data stream, meaning that it delivers several different types of data; in fact, it contains all the data that is provided by the other individual devices. Multi-part data streams are a new feature of HALCON 13 that greatly simplifies the synchronized acquisition of different types of data.
So, let’s verify that HALCON is actually receiving data. We can do so by selecting the left camera from the list of virtual devices and pressing the live preview button. What we get is a live view of what the left camera is currently recording. As we can see, the camera is observing the cylindrical can that we have placed here, and this is the goal of this tutorial: to detect this cylinder by looking only at the 3D depth data that we receive from the SP1.

Likewise, we can get a live preview of the disparity map, as it is being computed by the SP1, by selecting the disparity virtual device and again pressing the live preview button. In this gray scale image it is actually hard to recognize anything, so what we can do is select a look-up table for color coding. The cyclic-temperature color scale works nicely for the disparity map, as we can see here. In this case red hues correspond to close image points and yellow hues correspond to image points that are a little farther away.

We can also try to have a look at the point cloud virtual device, but if we do that we’re not getting any preview. This is because this virtual device provides 3D floating point coordinates, which cannot easily be visualized, at least not in this image acquisition dialog. So much for experimenting with the image acquisition dialog; now let’s move on to the actual programming.
For this tutorial we’re going to modify the example program that is provided with the SP1 software release. So, let’s open the example for HALCON 13 and try running it. As we can see, this example opens up three different graphics windows, but let’s first disable the code that follows here. The three windows show the image of the left camera, a color coded version of the disparity map and, on the right, a live visualization of the 3D point cloud. OK, let’s stop this program and have a look at what this code is actually doing.
Right at the top we have an open_framegrabber statement, which opens one of these virtual devices. The device that it opens is the one whose name ends with a ‘/’; as I said previously, this is the multi-part data stream device. By accessing this multi-part device we can get all the data that is provided by the other virtual devices, but we can do so in a synchronized way, meaning that all of the different types of data that we’re acquiring always correspond to the same point in time.
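Just to give a rough idea of what this looks like in HDevelop, the call might resemble the following sketch. The interface name ’GenICamTL’ refers to HALCON’s GenTL acquisition interface, and the device string is only a placeholder; use the ID that the image acquisition assistant reports for your SP1.

    * Open the multi-part virtual device via HALCON's GenTL interface.
    * 'nerian-sp1/' is only a placeholder device ID - use the entry ending
    * with '/' that is listed by the image acquisition assistant.
    open_framegrabber ('GenICamTL', 0, 0, 0, 0, 0, 0, 'progressive', -1, 'default', -1, 'false', 'default', 'nerian-sp1/', 0, -1, AcqHandle)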
Right after that there is some code for initializing the graphical output, which we’re going to skip over. The next interesting line is the call to grab_data_async, which acquires one frame from this device. grab_data_async uses a separate thread for image acquisition, which is why there is a call to grab_image_start somewhere before it. The alternative would be to use the grab_data function, which does image acquisition without using another thread. The code below then separates this multi-part frame into its individual components: the left camera image, the disparity map, and three maps containing the X-, Y- and Z-coordinates of the 3D point cloud.
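As a sketch, the acquisition and the splitting of the multi-part frame might look roughly like this; the variable names and the component order are assumptions made for illustration, since the actual order is defined by the device.

    * Start asynchronous acquisition and grab one multi-part frame.
    grab_image_start (AcqHandle, -1)
    grab_data_async (MultiPartFrame, Region, Contours, AcqHandle, -1, Data)
    * Split the frame into its components. The order assumed here
    * (left image, disparity, X, Y, Z) is only for illustration.
    select_obj (MultiPartFrame, LeftImage, 1)
    select_obj (MultiPartFrame, Disparity, 2)
    select_obj (MultiPartFrame, CoordX, 3)
    select_obj (MultiPartFrame, CoordY, 4)
    select_obj (MultiPartFrame, CoordZ, 5)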
After that there’s some code for visualizing the left camera image and the disparity map, which we’re again going to skip. This is where the processing of the 3D point cloud starts. First the threshold function is used on the Z-coordinate map to create a region that contains all points up to a depth of 3 m. Then the reduce_domain function is used with this region in order to select all points that are within this permitted depth range. It is only necessary to call reduce_domain for one coordinate map, which is why it is applied only to the X-map here. The individual coordinate maps are then combined into one 3D object model using the xyz_to_object_model_3d function, and finally this 3D model is displayed using the disp_object_model_3d function.
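In condensed form, this part of the example corresponds roughly to the following sketch; the variable names, the window handle, the camera parameters and the pose are assumed to have been set up by the initialization code that we skipped.

    * Keep only points closer than 3 m. The region is computed on the
    * Z-coordinate map but only needs to be applied to one map (here X).
    threshold (CoordZ, RegionValid, 0.0, 3.0)
    reduce_domain (CoordX, RegionValid, CoordXReduced)
    * Combine the coordinate maps into a 3D object model and display it.
    xyz_to_object_model_3d (CoordXReduced, CoordY, CoordZ, ObjectModel3D)
    disp_object_model_3d (WindowHandle3D, ObjectModel3D, CamParam, Pose, [], [])
    clear_object_model_3d (ObjectModel3D)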
There’s also another way to display this 3D model: the visualize_object_model_3d function, which is commented out here. We can switch over to this function and re-run the program. In this case we no longer get a live preview of the 3D point cloud but rather a still frame; however, we are now able to interactively zoom and rotate this frame, and we get a rather clear view of the 3D contour of the cylinder.

As we will be running this program several times, the first real modification that we’re going to make is to select an appropriate default pose for the visualization. We can do so by editing the create_pose statement at the beginning of the program. In this example the virtual camera is looking at the observed object at a slight angle and from a larger distance. We can change that to a closer and more vertical view by setting the lateral offsets to 0, reducing the distance, and also setting the angles to 0.
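The modified pose might then look something like the sketch below; the viewing distance of 1.2 m is just an assumed value that happens to fit this scene.

    * Virtual camera straight above the scene: no lateral offsets, no
    * rotation, shorter viewing distance (1.2 m is only an assumed value).
    create_pose (0, 0, 1.2, 0, 0, 0, 'Rp+T', 'gba', 'point', Pose)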
So, let’s re-run the program and have a look. As we can see, this new camera perspective is a lot more suitable for this example.

Now, let’s continue with the actual image processing. If we want to detect the cylinder, the first thing we have to do is segment it from its background. In this case that is rather simple, because the cameras are looking down approximately vertically onto the table. So, if we want to separate the cylinder from the table, we can apply a simple thresholding operation to the observed depth values. As we have seen before, this example already performs a thresholding of the depth values, so we can just go and modify it. Here the cameras are approximately half a meter away from the table. Let’s first try a threshold of 60 cm and re-run the program. All of the table is still there, meaning that this threshold was not strict enough. So, let’s reduce it to 50 cm and re-try. This seems to have been too strict, as most of the cylinder has disappeared. So, let’s try 55 cm instead. This threshold seems to be just right: all of the cylinder is still there and all of the table has disappeared. We should be aware that only the upper half of the cylinder is visible, because this is the only portion that can be seen by the cameras.

To improve the error robustness, we can also set a lower threshold for the depth. If we try a threshold of 50 cm and re-run the program, we can see that it actually cuts off part of the cylinder. But if we set it to 45 cm and re-run the program, we can again see that this threshold seems to be just right.
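With both limits in place, the thresholding part of the program might look like this sketch (the variable names follow the ones assumed earlier):

    * Keep only points between 45 cm and 55 cm from the camera, which
    * separates the top half of the cylinder from the table surface.
    threshold (CoordZ, RegionCylinder, 0.45, 0.55)
    reduce_domain (CoordX, RegionCylinder, CoordXReduced)
    xyz_to_object_model_3d (CoordXReduced, CoordY, CoordZ, ObjectModel3D)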
Let’s continue with the actual object detection, for which we’re going to use the function segment_object_model_3d. This function is able to detect a set of known geometric primitives in a 3D point cloud. We pass it the 3D object model that we created, and for now we’re going to use the default parameterization. The function returns a new 3D model, which we’re going to save in the ObjectSegmented variable. Because we’re trying to write good code, we also release this object model again by calling the function clear_object_model_3d. Finally, we display the detected 3D primitives by also passing them to the visualize_object_model_3d function.
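A minimal sketch of these steps, assuming the same helper variables as before, could look like this:

    * Detect geometric primitives (planes, cylinders, spheres) using the
    * default parameterization, i.e. an empty generic parameter list.
    segment_object_model_3d (ObjectModel3D, [], [], ObjectSegmented)
    * Display the detected primitives interactively, then release them.
    visualize_object_model_3d (WindowHandle3D, ObjectSegmented, CamParam, Pose, [], [], '', [], [], PoseOut)
    clear_object_model_3d (ObjectSegmented)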
So, let’s give this program another go and see what it does. As we can see, it actually detected several cylinders, each of which approximately represents some part of the observed can, but there are some significant outliers. The reason for this behavior is that we have been using the default parameterization of the segment_object_model_3d function, which does not work so well for our example.

The parameters that we’re going to adjust are min_area, which is the minimum number of 3D points required for detecting a 3D object, the maximum curvature difference between two surface points, and the maximum allowed orientation difference between two surface normals. For the min_area parameter we can select a very high value of 10,000, because we have a very dense point cloud. For the other two parameters we’re going to choose values of 0.1 and 0.2, which are values that I have successfully tested earlier.
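Assuming the generic parameter names 'min_area', 'max_curvature_diff' and 'max_orientation_diff' (please check the operator reference for your HALCON version), the adjusted call would look roughly like this:

    * Tuned segmentation: at least 10,000 points per primitive, plus
    * tighter limits on curvature and normal orientation differences.
    segment_object_model_3d (ObjectModel3D, ['min_area','max_curvature_diff','max_orientation_diff'], [10000, 0.1, 0.2], ObjectSegmented)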
Now, let’s re-run this program and have a look. This time the cylinder has been detected very nicely. It should be noted that we did not actually instruct the segment_object_model_3d function to search for cylinders; rather, it searches for known geometric primitives, which can be planes, cylinders or spheres, and it correctly identifies that what it is looking at is a cylinder.
In order to improve the visualization, let’s also visualize the table surface. We can do so by copying the existing depth-thresholding code from above and applying a higher threshold that also includes the table. We can then create a new 3D object model from these thresholded coordinates, which we’re going to call ObjectAll. Of course we should not forget to release this object model again. We then use this model for visualizing the 3D point cloud.
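A sketch of this addition might look as follows; the 70 cm limit and the variable names are only assumptions for illustration.

    * A second model with a looser depth limit (70 cm is only an assumed
    * value) so that the table surface is included in the visualization.
    threshold (CoordZ, RegionAll, 0.0, 0.7)
    reduce_domain (CoordX, RegionAll, CoordXAll)
    xyz_to_object_model_3d (CoordXAll, CoordY, CoordZ, ObjectAll)
    * Show the detected primitives together with the full point cloud.
    visualize_object_model_3d (WindowHandle3D, [ObjectSegmented, ObjectAll], CamParam, Pose, [], [], '', [], [], PoseOut)
    clear_object_model_3d (ObjectAll)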
So, let’s re-run the program and have another look. This time we get a much better view of how well the cylinder has been detected. If we take a close look, we can see that the cylinder almost touches the presumed plane of the table surface.

To show that this program also works well for other orientations of the cylinder, we can simply reposition the can and re-run the program. The cylinder has been detected just as well in this new orientation. Let’s reposition the can once more and try out one further orientation. As we can see, the cylinder has again been detected just as well.

So, this concludes this video tutorial. If you want to learn more about the SP1 stereo vision system and its possible applications, then please visit nerian.com or feel free to contact us. Thank you for watching.
