When you think about it, lots of people could use the knowledge obtained from capturing, re-creating, and analyzing body motion in 3-D (three-dimensional) form. One group might be specialists: orthopedic surgeons, to plan surgery or physical therapy; athletes, to maximize their training and performance; robotic engineers, to "teach" robots how to interpret motion; and animators and video game designers, to make their animations more realistic. Another group might be the folks who want to understand their own common ailments, such as repetitive motion problems or tendonitis.
Studying body motion is not a new activity, but the data-gathering techniques for these studies have changed over the years. Currently, one common way of collecting motion data is to attach reflective markers to a human actor at strategic points (shoulders, elbows, hips, knees, and ankles) and then videotape the actor in motion. The video images are digitized so that a computer can extract the markers and calculate their 3-D locations. The calculated results are presented in graphical form, typically as plots or stick figures. While useful, these graphics are only coarse approximations of actual human movement. If used for animation purposes, for example, they would require a great deal of rendering before they could be considered finished. For studying biomechanics, their accuracy and level of detail are limited by the relatively small number of data points from which their information has been extrapolated.
Recently, Lawrence Livermore engineers Shin-Yee Lu and Robert K. Johnson demonstrated the next step in motion imaging systems. Dispensing with reflective markers, Lu and Johnson devised a system that projects a grid of parallel, closely spaced vertical lines onto a moving object (see figure above), records the scene at video speed, and detects data points along the projected lines. Motion information collected in this way is dense, continuous, and uniform. It can be used to produce a real-time, complex visualization of movement that is realistic enough to be pasted directly into an animation. Lu and Johnson call this system CyberSight.

How to Get CyberSight
The line grid projected onto a moving subject comes from a glass slide precisely etched with parallel black lines; such slides are commonly used for calibrating instrumentation optics. Other than this slide and the projector, the CyberSight system components are similar to those of other motion imaging systems. Two charge-coupled device (CCD) cameras, which are semiconductor image sensors, produce the video signals. To sense the data points, the cameras are firmly positioned a small distance apart from each other to take "snapshots" from two perspectives. Operated from 1 to 10 feet away, the cameras take the snapshots at the standard video rate of 30 frames per second. The CCD images are digitized by an image frame grabber and stored in memory boards. From there, they are transferred to a host computer for calculation and reconstruction into 3-D computer representations that can be presented as rendered images.
The sample images of a facial expression sequence in the figure on the next page are taken from CyberSight data that were reconstructed into several perspectives. Unlike a conventional photograph, the images are generated from a computer model of true dimensionality that can be manipulated, analyzed, and visualized. This feature makes CyberSight useful for biomechanical analyses. The complex information will allow computer models to calculate body surfaces and volumes, determine relationships between bones and muscles, and estimate velocity and force of movements.

Calculating Three-Dimensional Space
By solving the problem of collecting complete, voluminous motion data, CyberSight has uncovered another problem. As collected, the data are two-dimensional (2-D) image planes that must still be transformed into 3-D moving images. This is hardly a simple task, given that each image frame from each camera may contain as many as 250,000 data points. Thus, it is no surprise that the key component of CyberSight is its complex image-processing code.
The code transforms 2-D objects into 3-D objects by mimicking human stereo vision. When we look at an object, each eye receives a slightly different image because it sees from a slightly different angle. This angular difference provides us with depth perception. The geometric expression of depth perception has been adapted by the computer code to calculate depth, using the two views of each data point sensed by the stereo cameras. The computer calculation is based on the principle of triangulation, a measurement technique that uses two known points to derive a third value. The triangulation uses the known, left-right views of an image point, the geometry of the camera arrangement, baseline distance, and converging optical angles to establish the position of that image point in space. The figure on the next page simplifies the basis of triangulation.
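The triangulation principle can be illustrated with a minimal Python sketch. It is not the CyberSight code; it assumes an idealized, flat (2-D) arrangement in which the two cameras sit at the ends of a known baseline and each reports the angle between the baseline and its line of sight to the data point. The function name `triangulate` and its parameters are hypothetical, chosen only for this illustration.

```python
import math

def triangulate(baseline, left_angle, right_angle):
    """Locate a point seen by two cameras (illustrative assumption, 2-D only).

    The left camera sits at x = 0 and the right camera at x = baseline.
    Each angle (in radians) is measured between the baseline and that
    camera's line of sight to the point.
    Returns (x, depth): position along the baseline and distance from it.
    """
    # The baseline and the two lines of sight form a triangle, so
    # baseline = depth * (cot(left_angle) + cot(right_angle)).
    cot_sum = 1.0 / math.tan(left_angle) + 1.0 / math.tan(right_angle)
    if cot_sum <= 0:
        raise ValueError("lines of sight do not converge in front of the cameras")
    depth = baseline / cot_sum
    x = depth / math.tan(left_angle)
    return x, depth

# Two cameras 2 units apart, each looking inward at 45 degrees:
x, depth = triangulate(2.0, math.radians(45), math.radians(45))
print(round(x, 6), round(depth, 6))  # prints 1.0 1.0
```

As the point moves farther away, the two lines of sight approach parallel and small errors in the measured angles produce large errors in depth, which is one reason the camera geometry has to be known accurately.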
Camera geometry is determined by means of a calibration process. During calibration, images are taken of reference target points with known spatial positions, and these are used to back-calculate the camera geometry and lens parameters to be used for actual videotaping.
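The idea of back-calculating camera parameters from known targets can be shown with a deliberately reduced example. The sketch below assumes an ideal pinhole camera in which a target at sideways offset X and distance Z appears at image coordinate u = f * X / Z, and it recovers only the single parameter f (the focal length in pixel units) by least squares. A real calibration, presumably including CyberSight's, must also recover camera position, orientation, and lens parameters; the name `estimate_focal_length` and the sample numbers are hypothetical.

```python
def estimate_focal_length(targets, image_x):
    """Least-squares estimate of f in the pinhole model u = f * X / Z.

    targets: list of (X, Z) positions of reference points, already
             expressed in the camera's own coordinates (an assumption).
    image_x: measured image coordinate u of each target, in pixels.
    """
    ratios = [X / Z for X, Z in targets]
    numerator = sum(u * r for u, r in zip(image_x, ratios))
    denominator = sum(r * r for r in ratios)
    if denominator == 0:
        raise ValueError("targets must not all lie on the optical axis")
    return numerator / denominator

# Three reference targets at known positions and where they appeared:
targets = [(1.0, 10.0), (2.0, 10.0), (-1.0, 5.0)]
measured = [80.0, 160.0, -160.0]
print(round(estimate_focal_length(targets, measured), 6))  # prints 800.0
```

Using more targets than unknowns lets the fit average out measurement noise in the individual image coordinates.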

Matching the Left and Right "Eyes"
Before the calculations for depth can be made, the left and right views of the data points must be accurately matched. This difficult, time-consuming task is computationally very demanding. The matching involves pairing the left and right views of the same data point within the same image frame and then tracking the pair from frame to frame so that the reconstructed movement is coherent. Additional complications come from changes in perspective, such as curvatures and orientation, caused by the moving object. Yet another type of problem intrudes when, on occasion, a data point is eclipsed from one camera's view, so not every data view has a stereo counterpart.
The computational techniques for left-right matching, like the 3-D transformation technique that imitates stereo vision, use principles based on the relationship between physical and mental processes, i.e., how stimuli lead to sensation. The computations imitate the human eye's ability to pick up intensity changes, make use of high-contrast features in an image (such as edges and intersections), and filter or "smooth" received data. They also use dynamic programming, in which small subproblems are solved first and their solutions are reused to solve larger and larger subproblems until the whole problem is solved. The techniques result in a code with some ability to interpret "context" in performing the left-right matches and to fill in some missing details, such as the eclipsed data views.
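The dynamic-programming idea can be sketched on a toy version of the matching problem. The Python example below is an illustration only, not the CyberSight algorithm: it assumes each camera contributes one row of brightness values, charges the brightness difference for pairing two samples, and charges a fixed penalty for leaving a sample unmatched (standing in for an eclipsed view). The function name `match_scanlines` and the `occlusion_cost` parameter are hypothetical.

```python
def match_scanlines(left, right, occlusion_cost=10.0):
    """Pair samples of two brightness rows with dynamic programming.

    Returns a list of (left_index, right_index) pairs; samples that
    appear in no pair are treated as eclipsed from the other view.
    """
    n, m = len(left), len(right)
    # cost[i][j] = cheapest way to account for left[:i] and right[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * occlusion_cost
    for j in range(1, m + 1):
        cost[0][j] = j * occlusion_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1]),  # pair them
                cost[i - 1][j] + occlusion_cost,   # left sample unmatched
                cost[i][j - 1] + occlusion_cost,   # right sample unmatched
            )
    # Walk back through the table to recover which choices were made.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + occlusion_cost:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs

# The right row is the left row shifted by one sample.
left_row = [10, 50, 200, 90, 30]
right_row = [50, 200, 90, 30, 70]
print(match_scanlines(left_row, right_row))
# prints [(1, 0), (2, 1), (3, 2), (4, 3)]
```

In this example the first left sample and the last right sample end up unmatched, which is how the sketch represents a point visible to only one camera. The one-sample shift in each matched pair plays the role of the left-right difference that the triangulation step converts into depth.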

Building on Past Work
The work on CyberSight follows earlier robotic motion studies that Lu and Johnson performed for LLNL's in-house Laboratory Directed Research and Development project. The purpose of that work was to program robotic "hands" to handle hazardous waste. The work captured the interest of pediatric hospitals that are seeking better ways to design treatment for cerebral palsy. Their need gave Lu and Johnson the impetus to develop a more advanced motion imaging technique, and CyberSight was the result.
The most immediate applications for CyberSight are expected to be animation and virtual reality projects. With subsequent development, the CyberSight code will be a powerful tool with far-ranging applications from orthopedic surgical planning, speech therapy, and physical therapy to security applications such as facial recognition systems. This image-processing technology could also be used in manufacturing to provide rapid prototyping of new products and to personalize products such as prostheses, gas masks, clothes, and shoes. It has potential nonhuman-related applications as well. Surface deformations of materials could be monitored during the manufacturing process, or stress and strain analyses could be performed on materials and structures (such as vehicle air bags or the vehicle itself) to determine safety and functionality. The ingenuity of the CyberSight data collection technique, supported by its complex computer code, portends numerous and exciting future applications.

Key Words: 3-D visualization, biomechanics, image processing, motion study, movement reconstruction.

For further information contact Shin-Yee Lu (510) 424-3796 (johnson60@llnl.gov).

Back to December 1996