REAL-WORLD CAMERA AND SCENE MATCHING IN BLENDER © 2007 K. G. Nyman



PART II
The theory and math behind BLenses.py and tips for better scene matching


CAMERA MATCHING IN BLENDER: Some basic requirements & recommendations

In Blender, field-of-view (FOV) is specified in the "Lens" setting by using the "Degrees" option ("D" button). For all intents and purposes, this is the only useful setting for Blender's camera when trying to match a real-world scene. My research shows that the default "Lens" option is actually calculated from the "Degrees" option, using a formula that is inaccurate for matching a real-world focal length. Thus the default option for "Lens" has no generally useful correlation to a focal length, and is best just ignored.

In order for any calculated FOV to be an accurate match when used in a Blender scene, the Blender scene must be an accurately scaled model of the real-world scene in terms of the camera-to-subject relationship. One corollary of this is that the real and virtual scenes should share a common fundamental unit of measurement. In practical terms, this would usually be millimeters (i.e., 1BU = 1mm), since most lens focal lengths and many film and sensor sizes are expressed in these units. However, some of Blender's coded limits to certain values require that a unit like centimeters or meters be used for large-scale scenes.

The metric system makes translation of this scale very simple when necessary -- for example, a city scene constructed in millimeters would require some unwieldy numbers, and exceed many Blender value limits, but meters and even kilometers can be easily translated into millimeters when required, and vice versa. English units can be easily converted, but for greatest accuracy use the exact conversion factor rather than a rounded one (e.g., 1 inch = 25.4mm exactly, not 25mm). The BLenses script gives options for the scene scale, and automatically converts units when necessary.
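
The scale translation described above amounts to a single multiplication or division. As a minimal sketch (the names MM_PER_UNIT, to_mm, and from_mm are illustrative assumptions, not taken from BLenses.py), lengths can be normalized to millimeters before any lens math and converted back to the scene unit afterward:

```python
# Millimeters per scene unit. The table and the function names are
# illustrative assumptions, not part of BLenses.py.
MM_PER_UNIT = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "km": 1000000.0}


def to_mm(value, unit):
    """Convert a length expressed in `unit` to millimeters."""
    return value * MM_PER_UNIT[unit]


def from_mm(value_mm, unit):
    """Convert a length in millimeters to the given scene unit."""
    return value_mm / MM_PER_UNIT[unit]


# A subject 3 units away in a meter-scaled scene, expressed in mm.
print(to_mm(3, "m"))
# A 35mm focal length expressed in the units of a meter-scaled scene.
print(from_mm(35, "m"))
```

Keeping every lens calculation in millimeters and converting only at the boundary with the scene avoids mixing units inside the formulas.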

Obviously, matching real-world and Blender-world is best done with some advance preparation and careful documentation of scene parameters. Simply taking a "wild" shot and trying to match it in Blender will be fraught with difficulty because essential information about the scene setup may be missing. There's no doubt it can be done that way, and the FOV-matching process will be of some help, but it would be an inefficient process with a lot of trial and error, whereas by taking down a few measurements (or incorporating them into the scene, as described later), things can go a lot smoother.

There is software that uses a number of calculations to produce a perspective match using only the data from a photographed image. I've done some perspective-matching exercises in Icarus, and while it's undoubtedly useful, it seems to me there are many possible sources for error in the way it reconstructs a perspective grid, and it has limitations on what kind of scene can be accurately reconstructed. Using it on a photo I took to test my FOV calculations, I came up with a model that was a pretty good match from a single camera position, but moving from that position showed the model's inaccuracies (sort of like a forced-perspective setup). Using the lens-matching process in Blender can help "iron out" some of the perspective glitches that can arise using the methods employed in Icarus and other software of this type.

FOUNDATIONS: The fundamental concepts of camera, lens, and scene matching

A critical aspect of matching an image from a real-world scene in a virtual environment is perspective. If the perspective of the real-world scene and the virtual scene are significantly mismatched, the compositing of the two will seem artificial no matter how well other parameters of the scene (such as lighting, camera movement, etc.) may be matched. The perspective of an image from the real world is determined by the camera that made the image, and its relationship to the elements contained in the scene. Matching the scene in the virtual world of Blender means matching that camera's characteristics and its physical relationship to the scene being imaged.

Regardless of camera type, there are some basic parameters that need to be known when trying to match a virtual camera to a real-world camera. They're listed here with the variables that are used later in the equations of the simulation:

f -- Focal length of the real-world camera lens, usually measured in millimeters. Since both fixed and zoom lenses are popular, this may not be as straightforward as it seems. For a fixed f, no problem. But a zoom lens rarely gives a specific f for any particular zoom level, so another method must be found to determine this value. More about that later.

s -- The distance from the camera to the subject in sharpest focus. This value is included to take into account the magnification of the lens system. A lens focuses its image at a point some distance from the lens, equal to the lens's focal length only for a lens focused at infinity (in practical terms, about 30m). Focusing on a closer subject alters the focal point of the lens, and this has to be accounted for in simulating a lens in a 3D app if an accurate match to the real-world camera is to be made.

A -- The image aperture of the camera system. This refers to the physical dimensions of the area where the image is formed in the real-world camera. For a 35mm SLR using 135 roll film, this is (rounded) 36mmx24mm. For various types of 35mm motion picture film gauges, the dimensions vary; specifications for these formats are available on the web from a number of sources.

The most common approach is to specify A as the longer of the two dimensions of a film size specification, although still photographers sometimes use the diagonal (which can be calculated with the Pythagorean theorem, solving for the hypotenuse). The formulas in this project use the longer dimension.
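
To make the two conventions for A concrete, here is a short Python sketch using the nominal 36mm x 24mm frame of 135 roll film. The function names are illustrative only:

```python
import math


def aperture_long_side(width_mm, height_mm):
    """Image aperture A taken as the longer of the two frame dimensions."""
    return max(width_mm, height_mm)


def aperture_diagonal(width_mm, height_mm):
    """Frame diagonal: the hypotenuse of the width/height right triangle."""
    return math.hypot(width_mm, height_mm)


# 135 roll film frame, nominally 36mm x 24mm.
print(aperture_long_side(36.0, 24.0))
print(round(aperture_diagonal(36.0, 24.0), 2))
```

For this frame the long side is 36mm and the diagonal is about 43.27mm. The two conventions give noticeably different values of A, so the choice must stay consistent throughout a calculation.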

Determining a physical value (i.e., in mm) for A is problematic when digital cameras are considered. Film aperture sizes are well-established and rigidly standardized; digital camera sensors are anything but. Sensor chips come in a wide variety of sizes and resolutions; each manufacturer uses a sensor chip to meet its own specifications, often making the A for each type of camera (even from the same manufacturer) a different value. This difficulty is compounded by the fact that not all the area of a sensor chip may contribute to a digital image, possibly making the physical dimensions of the chip less than useful for determining an accurate value for A. And just to make matters more confusing, higher-end cameras provide a choice of pixel dimensions for images, the smaller image sizes being interpolated from a maximum image area. All this makes calibrating a digital camera's output in terms that can be matched by a 3D app more than a little complicated (see the next section).

AR -- The image aspect ratio of a camera, generally defined as the long dimension of an image size divided by the short, e.g. 135 roll film = 36mm/24mm = 1:1.5 AR. This ratio applies to digital images as well, e.g. 1280x960 = 1:1.33 AR. Other familiar ARs are 16:9, 1:1.85, etc. Sometimes these values are expressed in whole numbers (e.g., 4:3 = 1:1.33), as "normalized-unit" dimensions (e.g., 3x4, 16x9), or without the relational syntax (e.g., simply 1.33, 1.85, etc). AR is not used directly in the FOV formula, but is a critical factor when rendering a matching scene from Blender or any other 3D app. The AR of the rendered image must match that of the target camera.
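
A quick way to check that a render size matches the target camera's AR is to compare the two ratios numerically. The sketch below is only an illustration; the function names and the 0.01 tolerance are assumptions, not values from BLenses.py:

```python
def aspect_ratio(dim_a, dim_b):
    """Long dimension divided by short dimension, e.g. 36 / 24 = 1.5."""
    long_side, short_side = max(dim_a, dim_b), min(dim_a, dim_b)
    return long_side / short_side


def same_aspect(ar_a, ar_b, tolerance=0.01):
    """True when two aspect ratios agree within an (assumed) tolerance."""
    return abs(ar_a - ar_b) <= tolerance


film = aspect_ratio(36, 24)          # 135 roll film
digital = aspect_ratio(1280, 960)    # 4:3 digital image
print(film, round(digital, 2))

# A 1600x1200 photo matches an 800x600 render but not a 1920x1080 one.
print(same_aspect(aspect_ratio(1600, 1200), aspect_ratio(800, 600)))
print(same_aspect(aspect_ratio(1600, 1200), aspect_ratio(1920, 1080)))
```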

THE DIGITAL DILEMMA: A proposal of how to calibrate the image aperture of digital cameras

When the image aperture A of a digital camera cannot be reliably determined with any great accuracy from its technical specifications, another means must be found to calibrate this aspect of the FOV equation. I spent some time trying to bend the math around the idea of using pixels as a unit of measurement in the equations (since pixels are a unit of A in the recorded digital image), but found that there is not enough information to do so without first calculating the FOV of the scene, which is the end result of these equations. Converting all units to pixels after the FOV is calculated is straightforward math, and might be a useful step in some situations, but it can't be done in the initial FOV calculations. A "physical" unit like mm must first be determined.

Fortunately, there is a fairly simple method of doing this that can be applied to almost any real-world camera shoot, regardless of make or model of digital camera -- include a "ruler" in a number of shots. For most purposes, this can be a meter-long object, held square to the camera prior to calling "action," much like a sync clapper is employed. For close-ups or other restricted-view shots, smaller rulers of known metric length can be used. A single ruler marked with various metric divisions would be very versatile for shots of many types.

This ruler can be used to calibrate the camera's A value by using the magnification formula of an optical lens system:

M = f / ( s - f )

where M = magnification, f = focal length of a lens and s = camera-to-subject distance, as above

Most people think of "magnification" as the property of a lens to form an image that appears larger than the subject, but that is only for magnification values greater than 1. The small focused image that a camera lens makes is also magnified, but by a factor less than 1.

This presupposes that two values are known -- the focal length of the real-world camera lens, and the distance from the camera to the subject in sharpest focus (f & s), so these values must be recorded at the scene.

The magnification equation determines the size of the image formed by the lens. If an object of known length L is included in the shot, then its length in the image will be L * M. For example, at a magnification of .01 (1/100), a meter-long object held parallel to the image plane of the camera would form an image 10mm in length.
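
The magnification formula and the image-length relationship can be written as two small Python functions. This is a sketch with illustrative names, assuming all lengths are in millimeters:

```python
def magnification(f_mm, s_mm):
    """M = f / (s - f), for focal length f and focus distance s in mm."""
    if s_mm <= f_mm:
        raise ValueError("subject distance must be greater than focal length")
    return f_mm / (s_mm - f_mm)


def image_length_mm(L_mm, M):
    """Length of the image formed of an object of length L: L * M."""
    return L_mm * M


# The example from the text: a meter-long object at M = .01 images at 10mm.
print(image_length_mm(1000.0, 0.01))

# A 35mm lens focused at 3m has a magnification of roughly .0118.
print(round(magnification(35.0, 3000.0), 4))
```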

Since a digital camera images in pixels, then the number of pixels that equals the length of the ruler in the image ( P ) will provide a "pixels per millimeter" (ppmm) scale for the digital camera sensor (this is similar in concept to the familiar "dpi", or dots/pixels per inch, used to specify print resolution of digital images). Once a ppmm value is determined, it can be used to gauge the useable image aperture of the digital camera sensor in millimeter units (Amm) based on its maximum width in pixels (Ap) -- exactly what's needed to complete the FOV equation for a digital camera. Here's the math:

image length in mm = ruler length * magnification = L*M
ppmm = image length in pixels / image length in mm = P / ( L * M )
Amm = aperture in pixels/pixels per millimeter = Ap / ( P / ( L*M ) )
Simplifying:
Amm = Ap * L * M / P

where Amm = image aperture in mm, L = ruler length in mm, Ap = maximum aperture in pixels (longer dimension of recorded image), M = magnification, and P = ruler image length in pixels.
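
The same calibration math as a Python sketch. The function names are illustrative, and the sample numbers are the ones used in the worked example in this section:

```python
def pixels_per_mm(P, L_mm, M):
    """ppmm = P / (L * M): ruler image in pixels over ruler image in mm."""
    return P / (L_mm * M)


def effective_aperture_mm(Ap, L_mm, M, P):
    """Amm = Ap * L * M / P, the simplified form of Ap / ppmm."""
    return Ap * L_mm * M / P


# 3872-pixel-wide frame, 1000mm ruler imaged at 1940 pixels, M = .0118.
ppmm = pixels_per_mm(1940, 1000.0, 0.0118)
Amm = effective_aperture_mm(3872, 1000.0, 0.0118, 1940)
print(round(ppmm, 1), round(Amm, 2))
```

Dividing Ap by ppmm and using the simplified formula give the same result, which is a handy check when transcribing the math into a script.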

Example:
Using a 35mm lens to photograph a scene focused at 3m from the camera:

M = 35 / ( 3000 - 35 ) = .0118

Using a ruler of 1m (1000mm) length, with the camera imaging a maximum of 3872 pixels, assume that the ruler in the digital image measures 1940 pixels. Therefore:

Amm = 3872 * 1000 * .0118 / 1940 = 23.55mm effective image aperture

Amm can then be used for the A value in the FOV-matching equation:

FOV = 2 * atan( 23.55 * ( 3000 - 35 ) / ( 2 * 3000 * 35 ) ) = 36.78deg

If your calculator uses radians to do trig, multiply your result by 180/pi to get this solution.

Setting Blender's camera to 36.78 degrees will provide a match to the digital camera's characteristics for this scene.
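
The whole worked example can be chained together in a few lines of Python. This is a sketch for checking the arithmetic, not the BLenses.py implementation. The function name is illustrative, and M and Amm are rounded the same way as in the text so the intermediate values line up:

```python
import math


def fov_degrees(A_mm, s_mm, f_mm):
    """FOV = 2 * atan( A * (s - f) / (2 * s * f) ), returned in degrees."""
    ratio = A_mm * (s_mm - f_mm) / (2.0 * s_mm * f_mm)
    return math.degrees(2.0 * math.atan(ratio))


f_mm, s_mm = 35.0, 3000.0         # focal length and focus distance in mm
Ap, L_mm, P = 3872, 1000.0, 1940  # frame width (px), ruler (mm), ruler image (px)

M = round(f_mm / (s_mm - f_mm), 4)  # magnification, rounded as in the text
Amm = round(Ap * L_mm * M / P, 2)   # effective image aperture in mm
fov = fov_degrees(Amm, s_mm, f_mm)

print(M, Amm, round(fov, 2))
```

math.atan works in radians, so math.degrees performs the 180/pi conversion mentioned above.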

NOTE: Some digital cameras provide the option of different pixel dimensions for the same scene, with lower resolutions being interpolated from the maximum. To avoid more calculations and possible introduction of error, always use the highest non-interpolated resolution for any digital camera when doing this calibration.

It might be assumed that this calibration need be done only once, since the pixels per millimeter (ppmm) ratio should not change once established. However, a number of factors make this calculation prone to a margin of error:

1) The ruler may not be held perfectly square and parallel to the camera's image plane, skewing the accuracy of the P value;

2) Optical lens systems are subject to aberrations that might introduce small errors in the P value, especially if the camera's sensor is small;

3) Zoom lenses can make determining an exact value for f (and thus M) more difficult than with fixed lenses.

The remedy for these sources of error is to use the ruler and the same camera in a number of shots of varying character -- different fixed focal length lenses (if possible), different camera-to-subject distances, etc. -- and to try to make sure the ruler is as large as possible in the image frame. Once a number of calculations of ppmm is made for a camera, the results can be averaged for a more consistently accurate value. In all shots, the more accurately the ruler is positioned square and parallel to the image plane, and in particular parallel to the long side of the image frame, the better the calibration will be. Greatest accuracy can be achieved by mounting the ruler on a flat surface and setting the camera on a tripod, measuring to ensure that the camera is square to the surface with the ruler, similar to the setup for copy photography.

When the ruler image is not parallel to the long sides of the frame, Pythagorean math (based on the formula a^2 + b^2 = c^2 for the sides of a right triangle) can be used to solve for its length, but this formula cannot be used when the ruler is out of parallel with the image plane (i.e., in depth). A number of "ruler shots" made as carefully as possible, and the results averaged, will help "smooth out" the errors that might occur.
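
For a ruler that is tilted within the image plane, the P value can be computed from the pixel coordinates of its two ends. A minimal sketch, with hypothetical coordinates and an illustrative function name:

```python
import math


def ruler_length_px(end_a, end_b):
    """Pixel length of the ruler image from its two end points (x, y).

    Applies a^2 + b^2 = c^2 to the horizontal and vertical pixel offsets.
    Valid only when the ruler is parallel to the image plane.
    """
    (xa, ya), (xb, yb) = end_a, end_b
    return math.hypot(xb - xa, yb - ya)


# Hypothetical end points read off an image in a paint program.
print(round(ruler_length_px((212, 640), (1140, 598)), 1))
```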

The zoom lens issue is particularly difficult to resolve, because zoom lenses are usually not marked in any way to indicate the effective focal length for any particular level of zoom. A range of f is typically specified, such as 8mm - 33mm, but all the values in between are dependent on the relationship between the optical elements of the lens, which changes as the zoom level changes. The focal length change when zooming isn't necessarily a linear function, making accurate determination of f for any zoom level virtually impossible unless the lens has been calibrated with a scale marking by the manufacturer, which is not common.

If a camera provides for EXIF metadata embedded in images, a focal length for a still shot can be read from this data, although the level of accuracy of the data may be fairly restricted (such as to only whole numbers in focal length reporting).

When other sources for focal length information are absent, if an accurate ppmm value for a digital camera can be determined, then the magnification formula can be used to calculate a "ballpark" focal length for any shot, since magnification can be determined by the number of pixels (and hence the number of mm) a ruler extends in a digital image -- another reason to include the ruler in more than just a few shots. Plugging the M and s values into the formula allows a solution for f using basic algebra. This calculation must be treated as having a possibly significant margin of error rather than exact, but it can provide a starting point for determining a more accurate value of f for scene-matching purposes.
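
Solving M = f / ( s - f ) for f gives f = M * s / ( 1 + M ). The sketch below applies that to a ruler measurement. The function names are illustrative, and the ppmm value is derived from the 3872-pixel, 23.55mm example earlier in this section:

```python
def magnification_from_ruler(P, ppmm, L_mm):
    """M = (ruler image length in mm) / (real ruler length in mm)."""
    return (P / ppmm) / L_mm


def focal_length_from_magnification(M, s_mm):
    """Solve M = f / (s - f) for f:  f = M * s / (1 + M)."""
    return M * s_mm / (1.0 + M)


ppmm = 3872 / 23.55  # pixels per mm from a previous calibration
M = magnification_from_ruler(1940, ppmm, 1000.0)
f_estimate = focal_length_from_magnification(M, 3000.0)
print(round(f_estimate, 1))
```

With these inputs the estimate comes back very close to the 35mm lens used in that example, as expected. In practice the result inherits every error in ppmm, P, and s, so it should be treated as a starting point only.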

If a selection of lenses isn't an option (i.e., switchable fixed focal length lenses can't be used), and EXIF data is not available, then only the extremes of the zoom lens focal length range should be used to calibrate the ppmm, with the assumption that the range limits are stated accurately. While a difference of a fraction of a mm in the f value may not badly skew the FOV calculation, larger errors can make perspective matching more difficult, since many digital cameras use lenses with very short focal lengths in the low end of the zoom range -- the error may be a significant part of the nominal focal length.

The large variety of digital cameras in use, both still and video, and the wide variety of lens types (with a marked preference for zoom lenses that can't be switched), makes scene matching quite a bit more complex, and perhaps inherently less accurate, than with film cameras, but knowing where the weaknesses are helps when planning workarounds.

An example: Using the calibration on a typical "point & shoot" digital camera

Using a Fujifilm FinePix 2800Zoom, I took two shots of a one-meter-long ruler marked in ten centimeter graduations. Since the lens cannot be changed and has a nominal zoom range of 6mm - 36mm (optical zoom), this represents a fairly "worst-case scenario" for calibration. This is obviously not a professional-level camera, but that just makes the test of the calibration more strenuous. The ruler was mounted on a wall and the camera on an inexpensive tripod at approximately the height of the ruler. The ruler image was squared visually in the viewfinder.

Shot A was taken at the short end of the zoom lens range, nominally f = 6mm, at a distance of 2m from the camera (measured from the end of the lens barrel to the subject). The camera was set at its maximum resolution option, 1600x1200 pixels. In the resulting image, the ruler measured 931 pixels. Plugging in all the numbers:

Ap = 1600, L = 1000, P = 931, M = 6/(2000-6) = .003

Amm = 1600 * 1000 * .003 / 931 = 5.15575mm effective image aperture

Shot B was taken at the long end of the zoom lens range, nominally f = 36mm, at a distance of 4.35m from the camera, again at 1600x1200 pixels. In this case, only a portion of the ruler image was used, corresponding to 50cm = 500mm. In the resulting image, the ruler segment measured 1214 pixels. Plugging in all the numbers:

Ap = 1600, L = 500, P = 1214, M = 36/(4350-36) = .00835

Amm = 1600 * 500 * .00835 / 1214 = 5.502mm effective image aperture

The margins of error in the setup and measurements were relatively high, accounting for the most part for the different results. The camera itself probably contributed a fair amount of error as well, as the 6mm image showed considerable barrel distortion which probably skewed the P factor for that shot.

However, an average of these results gives an effective image aperture of 5.329mm for this camera. To check this result, I used the published camera specifications, which describe the sensor as a "1/2.7-inch square-pixel CCD." This seems to be a fairly common sensor spec for this kind of popular digital camera, and I found some more information on the web that states that this kind of sensor measures 5.3mm x 4mm. This confirms the accuracy of the calibration method within the rather broad error tolerance for this camera and my setup. With more stringent attention to detail, and a more professional-level camera, a more accurate assessment of effective sensor width is quite possible.
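
The two-shot calibration can be repeated in Python using the shot data given above. This is a sketch with illustrative names. It uses unrounded magnification values, so the per-shot results differ slightly from the hand calculations, but the average still lands near 5.3mm:

```python
from statistics import mean


def magnification(f_mm, s_mm):
    """M = f / (s - f)."""
    return f_mm / (s_mm - f_mm)


def effective_aperture_mm(Ap, L_mm, M, P):
    """Amm = Ap * L * M / P."""
    return Ap * L_mm * M / P


Ap = 1600  # maximum image width in pixels
shots = [
    {"f": 6.0, "s": 2000.0, "L": 1000.0, "P": 931},   # Shot A
    {"f": 36.0, "s": 4350.0, "L": 500.0, "P": 1214},  # Shot B
]

estimates = [
    effective_aperture_mm(Ap, shot["L"], magnification(shot["f"], shot["s"]), shot["P"])
    for shot in shots
]
average = mean(estimates)
print([round(value, 3) for value in estimates], round(average, 2))
```

Adding more ruler shots to the list is the averaging remedy described earlier: each additional estimate reduces the influence of any single bad measurement.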

MATCHING VIRTUAL CAMERAS AND REAL-WORLD CAMERAS: Math provides a useful common ground

3D apps like Blender have no real lenses, but instead use math to emulate (but not necessarily simulate) what a lens would see. What this means in terms of matching the view of a real-world lens is that parameters like focal length and magnification are meaningless in terms of Blender's camera. Even the concept of focus is absent from a virtual camera in its basic form -- which is why such effects as depth of field (DOF) have to be simulated.

A virtual camera and an optical camera share only one distinct property, the field of view (FOV), sometimes also referred to as angle of view or camera angle. For virtual cameras, FOV is specified (usually in degrees) and used by the matrix math of the programming to determine what portion of a virtual scene is rendered. FOV also determines all the perspective characteristics of a virtual scene, which is of major importance in scene-matching with the real world.

FOV for optical systems is determined by the three variables explained above: f, s, and A. The equation that relates these values is:

FOV = 2 * atan( A * ( s - f ) / ( 2 * s * f ) ) -- for computation where trig functions use degrees
FOV = 2 * atan( A * ( s - f ) / ( 2 * s * f ) ) * 180 / pi -- for computation where trig functions use radians

FOV is here expressed in degrees, the most common way of specifying FOV in a 3D app.

The above equation is based on the following equation of an "ideal lens focused at infinity":

FOV = 2 * atan( A / ( 2 * f ) ) - or -
FOV = 2 * atan( A / ( 2 * f ) ) * 180 / pi

but accounts for the magnification of the system by incorporating camera-to-subject distance (s), for a lens focused at less than infinity.
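
The relationship between the two equations can be checked numerically: as s grows, the focused-lens FOV approaches the infinity-focus FOV from below. A Python sketch with illustrative function names, using a 50mm lens on a 36mm aperture as sample values:

```python
import math


def fov_at_infinity(A_mm, f_mm):
    """Ideal lens focused at infinity: FOV = 2 * atan( A / (2 * f) )."""
    return math.degrees(2.0 * math.atan(A_mm / (2.0 * f_mm)))


def fov_focused(A_mm, s_mm, f_mm):
    """Lens focused at distance s: FOV = 2 * atan( A * (s - f) / (2 * s * f) )."""
    ratio = A_mm * (s_mm - f_mm) / (2.0 * s_mm * f_mm)
    return math.degrees(2.0 * math.atan(ratio))


A_mm, f_mm = 36.0, 50.0
for s_mm in (500.0, 3000.0, 30000.0):
    print(s_mm, round(fov_focused(A_mm, s_mm, f_mm), 2))
print("infinity", round(fov_at_infinity(A_mm, f_mm), 2))
```

The term A * ( s - f ) / ( 2 * s * f ) equals ( A / ( 2 * f ) ) * ( 1 - f / s ), which makes the convergence explicit: the correction factor ( 1 - f / s ) goes to 1 as s becomes large compared with f.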

For those interested in the math, here's a heaping helping:

HOW ACCURATE IS ACCURATE? Graphing the practicalities of the FOV formula

The ideal situation for a real-world scene to be matched in a virtual environment is to have complete control over every aspect of camera operation with exhaustive recording of all camera-to-subject relationships in the scene. Unfortunately the real world rarely cooperates to such a degree, and shots are often made under less than ideal circumstances. This raises the question of how much inaccuracy a scene-match can tolerate before the error breaks the illusion of the match. Whether or not a virtual scene's perspective and that of a real-world image are a close match is a very subjective matter -- in the end it will be a matter of the filmmaker's informed judgement. But the likelihood of a perceptible mismatch can be reduced by reducing the error in the completely objective data the formula uses. This raises the question of the formula's error tolerance, which is important because it can help establish some practical limits on scene-matching requirements that tend to complicate making the real-world shots.

Just how sensitive is the FOV calculation to errors in input data? One way to see the answer is to graph the equation in a number of ways, plotting the change in the result (FOV) against changes in one or another of the formula's variables. This reveals where small changes in a variable can make large changes in the FOV result:

The most variation in FOV comes with change in focal length (f) -- not until the f value reaches about 200mm (the yellow trace cross-hairs on the graph) does the slope level off appreciably. This means that within the range of "most often used" lenses, a small error in the input focal length can mean a large error in the FOV result. For fixed-f lenses, this is not usually a problem, but it can be a significant source of error with zoom lenses, particularly on digital cameras with small sensors where the zoom range is from very short (around 8mm) to short (35mm), where the sensitivity to error is greatest.

Camera-to-subject distance (s) also has a range of heightened sensitivity -- when the camera is close to the subject. In practice, the subject distance should always be at least 5x the focal length for the FOV formula to work best (the trace marks s = 180mm), and from there to about 1.5m the sensitivity is considerable. The curve flattens out at about 6m -- for this lens & aperture, anything beyond this distance can be considered at "infinity" focus. Other graphs show that longer lenses have a slightly wider sensitivity range than shorter.

The plot of FOV against image aperture shows no flat spots where changes make little difference -- the constant slope of the nearly linear curve within a practical range of A (up to 50mm) shows that the entire range has significant sensitivity to change in A, and consequently to error caused by inaccurate input. For film formats, which have been rigidly standardized, this presents no problem, but the possible inaccuracy in determining the effective aperture of digital sensors makes this a potential source of error that must be carefully considered.

While a bit technical, these graphs illustrate what areas of real-world camera configuration and measurement are most likely to contribute to less accurate results from the FOV formula. The formula itself is impeccable, but the result is only as accurate as the values plugged into it.
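
The slopes shown in the graphs can also be estimated numerically. The sketch below uses a central difference to approximate how many degrees the FOV changes per millimeter of change in each input. The function names, the step size, and the sample values (36mm aperture, 3m subject distance) are assumptions chosen for illustration:

```python
import math


def fov_degrees(A_mm, s_mm, f_mm):
    """FOV = 2 * atan( A * (s - f) / (2 * s * f) ), in degrees."""
    ratio = A_mm * (s_mm - f_mm) / (2.0 * s_mm * f_mm)
    return math.degrees(2.0 * math.atan(ratio))


def sensitivity(A_mm, s_mm, f_mm, step=1e-3):
    """Central-difference slope of FOV, in degrees per mm, for each input."""
    d_f = (fov_degrees(A_mm, s_mm, f_mm + step)
           - fov_degrees(A_mm, s_mm, f_mm - step)) / (2.0 * step)
    d_s = (fov_degrees(A_mm, s_mm + step, f_mm)
           - fov_degrees(A_mm, s_mm - step, f_mm)) / (2.0 * step)
    d_A = (fov_degrees(A_mm + step, s_mm, f_mm)
           - fov_degrees(A_mm - step, s_mm, f_mm)) / (2.0 * step)
    return {"f": d_f, "s": d_s, "A": d_A}


# Compare a very short lens with a long one at the same distance.
for f_mm in (8.0, 35.0, 200.0):
    slopes = sensitivity(36.0, 3000.0, f_mm)
    print(f_mm, {name: round(value, 4) for name, value in slopes.items()})
```

A large magnitude for a given input means that measurement errors in that input matter most for the shot in question. The slope with respect to f is negative because a longer focal length narrows the field of view.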

Hopefully this analysis will help identify which kinds of real-world scenes would benefit most from careful measurement to ensure accurate scene matching, e.g., a close-up shot with a fairly short lens. Conversely, if a scene is well out of the range of greatest sensitivity to error, the filmmaker need not go to special lengths to ensure greatest accuracy, e.g., the camera-to-subject distance for a wide shot focused at considerable distance from the camera using a normal fixed-f lens doesn't need to be extremely accurate, and can vary during the shot with considerable latitude without introducing large error in the FOV calculation (and thus in the scene match). When mismatches do occur, these analyses can also help determine where to look for sources of error.

EXAMPLES OF REAL-WORLD SCENE MATCHING: Matching a 35mm SLR film camera photo, a digital composite, and other tests

When I first developed the FOV-matching math described above, I knew it had to be tested against real-world imagery no matter how logical and rigorous the math may be. My resources for this were limited to 35mm SLR still photography with only a couple of lenses, but within those limits the tests were very successful.


The real-world scene match at left uses a shot taken by the digital camera described above in the sensor width calibration discussion. The goal was to do a convincing perspective match using only the most basic information from the scene. The camera was set on a tripod at approximate eye height, and only the camera-to-subject distance measured at approximately 20m. No great pains were taken to be excessively accurate in the measurement, as this setup falls in the range of little sensitivity for error in the FOV calculation; one purpose of the test was to see how sloppy measurements might affect the goal of getting a good perspective match.

In Blender, a new scene was set up by first using BLenses to assign a focal length (6mm) to the camera, and create a POI for ease in visualizing camera-to-subject distance and subsequent viewpoint adjustments. The World Scale was set at meters, subject distance at 20, Image Aperture at 5.3mm, and BLenses returned an FOV of 47.8 degrees. Since the relative positions of the house and the digital camera weren't tightly measured, a primitive model of the house was assembled and positioned at the Blender camera focal point (marked by the DoFDist display):

The house model was sized to fit the image. By rotating the house model on the Z axis, and making small adjustments to the positions of the Blender camera and its POI constraint, a good perspective match was eventually achieved in less than two hours -- more measurements of the real-world scene would have sped the process considerably. Final tweaks to the match were made using BLenses to set the final focal length to 6.2mm -- a small difference, but it made the match more accurate.

Lighting the scene to match the photo image took quite a bit more time than the perspective match, the final result being a combination of soft but directional key light from a Sun Lamp, a more intense fill and backlight using a Hemi Lamp, and ambient occlusion shadowing for the Suzanne statues and the extremely soft shadows they cast. Compositing was done in Blender using two Render Layers, with only some small post-rendering Photoshop touch-ups at the bases of the statues and a small Levels adjustment.



Another "in virtu" test involved a setup with a calculated magnification factor. A target of known size was placed at a distance from the Blender camera that would yield a specific calculated magnification. The rendering of this scene was then measured (in pixels) to determine the actual magnification, and in all instances, the match was exact.

These tests were static and highly controlled, which in live-action shoots is rarely the case. However, they do support the accuracy of the lens-simulation math, which boosts confidence in the method for use in more realistic production scenarios.

SPECIAL CONSIDERATIONS FOR MOTION PICTURES: Rack focus, contra-zoom, and other movie magic

Rack focus with moving camera. The POI constraint was used both to set the DoFDist for keyframing, and for animating the camera viewpoint -- the camera itself was stationary. In this example, the focal length/FOV was kept constant.

An example of contra-zoom, rendered in Blender 2.44. The focal length was animated using the BLenses script, with focal length varying from 40 mm to 19.747 mm (calculated to keep the magnification constant).

Depending on the director, a motion picture scene may have a completely static camera, but more often than not, some aspect of the camera's relationship to a scene will be in motion during a shot. Even if the camera is fixed in position, its focal point may change considerably during a shot, as in rack focus and follow focus. Dolly shots, crane shots, truck shots, and tracking shots all involve a camera in motion relative to the subject. These kinds of moves affect the value of s in the FOV equation, and thus the FOV may also change within certain limits. In addition, zooming during a shot changes the value of f, and not always in a linear fashion, once again leading to a variation in FOV over the length of a scene.

For situations like these where the values used to calculate FOV may change from frame to frame of a shot, it may be necessary to match these changes with an IPO curve modulating the Blender camera's FOV as well as matching any physical motions with translation IPO curves on the Blender Camera. If the range of camera-to-subject distance (s), and/or range of f encompassed by a zoom, is known, these values can determine initial start and end point values for the FOV change, and keys added to the IPO curve as needed to tailor the virtual image to the live action.
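
One way to get candidate key values is to evaluate the FOV formula at each frame of the move. The sketch below assumes that s changes linearly from the start of the shot to the end, which a real dolly move may not do. It does not touch Blender's API; it only produces the numbers that could then be keyed on the camera's FOV curve. All names and sample values are illustrative:

```python
import math


def fov_degrees(A_mm, s_mm, f_mm):
    """FOV = 2 * atan( A * (s - f) / (2 * s * f) ), in degrees."""
    ratio = A_mm * (s_mm - f_mm) / (2.0 * s_mm * f_mm)
    return math.degrees(2.0 * math.atan(ratio))


def fov_per_frame(A_mm, f_mm, s_start_mm, s_end_mm, frame_count):
    """FOV for each frame, assuming s changes linearly over the shot."""
    if frame_count < 2:
        raise ValueError("need at least a start frame and an end frame")
    values = []
    for frame in range(frame_count):
        t = frame / (frame_count - 1)  # 0.0 at the start, 1.0 at the end
        s_mm = s_start_mm + t * (s_end_mm - s_start_mm)
        values.append(fov_degrees(A_mm, s_mm, f_mm))
    return values


# Hypothetical rack focus from 1m to 4m with a 50mm lens, 36mm aperture.
keys = fov_per_frame(36.0, 50.0, 1000.0, 4000.0, 5)
print([round(value, 2) for value in keys])
```

For a zoom, the same loop could vary f instead of s, keeping in mind that the focal length of a real zoom lens does not necessarily change linearly.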

One of the more dramatic effects that can be obtained by such variations during a shot is the "contra-zoom," where a focal length change (zoom) is combined with a dolly (motion toward or away from the subject along the camera's optical axis) such that the subject remains in relatively the same position and scale in the shot (magnification is kept to a single value) while the perspective changes noticeably around it. Notable examples can be found in Hitchcock's Vertigo, Spielberg's Jaws, and Jackson's Fellowship of the Ring. Even such complex effects can be easily emulated in a virtual scene (see the associated animation), and with pre-planning and careful measurement, successfully matched to a real world shot.
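
Holding magnification to a single value fixes the focal length needed at each new camera distance. The following sketch solves the magnification formula for f at the new distance. The function names and the distances are hypothetical and are not the values used in the associated animation:

```python
def magnification(f_mm, s_mm):
    """M = f / (s - f)."""
    return f_mm / (s_mm - f_mm)


def contra_zoom_focal_length(f_start_mm, s_start_mm, s_new_mm):
    """Focal length that keeps M constant after moving to distance s_new.

    Uses f = M * s_new / (1 + M), which reduces algebraically to
    f_start * s_new / s_start.
    """
    M = magnification(f_start_mm, s_start_mm)
    return M * s_new_mm / (1.0 + M)


# Hypothetical dolly-in from 2m to 1m, starting with a 40mm lens.
f_new = contra_zoom_focal_length(40.0, 2000.0, 1000.0)
print(round(f_new, 3))
print(round(magnification(40.0, 2000.0), 5), round(magnification(f_new, 1000.0), 5))
```

Because the required focal length scales in direct proportion to the camera-to-subject distance, halving the distance halves the focal length, while the FOV widens and the perspective around the subject changes.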



REAL-WORLD CAMERA AND SCENE MATCHING IN BLENDER Part I: BLenses.py, a script for matching real camera characteristics with Blender’s camera