posted on Feb 2, 2009 at 10:23PM
Video-enabled DSLR Cameras
Sensor Design Tradeoffs and Video Performance
By Bill Mixon
Since the new DSLRs use a different sensor technology than earlier video cameras, and since this choice introduces artifacts that some consider serious, let’s examine what causes this. One can then decide for which uses each might be better, and minimize unexpected problems. (Warning: if one’s eyes glaze over easily, one could be excused for skipping this section.)
There are two basic image sensor technologies used in video and still cameras, CCD and CMOS. Most video cameras, and the majority of P&S cameras, use the older CCD technology. An increasing number of DSLR still cameras (by now probably most of them) use the newer CMOS technology.
Both of these technologies divide the sensor area up into a two-dimensional grid of “photosites”, or photosensitive semiconductor cells. When light (from the lens) falls on these sites they convert it to electric charge and accumulate (integrate) that charge, each eventually building up a total charge proportional to the intensity of light at that point in the image.
After a predetermined amount of time spent sensing light and accumulating charge, the charge in each photosite is read out. The onset of readout stops the accumulation process, essentially creating an electronic shutter. At the end of readout, the charge in each cell is bled off (or reset to a known value) in preparation for the next exposure and accumulation cycle. For still pictures this need be done only once for each image. For a video, this expose / readout / reset cycle is repeated many times per second, at the chosen frame rate.
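The per-frame cycle just described can be sketched in a few lines of Python. This is a toy model, not any real camera’s timing; the 10 ms readout figure is an assumption for illustration, and it assumes exposure and readout cannot overlap.

```python
# Toy model of the expose / readout / reset cycle described above.
# Assumes exposure and readout cannot overlap (no holding cells),
# as on a simple sensor.

def frame_cycle(frame_rate_hz, readout_ms):
    """Return (frame interval, maximum exposure time), both in milliseconds."""
    frame_interval_ms = 1000.0 / frame_rate_hz
    # Whatever part of the frame interval the readout doesn't use
    # remains available for accumulating light.
    max_exposure_ms = frame_interval_ms - readout_ms
    return frame_interval_ms, max_exposure_ms

# At 25 fps with an assumed 10 ms readout, 30 of the 40 ms remain for exposure:
print(frame_cycle(25, 10))  # (40.0, 30.0)
```

As the later sections explain, holding cells (on CCDs) or row-at-a-time readout (on CMOS) relax this budget by letting exposure and readout overlap.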
CCD (“Charge-Coupled Device”) sensors read out the data by transferring the accumulated charge sequentially from site to site. You can visualize this process by analogy with an old-fashioned firefighters’ “bucket brigade”, where a line of people passed a series of buckets, each filled with a varying amount of water, down the line to douse a fire at the end. (A closely related charge-transfer technology was in fact called the “bucket-brigade device”.) In place of the fire, a CCD has a digitizer at the end of the line that converts the analog charges from the photosites into digital codes, sequentially, one site at a time. This sequential conversion takes a while. And because CCD photosites cannot accumulate new charge while they are being read out, most CCD sensors use a two-stage process for readout. Each photosite has a light-blind “holding cell” as well as a photosensitive “imaging cell”. At the end of each frame’s exposure, the charge in the imaging cell is transferred to its companion holding cell. The holding cells then perform the readout, passing the charge down the bucket brigade of other cells to the digitizer. Meanwhile, the imaging cells are free to be reset and start accumulating charge, beginning the exposure for the next frame before the previous frame has finished being read out. This speeds things up overall, since two things can be done at once (exposure and readout). It also allows exposure timing to be a little more independent of readout timing.
In practice, the CCD readout is itself a two-stage process. Each time charge is transferred between cells, a little might be lost, or a little extraneous signal “noise” might be picked up, introducing error. There are millions of cells in the image; if a charge had to pass through all of them, the accumulated error would be huge. To shorten the path between each photosite and the digitizer, an additional row of holding cells is added. Each cell in this row is fed by the last (output) cell of one of the columns of the image array, and it’s the last cell of this extra row that feeds the digitizer. So, for readout, the last cells of all the columns of the array are transferred to the readout row, which is then shifted into the digitizer. The columns are then shifted, their next cells are transferred to the readout row, and that row is again shifted into the digitizer. And so on until the whole image array has been transferred. The path to the digitizer is thus at most one row plus one column long for any given cell, reducing error. Though the column holding cells are also shifting, bucket-brigade style, they do so at the much lower horizontal line-scan rate rather than the very high pixel rate. This reduced column-shifting speed helps keep overall power dissipation lower.
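The two-stage shift described above can be modeled as a short sketch; this toy model ignores noise and charge loss, and simply tracks the order in which charges reach the digitizer.

```python
# Toy model of two-stage CCD readout: the last row of all columns is
# transferred into a readout row, which is then shifted sideways into
# the digitizer; the columns shift down and the process repeats.

def ccd_readout(image):
    """Read out a 2D list of charges, returning the digitized sequence."""
    digitized = []
    rows = [row[:] for row in image]   # copy, standing in for holding cells
    while rows:
        readout_row = rows.pop()       # transfer the last cell of each column
        while readout_row:             # shift the readout row into the digitizer
            digitized.append(readout_row.pop())
        # the remaining rows have implicitly shifted down by one
    return digitized

charges = [[1, 2],
           [3, 4]]
print(ccd_readout(charges))  # [4, 3, 2, 1]
```

Note that each charge travels at most one column plus one row before digitization, which is exactly the error-limiting property the extra readout row provides.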
In such a CCD, the transfer of charge from imaging cells to holding cells occurs at the same time for all cells in the image. The start and end of the exposure time are thus exactly the same for all cells in the image. This is called a “global shutter”, because it’s applied globally (that is, to all cells) at once. Its operation is analogous to that of a mechanical “leaf shutter” in a film camera. It exposes all parts of the image at the same time.
The access to pixels (photosites) in a CCD is inherently sequential. One must shift through all the cells at the beginning of a row or column to get to those near the middle or end. Therefore, it is difficult on a CCD to read out just a part of the image. The whole thing must be done at once.
CMOS (“Complementary Metal-Oxide Semiconductor”, which refers to the use of field-effect transistors of opposite [complementary] polarity in the circuits) sensors are the newer technology, and are increasingly popular in DSLR still cameras. In CMOS sensors, each photosite is connected by transistor switches (row and column “multiplexors”) more or less directly to the digitizer, eliminating the need for an elaborate shifting process during readout: just connect, digitize, and repeat for the next photosite until done. This makes pixel access on CMOS sensors random-access. Photosites can be read in any order, and only a subset of the whole chip need be read out, if that’s all that is of interest.
The Casio Exilim EX-F1 camera takes advantage of this ability, and that’s why it uses a CMOS rather than the more typical (for a P&S) CCD sensor. By reading out a smaller portion of the full image, it is able to complete the readout in a shorter time. Therefore it’s possible to start exposing and reading a new frame sooner. Very high frame rates are obtainable with this technique, which is called “windowing”.
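A back-of-the-envelope calculation shows why windowing raises the attainable frame rate. The pixel rate below is an assumed figure for illustration, not the EX-F1’s actual specification, and the model ignores exposure time entirely.

```python
# Sketch: readout time scales with the number of pixels read, so a
# smaller window permits a proportionally higher frame rate.
# PIXEL_RATE is an assumed digitizer throughput, not a real camera spec.

PIXEL_RATE = 200e6  # pixels per second (assumed)

def max_frame_rate(width, height):
    """Frame rate if readout alone fills each frame interval."""
    readout_s = (width * height) / PIXEL_RATE
    return 1.0 / readout_s

print(max_frame_rate(3000, 2000))  # full 6 MP frame: ~33 fps
print(max_frame_rate(640, 480))    # small window: ~650 fps
```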
CMOS sensors were originally developed as a low-cost technology for use in high-volume consumer products (webcams, cellphones, etc.). These were engineered to minimize cost. While a storage cell could be placed at each pixel site on a CMOS sensor, along with the essential imaging cell, it would take up space. This would mean either that
· the sensor would have to be larger (increasing cost, and also not an option if the sensor is constrained to some standard frame size); or
· the number of photosites that would fit on the sensor would be less (decreasing resolution); or
· the size of the imaging cell would have to be decreased at each photosite to offset the size of the storage cell (decreasing low-light sensitivity and increasing noise).
So early CMOS sensors were designed without storage cells at the photosites, and this legacy continues even today. We have perhaps arrived at a time when the use of storage cells would be worthwhile. But even if cost is not the driving concern, adding storage cells degrades light sensitivity and increases noise, so it’s not a step to be taken lightly.
Since CMOS sensors usually do not have storage cells at each photosite, they must scan the image a little differently. They can’t start a new exposure at a photosite until that site has been read out. Therefore, without on-chip storage, it might seem that they must wait for the entire image to be read out before the next exposure can be started.
In a traditional still camera, of course, there’s a mechanical shutter to regulate the light falling onto the sensor during exposure. An electronic shutter isn’t needed there, and it matters little how the data is read out.
In a video camera, though, a mechanical shutter would be cumbersome and noisy, so video cameras rely on electronic shutters; that is, they use the time between readouts as the exposure (light-accumulation) time. The frame rate gives the camera a finite period in which to complete both the exposure and the readout. But it can take a long time to read out and digitize the millions of photosites on a camera’s sensor. If one had to wait for the whole image to be read, there wouldn’t be much time left in the frame interval for exposure. One could get around this by adding storage cells to allow more CCD-like timing, overlapping and decoupling the exposure and readout intervals. But barring that, some other solution is needed.
CMOS sensors finesse the problem by reading out and restarting exposure a row at a time, rather than the whole frame all at once. It takes only a little time to read out and digitize the few thousand (at most) pixels in a single row. After that, the photosites in that row can be reset and start accumulating light for the next frame’s exposure. At 25fps, the frame interval is 40 milliseconds (1/25th second), and most of that is potentially usable for exposure. (Old movie cameras, with mechanical rotating shutters, traditionally fixed the exposure at half the frame interval, that is, a 1/50th-second shutter at 25fps. But digital video cameras are more flexible, and can set the shutter to a wide range of values.)
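The timing arithmetic in that paragraph is simple enough to check directly:

```python
# Frame-interval arithmetic: at 25 fps each frame lasts 40 ms, and a
# traditional rotating shutter exposing for half the frame interval
# gives a 1/50th-second exposure.

frame_rate = 25
frame_interval_ms = 1000 / frame_rate   # 40.0 ms per frame
half_shutter_s = 1 / (2 * frame_rate)   # 0.02 s, i.e. 1/50th second

print(frame_interval_ms, half_shutter_s)  # 40.0 0.02
```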
So, a CMOS sensor will expose and read out the first scanline. And then expose and read out the second scanline. And so on till all scanlines have been read, and it’s time to repeat the whole process over for the next frame. This technique is called “rolling shutter”. It is akin to the mechanical “focal plane” shutter used on many still cameras, in which a narrow slit is moved across the film plane. The width and speed of the slit define the shutter duration. But in this case, the “slit” is electronic rather than mechanical; it is simply a series of scanlines, switched between readout and exposure modes, progressing through the frame. An important characteristic of this type of shutter is that not all parts of the image are exposed at the same time.
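A toy simulation makes the rolling-shutter geometry concrete. Here a vertical bar moves horizontally at a constant speed while rows are read out top to bottom; each row records the bar where it happened to be at that row’s readout time, so the captured image shows the bar slanted. The numbers are illustrative only.

```python
# Toy rolling-shutter model: one row is read out per time step, and a
# vertical bar moves `speed` columns per time step while the readout
# sweeps down the frame.

def rolling_shutter_capture(rows, bar_x0, speed, row_time):
    """Return, for each row, the column where the moving bar was recorded."""
    return [bar_x0 + int(speed * row_time * r) for r in range(rows)]

# Bar starts at column 0 and moves 1 column per row-readout time:
print(rolling_shutter_capture(rows=5, bar_x0=0, speed=1, row_time=1))
# [0, 1, 2, 3, 4] -- a slanted bar; a global shutter would record [0, 0, 0, 0, 0]
```

A stationary bar (speed 0) comes out perfectly vertical, which is why only moving subjects, or a moving camera, show the artifact.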
There is a famous photograph by the French photographer Jacques-Henri Lartigue, taken in 1912. (You can see it, at a link given in the appendices below.) The photo depicts a race car in motion, with some spectators in the background. The photographer was panning with the motion of the car, though not quite fast enough to keep up with it. Lartigue’s camera had a vertically-moving focal-plane shutter. Therefore, during the exposure, as the shutter slit moved up the film plane, the car and spectators moved across the film plane in opposite directions. The result? The car’s wheels appear as ellipses, tilted forward in the direction of motion. And the spectators are leaning severely in the other direction!
Now common sense tells us that the wheels were really round, and that the people were erect in real life and not tilted the way they appear in the photo. What caused this? It’s a common distortion of moving objects, seen with focal-plane shutters. As the shutter moved across the picture, it recorded a little slice of each subject at the position where it was at that moment in time, with the times (and therefore positions) quite different from the top to the bottom of the picture. Hence the tilt.
Is this a bad thing? It was very effective in Lartigue’s photo. That picture would be less interesting without the tilt. But we’re on a slippery slope here. Objectively, we can confidently predict that this will happen under the right circumstances, such as rapid panning or horizontal motion in a field exposed with a rolling or focal-plane shutter. Subjectively, aesthetically, each must decide for oneself whether the effect is good or bad. Cool as it is, if one were aiming for a different effect, this one might not be wanted.
So, by analogy with a focal-plane shutter, we could expect the rolling shutter used in video captured with CMOS sensors to show similar effects. And it does, as is quite apparent in sample footage taken with CMOS-based cameras.
The “tilt” effect in Lartigue’s photo is commonly called “skew” in the parlance of rolling-shutter artifacts. If the direction of relative subject movement is constantly changing, as it might be for the case of camera shake (if the camera were held freehand, for example) the artifact is called “wobble” and resembles a jello-like stretching and vibrating of the image. (Some people, in fact, call this “jelly video”.)
Jannard’s sample Nikon D90 videos clearly show both skew and wobble, as they were meant to do. Some of the samples in the D90 reviews by Bloom and Stamatiou also show examples of skew.
To be fair, the D90 is perhaps especially prone to these effects because of its 24fps frame rate. The slower the frame rate, the longer the frame interval, and the more time objects have to change position as they move across the frame. These motion artifacts tend to be less obvious at higher frame rates (a fact which lets the CMOS-based Casio camera off the hook, at least in high-speed video modes).
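The frame-rate dependence can be sketched numerically. This assumes, as a simplification, that the readout sweep spans the whole frame interval; the object speed is an arbitrary illustrative figure.

```python
# Why higher frame rates reduce skew: the top-to-bottom readout time
# shrinks, so a moving object shifts less during one frame's sweep.
# Assumes the readout spans the entire frame interval (a simplification).

def skew_pixels(object_speed_px_per_s, frame_rate_hz):
    """Horizontal displacement between the first and last scanline."""
    readout_time_s = 1.0 / frame_rate_hz
    return object_speed_px_per_s * readout_time_s

print(skew_pixels(1000, 24))   # ~42 px of skew at 24 fps
print(skew_pixels(1000, 300))  # ~3 px at a high-speed 300 fps
```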
Rolling-shutter effects are far less obvious in the sample footage for the Canon EOS 5D Mark II. This does not mean that camera is immune to those artifacts; it just means the people who made the samples were either savvy enough to avoid them or edited them out. In Michaud & Porta’s “Tokyo Reality” film, during the sequences shot inside the Bullet Train, if one looks out the windows, objects in the rapidly passing landscape show severe tilt. (The relatively stationary subjects inside the train, of course, look normal.)
Even very high-end cameras are not immune to the phenomenon. Consider the RED One camera, which also uses a CMOS sensor. In the excerpt from Peter Jackson’s film “Crossing the Line”, look closely at the wagon wheel that appears briefly near one soldier’s face at the beginning of the clip. As the camera pans away rapidly, this wheel takes on a very definite tilt. This is easier to see if you step through the movie frame by frame. It’s not at all objectionable, in fact it’s barely perceptible, but it’s there.
People like Peter Jackson can do as they like; they’ve earned the right to make those artistic choices without apology. The rest of us also have that right; we just might have to defend our choices a little more vigorously.