Digital Video Basics

Perceived Motion

A sequence of successive still images will be perceived by the mind as showing motion.

Approximately 10 images per second are necessary to create the illusion of continuous motion.

Approximately 50 images per second must be shown to create the illusion of smooth motion.

Frames

All film and video consist of a series of still images played at a steady rate over a period of time, creating perceived motion. The individual images are referred to as Frames.

The rate at which frames are recorded or played back (the "frame rate") is measured in "frames-per-second" or "fps".

Film Basics:

Film is recorded at 24 fps.

A frame of film is similar to a slide or photograph - it is one, single, complete image.

When projected, each frame is shown twice. This results in a total of 48 images per second, close enough to 50 to suppress visible flicker (otherwise you would see the screen go black between frames). However, because there are only 24 discrete images per second, the motion in the image is not perfectly smooth. If you watch closely, you can see a slight stuttering of the image. This is most evident when the entire image is changing between frames, such as when the camera is moving.

Video Basics:

Most video is recorded at approximately 30 fps.

The individual frames of a video can be recorded in one of two ways: Interlaced or Progressive.

In an interlaced frame every other line (i.e. every odd numbered line or every even numbered line) is recorded first and the remaining half of the lines are recorded second. Because of the slight delay between recording each set of lines this has the effect of doubling the apparent frame rate and increasing the temporal resolution (producing smoother motion).
Progressive scanning records all lines in a frame in one pass. This generally results in a lower temporal resolution that more closely approximates the look of images recorded on film.
All current broadcast formats are Interlaced.
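
The relationship between a frame and its two interlaced fields can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, a frame is modeled as a plain list of scan lines, and which field is recorded first is an assumption (it varies between formats).

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields.

    Lines are counted from 1, so list index 0 is line 1 (an odd line).
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Re-interleave two fields into a single full frame."""
    frame = []
    for i in range(max(len(odd_field), len(even_field))):
        if i < len(odd_field):
            frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = ["line %d" % n for n in range(1, 7)]
odd, even = split_fields(frame)
print(odd)   # ['line 1', 'line 3', 'line 5']
print(even)  # ['line 2', 'line 4', 'line 6']
```

Because the two fields are captured at slightly different moments, weaving them back together shows two instants of time in one frame, which is why fast motion in interlaced video can look combed on a progressive display.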

In video, each frame is composed of three channels of information. The Luminance channel (Y) contains brightness information for the entire frame; it is a weighted sum of the red, green and blue signals, with green contributing the most. The remaining two channels (Chrominance) are called "difference" channels because they contain only the difference between the blue signal and the luminance (B-Y) and between the red signal and the luminance (R-Y). Green is not carried separately, since it can be recovered from the other three values. This eliminates a significant amount of redundant information and reduces the bandwidth necessary to carry a video signal. This system is referred to as Component Video and may be noted as (Y, B-Y, R-Y), (Y, Cb, Cr), or (YUV).
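
As a rough sketch of how the three component channels relate to red, green and blue, the following Python converts in both directions. It assumes the ITU-R 601 luminance weights (0.299, 0.587, 0.114), which are not given in the text above, and it leaves out the scaling and offsets that real formats apply to the difference channels. The function names are made up for this example.

```python
# Luminance weights for red, green and blue (ITU-R 601 values, assumed here).
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_component(r, g, b):
    """Return (Y, B-Y, R-Y) for RGB values in the range 0.0 to 1.0."""
    y = KR * r + KG * g + KB * b
    return y, b - y, r - y


def component_to_rgb(y, b_y, r_y):
    """Recover (R, G, B); green is solved from Y once R and B are known."""
    b = b_y + y
    r = r_y + y
    g = (y - KR * r - KB * b) / KG
    return r, g, b
```

For any neutral gray (equal red, green and blue) both difference channels come out as zero, which is one way to see that the color information has been separated from the brightness.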

Aspect Ratios

The Aspect Ratio is the ratio of a frame's width to its height. It is notated as width:height.

4:3 is the aspect ratio for all current broadcast standards. This produces a video image that is 4 units wide by 3 units tall. Broadcast standards specify an image that is 720x480 pixels, which, counted in pixels, is actually 4.5:3. The pixels are not square, however: they are taller than they are wide, so the image is squeezed horizontally when displayed on a television to achieve the proper 4:3 ratio. Computer displays are standardized on square pixels, so a broadcast video image displayed on a computer monitor will look unnaturally wide. When preparing video for display on a computer it is thus necessary to scale the image to 640x480 pixels (full frame) or 320x240 pixels (half frame) so that it will have the proper aspect ratio.
16:9 is a new aspect ratio that has been selected for high definition video. It more closely approximates the aspect ratio of 35-mm film, and requires a widescreen television for proper display. When viewed on a standard 4:3 television, 16:9 images are displayed with a black band across the top and bottom of the screen, a process known as Letterboxing.
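
The aspect ratio arithmetic above can be checked with a short Python sketch. It follows the simplification used in this section, namely that all 720 stored pixels span the full width of the 4:3 picture. The function names are invented for illustration.

```python
from fractions import Fraction


def square_pixel_width(height, display_aspect):
    """Width in square pixels needed to show `height` lines at the given aspect ratio."""
    return round(height * display_aspect)


def pixel_aspect_ratio(stored_width, stored_height, display_aspect):
    """Width-to-height ratio of one stored pixel (less than 1 means taller than wide)."""
    return display_aspect / Fraction(stored_width, stored_height)


def letterbox_bar_height(screen_width, screen_height, image_aspect):
    """Height of each black bar when a wider image is fitted to the screen width."""
    image_height = screen_width / image_aspect
    return (screen_height - image_height) / 2


print(square_pixel_width(480, Fraction(4, 3)))            # 640
print(pixel_aspect_ratio(720, 480, Fraction(4, 3)))       # 8/9
print(letterbox_bar_height(640, 480, Fraction(16, 9)))    # 60
```

So a 16:9 image shown on a 640x480 square-pixel display occupies 360 lines, with a 60-line black band above and below it.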

Broadcast Video Standards:

NTSC (National Television System Committee): 29.97 frames/second - 59.94 fields. 525 horizontal lines. Used primarily in North and Central America, parts of South America, and Japan.
PAL (Phase Alternating Line): 25 frames/second - 50 fields. 625 horizontal lines. Used primarily in Western Europe, Australia, Middle East, most of Asia, some African countries.
SECAM (Séquentiel couleur à mémoire): 25 frames/second - 50 fields. 625 horizontal lines. Used primarily in France, Eastern Europe, former USSR, and some African countries.
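
The numbers in the three standards are related by simple arithmetic: each frame is made of two fields, and the number of lines scanned per second is the frame rate times the line count. The Python sketch below works this out. The table layout and function names are invented for this example, and the NTSC frame rate is entered as the exact fraction 30000/1001, which rounds to the 29.97 quoted above.

```python
from fractions import Fraction

# Frame rate (frames per second) and total lines per frame for each standard.
STANDARDS = {
    "NTSC": {"frame_rate": Fraction(30000, 1001), "lines": 525},
    "PAL": {"frame_rate": Fraction(25), "lines": 625},
    "SECAM": {"frame_rate": Fraction(25), "lines": 625},
}


def field_rate(name):
    """Fields per second: two interlaced fields make up each frame."""
    return STANDARDS[name]["frame_rate"] * 2


def line_rate_hz(name):
    """Scan lines per second: frames per second times lines per frame."""
    standard = STANDARDS[name]
    return standard["frame_rate"] * standard["lines"]


print(round(float(field_rate("NTSC")), 2))  # 59.94
print(field_rate("PAL"))                    # 50
print(line_rate_hz("PAL"))                  # 15625
```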

Digital Sampling

Sampling is the process of converting a continuous, analog waveform into discrete numeric values (bits) so that a computer can store and manipulate the signal.

According to the Nyquist Theorem, the minimum sampling rate that will accurately represent a waveform is twice the highest frequency present in the waveform.

The Nyquist Theorem applies when sampling video, with the sampling rate determining the maximum frequency that can be accurately represented in the final video file. Higher frequencies correspond to finer details in the image being recorded.
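
A small Python sketch of the sampling rule may help. The helper names are made up for illustration, and the alias calculation assumes ideal sampling with no filtering applied before the signal is sampled.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency that a given sampling rate can represent accurately."""
    return sample_rate_hz / 2


def min_sample_rate(max_frequency_hz):
    """Lowest sampling rate that captures frequencies up to max_frequency_hz."""
    return 2 * max_frequency_hz


def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency a sampled signal appears to have; differs from signal_hz above the limit."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_limit(13.5e6))         # 6750000.0
print(alias_frequency(5e6, 13.5e6))  # 5000000.0 (below the limit, unchanged)
print(alias_frequency(8e6, 13.5e6))  # 5500000.0 (above the limit, misrepresented)
```

The last line shows why detail finer than the limit is not merely lost: it reappears as a false, coarser pattern unless it is filtered out before sampling.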

ITU-R 601 is the international standard that defines the sampling rate and resolution of digital video for broadcast and studio use; this is the standard against which all digital formats and compression schemes are compared. It specifies 8 or 10 bits per sample and 720 samples (pixels) per line for the luminance channel, which is sampled at 13.5 MHz. The remaining two channels are sampled horizontally at half that rate, or 6.75 MHz. This reduces the total amount of data that is recorded for each frame, but still provides a satisfactory image because the human eye is significantly more attuned to changes in luminance (brightness) than it is to chrominance (color). ITU-R 601 is usually what is meant by "uncompressed digital video".
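
The sampling rates above imply a total data rate, which the following Python sketch works out. It simply counts every sample taken per second across the three channels, including those that fall in the blanking intervals, so the rate for the visible picture alone is somewhat lower. The names are invented for this example.

```python
LUMA_RATE_HZ = 13_500_000    # luminance samples per second
CHROMA_RATE_HZ = 6_750_000   # samples per second for each chrominance channel


def total_bit_rate(bits_per_sample):
    """Bits per second for one luminance and two chrominance channels."""
    samples_per_second = LUMA_RATE_HZ + 2 * CHROMA_RATE_HZ
    return samples_per_second * bits_per_sample


print(total_bit_rate(8))   # 216000000
print(total_bit_rate(10))  # 270000000
```

That is 216 million bits per second at 8 bits and 270 million at 10 bits, which gives a sense of why compression is used in most acquisition formats.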

8 bits per pixel, per channel, results in 24-bit color overall, meaning approximately 16.7 million colors can be represented. A fourth channel, the Alpha Channel, is often present in digital video, and it carries information about the transparency of each pixel. Video with an alpha channel is often referred to as 32-bit video (8 bits per color channel, plus 8 bits for the alpha channel). 10 bits per channel results in just over 1 billion possible colors. While this is overkill for video intended to be viewed on television, it is frequently used when creating graphics or video which will later be transferred to film.
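
The color counts quoted above follow directly from the number of bits, as this short Python sketch shows. The function names are made up for illustration.

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct colors that the color channels can represent."""
    return 2 ** (bits_per_channel * channels)


def bits_per_pixel(bits_per_channel, has_alpha=False):
    """Total bits stored per pixel, optionally including an alpha channel."""
    channels = 4 if has_alpha else 3
    return bits_per_channel * channels


print(color_count(8))                    # 16777216
print(color_count(10))                   # 1073741824
print(bits_per_pixel(8, has_alpha=True)) # 32
```

Note that the alpha channel adds to the storage per pixel but not to the number of colors, since it describes transparency rather than color.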

4:2:2 component video refers to the standard of sampling the chrominance channels at half the horizontal resolution of the luminance channel, as in the ITU-R 601 standard.
4:2:0 component video refers to the process of sampling the chrominance channels at half the horizontal and half the vertical resolution of the luminance channel. The chrominance is sampled horizontally as in 4:2:2, but only every other line of the chrominance channels is kept. This is the process used in PAL DV format and in the MPEG2 standard for DVDs (see compression formats below).
4:1:1 component video reduces the video data rate even further by sampling the chrominance channels at one quarter the horizontal resolution of the luminance channels. Again, because of the human eye's greater sensitivity to brightness changes this generally produces an image that is of acceptable quality; however, for certain processes such as chroma keying the reduced color information can present problems. This sampling rate is primarily used in the NTSC DV formats.
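
To compare the three schemes, the Python sketch below counts the samples stored for one frame under each. It assumes the usual convention that 4:2:0 halves the chrominance both horizontally and vertically, and uses a 720x480 frame as the example. The table and function name are invented for illustration.

```python
# For each scheme: (horizontal divisor, vertical divisor) applied to chrominance.
SCHEMES = {
    "4:4:4": (1, 1),  # no chrominance reduction, shown for comparison
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}


def samples_per_frame(width, height, scheme):
    """Total samples in one frame: one luminance plus two chrominance channels."""
    h_div, v_div = SCHEMES[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)
    return luma + chroma


for name in SCHEMES:
    print(name, samples_per_frame(720, 480, name))
# 4:4:4 1036800
# 4:2:2 691200
# 4:2:0 518400
# 4:1:1 518400
```

4:2:0 and 4:1:1 store the same amount of data; they differ in where the color detail is given up (vertically and horizontally in 4:2:0, horizontally only in 4:1:1).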

Cameras

All modern video cameras use a CCD, or Charge Coupled Device, to translate light into an electrical signal. A CCD is an array of light-sensitive sensors, each of which produces a continuously variable electric signal that changes in response to the amount of light striking the sensor.

Video cameras generally use either 1 or 3 CCDs, and are referred to as Single-Chip or Three-Chip. A single-chip camera uses one CCD to record all three primary colors of light (red, green and blue), while a three-chip camera splits the light, usually with a prism, into separate colors and uses one chip to record each color. Consumer cameras are generally single-chip, to keep costs down, while professional cameras are generally three-chip, to produce the best images. Three-chip cameras reproduce colors more accurately than single-chip cameras, but because the prism reduces the intensity of light reaching each chip, a single-chip camera will generally respond better to lower light levels.

Gain is the process of electronically amplifying the signal coming from the CCD, which increases the sensitivity of the chip to lower light levels. While this allows a camera to record an acceptable image with little light, it also introduces noise to the video image. Low levels of gain will produce little extra noise, but high gain can result in visibly inferior images. The increased noise from high gain can cause problems when compressing digital video later, so it is generally preferable to shoot with the lowest level of gain possible. Gain is expressed in decibels, with common gain levels being -3 dB, 0 dB, 3 dB, 6 dB, 9 dB, 12 dB, and 18 dB.
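
To get a feel for what the decibel figures mean, the Python sketch below converts a gain setting into an amplification factor. It assumes the decibels describe a ratio of signal voltages (the 20 x log10 convention), which the text above does not state. The function name is made up for this example.

```python
def gain_factor(gain_db):
    """Signal amplification factor for a gain in dB (voltage-ratio convention assumed)."""
    return 10 ** (gain_db / 20)


for db in (-3, 0, 6, 18):
    print(db, round(gain_factor(db), 2))
# -3 0.71
# 0 1.0
# 6 2.0
# 18 7.94
```

Under this assumption every 6 dB of gain roughly doubles the signal, and with it the noise, so 18 dB amplifies both by about a factor of eight.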

The most common CCD sizes are 1/4", 1/3", 1/2" and 2/3". Smaller CCDs are less expensive, but larger chips can produce higher resolutions and less noise at a given gain level. Consumer cameras generally use 1/4" chips, prosumer cameras may use 1/4" or 1/3", and professional cameras use either 1/2" or 2/3" chips.

Acquisition Formats

VHS, VHS-C, and 8mm are the most common analog consumer tape formats. Each reproduces approximately 240 horizontal lines of resolution. Because of their low resolution and high picture noise they are not well suited to acquisition of video for production purposes.
SVHS and Hi8 are considered Prosumer analog formats, meaning that they can be used for production purposes. Each can resolve about 400 horizontal lines of resolution and they have lower noise properties than their consumer counterparts. These should be considered the minimum for any type of production work.
MiniDV and Digital8 are the most common consumer/prosumer digital formats. Both use the same compression format, have no analog noise problems, and can resolve about 500 lines of horizontal resolution. These can be considered consumer, prosumer or professional depending on the equipment they are used in (e.g. a three-chip MiniDV camera with balanced audio and a removable lens would be considered professional, while a single-chip MiniDV handycam would be considered consumer equipment). These formats are popular for multimedia work because of their high image quality combined with relatively low cost.
BetaSP is the industry standard for professional analog acquisition. It produces a high quality component video signal with 500 lines of resolution and very low noise properties. BetaSP equipment generally costs about 3-4 times as much as MiniDV equipment.
DVCAM is the professional version of MiniDV. It uses a larger cassette but the video format is essentially the same.
DVCPro is similar to DVCAM. Its DVCPRO50 variant uses less compression on the video signal, producing a signal that is better suited for compositing and effects work.
D-9 (formerly Digital-S) is JVC's high-end professional digital format. It produces high quality images that are well suited to any type of broadcast work, and it is comparably priced to BetaSP.
Digital Betacam is Sony's high-end digital production format. It is extremely high quality, with a very low compression ratio. DigiBeta equipment is also very expensive, and used primarily for high-end broadcast production.

Standard Analog Connections:

Component uses three cables to maintain the three color channels separately. This generally produces the cleanest analog signal. Used on most professional analog video equipment and beginning to show up on high-end consumer DVD players and televisions.

S-Video maintains a separate luminance signal but combines the two chrominance channels into a single signal. This results in a slightly lower quality image than component, but is significantly better than composite video. Signals are carried in a single cable with a four-pin connector. Found on most professional video equipment, S-VHS and Hi8 consumer gear, DVD players and some consumer televisions.

Composite combines the three channels into a single signal that is carried through a single cable, usually with an RCA connector. This can cause significant degradation of the signal and is generally the least desirable connection to use. Primarily found on inexpensive consumer VCRs and televisions; may be present on professional/prosumer equipment but should only be used as a reference signal and not in situations where image quality is important.