Videoconferencing H.323 Basics
Introduction
H.323 is an international standard (an ITU-T recommendation) for
videoconferencing. It uses IP networks such as the Internet for
connectivity between endpoints. Endpoints can be client
videoconferencing terminals, Multipoint Control Units (MCUs), or
gateways. This presentation describes the various endpoints and how
they interoperate.
Point-to-Point Videoconferencing
Consider two client terminals that are connected to the Internet (see
Figure 1). An example of a client terminal, or endpoint, is a Polycom
Viewstation. The Viewstation and its associated peripherals allow the
user to make a call to another client, send the local audio/video
stream to the remote client, and hear and view the received audio/video
stream on a local speaker and monitor connected to the Viewstation.
Assume one user (the local user) uses a Viewstation to call a user at a
remote Viewstation (client terminal) by entering the IP address of the
remote Viewstation. The clients set up a call between the stations
following the specifications of the H.323 protocol. Once the call is
set up, the clients exchange audio/video streams over the Internet. The
point-to-point videoconference continues until one of the users "hangs
up" the call.
One of the problems with this type of video call is that IP addresses
are used for the call. IP addresses are difficult to remember; some
users have dynamically assigned (DHCP) IP addresses that can change
every time they boot their system; and we have noted problems with IP
dialing between systems from different vendors. We thus do not
recommend IP dialing, although it is occasionally used.
The Gatekeeper
To alleviate the problem of IP dialing, the H.323 standard defines the
use of a gatekeeper (see Figure 2). The gatekeeper is a system that
connects to the Internet just like the client terminals. The IP address
of the gatekeeper is configured into the client terminals, and when the
clients "power up", they communicate with the gatekeeper and transfer
certain information that describes the client. This process is known as
registration; the client registers with the gatekeeper.
Two identifiers are assigned to and configured in each client terminal.
One is an H.323 alias. It is usually descriptive of the particular
client terminal and usually contains alphanumeric characters. The other
identifier is the H.323 extension. It usually consists of several
digits and can be thought of as the video telephone number of the
client. While it is possible to use either the H.323 alias or the H.323
extension for dialing, it is difficult to dial alphanumeric characters
on most clients, so the H.323 extension is normally used for dialing.
Refer to the section "Understanding Videoconferencing" - "Addressing
Issues" for a better understanding of the addressing standards used at
Northwestern University.
When the clients register with the gatekeeper, they pass their IP
addresses, H.323 aliases, and H.323 extensions to the gatekeeper, where
they are stored. This allows a local user to dial a remote user by
entering the remote user's H.323 extension (video telephone number)
rather than an IP address. The local client terminal communicates the
H.323 extension to the gatekeeper. The gatekeeper then checks to see if
the remote client is registered with the gatekeeper. If it is, the
gatekeeper sets up the call between the two clients; if it is not
registered, the call is rejected. Once the call has been set up, the
audio/video streams flow directly between the clients over the
Internet. The gatekeeper can perform a number of other management
functions as well. For a description of these, see "Understanding
Videoconferencing" - "Advanced Issues".
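The registration and lookup steps described above can be sketched in a
few lines of Python. This is only an illustration of the idea, not an
implementation of the H.323 signaling itself: the class names, the
in-memory table, and the sample addresses and extensions are all
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Registration:
    """What a client passes to the gatekeeper when it registers."""
    ip_address: str
    alias: str       # H.323 alias, e.g. "room-101" (hypothetical)
    extension: str   # H.323 extension, the "video telephone number"


class Gatekeeper:
    """Toy in-memory registry; not a real H.323 gatekeeper."""

    def __init__(self) -> None:
        self._by_extension: Dict[str, Registration] = {}

    def register(self, reg: Registration) -> None:
        # Store the client's details, keyed by its extension.
        self._by_extension[reg.extension] = reg

    def resolve(self, extension: str) -> Optional[str]:
        # Return the callee's IP address, or None so the call can be rejected.
        reg = self._by_extension.get(extension)
        return reg.ip_address if reg else None


gk = Gatekeeper()
gk.register(Registration("192.0.2.10", "room-101", "5551001"))
print(gk.resolve("5551001"))  # 192.0.2.10
print(gk.resolve("5559999"))  # None
```

A registered extension resolves to an IP address so the call can be set
up; an unregistered one yields nothing, which corresponds to the call
being rejected.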
Multipoint Videoconferences
To this point we have only considered point-to-point videoconferences.
These are conferences between two client terminals. The question can
then be raised: what if users at three or more clients want to hold a
videoconference? To handle this situation, the H.323 standard
introduces the concept of a Multipoint Control Unit (MCU). The MCU (see
Figure 3) is an endpoint that can be thought of as a "video bridge".
The MCU connects to the Internet and registers with the gatekeeper, as
does any other endpoint.
An MCU, depending on its design capacity, can handle a certain number
of simultaneous videoconferences, with each videoconference being
logically separate from the others and having a specified number of
users. System administrators define "services" on the MCU, where each
service has certain characteristics that distinguish it from the other
services defined on the MCU. As an example, a service numbered 75 might
be defined that allows several simultaneous videoconferences to be
created, where each has a maximum size of, say, five sites (clients)
and where all must encode their audio/video streams at 384 Kbps. A
specific videoconference on service 75 is then identified by the
service number and by a conference "password" (e.g. 751234, which is
service 75 followed by password 1234). Each of the simultaneous
videoconferences held on service 75 is thus identified by the service
number (75) and by a different password.
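As a rough illustration of how a dialed number such as 751234 breaks
down, the sketch below splits it into a service number and a password.
It assumes the service number is simply a prefix of the dialed digits,
as the example above suggests; the table and function name are
hypothetical and do not describe how any particular MCU is configured.

```python
from typing import Dict, Optional, Tuple

# Hypothetical service table: service number -> (max sites, rate in Kbps).
SERVICES: Dict[str, Tuple[int, int]] = {"75": (5, 384)}


def split_dial_string(dialed: str) -> Optional[Tuple[str, str]]:
    """Split a dialed number into (service, password), or None if no service matches."""
    for service in SERVICES:
        # A match needs the service prefix plus at least one password digit.
        if dialed.startswith(service) and len(dialed) > len(service):
            return service, dialed[len(service):]
    return None


print(split_dial_string("751234"))  # ('75', '1234')
print(split_dial_string("991234"))  # None
```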
When users want to join a particular videoconferencing session, they
dial the service number/password combination. The gatekeeper checks to
see if that service has been registered by an MCU. If it has, the
gatekeeper completes the call by connecting the client to the specified
videoconference on the MCU; if the service has not been registered, the
call is rejected. Once the call has been connected, the client's
audio/video stream is sent over the Internet from the client to the
MCU. Similarly, other clients connect to the session and send their
audio/video streams to the MCU. The MCU selects one of the audio/video
streams on the videoconference and returns that stream to all of the
clients (that is, all except the client whose stream was selected).
There are several methods for selecting an audio/video stream; audio
switching and chairman control are two alternatives. Typically, the
method chosen is audio switching, where the MCU selects the stream that
currently has active audio (someone is talking, or is talking the
loudest). We frequently refer to this selection process by saying that
this particular stream (client) has "captured" the MCU.
Let's assume that we have several clients connected to a single
videoconferencing session on an MCU. The assumption is that no users
want the MCU to send them back video of themselves and no site wants to
receive an audio stream that contains its own audio. So the MCU sends
the selected video stream to all the clients except the client whose
stream was selected; to the currently selected site, the MCU sends the
video from the site that was selected previously. All of the audio
streams are mixed together and sent back to each site with that site's
own audio removed. Thus each site gets a unique audio stream that
contains only the audio from the other sites.
As the user(s) at one site stop talking and the user(s) at another site
start to talk, the new site captures the MCU. The process is repeated
with the video from the newly selected site now being sent to all the
other sites, and the newly selected site getting the video from the
previously selected site.
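The switching rules above can be summarized in a short Python sketch.
It models only the routing decisions (who is selected, whose video each
site receives, and whose audio goes into each site's mix), not any real
media processing. The function names, site labels, and audio levels are
invented for the illustration.

```python
from typing import Dict, List


def select_speaker(levels: Dict[str, int]) -> str:
    """Audio switching: pick the site with the loudest audio right now."""
    return max(levels, key=lambda site: levels[site])


def route_video(sites: List[str], selected: str, previous: str) -> Dict[str, str]:
    """Map each site to the site whose video it receives."""
    # Everyone sees the selected site; the selected site sees the previous one.
    return {site: (previous if site == selected else selected) for site in sites}


def mix_minus(sites: List[str]) -> Dict[str, List[str]]:
    """For each site, list the sites whose audio is mixed into its return stream."""
    return {site: [other for other in sites if other != site] for site in sites}


sites = ["A", "B", "C"]
levels = {"A": 2, "B": 9, "C": 1}   # made-up audio levels
selected = select_speaker(levels)    # "B" is loudest
video = route_video(sites, selected, previous="A")
audio = mix_minus(sites)
print(video)  # {'A': 'B', 'B': 'A', 'C': 'B'}
print(audio)  # {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
```

Here site B has captured the MCU, so A and C receive B's video while B
receives video from A, the previously selected site. No site's audio
mix contains its own audio.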
Streaming
To participate in an H.323 videoconference, users must have
appropriate videoconferencing client terminals and Internet
connectivity with sufficient bandwidth to support the videoconference.
Some users may not have these capabilities but would still like to
participate, even if that means they can only see and hear the
conference participants and cannot interact with them. This can be
accomplished if the videoconference session is captured, encoded in an
appropriate format, and streamed over the Internet (see Figure 4),
although this capability is not a part of the H.323 standard.
To accomplish the streaming, an H.323 client must be connected to the
conference session to be streamed. This station will be able to capture
and decode the audio/video that the MCU has currently selected. This
decoded audio/video can then be re-encoded and streamed over the
Internet. Two popular encoding formats are currently in use: RealVideo
and Microsoft Windows Media. The encoded audio/video can then be
streamed on the Internet by a server, archived in a disk file for later
viewing, or both. The system consists of an H.323 client, an encoder, a
server, and an archive storage system.
Users can receive the stream using a browser on a computer. They enter
the URL of the server, and the server starts sending the encoded
audio/video stream over the Internet to the computer. Browser plug-ins
exist that are capable of decoding both RealVideo and Windows Media
streams. The user can thus see and hear the participants in the
streamed videoconference in near real time. Alternatively, a user can
connect to the server at a later date and view the archived version of
the videoconference.
Gateways
So far we have discussed H.323 videoconferencing capabilities.
However, many sites have videoconferencing rooms that implement the
H.320 standard, which uses telecommunication lines (e.g. dial-up or
dedicated ISDN lines). The H.323 standard was developed after the H.320
standard and uses many of the encoding/decoding protocols originally
developed for H.320. The H.320 systems can be considered legacy
systems, but since many of them still exist, it is important that we
continue to support H.320.
In addition to supporting pure H.320 videoconferences using H.320 MCUs,
we can provide gateways between the two protocols (see Figure 5). A
gateway provides a path between H.320 and H.323 systems. It translates
H.320 commands and audio/video streams to H.323 commands and
audio/video streams and vice versa. Users with H.320 client terminals
dial the gateway over ISDN lines. The H.320 user then needs to enter a
service/password combination for the selected session, and the gateway
connects the H.320 terminal to that session. All H.323-based users can
see and hear the H.320-based users as if they were on H.323 terminals,
and similarly the H.320-based users can see and hear the H.323 users as
if they were on H.320-based terminals. Multiple H.320 connections can
be made to the gateway, up to the capacity of the gateway.
One other benefit of the gateway is that it can accept calls from
standard telephones (see Figure 6). A user with a standard telephone
dials the ISDN telephone number and is connected to the gateway. The
telephone user then enters a series of digits to indicate the
service/password combination of the desired videoconferencing session.
The user can then hear all of the audio from the videoconference and
can also speak to the others in the conference. The gateway is able to
connect multiple telephone calls simultaneously and can even connect to
a telephone bridge, which could allow participation by a large number
of audio-only users.
Bandwidth Considerations
The H.323 client terminals encode the selected audio (usually from a
microphone) and video (usually from a camera) inputs. The encoded audio
and video are then compressed into a single audio/video stream and sent
to the remote endpoint (another client terminal or an MCU). Different
rates can be selected for the encoding process. As an example, an
encoding rate of 384 Kbps might be selected, of which 64 Kbps is
reserved for the audio and 320 Kbps for the video. The 384 Kbps stream
is compressed (redundancy is removed) and sent to the remote endpoint.
Similarly, a 384 Kbps stream is received from the remote endpoint. Thus
approximately twice 384 Kbps, or 768 Kbps, in bandwidth (less any
bandwidth saved because of compression) is required to support the
videoconference for this endpoint. If there is a lot of motion in the
video, very little compression is achieved. If there is almost no
motion in the video, the savings approach about 50%. Since we must
design for the worst case, assume a bandwidth requirement of twice 384
Kbps.
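The arithmetic above can be checked with a small Python sketch. The
function name and the "savings" parameter are introduced here only for
illustration; the figures (64 Kbps audio, 320 Kbps video, up to about
50% savings) are the ones given in the text.

```python
def required_kbps(audio_kbps: int, video_kbps: int, savings: float = 0.0) -> float:
    """Bandwidth for one endpoint: one stream sent plus one stream received."""
    stream_kbps = audio_kbps + video_kbps          # e.g. 64 + 320 = 384
    return 2 * stream_kbps * (1.0 - savings)


worst_case = required_kbps(64, 320)                # high motion, no savings
best_case = required_kbps(64, 320, savings=0.5)    # almost no motion
print(worst_case)  # 768.0
print(best_case)   # 384.0
```

Planning should use the worst-case figure, since the amount of motion in
the video cannot be predicted in advance.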
Faster encoding rates can be selected. Most client terminals support
rates up to 768 Kbps. Some proprietary implementations can encode at
rates up to 2 Mbps. The higher the encoding rate, the better the
quality of the video. However, higher encoding rates also mean higher
bandwidth requirements, greater impact on the network, and greater
impact on MCU capacity. Lower encoding rates, down to about 128 Kbps,
can also be selected. This of course means lower video quality. 384
Kbps is a good compromise between quality on one hand and resource
impact on the other. 384 Kbps will support 30 frames per second video.
Lower encoding rates yield lower frame rates and choppy video. There
is a discernible but small improvement in quality between 384 Kbps and
768 Kbps.