Web-enabled, CORBA Driven, Distributed VideoTalk Environment
on the Java Platform
Tomasz Mojsa, Krzysztof Zielinski
Institute of Computer Science
at UMM University
Al. A. Mickiewicza 30
30-095 Krakow, Poland
Probably nothing makes people more accessible than the ability
to be everywhere without ever actually being there. The article presents
a general overview of a Java-driven, distributed framework for networked
real motion multimedia on the World Wide Web. It presents the obstacles
and difficulties one must inevitably face to make real video, Web-centric,
network Java applications a reality. The paper primarily outlines what
the authors perceive as one of the most feasible ways to meet the
challenge at this stage of Sun's Java technology. It emphasizes the benefits
brought and the limitations imposed by Java on the architecture of a multimedia
system, and sketches out how the designers of the Java VideoTalk Environment
(a.k.a. VTE) system tried to exploit the former and overcome the latter.
The article also portrays an attempt to build the entire video system on
top of OMG's CORBA computational model, which greatly facilitates VTE's
interoperability with ABS from Oracle/Olivetti Research Laboratory, Cambridge,
UK and, to a certain degree, with IBM's Aglets.
Keywords: CORBA, Java, ORL's ABS, Streaming Audio and Video, IBM
1. Introduction
The demand for audio and video on the web seems to be growing,
especially since the day Java mesmerized web users by offering
easy download and execution of applet code
across many different platforms and environments. The interest sparked
by the promise of much livelier web pages, and by the omnipresent proclamation
of Java as the language of the web and multimedia, diminished somewhat when
it turned out that the only multimedia support, for the time being, was
cartoon animation and that real video libraries were scheduled for 1997.
Starting with SunSoft's alpha3 version of Java, a group at UMM embarked
upon integrating video and audio capabilities into Java software.
Implementing a real time networked video system is not a trivial task.
An implementation often requires addressing many levels of software development,
ranging from low level code (like the code getting images off the
video devices) up to higher software levels (like the APIs for
initiating connections and, in general, controlling the system's
execution flow). To tackle the problem of delivering real time video to
the web efficiently, one must be able to provide viable solutions in all
of these software areas.
The appearance of Sun's Java seems to be an attempt to answer
the need for a common software platform that makes high level
code portable. However, in its current state the technology offers very modest
programming facilities that could serve as a vehicle for real motion video
applications of practical use and thus effectively bridge the accessibility
gap between web users. The aim of the VideoTalk Environment development team was
to create a flexible system that brings audio and video accessibility
to users within a corporate LAN or, as work progresses, to a wider
audience, in each case from within an ordinary Java enabled web browser.
2. Design Problems
During the development of a Java video system today, one encounters
three main problem areas that in reality shape the overall structure of a Java
multimedia system. The first is the difficulty of seamlessly integrating
video handling code with Java applet software; the second is
the selection of the framework used to provide all the facilities needed
to harness the complexities of the system (like initiating and closing
connections or handling communication errors); and the third is the
choice of the transport protocol used to transfer data between video sources
We might imagine that, ideally, a user of the system would need nothing more
than an ordinary Java enabled browser to run
a Java network video application. An applet loaded into Netscape from a
remote host would get the video image of the user, thanks to standard Java
classes in the browser (supported by native libraries), and send the user's
video to a destination (obviously with the user's permission to do so).
Unfortunately, at this moment, the Java packages do not provide anything
that would enable getting images off cameras and sound off microphones.
The only way to deal with it now is to use native methods and hook them
up to Java code. This approach, however, introduces a substantial flaw
into such Java software: classes containing native methods cannot be loaded
by most popular classloaders in browsers due to strict security restrictions.
Consequently, dynamically fetching multimedia capture and display code
from a remote server, without support from the native software already
installed with the browser, turns out to be unrealistic with the applet
concept. The reason is that allowing such code to be downloaded into a WWW viewer
in the form of a lightweight applet might be dangerous (not to say non-portable).
Some code might mimic the VTE system and try to execute native (unverified)
code, possibly breaching security at a host. Indeed, were this possible,
the Java security model would be seriously deficient.
This limitation, inevitably enforced by the security provisions,
provides the rationale for introducing at least a two layer architecture
into a web based Java video system. The sensible way out for video system
implementors is to provide native libraries on each host and to create a
server that offers video capabilities to all locally started applets
and applications. The server could be a Java application doing the real
work in native methods, or it could simply be an application written entirely
in C++, which is the better implementation choice for performance
reasons. The necessity of separating native (C and C++) code from
applet code coincides with the benefit gained by any application
that has a very clear boundary between graphical user interface objects
and functional objects. The VTE system follows the observation that a
precise separation between all software components (i.e. not just GUIs
and the back end) renders the architecture clearer still. A natural consequence
of this is employing OMG's CORBA IDL to sketch
out all the services offered by system components. Approached
this way, a CORBA audio and video server makes up for the services
missing from the native code that is accessible to applets through
the Java (Netscape) Native Interface and shipped with browser libraries.
An Object Request Broker also seems to provide a splendid
framework for coordinating and managing the objects interacting in the
VTE video system, thus addressing the second problem of multimedia
system design: the need to manage the system components.
The following sections depict the implementation of most of the objects
in a general way, presenting their capabilities and limitations.
3. High-level Overview of the System
The implementation obstacles imposed on developers by the integration of
video handling code into Java applets are reflected in the layered structure
of the VTE system (Figure 1). Two of the layers (the Browser Layer and the
ConnectionRequester Layer) can be thought of as entirely Java layers. The
AudioVideoSplitController layer can be seen as a native code (C++) layer.
The VTE system utilizes Visigenic's VisiBroker for Java ORB to manage its
components in a network. A sensible alternative might be to base the application
on Iona's OrbixWeb CORBA 2.0 implementation. Unfortunately, at
the time the team set out to implement the system, OrbixWeb didn't support
server side mapping for Java. There is, however, no reason why some of the
components in the VTE system couldn't be built on top of OrbixWeb and
others on top of VisiBroker while remaining fully interoperable,
thanks to the CORBA 2.0 IIOP protocol.
As a consequence of the chosen structure, every audio and video device
server (AudioVideoSplitController) in VTE must be installed on a host prior
to using it as a multimedia access point. Right now, there seems to be
no other elegant solution than to provide video managing facilities to
clients through a neatly designed "native" CORBA server.
The picture presenting the overview of the VTE system reveals that there
is a ConnectionRequester server object in the middle layer, between the
separated AudioVideoSplitController and the Java applet code executing
in the browser. It performs the extremely important function of coordinating,
validating, registering and possibly authenticating communication between
applets and AudioVideoSplitController. ConnectionRequester is also the
server that accepts or rejects connection requests and decides whether they
should be handed over to AudioVideoSplitController or simply discarded.
Localization Site, also included in the middle layer, provides localization
information about the users of the system.
Figure 1. A high-level outlook on the layered structure of the VTE system.
The next horizontal layer in the VTE system is the web based user interface
layer, acting mainly as a client of the layer containing ConnectionRequester
and Localization Site. The applet GUIs communicate with the higher layer
through CORBA mechanisms. All of them can be executed either as applets
or as standalone applications, and they all act as "stateless" software
code. Most of VTE's object interfaces are contained in the system's
main module, named VideoPackage. Almost all components in the system interact
in terms of operations defined in this module and exchange data structures
defined within it.
    //some more code defining structs ....
    oneway void transmit(in TransmitInfoClass bic);
    oneway void receive(in ReceiveInfoClass ric);
    oneway void change_packet_size(in SizeChange sc);
    //..less important code
};

    boolean request_connection_from(in ReceiveInfoClass ric,in TransmitInfoClass
    boolean accept_connection_from(in ReceiveInfoClass ric,in TransmitInfoClass
    oneway void change_transmit_param(in TransmitInfoClass bic);
    oneway void change_receive_param(in ReceiveInfoClass ric);
    boolean allow_connection(in string username);
    boolean deny_connection(in string username);
    boolean remove_queued_connection_request(in string username);
    boolean register_listener(in string username,in string ior);
    boolean unregister_listener(in string username,in string ior);
};

    boolean request_connection_from(in ReceiveInfoClass ric,in TransmitInfoClass
    //some more code
4. Cooperation of the Web-oriented GUI and Native Layers Using the
The structuring of the system keeps the implementation of user interfaces
and functional code separate. It is worth taking a glimpse at the interaction
between VTE's layers to spot the essential role of the ConnectionRequester
layer with its persistent functionality. ConnectionRequester acts as an
entity relaying the requested attributes of a video connection, set up by user
GUIs in applets (like "I want to open a connection to my friend on
host tulip (184.108.40.206) with JPEG video and linear PCM 44.1 kHz sound").
It receives the parameters in CORBA calls from a client applet GUI and,
after authorizing access, passes them up to AudioVideoSplitController for
further execution. The applet code loaded into browsers plays a
vital role by passing on commands from the user, but it is also needed to
register a user as a listener of a ConnectionRequester and is required to
unregister when his or her browser exits. Client applets register by handing
the IOR of a ConnectionEstablishCallback object (which executes within them
as a lightweight thread) to the local ConnectionRequester. Not only
do they act as clients of ConnectionRequester when requesting a connection
to someone, but they also provide servers within browsers that are called
by ConnectionRequester when an incoming connection request is put in
the ConnectionRequester's queue. This way GUIs also serve to "wake
up" the users of VTE. Obviously, this requires that the proper applet
web page is being viewed, but a trick with a constantly executing applet
thread is sometimes also possible, notifying a user of a requested connection
even if he or she changes web pages.
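The listener bookkeeping described above can be sketched in plain Java. This is only an in-process illustration under stated assumptions: no ORB is involved, IORs are treated as opaque strings, and the class name ListenerRegistry, its storage and the listenersOf helper are hypothetical. Only the operation names register_listener and unregister_listener and their (username, ior) parameters come from the IDL fragment.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process stand-in for the listener bookkeeping of a
// ConnectionRequester. No ORB is involved; IORs are opaque strings here.
class ListenerRegistry {
    // username -> IORs of the callback objects registered by that user's applets
    private final Map<String, Set<String>> listeners = new ConcurrentHashMap<>();

    // Mirrors register_listener(in string username, in string ior).
    // Returns false when the same IOR is already registered for the user.
    boolean registerListener(String username, String ior) {
        return listeners
                .computeIfAbsent(username, u -> ConcurrentHashMap.newKeySet())
                .add(ior);
    }

    // Mirrors unregister_listener(in string username, in string ior).
    // Returns false when the IOR was not registered.
    boolean unregisterListener(String username, String ior) {
        Set<String> iors = listeners.get(username);
        return iors != null && iors.remove(ior);
    }

    // Callbacks that would be called when a request for this user is queued.
    Set<String> listenersOf(String username) {
        Set<String> iors = listeners.get(username);
        return iors == null ? Set.of() : Set.copyOf(iors);
    }
}
```

Keeping a set of IORs per user, rather than a single one, is a design assumption that lets several browsers of the same user be woken up by one incoming request.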
Let us imagine that user A chooses to talk to user B, and follow the scenario
without getting bogged down in unnecessary details. User A's applet calls
request_connection_from on its local ConnectionRequester server object,
passing data structures that describe the needed connection. A's local
ConnectionRequester server object invokes accept_connection_from on the
ConnectionRequester on B's host,
thus putting a connection request in its queue. Now, B's ConnectionRequester
calls request_connection_from on the local ConnectionEstablishCallback
server object in user B's applet. Subsequently, ConnectionEstablishCallback
notifies user B of the need to invoke either allow_connection
or deny_connection on the local ConnectionRequester. Depending on
the decision, the ConnectionRequester either triggers the operations open_av_device,
transmit and receive (which send the outbound audio/video stream
and accept the inbound one) on the AudioVideoSplitController server object,
or it simply removes the request from the queue by calling
remove_queued_connection_request
on the local ConnectionRequester server object. When a connection is accepted,
the AudioVideoSplitControllers exchange audio/video information. ConnectionRequester
is designed as a fully multithreaded Java server application and also greatly
benefits from the underlying ORB's threaded environment in servicing clients'
requests and maintaining the behavioral consistency of the VTE system.
This becomes especially evident when user A accesses VTE simultaneously from
all 7 browsers he or she has started on the host and in each browser asks
for the same connection while passing contradictory parameters for it.
As opposed to the "stateless" applet GUI code, ConnectionRequester
acts as a persistent "storage server", keeping track of the video sessions
awaiting connection and the sessions currently active.
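The call sequence of this scenario can be traced with a small in-process simulation. It is a sketch under strong assumptions: there is no ORB and no audio or video, the user's decision is collapsed into a synchronous predicate (in VTE the user answers later through the GUI), and the class HandshakeSketch, its nested Requester and the string parameters are hypothetical. Only the operation names are taken from the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-process simulation of the call sequence described above.
// No ORB, no audio or video: every class here is a hypothetical stand-in.
class HandshakeSketch {

    // Stand-in for the ConnectionRequester server object on one host.
    static class Requester {
        final String user;
        final List<String> log;                                   // shared trace of operations
        final Map<String, String> queue = new LinkedHashMap<>();  // caller -> parameters
        Predicate<String> callback;  // stand-in for ConnectionEstablishCallback

        Requester(String user, List<String> log) {
            this.user = user;
            this.log = log;
        }

        // Invoked by the local applet: relay the request to the callee's host.
        boolean requestConnectionFrom(Requester remote, String params) {
            log.add(user + ".request_connection_from");
            return remote.acceptConnectionFrom(user, params);
        }

        // Invoked by the caller's ConnectionRequester: queue, then wake up the user.
        boolean acceptConnectionFrom(String caller, String params) {
            log.add(user + ".accept_connection_from");
            queue.put(caller, params);
            boolean allowed = false;
            if (callback != null) {
                log.add(user + ".callback.request_connection_from");
                allowed = callback.test(caller);
            }
            if (allowed) {
                allowConnection(caller);
            } else {
                denyConnection(caller);
            }
            return allowed;
        }

        boolean allowConnection(String caller) {
            String params = queue.remove(caller);
            if (params == null) return false;
            log.add(user + ".allow_connection: open_av_device, transmit, receive [" + params + "]");
            return true;
        }

        boolean denyConnection(String caller) {
            log.add(user + ".deny_connection");
            return removeQueuedConnectionRequest(caller);
        }

        boolean removeQueuedConnectionRequest(String caller) {
            log.add(user + ".remove_queued_connection_request");
            return queue.remove(caller) != null;
        }
    }

    // Runs one A-to-B request and returns the trace of operations.
    static List<String> run(boolean userBAllows) {
        List<String> log = new ArrayList<>();
        Requester a = new Requester("A", log);
        Requester b = new Requester("B", log);
        b.callback = caller -> userBAllows;
        boolean connected = a.requestConnectionFrom(b, "JPEG video, linear PCM audio");
        log.add("connected=" + connected);
        return log;
    }

    public static void main(String[] args) {
        run(true).forEach(System.out::println);
    }
}
```

Because the queue lives in the Requester and not in the applet stand-in, the sketch also reflects the point that the GUI side can stay "stateless" while the middle layer tracks pending requests.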
5. Localization Facilities Incorporated in VTE and the Applet GUI
Without doubt, video accessibility is only part of the real accessibility
challenge. The other part of the task that a system must be able to handle
is easily finding the person one is looking for. Java, with its mobile facilities
and dynamic code reloading, also seems to provide an excellent mechanism
for tackling this problem. This especially holds true when we take into
account that the VTE system is designed to work on the Java/CORBA driven
platform with IBM's Aglets and the Active Badge System (ABS) from Olivetti/Oracle
Research Laboratory, Cambridge, UK. In VTE, the location of
an interlocutor can be established in two distinct ways. The first and
most rudimentary is giving the email address of the person one wants to
talk to. Obviously, this approach is extremely inefficient, as one may not
have valid information about a person's location (especially which workstation
or computer the person is currently working at). This option is provided mainly for
those who long for a modern replacement for the ubiquitous talk utility
on UNIX systems.
The other, much more sophisticated way of locating users is ORL's ABS
system coupled with IBM's aglets, mobile pieces of Java code that can find
the requested information for us. The Active Badge System equips its users
with small badges that can be located by groups of sensors dispersed
throughout a building. The ABS sensors and software can track people's
location within the building. Obviously, the ABS system is usable only in places
where sensors are installed.
VTE is being extended to integrate even more tightly with the ABS system,
not only to establish the physical location of potential interlocutors, but also
to collect some information about their current activities (like discerning
whether John is moving fast between rooms) from aglets roaming a set of special servers.
Each of the servers is responsible for managing a collection of sensors
that intercept signals from badges carried by every member of the staff in
the office or in a building. The Localization Site acts as a server that initiates
the movement of lightweight aglets in search of information and controls
the retrieval of the information gleaned by the mobile code. To obtain
this information, a user interface client applet consults it using CORBA
calls and builds a tree depicting the current location of the users in
the local domain. Right now, the Java user interface provides clickable
names of hosts at which one may request a video connection to a given person.
Figure 2. Java VTE user interface with custom crafted Java GUI components
The graphical user interfaces used by the VTE system provide the mechanism
for the interaction needed between the web based layer and the middle system
layer. They all act as "stateless" entities. Java turns out to
be an excellent tool, providing facilities for creating packages
of specialized components like knobs, trees and custom drawn lists, thus
making interaction with the system more convenient than alternative
solutions that might use cumbersome HTML forms. Also, VTE's graphical
user components are designed to conform to the requirements stipulated
by the Java Beans specification.
6. Audio- and Video-handling Native Layer
In essence, AudioVideoSplitController is the only part of the system
responsible for creating multimedia streams, directing (splitting) them to
recipients and retrieving the incoming streams. One can ask whether this can be
handled efficiently by interpreted Java code, even when accelerated by Just In
Time compilers. Based on the experience gathered during the implementation
of the VTE system, the answer is that it certainly can, as almost all audio/video
code is executed within compiled native C video libraries anyway (like
XIL on Solaris, for example). However, there is no compelling reason to implement
it in Java, as the necessity to access video devices will always force it
to delegate the real work to native code. The native approach might seem inevitable
for performance reasons as well, but in theory there are no obstacles to using
JIT accelerated Java code, with the fundamental work done in native methods,
at this layer too.
VTE's AudioVideoSplitController implementation is capable of handling
several audio and video formats unavailable in the standard
Java libraries. In its current state it can use linear PCM, u-law and
A-law audio encoding together with the JPEG or CellB video compression standards.
Unfortunately, the VTE system does not yet support any variety of the MPEG
compression standard (support is still under development), although an MPEG-1
decompression plug-in for the AudioVideoSplitController is already implemented.
Figure 3. Video displaying is handled by native code while the
applets provide GUI to localization facilities and relay user requests
to management objects in the middle VTE layer. The GUI obviously easily runs in Java enabled
Currently the VTE system transfers multimedia data over the connectionless
UDP/IP protocol, a "plumbing" approach that sits beside
the rest of the system, which is structured on top of the CORBA protocol.
One might also suggest basing the multimedia data transport entirely
on CORBA with its underlying IIOP protocol. There is, however, an important
difference between developing a real motion video application
and a traditional networked one: building video applications on top of
an inherently request/reply computational model like CORBA inevitably requires
implementing audio and video delivery outside CORBA requests. The ORB based
model is perfectly well suited to searching databases, managing
objects (as in this web application's case) and performing tasks on a
request/reply basis, but modern video applications require much
more than just this kind of behavior. They need streaming capabilities
and call for sensible Quality of Service in the delivery of data.
It is a generally well-known fact that multimedia systems require substantial
support from the underlying transport and signalling protocols to provide
QoS. The IIOP protocol doesn't currently equip CORBA with any streaming
capabilities. Considering that, one must openly admit that, without introducing
proprietary extensions to CORBA, one cannot do without the so-called "plumbing
approach" i.e. handling audio video transmission on a different network
There is currently no standard protocol we know of on the Internet, providing
multimedia programmers with the ability to control and negotiate the quality
of service of a given stream of multimedia data. Even ATM networks existing
right now cannot be used in such a way that programmers may use the signalling
benefits of a QoS oriented network, unless they delve deep into the native
ATM mode and thus step out of code usable on the Internet.
Due to the lack of real multimedia oriented standard protocols, the VTE team
implemented a test version in which servers exchanged audio and video using oneway
CORBA operations. Unfortunately, some CORBA implementations seem to execute
some consecutive oneway operations in LIFO fashion. This made it necessary
to build yet another kind of "sequencing protocol" and introduced
considerable jitter into the video stream, confirming the notion that the
CORBA request/reply model is not appropriate for streaming time dependent
data. The unsuitability of IIOP for transferring video data, observed in
VTE's test implementation, is further supported by
the fact that oneway CORBA calls may not in reality be asynchronous,
since the CORBA specification permits the ORB to
block while sending a oneway call. Such behavior is highly undesirable
when high speed video is transmitted and displayed.
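The kind of "sequencing protocol" that out-of-order delivery forces on the receiver can be illustrated as follows. The paper does not describe VTE's actual protocol, so this is an assumed, minimal design: the sender numbers each message from 0, and a hypothetical Resequencer on the receiving side releases payloads strictly in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical receiver-side resequencer: releases payloads strictly in
// sequence-number order and holds back anything that arrives early.
class Resequencer {
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();
    private long nextExpected = 0;

    // Accepts one message and returns every payload that is now deliverable.
    List<byte[]> accept(long seq, byte[] payload) {
        List<byte[]> ready = new ArrayList<>();
        if (seq < nextExpected) {
            return ready;                      // duplicate or too late: drop it
        }
        pending.putIfAbsent(seq, payload);
        // Drain the buffer for as long as the next expected number is present.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    // Number of messages waiting for a missing predecessor.
    int heldBack() {
        return pending.size();
    }
}
```

Every message that is held back waits for its predecessor before it can be displayed or played, which is one way such a scheme turns reordering into the jitter described above.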
There are generally two approaches to providing QoS in multimedia applications:
reservation of resources and adaptation to the existing
resources. Obviously, UDP/IP as used by VTE provides no QoS, let alone
any sensible "multimedia oriented" approach with reservation
of resources, but at least it doesn't lock us out of the Internet the way
native ATM network programming does. Hence the VTE designers' decision to at
least provide an illusion of real QoS by enabling the user to adapt the
system to the existing resources at run time, when he or she gets annoyed
by frequent frame dropping or unbearable sound jitter. In this way one
can lower the number of colors displayed, switch to a smaller picture size
and decrease the quality of the sound transmitted. The VTE system's
AudioVideoSplitController also tries to provide sensible "QoS" (which, of
course, is only a substitute for the real thing) by maintaining
synchronization between audio and video:
it inserts voice and images together into packets about 1000 bytes long
that are assembled at the receiving host's AudioVideoSplitController into
a bigger chunk of data. The chunk is immediately decompressed and
displayed on the screen (and the sound played) using native function calls.
This approach virtually eliminates the often terrible synchronization
problem found in many multimedia systems that send audio and video separately.
Such systems often work excellently on LANs, but scaling them to larger networks
makes it almost impossible to see the lips moving in sync with the sound.
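The idea of carrying voice and images together in small packets that the receiver reassembles into one chunk can be sketched as below. The paper does not give VTE's wire format, so everything about the layout is an assumption made for illustration: the 8-byte header (chunk id, fragment index, fragment count), the length-prefixed chunk, and the class name AvPacketizer. Only the packet size of about 1000 bytes comes from the text.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical packetizer: audio and video travel in the same chunk, so
// they cannot drift apart in transit. The wire layout is assumed.
class AvPacketizer {
    static final int PACKET_SIZE = 1000; // "about 1000 bytes", per the text
    static final int HEADER = 8;         // chunkId (4) + index (2) + total (2)

    // Joins both media into one chunk: [audio length][audio][video].
    static byte[] interleave(byte[] audio, byte[] video) {
        return ByteBuffer.allocate(4 + audio.length + video.length)
                .putInt(audio.length).put(audio).put(video).array();
    }

    // Splits a chunk into packets of at most PACKET_SIZE bytes each.
    static List<byte[]> split(int chunkId, byte[] chunk) {
        int room = PACKET_SIZE - HEADER;
        int total = (chunk.length + room - 1) / room;
        List<byte[]> packets = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int from = i * room;
            int len = Math.min(room, chunk.length - from);
            packets.add(ByteBuffer.allocate(HEADER + len)
                    .putInt(chunkId).putShort((short) i).putShort((short) total)
                    .put(chunk, from, len).array());
        }
        return packets;
    }

    // Rebuilds the chunk from packets received in any order.
    // Returns null when a fragment is missing, i.e. the chunk is dropped.
    static byte[] reassemble(List<byte[]> packets) {
        if (packets.isEmpty()) return null;
        int total = ByteBuffer.wrap(packets.get(0)).getShort(6);
        byte[][] parts = new byte[total][];
        int size = 0;
        for (byte[] p : packets) {
            int index = ByteBuffer.wrap(p).getShort(4);
            if (parts[index] == null) size += p.length - HEADER;
            parts[index] = Arrays.copyOfRange(p, HEADER, p.length);
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] part : parts) {
            if (part == null) return null;
            out.put(part);
        }
        return out.array();
    }
}
```

Dropping a whole chunk when one fragment is lost is a design choice of this sketch that suits an unreliable transport like UDP: audio and video are lost together, so what does arrive stays in sync.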
The AudioVideoSplitController interface also includes operations for
changing the size of the packets being sent. Although increasing the packet
size vastly improves network performance, it may introduce substantial
jitter into the multimedia data flow. Obviously, all this also depends on the
type of data being sent and how fast the sources generate it. For example,
when a VTE connection is configured for 16-bit stereo, linear PCM audio sampled
at 44.1 kHz, it requires approximately 10 MB of data per 60 second
time frame. In this case the size of the packet could be increased with
almost no sound jitter perceived by the user. However, 8-bit monaural,
u-law encoded audio at 8 kHz sampling would be unbearable when sent in 20 KB
data chunks: the delay needed to fill the audio buffers (after which the data is
sent) would be noticeable and annoying.
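The arithmetic behind these two examples can be checked with a few lines of Java. The class AudioBudget and its method names are hypothetical, and 20 KB is taken as 20 * 1024 bytes, which is an assumption about the unit used in the text.

```java
// Worked arithmetic for the two audio configurations mentioned above.
class AudioBudget {
    // Bytes produced per second by an audio source with the given format.
    static long bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels;
    }

    // Seconds of audio that must accumulate before a buffer of this size is full.
    static double fillDelaySeconds(int bufferBytes, long bytesPerSecond) {
        return (double) bufferBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long pcm = bytesPerSecond(44_100, 16, 2);  // 16-bit stereo linear PCM
        long ulaw = bytesPerSecond(8_000, 8, 1);   // 8-bit monaural u-law
        int chunk = 20 * 1024;                     // 20 KB, assumed to mean 20 * 1024 bytes

        System.out.println("PCM bytes per minute: " + pcm * 60);
        System.out.println("PCM fill delay (s):   " + fillDelaySeconds(chunk, pcm));
        System.out.println("u-law fill delay (s): " + fillDelaySeconds(chunk, ulaw));
    }
}
```

The PCM stream produces 176,400 bytes per second, or 10,584,000 bytes per minute, which matches the figure of roughly 10 MB. A 20 KB buffer fills in about 0.12 seconds at that rate, but takes 2.56 seconds at the 8,000 bytes per second of the u-law stream, which is why the same chunk size is harmless in one case and annoying in the other.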
It is, however, anticipated that IPv6 will provide video application
programmers with a standard reservation protocol, RSVP. It is
also worth noting that RSVP messages may be carried in UDP datagrams.
One may hope that all this will provide solid ground for delivering
QoS by reservation on the Internet in the future. The IDL defined layered structure
should make changes to the VTE system much easier when the opportunity appears.
It is also expected that the next release of the VTE system will
utilize OrbixWeb, as the authors are very enthusiastic about IONA Technologies'
OrbixTalk, an implementation of OMG's CORBA Event Service, and suppose that
the AudioVideoSplitController layer will greatly benefit from multicast
group communication in delivering multimedia data.
7. Concluding Remarks
A growing number of applications will definitely require more and more
networked multimedia capabilities. Java seems to pave the way in an attempt
to create a neutral web based software platform upon which portable software
solutions can be based. However, before one can stop
writing non-portable, proprietary code, many Java libraries will have
to become available and the set of native methods delivered with Java classes
will have to become much more extensive. There is also a lot
of standardization to be done in the area of networking itself, bearing
in mind the requirements imposed by audio and video applications,
as the UDP/IP solution on top of IPv4, commonly met in the industry, exhibits
substantial flaws in this area. The VTE system clearly illustrates the observation
that, right now, there is no way to build a Java video system with no proprietary
native code. The only remedy is to encapsulate such code, preferably in IDL interfaces,
thus making the rest of the application independent of its implementation,
so that it can be replaced with better solutions in the days to come.
Until better solutions become a reality, one will also have to rely
on software components that cannot simply be installed on the fly.
Java is an extremely promising technology but there is still a lot to be
added to it before one can immerse oneself in writing totally non-proprietary
web based code with blissful ignorance of C.
 The Java Language Specification, James Gosling, Bill Joy,
Addison Wesley, ISBN 0-201-63451-1, First printing, August 1996
 The Common Object Request Broker: Architecture and Specification,
Object Management Group, 2.0 edition July 1995
 The Java Native Interface Specification, Sheng Liang,
Sun Microsystems, Inc. 2550 Garcia Avenue, Mountain View, CA 94043-1100,
 The Java Beans Specification,
Sun Microsystems, Inc. 2550 Garcia Avenue, Mountain View, CA 94043-1100,
 OrbixWeb for Java,
Iona Technologies Ltd., The Iona Building, 8-10 Pembroke St. 2, Dublin,
 Visigenic's VisiBroker for Java Reference Manuals,
Visigenic Software, Inc., 951 Mariner's Island Blvd., Suite 120, San
Mateo, CA 94404, USA
 The Active Badge Location System, Roy Want, Andy Hopper,
Veronica Falcao, Jonathon Gibbons, Olivetti Research Ltd, 24a Trumpington
Street, Cambridge CB2 1QA, England
 Programming Mobile Agents in Java, Danny B. Lange and Daniel
T. Chang, IBM Corporation, September 9, 1996
Iona Technologies Ltd., The Iona Building, 8-10 Pembroke St. 2, Dublin,