@device(imprint10)
@make(Report, Form 1)
@style(hyphenation on)
@style(BindingMargin 0)
@style(justification off)
@style(FontFamily ComputerModernRoman10)
@style(footnotes "")
@begin(center)
@majorheading(GDB)
@heading(Global Databases for Project Athena)
@subheading(Noah Mendelsohn)
@subheading(April 17, 1986)
@end(center)
@blankspace(1 cm)
This note is intended as a brief introduction to the Global Database
(GDB) system being developed at MIT Project Athena.
GDB is an ongoing effort, still in its early stages, to provide the
services of a high-performance shared relational database to the
heterogeneous systems comprising the Athena network.  Specifications
have been developed for a set of library routines to be used by
@i[clients] to access the database.  
Current plans are to use the Ingres relational database product from
RTI as a local data manager, but to support access via the client
library from any Berkeley Unix@+[TM]@foot(@+[TM]Unix is a trademark of
AT&T Bell Laboratories) system in the internet.  Though early
versions will manage only a single copy of any given relation,
replication may be added at some point in the future.  
In the meantime the client library provides a uniform framework for
writing database applications at Athena.

While designing the client library, we found that many of its
underlying services for structured data storage and transmission would
be of value for a variety of applications.  Most of these interfaces
have been exposed, and the GDB project has taken on, as a secondary
goal, the development of these simple services for structured data
maintenance and transmission.

@section(Raison d'etre@+[1]@foot[@+[1]with apologies for lack of accents in the
font!])

The GDB project was motivated by the observation that Athena
applications tend to exploit the computational and display services of
the system much more effectively than they use the network.
Furthermore, those applications which do use the network tend to have
strong machine type affinities, running comfortably on either a Vax or
an RT/PC, but rarely both.  Indeed, the @i[strategic] Athena database
system is currently unavailable on the RT/PCs.

Of the many unexplored uses of the network, globally accessible
databases seem to have great value in a variety of disciplines, and
they are also badly needed for certain aspects of Athena
administration.  By providing well-architected services for global
data sharing, we hope to achieve at least two goals: (1) set the
precedent that user-written applications and Athena-supplied services,
like @b[madm], @b[chhome], and @b[passwd], run compatibly
from any machine in the network, and (2) encourage
the development of new database applications by eliminating the need
for individual projects and departments to develop their own
transmission and encapsulation protocols.

@section(Implementation Goals)

The following goals have been established for the architecture and
implementation of GDB:

@begin(itemize)
Access to databases stored on incompatible machines (e.g. RT/PC to
Vax) should be supported transparently.

Multiple databases, possibly at several sites, should be accessible
simultaneously.  The ability to operate on several databases
concurrently is desirable.

Appropriate facilities for managing structured data returned from the
database should be provided for programmers (e.g., access to fields by
name).

Asynchronous operation should be supported, for several reasons:
@begin(itemize)
Asynchrony is required for control of simultaneous access to multiple
databases.

It is needed for graceful interruption of long-running or erroneous
requests.

It facilitates pipelining of requests, thereby maximizing overlap of
server and client processing.
@end(itemize)

When the internal interfaces used for session control and data
transmission can be generalized without adding unnecessarily to their
complexity, then those interfaces should be documented and exposed.
@end(itemize)
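
The asynchronous goals above can be illustrated with a small sketch in C.
It is not the GDB interface: the names operation, op_start, op_cancel,
op_poll and op_complete_all are hypothetical, and the server is simulated
by a counter of remaining work.  The point is only that a client which
starts several requests and then polls them together finishes in roughly
the time of the longest request, and can abandon a request before it
completes.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the GDB interface. */
enum op_status { OP_NOT_STARTED, OP_RUNNING, OP_COMPLETE, OP_CANCELLED };

/* One outstanding request.  steps_left stands in for the work
   remaining at a simulated server. */
struct operation {
    enum op_status status;
    int steps_left;
};

/* Queue a request and return at once, without waiting for a reply. */
void op_start(struct operation *op, int steps)
{
    assert(op != NULL && steps > 0);
    op->status = OP_RUNNING;
    op->steps_left = steps;
}

/* Abandon a request that has not yet completed. */
void op_cancel(struct operation *op)
{
    assert(op != NULL);
    if (op->status == OP_RUNNING)
        op->status = OP_CANCELLED;
}

/* Give every running request one unit of progress.
   Returns the number of requests still running afterwards. */
size_t op_poll(struct operation *ops[], size_t n)
{
    size_t running = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ops[i]->status != OP_RUNNING)
            continue;
        if (--ops[i]->steps_left == 0)
            ops[i]->status = OP_COMPLETE;
        else
            running++;
    }
    return running;
}

/* Poll until nothing is running; returns the number of polling rounds. */
int op_complete_all(struct operation *ops[], size_t n)
{
    int rounds = 0;
    size_t running;

    do {
        running = op_poll(ops, n);
        rounds++;
    } while (running > 0);
    return rounds;
}
```

In this simulation, two requests needing 3 and 5 units of work complete
together after 5 polling rounds rather than 8, which is the overlap of
client and server processing that pipelining is meant to provide.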

@section(Implementation Strategy)

Several approaches to achieving these goals were considered, and an
implementation strategy has been chosen.  

One approach to achieving the required function would be to rely on
the appearance of RTI products containing the necessary facilities.
At the very least, we would need a full-function Ingres port to the
4.2 system on the RT/PC.  RTI would further have to extend Ingres for
access to databases through the internet, and they would have to
support such access across multiple machine types.  These extensions
would give us a core of function suitable for limited application,
though we would have to see whether flexibility and performance were
truly appropriate for our needs.  If RTI should come forward with a
commitment to produce these products within the next few months, then
the need for the libraries described herein might not be so great.
Lacking such products from RTI, it seems essential that we carry
forward with a strategy for database access from @i[all] of the
workstations in the Athena network.

Having decided to do at least some of the necessary work ourselves,
we see several possible implementation strategies.  One, which is
currently being pursued by Roman Budzianowski, is to interpose the
appropriate transmission services between the RTI Ingres front end and
back end.  This technique has a number of interesting advantages, and
some disadvantages.  The primary advantage is the ability to run
existing Ingres applications, including some of the forms and query
facilities, through the network.  Also, Roman reports that he has
succeeded in running an interesting subset of applications without too
much effort.  The disadvantages of Roman's approach are the lack of
a strategy for supporting non-Vax machines until RTI comes out with
the appropriate base products, and the dependence on undocumented
interfaces.
In some cases, the front end and the back end share files, while
in others signals seem to be sent.  It remains to be seen how successfully
we can divine these undocumented interfaces, how stable they remain
over time, and whether--given an RTI Ingres on the RT/PC--we can
figure out how to do the right byte swapping on the binary data sent
through Ingres' pipes and files.  Our conclusion is that Roman's
effort should continue, because it can
achieve valuable results without excessive effort.
Nevertheless, this scheme falls short of our requirement for balanced
support of all the machines in the Athena network, so we recommend an
alternate implementation as the primary base for Athena application
development. 

A machine-independent access method for relational databases could be
constructed in many different ways.  One technique we have considered
and rejected is to base an implementation on the RPC prototype
developed by Steve Miller and Bob Souza.  While RPC is
convenient, and the prototype appears to be of very high quality, it
fails to meet our needs in several crucial areas.  We have concluded
that asynchronous interaction between clients and servers is
essential for performance, for parallel execution of multiple queries,
and for interruptibility of ongoing operations.  Synchronous RPC seems
ill-suited to these requirements.  A secondary concern is the lack of
support for procedure calls across heterogeneous architectures in the
current version of the RPC prototype.  The right hooks are supposedly
there, but the necessary alignment and type conversion routines have
not been built.  Indeed, the prototype has yet to be ported to the
RT/PC.

As a result of this analysis, we have designed a system which uses RTI
Ingres for the things it does (or purports to do) well, and we have
added a flexible, asynchronous transport mechanism for transmission of
structured data between heterogeneous processors.  The specification,
outlined in a separate document, includes libraries for creation and
management of tuples and relations in virtual memory, along with a
simple mechanism for typing the fields comprising a tuple.  Layered
upon these are services for transmitting fields, tuples and relations
through the internet, doing the necessary conversions and
re-alignments when moving between incompatible machines.  These
services, in turn, are used by a library which provides almost
all of the services of Ingres EQUEL to clients throughout the network.
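
To make the layering concrete, the following is a minimal sketch in C of
a tuple descriptor with named, typed fields and of an integer encoding
that is independent of the sending machine's byte order.  It is an
illustration under our own assumptions, not the specified library: the
names field_type, field_desc, tuple_desc, field_index, put_int32 and
get_int32 are hypothetical, and the most-significant-byte-first wire
format is chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; this is not the GDB interface. */
enum field_type { FT_INTEGER, FT_STRING };

/* Describes one field of a tuple: its name and its type. */
struct field_desc {
    const char *name;
    enum field_type type;
};

/* Describes the layout of every tuple in a relation. */
struct tuple_desc {
    size_t nfields;
    const struct field_desc *fields;
};

/* Look a field up by name.  Returns its index, or -1 if absent. */
int field_index(const struct tuple_desc *desc, const char *name)
{
    size_t i;

    assert(desc != NULL && name != NULL);
    for (i = 0; i < desc->nfields; i++)
        if (strcmp(desc->fields[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Encode a 32-bit integer most significant byte first, whatever the
   byte order of the host (assumed wire format for this sketch). */
void put_int32(unsigned char buf[4], int32_t value)
{
    uint32_t u;

    memcpy(&u, &value, sizeof u);
    buf[0] = (unsigned char)(u >> 24);
    buf[1] = (unsigned char)(u >> 16);
    buf[2] = (unsigned char)(u >> 8);
    buf[3] = (unsigned char)u;
}

/* Decode an integer written by put_int32 on any machine. */
int32_t get_int32(const unsigned char buf[4])
{
    uint32_t u = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    int32_t value;

    memcpy(&value, &u, sizeof value);
    return value;
}
```

Because the encoding is built from shifts rather than by copying the
host's in-memory representation, the same bytes are produced on a Vax
and on an RT/PC, and neither side needs to know the other's byte order.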

@section(Project Status)

The specification for the interfaces to the client library routines is
available in draft form and is now being refined.  In parallel, design
is proceeding on the protocol to be used between the sites, and on the
related software structures used to encapsulate and parse the
transmitted data.  The design includes a simple but flexible
proposal for managing asynchronous activities.  Coding will start soon
on those pieces of the library which seem to be stable;  refinement of
the other parts will take a few more weeks.  While our rate of
progress will depend greatly on the number of people doing the work
and on their other responsibilities (neither of which is clear at
this time), I'm optimistic that a basic implementation will start
showing signs of life within a couple of months, with polishing taking
a bit longer.