Path: uhog.mit.edu!news.mathworks.com!newsfeed.internetmci.com!usenet.eel.ufl.edu!psgrain!nntp.teleport.com!usenet
From: danq@spot.Colorado.EDU (Daniel Quinlan)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.misc
Subject: Announce: Datascope: a Unix database system based on flat files
Followup-To: comp.lang.perl.misc
Date: 3 Jan 1996 12:47:21 GMT
Organization: University of Colorado at Boulder
Lines: 124
Approved: merlyn@stonehenge.com (comp.lang.perl.announce)
Message-ID: <4cdtsp$fk@maureen.teleport.com>
NNTP-Posting-Host: linda.teleport.com
Keywords: flat file database
X-Disclaimer: The "Approved" header verifies header information for article transmission and does not imply approval of content.
Xref: uhog.mit.edu comp.lang.perl.announce:45 comp.lang.perl.misc:5851


Datascope: A Relational Database System for Scientists 

Datascope is a database system with Tcl and Perl interfaces (hence
the cross-posts).  If the information below interests you, take a
look at 

	http://jspc-www.colorado.edu/software/software.html#datascope

for more information.

What it is: 

    Datascope is a relational database system in which tables are
    represented by fixed-format files.  Typically these files are plain
    ASCII files; the fields are separated by spaces and each line is a
    record.  The format of the files making up a database is specified in
    a separate schema file.  The system includes simple ways of doing the
    standard operations on relational database tables:  subsets, joins,
    and sorts.  The keys in tables may be simple or compound.  Views are
    easily generated.  Indexes are generated automatically to perform
    joins, but may also be precomputed and saved explicitly.
    Fairly general expressions may be evaluated, and can be used as the
    basis of sorts, joins, and subsets.
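
    To make the idea concrete, here is a minimal sketch in Python (not
    one of Datascope's own interfaces) of fixed-format tables driven by
    a schema, with a subset, a sort, and a join.  The SCHEMA layout, the
    table and field names, and every function below are invented for
    illustration; Datascope's real schema-file syntax and library calls
    are not shown in this post.

```python
# Hypothetical schema: (field name, column width, converter) per table.
SCHEMA = {
    "site":   [("sta", 6, str), ("lat", 9, float), ("lon", 10, float)],
    "sensor": [("sta", 6, str), ("chan", 4, str), ("gain", 8, float)],
}

def format_row(name, row):
    """Write one record as a fixed-width line, fields separated by a space."""
    return " ".join(str(row[f]).ljust(w)[:w] for f, w, _ in SCHEMA[name])

def parse_table(name, text):
    """Split each fixed-width line into a dict, using the schema's widths."""
    rows = []
    for line in text.splitlines():
        row, pos = {}, 0
        for field, width, conv in SCHEMA[name]:
            row[field] = conv(line[pos:pos + width].strip())
            pos += width + 1              # skip the separating space
        rows.append(row)
    return rows

def subset(rows, pred):
    """Keep the rows for which the predicate is true."""
    return [r for r in rows if pred(r)]

def sort_by(rows, *fields):
    """Sort on a simple or compound key."""
    return sorted(rows, key=lambda r: tuple(r[f] for f in fields))

def join(left, right, *keys):
    """Equi-join on the named keys, using a temporary index on `right`."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[k] for k in keys), []).append(r)
    return [{**l, **r} for l in left
            for r in index.get(tuple(l[k] for k in keys), [])]

# Made-up sample data, written out as flat text and read back in.
sites = [{"sta": "BOUL", "lat": 40.015, "lon": -105.2705},
         {"sta": "AAK", "lat": 42.639, "lon": 74.494}]
sensors = [{"sta": "BOUL", "chan": "BHZ", "gain": 2.0},
           {"sta": "AAK", "chan": "BHZ", "gain": 1.5},
           {"sta": "AAK", "chan": "BHN", "gain": 1.5}]
site_text = "\n".join(format_row("site", r) for r in sites)
sensor_text = "\n".join(format_row("sensor", r) for r in sensors)

site_rows = parse_table("site", site_text)
sensor_rows = parse_table("sensor", sensor_text)
low_gain = subset(sensor_rows, lambda r: r["gain"] < 2.0)
joined = sort_by(join(low_gain, site_rows, "sta"), "sta", "chan")
```

    Because every line of a table has the same length, a record can be
    found by offset alone, and the text stays readable with ordinary
    UNIX tools.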

    The system provides a variety of ways to use the data.  There are C,
    Fortran, Perl, and Tcl interfaces to the database routines.  There are
    command-line utilities which provide most of the facilities available
    through the programming libraries.  There are a few GUI tools for
    editing and exploring a database.  And, since the data is typically
    plain ASCII, it's also possible to just use standard UNIX tools like
    sed, awk, and vi.  

What it is not: 

    Datascope is not a full-fledged multi-user database.  Access
    permissions are handled strictly through UNIX file permissions. 
    Multiple people may use a database at once, but only one user should
    be modifying any table at any time; the database does not enforce this
    limitation, however, and this can lead to the same kind of problems
    that you experience when two people edit the same file at the same
    time.  There are no transactions, no rollback, and no logging.  Update
    operations happen immediately (that is, as quickly as UNIX writes them
    to disk -- there is no attempt to deal with synchronization issues). 
    The system is intended for either fairly static databases, or
    databases where only one person is making updates. 
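
    Since the database itself enforces nothing, a group of scripts that
    all update the same table could agree among themselves on an
    advisory lock.  The Python sketch below is one such convention, not
    a Datascope feature: the ".lock" file name and both functions are
    invented here, and any tool that ignores the convention (vi, sed, or
    another program) can still write to the table at the same time.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def table_lock(table_path):
    """Hold an exclusive advisory lock on <table_path>.lock (UNIX only)."""
    with open(table_path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # waits for other lock holders
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

def append_record(table_path, record):
    """Append one fixed-format line while holding the lock."""
    with table_lock(table_path):
        with open(table_path, "a") as table:
            table.write(record + "\n")

# Demonstration on a throwaway file; "site" is a made-up table name.
demo_table = os.path.join(tempfile.mkdtemp(), "site")
append_record(demo_table, "BOUL   40.0150 -105.2705")
append_record(demo_table, "AAK    42.6390   74.4940")
```

    This only serializes writers that cooperate; it adds no
    transactions, rollback, or logging.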

    Datascope is not SQL-based.  It might be fairly simple to add an SQL
    interface using the basic routines, but Datascope provides the
    functionality of SQL in a different fashion. 

    Datascope is intended for relatively small databases.  It uses memory
    fairly prodigiously, since it memory-maps all tables which are
    referenced, and also keeps the indexes in memory.  This has not been a
    serious limitation so far, and as prices of memory and storage
    continue to fall, it shouldn't become one in the future. 
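
    As a rough illustration of why memory mapping suits fixed-format
    tables, the Python sketch below maps a small file and fetches
    records by offset arithmetic, then builds an in-memory index as a
    plain dictionary.  The 16-byte record layout, the file name, and the
    function names are assumptions made for this example; they say
    nothing about how Datascope lays out its own tables or indexes.

```python
import mmap
import os
import tempfile

RECORD_LEN = 16          # bytes per record, newline included (made-up layout)

def write_demo_table(path):
    """Write three fixed-length ASCII records: 6-char key, space, 8-char value."""
    with open(path, "wb") as f:
        for sta, value in [("AAA", 1.5), ("BBB", 2.5), ("CCC", 3.5)]:
            line = f"{sta:<6} {value:8.3f}\n".encode("ascii")
            assert len(line) == RECORD_LEN
            f.write(line)

def record_count(mapped):
    """Number of records is just file size divided by record length."""
    return len(mapped) // RECORD_LEN

def read_record(mapped, recno):
    """Return record number `recno` as text, located by offset alone."""
    start = recno * RECORD_LEN
    return mapped[start:start + RECORD_LEN].decode("ascii").rstrip("\n")

path = os.path.join(tempfile.mkdtemp(), "demo_table")
write_demo_table(path)
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    count = record_count(mapped)
    third = read_record(mapped, 2)
    # In-memory index from key field to record number.
    index = {read_record(mapped, i)[:6].strip(): i for i in range(count)}
    mapped.close()
```

    The whole table occupies address space once it is mapped, and the
    index lives in memory beside it, which is where the memory cost
    comes from.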

Novel Features

Despite its limitations, Datascope has some novel features: 

       It is fairly small, conceptually simple, and fast.  The
       Datascope-specific source code is around 12,000 lines of C
       (various support libraries bring this total a bit higher). 

       Interfaces to several languages (C, Fortran, Tcl, and Perl), a
       command-line interface, and a GUI are part of the
       package. 

       It is freely available. 

       The keys to tables may include ranges, for instance, a
       beginning time and an ending time.  This is useful, perhaps
       essential, for time-dependent parameters, like instrument
       settings.  Indexes may be formed on these ranges, and these
       indexes can considerably speed join operations.  (When two
       tables are joined by time range keys, the join condition is
       that the time ranges overlap.)

       The schema file, in addition to specifying the fields which
       make up tables, and the format of individual records in every
       table, may provide a great deal of additional information,
       including: 

	   short and long descriptions of every attribute and relation

	   null values for each attribute

	   a legal range for each attribute

	   units for an attribute

	   primary and alternate keys for each relation

	   foreign keys in each relation

    This additional information is useful for documenting a database and
    when exploring a new database.  It often makes it possible to form the
    natural joins between tables without explicitly specifying the join
    conditions. 
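
    Two of the features above, range keys and joins inferred from the
    schema, can be sketched together in Python.  Everything here is
    hypothetical: the SCHEMA dictionary, the "recording" and "setting"
    tables, and the convention of writing a range key as a (start, end)
    pair are invented for illustration.  The sketch treats ranges as
    half-open, which is an assumption; the only rule stated above is
    that the time ranges must overlap.  It also uses a plain nested
    loop, where Datascope would use an index on the ranges.

```python
# Hypothetical schema metadata: primary keys, with a range key as a tuple.
SCHEMA = {
    "recording": {"primary": ["sta", ("time", "endtime")]},
    "setting":   {"primary": ["sta", ("time", "endtime")]},
}

def shared_keys(schema, left, right):
    """Infer join keys: primary-key entries the two tables have in common."""
    right_keys = schema[right]["primary"]
    return [k for k in schema[left]["primary"] if k in right_keys]

def overlaps(a_start, a_end, b_start, b_end):
    """Half-open ranges overlap when each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def matches(l, r, keys):
    """Simple keys must be equal; range keys must overlap."""
    for key in keys:
        if isinstance(key, tuple):            # (start field, end field)
            start, end = key
            if not overlaps(l[start], l[end], r[start], r[end]):
                return False
        elif l[key] != r[key]:
            return False
    return True

def natural_join(schema, left_name, left, right_name, right):
    """Pair up rows using only the keys found in the schema."""
    keys = shared_keys(schema, left_name, right_name)
    return [(l, r) for l in left for r in right if matches(l, r, keys)]

# Made-up data: recordings, and the instrument settings valid over time.
recordings = [
    {"sta": "A", "time": 0,   "endtime": 100, "file": "r1"},
    {"sta": "A", "time": 100, "endtime": 200, "file": "r2"},
    {"sta": "B", "time": 0,   "endtime": 50,  "file": "r3"},
]
settings = [
    {"sta": "A", "time": 0,   "endtime": 150, "gain": 1.0},
    {"sta": "A", "time": 150, "endtime": 300, "gain": 2.0},
]
pairs = natural_join(SCHEMA, "recording", recordings, "setting", settings)
summary = [(l["file"], r["gain"]) for l, r in pairs]
```

    Note that the caller never names the join condition; it falls out
    of the key declarations, and a recording that spans a change of
    settings pairs with both settings rows.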

What is it good for? 

    Relational database systems are clearly a useful way of representing
    information, much more powerful than the traditional scientific
    approach of grab-bag data files, log files, handwritten notes, and
    bizarre and idiosyncratic data formats.  However, most relational
    database systems are expensive, difficult to understand, difficult to
    administer, and subject to mysterious failures.  In addition, they
    require large amounts of disk space and preferably a dedicated
    machine.  These characteristics put them out of reach for most
    casual users and most scientists with smallish budgets. 

    Datascope provides a cheap, easy, fairly intuitive way of moving from
    the traditional plethora of formats to a better approach which
    organizes the data, documents it, and provides powerful tools for
    manipulating it.  It should be useful to anyone who needs to organize
    data, but can't afford the time, money, or other resources which a
    commercial database system requires. 
-- 
Daniel Quinlan               		danq@lemond.colorado.edu
University of Colorado, Physics Department/JSPC, Campus Box 583       
Boulder, CO  80309-0390			            303/492-4878


