Berkeley DB Reference Guide: Overview of the Java API

Overview of the Java API

The Java API is a Java framework that extends the well-known Java Collections design pattern so that collections can be stored, updated, and queried in a transactional manner. The Java API is a layer on top of Berkeley DB.

Together, the Java API and Berkeley DB provide an embedded data management solution with all the benefits of full transactional storage and the simplicity of a well-known Java API. Java programmers who need fast, scalable, transactional data management for their projects can quickly adopt and deploy the Java API with confidence.

This framework was first known as Greybird DB, written by Mark Hayes. Sleepycat Software has collaborated with Mark to permanently incorporate his excellent work into our distribution and support it as an ongoing part of Berkeley DB. The source code repository that remains at SourceForge at version 0.9.0 is considered the last version before incorporation; it will remain intact but will not be updated to reflect changes made as part of Berkeley DB.

What does the Java API add to Berkeley DB?

Berkeley DB has always provided a base Java API that can be roughly described as a map and cursor interface, where the keys and values are represented as byte arrays. That base API is a Java (JNI) interface to the C API and is closely modeled on the Berkeley DB C API. The Java API described here is a layer on top of that thin JNI mapping, and it adds significant new functionality in several ways:

An implementation of the Java Collections interfaces (Map, Set, List and Iterator) is provided.
Transactions are supported using the conventional Java transaction-per-thread model, where the current transaction is implicitly associated with the current thread.
Transaction runner utilities are provided that automatically perform transaction retry and exception handling.
Keys and values are represented as Java objects rather than byte arrays. Bindings are used to map between Java objects and the stored byte arrays.
The tuple data format is provided as the simplest data representation, and is useful for keys as well as simple compact values.
The serial data format is provided for storing arbitrary Java objects without writing custom binding code. Java serialization is extended to store the class descriptions separately, making the data records much more compact than with standard Java serialization.
Custom data formats and bindings can be easily added. For example, an XML data format and XML bindings could be created using this feature.
In addition to secondary indices, foreign key indices are provided with integrity constraints.
The Java API insulates the application from minor differences in the use of the Berkeley DB Data Store, Concurrent Data Store and Transactional Data Store products. This allows for development with one and deployment with another without significant changes to code.
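
The transaction runner idea in the list above (automatic retry plus exception handling) can be sketched in plain Java. This is only an illustration under stated assumptions and does not use the Berkeley DB classes: RetryRunnerSketch, DeadlockSketchException, and the retry limit are hypothetical names introduced here, and the begin, commit, and abort steps appear only as comments.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a deadlock reported by the database. */
final class DeadlockSketchException extends RuntimeException {
    DeadlockSketchException(String message) {
        super(message);
    }
}

/** Hypothetical runner that retries a unit of work when it deadlocks. */
final class RetryRunnerSketch {
    private final int maxRetries;

    RetryRunnerSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    <T> T run(Supplier<T> work) {
        for (int attempt = 0; ; attempt++) {
            // A real runner would begin a transaction for the current thread here.
            try {
                T result = work.get();
                // A real runner would commit here.
                return result;
            } catch (DeadlockSketchException e) {
                // A real runner would abort here; give up once retries run out.
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
            // Any other exception propagates to the caller unchanged.
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String outcome = new RetryRunnerSketch(3).run(() -> {
            if (++calls[0] < 3) {
                throw new DeadlockSketchException("simulated deadlock");
            }
            return "committed";
        });
        System.out.println(outcome + " after " + calls[0] + " attempts");
    }
}
```

The application code passed to run() contains no retry logic of its own, which is the main convenience such a utility offers.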

Note that the Java API neither caches programming language objects nor keeps track of their stored status. This is in contrast to "persistent object" approaches such as those defined by ODMG and JDO (JSR 12). Such approaches have benefits but also require sophisticated object caching. For simplicity, the Java API treats data objects by value, not by reference, and does not perform object caching of any kind. Since the Java API is a thin layer, its reliability and performance characteristics are roughly equivalent to those of Berkeley DB, and database tuning is accomplished in the same way as for any Berkeley DB database.
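
The by-value behavior described above can be illustrated without Berkeley DB at all. The following sketch is an assumption-laden stand-in: ByValueStoreSketch is a hypothetical class that keeps serialized bytes in an in-memory HashMap and uses standard Java serialization, so every get() returns a fresh copy and later changes to an object are not seen by the store until it is put again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store that keeps values as bytes, not references. */
final class ByValueStoreSketch {
    private final Map<String, byte[]> records = new HashMap<>();

    /** Serializes the value; the store keeps no reference to the object. */
    void put(String key, Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            records.put(key, bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes a new object on every call; nothing is cached. */
    Object get(String key) {
        byte[] data = records.get(key);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this kind of store, modifying an object after put() has no effect on the stored record, and two calls to get() for the same key return two distinct objects with equal contents.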

Choices to make

There are several important choices to make when developing an application using the Java API.

Choose the Berkeley DB Environment
Depending on your application's concurrency and transactional requirements, you may choose one of the three Berkeley DB Environments: Data Store, Concurrent Data Store, or Transactional Data Store. For details on creating and configuring the environment see DbEnv.
Choose the Berkeley DB Access Method
For each Berkeley DB database, or data store as it is called within the Java API, you may choose from any of the four Berkeley DB access methods -- BTREE, HASH, RECNO or QUEUE (Db.DB_BTREE, Db.DB_HASH, Db.DB_RECNO or Db.DB_QUEUE) -- and a number of other database options. Your choice depends on several factors, such as whether you need ordered keys, unique keys, or record number access. For more information on access methods see the com.sleepycat.bdb package description.
Choose the Data Format for Keys and Values
For each database you may choose a data format for the keys and values. For example, the tuple data format is useful for keys because it has a deterministic sort order. The serial format is useful for values if you want to store arbitrary Java objects. In some cases a custom data format may be appropriate. For details on choosing a data format see the com.sleepycat.bdb.bind package description.
Choose the Binding for Keys and Values
With the serial data format you do not have to create a binding for each Java class that is stored since Java serialization is used. But for other formats a binding must be defined that translates between stored byte arrays and Java objects. For details see the com.sleepycat.bdb.bind package description.
Choose Secondary Indices and Foreign Key Indices
Any data store that has unique keys may have any number of indices. An index has keys that are derived from data values in the primary data store. This allows lookup and iteration of objects in the data store by their index keys. A foreign key index is a special type of index where the index keys are also the primary keys of another data store. For each index you must define how the index keys are derived from the data values using a KeyExtractor. For details see the DataIndex, ForeignKeyIndex and KeyExtractor classes.
Choose the Collection Interface for each Data Store
The standard Java Collection interfaces are used for accessing data stores and data indices. The Map and Set interfaces may be used for any type of data store, while the List interface may only be used for data stores with record number access. The Iterator interface is used through the Map, Set and List interfaces. For more information on collection interfaces see the com.sleepycat.bdb.collection package.
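
The remark under "Choose the Data Format for Keys and Values" about a deterministic sort order can be made concrete with a small sketch. It assumes keys are compared byte by byte as unsigned values, and it shows one common way to encode an int so that byte order matches numeric order. SortableIntKeySketch is a hypothetical class, and the encoding shown is not claimed to be the byte layout used by the tuple data format.

```java
/** Hypothetical encoder whose byte order matches numeric order. */
final class SortableIntKeySketch {

    /** Big-endian bytes with the sign bit flipped so negatives sort first. */
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Reverses encode(). */
    static int decode(byte[] key) {
        int flipped = ((key[0] & 0xFF) << 24)
                    | ((key[1] & 0xFF) << 16)
                    | ((key[2] & 0xFF) << 8)
                    | (key[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    /** Unsigned byte-by-byte comparison, shorter key first on a tie. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

By contrast, the bytes produced by Java serialization carry no such ordering guarantee, which is one reason a serial format is better suited to values than to keys in an ordered data store.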

Things to keep in mind

A single DataStore object should be created for each database, and all data stores in an application should normally be used with a single DbEnv object.

However, any number of bindings and collections may be created for the same data store. This allows multiple views of the same stored data. For example, a data store may be viewed as a Map of keys to values, a Set of keys, a Collection of values, or a List of values. String values, for example, may be used with the built-in binding to the String class, or with a custom binding to another class that represents the string values differently.
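
The notion of several views over one set of stored data is the same one that java.util.Map already offers. The sketch below uses only a standard TreeMap as a stand-in for a data store (MultipleViewsSketch and its sample data are hypothetical); with the Java API the views would be stored collections backed by the database rather than by memory.

```java
import java.util.Collection;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical demo: one data set seen as a map, a key set, and a value collection. */
final class MultipleViewsSketch {

    /** Sample data: part number to part name. */
    static SortedMap<Integer, String> sampleParts() {
        SortedMap<Integer, String> parts = new TreeMap<>();
        parts.put(30, "washer");
        parts.put(10, "bolt");
        parts.put(20, "nut");
        return parts;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> parts = sampleParts();
        Set<Integer> keys = parts.keySet();        // a Set of keys
        Collection<String> names = parts.values(); // a Collection of values

        // A change made through the map is visible through both views,
        // because the views hold no data of their own.
        parts.put(40, "rivet");
        System.out.println(keys);
        System.out.println(names);
    }
}
```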

It is sometimes desirable to use a Java class that encapsulates both a data key and a data value. For example, a Part object might contain both the part number (key) and the part name (value). Using the Java API this type of object is called an "entity". An entity binding is used to translate between the Java object and the stored data key and value. Entity bindings may be used with all Collection types.
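
As a rough sketch of what an entity binding does, the following code converts a Part between a single Java object and a separate key and value. It does not use the com.sleepycat.bdb.bind classes: EntityBindingSketch, PartBindingSketch, their method names, and the chosen byte layouts (a 4-byte big-endian part number and a UTF-8 part name) are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Entity that carries both the key (number) and the value (name). */
final class Part {
    final int number;
    final String name;

    Part(int number, String name) {
        this.number = number;
        this.name = name;
    }
}

/** Hypothetical binding contract between an entity and its stored key/value. */
interface EntityBindingSketch<E> {
    E toEntity(byte[] key, byte[] value);

    byte[] toKey(E entity);

    byte[] toValue(E entity);
}

/** Hypothetical binding for Part: int key bytes, UTF-8 value bytes. */
final class PartBindingSketch implements EntityBindingSketch<Part> {
    @Override
    public Part toEntity(byte[] key, byte[] value) {
        int number = ByteBuffer.wrap(key).getInt();
        String name = new String(value, StandardCharsets.UTF_8);
        return new Part(number, name);
    }

    @Override
    public byte[] toKey(Part part) {
        return ByteBuffer.allocate(4).putInt(part.number).array();
    }

    @Override
    public byte[] toValue(Part part) {
        return part.name.getBytes(StandardCharsets.UTF_8);
    }
}
```

A collection that uses such a binding can hand the application whole Part objects, while the data store itself still holds the part number as the key and the part name as the value.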

Please be aware that the Java API collection classes do not conform completely to the interface contracts defined in the java.util package. For example, all iterators must be explicitly closed and the size() method is not available. The differences between Java API collections and standard Java collections are documented in the com.sleepycat.bdb.collection package description.
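
The two differences named above suggest a usage pattern: close every iterator in a finally block, and count elements by iterating when a size is needed. The sketch below shows that pattern with hypothetical classes only. ClosableIteratorSketch stands in for an iterator that holds a database resource; it is not the iterator class of the com.sleepycat.bdb.collection package, whose exact methods should be taken from that package's documentation.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical iterator that must be closed to release its resources. */
final class ClosableIteratorSketch<E> implements Iterator<E> {
    private final Iterator<E> delegate;
    private boolean open = true;

    ClosableIteratorSketch(Iterator<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return open && delegate.hasNext();
    }

    @Override
    public E next() {
        if (!open) {
            throw new IllegalStateException("iterator is closed");
        }
        return delegate.next();
    }

    /** A real stored iterator would release its cursor here. */
    void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }
}

final class IteratorCloseSketch {
    /** Counts elements by iteration and always closes the iterator. */
    static <E> int countAndClose(ClosableIteratorSketch<E> iterator) {
        int count = 0;
        try {
            while (iterator.hasNext()) {
                iterator.next();
                count++;
            }
        } finally {
            iterator.close();
        }
        return count;
    }

    public static void main(String[] args) {
        ClosableIteratorSketch<String> iterator =
            new ClosableIteratorSketch<>(Arrays.asList("a", "b", "c").iterator());
        System.out.println(countAndClose(iterator));
    }
}
```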
