Berkeley DB Reference Guide: Troubleshooting common Berkeley DB problems

Berkeley DB Reference Guide:
Debugging Applications

Troubleshooting common Berkeley DB problems

The following are some common problems that applications encounter. For the answers to more Frequently Asked Questions, see the Berkeley DB Reference Guide FAQ sections, typically located at the end of each chapter.

A Berkeley DB method is returning "argument invalid" (EINVAL) or another general error value, or throwing a general exception, and the cause is not obvious. Or, a Berkeley DB method is returning an out-of-memory (ENOMEM) error even though plenty of disk and heap space is available.
The application is calling the Berkeley DB API incorrectly or configuring the database environment with insufficient resources.

The Berkeley DB library optionally outputs a verbose error message whenever it is about to return a general-purpose error, or throw a non-specific exception. Whenever it is not clear why an application call into Berkeley DB is failing, the first step is always to turn on verbose error messages, which will almost always explain the problem. See the Run-time error information section of the Reference Guide for more information.
Multiple databases are being created in a single physical file and there is random database corruption.
The databases do not share an underlying database cache. Databases that share a single physical file must almost always share an underlying database cache as well. See the Opening multiple databases in a single file section of the Reference Guide for more information.
There are random failures when creating a database environment, often associated with creating or initializing the shared memory regions that back the database environment.
The filesystem in which the database environment is being created is an NFS or other remote filesystem. Database environments should not be created in NFS filesystems. See the Remote filesystem section of the Reference Guide for more information.
There are core dumps or garbage returns from random Berkeley DB operations.
The application is failing to zero out DBT objects before calling Berkeley DB. Before using a DBT, you must initialize all its elements to 0 and then set the ones you are using explicitly.

Another reason for this symptom is that the application may be using Berkeley DB handles in a free-threaded manner without specifying the DB_THREAD flag to the DB->open or DB_ENV->open methods. Any time you are sharing a handle across multiple threads, you must specify DB_THREAD when you open that handle.

Another reason for this symptom is that the application is accessing the database concurrently without acquiring locks. The Berkeley DB Data Store product does no locking at all; the application must serialize its own access to the database to avoid corruption. The Berkeley DB Concurrent Data Store and Berkeley DB Transactional Data Store products do lock the database, but only when locking has been configured.
A transactional database environment locks up, and no threads of control are making progress.
The most common cause of this failure is a thread of control exiting unexpectedly, while holding a Berkeley DB mutex or a read/write logical database lock. If a thread of control exits holding a data structure mutex, other threads of control will likely lock up fairly quickly, queued behind the mutex. If a thread of control exits holding a logical database lock, other threads of control may lock up over a long period of time, as they will not be blocked until they attempt to acquire the specific page for which a lock is not available. See the Deadlock debugging section of the Reference Guide for more information on debugging deadlocks.

Whenever a thread of control exits Berkeley DB holding a mutex or logical lock, all threads of control must exit the database environment, and database recovery must be performed. See the Application structure section of the Reference Guide for more information.

Finally, the Berkeley DB API is not re-entrant, and it is usually unsafe for signal handlers to call the Berkeley DB API. See the Signal handling section of the Reference Guide for more information.
Locks are accumulating, or threads and/or processes are deadlocking in a transactional environment, even though there is no concurrent access to the database.
The application may have failed to close a cursor. Cursors retain locks between calls. Everywhere the application uses a cursor, the cursor should be explicitly closed as soon as possible after it is used.

Another reason for this symptom is that the application is not checking for DB_LOCK_DEADLOCK errors (or DbDeadlockException exceptions). Unless you are using the Berkeley DB Concurrent Data Store product, whenever there are multiple threads and/or processes concurrently accessing a database and at least one of them is writing the database, there is potential for deadlock.

If deadlock can occur, applications must test for deadlock failures and abort the enclosing transaction, or the transaction's locks will remain held. See the Recoverability and deadlock handling section of the Reference Guide for more information.
A transactional database environment cannot be recovered or normal database operations fail with messages that "LSN" values are past the end of the log.
The application may have removed all of its log files without also dumping and reloading all of its databases. Log files should never be removed unless explicitly authorized by the db_archive utility or the DB_ENV->log_archive method. Note that those interfaces will never authorize removal of all existing log files.

Another reason for this symptom is that the application may have created a database file in one transactional environment and then moved it into another transactional environment. While it is possible to create databases in non-transactional environments (for example, when doing bulk database loads) and then move them into transactional environments, once a database has been used in a transactional environment, it cannot be moved to another environment without first being dumped and reloaded.
A transactional application is seeing an inordinately high number of deadlocks.
The application may be acquiring database objects in inconsistent orders; having threads of control always acquire objects in the same order will reduce the frequency of deadlocks.

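If you frequently read a piece of data, modify it, and then write it, you may be inadvertently causing a large number of deadlocks: two threads of control can each acquire a read lock on the same data and then block each other when both try to upgrade to a write lock. Try specifying the DB_RMW flag on your get calls, which acquires a write lock instead of a read lock when the data is read.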

Or, if the application is doing a large number of updates in a small database, turning off Btree reverse splits may help (see DB_REVSPLITOFF for more information).
