Berkeley DB Reference Guide: Selecting an access method

Berkeley DB Reference Guide:
Access Methods

Selecting an access method

The Berkeley DB access method implementation unavoidably interacts with each application's data set, locking requirements and data access patterns. For this reason, one access method may result in dramatically better performance for an application than another one. Applications whose data could be stored using more than one access method may want to benchmark their performance using the different candidates.

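(The Berkeley DB access method implementation interacts with the application's data set, locking requirements, and data access patterns. For this reason, one access method can perform dramatically better than another for a given application.)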

One of the strengths of Berkeley DB is that it provides multiple access methods with nearly identical interfaces. This makes it simple to modify an application to use a different access method, so applications can easily benchmark the different Berkeley DB access methods against each other for their particular data set and access pattern.

(Because Berkeley DB lets an application call different access methods through nearly the same interface, benchmarking them is easy.)

Most applications choose between the Btree and Hash access methods, or between the Queue and Recno access methods, because the two methods in each pair offer similar functionality.

(Most applications choose either between Btree and Hash or between Queue and Recno, because the methods in each pair provide similar functionality.)

Hash or Btree?

The Hash and Btree access methods should be used when logical record numbers are not the primary key used for data access. (If logical record numbers are a secondary key used for data access, the Btree access method is a possible choice, as it supports simultaneous access by a key and a record number.)

(Hash and Btree are used when logical record numbers are not the primary key. If logical record numbers serve as a secondary key, Btree can be used, since it allows access by key and by record number at the same time.)

Keys in Btrees are stored in sorted order and the relationship between them is defined by that sort order. For this reason, the Btree access method should be used when there is any locality of reference among keys. Locality of reference means that accessing one particular key in the Btree implies that the application is more likely to access keys near to the key being accessed, where "near" is defined by the sort order. For example, if keys are timestamps, and it is likely that a request for an 8AM timestamp will be followed by a request for a 9AM timestamp, the Btree access method is generally the right choice. Or, for example, if the keys are names, and the application will want to review all entries with the same last name, the Btree access method is again a good choice.

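(Btree keys are stored in sorted order, and the relationship between keys is defined by that order, so Btree is the method to use when there is locality of reference among keys. Locality of reference means that an application that accesses one key tends to access other keys close to it in sort order. For example, if the keys are timestamps, a request for an 8AM timestamp is likely to be followed by one for a 9AM timestamp, and Btree is the right choice. Likewise, if the keys are names and the application wants to see every entry with the same last name, Btree is a good choice.)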

There is little difference in performance between the Hash and Btree access methods on small data sets, where all, or most of, the data set fits into the cache. However, when a data set is large enough that significant numbers of data pages no longer fit into the cache, then the Btree locality of reference described previously becomes important for performance reasons. For example, there is no locality of reference for the Hash access method, and so key "AAAAA" is as likely to be stored on the same database page with key "ZZZZZ" as with key "AAAAB". In the Btree access method, because items are sorted, key "AAAAA" is far more likely to be near key "AAAAB" than key "ZZZZZ". So, if the application exhibits locality of reference in its data requests, then the Btree page read into the cache to satisfy a request for key "AAAAA" is much more likely to be useful to satisfy subsequent requests from the application than the Hash page read into the cache to satisfy the same request. This means that for applications with locality of reference, the cache is generally much more effective for the Btree access method than the Hash access method, and the Btree access method will make many fewer I/O calls.

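(For small data sets, where all or most of the data fits in the cache, there is little performance difference between Btree and Hash. Once the data set is large enough that many data pages no longer fit in the cache, Btree's locality of reference becomes an important performance factor. Hash has no locality of reference, so key "AAAAA" is as likely to share a page with "ZZZZZ" as with "AAAAB"; in a Btree, "AAAAA" is stored much closer to "AAAAB" than to "ZZZZZ". So if the application's requests exhibit locality of reference, a Btree page loaded into the cache to satisfy one request is far more useful for subsequent requests than a Hash page would be. For such applications, Btree uses the cache much more effectively than Hash and makes far fewer I/O calls.)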
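
The cache effect described above can be sketched with a toy model. This is not Berkeley DB code and does not reflect its real page layout: the page capacity, cache size, key range, LRU policy, and the helper names `sorted_page`, `hashed_page`, and `count_page_reads` are all assumptions made for illustration. The sketch places integer keys on pages either in sort order (Btree-like) or by a hash of the key (Hash-like), then counts how many page reads a run of adjacent keys causes.

```python
import hashlib
from collections import OrderedDict

KEYS_PER_PAGE = 50   # assumed page capacity, in keys
CACHE_PAGES = 20     # assumed cache size, in pages


def sorted_page(key: int) -> int:
    """Btree-like placement: neighbouring keys share a page."""
    return key // KEYS_PER_PAGE


def hashed_page(key: int, n_pages: int) -> int:
    """Hash-like placement: the page depends only on a hash of the key."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_pages


def count_page_reads(pages, cache_pages=CACHE_PAGES):
    """Count cache misses for a sequence of page numbers under an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for page in pages:
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                    # page must be read from disk
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return misses


n_keys = 100_000
n_pages = n_keys // KEYS_PER_PAGE

# A workload with locality: 1,000 adjacent keys, e.g. successive timestamps.
workload = range(40_000, 41_000)

btree_reads = count_page_reads(sorted_page(k) for k in workload)
hash_reads = count_page_reads(hashed_page(k, n_pages) for k in workload)

print("sorted (Btree-like) page reads:", btree_reads)
print("hashed (Hash-like) page reads: ", hash_reads)
```

In this model the sorted layout touches only the 20 pages that hold the requested key range, while the hashed layout scatters the same 1,000 keys across the whole file, so nearly every request misses the cache. Real workloads fall somewhere in between, which is why benchmarking with the application's own data is still the deciding test.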

However, when a data set becomes even larger, the Hash access method can outperform the Btree access method. The reason for this is that Btrees contain more metadata pages than Hash databases. The data set can grow so large that metadata pages begin to dominate the cache for the Btree access method. If this happens, the Btree can be forced to do an I/O for each data request because the probability that any particular data page is already in the cache becomes quite small. Because the Hash access method has fewer metadata pages, its cache stays "hotter" longer in the presence of large data sets. In addition, once the data set is so large that both the Btree and Hash access methods are almost certainly doing an I/O for each random data request, the fact that Hash does not have to walk several internal pages as part of a key search becomes a performance advantage for the Hash access method as well.

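(When the data set grows larger still, however, Hash can outperform Btree, because a Btree has more metadata pages than a Hash database. With a large enough data set, metadata pages come to dominate the Btree's cache; the chance that a given data page is already cached then becomes very small, and the Btree may be forced to do an I/O for each data request. Hash has fewer metadata pages, so its cache stays effective longer. In addition, once the data set is so large that both Btree and Hash do roughly one I/O per random request, Hash gains a further advantage because a key search does not have to traverse several internal pages.)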
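
A rough calculation shows how the number of Btree internal pages grows with the data set. The figures below are assumptions chosen for round numbers (50 keys per leaf page, 100 child pointers per internal page, a cache of 100,000 pages); real values depend on page size and key size, and the helper `btree_shape` is hypothetical, not part of Berkeley DB. Hash metadata is not modeled here.

```python
def btree_shape(n_keys, keys_per_leaf=50, fanout=100):
    """Estimate (internal_levels, internal_pages, leaf_pages) for a full tree.

    Assumes every page is completely full, which real trees rarely are.
    """
    leaves = -(-n_keys // keys_per_leaf)   # ceiling division
    internal = 0
    levels = 0
    width = leaves
    while width > 1:
        width = -(-width // fanout)        # pages needed one level up
        internal += width
        levels += 1
    return levels, internal, leaves


CACHE_PAGES = 100_000  # assumed cache size, in pages

for n_keys in (1_000_000, 1_000_000_000):
    levels, internal, leaves = btree_shape(n_keys)
    print(f"{n_keys:>13,} keys: {leaves:,} leaf pages, "
          f"{internal:,} internal pages on {levels} levels, "
          f"internal pages fit in cache: {internal <= CACHE_PAGES}")
```

Under these assumptions, one million keys need only 203 internal pages, which any reasonable cache holds easily. One billion keys need 202,021 internal pages on four levels, more than the assumed cache can hold, so internal pages compete with data pages for cache space and a random lookup may pay for several page reads rather than one.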

Application data access patterns strongly affect all of these behaviors. For example, accessing the data by walking a cursor through the database will greatly mitigate the large data set behavior described above, because each I/O into the cache will satisfy a fairly large number of subsequent data requests.

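(The application's data access pattern affects all of these behaviors. For example, walking the database with a cursor mitigates the large data set behavior described above, because each I/O into the cache satisfies many subsequent data requests. Translator's note: with a cursor, a single I/O brings many records into the cache; a cursor is an object that points to a particular record in the file.)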

In the absence of information on application data and data access patterns, for small data sets either the Btree or Hash access methods will suffice. For data sets larger than the cache, we normally recommend using the Btree access method. If you have truly large data, then the Hash access method may be a better choice. The db_stat utility is a useful tool for monitoring how well your cache is performing.

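(When nothing is known about the application's data or access patterns: for small data sets, either Btree or Hash is sufficient; for data sets larger than the cache, Btree is generally recommended; for extremely large data sets, Hash may be the better choice. Cache performance can be monitored with the db_stat utility.)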

Queue or Recno?

The Queue or Recno access methods should be used when logical record numbers are the primary key used for data access. The advantage of the Queue access method is that it performs record-level locking and for this reason supports significantly higher levels of concurrency than the Recno access method. The advantage of the Recno access method is that it supports a number of features beyond those supported by the Queue access method, such as variable-length records and support for backing flat-text files.

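(Queue or Recno is used when logical record numbers are the primary key for database access. Queue's advantage is record-level locking, which gives it far better concurrency than Recno. Recno's advantage is that it offers features beyond those of Queue, such as variable-length records and backing flat-text files.)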

Logical record numbers can be mutable or fixed: mutable, where logical record numbers can change as records are deleted or inserted, and fixed, where record numbers never change regardless of the database operation. It is possible to store and retrieve records based on logical record numbers in the Btree access method. However, those record numbers are always mutable, and as records are deleted or inserted, the logical record number for other records in the database will change. The Queue access method always runs in fixed mode, and logical record numbers never change regardless of the database operation. The Recno access method can be configured to run in either mutable or fixed mode.

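(Logical record numbers can be mutable, changing as records are added or deleted, or fixed, never changing under any database operation. Btree can retrieve data by logical record number, but those record numbers are always mutable. Queue always runs in fixed mode, so its logical record numbers never change. Recno can be configured for either mutable or fixed mode.)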
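
The difference between mutable and fixed record numbers can be shown with two small in-memory models. These are illustrative classes only, not Berkeley DB bindings; the names `MutableRecords` and `FixedRecords` are hypothetical. In the C API, the mutable behavior of Recno is, as far as I know, what the `DB_RENUMBER` flag selects.

```python
class MutableRecords:
    """Record numbers are positions: a delete renumbers every later record."""

    def __init__(self):
        self._rows = []

    def append(self, value):
        self._rows.append(value)
        return len(self._rows)        # record numbers start at 1

    def delete(self, recno):
        del self._rows[recno - 1]     # later records shift down by one

    def get(self, recno):
        return self._rows[recno - 1]


class FixedRecords:
    """Record numbers are permanent: a delete leaves a gap."""

    def __init__(self):
        self._rows = {}
        self._next = 1

    def append(self, value):
        recno = self._next
        self._rows[recno] = value
        self._next += 1
        return recno

    def delete(self, recno):
        del self._rows[recno]

    def get(self, recno):
        return self._rows.get(recno)  # None when the record was deleted


mutable, fixed = MutableRecords(), FixedRecords()
for store in (mutable, fixed):
    for value in ("a", "b", "c"):
        store.append(value)
    store.delete(2)

print(mutable.get(2))  # "c" has moved down to record number 2
print(fixed.get(2))    # record number 2 is now empty
print(fixed.get(3))    # "c" is still record number 3
```

An application that stores record numbers elsewhere, for example as references from another database, generally needs the fixed behavior, because a mutable record number stops identifying the same record after an earlier record is deleted or inserted.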

In addition, the Recno access method supports databases whose permanent storage is a flat text file. In that configuration, the database serves as a fast, temporary storage area while the data is being read or modified.

(In addition, Recno supports databases whose permanent storage is a flat text file; the database then serves as fast temporary storage while the data is read or modified.)
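
The guidance on this page can be condensed into a small helper. This is only one reading of the text above, written as a sketch: the function name, its parameters, and the order in which the conditions are checked are assumptions, and the result is a starting point for benchmarking rather than a rule.

```python
def suggest_access_method(recno_is_primary_key, *, has_locality=False,
                          fits_in_cache=True, very_large=False,
                          needs_variable_length=False,
                          needs_text_backing=False,
                          needs_high_concurrency=False):
    """Return a suggested access method name based on the guidance above."""
    if recno_is_primary_key:
        # Only Recno offers variable-length records and flat-text backing.
        if needs_variable_length or needs_text_backing:
            return "Recno"
        # Queue locks at the record level, so it handles concurrency better.
        if needs_high_concurrency:
            return "Queue"
        return "Queue or Recno"

    # Keyed access: locality of reference favors the sorted Btree layout.
    if has_locality:
        return "Btree"
    if fits_in_cache:
        return "Btree or Hash"
    # Larger than the cache: Btree by default, Hash for truly large data.
    return "Hash" if very_large else "Btree"


print(suggest_access_method(True, needs_high_concurrency=True))
print(suggest_access_method(False, fits_in_cache=False))
print(suggest_access_method(False, fits_in_cache=False, very_large=True))
```

The helper deliberately takes no numeric thresholds, since the text gives none for what counts as "truly large"; that judgment is left to measurements such as those reported by db_stat.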
