Tuning CMP Performance

For any complex persistence system there are two main factors that impact the performance of the system:
When tuning the performance of a persistence system, the primary strategies for optimizing these factors are:
JBossCMP offers a variety of mechanisms that allow these strategies to be tuned.

Caching Mechanisms used by JBossCMP

With CMP 2.0, the specification added a mechanism that allows the CMP implementation to separate the point in time at which it loads data from the store from the time when an application needs that data. Rather than maintaining information in fields, an application always accesses data using a get or set method, the implementation of which is provided by the Container. This allows the Container to optimize persistence operations by potentially loading data in advance of when the application needs it, or by delaying the load until the actual get method is called. It also provides the Container with information on exactly which data has been modified, allowing it to potentially delay storing that data and to tune those operations so that only modified data gets written.

For these mechanisms to be effective, JBossCMP must be able to cache data from the time it is loaded from the store until it is ultimately flushed back. It does this using two separate caches:

Entity Cache

JBossCMP uses the main JBoss Entity Cache to store values associated with an Entity that is in the main cache. Such an entity is either enrolled in the current transaction, or has been configured in such a way that JBoss will cache it between transactions (commit option A or D). For each CMP Entity in the cache, JBossCMP stores the state of every cmp- or cmr-field: whether the data is loaded and if so its value, and whether the data has been modified.

JBossCMP ReadAhead Cache

In some cases JBossCMP will optimistically pre-load data for entities that are not (yet) associated with the current transaction. It cannot use the Entity Cache to store this data as doing so would require a lock on those entities. Instead it uses a CMP-specific ReadAhead cache as a temporary store for this information.
Once that data is used in the course of a transaction, it becomes associated with the Entity instance in the main cache.

Load Groups

Load groups are a mechanism for grouping fields together so that fields that are commonly accessed together by the application can be loaded together in a single persistence operation. This reduces the number of operations required and transfers data from the store efficiently, by fetching all the data the application needs for an instance but no more than is really necessary. A load group is a grouping of fields defined in jbosscmp-jdbc.xml. For example, the definition:
defines two load groups: one for billing, with fields that are used to bill for an order, and one containing fields that are used to ship the order. A field may be in multiple load groups as needed. Load groups are referenced by the different loading strategies described below to determine which fields should be added to a query.

Eager Loading

Eager loading is the mechanism the Container uses to load data in advance of when it actually needs to return it to an application. This allows data to be piggy-backed onto other persistence operations, on the basis that the cost of loading one field or record is comparable to that of loading several. JBossCMP can perform eager loading in response to any of three triggers: queries, instance loading, and relationship navigation.
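As a rough illustration of how a load group can drive the column list of a generated query, the following Java sketch models load groups as named lists of fields. It is not JBossCMP code: the class and method names are hypothetical, the field-to-column mapping is inferred from the SQL shown elsewhere in this article, and the contents of the Billing group (including the field name billTo) are an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoadGroupSketch {
    // Assumed mapping of cmp/cmr field names to columns of ORDER_DATA.
    static final Map<String, String> COLUMNS = new LinkedHashMap<>();
    // Assumed load groups: group name -> field names in that group.
    static final Map<String, List<String>> LOAD_GROUPS = new LinkedHashMap<>();

    static {
        COLUMNS.put("orderId", "ORDER_ID");
        COLUMNS.put("billTo", "FK_BILL_TO");
        COLUMNS.put("totalAmount", "TOTAL");
        COLUMNS.put("shipTo", "FK_SHIP_TO");
        COLUMNS.put("lineItemCount", "ITEM_COUNT");
        LOAD_GROUPS.put("Billing", Arrays.asList("billTo", "totalAmount"));
        LOAD_GROUPS.put("Shipping", Arrays.asList("shipTo", "lineItemCount"));
    }

    // Builds a finder-style SELECT: the primary key plus every column in the group.
    static String selectWithGroup(String groupName) {
        List<String> fields = LOAD_GROUPS.get(groupName);
        if (fields == null) {
            throw new IllegalArgumentException("Unknown load group: " + groupName);
        }
        List<String> columns = new ArrayList<>();
        columns.add("t0_o." + COLUMNS.get("orderId"));
        for (String field : fields) {
            columns.add("t0_o." + COLUMNS.get(field));
        }
        return "SELECT " + String.join(", ", columns) + " FROM ORDER_DATA t0_o";
    }

    public static void main(String[] args) {
        System.out.println(selectWithGroup("Shipping"));
    }
}
```

Running the sketch prints SELECT t0_o.ORDER_ID, t0_o.FK_SHIP_TO, t0_o.ITEM_COUNT FROM ORDER_DATA t0_o, which shows the general idea: naming a load group adds its columns to an operation that would otherwise fetch only the primary key.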
Eager Loading caused by Queries

The query for a finder or for an ejbSelect method that returns Entities returns a Set or Collection of references to those Entities. All that is required to construct such a reference is the primary key, so these queries will normally just select the primary key fields. For example, the simple finder

    public interface Order extends EJBLocalHome {

would generate SQL like

    SELECT t0_o.ORDER_ID FROM ORDER_DATA t0_o

where orderId/ORDER_ID is the primary key for the Order EJB. However, if the application then iterates over every EJB in the collection, with code like:

    Collection orders = orderHome.findAll();

then every order in the collection must be loaded independently. This requires 1+N persistence operations, which violates the first rule above. To alleviate this performance issue, JBossCMP supports the concept of on-find read-ahead, which allows additional columns to be added to the generated select statement. The read-ahead definition uses a load group to determine which fields to use. For example, the definition:
would cause the fields from the Billing load group to be added to the generated select statement:

    SELECT t0_o.ORDER_ID,

The additional column values are stored in the JBossCMP ReadAhead cache. When the application iterates over the result of the finder, the values for the cmp- and cmr-fields in each Order instance can be loaded from the ReadAhead cache and do not need to be loaded from the database. This means that we now only need to perform 1 persistence operation and have eliminated N others. This is a very effective optimization, but care should be taken to avoid loading data that is not needed. This is especially true if the table contains potentially large values, such as data in LOB columns.

Eager Loading caused by Instance Loading

When the application tries to read a field, JBossCMP will first check the ReadAhead cache to see if the field being accessed has already been loaded as the result of a query. If so, the value is returned from cache and no load operation is performed. However, if the value is not in the ReadAhead cache, then a SELECT statement will be executed to load the required field from the database. JBossCMP provides strategies for setting which additional fields get loaded by the select, and for causing additional rows to be pre-loaded into the cache. By default, JBossCMP will load all the entity's fields. This can be overridden by specifying an eager load group in jbosscmp-jdbc.xml. For example, we could define the loading strategy for the Order EJB to load just the fields required by shipping:
This would cause the SELECT statement to become:

    SELECT t0_o.FK_SHIP_TO, t0_o.ITEM_COUNT

JBossCMP will also look in the ReadAhead cache for the most recent finder result that included this Entity. If such a result is present, JBossCMP will try to read ahead other rows from the same finder, as determined by the read-ahead page-size element for the Entity. For example, if the entity definition in jbosscmp-jdbc.xml was:
then the SELECT statement would be modified to include additional rows from the finder:

    SELECT t0_o.FK_SHIP_TO, t0_o.ITEM_COUNT, t0_o.ORDER_ID

The rows fetched will be the required row, plus the next 4 rows returned by the query that was run for the finder. This strategy can be used where a finder returns too many results to be efficiently held in cache. Instead of trying to hold all the results in memory, only the number of rows defined by the page size is held. This prevents the cache from being flooded whilst still reducing the number of operations required to 1+(N/page-size).

Eager Loading caused by Relationships

JBossCMP will also read ahead when the use of a get accessor for a cmr-field causes a query to be executed to load a related entity. A read-ahead element can be added to the ejb-relationship-role that specifies which fields should be eager loaded and the number of rows to pre-fetch. For example, the definition:
would cause the Shipping fields of the Order to be pre-loaded when the getOrder() accessor is called on a LineItem instance. The SQL generated would be:

    SELECT ORDER_DATA.FK_SHIP_TO, ORDER_DATA.ITEM_COUNT

If the LineItem instance was returned from a finder, then the page-size pre-load would add the next 4 rows from the finder into the query. The SQL would then be:

    SELECT ORDER_DATA.FK_SHIP_TO, ORDER_DATA.ITEM_COUNT, LINEITEM.ITEM_ID

Lazy Loading

Lazy loading is activated when an application accesses a field whose value has not already been loaded through an eager load strategy. To reduce the potential number of operations that need to be performed, JBossCMP can be configured to load additional fields as well as the one being accessed. If specific lazy load groups are defined in jbosscmp-jdbc.xml, JBossCMP will merge together the fields from all the groups of which the accessed field is a member, and then issue a select for those fields that have not already been loaded. For example, if the following lazy load groups are defined for the Order EJB:
and no fields have been loaded, then calling the get accessor for orderDate would result in the SQL:

    SELECT ORDER_DATE, FK_BILL_TO, FK_SHIP_TO, TOTAL

being executed. If the get accessor for lineItemCount is then called, the shipTo field has already been loaded, so the SQL executed would be:

    SELECT ITEM_COUNT

Impact of Transactions

Transactions have a major impact on the effectiveness of any loading strategy. JBossCMP can only cache information in memory if it can be sure that it will remain consistent with the information stored in the database. If a transaction is in progress, JBossCMP knows that the current thread will be isolated from any other changes in the database and so is able to cache values until the transaction commits. However, if there is no transaction in progress, JBossCMP assumes that the data may be modified at any time and so does not retain any information in cache. This leads to behaviour that is contrary to expectations: attempting to improve performance by avoiding transaction overhead will actually result in a substantial decrease in performance due to additional persistence operations. For example, consider the simple use case where we query for orders and then iterate over them:

    double total = 0.0;

Assume that eager load groups have been configured for the finder and load instance cases to pre-load the totalAmount field. If this is run without a transaction, then the SELECT issued to run the finder will include the pre-load field for the amount:

    SELECT t0_o.ORDER_ID, t0_o.TOTAL

This will populate the ReadAhead cache with the result of the finder, including the values for totalAmount. However, because the finder ran without a transaction, this information is immediately aged out of the cache. As a result, when the first entity in the list is loaded, none of its fields remain in the cache and hence a select must be issued to fetch its values. This repeats for every Order in the list, resulting in 1+N persistence operations.
However, if this is run within a transaction, then JBossCMP will not discard the pre-loaded results when the finder completes. As the list is then iterated, the cmp-field values for every entity can be found in the cache and no SQL needs to be executed. As a result, only 1 persistence operation is needed.
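The difference between the two cases can be made concrete with a small simulation. The Java sketch below is not JBossCMP code and makes no claim about its internals: the class, its in-memory "table" and its cache are hypothetical stand-ins, and the only behaviour modelled is the one described above, namely that pre-loaded values are kept for the duration of a transaction and discarded otherwise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadAheadTxSketch {
    // Stand-in for the ORDER_DATA table: primary key -> TOTAL column.
    private final Map<Integer, Double> table = new HashMap<>();
    // Stand-in for the read-ahead cache: primary key -> pre-loaded total.
    private final Map<Integer, Double> readAhead = new HashMap<>();
    // Number of simulated persistence operations (SELECT statements).
    private int operations = 0;

    ReadAheadTxSketch(int orderCount) {
        for (int id = 1; id <= orderCount; id++) {
            table.put(id, 10.0 * id);
        }
    }

    // Finder with on-find read-ahead: one operation fetches keys and totals.
    List<Integer> findAll(boolean inTransaction) {
        operations++;
        readAhead.putAll(table);
        if (!inTransaction) {
            // Modelled assumption: nothing is retained outside a transaction.
            readAhead.clear();
        }
        return new ArrayList<>(table.keySet());
    }

    // Accessor: served from the cache if possible, otherwise one more SELECT.
    double getTotalAmount(int id) {
        Double cached = readAhead.get(id);
        if (cached != null) {
            return cached;
        }
        operations++;
        return table.get(id);
    }

    // Runs the query-then-iterate use case and reports the operation count.
    static int operationsFor(int orderCount, boolean inTransaction) {
        ReadAheadTxSketch store = new ReadAheadTxSketch(orderCount);
        double total = 0.0;
        for (int id : store.findAll(inTransaction)) {
            total += store.getTotalAmount(id); // mirrors the loop in the use case
        }
        return store.operations;
    }

    public static void main(String[] args) {
        System.out.println("without transaction: " + operationsFor(5, false));
        System.out.println("within transaction:  " + operationsFor(5, true));
    }
}
```

With 5 orders the sketch reports 6 operations without a transaction (1+N) and 1 operation within a transaction, matching the counts given in the text.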
© 2003 Core Developers Network Ltd. "Core Developers Network", the stylized apple logo and "Core Associates" are trademarks of Core Developers Network Ltd. All other trademarks are held by their respective owners. Core Developers Network Ltd is not affiliated with any of the respective trademark owners.