
Search Performance and Best Practices

For Search API and BigMemory SQL

Search Implementation and Performance
Best Practices for Optimizing Searches
Concurrency Notes
Options for Working with Nulls

Search Implementation and Performance

BigMemory uses a Search index that is maintained at the local node. The index is stored under a directory in the DiskStore and is available whether or not persistence is enabled. Any overflow from the on-heap tier of the cache is searched using indexes.

Indexed search operations perform in O(log(n)) time. For tips that can aid performance, see Best Practices for Optimizing Searches.

For caches that are on-heap only, attributes are extracted during query execution rather than ahead of time, and indexes are not used. Instead, each query iterates over the cache, taking advantage of fast on-heap access to do the equivalent of a table scan. Each element in the cache is visited only once per query.

On-heap search operations perform in O(n) time. For performance results, see the Maven-based performance test, in which an average of representative queries takes 4.6 ms for a 10,000-entry cache and 427 ms for a 1,000,000-entry cache.
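To illustrate the difference between the two complexity classes described above, here is a minimal sketch in plain Java. It is not BigMemory's index implementation: the `TreeMap` merely stands in for a sorted index, the plain `Map` stands in for an on-heap cache, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexVersusScan {
    /** O(n): visit every entry once and test it, like a table scan. */
    static List<String> scan(Map<String, Integer> ages, int target) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ages.entrySet()) {
            if (e.getValue() == target) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    /** Build a sorted attribute-to-keys index ahead of time. */
    static TreeMap<Integer, List<String>> buildIndex(Map<String, Integer> ages) {
        TreeMap<Integer, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, Integer> e : ages.entrySet()) {
            index.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return index;
    }

    /** O(log n): a single lookup in the sorted index. */
    static List<String> lookup(TreeMap<Integer, List<String>> index, int target) {
        return index.getOrDefault(target, new ArrayList<>());
    }
}
```

Both methods return the same keys. The scan pays its cost on every query, while the index pays once at build time and on each update.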

Best Practices for Optimizing Searches

Construct searches by including only the data that is actually required.
- Only use includeKeys() and/or includeAttribute() if those values are required for your application logic.
- If you don't need values or attributes, be careful not to burden your queries with unnecessary work. For example, if result.getValue() is not called in the search results, do not use includeValues() in the query.
- Consider if it would be sufficient to get attributes or keys on demand. For example, instead of running a search query with includeValues() and then result.getValue(), run the query for keys and include cache.get() for each individual key.
Note: includeKeys() and includeValues() use lazy deserialization, which means that keys and values are deserialized only when result.getKey() or result.getValue() is called. However, calls to includeKeys() and includeValues() still take time, so include them only when your application needs them.
Searchable keys and values are automatically indexed by default. If you will not be including them in your query, turn off automatic indexing with the following:
```
<cache name="cacheName" ...>
  <searchable keys="false" values="false">
  ...
  </searchable>
</cache>
```
Limit the size of the result set. Depending on your use case, you might consider maxResults or an Aggregator:
- If getting a subset of all the possible results quickly is more important than receiving all the results, consider using query.maxResults(int number_of_results). Sometimes maxResults is useful where the result set is ordered such that the items you want most fall within the first maxResults entries.
- If all you want is a summary statistic, use a built-in Aggregator function, such as count(). For details, see the net.sf.ehcache.search.aggregator package in the Ehcache Javadoc.
Make your search as specific as possible.
- Queries with iLike criteria and fuzzy (wildcard) searches might take longer than more specific queries.
- If you are using a wildcard, try making it the trailing part of the string instead of the leading part ("321*" instead of "*123").
TIP: If you want leading wildcard searches, you should create a <searchAttribute> with the string value reversed in it, so that your query can use the trailing wildcard instead.
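The reversed-attribute tip can be sketched in plain Java as follows. This shows only the string handling. The class and method names are hypothetical, and wiring the reversed value into a search attribute is left to your own attribute extraction.

```java
class ReversedAttribute {
    /** Value to store in the extra, reversed search attribute. */
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /** Turns a leading-wildcard pattern such as "*123" into "321*". */
    static String toTrailingWildcard(String pattern) {
        if (!pattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading wildcard: " + pattern);
        }
        // Drop the leading "*", reverse the rest, and put the wildcard last.
        return reverse(pattern.substring(1)) + "*";
    }
}
```

A value that ends with "123" always has a reversed form that starts with "321", so a query on the reversed attribute with the pattern "321*" selects the same elements as "*123" would on the original attribute.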
When possible, use the query criteria "Between" instead of "LessThan" and "GreaterThan", or "LessThanOrEqual" and "GreaterThanOrEqual". For example, to select values outside a date range, instead of combining le(startDate) and ge(endDate), try not(between(startDate, endDate)).
Index dates as integers. This can save time and can also be faster if you have to do a conversion later on.
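One possible way to turn a date into an integer is a yyyyMMdd encoding, sketched below in plain Java. The class and method names are hypothetical, and the encoding assumes positive years. Any other integer encoding that preserves ordering, such as an epoch-day count, would serve the same purpose.

```java
import java.time.LocalDate;

class DateAsInt {
    /** Encodes a date as yyyyMMdd, for example 2014-03-09 becomes 20140309. */
    static int toInt(LocalDate date) {
        return date.getYear() * 10_000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    /** Decodes a yyyyMMdd integer back into a date. */
    static LocalDate fromInt(int yyyymmdd) {
        return LocalDate.of(yyyymmdd / 10_000, (yyyymmdd / 100) % 100, yyyymmdd % 100);
    }
}
```

Because the encoding keeps chronological order, range criteria on the integer attribute select the same elements as the equivalent criteria on the original dates.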
Searches of eventually consistent BigMemory Max data sets are fast because queries are executed immediately, without waiting for the commit of pending transactions at the local node. Note: This means that if a thread adds an element into an eventually consistent cache and immediately runs a query to fetch the element, it will not be visible in the search results until the update is published to the server.

Concurrency Notes

Unlike cache operations, which have selectable concurrency control or transactions, queries are asynchronous and Search results are eventually consistent with the caches.

Index Updating

Although indexes are updated synchronously, their state lags slightly behind that of the cache. The only exception is when the updating thread performs a search.

For caches with transactions, an index does not reflect the new state of the cache until commit has been called.

Query Results

Unexpected results might occur if:

A search returns an Element reference that no longer exists.
Search criteria select an Element, but the Element has been updated.
Aggregators, such as sum(), return a result that differs from the one you get by re-accessing the cache for each key and repeating the calculation yourself.
A value reference refers to a value that has been removed from the cache, and the cache has not yet been reindexed. If this happens, the value is null but the key and attributes supplied by the stale cache index are non-null. Because values in Ehcache are also allowed to be null, you cannot tell whether your value is null because it has been removed from the cache after the index was last updated or because it is a null value.

Recommendations

Because the state of the cache can change between search executions, the following is recommended:

Add all of the aggregators you want for a query at once, so that the returned aggregators are consistent.
Use null guards when accessing a cache with a key returned from a search.
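A null guard of the kind recommended above might look like the following sketch. It uses a plain `Map` as a stand-in for the cache and a list of keys as a stand-in for search results. Both are assumptions made for illustration, and the class and method names are hypothetical. With a real cache, the same check would apply to whatever cache.get() returns for each key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NullGuardedFetch {
    /** Looks up each key from a possibly stale result set, skipping entries that are gone. */
    static <K, V> List<V> fetchLive(Map<K, V> cache, List<K> keysFromSearch) {
        List<V> live = new ArrayList<>();
        for (K key : keysFromSearch) {
            V value = cache.get(key); // null if the entry was removed after indexing
            if (value != null) {
                live.add(value);
            }
        }
        return live;
    }
}
```

Keys whose entries have disappeared between the search and the lookup are skipped rather than causing a NullPointerException later on.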

Options for Working with Nulls

BigMemory SQL supports using the presence or absence of null as a search criterion:

select * from searchable where birthDate is null
select * from searchable where birthDate is not null

The Search API supports the same criteria:

myQuery.addCriteria(cache.getSearchAttribute("middle_name").isNull());

For the opposite case, to require that a value for the attribute is present:

myQuery.addCriteria(cache.getSearchAttribute("middle_name").notNull());

which is equivalent to:

myQuery.addCriteria(cache.getSearchAttribute("middle_name").isNull().not());

Alternatively, you can call constructors to set up equivalent logic:

Criteria isNull = new IsNull("middle_name");
Criteria notNull = new NotNull("middle_name");
