When applications share a database and cooperate through that shared data, we begin to worry about the freshness of the data, especially in the applications that use (but don't preside over) such data.

This can be especially worrying for JPA users who simply can't afford to turn off their shared cache, at least with vendors that have one (we do, since we're talking about EclipseLink).

Sooo… what are some of the seemingly friendly or immediate ways to get fresh data? This post goes through a few that probably jumped out when some of us sifted through the EclipseLink documentation. However, they are not always what they seem, so let us perform some experiments.

Preparing the Test Artifacts

So you have your IDE (hopefully ADF), Eclipselink and JPA imported, and the database (preferably Oracle). Let's define the sample artifacts we can test with:

The Project

Click here for an overview of a main-runnable class that can test your JPA entities.
I built a relatively simple ADF Custom Project with the following technologies, with the rest being default:

EJB
EJB Modeling
Java
TopLink
Don't forget to import the library for your database connection (such as Oracle JDBC).
Make sure your entities are being woven (click here for more info).

The structure is as follows:

For the model package, we have the following simple ERD for our tables, and the meat-code for the corresponding Java Entity Beans:


*Entity-Relationship Diagram*	*ATest.java*	*ATestRel.java*

Notice here that we have introduced a circular mapping between these two entities. This will be important later.

For our persistence.xml, we name our persistence unit "TestProject", along with additional properties below:

Wow, did we set a lot of logging in there. This is just so that we can keep track of what goes on during our queries. Of course, using the breakpoint feature of your IDE is useful in concert with the logs the above settings provide. Notice the eclipselink.profiler setting. This will allow in-depth access to event data in your persistence unit. Click here for more information.
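
If you'd rather see this kind of setup as code, here is a minimal sketch of similar settings expressed as a properties map, which could be handed to Persistence.createEntityManagerFactory("TestProject", props). The property names are EclipseLink's; the class name and the chosen values (FINEST, PerformanceProfiler and so on) are assumptions for illustration, not necessarily the exact values used in this project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (name is made up): builds properties similar in spirit
// to the logging/profiler settings discussed above.
class TestProjectProperties {

    static Map<String, String> loggingAndProfiling() {
        Map<String, String> props = new HashMap<>();
        props.put("eclipselink.logging.level", "FINEST");         // assumed level
        props.put("eclipselink.logging.level.sql", "FINE");       // assumed: log the SQL
        props.put("eclipselink.logging.parameters", "true");      // assumed: show bind values
        props.put("eclipselink.profiler", "PerformanceProfiler"); // assumed profiler type
        return props;
    }
}
```

Properties passed this way are merged with whatever persistence.xml already defines for the unit, so it can be a handy way to switch the noisy logging on only for test runs.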
And finally, we have our testing class:

We skip the formality of transaction demarcation, as we only want to study entity retrieval behavior.

Notice the getNumOfCacheHits() method? This is where we get to use the profiler we set in the persistence.xml file. As its name suggests, it takes from the profiler the number of times the cache is checked against/retrieved from whenever we need an entity from the database.

Let's prepare for a bit.

A Quick (Re)Fresh(en)er

EclipseLink's two-tiered cache architecture puts two more layers between our application and the database: the Persistence Context (Isolated/L1) Cache and the Persistence Unit (Shared/L2) Cache. The L1 cache has the shorter lifecycle: it lasts only as long as its associated EntityManager lives (which can span a single transaction, or as long as the EntityManager is referenced). The L2 cache, on the other hand, is associated with the deployment, and even spans multiple instances of EntityManagerFactory. Context caches can read data from and write data to their parent unit's cache, thereby indirectly supplying data to and obtaining data from each other.
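
To make the lookup order concrete, here is a toy model in plain Java. It is not how EclipseLink is implemented: plain maps stand in for the caches, strings stand in for entities, and every class and method name is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: one instance plays the role of a persistence unit.
class TwoTierCacheSketch {
    final Map<Integer, String> database = new HashMap<>();    // stands in for the real table
    final Map<Integer, String> sharedCache = new HashMap<>(); // "L2", shared by all contexts
    int databaseReads = 0;

    // One Context per "EntityManager"; its map plays the role of the L1 cache.
    class Context {
        final Map<Integer, String> isolatedCache = new HashMap<>();

        String find(int id) {
            String value = isolatedCache.get(id);      // 1. look in L1
            if (value == null) {
                value = sharedCache.get(id);           // 2. then in L2
                if (value == null) {
                    value = database.get(id);          // 3. only then touch the database
                    databaseReads++;
                    if (value != null) {
                        sharedCache.put(id, value);    // cache for other contexts
                    }
                }
                if (value != null) {
                    isolatedCache.put(id, value);      // cache for this context
                }
            }
            return value;
        }
    }

    Context newContext() {
        return new Context();
    }

    public static void main(String[] args) {
        TwoTierCacheSketch unit = new TwoTierCacheSketch();
        unit.database.put(3, "original");
        Context em1 = unit.newContext();
        Context em2 = unit.newContext();

        System.out.println(em1.find(3));            // prints "original" (read from the database)
        unit.database.put(3, "changed elsewhere");  // another application updates the row
        System.out.println(em2.find(3));            // prints "original" (stale, served from L2)
        System.out.println(unit.databaseReads);     // prints 1
    }
}
```

The second context never touches the database, which is exactly the kind of staleness this post is worried about.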

More info on this here.

Now, the "problem" here is that these caches are used whenever possible. If your application is the only authority on the entities it uses, then you need not worry much, as the risk of stale data is immensely reduced, and even more so if all your entities have a version field. In our case, however, we find ourselves with a subtle urgency for fresh data, and we (lit. "I") want to make the fix as simple as possible (lol, I'm no pro at EclipseLink yet). Of course, unless the entities in the actual application are scarce and shallow, turning off either cache should be a last resort, considered only in the most dire need. Here is a wonderful reference about caches from a contributor to the EclipseLink project: James Sutherland's blog on caching.

Assuming everything else is default, we define the following basic ways with which we retrieve entities:

Entity Manager's find:
-The easiest and most intuitive non-SQL way of obtaining an entity; all that needs to be passed is the Class of the desired entity and the ID you wish to search against. Without special instructions, known as hints, this type of retrieval is designed to check the cache first, and then the database.
-By default, any object or relationship that is not already cached when a SELECT is done by a find is cached. If a relationship is queried for, but the corresponding object is already cached, the cache is not updated, even if the data from the query is more up-to-date.

JPA Queries:
-JPA queries (that return entities) such as "SELECT o FROM ATest o WHERE o.id = 1".
-These queries execute a database call first, and check the cache afterwards. However, this is where it gets a bit gritty: if a resulting ID-Entity Class pair is already cached, the cached instance is used in its entirety instead, despite the appearance of an actual query in the log, which, to the inexperienced EclipseLink user, would seem to mean that the returned object uses data from that most recent query. This is made worse by the description of a default setting discussed later on.
-In essence, these queries run the SQL for data, but the obtained IDs are checked against the cache before anything else. If the ID-Entity Class pair is not cached, then the object is built and cached using the data from the executed query.
-Object queries by (single) ID (such as "SELECT o FROM ATest o WHERE o.id = 1"), however, tend to produce logs that look like those of an entityManager.find execution. This is because the query no longer needs to look up an ID, since it is already supplied as a parameter.
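
The difference between the two retrieval styles can be sketched with another toy model. Again, this is only an illustration of the behavior described above, not EclipseLink code: a single map stands in for the cache, strings stand in for entities, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: shows why a logged SELECT does not guarantee fresh objects.
class QueryVersusFindSketch {
    final Map<Integer, String> database = new LinkedHashMap<>(); // stands in for the table
    final Map<Integer, String> cache = new LinkedHashMap<>();    // stands in for L1/L2
    int sqlCount = 0;

    // Like em.find: cache first, database only on a miss.
    String find(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        sqlCount++;
        String row = database.get(id);
        if (row != null) {
            cache.put(id, row);
        }
        return row;
    }

    // Like a JPQL query without refresh hints: the SQL always runs,
    // but an already-cached ID wins over the row that was just read.
    List<String> queryAll() {
        sqlCount++;
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, String> row : database.entrySet()) {
            String cached = cache.get(row.getKey());
            if (cached == null) {
                cached = row.getValue();          // not cached yet: build from the row
                cache.put(row.getKey(), cached);
            }
            result.add(cached);                   // cached instance is returned as-is
        }
        return result;
    }

    public static void main(String[] args) {
        QueryVersusFindSketch sketch = new QueryVersusFindSketch();
        sketch.database.put(1, "old");
        sketch.find(1);                           // caches "old"
        sketch.database.put(1, "new");            // updated behind our back
        sketch.database.put(2, "two");
        System.out.println(sketch.queryAll());    // prints [old, two]
        System.out.println(sketch.sqlCount);      // prints 2
    }
}
```

Two statements hit the "database", yet the object for ID 1 still carries the old value, which mirrors the trap described in the second bullet under JPA Queries.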

(art taken from ユフィ@紅楼夢C-19a)

Before we begin, let's add some more code to the test class shown previously:

As the method names suggest, we will use printData to display the data1 fields of both the ATest instance and its ATestRel relationship, and we will use updateData to update the database without using entities so that we do not affect the cache in any way. We use seconds and milliseconds as the new values so the data would be volatile as we please.
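
For readers who want something to start from, here is a minimal sketch of what such an update could look like with plain JDBC, so that no entity (and therefore no cache) is involved. The ATEST table with its ID and DATA1 columns comes from the SQL visible in the log; the class name, the helper names, and the assumption that DATA1 holds a string are mine for illustration. The related table would be updated the same way.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalTime;

// Hypothetical sketch: updates a row directly, bypassing JPA and its caches.
class DirectUpdateSketch {

    // Table and column names as seen in the logged SELECT; adjust to your schema.
    static final String UPDATE_ATEST = "UPDATE ATEST SET DATA1 = ? WHERE ID = ?";

    // Seconds and milliseconds of the given time, so that each run writes a new value.
    static String volatileValue(LocalTime time) {
        return time.getSecond() + ":" + (time.getNano() / 1_000_000);
    }

    // Returns the number of rows updated. Assumes DATA1 is a string column.
    static int updateData(Connection connection, long id) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPDATE_ATEST)) {
            statement.setString(1, volatileValue(LocalTime.now()));
            statement.setLong(2, id);
            return statement.executeUpdate();
        }
    }
}
```

With a connection in its default auto-commit mode the change is visible to other sessions right away, which is what we want when simulating "another application" touching the data.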

And finally, our new main method code:

I gave it a quick run, and here's a section of the log I would like to describe to you:
Observe line 3 above. It contains "Execute query ReadObjectQuery(name="readATest" referenceClass=ATest sql="SELECT ID, DATA1, DATA2, DATA3 FROM ATEST WHERE (ID = ?)")". At this point the query can already be redirected to check the cache, especially since this is a find and an ID is supplied. If no instance is cached in either the L1 or the L2 cache, then a database query is performed, as shown by lines 5 and 10: line 5 queries for the ATest with ID = 3, and line 10 queries for the object referenced by its atestRel mapping. Lines 16 and 17 display the current data.

Let's pay special attention to line 18, however. It shows that we hit the cache once. This happened when the relationship atestRel was queried for. Remember that ATestRel is the owner of the relationship (it has the actual foreign key in its table/entity), and contains a reference to the related ATest, which is what we obtained in the first query. Since the ATest it wanted was already obtained and cached, there was no need to hit the database for that object, and the cache (either L1 or L2) was hit instead. Notice that in line 13 the log displays "Execute query ReadObjectQuery(name="atestfk" referenceClass=ATest )", but there is no corresponding "sql: " entry after it.

Now that we know how to read the log, and understand a bit more, we can FINALLY FINALLY ACTUALLY begin.
Here are some ways we might think of off the top of our heads (and perhaps from a light search of the EclipseLink documentation).

1. Entity Manager's refresh

If you'd like to find out what goes on inside, you can begin tracing from EntityManagerImpl line 928.

Anyway, it does what it seems: it refreshes the entity with live data. Let us run our test class with the following code filling in em2's transaction:

We find in our logs that only the object we passed as the parameter got refreshed. This is because, by default, EclipseLink refreshes the object and any relationship that we explicitly declare to cascade a refresh (via a relationship annotation that defines CascadeType.REFRESH, or via @PrivateOwned). Additionally, we can pass other query hints to this method, but I'm keeping this simple. After the find method from em2, the cache hits rose to 2.

But what's that in the logs? The cache hits stayed at 2 even though ATestRel was explicitly queried for during the refresh, yet it was not updated? A bit confusing, indeed. Going by what we know, since ATestRel was not updated, the cache should have been hit. I'm no expert on what goes on in the very depths of the querying, but after tracing from [EntityManagerImpl: 928: refresh(entity)] to [ObjectBuilder: 2133: buildWorkingCopyCloneFromRow();], I can only offer a simplified explanation of what I saw: fresh data for ATestRel was indeed fetched, but since the ATestRel entity was not defined to be refreshed, it did not need to be rebuilt, retrieved, or recached, and so that field of the refreshed ATest entity kept its old value.

Perhaps a cache hit means that the cache was checked first? If you know the actual explanation, I would like to hear from you in the comments!

Surely, in bigger apps, it becomes tedious to scatter these refresh calls, so let us move on.

2. QueryHint: Cache Usage

This one deserves a special mention, because I think using it for refreshing is a misconception waiting to happen. Its setting called "DoNotCheckCache" sounds like the droid we're looking for. However, it isn't. This is already the default for JPA queries, and it governs how some types of query behave when retrieving data. It actually means "Do Not Check Cache First": the database is queried first to get primary keys, which are then used for cache retrieval, and if special refresh settings are defined, some, if not all, of the queried data is ready for use.

Even if you use it on entityManager.find, it won't work, at least not for me. If you'd like to try it out anyway, here's the code I used for em2's transaction:


JPA Query	Entity Manager Find

3. QueryHints: Refresh and Refresh Cascade

Now these two are pretty much what we would want, since they do what they seem, in all their unadulterated glory. As much as possible, this is what we would like to use. They can be put on entity manager methods and JPA Object queries.

REFRESH, by itself, only cascades to relationships explicitly mapped to cascade a refresh (also @PrivateOwned). Of course, this is modified by the REFRESH_CASCADE hint, which allows us a number of options.

They are straightforward enough that I feel no need to demonstrate them in depth, although it has happened to me that using this combination (REFRESH and REFRESH_CASCADE with CascadePolicy.CascadeAllParts) caused a relationship to be merged to the database as an empty object, causing exceptions. I have not been able to reproduce this problem yet, but I hope it is a rare case.
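
For reference, a minimal sketch of how the two hints could be bundled into a map is shown here. The keys and values are the string forms behind QueryHints.REFRESH, QueryHints.REFRESH_CASCADE, HintValues.TRUE and the CascadePolicy constants, spelled out as literals so the snippet compiles without EclipseLink on the classpath; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a hint map for a cascading refresh.
class RefreshHintsSketch {
    static final String REFRESH = "eclipselink.refresh";                 // QueryHints.REFRESH
    static final String REFRESH_CASCADE = "eclipselink.refresh.cascade"; // QueryHints.REFRESH_CASCADE

    // cascadePolicy: for example "CascadeByMapping" or "CascadeAllParts".
    static Map<String, Object> refreshHints(String cascadePolicy) {
        Map<String, Object> hints = new HashMap<>();
        hints.put(REFRESH, "True");                // HintValues.TRUE
        hints.put(REFRESH_CASCADE, cascadePolicy); // how far the refresh should reach
        return hints;
    }
}
```

Such a map can be passed as the third argument of em2.find(ATest.class, id, hints), or applied entry by entry with query.setHint(key, value) on a JPA query. With the EclipseLink constants available, the equivalent calls are query.setHint(QueryHints.REFRESH, HintValues.TRUE) and query.setHint(QueryHints.REFRESH_CASCADE, CascadePolicy.CascadeAllParts).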

4. QueryHints: Cache Retrieve Mode (BYPASS) and Cache Store Mode (REFRESH)

These might also come across as intuitively sufficient; in most cases, however, they are not.

CacheRetrieveMode.BYPASS works against the L2 cache. If what you are searching for is in the L1 cache, then it won't make a difference, no matter whether entity manager find or JPA Object Query is used. This is not the real problem, however.

Using CacheRetrieveMode.BYPASS will build only the root object with live data, and unlike REFRESH, the use of fresh data is not cascaded onto relationships, even if a refresh cascade is defined as a hint or as part of the mapping. But what it does build freshly, it also caches. Here is an example to try if you please (notice that only the root, ATest, has updated data):

CacheStoreMode.REFRESH, on the other hand, is quite the quirky one. In our setting, it initially works like CacheRetrieveMode.BYPASS. However, its being a REFRESH is what makes it quirky. Even though it is a refresh, it does not automatically cascade to relationships that define a refresh cascade, but it does refresh relationships that are joined with the query for the root object, whether through a left join in JPQL or by annotation (in my experience, with the exception of collection relationships). Below is sample code:

Additionally, it can work with REFRESH_CASCADE hints to emulate a refresh: use CascadePolicy.CascadeByMapping to emulate the default of the REFRESH hint, and CascadePolicy.CascadeAllParts to refresh it all.

5. Annotation: @Cache with disableHits and alwaysRefresh

Entity-level annotations!

The disableHits property of the Cache annotation will force a JPA Query to execute an actual database query when an attempt is made to build an instance of the annotated Entity, by telling the query to bypass checking the cache before the database is touched. It also does not work on entity manager find since the find checks the cache first. Does the setting sound familiar?

This particular setting has a more specific use, especially in JPA queries by ID/key. Remember that those queries can act like finds, in that they can check the cache and skip the database. disableHits forces that database hit, but it still does not necessarily cause a fresh-data build. Try it out with the code below:


Entity Annotation	em2 Transaction Code

So, the data isn't fresh at all yet, but if you check the log for cache hits, it shows null. Then again, cache hits can still be quite a vague concept for us; perhaps it would be easier to understand if it had a different name. Anyway, disableHits seemingly prevents us from hitting the cache, but we still get objects that are already registered. Furthermore, there is a caveat in the find of the first em: instead of the usual 2 queries, we are shown 3, as the final query is one that touches the database for the root entity, which was already queried.

Perhaps a cache hit happens when we perform a pre-SQL read (when "Execute query ReadObjectQuery(re..." appears in the log) on an object and we reference and find an instance from the cache, eliminating the need to touch the database. In the most recent test with the code above, we find that after the first forced query of em2, instead of a pre-SQL read, we get: "UnitOfWork(388753413)--Thread(Thread[main,5,main])--Register the existing object model.ATestRel@344...". I can only speculate that, since we didn't go to the cache for an instance hit, we built a new object using existing data, but from a different source (something to do with where the unit of work gets data from)?

Moving on, the disableHits setting is a primer for the alwaysRefresh/onlyRefreshIfNewer setting. Let us add this to both of our entities:

With alwaysRefresh in place, any selection query that is executed against the database will cause a refresh on the object and its cached instance. However, even if it is a refresh, the effect of alwaysRefresh will not cascade along refreshed mappings, nor can it be modified by REFRESH_CASCADE. We'd have to annotate the related objects as well.

With the annotation down, try to run the code again.

If you're worried about the extra database trip for entities that have already been queried, this can be solved by a good ol' left join fetch. In our case, it will look like one of these: query.setHint(QueryHints.LEFT_FETCH, "o.atestRel.atestfk"); or em2.createQuery("SELECT o FROM ATest o LEFT JOIN FETCH o.atestRel.atestfk WHERE o.id = 1", ATest.class);. Also, you can use onlyRefreshIfNewer if the entity has an optimistic locking field and your application manages its state (insertion, merging, and deleting), so you don't always have to refresh it.

PHEW! THAT WAS A LOT!

Summary

So those are some ways you can keep data fresh. Keeping data fresh can sure be a pain, huh? Let's summarize them with a table, and also, let's leave out #2, since that isn't really a solution (why did I even put it in there):

	Pros	Cons
entityManager.refresh	-very simple and intuitive -flexible, depending on mappings and a refresh_cascade hint	-might not warrant frequent use -susceptible to n-querying, depending on object tree and refresh depth, since it requires more coding to deal with joining when using the em.refresh method
QueryHint.REFRESH, QueryHint.REFRESH_CASCADE	-also intuitive -as a hint, you can use it almost anywhere	-none, really. by all means, use it first -my only bias against this is when it caused an erroneous insert.
CacheRetrieveMode.BYPASS, CacheStoreMode.REFRESH		-behavior is too complex to use fully and efficiently. StoreMode has uses other than during selection, so it might shine more elsewhere
@Cache: -disableHits -alwaysRefresh	-as it is a setting at the entity level, maintenance is less of a problem	-forces the entity to follow the setting it is given, reducing dynamics -relegates the cache to be of less use as a helper for explicit selection (it is still useful for non-JPA Query ways of selection such as em.find)

And then a usage table:

	works on em.find	works on JPA Queries	cascades the freshness	recommended?
entityManager.refresh			by hint (REFRESH_CASCADE)	perhaps, sparingly
QueryHint.REFRESH, QueryHint.REFRESH_CASCADE	yes	yes	by hint (REFRESH_CASCADE)	yes
CacheRetrieveMode.BYPASS, CacheStoreMode.REFRESH	yes	yes	by hint (REFRESH_CASCADE); by join (only first level, non-collection)	no
@Cache: -disableHits -alwaysRefresh	no	yes	by related entity annotation	relatively simple to use, but no

Now, in the project I am working on, I use the fifth option, as I have an occasional problem when using REFRESH and REFRESH_CASCADE (as much as I'd like to use them), and this more common problem which I have yet to solve (the link might have the solution, but I have yet to try it; I will update on that).

Anyway, for you guys out there who are far better experts than I am, hope you could share some more strats for this kind of thing.

So that's it! Hope you enjoyed the read!

More References:
Eclipselink documentation is sure hard to sift through, ain't it?

JPA (Java Persistence API) With Eclipselink and Hibernate: Special Uses

Saturday, January 16, 2016

Newbie Eclipselink/ADF: An Experiment on Seemingly Possible Ways to Obtain Fresh Data