
Tuesday, October 31, 2017

Eclipselink 2.5.2 (JPA 2.1.0): Entity Cloning

This discussion made use of JPA 2.1.0 and Eclipselink 2.5.2. Specs and implementations may change in future releases. It might be helpful to test with different versions by (locally) modifying the pom.xml file in the github repository, Eclipselink Entity Copying, which contains in-depth code and discussion of the concepts relevant to this post.

Also, here is another great reference to keep on hand while reading through the code: Eclipselink Attribute Groups

Among other cases, the need to clone JPA entities appears under the following conditions:
  • The absence of DTOs (Data Transfer Objects) - which may arguably represent the lack of a clean separation between the model layer and the controller layer
  • The need to protect entities from being polluted by changes, especially in multipart transactions
  • The need to simply persist or manage duplicate information

Although it can be argued that using a JPA provider to this extent is yet another antipattern, we won't talk about that kind of stuff here.

Anyway, say that the need arises for us to clone our entities. Of course, the easiest (but not necessarily the most succinct) thing to do is to implement the Cloneable interface and hand-code how each attribute is to be handled.

Suddenly, we are hit with some realizations:
  • What if our entities traverse deeply?
  • What if our entities are only partially fetched?
  • How do we want to handle unfetched attributes?

This is when we find out how DTOs fall short. If JPA is to still be used in querying for the required information for, say, a view, then DTOs have to be smart to some degree. It might be decided that different views that use information from the same entity would warrant per-view DTOs. DTOs might also have to be made aware of which attributes they should copy from the entity, especially when certain fetch optimizations (attribute narrowing via Eclipselink FetchGroups or JPA FetchGraphs) were used, since violating such optimizations usually leads to the loss of the advantage of their use in the first place.

Luckily, we're using Eclipselink, which has a nifty tool that we can use when in such a predicament: CopyGroups (and the JpaEntityManager.copy method). Honestly though, it won't be the most sturdy of swiss army knives, but the tool we'd be using a lot is luckily also the sharpest one in the set.

Eclipselink CopyGroups and JpaEntityManager.copy()


The main entry point to using this feature is the copy(Object entityOrEntities, AttributeGroup group) method of the org.eclipse.persistence.jpa.JpaEntityManager interface.
It returns Object from a method without generics, so we still have to cast it. The entityOrEntities parameter can also accept a Collection of the same type of entity. Also notice that the method accepts an AttributeGroup; it internally transforms this into a CopyGroup if it is not already one. We'll usually pass CopyGroups when we use this method anyway.

We can obtain an instance of a JpaEntityManager in two ways:
  • Directly casting an EntityManager instance (make sure it runs on Eclipselink)
  • Calling unwrap(JpaEntityManager.class) on an EntityManager (again, it should run on Eclipselink)
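A minimal sketch of both steps (obtaining the JpaEntityManager and calling copy), assuming Eclipselink is the provider behind the EntityManager. The EntityCopier class and its method name are made up for illustration, and the attribute names passed in are whatever your own entity declares:

```java
import javax.persistence.EntityManager;

import org.eclipse.persistence.jpa.JpaEntityManager;
import org.eclipse.persistence.sessions.CopyGroup;

// Hypothetical helper class; not part of Eclipselink or of the linked repository.
public class EntityCopier {

    /**
     * Copies the given entity, restricting the copy to the listed attributes.
     * Assumes the EntityManager is backed by Eclipselink.
     */
    public static Object copyWithAttributes(EntityManager em, Object entity, String... attributes) {
        // unwrap(...) fails if the provider is not Eclipselink
        JpaEntityManager jpaEm = em.unwrap(JpaEntityManager.class);

        CopyGroup group = new CopyGroup();
        for (String attribute : attributes) {
            // adding an attribute puts the group on the Cascade Tree level
            group.addAttribute(attribute);
        }

        // copy(...) returns Object, so the caller still has to cast the result
        return jpaEm.copy(entity, group);
    }
}
```

With a hypothetical Customer entity, a call could look like `Customer copy = (Customer) EntityCopier.copyWithAttributes(em, customer, "name", "orders");`.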

With that, doing the actual copying is pretty much covered.

What we have to actually be familiar (and careful) with are the configuration options for the CopyGroup we pass to the copy method.

Experiment-discussion


The meat of this discussion actually appears in the test class found in this github repository:

Eclipselink Entity Copying

Simply run the "mvn test" Maven command from the directory that contains the pom.xml file to see if all the tests pass (they should). After this, read the code found in the only test class under src/test/main/....

For this post, I'll just leave a summary of the discussion in the code for our reference.

Summary: CopyGroup Configuration


A CopyGroup has two main points of configuration:
  • cascade level, which defines which types (and not numerical depth) of associations the copying should include
  • declared attributes (as it is an AttributeGroup) which it should consider when cloning (only considered when the cascade level is Cascade Tree)

General Considerations

  • Whenever an attribute is added to a CopyGroup, its cascade level is set to CascadeTree. As CascadeTree is the only level that considers attributes, be mindful when adding attributes to a CopyGroup
  • When a FetchGroup, a type of AttributeGroup used for query optimization, is turned into a CopyGroup (via the toCopyGroup() method), it is automatically set to CascadeTree
  • Primary key and version columns can optionally be omitted from the copies via the CopyGroup configuration methods setShouldResetVersion(boolean) and setShouldResetPrimaryKey(boolean). These options behave differently, according to the cascade level configured
  • Copies can back-reference; that is, when copying with circular references, same entities with the same key share the same reference
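To illustrate the last point about back-references, here is a plain-Java sketch of a copy routine that keeps circular references intact. It is only a conceptual illustration and not Eclipselink's implementation: Node and GraphCopier are made-up names, and copies are tracked by object identity rather than by primary key.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an entity with associations.
class Node {
    final String key;
    final List<Node> links = new ArrayList<Node>();

    Node(String key) {
        this.key = key;
    }
}

class GraphCopier {

    /** Copies a node and everything reachable from it, preserving shared and circular references. */
    static Node copy(Node original) {
        return copy(original, new IdentityHashMap<Node, Node>());
    }

    private static Node copy(Node original, Map<Node, Node> copies) {
        Node existing = copies.get(original);
        if (existing != null) {
            // already copied: hand back the same copy so references stay shared
            return existing;
        }
        Node copy = new Node(original.key);
        // register the copy before recursing so that cycles resolve to it
        copies.put(original, copy);
        for (Node link : original.links) {
            copy.links.add(copy(link, copies));
        }
        return copy;
    }
}
```

In a parent/child pair that reference each other, the copied child points back to the copied parent, not to the original.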

Cascade Level Options

  • Cascade All Parts
    • set via the CopyGroup method cascadeAllParts()
    • does not consider attributes it contains
    • copies ALL associations; initializes them if need be
      • For entities and associations that have been or are to be partially fetched, their respective copies would only have copied the attributes for partial fetching (as probably declared via FetchGroup when querying for the original)
      • For associations that were not declared in such a partial fetching scheme, they would be initialized as default
      • (i.e.) ALL associations would still be initialized; only, those that were declared to have a FetchGroup might lack some BASIC attributes
    • when an unfetched BASIC attribute is encountered, its corresponding value in the copy will be null
    • initialization triggered by copying affects the original; i.e. if an association was initialized via a query triggered by copying, then it also becomes initialized in the original
    • because ALL associations are initialized, it might not be worth using this cascade level for heavily associated entities
    • if the group is configured with setShouldResetPrimaryKey(true), the keys will only be reset if none of them are associations (all or nothing)
  • Cascade Tree
    • set via the CopyGroup method cascadeTree()
    • the cascade level is automatically set to Cascade Tree when an attribute is added to the CopyGroup
    • when a CopyGroup is obtained via a toCopyGroup() on a FetchGroup, the resulting CopyGroup uses the Cascade Tree level
    • when passing a CopyGroup without attributes, copying will involve "all attributes", though when it comes to associations, it is still unpredictable (needs more testing); it is unlikely that an empty CopyGroup will be used with the CascadeTree level anyway
    • when accessing an attribute/association from the copy that is not declared in the CopyGroup (which is not empty), an IllegalStateException is thrown;
      • this can be useful for adjusting/optimizing FetchGroups (use them as CopyGroups)
      • be careful when turning FetchGroups into CopyGroups:
        • if a complete CopyGroup is desired, then take it from the FetchGroup that is manually configured and passed as a query hint
        • FetchGroups taken from resulting entities (after casting them to FetchGroupTracker) have been broken down so that each only describes the entity it was taken from
    • if copying triggers initialization queries, then the original entities are affected as well
  • Cascade Private Parts
    • set via the CopyGroup method cascadePrivateParts()
    • supposed to behave like Cascade All Parts, except it cascades only associations annotated with @org.eclipse.persistence.annotations.PrivateOwned
    • it worked the other way around in the tests - all but the PrivateOwned association was cascaded
    • still unpredictable; needs more testing
  • Cascade None
    • set via the CopyGroup method cascadeNone()
    • still initialized associations, thus straying from its name and contract
    • still unpredictable; needs more testing

Unfortunately, only Cascade All Parts and Cascade Tree can be held up to their intention to a usable degree - luckily, Cascade Tree is the level that would see the most use.

In the end, the only CopyGroups actually worth using (fortunately, it should also be the common use case) are those derived from manually configured FetchGroups, or cascade-tree-level groups that were manually built with careful consideration.
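As a rough sketch of the first kind of group, here is how a manually configured FetchGroup might be passed as a query hint and then reused for copying. The class and method names are placeholders, the JPQL string and attribute names are supplied by the caller, and this has not been run against the tests in the repository:

```java
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

import org.eclipse.persistence.config.QueryHints;
import org.eclipse.persistence.jpa.JpaEntityManager;
import org.eclipse.persistence.queries.FetchGroup;

// Hypothetical helper class, for illustration only.
public class FetchGroupCopyExample {

    /** Queries with a manually configured FetchGroup, then copies the results using the same group. */
    public static <T> Object queryAndCopy(EntityManager em, Class<T> entityClass,
                                          String jpql, String... attributes) {
        FetchGroup fetchGroup = new FetchGroup();
        for (String attribute : attributes) {
            fetchGroup.addAttribute(attribute);
        }

        TypedQuery<T> query = em.createQuery(jpql, entityClass);
        // the manually configured group is passed as a query hint
        query.setHint(QueryHints.FETCH_GROUP, fetchGroup);
        List<T> originals = query.getResultList();

        // the same group, converted to a CopyGroup, drives the copy (Cascade Tree level)
        JpaEntityManager jpaEm = em.unwrap(JpaEntityManager.class);
        return jpaEm.copy(originals, fetchGroup.toCopyGroup());
    }
}
```

Keeping one group for both the query and the copy means the copy never asks for attributes the query did not fetch.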

Admittedly, this time, it probably seems like a disappointing turnout - one where we are only given a limited number of options.

Perhaps with this we can help each other dig deeper into this feature and learn more about it, or even have the guys over at Eclipselink help us with it.

In any case, once again, hope this helped. Thanks!

Sunday, September 24, 2017

Eclipselink 2.5.2 (JPA 2.1.0): Determining Fetch State

This discussion made use of JPA 2.1.0 and Eclipselink 2.5.2. Specs and implementations may change in future releases. It might be helpful to test with different versions by (locally) modifying the pom.xml file in the github repository Eclipselink Fetch State Experiment, which contains in-depth code and discussion of the concepts relevant to this post.

Also, here is another great reference to keep on hand while reading through the code: Eclipselink JPA 2.0 Persistence Utils

The "issues" discussed here do not seem to have been resolved yet as of the Eclipselink 2.6.x releases.

Lazy loading (in various degrees) definitely benefits optimization, and being able to tell whether certain attributes of JPA entities are loaded or not is often what such decisions hinge on.

At the forefront, JPA actually provides a way to inspect an entity (or its attributes) to determine its fetch state - whether they are loaded or not - through PersistenceUtil.

It can be invoked via the following code:
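A minimal sketch of such a call, using only the standard javax.persistence API; the wrapper class and method names are illustrative:

```java
import javax.persistence.Persistence;
import javax.persistence.PersistenceUtil;

// Hypothetical wrapper class, for illustration only.
public class LoadStateCheck {

    /** Asks JPA whether a single attribute of the entity is loaded. */
    public static boolean isAttributeLoaded(Object entity, String attributeName) {
        PersistenceUtil util = Persistence.getPersistenceUtil();
        return util.isLoaded(entity, attributeName);
    }

    /** Asks JPA whether the entity as a whole counts as loaded. */
    public static boolean isEntityLoaded(Object entity) {
        return Persistence.getPersistenceUtil().isLoaded(entity);
    }
}
```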
Looks simple enough. However, is it reliable?

Because it relies on provider implementation, that greatly depends.

To be fair, aside from the method semantics, JPA (as of 2.0, and even in 2.1) does provide a specification of sorts, described in the comments of ProviderUtil, an interface used in the implementation internals of PersistenceUtil:
  • isLoadedWithoutReference and isLoadedWithReference, both with arguments (Object entity, String attributeName)

    • "If the provider determines that the entity has been provided by itself and that the state of the specified attribute has been loaded, this method returns LoadState.LOADED."

    • "If the provider determines that the entity has been provided by itself and that either entity attributes with FetchType.EAGER have not been loaded or that the state of the specified attribute has not been loaded, this methods returns LoadState.NOT_LOADED"; and

    • "If a provider cannot determine the load state, this method returns LoadState.UNKNOWN."

    • These two methods are differentiated in that WithReference is permitted to obtain/initialize a reference, whereas the other is not. Note that Eclipselink does not obtain a reference for either one anyway.

  • isLoaded(Object entity)

    • "If the provider determines that the entity has been provided by itself and that the state of all attributes for which FetchType.EAGER has been specified have been loaded, this method returns LoadState.LOADED."

    • "If the provider determines that the entity has been provided by itself and that not all attributes with FetchType.EAGER have been loaded, this method returns LoadState.NOT_LOADED"; and

    • "If the provider cannot determine if the entity has been provided by itself, this method returns LoadState.UNKNOWN."

    • This method is also not permitted to obtain/initialize references.

To put it simply, JPA specifies that an entity is loaded if all the attributes and associations configured to be EAGER by "DEFAULT" have been initialized; if the entity is found to be loaded, then checking for attributes can be done properly and predictably.

The word "default" is stressed here because it actually describes explicit configuration via ORM XML or annotations - pretty much whatever you define at the beginning.

Watch how your provider implements the specification mentioned in the comments in ProviderUtil.

Eclipselink holds true to this (tested in version 2.5.2), and only to this extent. When dynamic configuration is done through runtime application of FetchGroups, PersistenceUtil becomes completely unusable.

Experiment-discussion


The meat of this discussion actually appears in the test class found in this github repository:

Eclipselink Fetch State Experiment

Simply run the "mvn test" Maven command from the directory that contains the pom.xml file to see if all the tests pass (they should). After this, read the code found in the only test class under src/test/main/....

With all the technicalities discussed in the github repository linked previously, I'll just leave y'all with a summary of what was in there.

Summary


For Eclipselink, there are actually two main ways to accurately determine entity/attribute/association fetch state. Furthermore, they only work for entities that have been woven. Then again, weaving is what enables lazy loading, and we'd only need to determine fetch state at all if lazy loading was enabled. Anyway, here they are:
  • org.eclipse.persistence.queries.FetchGroupTracker, the simpler (but not the best) way
    • FetchGroupTracker is one of the interfaces that entity weaving adds to your entities. It tracks the FetchGroup used when the specific entity was loaded. A FetchGroup is simply a group of attributes used by Eclipselink to specify which attributes and associations a query or fetch should use.

    • FetchGroupTracker has the _persistence_isAttributeFetched(String attributeName) method, with which the load state of an attribute can be determined.

      Do this by simply casting the entity to FetchGroupTracker, and use the method accordingly:
    • Now, the problem (or maybe the cool thing) here is that this method is the counterpart of PersistenceUtil; it is unreliable when a FetchGroup is not present for the entity - this means that the entity was fetched using default configuration, where no basic attributes were made LAZY (if a basic attribute was made LAZY, it would have had a default FetchGroup). So if PersistenceUtil works on entities that used defaults while FetchGroupTracker works on entities that used a custom FetchGroup, perhaps they can be made to work together to cover each other's weaknesses (this is only an option; the better ways are described below).
  • org.eclipse.persistence.internal.jpa.EntityManagerFactoryImpl, the actual correct way
    • Even though Eclipselink's PersistenceUtil uses the relevant EntityManagerFactoryImpl methods internally, PersistenceUtil fails due to some of the logic written to follow JPA's specification. However, using EntityManagerFactoryImpl's various isLoaded(...) methods actually works properly (isLoaded(entity) follows JPA's definition of a loaded entity; you'll be using the more attribute-specific overloads).

    • The following snippet describes the methods in question:
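Both approaches can be sketched roughly as follows. The FetchState class and its method names are made up. The second method assumes that the factory-level check can be reached through the standard PersistenceUnitUtil interface, which Eclipselink's EntityManagerFactoryImpl appears to implement; the repository's test class may call EntityManagerFactoryImpl more directly.

```java
import javax.persistence.EntityManagerFactory;
import javax.persistence.PersistenceUnitUtil;

import org.eclipse.persistence.queries.FetchGroupTracker;

// Hypothetical helper class, for illustration only.
public class FetchState {

    /** First approach: ask the woven entity itself. Only reliable when a FetchGroup is present. */
    public static boolean isFetchedByTracker(Object entity, String attributeName) {
        if (!(entity instanceof FetchGroupTracker)) {
            // entities that were not woven do not implement FetchGroupTracker
            throw new IllegalArgumentException("Entity is not woven: " + entity.getClass());
        }
        return ((FetchGroupTracker) entity)._persistence_isAttributeFetched(attributeName);
    }

    /** Second approach: ask the factory (assumption: via the standard PersistenceUnitUtil view). */
    public static boolean isLoadedByFactory(EntityManagerFactory emf, Object entity, String attributeName) {
        PersistenceUnitUtil util = emf.getPersistenceUnitUtil();
        return util.isLoaded(entity, attributeName);
    }
}
```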


And that's pretty much it. Hope this helped!

Friday, September 22, 2017

ADF (12.1.3) Table Row Selection: Dangerous Behavior

The ADF Table is one of those prolific components used in ADF web projects.

Over at the ADF project I worked on, there were cases where the selectionListener attribute was used to hook processing to the event. Of course, being a newbie at the time, using the feature didn't go very smoothly.

This post will discuss things to watch out for when using the selection listener.



The following is a normal ADF table whose value is bound to some List whose values are irrelevant:


These tables are usually bound to a backing bean via the following code:

The generic type org.apache.myfaces.trinidad.util.ComponentReference is used to store a component reference in a serializable and lightweight manner.
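A minimal sketch of that binding pattern, assuming a hypothetical bean class named TableBackingBean whose table property is referenced from the table's binding attribute:

```java
import java.io.Serializable;

import oracle.adf.view.rich.component.rich.data.RichTable;

import org.apache.myfaces.trinidad.util.ComponentReference;

// Hypothetical backing bean; the class and property names are illustrative.
public class TableBackingBean implements Serializable {
    private static final long serialVersionUID = 1L;

    // holds a lightweight, serializable reference instead of the component itself
    private ComponentReference<RichTable> tableReference;

    public RichTable getTable() {
        return tableReference == null ? null : tableReference.getComponent();
    }

    public void setTable(RichTable table) {
        tableReference = ComponentReference.newUIComponentReference(table);
    }
}
```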

Row Selection and Activity

Selection Basics


By default, highlighting rows to mark them for selection is not enabled in a table. To augment this, simply modify the rowSelection property of the table. For example, if the rowSelection is set to "single", clicking on a row will yield the following visual output:


Although it comes with a few other pleasant effects on the side, the highlighting is the most notable.

Behind the scenes, a table uses RowKeySets to keep track of the states of each row – like which rows are selected. The structure of a RowKeySet is quite close to a simple array of integers.

This allows the use of the table's getSelectedRowData() method to return the selected row, although a cast is needed as the return is of type java.lang.Object.

A problem with this is that this highlighting can be cached, even if the row is not actually selected! Furthermore, row selection cannot occur when a highlighted row is clicked on. This is very troublesome for pages that initially render with tables that have cached selection, while the tables have important row selection events – especially if there is only a single row, where the user can no longer trigger the event for the row whose selection is cached.

Of course, there are workarounds for this.

For pages with tables that initially render with cached selection, the solution can be quite a burden, but it is a solution nonetheless. It is as follows:
  1. Bind the table's selectedRowKeys property to the bean.
    • The bound bean property should be of type org.apache.myfaces.trinidad.model.RowKeySet; the property variable can be initialized using the concrete class org.apache.myfaces.trinidad.model.RowKeySetImpl. Initializing the variable with the empty constructors is sufficient.
    • Do not include the setter method for this bean property, so that the table does not influence the state that the bean – in other words, the developer – should manage.
      • This means that for every process that might modify the table or its contents, the RowKeySet must be adjusted manually and constantly. Idioms exist for such cases:
        • adding an element to the table:
        • removing an item:
        • clearing selection:
  2. In the setter method for the table's binding, use the setRowIndex(int) method of the table during initialization, and pass it -1, as in "table.setRowIndex(-1)".
    • This is to make sure that the table has no initial back-end cached row selections (this does not make it safe from front-end selection caching, however)
    • Doing this disallows proper use of the varStatus property of the table for some reason – row indices are not displayed properly, but it does not negatively affect server-side processing.
  3. If the table resides in a reusable region, make sure the invoking page refreshes the region before it is displayed. The following snippet does just that:

On the other hand, when clearing the selection only has to happen during the lifecycle of the view, then clearing the table selection is sufficient.

The Selection Event


Another main benefit of allowing selection is the ability to declare a selectionListener on the table. This property allows a hook on which logic can be serviced upon the selection of a new row in the table. The following empty method snippet describes the signature of the method to bind:
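A sketch of such a method, assuming a hypothetical bean class and method name (only the SelectionEvent parameter type is dictated by the component):

```java
import org.apache.myfaces.trinidad.event.SelectionEvent;

// Hypothetical bean; bind the method through the table's selectionListener attribute.
public class SelectionBean {

    public void onRowSelected(SelectionEvent event) {
        // event.getAddedSet() and event.getRemovedSet() carry the row keys
        // that were selected and deselected by this event
    }
}
```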

Using this feature imposes new behavior upon its table. Without a selectionListener, row selection will not cause (partial) submission of table data; input components with validation will not trigger by merely selecting rows on the table. On the other hand, once a selectionListener is declared, the table submits its contents on selection, tripping the validation of any enabled component within the table. Although this might sound very risky and could potentially produce vulnerabilities (this post discusses some of that), this is also a step taken to extend the operability of tables - also, because the events would usually reside on the server, it only makes sense that the table produces a request.

Of course, using more features introduces more considerations and maintenance requirements. It cannot be reiterated enough how invaluable awareness is throughout the construct of a page; couple it with knowledge of the problems discussed here, and, as with many other features, the output can be made robust.

Adding a selection event causes the table and any partial targets to submit. This means that a selection will not succeed if the components queued for submission are not satisfied. Furthermore, the problem is aggravated by the inconsistency introduced regarding selection: even though the selection event is not fired, the selection changes anyway! This causes selection integrity efforts done during the selection event to go awry.

Here is a quick demonstration:
  1. Initially, the input components are not required so that the first selection can be made, and the selection event fired properly. A row is then selected (the text components below the table are partial targets of the table):

    • The value for "current selection via table" is directly bound in the JSF through the expression #{viewScope.backingBean.table.selectedRowData.name}, with the following properties:
      • backingBean refers to a viewScope bean that holds the table component
      • table is the bean property of the backing bean where the table is bound
      • selectedRowData is the table method that returns what is actually selected
      • name is a property of a row in the table that corresponds to the first column (output components)
    • The value for "current selection via selection listener" is bound to a viewScope bean property whose value is updated during the table's selection event with code following from this snippet:
  2. The "make input required" button is clicked so that the input fields on the second column are now required. This is implemented as a simple toggle in the view-scope bean.
  3. The second row is selected, and then the page is manually refreshed (via f5 in most browsers) since partial rendering will not occur if there are unsatisfied fields during submission:

Notice that "current selection" text fields at the bottom are inconsistent. Again, "current selection via table" takes the value straight from the table, with the EL string "#{viewScope.backingBean.table.selectedRowData.name}". On the other hand, "current selection via selection listener" takes its value from a bean property that is managed by the table's selectionListener. It becomes clear that the selection event was not fired.

There is a quick remedy for this problem: setting the table's immediate attribute to "true". Doing so will bypass errors from component validation as the table short-circuits its submission, if only to allow unhindered row changes. Take careful consideration of its implications to your desired program flow and guard your code well against problems (increasing such robustness may involve relocating validation and manually queueing events, among others).

Use the immediate attribute for components responsibly. Normally, it is to be used with action components such as buttons or links, but finer requirements may merit the use of the attribute in input components as well. Be careful as using this attribute introduces delicacy in the lifecycle of a JSF request. This blog post provides a full explanation of what the attribute does.

Table Editing Mode

Table Editing Mode is another neat feature of ADF tables which appears as the editingMode property. It has two values: editAll (the default), which has the table leave its input components to their own conditions for being disabled; and clickToEdit, which has the table disable the input components of a row that is not selected (hence, "click on the row to edit" its values).

As cool as this feature is, a selection listener combined with rows with components that use validation throws a spanner into its workings.

Let's observe such a case. Here is the previous table (still not made immediate), now modified to have the clickToEdit editingMode (notice the input text box only appears on the selected row):



The input text box components on the second row were left so that input is required. Since a selection listener causes table data submission upon row selection change, the requiredness should trip an error:



A truly frightening sight, even as we have not manually refreshed the page!

Now, let us refresh the page to find out if the selection listener was called:



It is now also clear that even though the selection event was not fired, the actual selection still changed.

Hold on, what about submission?

Let's satisfy the requiredness of the input field, and then click on the submit button:



Though the event is invisible (as the implementation is hollow), the submit actually proceeded. This spells bad news for inactive rows with invalid data – manual validation has to be done upon submit when using this feature.

Summary

Selection is pretty much commonplace when using tables, though the newbie ADF developer may find trouble when having to use the selectionListener.

When selection is enabled in a table, a selectionListener can further be declared to handle selection events as desired. However, doing so now forces the table to submit its contents as it now makes requests whenever row selection (of a different row from the previous) is done.

This can be troublesome when input components in the table have validation issues, as row selection now only "occurs" in the view (via highlight changes), as the selection event does not get called. A quick solution for this is to have the table's immediate attribute set to "true". Of course, be careful of its implications.

Another nice feature of ADF tables is the editingMode. Depending on the value, a table can have input components of non-selected rows disabled (by default, all input components of the table would be enabled).

Of course, this feature is not safe from what selection events and component validation impose. With both in commission, row selections that trip validation errors still have the highlight (only the visual element) move, while the actual selection remains, also without the selection event being called. Furthermore, "inactive" rows with invalid input do not trip their validations, so proper collective validation also has to be done deeper in server-side logic (as better practices also propose).


And that's all. Hope you find this helpful. Cheers!

Thursday, May 11, 2017

JPA Fetch Behavior: Eclipselink and Hibernate - and Configuration Options!

JPA is a godsend for happily churning out code that works with data from - you've guessed it - databases! Of course, and as I've learned when it comes to building things, nothing ever goes blissfully right.

Considering Murphy's laws, it's quite scary when things go right for so long - land mines are silent and patient.

Not-so-luckily, I'm here to discuss things that are only more apparent. Since this is going to be about two JPA providers, let's dabble into some history.

This post makes use of Hibernate 5.2.10-FINAL, and Eclipselink 2.6.4.

Some History


Back in 2000, during the advent of EJB 2.0, JEE developers had been given a powerful feature that offered to encompass data source interaction into the EJB toolset: CMP (Container Managed Persistence).

Of course, and as usual, its reception was by a divided crowd. Integrating this cool new feature into EJB meant that JEE-compliant servers had to be used, leaving out non-JEE projects. It also suffered from the interface-and-descriptor-heavy way of creating EJB components, while still lacking some necessary features.

A short while later, in 2001, Gavin King, along with some colleagues, started Hibernate to provide an alternative to EJB's CMP, one with a simpler design and a fuller set of features.

Eclipselink, on the other hand, started as Toplink, which started during the 1990s, with its Java version emerging between 1996 and 1998. It was then acquired by Oracle to merge into its Oracle Fusion Middleware product. Later, in 2007, code was donated to the Eclipse Foundation, giving birth to the Eclipselink project.

Over those years, it became more apparent and voiced that the EJB was in dire need of improvement. Thus, the JSR for EJB 3.0 was started with the goal of simplifying EJB. Developers from the Hibernate team joined in this effort. The JPA specification then emerged in 2006.

Since then, Hibernate has become a certified JPA 2.0 implementation (as of its 3.5 release), while Eclipselink was selected as the reference implementation for JPA 2.0 (and 2.1).

Relevant Basics


JPA is one of those specifications that follow configuration by exception. This means that it offers strong and comprehensive conventions, and using its more gritty features is only warranted by more specific use cases.

This is certainly a good general thought, but personally, I like to use as many features as I can, both from JPA and extras from the provider, to be able to produce the best and most robust experiences possible. Of course, I still try very hard to minimize code and avoid luxuries. I guess what I'm trying to say here is that you should use as much as you need; just be thoughtful.

Entity Associations


JPA brings a simple and amazing convention. One of its major benefits is that it allows us to navigate through relational data in an object-oriented manner by allowing us to specify entity relationships as fields, represented by association mappings (annotated with @OneToOne, @OneToMany, etc.). This is what we call "fetching data".

We declare entity associations by using the JPA annotations @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany (aside from @JoinColumn).

Their naming draws directly from their ERD (Entity Relationship Diagram) counterparts, so it can be quickly understood that they denote association and multiplicity.

Of course, the discussion will not be limited to associations, as even basic fields can be configured to some extent in optimization efforts.

Configuring these things, along with carefully engineering the entities, is key to getting the best performance out of this great piece of technology. Honestly though, the defaults for fetching suck.

Obtaining Entities


There are two main ways with which entities can be obtained: JPQL and EntityManager operations. Though JPQL is also pretty much achieved via the EntityManager, let us differentiate it from EntityManager's less verbose retrieval ways such as find(...) or getReference(...), as they behave differently between providers, as will be discussed later on.

One convention to note is that, by default, these retrieval methods will always consult caches before executing a query and building objects. The latter part, in other words, means that even if a query was actually executed, if an entity that could have been built from the query data already exists in a cache, then that cached entity will be returned instead. If you're using Eclipselink and want to know some more about it, check out my other post (I'd love to hear your thoughts and questions).
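The convention can be pictured with a tiny plain-Java sketch. This is only a conceptual model of "cache first, and prefer the cached instance even after a query ran"; IdentityCache and its methods are made-up names and say nothing about how either provider actually implements its caches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of an entity cache keyed by primary key.
class IdentityCache<K, E> {
    private final Map<K, E> entities = new HashMap<>();

    /** Lookup by key: the loader (standing in for a query) only runs when nothing is cached. */
    E find(K key, Function<K, E> loader) {
        return entities.computeIfAbsent(key, loader);
    }

    /**
     * Called for an object freshly built from query data: if an instance with the
     * same key is already cached, that cached instance wins and is returned instead.
     */
    E resolve(K key, E freshlyBuilt) {
        E cached = entities.putIfAbsent(key, freshlyBuilt);
        return cached != null ? cached : freshlyBuilt;
    }
}
```

Here, a second lookup for the same key never reaches the loader, and an object rebuilt from query data is discarded in favor of the instance that is already cached.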


Let us move on to the meat of this discussion.

Fetching


There are two main considerations in fetching associations that work in concert to increase performance: fetch latency (pretty much referred to as "eagerness" or "lazy loading") and fetch strategy. They deal with when and how data is fetched, respectively. It is easy to mix them up as they have configurations that overwrite each other. Generally, latency is a necessary consideration when planning a strategy.

Fetch Latency


JPA specifies default behavior for latency when it comes to associations: eager for toOne associations, and lazy for toMany associations. Eager fetching means that the association is also immediately retrieved (via cache or query) when its owning entity is retrieved (and managed). On the other hand, lazy fetching simply means that the association need not be retrieved until it is needed (accessed). Needless to say, basic/scalar fields are also eagerly fetched.

Deciding whether single-valued (toOne) associations should be eagerly or lazily fetched depends heavily on architectural decisions, among others. Eager fetching can make sense when entities are modeled so that their associations are almost always needed; otherwise, these associations can be made lazy (usually eventually, as decisions lead to expansion of these entities).

It is also worth noting that pre-JPA Hibernate defaults all associations to lazy fetching. I read somewhere in Manning's Java Persistence With Hibernate (1st Edition, 2005) that JPA only specified single-valued relationships as eager by default because it was easier to support.

Personally, I follow Hibernate's default (also recommended by Hibernate) and explicitly configure single-valued associations as lazy (leaving multi-valued associations at their already lazy default), as it is more fail-safe and minimizes unnecessary fetching. I say "fail-safe" because certain fetch strategies will go ahead and query for and initialize relationships as a fallback when their contracts are broken.

Configuring Fetch Latency


Of course, JPA specifies the following configuration options to augment latency:
  • Association annotations @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany have a fetch attribute that can be set using values from javax.persistence.FetchType (LAZY or EAGER).
  • Scalar attributes can be annotated with @Basic (javax.persistence.Basic), which also has a fetch attribute that accepts a FetchType value as mentioned above.
However, Hibernate and Eclipselink have a prerequisite to fully leverage these features: bytecode instrumentation, also known as weaving. As the name suggests, this modifies the class files of entities, during deployment or at runtime, to extend their behavior and operability. Being able to configure fetch latency is only one of these extensions.

There are different ways to do bytecode instrumentation; just refer to related documentation for your provider: Hibernate or Eclipselink. Additionally, Java SE code can be run with weaving by simply adding a javaagent to the run options. Check this post for information.

Both Hibernate and Eclipselink make use of indirection when it comes to lazy relationships. Simply put, when an entity is retrieved, its relationships that are configured to be lazy will be filled in with a proxy object that knows how to initialize the relationship it proxies. Hibernate has more configurations when it comes to this proxy.
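The idea of indirection can be sketched in a few lines of plain Java. This is only an illustration of the concept, not how either provider implements its proxies; LazyHolder and its methods are hypothetical names, and the Supplier stands in for "whatever query would load the association".

```java
import java.util.function.Supplier;

// Hypothetical illustration of indirection; not the providers' actual proxy classes.
final class LazyHolder<T> {
    private final Supplier<T> loader; // knows how to initialize the relationship
    private T value;
    private boolean initialized;

    LazyHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    boolean isInitialized() {
        return initialized;
    }

    // The first access triggers the load; later accesses reuse the loaded value.
    T get() {
        if (!initialized) {
            value = loader.get();
            initialized = true;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        LazyHolder<String> info = new LazyHolder<>(() -> {
            loads[0]++; // pretend a query is executed here
            return "PersonalInfo#7";
        });
        System.out.println(info.isInitialized()); // false: nothing loaded yet
        System.out.println(info.get());           // PersonalInfo#7
        System.out.println(info.get());           // PersonalInfo#7
        System.out.println(loads[0]);             // 1: loaded only once
    }
}
```

In this sketch the owning entity would hold a LazyHolder in place of the association, so retrieving the owner costs nothing extra until get() is first called.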

Finally, it is also worth noting that Hibernate does not actually need bytecode instrumentation to configure latency for associations, but it does for scalar fields.

Fetch Strategy


Phew! That was a lot of primer information! Finally, we can get our hands on some code!

Because I'll be demonstrating code and showing logs, here is an ERD for reference:




Pardon the unrealistic modeling of employee information; my aim is to demonstrate fetch behavior in certain cases.

Fetching Single-Valued Associations


Single-valued associations are your toOne mappings.

Eclipselink


The default fetch behavior for these relationships in Eclipselink is to execute a separate query to initialize the association. Thus, referring to the ERD, if we were to query for an Employee, it would take another query to fetch its PersonalInfo and yet another to fetch the PersonalInfoExt, for a total of three queries.

This is true for both JPQL and EntityManager methods.

Obviously, this default is not even close to optimal as database trips are usually costly. On the other hand, Hibernate already automatically joins these for us, with extra behavior if a JPQL query is used.

Hibernate: Acquisition by JPQL


For JPQL queries, Hibernate executes an isolated fetch for the entity first, then initializes its eager associations and applies joins whenever it can.

For the following subsection of the ERD:


With everything left eager, a JPQL query of

SELECT o FROM Employee o
would execute an initial query true to its JPQL statement:

select
        employee0_.ID as ID1_0_,
        employee0_.INFO_ID as INFO_ID4_0_,
        employee0_.NAME as NAME2_0_,
        employee0_.VERSION as VERSION3_0_ 
    from
        EMPLOYEES employee0_
Then, for each Employee with PersonalInfos to initialize, it follows up with these queries:

select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_,
        personalin1_.ID as ID1_2_1_,
        personalin1_.ADDRESS as ADDRESS2_2_1_,
        personalin1_.CONTACT_NUM as CONTACT_3_2_1_ 
    from
        PERSONAL_INFO personalin0_ 
    left outer join
        PERSONAL_INFO_EXT personalin1_ 
            on personalin0_.EXT_ID=personalin1_.ID 
    where
        personalin0_.ID=?
Hibernate makes a nice move in keeping true to what the JPQL defines. This behavior splits the querying into two parts, where the initial query is the one that follows the JPQL; let us call this the main query. The supporting queries that initialize eager relationships then seemingly execute as per the provider's find implementation.

This behavior seems to aim to hand as much control as it can to the JPQL, because it is what is most visible to the developer - so that little is left to implicit default behavior.

This is also where configuring all associations to be lazy comes in; when we write the JPQL to fetch everything that we need, there's no need to worry about attempts to fetch anything else.

Hibernate: Acquisition by Find

For entity manager's find, fetch control is completely handed over to defaults and configurations - no more splitting queries.

With the same section of the ERD and everything still eager, an EntityManager.find executes the following query:

Hibernate: 
    select
        employee0_.ID as ID1_0_0_,
        employee0_.INFO_ID as INFO_ID4_0_0_,
        employee0_.NAME as NAME2_0_0_,
        employee0_.VERSION as VERSION3_0_0_,
        personalin1_.ID as ID1_1_1_,
        personalin1_.AGE as AGE2_1_1_,
        personalin1_.EMAIL as EMAIL3_1_1_,
        personalin1_.EXT_ID as EXT_ID4_1_1_,
        personalin2_.ID as ID1_2_2_,
        personalin2_.ADDRESS as ADDRESS2_2_2_,
        personalin2_.CONTACT_NUM as CONTACT_3_2_2_ 
    from
        EMPLOYEES employee0_ 
    left outer join
        PERSONAL_INFO personalin1_ 
            on employee0_.INFO_ID=personalin1_.ID 
    left outer join
        PERSONAL_INFO_EXT personalin2_ 
            on personalin1_.EXT_ID=personalin2_.ID 
    where
        employee0_.ID=?

With unconfigured fetching, Hibernate is able to fetch all of it in a single query. It also becomes apparent that outer join is the default fetch behavior for eager associations. Perhaps we can say that Hibernate follows configuration by exception more.

Of course, this doesn't mean one is better than the other. Eclipselink just handles it differently, and it probably doesn't want to do anything fancy unless told so.

Configuring Single-Valued Associations


From here on out, let us assume that all associations are configured to be lazy.

Naturally, there are ways to augment fetch behavior:

1. JPA: JPQL Join Fetching


In runtime or named-query JPQL, LEFT JOIN FETCH or INNER JOIN FETCH clauses can be declared after the FROM clause, and before the WHERE clause (if any). A sample JPQL statement follows:

SELECT o FROM Employee o LEFT JOIN FETCH o.info
Having the configuration in the JPQL itself, the main query follows suit. Hibernate's execution follows:

select
        employee0_.ID as ID1_0_0_,
        personalin1_.ID as ID1_1_1_,
        employee0_.INFO_ID as INFO_ID4_0_0_,
        employee0_.NAME as NAME2_0_0_,
        employee0_.VERSION as VERSION3_0_0_,
        personalin1_.AGE as AGE2_1_1_,
        personalin1_.EMAIL as EMAIL3_1_1_,
        personalin1_.EXT_ID as EXT_ID4_1_1_ 
    from
        EMPLOYEES employee0_ 
    left outer join
        PERSONAL_INFO personalin1_ 
            on employee0_.INFO_ID=personalin1_.ID 
    where
        employee0_.ID=?

Eclipselink behaves the same way.

It is also worth noting, though probably obvious, that join fetching an association overrides it to be eager.

There is a difference between these two providers, however.

Hibernate

In my tests, Hibernate did not allow left join fetching beyond the immediate associations of the root entity.

Perhaps this is meant to make us reconsider: we did the join because we need it, but could there have been a better way to model things instead? I don't think it's worth the trouble anyway, so I'm not really in favor of this limitation.

Eclipselink

On the other hand, Eclipselink allows deeper fetch joins, as this behavior also aligns with its internals and availability of more dynamic configuration options.

Using join fetching is certainly nice and standard. However, baking fetch joins into JPQL queries may make those queries specialized (unless the queries are declared right where they are used), especially if they are NamedQueries. NamedQueries are nice because, with most IDEs, you can already get early warnings during development. On the other hand, increasing the metadata for an entity class file is not very nice. This concern will be discussed later.

In any case, NamedQueries only contribute little to metadata clutter.

2. Eclipselink and Hibernate: Fetch Annotations


Without a standard way to include fetch configurations in metadata, each provider has specific annotations instead.

For the examples, let us refer to the same section of the ERD:


Hibernate

Hibernate offers the @Fetch annotation, and it accepts a FetchMode value which has three options: join, select, and subselect. 

FetchMode: Join

Recall the entity acquisition examples where everything was left at its default. It was mentioned that eager associations used an outer join; this FetchMode value represents that behavior.

Now that all the associations have been made lazy, let's try it out! We annotate Employee's association to PersonalInfo with @Fetch using join:

    @JoinColumn(name = "INFO_ID")
    @Fetch(value = org.hibernate.annotations.FetchMode.JOIN)
    @OneToOne(fetch = javax.persistence.FetchType.LAZY)
    private PersonalInfo info;
Now let's try out acquiring the Entity by a JPQL query SELECT o FROM Employee o:

    Hibernate: 
    select
        employee0_.ID as ID1_0_,
        employee0_.INFO_ID as INFO_ID4_0_,
        employee0_.NAME as NAME2_0_,
        employee0_.VERSION as VERSION3_0_ 
    from
        EMPLOYEES employee0_

The association was not joined. Recall the earlier acquisition by JPQL example once more, where I mentioned that the main query kept true to the JPQL. This is the reason for that.

However, even if all associations were declared lazy, the fetching did not finish; separate queries were executed to fetch PersonalInfo objects:

Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?
Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?

Two of the results had values for their info associations, and they were loaded right away.

Needless to say, using EntityManager.find will successfully join the association.


Since eager associations are configured to use join by default, the only explanation is that configuring a FetchMode of join on a lazy association coerces it to eager. This makes sense, since joins cannot be delayed. Furthermore, if we annotate deeper into PersonalInfo's association with PersonalInfoExt, it also gets included in the join.

However, this undoes our precaution of making all associations lazy.

On another note involving eager join fetching: since the joins can go indefinitely deep, it may be of interest to limit the depth. This can be done by setting the hibernate.max_fetch_depth property in the persistence.xml file to a reasonable value (the Hibernate documentation recommends values between 0 and 3).


FetchMode: Select

The Hibernate 5.2 Javadoc specifies this mode to "use a secondary select for each individual entity, collection, or join load". Sounds like the default that Eclipselink has.

A quick test of using it on a lazy single-valued association showed that it doesn't make the association eager, so let's try explicitly making the relationship eager:

    @JoinColumn(name = "INFO_ID")
    @Fetch(value = FetchMode.SELECT)
    @OneToOne(fetch = FetchType.EAGER)
    private PersonalInfo info;
It had the following results:

Hibernate: 
    select
        employee0_.ID as ID1_0_,
        employee0_.INFO_ID as INFO_ID4_0_,
        employee0_.NAME as NAME2_0_,
        employee0_.VERSION as VERSION3_0_ 
    from
        EMPLOYEES employee0_
Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?
Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?

We find that it executes three queries: one for the Employees, and two for those two employees who have PersonalInfo values. Surprisingly, it does not override any fetch latency configuration.

There's nothing too special with this configuration involving single-valued associations. This actually performs worse than joins.

FetchMode: Subselect

According to Hibernate 5.2 javadocs, this option is "available to collections" only.

Eclipselink

Eclipselink provides two ways to configure fetching: @JoinFetch and QueryHints.

@JoinFetch

Let's get right down to it and stick it to the PersonalInfo association:

    @JoinColumn(name = "INFO_ID")
    @JoinFetch(JoinFetchType.OUTER)
    @OneToOne(fetch = FetchType.LAZY)
    private PersonalInfo info;
Regardless of acquisition method, if we query this by ID, the following statement is executed:

SELECT t1.ID, t1.NAME, t1.VERSION, t1.INFO_ID, t0.ID, t0.AGE, t0.EMAIL, t0.EXT_ID 
FROM EMPLOYEES t1 
LEFT OUTER JOIN PERSONAL_INFO t0 ON (t0.ID = t1.INFO_ID) 
WHERE (t1.ID = ?)
Predictably, this also coerced the association to become eager. What is interesting is that the available JoinFetchType values distinguish between inner and outer joins. Unfortunately, the inner join option did not work when I was experimenting; perhaps it's a bug or I'm missing something.

As another experiment, let us also apply this annotation to PersonalInfo's association with PersonalInfoExt:

    @JoinColumn(name = "EXT_ID")
    @OneToOne(fetch = javax.persistence.FetchType.LAZY)
    @JoinFetch(JoinFetchType.OUTER)
    private PersonalInfoExt moreInfo;
Again, regardless of acquisition method, we are greeted by the following query:

SELECT t1.ID, t1.NAME, t1.VERSION, t1.INFO_ID, t0.ID, t0.AGE, t0.EMAIL, t0.EXT_ID 
FROM EMPLOYEES t1 
LEFT OUTER JOIN PERSONAL_INFO t0 ON (t0.ID = t1.INFO_ID)
The join made was only for the first level. Even if the deeper association is also annotated, it did not get fetched. It appears that Eclipselink does not completely coerce it to eager, and it cannot cascade its join beyond one level. I've tried setting the deeper association to eager, but it was loaded in a separate query.

This kind of draws a parallel between the limitations of Hibernate's JPQL joins and Eclipselink's annotation joins.

We can then only successfully join the PersonalInfoExt association if our query root is PersonalInfo:

SELECT t1.ID, t1.AGE, t1.EMAIL, t1.EXT_ID, t0.ID, t0.ADDRESS, t0.CONTACT_NUM 
FROM PERSONAL_INFO t1 
LEFT OUTER JOIN PERSONAL_INFO_EXT t0 ON (t0.ID = t1.EXT_ID) 
WHERE (t1.ID = ?)

3. QueryHints: FETCH and LEFT_FETCH


Query hints are actually a standard JPA way to configure query behavior, and they are more granular than their entity annotation counterparts. Here are two ways to apply such hints:
  • Part of NamedQuery declaration
    • 
      @NamedQuery(name = "findEmployee", query = "SELECT o FROM Employee o",
           hints =
             { @QueryHint(name = QueryHints.LEFT_FETCH, value = "o.info.moreInfo"),
               @QueryHint(name = QueryHints.READ_ONLY, value = "true") })
      
  • Runtime
    • 
      TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o", Employee.class);
      query.setHint(QueryHints.LEFT_FETCH, "o.info.moreInfo");
      query.setHint(QueryHints.READ_ONLY, HintValues.TRUE);

      //or when using em operations:
      Map<String, Object> hints = new HashMap<>();
      hints.put(QueryHints.LEFT_FETCH, "o.info.moreInfo");
      hints.put(QueryHints.READ_ONLY, HintValues.TRUE);
      Employee emp = em.find(Employee.class, 3L, hints);
      

Eclipselink is a provider that allows extensive configuration at query time, thanks to the abundance of hints it provides.

The FETCH and LEFT_FETCH query hint names are constants from the org.eclipse.persistence.config.QueryHints class. They provide the behavior of the INNER and OUTER modes of @JoinFetch, respectively, but can cascade the join beyond the immediate association.

There is little need for an example, so here are some descriptions and considerations instead:
  • It joins all specified relationships into a single query.
    • It can specify associations up to any depth.
  • This hint can be applied multiple times, with each time building on top of the current configuration.
  • If one left fetch hint is a prefix of another, then the shorter one is not needed;
    • e.g. if LEFT_FETCH on "o.info.moreInfo" is specified, there is no need to apply one for "o.info".
  • The value passed should begin with a fetch alias, such as "o.".
    • i.e. The first traversal segment, such as in "o.info.moreInfo" is treated as an alias, so passing in "info.moreInfo" will cause the configuration to fail.
    • This fetch alias can be any arbitrary string, though "o." is usually used.
It can be said that the extensiveness of Eclipselink's LEFT JOIN FETCH in JPQL is attributed to its internals that are able to heavily augment the behavior of many provisions during query time.
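The rule about redundant shorter hints can be sketched as a small helper in plain Java. FetchPathPruner and prune are hypothetical names made up for this example, and this is not something Eclipselink requires or does for us; it only shows which paths would be worth passing as LEFT_FETCH values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; it only decides which fetch paths are worth declaring.
final class FetchPathPruner {

    // Returns the paths that are not already implied by a longer path.
    static List<String> prune(List<String> paths) {
        Set<String> unique = new LinkedHashSet<>(paths); // drop exact duplicates, keep order
        List<String> kept = new ArrayList<>();
        for (String candidate : unique) {
            boolean implied = false;
            for (String other : unique) {
                // Compare on a segment boundary so "o.info" does not match "o.infoExtra".
                if (other.startsWith(candidate + ".")) {
                    implied = true;
                    break;
                }
            }
            if (!implied) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList("o.info", "o.info.moreInfo", "o.beneficiaries");
        System.out.println(prune(requested)); // [o.info.moreInfo, o.beneficiaries]
    }
}
```

Each remaining path would then be applied with its own setHint call, in the same way as the runtime example earlier in this section.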

Configuring Multivalued/Collection Associations


For this section we will use the collections part of the ERD:


JPA defaults these associations, our toMany annotations, to lazy, which only makes sense. Now let's explore how it fetches.

Because initializing these relationships is deferred to usage time, we have to call the corresponding collection's get(something) or isEmpty() method to trigger initialization (usually within a transaction or as long as the entity isn't detached).

Let us then get an Employee with three Beneficiaries using EntityManager.find, after which we call isEmpty() on its beneficiaries collection:

Employee emp = em.find(Employee.class, 1L);
emp.getBeneficiaries().isEmpty();
Throughout the method, the following queries are executed by Hibernate:

Hibernate: 
    select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=?

Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.AGE as AGE2_0_1_,
        beneficiar0_.INFO as INFO4_0_1_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.NAME as NAME3_0_1_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?

This time, let's run it with Eclipselink:

SELECT ID, NAME, VERSION, INFO_ID FROM EMPLOYEES WHERE (ID = ?)
 bind => [1]

SELECT ID, AGE, NAME, EMPLOYEE_ID, INFO FROM BENEFICIARIES WHERE (EMPLOYEE_ID = ?)
 bind => [1]
The ERD shows that Beneficiary has another collection; if we access that, another query is executed:

Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ADDRESS as ADDRESS2_2_1_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_1_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID=?
Eclipselink:
    SELECT ID, ADDRESS, CONTACT_NUM, BENEFICIARY_ID 
    FROM CONTACT_INFO 
    WHERE (BENEFICIARY_ID = ?)
As a side note, if we make these two multivalued associations eager, Eclipselink remains the same, while Hibernate executes the following query:

select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_,
        beneficiar1_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar1_.ID as ID1_0_1_,
        beneficiar1_.ID as ID1_0_2_,
        beneficiar1_.AGE as AGE2_0_2_,
        beneficiar1_.INFO as INFO4_0_2_,
        beneficiar1_.EMPLOYEE_ID as EMPLOYEE5_0_2_,
        beneficiar1_.NAME as NAME3_0_2_,
        contactinf2_.BENEFICIARY_ID as BENEFICI4_2_3_,
        contactinf2_.ID as ID1_2_3_,
        contactinf2_.ID as ID1_2_4_,
        contactinf2_.ADDRESS as ADDRESS2_2_4_,
        contactinf2_.BENEFICIARY_ID as BENEFICI4_2_4_,
        contactinf2_.CONTACT_NUM as CONTACT_3_2_4_ 
    from
        EMPLOYEES employee0_ 
    left outer join
        BENEFICIARIES beneficiar1_ 
            on employee0_.ID=beneficiar1_.EMPLOYEE_ID 
    left outer join
        CONTACT_INFO contactinf2_ 
            on beneficiar1_.ID=contactinf2_.BENEFICIARY_ID 
    where
        employee0_.ID=?
Surprisingly, Hibernate also defaults multivalued associations to join when they are made eager. Of course, we shouldn't let this happen, as the result behaves like a cartesian product: every row repeats the parent's columns, and the duplicate information multiplies with each collection that gets joined. This can be quickly remedied by specifying a FetchMode other than join for the association.
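To get a feel for the duplication, the number of rows returned by the joined query for one Employee can be estimated with plain arithmetic. The sketch below is my own back-of-the-envelope model of the two left outer joins shown in the log; JoinRowCount and joinedRows are hypothetical names.

```java
// Hypothetical model of the result size of
// EMPLOYEES left join BENEFICIARIES left join CONTACT_INFO for a single employee.
final class JoinRowCount {

    // contactsPerBeneficiary[i] = number of CONTACT_INFO rows of the i-th beneficiary
    static int joinedRows(int[] contactsPerBeneficiary) {
        if (contactsPerBeneficiary.length == 0) {
            return 1; // a left outer join still yields the employee row itself
        }
        int rows = 0;
        for (int contacts : contactsPerBeneficiary) {
            // A beneficiary without contacts still occupies one row.
            rows += Math.max(1, contacts);
        }
        return rows;
    }

    public static void main(String[] args) {
        // One employee whose three beneficiaries have 2, 0, and 4 contacts.
        System.out.println(joinedRows(new int[] {2, 0, 4})); // 7
    }
}
```

In that example the Employee columns are sent back seven times and each Beneficiary's columns once per contact, which is the duplicate information being warned about.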

Going back to default behavior, the standard is that the query for initializing a collection is one that selects using the key references to the immediate owning entity. Hence, initializing the Beneficiaries collection uses the Employee's ID, and initializing the ContactInfo collection uses the Beneficiary's ID.

Now imagine that all the associations are lazy. If we find an Employee and iterate through its Beneficiaries to get information from an association of each one (say, its ContactInfo collection or its BeneficiaryInfo single-valued association), we would have to execute an extra query to get the Beneficiaries, and another query for each Beneficiary's association. This is the famous n+1 queries problem (the query to get the Beneficiaries is the "1", followed by "n" queries, one for each Beneficiary's association; yes, this can also occur with a Beneficiary's single-valued associations, or basically anything that is separately queried for).


This algebraic representation is actually an understatement in many cases, as the "n" part can usually be expanded further (even recursively). If we had started with a query for multiple Employees and needed to go as deep as the contacts, every level would add its own round of queries.

Again, the association accessed from Beneficiary doesn't have to be multivalued for it to become a problem.
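The way the query count grows can be tallied with a few lines of plain Java. This is a sketch under the assumption that everything is lazy, nothing is cached, and every collection is accessed; LazyQueryCount and queries are hypothetical names.

```java
// Hypothetical tally of queries when starting from a query for multiple Employees
// and walking down to each Beneficiary's ContactInfo with everything lazy.
final class LazyQueryCount {

    // beneficiariesPerEmployee[i] = number of beneficiaries of the i-th employee
    static int queries(int[] beneficiariesPerEmployee) {
        int count = 1; // the main query that finds the employees
        for (int beneficiaries : beneficiariesPerEmployee) {
            count += 1;             // initialize this employee's beneficiaries collection
            count += beneficiaries; // initialize each beneficiary's contactInfo collection
        }
        return count;
    }

    public static void main(String[] args) {
        // Two employees with 3 and 2 beneficiaries: 1 + (1 + 3) + (1 + 2)
        System.out.println(queries(new int[] {3, 2})); // 8
    }
}
```

Even in this tiny example, a single JPQL statement turns into eight database trips, and the number keeps growing with every additional level of traversal.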

Of course, providers are aware of this problem and offer ways to avoid or at least alleviate the problem.

Hibernate


FetchMode: Subselect

Let's start off with the mystery third option of Hibernate's @Fetch annotation: subselect, which is only available to collection associations. Documentation explains it to "use a subselect query to load the additional collections".

Let us test it out on Employee's association with Beneficiary:

    @OneToMany(mappedBy = "employee")
    @Fetch(FetchMode.SUBSELECT)
    private List<Beneficiary> beneficiaries;
If we find a single Employee and then access its beneficiaries collection, the following query is executed:

Hibernate:    
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.AGE as AGE2_0_1_,
        beneficiar0_.INFO as INFO4_0_1_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.NAME as NAME3_0_1_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?
It doesn't look like anything changed. In fact, this is the same even if we use FetchMode.SELECT instead. This time, let's also annotate Beneficiary's (multivalued) association with ContactInfo:

    @OneToMany(mappedBy = "beneficiary")
    @Fetch(FetchMode.SUBSELECT)
    private List<ContactInfo> contactInfo;
Accessing the beneficiaries collection still executes the same query. However, when we access the contactInfo collection in beneficiaries, the following query is executed:

Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            select
                beneficiar0_.ID 
            from
                BENEFICIARIES beneficiar0_ 
            where
                beneficiar0_.EMPLOYEE_ID=?
        )
This time, we find that the contactInfo collection was initialized using an IN clause, where the associated Beneficiary IDs were specified using a subquery that selects by the Employee ID! Furthermore, because all of the beneficiaries of the specific employee were included, their contactInfo collections were initialized as well - even if we only accessed that of the first!

As another test, let's acquire multiple Employees using the JPQL SELECT o FROM Employee o, and then access both collections. This results in the following queries:

Hibernate: 
    select
        employee0_.ID as ID1_3_,
        employee0_.INFO_ID as INFO_ID4_3_,
        employee0_.NAME as NAME2_3_,
        employee0_.VERSION as VERSION3_3_ 
    from
        EMPLOYEES employee0_
Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID in (
            select
                employee0_.ID 
            from
                EMPLOYEES employee0_
        )
Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            select
                beneficiar0_.ID 
            from
                BENEFICIARIES beneficiar0_ 
            where
                beneficiar0_.EMPLOYEE_ID in (
                    select
                        employee0_.ID 
                    from
                        EMPLOYEES employee0_
                )
            )
With the first query being the main query that actually finds the Employees, we find that the other two queries (which respectively initialize Beneficiaries and their ContactInfo) base their subselects on the main query. This is, then, the behavior of FetchMode.SUBSELECT. Also, notice that as the associations go deeper, the subselects begin to nest.
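The nesting seen in the logs can be mimicked with simple string building. This is only a sketch of the shape of the WHERE clauses, not Hibernate's SQL generation; SubselectSketch and childRestriction are hypothetical names, and the table and column names are taken from the logs above.

```java
// Hypothetical illustration of how each level wraps its parent's restriction in a subselect.
final class SubselectSketch {

    // Restricts a child table by the IDs of the parent rows matched by parentRestriction.
    static String childRestriction(String fkColumn, String parentTable, String parentRestriction) {
        String where = parentRestriction.isEmpty() ? "" : " where " + parentRestriction;
        return fkColumn + " in (select ID from " + parentTable + where + ")";
    }

    public static void main(String[] args) {
        // Main query "SELECT o FROM Employee o" has no restriction of its own.
        String beneficiaries = childRestriction("EMPLOYEE_ID", "EMPLOYEES", "");
        String contacts = childRestriction("BENEFICIARY_ID", "BENEFICIARIES", beneficiaries);
        System.out.println(beneficiaries);
        System.out.println(contacts);
    }
}
```

Passing "EMPLOYEE_ID=?" as the parent restriction instead reproduces the shape of the single-Employee case, where the ContactInfo query is restricted by a subselect on BENEFICIARIES filtered by the Employee's ID.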

This certainly helps out with our n+1 problem, but sometimes we don't really need all of the data - we just want to avoid querying separately for each of the values we come across.

WELL, TOUGH LUCK BECAUSE WE CAN'T HAVE EVERYTHING. This doesn't mean that we'll have nothing.

BatchSize

Hibernate gives us the @BatchSize annotation. Because it requires a size argument, it means that it's up to us to guess how big the batch sizes should be. It's our compromise between being able to query for more than one result at a time and undesirably having a query return more data than can be processed.

Let us try it out before we get into the details. We annotate both collection relationships with @BatchSize while keeping the @Fetch annotation in subselect mode:

In Employee--------
    @OneToMany(mappedBy = "employee")
    @Fetch(FetchMode.SUBSELECT)
    @BatchSize(size = 3)
    private List<Beneficiary> beneficiaries;
In Beneficiary-----
    @OneToMany(mappedBy = "beneficiary")
    @Fetch(FetchMode.SUBSELECT)
    @BatchSize(size = 3)
    private List<ContactInfo> contactInfo;
If we search for a single Employee (and then access both collections), the following queries are executed:

Hibernate: 
    select
        employee0_.ID as ID1_3_,
        employee0_.INFO_ID as INFO_ID4_3_,
        employee0_.NAME as NAME2_3_,
        employee0_.VERSION as VERSION3_3_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=1
Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?
Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            select
                beneficiar0_.ID 
            from
                BENEFICIARIES beneficiar0_ 
            where
                beneficiar0_.EMPLOYEE_ID=?
        )
It seems like @BatchSize did nothing. If we fetch multiple Employees via JPQL, the succeeding queries still use the subselect based on the main query.

This time, let's remove the @Fetch annotation (or set it to FetchMode.SELECT, as it is the default for lazy associations). Upon execution and collection access, the following queries appear:

Hibernate: 
    select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=?
Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?
Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            ?, ?, ?
        )

There is now a difference in fetching, but only in fetching the ContactInfo collection of Beneficiary.

But the documentation said we could "define size for loading of collections or lazy entities"! It does not lie. A lot of people new to this annotation also get confused.

Batching can only happen where the owning entity was also retrieved as part of a collection or a result of a range query - if we queried for multiple Employees at the start, then the initialization of Beneficiaries would also have been batched. Essentially, Hibernate isn't worried about fetching a collection association of an entity that was queried for by itself - if we were worried, we could have executed a separate paginated query for that - it was worried about associations of entities from a collection.

Additionally, we can also batch fetch single-valued associations of entities we received in a collection. To do this, we annotate the entity at class level with @BatchSize. It would have made sense if they were at attribute level, but it didn't work when I tried it.

Having used a batch size of 3, the WHERE clause in initializing the ContactInfo collection used 3 Beneficiary IDs. Effectively, this reduces our n+1 problem into (n/batchSize)+1.
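
To make the (n/batchSize)+1 arithmetic concrete, here is a minimal sketch in plain Java. It is not Hibernate's actual batching code; the class and method names are made up for illustration, and it assumes that every batch is filled to the configured size before a new one is started. Hibernate may split batches differently depending on its configuration, so real query counts can be somewhat higher.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative arithmetic only; hypothetical helper, not part of Hibernate. */
final class BatchFetchMath {

    /** Splits owner IDs into chunks of at most batchSize, one chunk per IN (...) query. */
    static List<List<Long>> partition(List<Long> ownerIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ownerIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ownerIds.size());
            batches.add(new ArrayList<>(ownerIds.subList(start, end)));
        }
        return batches;
    }

    /** One query for the owners plus one query per batch of collections. */
    static int totalQueries(int ownerCount, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Ceiling division: a partially filled last batch still costs one query.
        int collectionQueries = (ownerCount + batchSize - 1) / batchSize;
        return 1 + collectionQueries;
    }
}
```

With 10 owning entities and a batch size of 3, this gives 4 batched queries plus the initial query, instead of the 11 queries of the plain n+1 case.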

Extra Lazy Collections

On the other hand, if only a handful (or fewer) of the entries in a large collection are required, then extra lazy collections might be important. Honestly, there are other ways to go about this, like pagination, and this method still pretty much suffers from the n+1 problem.

In any case, this is an interesting feature. It can immediately work on the collection at hand (though it has to be an entity association, and not an initial query with multiple results), but it has some nasty prerequisites:
  1. The target entity collection association offers a column with dense numeric values with which to order the collection.
    1. The collection association would then be annotated with @javax.persistence.OrderColumn(name="nameOfIndexCol")
  2. The collection association is annotated with @org.hibernate.annotations.ListIndexBase(<int value>), where the int value represents the start value of the order column in the target entity. This is so we don't have to prefetch any data before we access elements of the collection. When accessing the actual list, it still starts at 0.
Let us quickly satisfy these requirements on Employee's association with Beneficiary:

    @OneToMany(mappedBy = "employee")
    @LazyCollection(LazyCollectionOption.EXTRA)
    @ListIndexBase(1)
    @OrderColumn(name = "ID")
    private List<Beneficiary> beneficiaries;
It might be worth noting that configuring to extra lazy collection takes priority over FetchMode.SUBSELECT, but not over FetchMode.JOIN.

Anyway, if we find a single Employee and iterate through the Beneficiary collection at list indices 0 and 1, the following queries are executed:

--the initial query
Hibernate: 
    select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=?

--querying for the entry at list index 0 (order column value 1)
Hibernate: 
    select
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=? 
        and beneficiar0_.ID=?

--querying for the entry at list index 1 (order column value 2)
Hibernate: 
    select
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=? 
        and beneficiar0_.ID=?
Well, it works. The configuration causes the WHERE clause in the entry-specific queries to have two conditions: one that references the owning entity, and one that matches against the order column.
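
The relationship between the zero-based list position and the value matched against the order column can be sketched as follows. This is only an illustration of the offset arithmetic implied by @ListIndexBase, with hypothetical class and method names; it is not Hibernate's internal code.

```java
/** Illustrative only: how a zero-based list position relates to the stored order column value. */
final class ListIndexMapping {

    /** Value looked up in the order column for a given list position. */
    static long toOrderColumnValue(int listIndex, int listIndexBase) {
        return (long) listIndex + listIndexBase;
    }

    /** List position for a value read from the order column. */
    static int toListIndex(long orderColumnValue, int listIndexBase) {
        return Math.toIntExact(orderColumnValue - listIndexBase);
    }
}
```

With @ListIndexBase(1), accessing list indices 0 and 1 corresponds to order column values 1 and 2, which is what the two entry-specific queries above bind in their second condition.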

As a final note, it might be worth knowing that in these "smarter" collections, operations like size() and contains() will no longer trigger initialization, though some operations (like isEmpty()) may execute a COUNT or MAX query so the collection "knows" if it has contents.

Eclipselink

@BatchFetch

Eclipselink provides the @BatchFetch annotation, which can be considered a combination of Hibernate's @Fetch(FetchMode.SUBSELECT) and @BatchSize() through its size parameter and its three available types: JOIN (the default), EXISTS, or IN. The size parameter only works for the IN type, however.

First, let's try out JOIN on Employee's Beneficiaries association:

    @OneToMany(mappedBy = "employee")
    @BatchFetch(value = BatchFetchType.JOIN)
    private List<Beneficiary> beneficiaries;
For some reason, weaving fails if we leave @BatchFetch without its value attribute, which should have defaulted to JOIN. Anyway, let us use the JPQL SELECT o FROM Employee o, and then access a beneficiaries collection:

    SELECT ID, NAME, VERSION, INFO_ID FROM EMPLOYEES

    SELECT t0.ID, t0.AGE, t0.NAME, t0.EMPLOYEE_ID, t0.INFO FROM BENEFICIARIES t0, EMPLOYEES t1 WHERE (t0.EMPLOYEE_ID = t1.ID)
Of course, the first query is the one to actually get the Employees; the second query has the batching. In this case, we find that the selection for the beneficiaries joins on the Employee ID, taking after the original query. To prove this, let's add a pretty useless WHERE clause condition, o.name <> 'some_name'. The secondary query then becomes the following:

    SELECT t0.ID, t0.AGE, t0.NAME, t0.EMPLOYEE_ID, t0.INFO FROM BENEFICIARIES t0, EMPLOYEES t1 WHERE ((t0.EMPLOYEE_ID = t1.ID) AND (t1.NAME <> ?))
 bind => [some_name]
If we try to access the deeper collection that is the association of Beneficiary to ContactInfo, the following query is executed:

    SELECT t0.ID, t0.ADDRESS, t0.CONTACT_NUM, t0.BENEFICIARY_ID 
    FROM CONTACT_INFO t0, EMPLOYEES t2, BENEFICIARIES t1 
    WHERE ((t0.BENEFICIARY_ID = t1.ID) AND ((t1.EMPLOYEE_ID = t2.ID) AND (t2.NAME <> ?)))
 bind => [some_name]
We find that the WHERE clause conjoins with the main query. Worry not; only the necessary data appears in the SELECT clause because the joining only affects the condition.

Essentially, this partitions the initialization the same way as Hibernate's subselect fetching, except that the joins widen as associations nest, whereas in subselect, the subselections nest.

This time, let us try using the EXISTS option, but still executing the same JPQL SELECT o FROM Employee o WHERE o.name <> 'some_name'. Skipping the main query, the following query executes when we access an element of beneficiaries:

    SELECT t0.ID, t0.AGE, t0.NAME, t0.EMPLOYEE_ID, t0.INFO 
    FROM BENEFICIARIES t0 
    WHERE EXISTS 
        (SELECT t1.ID FROM EMPLOYEES t1 WHERE ((t1.NAME <> ?) AND (t0.EMPLOYEE_ID = t1.ID))) 
Then let's access a beneficiary's contactInfo collection:

    SELECT t0.ID, t0.ADDRESS, t0.CONTACT_NUM, t0.BENEFICIARY_ID 
    FROM CONTACT_INFO t0 
    WHERE EXISTS 
      (SELECT t1.ID FROM BENEFICIARIES t1 
       WHERE (EXISTS 
         (SELECT t2.ID FROM EMPLOYEES t2 WHERE ((t2.NAME <> ?) AND (t1.EMPLOYEE_ID = t2.ID)))  AND (t0.BENEFICIARY_ID = t1.ID))) 
Again, this initializes the same amount of data, but in a different way. This time, it's nested EXISTS subqueries.

Finally, I was supposed to demonstrate the IN option, but its behavior is pretty much exactly like Hibernate's @BatchSize when given a size argument. The only difference is that Hibernate does not have a way to execute a raw IN batch query unless @BatchSize is used.

As an important note, the size argument in the @BatchFetch annotation does not seem to work; I had to apply the size as a query hint instead.

QueryHints: BATCH and BATCH_TYPE

With little need for description, quick information about these query hint counterparts of @BatchFetch follows:
  • QueryHints.BATCH accepts the association traversal string to batch.
    • The association string, much like LEFT_FETCH, should begin with an arbitrary alias.
  • QueryHints.BATCH_TYPE defines the type of batching used for the associations to be batched.
    • It accepts the strings "JOIN", "EXISTS", or "IN", or their BatchFetchType enum counterparts.
  • QueryHints.BATCH_SIZE defines the size of the IN clause for the IN batch type.

Finally, the most interesting feature of Eclipselink's batching toolset is this: it can also be applied to single-valued associations.

This is useful because Eclipselink does not yet define a way to join a single-valued association to something that is fetched by batch other than using the @JoinFetch annotation which we try to avoid. Since we can apply these batch configurations to single-valued associations as well, we are still able to optimize their fetching!


The following image summarizes the capabilities of some of these configuration options:


Configuring With Hibernate's FetchProfile


Unfortunately, I came across a StackOverflow entry with an answer that mentions that JPA 2.1 does not support FetchProfiles. This was my most anticipated configuration option for Hibernate.

I'll update this post once I test it on JPA 2.0; I still tried it (I'm using 2.1), but it didn't work in my experiments.

Configuring Scalar Attributes


These basic fields aren't associations, but they can be configured for optimization nonetheless. This time, we want to limit which fields we SELECT, as we were only able to specify associations previously (which still took all of the fields of the queried entities). Let's get this over with.

As mentioned near the beginning of this post, because we are dealing with scalar attributes, configuring them as lazy will not work unless bytecode instrumentation is done. This is true for both providers.

JPA: @Basic

JPA provides the @Basic annotation to be used on scalar types, should we desire to configure its fetch latency. Its fetch attribute accepts a FetchType value, either EAGER or LAZY, as used in the annotations @OneToOne, etc.

This augments the default behavior so that we have to access the attribute to have it loaded.

JPA (2.1): EntityGraph

JPA 2.1 brings the EntityGraph configuration. With the EntityGraph, we can define a contract that tells initialization which attributes to initialize for the entity. When applied this way, it is called a fetch graph.

There are two ways to define an EntityGraph. The first is to declare them with the @NamedEntityGraph annotation:

@Entity
@Table(name = "EMPLOYEES")
@NamedEntityGraphs({
                   @NamedEntityGraph(name = "testGraph",
                                     attributeNodes =
                                     { @NamedAttributeNode(value = "name"),
                                       @NamedAttributeNode(value = "info", subgraph = "info_email") },
                                     subgraphs =
                                     { @NamedSubgraph(name = "info_email",
                                                      attributeNodes = { @NamedAttributeNode(value = "email") }) })
    })
It certainly is a lot of code just to fetch a couple of attributes. One thing to note is that the info association uses the info_email subgraph (also declared in the same NamedEntityGraph). This subgraph further specifies that only the email attribute is to be fetched for that association. Anything annotated with @Id or @Version is always included, and cannot be excluded. Furthermore, because we declare the fields and associations we want right away, providers usually configure relevant associations to eager for this fetch scheme, and then make everything else lazy (this is true for Eclipselink; Hibernate has an issue with this, where eager associations are not made lazy).

If we declare an association without a subgraph, then all of its attributes and associations that are not lazy will be included (and cascade the fetch).

The EntityGraph can then be applied through the following code:

        EntityGraph<?> entityGraph = em.getEntityGraph("testGraph");
        
        //when using find:
        Map<String, Object> hints = new HashMap<>();
        hints.put("javax.persistence.fetchgraph", entityGraph);
        Employee emp = em.find(Employee.class, 1L, hints);
        
        //when using a query:
        TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o", Employee.class);
        query.setHint("javax.persistence.fetchgraph", entityGraph);
We can also build EntityGraphs at runtime:

        //creating an EntityGraph identical to the previous example
        EntityGraph<Employee> entityGraph = em.createEntityGraph(Employee.class);
        entityGraph.addAttributeNodes("name");
        Subgraph<?> infoSubGraph = entityGraph.addSubgraph("info");
        infoSubGraph.addAttributeNodes("email");

        //get a mutable copy of the named EntityGraph
        EntityGraph<?> namedGraphCopy = em.createEntityGraph("testGraph");
        //modifications...
Unfortunately, because using an EntityGraph as a fetchgraph coerces the associations to eager, it may be counterproductive to use with Hibernate as it automatically joins the associations in the graph.

An EntityGraph can also be applied as a LoadGraph. Simply replace the hints application with "javax.persistence.loadgraph". Using a loadgraph defines that nodes in the graph be "treated as FetchType.EAGER and attributes that are not specified are treated according to their specified or default FetchType" (from the JPA 2.1 spec). After some experimentation, I didn't achieve the expected behavior yet, so I won't expound much else.

JPA: JPQL Constructors

Another way to get only a subset of attributes is through using constructors in the JPQL query. The following is an example:

SELECT NEW com.test.Employee(o.id, o.name) FROM Employee o
Of course, this is very static behavior and prolific usage of this method bloats the codebase with multiple queries and constructors.
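
A JPQL constructor expression needs a class with a constructor whose parameters match the selected values in order and type. The sketch below uses a hypothetical projection class named EmployeeSummary rather than the Employee entity itself; the class name and the Long/String field types are assumptions. A separate class is used because, as far as I understand the JPA specification, entity instances created through SELECT NEW are not managed anyway.

```java
/**
 * Hypothetical projection class for a query such as
 * SELECT NEW com.test.EmployeeSummary(o.id, o.name) FROM Employee o
 * (the package and class name are assumptions for illustration).
 * Declared package-private here only to keep the sketch in one file;
 * a real projection class would normally be public.
 */
final class EmployeeSummary {
    private final Long id;
    private final String name;

    /** Parameter order and types must line up with the arguments of the NEW expression. */
    public EmployeeSummary(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }
}
```

Every distinct subset of attributes needs its own constructor (or its own class), which is where the bloat mentioned above comes from.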

Eclipselink: FetchGroup (QueryHints.FETCH_GROUP_NAME, FETCH_GROUP, and FETCH_GROUP_ATTRIBUTE)

The FetchGroup feature of Eclipselink works almost exactly like JPA's EntityGraph (when used as a fetchgraph) since it's Eclipselink's underlying implementation for it. In fact, the FetchGroup feature predates JPA 2.1. The main difference is that FetchGroup does not coerce associations to become eager. Sure, it makes everything not declared in it lazy, but it makes sure that when a lazy association it includes is initialized, only what is declared in the FetchGroup is loaded for the association's target.

Declaring a fetch group is much simpler (as it is nonstandard). The following fetchgroup is similar to the previous sample fetchgraph:

@Entity
@Table(name = "EMPLOYEES")
@FetchGroups({
             @FetchGroup(name = "testFetchGroup",
                         attributes = { @FetchAttribute(name = "name"), 
                                        @FetchAttribute(name = "info.email") })
    })
The named fetchgroup is then applied using the QueryHints.FETCH_GROUP_NAME hint:

        TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o WHERE o.id = 1", Employee.class);
        query.setHint(QueryHints.FETCH_GROUP_NAME, "testFetchGroup");
        Employee emp = query.getSingleResult();
A similar FetchGroup can also be made at runtime:

        TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o WHERE o.id = 1", Employee.class);
        FetchGroup fg = new FetchGroup();
        fg.addAttribute("name");
        fg.addAttribute("info.email");
        //pass the runtime FetchGroup instance itself, not the name of a declared one
        query.setHint(QueryHints.FETCH_GROUP, fg);
Notice that this time no alias is required. Also, if the passed attribute is an association, then, much like the EntityGraph, it includes the eager attributes and associations of that association.

We can also use the query hint FETCH_GROUP_ATTRIBUTE to declare attributes one at a time, calling query.setHint() multiple times.

Finally, in the case that we access attributes not declared in the FetchGroup, it will cause the entity to fully initialize itself, potentially throwing all our optimizations out the window. Of course, this problem is minimized if all associations have been made lazy, as only the entity that owns the attribute gets to be fully initialized.

Also, since Eclipselink allows multiple applications of some query hints, using default Maps to contain hints for entity manager methods will be problematic, as each key has to be unique. A custom Map implementation has to be used.

Other General Considerations


Adding metadata annotations for fetching strategies, configurations, or specialized queries in entities can quickly cause the code to bloat. Furthermore, it is debatable whether we should pollute model code with knowledge about the requirements of business methods or views. However, since these are directly related to the modeling of the entity, it can be said that this is a cross-cutting concern, as such configurations are heavily involved in both the model and its specific use case.

Personally, I favor moving such requirements as close as possible to where they are used. In cases where actual configuration cannot be stored as they are, I try to instead create tools or objects that are able to represent (and simplify, if possible) these configurations, and then use the tools to translate the objects into the desired configurations. Doing it this way makes a strong preference for granular runtime configurations, usually through applying hints at query level. Luckily, I'm using Eclipselink, and that's what Eclipselink advocates - not that I think it's better or anything.

Perhaps I've said this a couple of times throughout the post, but it's better if we make the associations lazy. This promotes a more fail-safe base configuration for when we breach contracts and limitations set by further configurations - such as when we access properties not declared in a FetchGroup that was used.

It is also usually worthwhile to think about whether entities should be well-associated and comprehensive (if they are left eager, they usually become the cause of optimization problems) or concise and modular. Considering this becomes a decision between reusable entities with more configuration and specialized entities with little configuration. A need for concise data may even call for using native queries instead; it comes down to the need for having managed entities that JPA can perform various operations upon - like locking.

Because JPA and its providers rightfully worry so much about collection associations, it is also worthwhile to think about whether it is safe (or even worth it) to make collection associations, or should operations just be performed on them without the association. Of course, it is a lot easier to make JPA handle these operations by making the associations.

Configuration Recommendations


As some solid takeaways (and out of all the mentioned configuration options), here are some recommendations, or at least points to ponder as there never really is a silver bullet:

  • It is a good idea to make associations lazy if they are not already.
  • If the entities are concise and only require single-valued associations, using JPQL FETCH JOINS can be sufficient.
  • Prefer runtime configuration over cross-cutting and cluttered metadata annotations.
  • For single-valued associations:
    • The easiest and risk-free option, though limited, is JPQL join fetching. The most useful is EntityGraph (as fetchgraph), but be careful to not include collection associations and their associations.
    • Eclipselink has LEFT_FETCH hints. 
  • For collection associations:
    • For Hibernate, use @Fetch(FetchMode.SUBSELECT) or @BatchSize() accordingly. For single-valued associations in entities that appear in a collection association, annotate the target entity (at class level) with @BatchSize to apply batching.
      • This can cause maintenance difficulties as it won't be immediately obvious that a relationship is batched until the entity is read.
    • Eclipselink has batch-related hints. This can avoid clutter and ill-placed annotations.
  • When it comes to scalar attributes, as the fields in an entity increase, and it is still desirable that they be reusable, it might now be advantageous to use EntityGraphs (as a fetchgraph, or FetchGroups for Eclipselink).
    • Using EntityGraphs as a loadgraph is still buggy for both providers.
    • Though using EntityGraphs can achieve deep join fetching with Hibernate, be cautious to avoid (or at least minimize) including collection associations, as they will end up being joined as well due to being coerced to eager, and they can form cartesian products.
      • Unfortunately, other than this, Hibernate does not offer a runtime-configurable way to perform deep joins (perhaps aside from FetchProfiles). Using @Fetch JOIN causes it to be eager, which is undesirable.


Summary


Phew! That was definitely a lot! If you read all that up to here, then I thank you for coming along on this journey.

Anyway, if you started here, then I completely understand that it would have been a lot of text to go through - even I would have had to gather strength to be willing to read something like this. In any case, please enjoy the summary.

Here are the important points discussed in this post:
  • Entity associations are declared with @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany.
  • JPA has two main ways to obtain managed entities: EntityManager operations (like find), and JPQL queries. 
    • Hibernate handles these two acquisition methods differently: if a JPQL query is used, then the query executed remains as true to the JPQL as possible - configuration annotations will not take effect on this main query.
    • Eclipselink treats these two equally, where configurations just build on top of each other. 
    • For latency, it is recommended that associations are all made lazy.
    • Everything else in this post mostly has to do with strategy.
  • There are two main facets of configuration when optimizing fetching: Fetch Latency, and Fetch Strategy.
    • Latency deals with when associations and fields are fetched - we differentiate between two modes: eager and lazy. An eager property gets loaded (i.e. queried for) during acquisition of the main entity/ies. A lazy property, on the other hand, only gets loaded upon use. JPA defaults toOne associations to eager, and toMany associations to lazy.
    • Strategy deals with how the associations are fetched - whether they be joined with the original query, or queried separately, but with different WHERE clauses.
  • Single-valued Associations
    • With their eager defaults, Eclipselink initializes them through separate queries, one for each association.
    • Hibernate automatically outer joins eager associations with the initial query.
    • There are ways to configure fetching single-valued associations:
      • JPQL LEFT/INNER JOIN FETCH
        • Simply add this clause between the FROM and WHERE clauses. Hibernate can only fetch a single level, while Eclipselink can fetch beyond.
      • Fetch Annotations
        •  Hibernate: @Fetch: JOIN, SELECT
          • The JOIN FetchMode causes the association to join (already default for eager associations), and coerces the association to become eager.
          • The SELECT FetchMode does not coerce associations to eager. It mimics Eclipselink's behavior of needing a single query for each association.
        • Eclipselink: @JoinFetch: INNER, OUTER 
          • Eclipselink can only join associations with this annotation for up to one level. The options only change between using an INNER or OUTER join.
      • Eclipselink: QueryHints: FETCH and LEFT_FETCH
        • Query hints are applied during NamedQuery definition or at query level. FETCH makes use of an inner join; LEFT_FETCH uses an outer join.
  • Multivalued Associations
    • Initializing these lazy-default associations requires accessing their collections. 
    • Eclipselink and Hibernate fetch them by using a separate query that finds with the key of the owning entity.
    • If made eager, Hibernate also uses a join. This is not the best option for collections.
    • The "n+1" problem happens when we separately initialize associations as we traverse a collection.
    • To solve - or at least alleviate - the problem, providers allow batch fetching.
      • Hibernate: @Fetch: SUBSELECT 
        • With subselect, Hibernate initializes the association for all members of the collection by nesting IN clauses based on the original query.
      • Hibernate: @BatchSize 
        • @BatchSize works on a collection association if it is not in JOIN FetchMode. FetchMode SELECT is the default for lazy associations.
        • This annotation specifies the size of the IN clause (which now uses ID tuples of the owning entities) with which to fetch.
        • It works in associations whose immediate owner was retrieved as part of a collection (or multivalued result).
        • Works on both single- and multivalued associations; for single-valued associations, the annotation has to be at class level (of the target entity).
      • Hibernate: @LazyCollection(LazyCollectionOption.EXTRA)
        • For when you need only a small number of entries from a large collection.
        • Enables querying per entry in the collection.
        • Potentially worsens the n+1 problem.
        • Has prerequisites:
          • The target entity has to have a dense, numeric column against which the owner can use JPA's @OrderColumn.
          • The association is annotated with org.hibernate.annotations.ListIndexBase(value), where the value would be the starting index of the value in the ordering column. The resulting List is still zero-based.
      • Eclipselink: @BatchFetch: JOIN, EXISTS, IN
        • Works like Hibernate's SUBSELECT FetchMode, except the WHERE clause differs: JOIN expands the condition horizontally as the initial query is joined progressively deeper; EXISTS nests its clauses, much like Hibernate's SUBSELECT; and IN makes use of an IN clause that queries using key-tuples of the owning entity.
        • The IN type behaves exactly like Hibernate's @BatchSize annotation - it even has a size parameter that configures the batch size.
        • They also have QueryHints versions (BATCH, BATCH_TYPE, and BATCH_SIZE).
  • Scalar Attributes
    • It may be desirable to limit the attributes fetched from entities. There are options:
      • JPA: @Basic(fetch=FetchType.LAZY)
        • For attribute-level configuration.
      • JPA: JPQL Constructors
        • For query-level configuration.
        • This method makes queries very specialized and quickly bloats code to an increasing number of constructors.
        • A very static configuration.
      • JPA: EntityGraph
        • Declared by annotation or created at runtime; applied at query level (using hint name "javax.persistence.fetchgraph").
        • Can be verbose to declare, but because it can be made at runtime, it allows dynamic and well-placed configuration options.
        • Coerces associations to become eager; using it with Hibernate can cause problems as multivalued associations get joined instead.
        • Can also be used as a LoadGraph by using the hint name "javax.persistence.loadgraph" instead. A LoadGraph forces attributes it contains to become eager, but leaves the rest to be default. Behavior is still not too predictable for both providers.
      • Eclipselink: FetchGroup
        • Predates EntityGraph.
        • Forces attributes and associations not declared in it to become lazy, but does not coerce lazy associations it contains to eager.
        • Much easier to build; has both annotation and runtime creation versions.
        • Applied via query hints.
        • Accessing attributes outside a FetchGroup causes full loading of the entity, potentially throwing away configuration benefits (usually if the associations were not made lazy).
  • Be considerate when using optimization methods that use annotations - they can bloat entity code with knowledge of business method and view layer requirements; it becomes a cross-cutting concern.
    • Make all the associations lazy (as also recommended by Hibernate) to protect from heavy damage when configurations are breached (mainly FetchGroups).
    • Define a good overall scheme for engineering entities and associations.
    • Prefer runtime configuration, especially those that configure at query level (usually query hints).
    • View the Configuration Recommendations section.

I'd like to leave you with a quote I found while experimenting on these things:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth, Structured Programming with go to Statements
First of all, if we consider where this quote is from, it is about "programming with goto statements". Of course overthinking optimization is going to be a lot of work. Also, this quote mentions "small efficiencies". The efficiencies discussed in this post are anything but that.

In the rise of agile engineering, we want to churn out code as fast as we can - when we find the big problems, it is usually only then that we fix it. The problem only happens when there is no general knowledge throughout the team regarding tooling limitations, basic behavior, and how to fix and optimize things when the need arises.

With proper foreknowledge and experience, it shouldn't be a problem to do general preemptive optimizations - it could even set a more solid base and start the project out on a good pace!

The key here is to have a keen understanding of the basics.

I hope I helped with that through this post. Again, thanks for joining me.

Cheers!


Some Sources

  • https://fndong.wordpress.com/2016/03/14/about-hibernate-orm/comment-page-1/
  • https://en.wikipedia.org/wiki/EclipseLink
  • https://en.wikipedia.org/wiki/TopLink
  • https://en.wikipedia.org/wiki/Java_Persistence_API
  • https://en.wikipedia.org/wiki/Hibernate_(framework)
  • http://wiki.c2.com/?PrematureOptimization 
  • https://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch20.html
  • Java Persistence With Hibernate; First Edition; Manning, 2005
  • http://www.eclipse.org/eclipselink/#documentation