JPA is a godsend for happily churning out code that works with data from - you've guessed it - databases! Of course, as I've learned when it comes to building things, nothing ever goes blissfully right.

Considering Murphy's laws, it's quite scary when things go right for so long - land mines are silent and patient.

Not-so-luckily, I'm here to discuss things that are only more apparent. Since this is going to be about two JPA providers, let's dabble in some history.

This post makes use of Hibernate 5.2.10.Final and Eclipselink 2.6.4.

Some History

Back in 2000, during the advent of EJB 2.0, JEE developers had been given a powerful feature that offered to encompass data source interaction into the EJB toolset: CMP (Container Managed Persistence).

Of course, and as usual, it was received by a divided crowd. Integrating this cool new feature into EJB meant that JEE-compliant servers had to be used, leaving out non-JEE projects. It also suffered from the interface-and-descriptor-heavy way of creating EJB components, while still lacking some necessary features.

A short while later, in 2001, Gavin King, along with some colleagues, started Hibernate to provide an alternative to EJB's CMP that was simpler and had a fuller set of features.

Eclipselink, on the other hand, began as TopLink, which dates back to the 1990s, with its Java version emerging between 1996 and 1998. It was later acquired by Oracle and merged into its Oracle Fusion Middleware product. Then, in 2007, the code was donated to the Eclipse Foundation, giving birth to the Eclipselink project.

Over those years, it became more apparent, and more loudly voiced, that EJB was in dire need of improvement. Thus, the JSR for EJB 3.0 was started with the goal of simplifying EJB. Developers from the Hibernate team joined in this effort. The JPA specification then emerged in 2006.

Since then, Hibernate has become a certified JPA implementation (JPA 2.0 as of the 3.5 release), while Eclipselink was selected as the reference implementation for JPA 2.0 (and 2.1).

Relevant Basics

JPA is one of those specifications that follow configuration by exception. This means that it offers strong and comprehensive conventions, and that using its grittier features is only warranted by more specific use cases.

This is certainly a good general thought, but personally, I like to use as many features as I can, both from JPA and extras from the provider, to produce the best and most robust experience possible. Of course, I still try very hard to minimize code and avoid luxuries. I guess what I'm trying to say here is that you should use as much as you need; just be thoughtful.

Entity Associations

JPA brings a simple and amazing convention. One of its major benefits is that it allows us to navigate through relational data in an object-oriented manner by allowing us to specify entity relationships as fields, represented by association mappings (annotated with @OneToOne, @OneToMany, etc.). This is what we call "fetching data".

We declare entity associations by using the JPA annotations @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany (aside from @JoinColumn).

Their naming draws directly from their ERD (Entity Relationship Diagram) counterparts, so it can be quickly understood that they denote association and multiplicity.

Of course, the discussion will not be limited to associations as even basic fields can be configured to some extent in optimization efforts.

Configuring these things, along with carefully engineering the entities, is key to getting the best performance out of this great piece of technology. Honestly though, the defaults for fetching suck.

Obtaining Entities

There are two main ways in which entities can be obtained: JPQL and EntityManager operations. Though JPQL is also pretty much executed via the EntityManager, let us differentiate it from EntityManager's less verbose retrieval methods such as find(...) or getReference(...), as they behave differently between providers, as will be discussed later on.

One JPA convention that covers all of these retrieval methods is that, by default, they consult caches before executing a query and before building objects. The latter part means that even if a query was actually executed, an entity that could have been built from the query data is not rebuilt if it already exists in a cache; the cached entity is returned instead. If you're using Eclipselink and want to know some more about it, check out my other post (I'd love to hear your thoughts and questions).
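
The "cached entity wins" part of that convention can be pictured with a small identity map. This is a simplified, plain-Java sketch and not the actual implementation of either provider; the names IdentityMapSketch and resolve, as well as the sample data, are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration only: real providers key their caches by entity type
// and identifier, and also deal with refreshes, locking, and eviction.
class IdentityMapSketch {

    // Hypothetical stand-in for an entity.
    static final class Employee {
        final long id;
        String name;

        Employee(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<Long, Employee> cache = new HashMap<>();

    // Called for every row a query returns: if an instance with the same
    // identifier is already cached, that instance is returned and the row data is ignored.
    Employee resolve(long id, String nameFromRow) {
        return cache.computeIfAbsent(id, key -> new Employee(key, nameFromRow));
    }

    public static void main(String[] args) {
        IdentityMapSketch context = new IdentityMapSketch();

        Employee first = context.resolve(1L, "Alice");
        first.name = "Alice (edited in memory)";

        // A later query returns the same row again; the cached object is reused.
        Employee second = context.resolve(1L, "Alice");

        System.out.println(first == second);
        System.out.println(second.name);
    }
}
```

Both lookups hand back the same object, so the in-memory edit survives the second "query".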

Let us move on to the meat of this discussion.

Fetching

There are two main considerations in fetching associations that work in concert to increase performance: fetch latency (pretty much referred to as "eagerness" or "lazy loading") and fetch strategy. They deal with when and how data is fetched, respectively. It is easy to mix them up as they have configurations that override each other. Latency is usually a necessary consideration when planning a strategy.

Fetch Latency

JPA specifies default behavior for latency when it comes to associations: eager for toOne associations, and lazy for toMany associations. Eager fetching means that the association is also immediately retrieved (via cache or query) when its owning entity is retrieved (and managed). On the other hand, lazy fetching simply means that the association need not be retrieved until it is needed (accessed). Needless to say, basic/scalar fields are also eagerly fetched.

Deciding whether single-valued (toOne) associations should be eagerly or lazily fetched depends heavily on architectural decisions, among others. Eager fetching can make sense when entities are modeled so that their associations are almost always needed; otherwise, these associations can be made lazy (usually eventually, as decisions lead to expansion of these entities).

It is also worth noting that pre-JPA Hibernate defaults all associations to lazy fetching. I read somewhere in Manning's Java Persistence With Hibernate (1st Edition, 2005) that JPA only specified single-valued relationships as eager by default because it was easier to support.

Personally, I follow Hibernate's default (also recommended by Hibernate) and explicitly configure single-valued associations as lazy (leaving multi-valued associations to their already lazy default) as it is more fail-safe and minimizes unnecessary fetching. I say "fail-safe" because certain fetch strategies fall back to querying and initializing relationships when their contracts are broken.

Configuring Fetch Latency

Of course, JPA specifies the following configuration options to augment latency:

Association annotations @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany have a fetch attribute that can be set using values from javax.persistence.FetchType (LAZY or EAGER).
Scalar attributes can be annotated with @Basic (javax.persistence.Basic), which also has a fetch attribute that accepts a FetchType value as mentioned above.
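
As a minimal sketch of both options, assuming the Employee and PersonalInfo entities used later in this post: the photo field is hypothetical and only there to show @Basic, and PersonalInfo is trimmed down to its identifier.

```java
import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.Lob;
import javax.persistence.OneToOne;

@Entity
public class Employee {

    @Id
    private Long id;

    // Association: overrides the eager default of toOne mappings.
    @OneToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "INFO_ID")
    private PersonalInfo info;

    // Scalar attribute: "photo" is a hypothetical field, not part of the ERD.
    // For basic attributes, LAZY is a hint that the provider may or may not honor.
    @Basic(fetch = FetchType.LAZY)
    @Lob
    private byte[] photo;
}

// Trimmed-down stand-in so that the mapping above is complete.
@Entity
class PersonalInfo {

    @Id
    private Long id;
}
```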

However, Hibernate and Eclipselink have a prerequisite to fully leverage these features: bytecode instrumentation, also known as weaving. As the name suggests, this has to do with modifying the class files of entities, during deployment or runtime, to extend their behavior and operability. Being able to configure fetch latency is only one of these extensions.

There are different ways to do bytecode instrumentation; just refer to related documentation for your provider: Hibernate or Eclipselink. Additionally, Java SE code can be run with weaving by simply adding a javaagent to the run options. Check this post for information.

Both Hibernate and Eclipselink make use of indirection when it comes to lazy relationships. Simply put, when an entity is retrieved, its relationships that are configured to be lazy are filled in with a proxy object that knows how to initialize the relationship it proxies. Hibernate has more configurations when it comes to this proxy.
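
The idea of indirection can be sketched in a few lines of plain Java. This is a simplified illustration, not the proxy classes that Hibernate or Eclipselink actually generate; LazyReference and the loader below are made-up names.

```java
import java.util.function.Supplier;

// Simplified illustration of indirection; the real proxies are generated
// by the provider and are considerably more involved.
class LazyIndirectionSketch {

    // Holds a loader instead of the value and runs it on first access only.
    static final class LazyReference<T> {
        private Supplier<T> loader;
        private T value;
        private boolean initialized;

        LazyReference(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!initialized) {
                value = loader.get();   // this is where a query would be issued
                initialized = true;
                loader = null;          // the loader is no longer needed
            }
            return value;
        }

        boolean isInitialized() {
            return initialized;
        }
    }

    public static void main(String[] args) {
        int[] queryCount = {0};

        // Hypothetical loader standing in for a SELECT against PERSONAL_INFO.
        LazyReference<String> info = new LazyReference<>(() -> {
            queryCount[0]++;
            return "PersonalInfo#7";
        });

        System.out.println(queryCount[0]); // nothing has been loaded yet
        info.get();
        info.get();
        System.out.println(queryCount[0]); // loaded once despite two accesses
    }
}
```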

Finally, it is also worth noting that Hibernate does not actually need bytecode instrumentation to configure latency for associations, but it does for scalar fields.

Fetch Strategy

Phew! That was a lot of primer information! Finally, we can get our hands on some code!

Because I'll be demonstrating code and showing logs, here is an ERD for reference:

Spare the unrealistic modeling of employee information; my aim is to demonstrate fetch behavior in certain cases.

Fetching Single-Valued Associations

Single-valued associations are your toOne mappings.

Eclipselink

The default fetch behavior for these relationships in Eclipselink is to execute a separate query to initialize the association. Thus, referring to the ERD, if we were to query for an Employee, it would take another query to fetch its PersonalInfo and yet another to fetch the PersonalInfoExt, for a total of three queries.

This is true for both JPQL and EntityManager methods.

Obviously, this default is not even close to optimal as database trips are usually costly. On the other hand, Hibernate already automatically joins these for us, with extra behavior if a JPQL query is used.

Hibernate: Acquisition by JPQL

For JPQL queries, it executes an isolated fetch for the entity first, then initializes its eager associations and applies joins whenever it can.

For the following subsection of the ERD;

With everything left eager, a JPQL query of


SELECT o FROM Employee o

Would execute an initial query true to its JPQL statement:


select
        employee0_.ID as ID1_0_,
        employee0_.INFO_ID as INFO_ID4_0_,
        employee0_.NAME as NAME2_0_,
        employee0_.VERSION as VERSION3_0_ 
    from
        EMPLOYEES employee0_

Then, for each Employee with PersonalInfos to initialize, it follows up with these queries:


select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_,
        personalin1_.ID as ID1_2_1_,
        personalin1_.ADDRESS as ADDRESS2_2_1_,
        personalin1_.CONTACT_NUM as CONTACT_3_2_1_ 
    from
        PERSONAL_INFO personalin0_ 
    left outer join
        PERSONAL_INFO_EXT personalin1_ 
            on personalin0_.EXT_ID=personalin1_.ID 
    where
        personalin0_.ID=?

Hibernate makes a nice move in keeping true to what the JPQL defines. This behavior is what splits the queries into two parts, where the initial query is the one that follows the JPQL; let us call this the main query. The supporting queries that initialize eager relationships then seem to execute as per the provider's find implementation.

This behavior seems to aim to hand as much control as it can to the JPQL, because it is what is most visible to the developer - so that little is left to implicit default behavior.

This is also where configuring all associations to be lazy comes in; when we write the JPQL to fetch everything that we need, there's no need to worry about attempts to fetch anything else.

Hibernate: Acquisition by Find

For entity manager's find, fetch control is completely handed over to defaults and configurations - no more splitting queries.

With the same section of the ERD and everything still eager, an EntityManager.find executes the following query:


Hibernate: 
    select
        employee0_.ID as ID1_0_0_,
        employee0_.INFO_ID as INFO_ID4_0_0_,
        employee0_.NAME as NAME2_0_0_,
        employee0_.VERSION as VERSION3_0_0_,
        personalin1_.ID as ID1_1_1_,
        personalin1_.AGE as AGE2_1_1_,
        personalin1_.EMAIL as EMAIL3_1_1_,
        personalin1_.EXT_ID as EXT_ID4_1_1_,
        personalin2_.ID as ID1_2_2_,
        personalin2_.ADDRESS as ADDRESS2_2_2_,
        personalin2_.CONTACT_NUM as CONTACT_3_2_2_ 
    from
        EMPLOYEES employee0_ 
    left outer join
        PERSONAL_INFO personalin1_ 
            on employee0_.INFO_ID=personalin1_.ID 
    left outer join
        PERSONAL_INFO_EXT personalin2_ 
            on personalin1_.EXT_ID=personalin2_.ID 
    where
        employee0_.ID=?

With unconfigured fetching, Hibernate is able to fetch all of it in a single query. It also becomes apparent that outer join is the default fetch behavior for eager associations. Perhaps we can say that Hibernate follows configuration by exception more.

Of course, this doesn't mean one is better than the other. Eclipselink just handles it differently, and it probably doesn't want to do anything fancy unless told so.

Configuring Single-Valued Associations

From here on out, let us assume that all associations are configured to be lazy.

Naturally, there are ways to augment fetch behavior:

1. JPA: JPQL Join Fetching

In runtime or named-query JPQL, LEFT JOIN FETCH or INNER JOIN FETCH clauses can be declared after the FROM clause, and before the WHERE clause (if any). A sample JPQL statement follows:


SELECT o FROM Employee o LEFT JOIN FETCH o.info

Having the configuration in the JPQL itself, the main query follows suit. Hibernate's execution follows:


select
        employee0_.ID as ID1_0_0_,
        personalin1_.ID as ID1_1_1_,
        employee0_.INFO_ID as INFO_ID4_0_0_,
        employee0_.NAME as NAME2_0_0_,
        employee0_.VERSION as VERSION3_0_0_,
        personalin1_.AGE as AGE2_1_1_,
        personalin1_.EMAIL as EMAIL3_1_1_,
        personalin1_.EXT_ID as EXT_ID4_1_1_ 
    from
        EMPLOYEES employee0_ 
    left outer join
        PERSONAL_INFO personalin1_ 
            on employee0_.INFO_ID=personalin1_.ID 
    where
        employee0_.ID=?

Eclipselink behaves the same way.

It is also worth noting, though probably obvious, that join fetching an association overrides it to be eager.

There is a difference between these two providers, however.

Hibernate

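Hibernate does not allow a single LEFT JOIN FETCH path to reach beyond the immediate associations of the root entity (for example, o.info.moreInfo). As far as I know, deeper associations need their own aliased fetch joins, which is a Hibernate extension rather than standard JPQL.

Perhaps it makes us reconsider: we did the join because we need it, but could there have been a better way to model things instead? I don't think that is worth the trouble, so I'm not really in favor of this limitation.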

Eclipselink

On the other hand, Eclipselink allows deeper fetch joins, as this behavior also aligns with its internals and availability of more dynamic configuration options.

Using join fetching is certainly nice and standard. However, using them on JPQL queries may render them specialized (unless they already appear where they are used), especially if they are NamedQueries. NamedQueries are nice because for most IDEs, you can already get early warnings during development. On the other hand, increasing the metadata for an entity class file is not very nice. This concern will be discussed later.

In any case, NamedQueries only contribute little to metadata clutter.

2. Eclipselink and Hibernate: Fetch Annotations

Without a standard way to include fetch configurations in metadata, each provider has specific annotations instead.

For the examples, let us refer to the same section of the ERD:

Hibernate

Hibernate offers the @Fetch annotation, and it accepts a FetchMode value which has three options: join, select, and subselect.

FetchMode: Join

Recall the entity acquisition examples where everything was defaulted. It was mentioned that eager associations used outer join. It is represented by this FetchMode value.

Now that all the associations have been made lazy, let's try it out! We annotate Employee's association to PersonalInfo with @Fetch using join:


    @JoinColumn(name = "INFO_ID")
    @Fetch(value = org.hibernate.annotations.FetchMode.JOIN)
    @OneToOne(fetch = javax.persistence.FetchType.LAZY)
    private PersonalInfo info;

Now let's try out acquiring the Entity by a JPQL query SELECT o FROM Employee o:


    Hibernate: 
    select
        employee0_.ID as ID1_0_,
        employee0_.INFO_ID as INFO_ID4_0_,
        employee0_.NAME as NAME2_0_,
        employee0_.VERSION as VERSION3_0_ 
    from
        EMPLOYEES employee0_

The association was not joined. Recall the earlier acquisition by JPQL example once more, where I mentioned that the main query kept true to the JPQL. This is the reason for that.

However, even if all associations were declared lazy, the fetching did not finish; separate queries were executed to fetch PersonalInfo objects:


Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?
Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?

Two of the results had values for their info associations, and they were loaded right away.

Needless to say, using EntityManager.find will successfully join the association.

Since eager associations are configured to use join, the only explanation is that configuring a FetchMode of join on a lazy association coerces it to eager. This makes sense since joins cannot be delayed. Furthermore, if we also annotate PersonalInfo's association with PersonalInfoExt, it gets included in the join as well.

However, this undoes our precaution of making all associations lazy.

On another note involving eager join fetching: since the joins can go indefinitely deep, it may be of interest to limit the depth. This can be done by including the hibernate.max_fetch_depth property in the persistence.xml file, set to a reasonable value, for example <property name="hibernate.max_fetch_depth" value="3"/>.

FetchMode: Select

The Hibernate 5.2 Javadoc specifies this mode to "use a secondary select for each individual entity, collection, or join load". Sounds like the default that Eclipselink has.

A quick test of using it on a lazy single-valued association showed that it doesn't make the association eager, so let's try explicitly making the relationship eager:


    @JoinColumn(name = "INFO_ID")
    @Fetch(value = FetchMode.SELECT)
    @OneToOne(fetch = FetchType.EAGER)
    private PersonalInfo info;

It had the following results:


Hibernate: 
    select
        employee0_.ID as ID1_0_,
        employee0_.INFO_ID as INFO_ID4_0_,
        employee0_.NAME as NAME2_0_,
        employee0_.VERSION as VERSION3_0_ 
    from
        EMPLOYEES employee0_
Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?
Hibernate: 
    select
        personalin0_.ID as ID1_1_0_,
        personalin0_.AGE as AGE2_1_0_,
        personalin0_.EMAIL as EMAIL3_1_0_,
        personalin0_.EXT_ID as EXT_ID4_1_0_ 
    from
        PERSONAL_INFO personalin0_ 
    where
        personalin0_.ID=?

We find that it executes three queries: one for the Employees, and two for those two employees who have PersonalInfo values. Surprisingly, it does not override any fetch latency configuration.

There's nothing too special with this configuration involving single-valued associations. This actually performs worse than joins.

FetchMode: Subselect

According to Hibernate 5.2 javadocs, this option is "available to collections" only.

Eclipselink

Eclipselink provides two ways to configure fetching: @JoinFetch and QueryHints.

@JoinFetch

Let's get right down to it and stick it to the PersonalInfo association:


    @JoinColumn(name = "INFO_ID")
    @JoinFetch(JoinFetchType.OUTER)
    @OneToOne(fetch = FetchType.LAZY)
    private PersonalInfo info;

Regardless of acquisition method, if we query this by ID, the following statement is executed:


SELECT t1.ID, t1.NAME, t1.VERSION, t1.INFO_ID, t0.ID, t0.AGE, t0.EMAIL, t0.EXT_ID 
FROM EMPLOYEES t1 
LEFT OUTER JOIN PERSONAL_INFO t0 ON (t0.ID = t1.INFO_ID) 
WHERE (t1.ID = ?)

Predictably, this also coerced the association to become eager. What is interesting is that the available JoinFetchType values distinguish between inner and outer joins. Unfortunately, the inner join option did not work when I was experimenting; perhaps it's a bug or I'm missing something.

As another experiment, let us also apply this annotation to PersonalInfo's association with PersonalInfoExt:


    @JoinColumn(name = "EXT_ID")
    @OneToOne(fetch = javax.persistence.FetchType.LAZY)
    @JoinFetch(JoinFetchType.OUTER)
    private PersonalInfoExt moreInfo;

Again, regardless of acquisition method, we are greeted by the following query:


SELECT t1.ID, t1.NAME, t1.VERSION, t1.INFO_ID, t0.ID, t0.AGE, t0.EMAIL, t0.EXT_ID 
FROM EMPLOYEES t1 
LEFT OUTER JOIN PERSONAL_INFO t0 ON (t0.ID = t1.INFO_ID)

The join made was only for the first level. Even if the deeper association is also annotated, it did not get fetched. It appears that Eclipselink does not completely coerce it to eager, and it cannot cascade its join beyond one level. I've tried setting the deeper association to eager, but it was loaded in a separate query.

This draws something of a parallel between the limitations of Hibernate's JPQL joins and Eclipselink's annotation joins.

We can only then successfully join the PersonalInfoExt association if our query root is PersonalInfo:


SELECT t1.ID, t1.AGE, t1.EMAIL, t1.EXT_ID, t0.ID, t0.ADDRESS, t0.CONTACT_NUM 
FROM PERSONAL_INFO t1 
LEFT OUTER JOIN PERSONAL_INFO_EXT t0 ON (t0.ID = t1.EXT_ID) 
WHERE (t1.ID = ?)

3. QueryHints: FETCH and LEFT_FETCH

Query hints are actually a standard JPA way to configure query behavior, and they are more granular than their entity annotation counterparts. The hint names used here, however, are Eclipselink's own. Here are two ways to apply such hints:

Part of NamedQuery declaration


@NamedQuery(name = "findEmployee", query = "SELECT o FROM Employee o",
     hints =
       { @QueryHint(name = QueryHints.LEFT_FETCH, value = "o.info.moreInfo"),
         @QueryHint(name = QueryHints.READ_ONLY, value = "true") })

Runtime


TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o", Employee.class);
query.setHint(QueryHints.LEFT_FETCH, "o.info.moreInfo");
query.setHint(QueryHints.READ_ONLY, HintValues.TRUE);

// or when using em operations:
Map<String, Object> hints = new HashMap<>();
hints.put(QueryHints.LEFT_FETCH, "o.info.moreInfo");
hints.put(QueryHints.READ_ONLY, HintValues.TRUE);
Employee emp = em.find(Employee.class, 3L, hints);

Eclipselink is a provider that allows extensive configuration at query time, thanks to the abundance of hints it provides.

The FETCH and LEFT_FETCH query hint names are constants from the org.eclipse.persistence.config.QueryHints class. They provide the behavior of the INNER and OUTER modes of @JoinFetch, respectively, but can cascade the join beyond the immediate association.

There is little need for an example, so here are some descriptions and considerations instead:

It joins all specified relationships into a single query.

It can specify associations up to any depth.

This hint can be applied multiple times, with each application building on top of the current configuration.

If one left fetch path is a prefix of another, then the shorter one is not needed; e.g. if LEFT_FETCH on "o.info.moreInfo" is specified, there is no need to apply one for "o.info".

The value passed should begin with a fetch alias, such as "o."; i.e. the first traversal segment, such as the "o" in "o.info.moreInfo", is treated as an alias, so passing in "info.moreInfo" will cause the configuration to fail.

This fetch alias can be any arbitrary string, though "o." is usually used.
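
The rule about shorter paths being redundant can be sketched as a small helper that prunes a list of hint values before they are applied. This is plain Java of my own and not part of Eclipselink; FetchHintPathsSketch and withoutRedundantPrefixes are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an Eclipselink API.
class FetchHintPathsSketch {

    // Drops every path that is a dotted prefix of another path in the list,
    // e.g. "o.info" is redundant once "o.info.moreInfo" is present.
    static List<String> withoutRedundantPrefixes(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String candidate : paths) {
            boolean redundant = false;
            for (String other : paths) {
                // Appending "." keeps "o.info" from matching "o.information".
                if (other.startsWith(candidate + ".")) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant && !kept.contains(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hints = Arrays.asList("o.info", "o.info.moreInfo", "o.beneficiaries");
        System.out.println(withoutRedundantPrefixes(hints));
    }
}
```

Each remaining path would then be passed to its own setHint(QueryHints.LEFT_FETCH, ...) call.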

It can be said that the extensiveness of Eclipselink's LEFT JOIN FETCH in JPQL is attributed to its internals that are able to heavily augment the behavior of many provisions during query time.

Configuring Multivalued/Collection Associations

For this section we will use the collections part of the ERD:

JPA defaults these associations, our toMany annotations, to lazy, which only makes sense. Now let's explore how it fetches.

Because initializing these relationships is deferred to usage time, we have to call the corresponding collection's get(something) or isEmpty() method to trigger initialization (usually within a transaction or as long as the entity isn't detached).

Let us then get an Employee with three Beneficiaries using EntityManager.find, after which we call isEmpty() on its beneficiaries collection:


Employee emp = em.find(Employee.class, 1L);
emp.getBeneficiaries().isEmpty();

Throughout the method, the following queries are executed by Hibernate:


Hibernate: 
    select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=?

Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.AGE as AGE2_0_1_,
        beneficiar0_.INFO as INFO4_0_1_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.NAME as NAME3_0_1_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?


This time, let's run it with Eclipselink:


SELECT ID, NAME, VERSION, INFO_ID FROM EMPLOYEES WHERE (ID = ?)
 bind => [1]

SELECT ID, AGE, NAME, EMPLOYEE_ID, INFO FROM BENEFICIARIES WHERE (EMPLOYEE_ID = ?)
 bind => [1]

The ERD shows that Beneficiary has another collection; if we access that, another query is executed:


Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ADDRESS as ADDRESS2_2_1_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_1_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID=?
Eclipselink:
    SELECT ID, ADDRESS, CONTACT_NUM, BENEFICIARY_ID 
    FROM CONTACT_INFO 
    WHERE (BENEFICIARY_ID = ?)

As a side note, if we make these two multivalued associations eager, Eclipselink remains the same, while Hibernate executes the following query:


select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_,
        beneficiar1_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar1_.ID as ID1_0_1_,
        beneficiar1_.ID as ID1_0_2_,
        beneficiar1_.AGE as AGE2_0_2_,
        beneficiar1_.INFO as INFO4_0_2_,
        beneficiar1_.EMPLOYEE_ID as EMPLOYEE5_0_2_,
        beneficiar1_.NAME as NAME3_0_2_,
        contactinf2_.BENEFICIARY_ID as BENEFICI4_2_3_,
        contactinf2_.ID as ID1_2_3_,
        contactinf2_.ID as ID1_2_4_,
        contactinf2_.ADDRESS as ADDRESS2_2_4_,
        contactinf2_.BENEFICIARY_ID as BENEFICI4_2_4_,
        contactinf2_.CONTACT_NUM as CONTACT_3_2_4_ 
    from
        EMPLOYEES employee0_ 
    left outer join
        BENEFICIARIES beneficiar1_ 
            on employee0_.ID=beneficiar1_.EMPLOYEE_ID 
    left outer join
        CONTACT_INFO contactinf2_ 
            on beneficiar1_.ID=contactinf2_.BENEFICIARY_ID 
    where
        employee0_.ID=?

Surprisingly, Hibernate also defaults multivalued associations to join when made eager. We shouldn't let this happen, as the result will be a cartesian product whose duplicate information grows multiplicatively with each joined collection. Fortunately, this can be quickly remedied by specifying a FetchMode other than join for the association.
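
To put a number on that duplication, here is a rough sketch that counts the rows such a joined query returns for one employee. It is plain arithmetic, not provider code, and the data volumes are made up.

```java
import java.util.Arrays;
import java.util.List;

class JoinRowCountSketch {

    // Rows returned by EMPLOYEES LEFT JOIN BENEFICIARIES LEFT JOIN CONTACT_INFO
    // for ONE employee, given the number of contacts of each beneficiary.
    static int joinedRows(List<Integer> contactsPerBeneficiary) {
        if (contactsPerBeneficiary.isEmpty()) {
            return 1; // the outer join still yields the employee row itself
        }
        int rows = 0;
        for (int contacts : contactsPerBeneficiary) {
            rows += Math.max(1, contacts); // a beneficiary without contacts still yields one row
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical data: 3 beneficiaries with 4 contacts each.
        List<Integer> contacts = Arrays.asList(4, 4, 4);
        System.out.println(joinedRows(contacts)); // 12 rows, each repeating the employee columns
    }
}
```

Every one of those rows carries the full set of employee columns, and each beneficiary's columns are repeated once per contact.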

Going back to default behavior, the standard is that the query for initializing a collection is one that selects using the key-references to the immediate owning entity. Hence, initializing the Beneficiaries collection uses the Employee's ID, and initializing the ContactInfo collection uses the Beneficiary's ID.

Now imagine that all the associations are lazy. If we find an Employee and iterate through its Beneficiaries to read each one's ContactInfo, we have to execute an extra query to get the Beneficiaries, and another query for each Beneficiary's ContactInfo. This is the famous n+1 queries problem (where the query to get the Beneficiaries is the "1", followed by "n" queries, one for each Beneficiary's association - yes, this can also occur with a Beneficiary's single-valued associations, such as BeneficiaryInfo; basically anything that is separately queried for).

This algebraic representation is actually an understatement in many cases, as the "n" part can usually be expanded further (even recursively). If we started with a query for multiple Employees and needed to go all the way down to the contacts, every Employee would add a query for its Beneficiaries, and every Beneficiary a query for its ContactInfo.

Again, the association accessed from Beneficiary doesn't have to be multivalued for it to become a problem.
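
The growth is easy to underestimate, so here is the arithmetic as a small sketch. It assumes every lazy collection is initialized by its own query and that each employee has the same number of beneficiaries; the volumes are made up.

```java
class NPlusOneSketch {

    // Queries issued when every lazy collection is initialized with its own SELECT:
    // 1 for the employees, 1 per employee for its beneficiaries,
    // and 1 per beneficiary for its contact info.
    static long lazySelectQueries(long employees, long beneficiariesPerEmployee) {
        long beneficiaryQueries = employees;
        long contactQueries = employees * beneficiariesPerEmployee;
        return 1 + beneficiaryQueries + contactQueries;
    }

    public static void main(String[] args) {
        // Hypothetical volumes: 100 employees with 3 beneficiaries each.
        System.out.println(lazySelectQueries(100, 3)); // 1 + 100 + 300 = 401
        // A single employee with 3 beneficiaries: 1 + 1 + 3 = 5
        System.out.println(lazySelectQueries(1, 3));
    }
}
```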

Of course, providers are aware of this problem and offer ways to avoid or at least alleviate the problem.

Hibernate

FetchMode: Subselect

Let's start off with the mystery third option of Hibernate's @Fetch annotation: subselect, which is only available to collection associations. Documentation explains it to "use a subselect query to load the additional collections".

Let us test it out on Employee's association with Beneficiary:


    @OneToMany(mappedBy = "employee")
    @Fetch(FetchMode.SUBSELECT)
    private List<Beneficiary> beneficiaries;

If we find a single Employee and then access its beneficiaries collection, the following query is executed:


Hibernate:    
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.AGE as AGE2_0_1_,
        beneficiar0_.INFO as INFO4_0_1_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.NAME as NAME3_0_1_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?

It doesn't look like anything changed. In fact, this is the same even if we use FetchMode.SELECT instead. This time, let's also annotate Beneficiary's (multivalued) association with ContactInfo:


    @OneToMany(mappedBy = "beneficiary") // mappedBy value assumed; use the owning field's name in ContactInfo
    @Fetch(FetchMode.SUBSELECT)
    private List<ContactInfo> contactInfo;

Accessing the beneficiaries collection still executes the same query. However, when we access the contactInfo collection in beneficiaries, the following query is executed:


Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            select
                beneficiar0_.ID 
            from
                BENEFICIARIES beneficiar0_ 
            where
                beneficiar0_.EMPLOYEE_ID=?
        )

This time, we find that the contactInfo collection was initialized using an IN clause, where the associated Beneficiary IDs were specified using a subquery that selects by the Employee ID! Furthermore, because all of the beneficiaries of the specific employee were included, their contactInfo collections were initialized as well - even if we only accessed that of the first!

As another test, let's acquire multiple Employees using the JPQL SELECT o FROM Employee o, and then access both collections. This results in the following queries:


Hibernate: 
    select
        employee0_.ID as ID1_3_,
        employee0_.INFO_ID as INFO_ID4_3_,
        employee0_.NAME as NAME2_3_,
        employee0_.VERSION as VERSION3_3_ 
    from
        EMPLOYEES employee0_
Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID in (
            select
                employee0_.ID 
            from
                EMPLOYEES employee0_
        )
Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            select
                beneficiar0_.ID 
            from
                BENEFICIARIES beneficiar0_ 
            where
                beneficiar0_.EMPLOYEE_ID in (
                    select
                        employee0_.ID 
                    from
                        EMPLOYEES employee0_
                )
            )

With the first query being the main query that actually finds the Employees, we find that the other two queries (which respectively initialize the Beneficiaries and their ContactInfo) base their subselects on the main query. This is, then, the behavior of FetchMode.SUBSELECT. Also, notice that as the associations go deeper, the subselects begin to nest.
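To put rough numbers on the difference, here is a quick sketch that counts the statements each approach would issue for the traversal above. It is plain arithmetic, not Hibernate code: the class and method names are made up for illustration, and it assumes every employee has the same number of beneficiaries and that every collection gets accessed.

```java
// Illustrative arithmetic only; none of these names come from Hibernate.
class FetchQueryCount {

    // Lazy SELECT fetching: one query for the employees, one per employee
    // for its beneficiaries, and one per beneficiary for its contactInfo.
    static long lazySelect(long employees, long beneficiariesPerEmployee) {
        long beneficiaries = employees * beneficiariesPerEmployee;
        return 1 + employees + beneficiaries;
    }

    // SUBSELECT fetching as observed in the logs above: the main query plus
    // one query per association level, however many rows the main query returned.
    static long subselect(int associationLevels) {
        return 1 + associationLevels;
    }
}
```

Calling lazySelect(100, 3) and subselect(2) side by side shows why the subselect mode is attractive when we really do intend to walk the whole result.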

This certainly helps out with our n+1 problem, but sometimes we don't really need all of the data - we just want to avoid a separate query for each of the values we come across.

WELL, TOUGH LUCK BECAUSE WE CAN'T HAVE EVERYTHING. This doesn't mean that we'll have nothing.

BatchSize

Hibernate gives us the @BatchSize annotation. Because it requires a size argument, it means that it's up to us to guess how big the batch sizes should be. It's our compromise between being able to query for more than one result at a time and undesirably having a query return more data than can be processed.

Let us try it out before we get into the details. We annotate both collection relationships with @BatchSize while keeping the @Fetch annotation in subselect mode:


    // In Employee
    @OneToMany(mappedBy = "employee")
    @Fetch(FetchMode.SUBSELECT)
    @BatchSize(size = 3)
    private List<Beneficiary> beneficiaries;

    // In Beneficiary
    @OneToMany(mappedBy = "beneficiary")
    @Fetch(FetchMode.SUBSELECT)
    @BatchSize(size = 3)
    private List<ContactInfo> contactInfo;

If we search for a single Employee (and then access both collections), the following queries are executed:


Hibernate: 
    select
        employee0_.ID as ID1_3_,
        employee0_.INFO_ID as INFO_ID4_3_,
        employee0_.NAME as NAME2_3_,
        employee0_.VERSION as VERSION3_3_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=1
Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?
Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            select
                beneficiar0_.ID 
            from
                BENEFICIARIES beneficiar0_ 
            where
                beneficiar0_.EMPLOYEE_ID=?
        )

It seems like @BatchSize did nothing. If we fetch multiple Employees via JPQL, the succeeding queries still use the subselect based on the main query.

This time, let's remove the @Fetch annotation (or set it to FetchMode.SELECT, as it is the default for lazy associations). Upon execution and collection access, the following queries appear:


Hibernate: 
    select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=?
Hibernate: 
    select
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_1_,
        beneficiar0_.ID as ID1_0_1_,
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=?
Hibernate: 
    select
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_1_,
        contactinf0_.ID as ID1_2_1_,
        contactinf0_.ID as ID1_2_0_,
        contactinf0_.ADDRESS as ADDRESS2_2_0_,
        contactinf0_.BENEFICIARY_ID as BENEFICI4_2_0_,
        contactinf0_.CONTACT_NUM as CONTACT_3_2_0_ 
    from
        CONTACT_INFO contactinf0_ 
    where
        contactinf0_.BENEFICIARY_ID in (
            ?, ?, ?
        )

There is now a difference in fetching, but only in fetching the ContactInfo collection of Beneficiary.

But the documentation said we could "define size for loading of collections or lazy entities"! It does not lie. A lot of people new to this annotation also get confused.

Batching only happens when the owning entity was itself retrieved as part of a collection or as one of several results of a query - if we had queried for multiple Employees at the start, the initialization of their Beneficiaries would also have been batched. Essentially, Hibernate isn't concerned with fetching a collection association of an entity that was queried for by itself (if that worried us, we could have executed a separate paginated query for it); it is concerned with the associations of entities that came from a collection.

Additionally, we can also batch fetch single-valued associations of entities we received in a collection. To do this, we annotate the entity at class level with @BatchSize. It would have made sense if they were at attribute level, but it didn't work when I tried it.

Having used a batch size of 3, the WHERE clause that initializes the ContactInfo collection used 3 Beneficiary IDs. Effectively, this reduces our n+1 problem to roughly ceil(n/batchSize)+1 queries.
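As a small sketch of that arithmetic, the following plain-Java snippet splits a list of owner IDs into IN-clause sized chunks and counts the resulting queries. BatchPlan and its methods are hypothetical names; this only mimics the shape of what we saw in the log, and Hibernate's real loader may well pick its chunk sizes differently.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the shape of batched IN queries, not Hibernate's internals.
class BatchPlan {

    // Split owner IDs into chunks of at most batchSize; each chunk stands
    // for one "WHERE owner_id IN (?, ?, ...)" query.
    static List<List<Long>> partition(List<Long> ownerIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < ownerIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ownerIds.size());
            batches.add(new ArrayList<>(ownerIds.subList(i, end)));
        }
        return batches;
    }

    // The main query plus one query per batch: ceil(owners / batchSize) + 1.
    static int totalQueries(int owners, int batchSize) {
        return (owners + batchSize - 1) / batchSize + 1;
    }
}
```

With seven beneficiaries and a batch size of 3, partition yields chunks of 3, 3 and 1 IDs, so three collection queries run instead of seven.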

Extra Lazy Collections

On the other hand, if only a handful of entries (or fewer) are required out of a large collection, then extra lazy collections might be useful. Honestly, there are other ways to go about this, like pagination, and this method still pretty much suffers from the n+1 problem.

In any case, this is an interesting feature. It can immediately work on the collection at hand (though it has to be an entity association, and not an initial query with multiple results), but it has some nasty prerequisites:

The target entity collection association offers a column with dense numeric values with which to order the collection.

The collection association would then be annotated with @javax.persistence.OrderColumn(name="nameOfIndexCol").

The collection association is annotated with @org.hibernate.annotations.ListIndexBase(<int value>), where the int value represents the start value of the order column in the target entity. This is so we don't have to prefetch any data before we access elements of the collection. When accessing the actual list, it still starts at 0.

Let us quickly satisfy these requirements on Employee's association with Beneficiary:


    @OneToMany(mappedBy = "employee")
    @LazyCollection(LazyCollectionOption.EXTRA)
    @ListIndexBase(1)
    @OrderColumn(name = "ID")
    private List<Beneficiary> beneficiaries;

It might be worth noting that configuring to extra lazy collection takes priority over FetchMode.SUBSELECT, but not over FetchMode.JOIN.

Anyway, if we find a single Employee and iterate through the Beneficiary collection at list indices 0 and 1, the following queries are executed:


--the initial query
Hibernate: 
    select
        employee0_.ID as ID1_3_0_,
        employee0_.INFO_ID as INFO_ID4_3_0_,
        employee0_.NAME as NAME2_3_0_,
        employee0_.VERSION as VERSION3_3_0_ 
    from
        EMPLOYEES employee0_ 
    where
        employee0_.ID=?

--querying for entry at index 1
Hibernate: 
    select
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=? 
        and beneficiar0_.ID=?

--querying for entry at index 2
Hibernate: 
    select
        beneficiar0_.ID as ID1_0_0_,
        beneficiar0_.AGE as AGE2_0_0_,
        beneficiar0_.INFO as INFO4_0_0_,
        beneficiar0_.EMPLOYEE_ID as EMPLOYEE5_0_0_,
        beneficiar0_.NAME as NAME3_0_0_ 
    from
        BENEFICIARIES beneficiar0_ 
    where
        beneficiar0_.EMPLOYEE_ID=? 
        and beneficiar0_.ID=?

Well, it works. The configuration causes the WHERE clause in the entry-specific queries to have two conditions: one that references the owning entity, and one that matches against the order column.

As a final note, it might be worth knowing that in these "smarter" collections, operations like size() and contains() will no longer trigger initialization, though some operations (like isEmpty()) may execute a COUNT or MAX query so the collection "knows" if it has contents.
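To make the index arithmetic concrete, here is a toy stand-in for an extra lazy list. It is not Hibernate's collection implementation: ExtraLazyListSketch, its in-memory "table" and its query counter are all hypothetical, and it only models the two behaviors described above (one query per accessed element, and size() answered without loading the elements).

```java
import java.util.HashMap;
import java.util.Map;

// A toy model of an extra lazy list; the Map plays the role of the database table.
class ExtraLazyListSketch {
    private final Map<Integer, String> table;   // order-column value -> row
    private final int listIndexBase;            // first value used in the order column
    private int queries = 0;                    // number of simulated SQL statements

    ExtraLazyListSketch(Map<Integer, String> table, int listIndexBase) {
        this.table = new HashMap<>(table);
        this.listIndexBase = listIndexBase;
    }

    // The list stays zero-based; the base only shifts the order-column value.
    // Every call stands for one single-row query.
    String get(int listIndex) {
        queries++;
        return table.get(listIndex + listIndexBase);
    }

    // Stands for a single COUNT-style query instead of loading every row.
    int size() {
        queries++;
        return table.size();
    }

    int queriesIssued() {
        return queries;
    }
}
```

With a base of 1, get(0) looks up order-column value 1 and get(1) looks up value 2, which is the same shift we saw between the list indices and the logged queries.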

Eclipselink

@BatchFetch

Eclipselink provides the @BatchFetch annotation, which can be considered a combination of Hibernate's @Fetch(FetchMode.SUBSELECT) and @BatchSize through its size parameter and its three available types: JOIN (the default), EXISTS, or IN. The size parameter only works for the IN type, however.

First, let's try out JOIN on Employee's Beneficiaries association:


    @OneToMany(mappedBy = "employee")
    @BatchFetch(value = BatchFetchType.JOIN)
    private List<Beneficiary> beneficiaries;

For some reason, weaving fails if we leave @BatchFetch without its value attribute, which should have defaulted to JOIN. Anyway, let us use the JPQL SELECT o FROM Employee o, and then access a beneficiaries collection:


    SELECT ID, NAME, VERSION, INFO_ID FROM EMPLOYEES

    SELECT t0.ID, t0.AGE, t0.NAME, t0.EMPLOYEE_ID, t0.INFO FROM BENEFICIARIES t0, EMPLOYEES t1 WHERE (t0.EMPLOYEE_ID = t1.ID)

Of course, the first query is the one that actually gets the Employees; the second query does the batching. In this case, we find that the selection of the beneficiaries joins on the Employee ID, taking after the original query. To prove this, let's add a pretty useless WHERE clause condition, o.name <> 'some_name'. The secondary query then becomes the following:


    SELECT t0.ID, t0.AGE, t0.NAME, t0.EMPLOYEE_ID, t0.INFO FROM BENEFICIARIES t0, EMPLOYEES t1 WHERE ((t0.EMPLOYEE_ID = t1.ID) AND (t1.NAME <> ?))
 bind => [some_name]

If we try to access the deeper collection that is the association of Beneficiary to ContactInfo, the following query is executed:


    SELECT t0.ID, t0.ADDRESS, t0.CONTACT_NUM, t0.BENEFICIARY_ID 
    FROM CONTACT_INFO t0, EMPLOYEES t2, BENEFICIARIES t1 
    WHERE ((t0.BENEFICIARY_ID = t1.ID) AND ((t1.EMPLOYEE_ID = t2.ID) AND (t2.NAME <> ?)))
 bind => [some_name]

We find that the WHERE clause conjoins with the main query. Worry not; only the necessary data appears in the SELECT clause because the joining only affects the condition.

Essentially, this partitions the initialization the same way as Hibernate's subselect fetching, except that the joins widen as associations nest, whereas in subselect, the subselections nest.
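The contrast is easier to see side by side. The sketch below builds both query shapes for a chain of tables; it is string building for illustration only, not how either provider generates SQL. BatchShapeSketch is a hypothetical name, and the code assumes every table's primary key column is called ID, as in the logs above.

```java
// Illustrative string building only; neither provider generates SQL this way.
class BatchShapeSketch {

    // tables[0] is the root of the main query; fks[i] is the column in tables[i]
    // that points to tables[i - 1] (fks[0] is unused). Needs at least two tables.
    // Hibernate SUBSELECT shape: each level nests the previous level's query.
    static String nestedIn(String[] tables, String[] fks) {
        String sub = "SELECT ID FROM " + tables[0];
        for (int i = 1; i < tables.length - 1; i++) {
            sub = "SELECT ID FROM " + tables[i] + " WHERE " + fks[i] + " IN (" + sub + ")";
        }
        int last = tables.length - 1;
        return "SELECT * FROM " + tables[last] + " WHERE " + fks[last] + " IN (" + sub + ")";
    }

    // Eclipselink BatchFetchType.JOIN shape: every level joins into one flat query.
    static String widenedJoin(String[] tables, String[] fks) {
        int last = tables.length - 1;
        StringBuilder from = new StringBuilder(tables[last] + " t0");
        StringBuilder where = new StringBuilder();
        for (int i = last; i >= 1; i--) {
            int alias = last - i;   // alias number given to tables[i]
            from.append(", ").append(tables[i - 1]).append(" t").append(alias + 1);
            if (where.length() > 0) {
                where.append(" AND ");
            }
            where.append("t").append(alias).append(".").append(fks[i])
                 .append(" = t").append(alias + 1).append(".ID");
        }
        return "SELECT t0.* FROM " + from + " WHERE " + where;
    }
}
```

Feeding it EMPLOYEES, BENEFICIARIES and CONTACT_INFO reproduces the two patterns from the logs: one query that grows inward, and one that grows sideways.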

This time, let us try using the EXISTS option, but still executing the same JPQL SELECT o FROM Employee o WHERE o.name <> 'some_name'. Skipping the main query, the following query executes when we access an element of beneficiaries:


    SELECT t0.ID, t0.AGE, t0.NAME, t0.EMPLOYEE_ID, t0.INFO 
    FROM BENEFICIARIES t0 
    WHERE EXISTS 
        (SELECT t1.ID FROM EMPLOYEES t1 WHERE ((t1.NAME <> ?) AND (t0.EMPLOYEE_ID = t1.ID)))

Then let's access a beneficiary's contactInfo collection:


    SELECT t0.ID, t0.ADDRESS, t0.CONTACT_NUM, t0.BENEFICIARY_ID 
    FROM CONTACT_INFO t0 
    WHERE EXISTS 
      (SELECT t1.ID FROM BENEFICIARIES t1 
       WHERE (EXISTS 
         (SELECT t2.ID FROM EMPLOYEES t2 WHERE ((t2.NAME <> ?) AND (t1.EMPLOYEE_ID = t2.ID)))  AND (t0.BENEFICIARY_ID = t1.ID)))

Again, this initializes the same amount of data, but in a different way. This time, it's nested EXISTS subqueries.

Finally, I was supposed to demonstrate the IN option, but its behavior is pretty much exactly like Hibernate's @BatchSize when given a size argument. The only difference is that Hibernate does not have a way to execute a raw IN batch query unless @BatchSize is used.

As an important note, the size argument in the @BatchFetch annotation does not seem to work; I had to apply the size as a query hint instead.

QueryHints: BATCH and BATCH_TYPE

With little need for description, some quick information about these query hint counterparts of @BatchFetch follows:

QueryHints.BATCH accepts the association traversal string to batch.

The association string, much like LEFT_FETCH, should begin with an arbitrary alias.

QueryHints.BATCH_TYPE defines the type of batching used for the associations to be batched.

It accepts the strings "JOIN", "EXISTS", or "IN", or their BatchFetchType enum counterparts.

QueryHints.BATCH_SIZE defines the size of the IN clause for the IN batch type.

Finally, the most interesting feature of Eclipselink's batching toolset is this: it can also be applied to single-valued associations.

This is useful because Eclipselink does not yet define a way to join a single-valued association to something that is fetched by batch other than using the @JoinFetch annotation which we try to avoid. Since we can apply these batch configurations to single-valued associations as well, we are still able to optimize their fetching!

The following image summarizes the capabilities of some of these configuration options:

Configuring With Hibernate's FetchProfile

Unfortunately, I came across a StackOverflow entry with an answer that mentions that JPA 2.1 does not support FetchProfiles. This was my most anticipated configuration option for Hibernate.

I'll update this post once I test it on JPA 2.0; I still tried it (I'm using 2.1), but it didn't work in my experiments.

Configuring Scalar Attributes

These basic fields aren't associations, but they can be configured for optimization nonetheless. This time, we want to limit which fields we SELECT, as we were only able to specify associations previously (which still took all of the fields of the queried entities). Let's get this over with.

As mentioned near the beginning of this post, because we are dealing with scalar attributes, configuring them as lazy will not work unless bytecode instrumentation is done. This is true for both providers.

JPA: @Basic

JPA provides the @Basic annotation to be used on scalar types, should we desire to configure its fetch latency. Its fetch attribute accepts a FetchType value, either EAGER or LAZY, as used in the annotations @OneToOne, etc.

This augments the default behavior so that we have to access the attribute to have it loaded.

JPA (2.1): EntityGraph

JPA 2.1 brings the EntityGraph configuration. With the EntityGraph, we can define a contract that tells initialization which attributes to initialize for the entity. This is called a fetch group.

There are two ways to define an EntityGraph. The first is to declare them with the @NamedEntityGraph annotation:


@Entity
@Table(name = "EMPLOYEES")
@NamedEntityGraphs({
                   @NamedEntityGraph(name = "testGraph",
                                     attributeNodes =
                                     { @NamedAttributeNode(value = "name"),
                                       @NamedAttributeNode(value = "info", subgraph = "info_email") },
                                     subgraphs =
                                     { @NamedSubgraph(name = "info_email",
                                                      attributeNodes = { @NamedAttributeNode(value = "email") }) })
    })

It certainly is a lot of code just to fetch a couple of attributes. One thing to note is that the info association uses the info_email subgraph (also declared in the same NamedEntityGraph). This subgraph further specifies that only the email attribute is to be fetched for that association. Anything annotated with @Id or @Version is always included and cannot be excluded. Furthermore, because we declare the fields and associations we want right away, providers usually configure the relevant associations to eager for this fetch scheme, and then make everything else lazy (this is true for Eclipselink; Hibernate has an issue with this, where eager associations are not made lazy).

If we declare an association without a subgraph, then all of its attributes and associations that are not lazy will be included (and cascade the fetch).

The EntityGraph can then be applied through the following code:


        EntityGraph<?> entityGraph = em.getEntityGraph("testGraph");
        
        //when using find:
        Map<String, Object> hints = new HashMap<>();
        hints.put("javax.persistence.fetchgraph", entityGraph);
        Employee emp = em.find(Employee.class, 1L, hints);
        
        //when using a query:
        TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o", Employee.class);
        query.setHint("javax.persistence.fetchgraph", entityGraph);

We can also build EntityGraphs at runtime:


        //creating an EntityGraph identical to the previous example
        EntityGraph<Employee> entityGraph = em.createEntityGraph(Employee.class);
        entityGraph.addAttributeNodes("name");
        Subgraph<?> infoSubGraph = entityGraph.addSubgraph("info");
        infoSubGraph.addAttributeNodes("email");

        //get a mutable copy of the named EntityGraph
        EntityGraph<?> namedCopy = em.createEntityGraph("testGraph");
        //modifications...

Unfortunately, because using an EntityGraph as a fetchgraph coerces the associations to eager, it may be counterproductive to use with Hibernate, as it automatically joins the associations in the graph.

An EntityGraph can also be applied as a LoadGraph. Simply replace the hint name with "javax.persistence.loadgraph". Using a loadgraph defines that nodes in the graph be "treated as FetchType.EAGER and attributes that are not specified are treated according to their specified or default FetchType" (from the JPA 2.1 spec). After some experimentation, I haven't achieved the expected behavior yet, so I won't expound much further.
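Since the two hints are easy to mix up, here is a tiny model of the rule as the spec words it. This is not provider code: GraphSemanticsSketch and its enums are hypothetical names, the model ignores the @Id and @Version attributes (which are always fetched), and, as noted, the providers don't necessarily behave this cleanly in practice.

```java
// Models the JPA 2.1 wording for the two graph hints; not provider code.
class GraphSemanticsSketch {
    enum FetchType { EAGER, LAZY }
    enum GraphKind { FETCH_GRAPH, LOAD_GRAPH }

    // inGraph: whether the attribute is a node of the applied EntityGraph.
    // mapped:  the FetchType the attribute has from its mapping (or its default).
    static FetchType effective(GraphKind kind, boolean inGraph, FetchType mapped) {
        if (inGraph) {
            return FetchType.EAGER;          // both kinds treat graph nodes as eager
        }
        return kind == GraphKind.FETCH_GRAPH
                ? FetchType.LAZY             // fetchgraph: everything else becomes lazy
                : mapped;                    // loadgraph: everything else keeps its mapping
    }
}
```

The only case where the two differ is an attribute that is mapped as eager but left out of the graph: a fetchgraph should make it lazy, while a loadgraph should leave it eager.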

JPA: JPQL Constructors

Another way to get only a subset of attributes is through using constructors in the JPQL query. The following is an example:


SELECT NEW com.test.Employee(o.id, o.name) FROM Employee o

Of course, this is very static behavior and prolific usage of this method bloats the codebase with multiple queries and constructors.
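For reference, here is roughly what the class on the receiving end of such a query could look like. EmployeeSummary is a hypothetical projection class (the query above targets com.test.Employee itself), and the sketch assumes that id is a Long and name is a String.

```java
// Hypothetical projection class for a JPQL constructor expression such as:
//   SELECT NEW com.test.EmployeeSummary(o.id, o.name) FROM Employee o
// The package declaration is omitted, and the field types are assumptions.
class EmployeeSummary {
    private final Long id;
    private final String name;

    // The parameter list has to line up with the SELECT NEW arguments,
    // in the same order and with compatible types.
    public EmployeeSummary(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }
}
```

In a real project the class would be public and live in its own file, and the query would have to reference it by its fully qualified name. Every distinct set of selected attributes needs its own constructor (or class), which is where the bloat comes from.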

Eclipselink: FetchGroup (QueryHints.FETCH_GROUP_NAME, FETCH_GROUP, and FETCH_GROUP_ATTRIBUTE)

The FetchGroup feature of Eclipselink works almost exactly like JPA's EntityGraph (when used as a fetchgraph) since it's Eclipselink's underlying implementation for it. In fact, the FetchGroup feature predates JPA 2.1. The main difference is that FetchGroup does not coerce associations to become eager. Sure, it makes everything not declared in it lazy, but it makes sure that when a lazy association it includes is initialized, only what is declared in the FetchGroup is loaded for the association's target.

Declaring a fetch group is much simpler (as it is nonstandard). The following fetchgroup is similar to the previous sample fetchgraph:


@Entity
@Table(name = "EMPLOYEES")
@FetchGroups({
             @FetchGroup(name = "testFetchGroup",
                         attributes = { @FetchAttribute(name = "name"), 
                                        @FetchAttribute(name = "info.email") })
    })

The named fetchgroup is then applied using the QueryHints.FETCH_GROUP_NAME hint:


        TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o WHERE o.id = 1", Employee.class);
        query.setHint(QueryHints.FETCH_GROUP_NAME, "testFetchGroup");
        Employee emp = query.getSingleResult();

A similar FetchGroup can also be made at runtime:


        TypedQuery<Employee> query = em.createQuery("SELECT o FROM Employee o WHERE o.id = 1", Employee.class);
        FetchGroup fg = new FetchGroup();
        fg.addAttribute("name");
        fg.addAttribute("info.email");
        //pass the FetchGroup object itself, not the name of a declared one
        query.setHint(QueryHints.FETCH_GROUP, fg);

Notice that this time no alias is required. Also, if the passed attribute is an association, then, much like the EntityGraph, it includes the eager attributes and associations of that association.

We can also use the query hint FETCH_GROUP_ATTRIBUTE to declare attributes one at a time, calling query.setHint() multiple times.

Finally, if we access attributes not declared in the FetchGroup, the entity will fully initialize itself, potentially throwing all our optimizations out the window. Of course, this problem is minimized if all associations have been made lazy, as only the entity that owns the attribute gets fully initialized.

Also, since Eclipselink allows multiple applications of some query hints, using default Maps to contain hints for entity manager methods will be problematic, as each key has to be unique. A custom Map implementation has to be used.
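One possible shape for such a Map is sketched below. RepeatableHintMap is a hypothetical class, and the whole idea rests on an assumption I haven't verified: that the provider reads the hints by iterating entrySet(). It also deliberately bends the Map contract (keys are no longer unique), so it should never leak into code that expects a well-behaved Map.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A Map that keeps every put() as its own entry, even for a repeated key.
// Assumption: the consumer walks entrySet() rather than calling get(key).
class RepeatableHintMap extends AbstractMap<String, Object> {
    private final List<Map.Entry<String, Object>> entries = new ArrayList<>();

    // Appends instead of replacing; always returns null because nothing is overwritten.
    @Override
    public Object put(String key, Object value) {
        entries.add(new SimpleEntry<>(key, value));
        return null;
    }

    // Exposes the entries in insertion order, duplicates included.
    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        return new AbstractSet<Map.Entry<String, Object>>() {
            @Override
            public Iterator<Map.Entry<String, Object>> iterator() {
                return entries.iterator();
            }

            @Override
            public int size() {
                return entries.size();
            }
        };
    }
}
```

Putting the FETCH_GROUP_ATTRIBUTE key twice then results in two entries instead of one overwritten value; get(key) falls back to AbstractMap's behavior and returns the first match.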

Other General Considerations

Adding metadata annotations for fetching strategies, configurations, or specialized queries in entities can quickly cause the code to bloat. Furthermore, it is debatable whether we should pollute model code with knowledge about the requirements of business methods or views. However, since these are directly related to the modeling of the entity, it can be said that this is a cross-cutting concern, as such configurations are heavily involved in both the model and its specific use case.

Personally, I favor moving such requirements as close as possible to where they are used. In cases where actual configuration cannot be stored as they are, I try to instead create tools or objects that are able to represent (and simplify, if possible) these configurations, and then use the tools to translate the objects into the desired configurations. Doing it this way makes a strong preference for granular runtime configurations, usually through applying hints at query level. Luckily, I'm using Eclipselink, and that's what Eclipselink advocates - not that I think it's better or anything.

Perhaps I've said this a couple of times throughout the post, but it's better if we make the associations lazy. This promotes a more fail-safe base configuration for when we breach contracts and limitations set by further configurations - such as when we access properties not declared in a FetchGroup that was used.

It is also usually worthwhile to think about whether entities should be well-associated and comprehensive (if they are left eager, they usually become the cause of optimization problems) or concise and modular. This becomes a decision between reusable entities with more configuration and specialized entities with little configuration. A need for concise data may even call for using native queries instead; it comes down to the need for having managed entities that JPA can perform various operations upon - like locking.

Because JPA and its providers rightfully worry so much about collection associations, it is also worthwhile to think about whether it is safe (or even worth it) to make collection associations, or should operations just be performed on them without the association. Of course, it is a lot easier to make JPA handle these operations by making the associations.

Configuration Recommendations

As some solid takeaways (and out of all the mentioned configuration options), here are some recommendations, or at least points to ponder as there never really is a silver bullet:

It is a good idea to make associations lazy if they are not already.
If the entities are concise and only require single-valued associations, using JPQL JOIN FETCH can be sufficient.

Prefer runtime configuration over cross-cutting and cluttered metadata annotations.
For single-valued associations:

The easiest and risk-free option, though limited, is JPQL join fetching. The most useful is EntityGraph (as fetchgraph), but be careful to not include collection associations and their associations.
Eclipselink has LEFT_FETCH hints.

For collection associations:

For Hibernate, use @Fetch(FetchMode.SUBSELECT) or @BatchSize() accordingly. For single-valued associations in entities that appear in a collection association, annotate the target entity (at class level) with @BatchSize to apply batching.

This can cause maintenance difficulties as it won't be immediately obvious that a relationship is batched until the entity is read.

Eclipselink has batch-related hints. This can avoid clutter and ill-placed annotations.

When it comes to scalar attributes, as the fields in an entity increase, and it is still desirable that they be reusable, it might now be advantageous to use EntityGraphs (as a fetchgraph, or FetchGroups for Eclipselink).

Using EntityGraphs as a loadgraph is still buggy for both providers.
Though using EntityGraphs can achieve deep join fetching with Hibernate, be cautious to avoid (or at least minimize) including collection associations, as they will end up being joined as well due to being coerced to eager, and they can form cartesian products.

Unfortunately, other than this, Hibernate does not offer a runtime-configurable way to perform deep joins (perhaps aside from FetchProfiles). Using @Fetch JOIN causes the association to be eager, which is undesirable.

Summary

Phew! That was definitely a lot! If you read all that up to here, then I thank you for coming along on this journey.

Anyway, if you started here, then I completely understand that it would have been a lot of text to go through - even I would have had to gather strength to be willing to read something like this. In any case, please enjoy the summary.

Here are the important points discussed in this post:

Entity associations are declared with @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany.
JPA has two main ways to obtain managed entities: EntityManager operations (like find), and JPQL queries.

Hibernate handles these two acquisition methods differently: if a JPQL query is used, then the query executed remains as true to the JPQL as possible - configuration annotations will not take effect on this main query.
Eclipselink treats these two equally, where configurations just build on top of each other.
For latency, it is recommended that associations are all made lazy.
Everything else in this post mostly has to do with strategy.

There are two main facets of configuration when optimizing fetching: Fetch Latency, and Fetch Strategy.

Latency deals with when associations and fields are fetched - we differentiate between two modes: eager and lazy. An eager property gets loaded (i.e. queried for) during acquisition of the main entity/ies. A lazy property, on the other hand, only gets loaded upon use. JPA defaults toOne associations to eager, and toMany associations to lazy.
Strategy deals with how the associations are fetched - whether they be joined with the original query, or queried separately, but with different WHERE clauses.

Single-valued Associations

With their eager defaults, Eclipselink initializes them through separate queries, one for each association.
Hibernate automatically outer joins eager associations with the initial query.
There are ways to configure fetching single-valued associations:

JPQL LEFT/INNER JOIN FETCH

Simply add this clause between the FROM and WHERE clauses. Hibernate can only fetch a single level, while Eclipselink can fetch beyond.

Fetch Annotations

Hibernate: @Fetch: JOIN, SELECT

The JOIN FetchMode causes the association to join (already default for eager associations), and coerces the association to become eager.
The SELECT FetchMode does not coerce associations to eager. It mimics Eclipselink's behavior of needing a single query for each association.

Eclipselink: @JoinFetch: INNER, OUTER

Eclipselink can only join associations with this annotation for up to one level. The options only change between using an INNER or OUTER join.

Eclipselink: QueryHints: FETCH and LEFT_FETCH

Query hints are applied during NamedQuery definition or at query level. FETCH makes use of an inner join; LEFT_FETCH uses an outer join.

Multivalued Associations

Initializing these lazy-default associations requires accessing their collections.
Eclipselink and Hibernate fetch them by using a separate query that finds with the key of the owning entity.
If made eager, Hibernate also uses a join. This is not the best option for collections.
The "n+1" problem happens when we separately initialize associations as we traverse a collection.
To solve - or at least alleviate - the problem, providers allow batch fetching.

Hibernate: @Fetch: SUBSELECT

With subselect, Hibernate initializes the association for all members of the collection by nesting IN clauses based on the original query.

Hibernate: @BatchSize

@BatchSize works on a collection association if it is not in JOIN FetchMode. FetchMode SELECT is the default for lazy associations.
This annotation specifies the size of the IN clause (which now uses ID tuples of the owning entities) with which to fetch.
It works in associations whose immediate owner was retrieved as part of a collection (or multivalued result).
Works on both single- and multivalued associations; for single-valued associations, the annotation has to be at class level (of the target entity).

Hibernate: @LazyCollection(LazyCollectionOption.EXTRA)

For when you need only a small number of entries from a large collection.
Enables querying per entry in the collection.
Potentially worsens the n+1 problem.
Has prerequisites:

The target entity has to have a dense, numeric column that the owner can reference with JPA's @OrderColumn.
The association is annotated with org.hibernate.annotations.ListIndexBase(value), where the value would be the starting index of the value in the ordering column. The resulting List is still zero-based.

Eclipselink: @BatchFetch: JOIN, EXISTS, IN

Works like Hibernate's SUBSELECT FetchMode, except the WHERE clause differs: JOIN expands the condition horizontally as the initial query is joined progressively deeper; EXISTS nests its clauses, much like Hibernate's SUBSELECT; and IN makes use of an IN clause that queries using key-tuples of the owning entity.
The IN type behaves exactly like Hibernate's @BatchSize annotation - it even has a size parameter that configures the batch size.
They also have QueryHints versions (BATCH, BATCH_TYPE, and BATCH_SIZE).

Scalar Attributes

It may be desirable to limit the attributes fetched from entities. There are options:

JPA: @Basic(fetch=FetchType.LAZY)

For attribute-level configuration.

JPA: JPQL Constructors

For query-level configuration.
This method makes queries very specialized and quickly bloats code to an increasing number of constructors.
A very static configuration.

JPA: EntityGraph

Declared by annotation or created at runtime; applied at query level (using hint name "javax.persistence.fetchgraph").
Can be verbose to declare, but because it can be made at runtime, it allows dynamic and well-placed configuration options.
Coerces associations to become eager; using it with Hibernate can cause problems as multivalued associations get joined instead.
Can also be used as a LoadGraph by using the hint name "javax.persistence.loadgraph" instead. A LoadGraph forces attributes it contains to become eager, but leaves the rest to be default. Behavior is still not too predictable for both providers.

Eclipselink: FetchGroup

Predates EntityGraph.
Forces attributes and associations not declared in it to become lazy, but does not coerce the lazy associations it contains to eager.
Much easier to build; has both annotation and runtime creation versions.
Applied via query hints.
Accessing attributes outside a FetchGroup causes full loading of the entity, potentially throwing away configuration benefits (usually if the associations were not made lazy).
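
The runtime version can be sketched as follows. As before, the Person entity with a name attribute and the PersonNameFinder class are assumptions, and the fragment needs Eclipselink on the classpath. Fetch groups also generally rely on weaving being enabled for the entity.

```java
import java.util.List;
import javax.persistence.EntityManager;
import org.eclipse.persistence.config.QueryHints;
import org.eclipse.persistence.queries.FetchGroup;

// Hypothetical helper; Person is assumed to be a mapped entity.
public class PersonNameFinder {

    private final EntityManager em;

    public PersonNameFinder(EntityManager em) {
        this.em = em;
    }

    public List<Person> findNamesOnly() {
        // Only the listed attributes are loaded up front;
        // everything else on Person is treated as lazy.
        FetchGroup group = new FetchGroup();
        group.addAttribute("name");

        return em.createQuery("SELECT p FROM Person p", Person.class)
                .setHint(QueryHints.FETCH_GROUP, group)
                .getResultList();
    }
}
```

Reading any attribute other than name on a returned Person would then trigger the full load described above.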

Be careful with optimization methods that rely on annotations - they can bloat entity code with knowledge of business-method and view-layer requirements, turning fetch tuning into a cross-cutting concern.

Make all the associations lazy (as also recommended by Hibernate) to protect from heavy damage when configurations are breached (mainly FetchGroups).
Define a good overall scheme for engineering entities and associations.
Prefer runtime configuration, especially options that apply at query level (usually query hints).
See the Configuration Recommendations section.

I'd like to leave you with a quote I found while experimenting on these things:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth, Structured Programming with go to Statements

First of all, consider where this quote comes from: a paper about programming with goto statements. Of course overthinking optimization at that level is going to be a lot of work. Also, the quote mentions "small efficiencies". The efficiencies discussed in this post are anything but small.

With the rise of agile engineering, we want to churn out code as fast as we can - it is usually only when we find the big problems that we fix them. Trouble arises only when there is no shared knowledge across the team regarding tooling limitations, basic behavior, and how to fix and optimize things when the need arises.

With proper foreknowledge and experience, it shouldn't be a problem to do general preemptive optimizations - it could even set a more solid base and start the project out on a good pace!

The key here is to have a keen understanding of the basics.

I hope I helped with that through this post. Again, thanks for joining me.

Cheers!

Some Sources

https://fndong.wordpress.com/2016/03/14/about-hibernate-orm/comment-page-1/
https://en.wikipedia.org/wiki/EclipseLink
https://en.wikipedia.org/wiki/TopLink
https://en.wikipedia.org/wiki/Java_Persistence_API
https://en.wikipedia.org/wiki/Hibernate_(framework)
http://wiki.c2.com/?PrematureOptimization
https://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch20.html
Java Persistence With Hibernate; First Edition; Manning, 2005
http://www.eclipse.org/eclipselink/#documentation

JPA (Java Persistence API) With Eclipselink and Hibernate: Special Uses

Thursday, May 11, 2017

JPA Fetch Behavior: Eclipselink and Hibernate - and Configuration Options!