Sunday, July 13, 2014

Byte-Code Transformation is no big deal

To avoid introducing yet another language into my web projects, I use Java to model the objects to be persisted as well. With the various ORM standards and implementations available, this seemed easy to do.

Take me from Java to the Database

Many ORM implementors tend to recommend a byte-code transformation step to make your classes usable in their respective persistence contexts (e.g. - Chapter 15). In practice this means that after you have done your job of coding and compiling the classes, some other component takes this code and transforms it into different code that additionally deals with the ORM/database related stuff.
The idea is to avoid runtime penalties and the generation of subclasses dealing with the additional database related issues, which would show up at runtime and potentially screw up your idea of the class hierarchy. (Which it did for me. See below.)
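To illustrate what "transformed" means here: before enhancement, a compiled domain class contains only the members you wrote yourself. An enhancer typically adds a marker interface such as javax.jdo.spi.PersistenceCapable plus synthetic field accessors - the dn-prefixed names mentioned below are DataNucleus conventions and vary between versions, so treat them as an assumption. A minimal sketch:

```java
import java.lang.reflect.Method;

// A plain domain class, exactly as you would write it yourself.
public class PlainEntity {

    private String title;

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public static void main(String[] args) {
        // Before enhancement the compiled class contains only what we wrote.
        // An enhancer would additionally make it implement an interface like
        // javax.jdo.spi.PersistenceCapable and add synthetic accessors
        // (e.g. dnGetTitle/dnSetTitle in DataNucleus - names vary by version).
        for (Method m : PlainEntity.class.getDeclaredMethods()) {
            System.out.println(m.getName());
        }
    }
}
```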

Why class weaving or enhancing is a big deal

Of course this still means that you are running code you don't know in detail.
The assumption of all the ORM framework authors is that the byte-code transformation process can easily be automated and, as far as possible, be hidden from the application developer. JPA based JEE applications are expected to be transformed at deployment time to the container - so this doesn't even happen within your development tool-set.
My experience tells a different story, and the hiding of things during development once again turned out to be no good idea for me.

The easy Start

I started some years ago with the Eclipse IDE and the Google App Engine plugin. It does the byte-code transformation for the JDO implementation from DataNucleus automatically at compile time. This worked fine as long as I was coding in "play-around" mode. When the code started to grow into modules, from time to time the classes were propagated to the client module unenhanced. This is where I learned what the use of DataNucleus feels like when in fact just the transformation is missing (of course it doesn't tell you "you forgot to transform class this-and-that"). I got around these issues with the dumb "clean nearly everything in your work environment" pattern.

Build System Integration

Things got even more complicated when I started to write build scripts, since the project grew and was supposed to be published. You don't want to give anyone a 20-page description of how to set up the IDE just as you did. You simply give friends a script which describes the necessary parts in human and machine readable form. So the project gets cloned from the source, and a simple build tool call - hopefully in default mode with few or no options - will create a usable result.
But the ORM provider's promise to support me as a developer was supposed to hold for these situations, too. I was just expected to change the way I was using the transformation tool. DataNucleus comes with an ANT task, a compiler plugin, and so on. Since I didn't want to use the obsolete legacy tools Maven or ANT (hey, why not use make or punch cards?), I "simply" plugged in the compiler plugin, since there is no direct support for Gradle and integrating the ANT task was not that easy on the initial try.
First of all, this compiler plugin was not able to deal with all versions of the Oracle Java compiler or any language level beyond "Java 6", so I had to prepare the source code carefully.
This gave me enhanced classes, and sure, the enhancer was running, but...
When packaging the classes into JARs - as build systems tend to do nearly automatically after compilation - those classes were unusable again, with the error messages I was already familiar with.

Unit-Testing the Byte-Code Transformation instead of my Code

At this point in time I started writing JUnit tests to check whether my build environment was working - not whether my code was correct. This gave me the impression that some things were going wrong.
I learned that the compiler plugin took some time after compilation before it started to "enhance" (byte-code transform) the class files. It used some sort of threading for this, so that Gradle had already packaged the JAR files before the process was completed. I started to add some 10s of waiting to my build scripts. Argh...
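A cheap guard test for the "did the enhancer actually run?" question checks whether a class implements the provider's marker interface - by name, so the test compiles without the JDO API on the classpath. The two interface names below are the ones DataNucleus has used; treat them as assumptions for other setups:

```java
import java.util.Arrays;

public class EnhancementCheck {

    // True if the class implements a known enhancement marker interface.
    // Interface names checked by string so no ORM jar is needed at compile time.
    static boolean looksEnhanced(Class<?> cls) {
        return Arrays.stream(cls.getInterfaces())
                     .map(Class::getName)
                     .anyMatch(n -> n.equals("javax.jdo.spi.PersistenceCapable")
                                 || n.equals("org.datanucleus.enhancement.Persistable"));
    }

    public static void main(String[] args) {
        // String.class was obviously never enhanced:
        System.out.println(looksEnhanced(String.class)); // prints false
    }
}
```

In a real build this check would sit in a JUnit test asserting looksEnhanced for each persistent class, failing the build early instead of producing the obscure runtime errors mentioned above.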

Refactoring - Get the same Thing you already had

I took a second look at the DataNucleus enhancer's ANT task to integrate it into the build process as a Gradle task, without that erratic 10s of waiting. I needed this step anyway, since I was updating DataNucleus from the old version used in the Google App Engine at that time to a newer one also meant for stand-alone use.
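Wiring the enhancer's ANT task into Gradle can be sketched roughly as follows. This is a sketch only: the artifact coordinates, the task class name, and the attribute names are assumptions that differ between DataNucleus (and Gradle) versions - check the documentation for yours:

```groovy
// build.gradle sketch - coordinates, class and attribute names are assumptions
configurations { enhancer }

dependencies {
    enhancer 'org.datanucleus:datanucleus-enhancer:3.1.1' // version is illustrative
}

task enhance(dependsOn: compileJava) {
    doLast {
        def classes = sourceSets.main.output.classesDir
        ant.taskdef(name: 'datanucleusenhancer',
                    classname: 'org.datanucleus.enhancer.tools.EnhancerTask',
                    classpath: (configurations.enhancer
                                + sourceSets.main.compileClasspath
                                + files(classes)).asPath)
        ant.datanucleusenhancer(verbose: true) {
            fileset(dir: classes, includes: '**/*.class')
        }
    }
}

// The crucial line: packaging must wait for enhancement instead of racing it.
jar.dependsOn enhance
```

The last line is the actual fix for the problem above - an explicit task dependency removes exactly the race that the 10s sleep only papered over.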

Use other APIs as well like they were simple Libraries

After all these pieces were working, I started playing around with the Java Persistence API (JPA). The JPA implementations I came across - OpenJPA, EclipseLink, and again DataNucleus - also recommend the use of byte-code transformations, called enhancement (OpenJPA and DataNucleus) or weaving (EclipseLink).
The integration of that many APIs and byte-code transformers made things more complicated again, while the code I wrote is still not that complicated. It's just the byte-code transformation which adds to the complexity. I needed to present OpenJPA, EclipseLink, and DataNucleus versions of my single JAR archive with only very few classes, of which only two needed to be byte-code transformed. Additionally, with JPA I have the option to use the original classes without byte-code transformation in some scenarios, with certain limitations: only DataNucleus is really capable of an automatic discovery of the classes available for database access; the others need detailed lists passed over to the implementation in different ways. This is anything but portable!
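When running without transformation, those class lists typically end up in persistence.xml. A sketch with illustrative class names - exclude-unlisted-classes is standard JPA, but each provider additionally has its own discovery-related properties:

```xml
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
  <persistence-unit name="example">
    <!-- Without enhancement/weaving most providers need an explicit list: -->
    <class>org.example.Topic</class>
    <class>org.example.Article</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
  </persistence-unit>
</persistence>
```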

Stop pretending it is easy and write a decent Tool to do the Job

Since not all of the implementations can be on the compile-time classpath of the JPA-dependent portions of my project, it was now time - just because of the necessary byte-code transformations - to write a Gradle plugin dealing with all this.
It quickly turned out that this plugin was generic enough to be used in any project using JPA, JDO, or Ebean as the ORM solution for Java together with the Gradle build tool.
Over the last five years, two thirds of the work on the build scripts of the Tangram dynamic web application framework was related to byte-code transformations.

Conclusion after some Years

So my best friend now is OpenJPA, which can be used relatively easily without transformation. Yes, it presented me with the nice subclassing issue, where at runtime I am dealing with subclasses of the classes I designed myself, but this was solvable with half a dozen lines of code.
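The unwrapping fix can be sketched like this. The generated-name marker is an assumption (OpenJPA's actual naming of runtime-generated subclasses differs between versions), so adapt the predicate to what your provider really emits:

```java
public class SubclassUnwrap {

    static class Topic { }                          // the class you designed
    static class TopicPcSubclass extends Topic { }  // stand-in for a provider-generated subclass

    // Walk up the hierarchy until we leave the generated subclasses behind.
    static Class<?> designClass(Class<?> cls) {
        while (cls.getName().endsWith("PcSubclass")) { // marker is an assumption
            cls = cls.getSuperclass();
        }
        return cls;
    }

    public static void main(String[] args) {
        System.out.println(designClass(TopicPcSubclass.class).getSimpleName()); // prints Topic
    }
}
```

Applied wherever getClass() is used for comparisons or lookups, this keeps the runtime subclasses from leaking into your own class hierarchy logic.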
My second best friend is DataNucleus, where I am now able to integrate the byte-code transformer into the runtime environment of my framework, write JDO-annotated classes in Groovy, put the code into the JDO-based database layer itself, and thus extend the object-oriented storage at runtime. This adds very nicely to the stylesheets, JavaScript codes, URL formats, and business logic in Groovy residing in the database layer, which form the dynamic part of Tangram. I tried this with the OpenJPA enhancer and the EclipseLink weaver as well, but with no success.
Also, I now have a code base which was easily extended with another ORM solution called Ebean. It was meant as an option with a smaller footprint, but it does not present any advantages over the other options already implemented and proven in real projects live on the web using the Tangram dynamic web application framework.
So, does anyone still think that byte-code transformation is a non-issue, as you may read on introductory web pages about ORM? Give me some 30 seconds to make your build process break - at least every once in a while, when you don't expect it and won't easily discover the source of your pain.
But in the end, with my Gradle-based plugin, things are definitely a lot easier and more reliable - again, after a lot of work on things that were supposed to be easy, automatic, or hidden from me.
