Qi Qiao Ban: 2014

Friday, 5 December 2014

Dinistiq Version 0.3

Dinistiq came into being because implementing it was faster than integrating other DI frameworks and containers into Tangram would have been.
Since it then simply did everything that was needed for a while, only minor code optimizations were introduced.

Quality Assurance

For its first birthday, however, Dinistiq has earned a real upgrade.
Since August it has been in production use in a test framework that validates compliance with the requirements for a large German media portal, which is being completely reimplemented on a new technical basis.
The goal was a simple, efficient, maintainable, and lean technology stack.
The alternatives stacked Selenium, PHP, and shell scripts on top of each other and were unable to bring together the required variety of platforms and browsers with the test cases. Besides, they naturally did not fit into the pure Java world of the rest of the project.
At the same time, outdated techniques such as Maven were of course to be left out as well.
Now Dinistiq has to take care, every day, of wiring hundreds of test cases to their respective browsers and target platforms through dependency injection.

Quality Assurance

Dinistiq is now in daily project use, and it is no longer acceptable for functions used there to be unreliable. Because Dinistiq matures with use, frequent snapshots were nevertheless necessary. Increased unit test coverage and the use of PMD for source code checks uncovered only a single bug, but they prevented new bugs from being introduced in the first place.
Jacoco is used to generate the coverage overview.
The motivation here was also that Dinistiq was to be used in quality assurance for a vastly larger project, so Dinistiq had to serve as a test bed for the source code quality metrics required there.
So far this step has been a complete success and in any case much easier to implement than with any of the alternatives I know, such as the Springframework, Google Guice, Weld, or OpenWebBeans, which can certainly do the job as well.
For that, it gets the "0.3" stamp today.

Details

With version 0.3 the default repository for the artifacts has moved, and so has the CI server. This is solely due to the fact that Cloudbees is withdrawing from this business at the end of the year.

Saturday, 30 August 2014

Tangram CoMA Awakens

Since, after a long summer vacation, I am once again working mainly with the CoreMedia CMS, I took another look at the corresponding integration in Tangram after a very long period of quiet.

An Old Example

For quite some time there has been an example application for the CoreMedia adapter CoMA in the Tangram framework. It suffered, however, from the fact that setting up its environment was not entirely trivial. For that reason alone I could not put much energy into its further development.
Tangram's facilities for entering content are not available for this application, since CoMA works read-only. To still get content and a few templates to show, I use the ancient MenuSite example application from CoreMedia itself.
So to use the application you need a database as a CoreMedia Content Server would have created and filled with data for the MenuSite.
Until now, the steps needed to set up this database were only described, to be carried out by hand. All the other examples can simply be built and started.

New Servers

My experiments with Gradle for assembling CoreMedia software components without Maven, which I am not very fond of, have now reached a state where I can present scripts and small workspaces that let you create the required database.
The Content Management Server web application can be found at
https://github.com/mgoellnitz/cm-cms-webapp
and the Content Management Server tools at
https://github.com/mgoellnitz/cm-cms-tools
The tough nut to crack was less the server than the tools package. But without the tools I could not import the data into the server, of course. In addition, the Tomcat plugin for Gradle that I chose did not show its friendly side. By now I know that applications other than the CoreMedia Content Server also suffer from the incorrectly assembled classpath.

And All Without Maven

The exact description can be found in the Tangram examples repository.

Sunday, 17 August 2014

Tangram Release 0.9

Over the last year Tangram has changed a great deal, across all of its modules and options.

Persistence Options

The Tangram Dynamic Web Application Framework has been extended to support the Java Persistence API (JPA) and EBean as persistence layers in addition to the already available Java Data Objects solution. This greatly extends the number of options for platforms to use with Tangram.
The Tangram examples reflect this with the JDO/DataNucleus, JDO/Google App Engine, EBean, and JPA examples, where the latter can be switched between tested API implementations: OpenJPA, EclipseLink, DataNucleus, and Hibernate.
The simple and more or less generic JDO editor is now a generic editor for all storage solutions and is built as a separate module. Except on Google App Engine, this module can be left out if there is another way to get the content into the repository.

Dynamic Model Extensions

In addition to the Groovy classes stored in the repository, which extend the view and controller parts of the application, you can now create JDO annotated classes which are immediately usable as model classes in the repository. This option is only available for the JDO implementation.
To make it easier to build Tangram itself, any Tangram application, or indeed any application using byte-code transformation for JDO, JPA, or EBean, a Gradle plugin has been introduced. The examples and other blog entries illustrate the usage of this plugin.

IDE Integration

To synchronize code stored in the repository, a separate FTP module has been introduced. Most IDEs, due to their strong PHP past, support FTP better than any other protocol. CSS, JavaScript, and Groovy code can be imported and exported.

Generic Import and Export

In addition to, and independent of, the FTP module, Tangram now contains a generic importer and exporter for the whole content stored in the repository. The XML representation can be used to transfer content between applications using JPA, JDO, or EBean. IDs cannot be preserved.

Framework Independence

For the whole Tangram code base, the Springframework is now only one option, both for setting up the application with Spring's dependency injection IoC container and for the controller and view parts of the application.
Along the way it turned out to be easier to present a custom DI component, Dinistiq, than to use one of the available solutions. Tests have been done with Google Guice, TinyDI, and JSR-330. As of Tangram 0.9 none of these are more than experiments. Work on a Tangram JavaEE integration is also at an experimental stage and would require a major refactoring of some parts of Tangram; see other entries in this blog on the JavaEE topic.
It should now be fairly easy to do more experiments with other IoC/DI frameworks and containers or to integrate other web frameworks starting from the new plain servlet based solution.

Options

Tangram 0.9 has been tested, verified, or is even in production use on the following platforms: Google App Engine, run@CloudBees, OpenShift, and Standalone with Apache Tomcat using RDBMS or MongoDB.
The option matrix now looks like this:
Persistence: JDO, JPA, EBean
IoC: Springframework, Dinistiq
Security: Spring Security, Apache Shiro
Hosting: On Premise, OpenShift, Cloudbees, Google App Engine
Storage: RDBMS, No-SQL e.g. MongoDB, Files

Code Reduction

Tangram started as an idea to plug existing components together to form a web application with as little glue code as possible. Some of the ideas, like object oriented templating, had to be coded explicitly though. From this starting point the Tangram code started growing, and the 0.9 release got the additional design goal of stopping this. Some of the code in Tangram still duplicates functionality which would otherwise be available externally, but this avoids a dependency just to call one method. And there is still duplicated code within Tangram where the different options, e.g. for persistence, might get a stronger common code base. In these areas it is still not clear whether they will diverge or grow together.

Outlook

The experiences with CapeDwarf showed the way towards a Java EE module as an alternative to dinistiq, the Springframework, and the many tests with e.g. Google Guice. The less than encouraging experiences with the EBean ORM might lead to discontinued support for that module. Dinistiq will become more Java EE / CDI compatible, and CDI without the full Java EE stack also seems to be an interesting option. Work on the generic import and export functions using XStream needs to be extended, and easier integration of OpenID or OAuth outside of the Google App Engine is needed for many applications. Google App Engine with JPA is still on the list but not desperately needed.

Thursday, 7 August 2014

Dinistiq 0.2 Release

The minimalistic Dependency Injection solution for the setup of component based Java applications using Singletons and the JSR330 annotations has stabilized to a 0.2 release.
With some minor bug fixes it also deals better with misconfigured setups.
All system properties from the JVM are now part of the scope. The logging framework has been changed from Apache Commons Logging to Simple Logging Facade for Java (slf4j).
Decent handling of maps and booleans has been added, and circular dependencies are now detected and reported.
Along with all these minor changes the documentation has been extended, and a javadoc archive has been added to the deliverables.
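
How a detection of circular dependencies can work in principle is sketched below. This is not dinistiq's actual code: the class CycleCheck and the map from bean names to the names of their dependencies are assumptions made up for this illustration. The sketch walks the dependency graph depth-first and reports the first loop it runs into.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of dinistiq: finds a cycle in a bean dependency graph. */
final class CycleCheck {

    /** Returns the bean names forming a cycle (e.g. a, b, a), or an empty list if there is none. */
    static List<String> findCycle(Map<String, List<String>> dependencies) {
        Set<String> done = new HashSet<>();
        for (String bean : dependencies.keySet()) {
            List<String> path = new ArrayList<>();
            if (visit(bean, dependencies, path, done)) {
                return path;
            }
        }
        return new ArrayList<>();
    }

    private static boolean visit(String bean, Map<String, List<String>> deps,
                                 List<String> path, Set<String> done) {
        int start = path.indexOf(bean);
        if (start >= 0) {
            // Drop everything before the loop and close it, so the report reads a, b, a.
            path.subList(0, start).clear();
            path.add(bean);
            return true;
        }
        if (done.contains(bean)) {
            return false; // already fully explored without finding a cycle
        }
        path.add(bean);
        for (String dependency : deps.getOrDefault(bean, List.of())) {
            if (visit(dependency, deps, path, done)) {
                return true;
            }
        }
        path.remove(path.size() - 1);
        done.add(bean);
        return false;
    }
}
```

A container could call findCycle() once after reading all bean definitions and refuse to start, with the returned names in the error message, if the list is not empty.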

Friday, 25 July 2014

150 Lines just for one Collection

Some 15 years ago I left the Java Enterprise Edition train, and now things like CapeDwarf on JEE servers push me to evaluate running Tangram in a JEE environment somehow. I read the promise that things have become a lot easier and that JEE has adopted the annotation and convention based style I now feel familiar with.
I came across the JSR330 annotations from the javax.inject package which were introduced in the JEE world. JSR330 annotations are recognized by several dependency injection frameworks like Guice, the Springframework, my homegrown dinistiq, and obviously the JEE containers.

Step 1: Get rid of Spring

This is where I started to make Tangram depend less directly on the Springframework, or to provide interfaces that can be implemented in a Spring and a non-Spring way. For the dependency injection part it was mostly as easy as migrating from

    @Autowired
    private ClassRepository classRepository;

to

    @Inject
    private ClassRepository classRepository;
I lost the required = false option at that time but was able to live with that given the advantages of portability.

Step 2: Find Alternatives

The Spring parts of the application were re-implemented using a plain servlet based solution, and Spring Security was replaced by Apache Shiro for those scenarios.
After migrating to the new annotations, most of my components were portable across containers. So in very many cases I don't have to sell my soul to a particular DI framework, which helps in choosing the right infrastructure for every project. But several details remain problematic, most notably the missing required = false option and the missing automatic collection of instances, which would let the whole set of instances of a type be injected.

Step 3: Annoying Details

The annoying collection problem looks like this:

@Autowired(required = false)
private Collection<ControllerHook> controllerHooks = new HashSet<ControllerHook>();

Later I migrated this to the JSR-330 annotations, which also work with Spring.

@Inject
private Collection<ControllerHook> controllerHooks = new HashSet<ControllerHook>();

I made this work with dinistiq as well.
With Guice I learned that it did not like to collect the instances for me and provide me with the collection out of the box.
What I expect here is that any instance implementing ControllerHook from the application context gets added to a collection which is then injected into the consuming component. Spring does it, dinistiq does it, and Guice can be convinced to do it with an additional Multibinder:

Multibinder<ControllerHook> multibinder =
        Multibinder.newSetBinder(binder(), ControllerHook.class);
multibinder.addBinding().to(UniqueUrlHook.class);
multibinder.addBinding().to(ProtectionHook.class);
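
To make the expected behaviour concrete without any DI framework, here is a minimal sketch. BeanCollector is a hypothetical helper and not code from Spring, dinistiq, or Guice, and ControllerHook, UniqueUrlHook, and ProtectionHook are empty stand-ins for the real Tangram types. It picks every bean of the requested type out of the set of all known instances, which is the collection a container would then inject.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Empty stand-in for the hook interface from the text. */
interface ControllerHook { }

/** Empty stand-ins for two hook implementations. */
final class UniqueUrlHook implements ControllerHook { }
final class ProtectionHook implements ControllerHook { }

/** Hypothetical helper: what a container does when it injects a Collection of T. */
final class BeanCollector {

    /** Collects every bean that is an instance of the requested type. */
    static <T> Set<T> collect(Collection<?> beans, Class<T> type) {
        Set<T> result = new LinkedHashSet<>();
        for (Object bean : beans) {
            if (type.isInstance(bean)) {
                result.add(type.cast(bean));
            }
        }
        return result;
    }
}
```

Called with all instances of the application context and ControllerHook.class, collect() returns just the two hooks and skips every other bean.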

Step 4: Show Stopper

Inspired by the CapeDwarf project, which promises to deploy applications developed for the Google App Engine and its APIs on a JEE infrastructure, I tried to deploy a GAE based application to such an infrastructure, based on a JBoss AS7 or Wildfly 8 server using CapeDwarf versions 1 or 2 respectively.
This approach doesn't work for Tangram since the DI solution used within the application, be it dinistiq or the Springframework, interferes with the JEE container's CDI implementation. The application server starts to interpret the annotations intended for the other DI framework. This would instantiate the application components twice, and at the moment it also fails on some of those components.

Step 5: JEE Wordiness

So quite naturally I went on to migrate Tangram directly to the JEE world. But JEE has the same problem as plain Guice.

@Inject
private Collection<ControllerHook> controllerHooks = new HashSet<ControllerHook>();
This time the solution really is ugly. It seems I have to implement a different consuming component, so instead of the two lines a separate ControllerHookProvider is needed:

public interface ControllerHookProvider {

    Collection<ControllerHook> getControllerHooks();

} // ControllerHookProvider
Of course the generic implementation looks like the good old two lines which did the whole job for me so far.

public class GenericControllerHookProvider implements ControllerHookProvider {
    @Inject
    private Collection<ControllerHook> controllerHooks;

    public Collection<ControllerHook> getControllerHooks() {
        return controllerHooks;
    } // getControllerHooks()

} // GenericControllerHookProvider
For JEE an alternative implementation has to be chosen, which made the interface necessary in the first place, and it is surprisingly wordy for that common and simple scenario.

@Named("controllerHooksProvider")
@Singleton
public class JeeControllerHooksProvider implements ControllerHookProvider {

    private Collection<ControllerHook> controllerHooks = new HashSet<>();

    @Inject
    public void setControllerHooks(@Any Instance<ControllerHook> hooks) {
        for (ControllerHook hook : hooks) {
            controllerHooks.add(hook);
        } // for
    } // setControllerHooks()

    public Collection<ControllerHook> getControllerHooks() {
        return controllerHooks;
    } // getControllerHooks()

} // JeeControllerHooksProvider
This really didn't invite me to go deep into JEE again. In addition, JEE lacks any configuration file based bean definitions, which avoid much of the wrapping of classes in interfaces where in fact just two injected values make the difference. With the Springframework, configuration files and auto scanning of beans go hand in hand. JEE intentionally left out this part.

Thursday, 24 July 2014

Fast on the first Request

Many developers seem to take it as common sense that "expensive" things which only happen once are not that bad, and that when those things happen at the wrong point in time it is a good idea to move them to the "application startup".
So most of us are not very surprised, and consider it no problem, if this startup takes some time.

Welcome to the Cloud

At first sight, deploying my applications in the cloud is exactly the same as it used to be. I follow accepted standards, use common frameworks, and respect many best practices. So any runtime platform supporting this should do the job. Cloud in that case means that many more aspects of deployment and operations get automated, including the starting and stopping of additional instances, load balancing, and so on.
But wait... When does the platform get the idea to start new instances?

Let the Customer wait?

It's just load. And load means that customers have issued HTTP requests and are awaiting responses in time. From research we know that in time means something around 3s. But when a user request leads to the start of a new instance to handle the load, it is far too late to do all the expensive stuff. Application startup is not the right point in time anymore.
You will also notice that all those nice precomputations simply re-calculate values which yesterday's instances, or the instance on the other machine, had already computed. And of course it still is a good idea to have all those values at hand, meaning to "cache" them. But very many of the values which are not coded or configured into the application deployment stay the same as long as the deployment environment doesn't change. Or they stay the same unless the application itself changes them and thus is able to tell when the derived values in the caches really need to be re-calculated.

Examples for all this are:

Database derived value caches like query results with calculations based on the result, where just this application writes to the database. Those values might be needed at startup but stay constant until the application itself changes the database contents.
Classpath scanning to auto-discover software components. Developers and deployers won't want to collect the lists manually or at build time, but the components don't change after deployment, and definitely not on every startup.
Web templates and other code which needs to be fetched and prepared for use (e.g. compiled). This code gets prepared on every startup of the application, although it does not change that frequently and especially not on application startup. Obviously this adds a big amount to the response time, since the code needs to be executed to generate the response.

Just use a Cache

What? Oh, well, not that cache. This cache. This cache doesn't need to be as fast as the in-memory caches already available, and I didn't want to re-invent the wheel. It just needs to have at hand some values another instance already calculated, and it has to be updated when these values change, which is, as already pointed out, not as frequent as for other runtime values.
The values in the cache still are volatile and can be re-calculated by the application at any time. But they should be persisted for some time to be available at startup and reduce the time for the first request in web applications.
The jsr107 cache implementation in the Google App Engine is an example of exactly this approach. I wrapped this cache as a PersistentRestartCache in Tangram and cut response times for the first request to a third (or to a half in the worst case), from around 30s to 11s. For all other platforms available for the Tangram dynamic web application framework I presented a simple (maybe too simple) file based implementation which does most of the job as well.
Still, this leaves out the classpath scanning of the Springframework or dinistiq. In my environment the Springframework takes between 3s and 7s and dinistiq around 4s to do the basic application setup based on classpath scanning and additional configuration files. So half of the startup time is spent in framework code and not in the application startup itself. And even this value was only achieved by a nearly brutal reduction of the portion of the classpath to be scanned, creating a "components" package where all autoscanned components reside for the Springframework and dinistiq.
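
What a file based cache of this kind could look like is sketched below. This is not the PersistentRestartCache from Tangram: the class FileRestartCache, its method names, and the choice of plain Java serialization into a single file are all assumptions for illustration. The point is only that values survive a restart, can be re-calculated at any time, and get dropped when the application knows they have changed.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a restart cache backed by one file; not Tangram's actual code. */
final class FileRestartCache {
    private final Path file;
    private final Map<String, Serializable> values;

    FileRestartCache(Path file) {
        this.file = file;
        this.values = load(file);
    }

    /** Returns the stored value, or computes, stores, and persists it on a miss. */
    @SuppressWarnings("unchecked")
    <T extends Serializable> T get(String key, Supplier<T> compute) {
        Serializable cached = values.get(key);
        if (cached != null) {
            return (T) cached;
        }
        T fresh = compute.get();
        values.put(key, fresh);
        save();
        return fresh;
    }

    /** Drops a value, e.g. after the application itself changed the underlying data. */
    void invalidate(String key) {
        values.remove(key);
        save();
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Serializable> load(Path file) {
        if (Files.exists(file)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                return (Map<String, Serializable>) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                // The cache is volatile by design: an unreadable file just means a cold start.
            }
        }
        return new HashMap<>();
    }

    private void save() {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(values);
        } catch (IOException e) {
            // Losing the persisted copy only costs time on the next startup.
        }
    }
}
```

A second instance created on the same file finds the values the first one computed, so its first request does not have to repeat the calculation.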

Summary

Caches are a good idea. Applications with a dynamic deployment are a good idea. Thinking of application startup as a point in time where long calculations may take place is not such a good idea (at least not anymore) when instances are to be brought up automatically depending on load and not on human decisions.
So simply bring together caches with persistence and the knowledge of when those caches can be invalidated. As an example I did this for the Tangram web application framework and had quite some success on the Google App Engine, run@cloudbees, and OpenShift cloud platforms for the Java world.
The last but also important point: Optimizing the application startup in general is worth the time nowadays.

Thursday, 17 July 2014

Not that Groovy

Is it actually generally accepted by now that web applications can no longer assume there is a relaxed startup phase for them? That they should respond fast enough for the end user even on the first request? It still often sounds as if "expensive" things that happen only once in the application were no big problem, and as if one could simply move this "expensive" work to the startup phase.
Everything I write here is only relevant for projects in which a deployment causes significant cost or where, for other reasons, it is not a good idea to constantly reinstall the entire software. With large systems based on CoreMedia, this stability was always such a big advantage to me that I replicated it on a small scale in Tangram (or rather even sketched it out there in advance for the large projects). Still, one has to react to changing requirements just as dynamically as to the customers' requests to the system.

Measurements with Groovy Code in the Database

To adapt the software to requirements more easily and quickly, I committed myself years ago to Groovy code in a repository of some kind. We know this from stylesheets, JavaScript, and templates (e.g. Freemarker, Apache Velocity).
Back then I carried out a series of measurements for a large German telecommunications group which clearly showed that accessing a fully compiled Groovy class, or even a previously stored instance of it, has no disadvantages compared to the Java code of the static part of the program.
For the individual request of a web application it therefore makes no difference at all whether the code was deployed or lives in the database.
As a result, the typical Tangram web application contains hardly any Java code anymore (see e.g. amor or dragon-map). Everything is done with Apache Velocity templates for the presentation layer and Groovy code for the business logic above the data storage, all the way to the URL structure. (With JDO, even part of the data model can be moved into the dynamically changeable part this way.)
Efficient execution is achieved by simply compiling and instantiating the code when it is saved.
And this is exactly where a problem lurks, one that really hits me now that quite a bit of "dynamic" code (that is, code in the database that is potentially changed more often) has accumulated:
In contrast to calls to the application and the direct use of the instances, we get a noticeable performance penalty when the system starts, because quite a few database queries, compilation runs, and instantiations come together here. In total this is considerably more expensive than the corresponding steps with static Java code.
But why does startup matter again now? In most scenarios it does not happen all that often.

Welcome to the Cloud

For a self-managed "on premise" servlet/JSP container that may be true, but especially for the "cheap" models in the cloud, e.g. the Google App Engine, OpenShift, or run@cloudbees, it is not true at all, because there the trick for scaling and for idle periods is precisely to shut down application instances or add new ones.
For instances on the Google App Engine without any additional guarantees configured, this leads to a nice up and down like a yo-yo. Up to version 0.7, all Tangram applications contained a cron job meant to prevent exactly that by having the application call itself. Some time ago I finally faced the problem and tried to tune the startup sequence. The starting point was a first start that easily took around 30s of elapsed time.

Spring Tuning

This is not the first time: the Springframework has shown before that it pays for developer convenience with "costs" at startup. Scanning the classes for annotations and components and wiring them together fully automatically takes two to three seconds on a mediocre machine, but easily up to eight seconds on the App Engine, during which a customer would have to wait at the browser. Back then this could be reduced by narrowing down the scan:

<context:component-scan base-package="org.tangram" />

Specifying the base package is important here and lastingly narrows the range of classes that are checked, which gains a few seconds. In future versions further restrictions should apply, because the "base package" org.tangram keeps growing and is spread over a number of jars. From version 0.8 on, all components that want to be scanned must therefore carry "org.tangram.components" as their first name, that is, as their package name.

<context:component-scan base-package="org.tangram.components" />

To achieve this, some classes were moved between packages, and once again it gains a few fractions of a second. Worthwhile, but it breaks the functional grouping of the classes a little. (org.tangram.jdo becomes org.tangram.components.jdo, so you still find your way around.)

Caching

Wait! Why caching? Filling the caches, as far as helpful or necessary at that point, is exactly what makes startup so slow, among other things. But at startup the contents of the caches will be exactly the same as at the last shutdown, or the same as those another instance has already computed. So it would be quite nice if one could simply fall back on these values.
Tangram has in fact been doing this on the Google App Engine for a long time: the scan for classes that belong to the persistence layer falls into the same category as the scan of the Springframework, and the data the BeanFactory has to remember here is stored on the Google App Engine in the memory cache via the JSR107 interface.
This interface is now supposed to help us execute even more operations only as often as necessary, regardless of whether we are talking about a startup request or are in the middle of work. That makes the web applications more cloud-ready than they have been so far.
Using this cache is as simple as can be:

try {
    CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
    jsrCache = cacheFactory.createCache(Collections.emptyMap());
} catch (CacheException ce) {
    log.error("()", ce);
} // try
After that, put() and get() are all it takes, though of course you always have to be aware that the cache may be empty. And some more complicated things, although marked java.lang.Serializable, can be stuffed in and found in the admin console, but you cannot get them out again.
Unfortunately this applies in particular to objects persistable via JDO and to compiled Java classes.

Tuning Where It Hurts

With an empty cache, what is slow is fetching all the code needed for the website from the datastore. Even with just a few code objects this is a matter of seconds rather than the fractions quoted above. So from version 0.8 on, Tangram contains a simple, persistent query cache which, just like the transient cache, records and stores the IDs of the objects in the result lists of the queries. For query-based applications this cache gives the whole application a clear boost on the first calls to an instance.
With these ID lists you unfortunately still have to go to the datastore to fetch the objects, since the objects themselves could not be put into the cache.
For the code objects, however, this step can easily be skipped by converting them into purely transient objects when they are read (hey, it's code!) so that they can then be stored in the persistent startup cache.
Now one of the longest phases during startup is compiling the Groovy classes. Caching the binary code here has so far turned out not to be feasible. But maybe that will come too. So the limiting factor at startup is now the number of Groovy codes and the compiler. More was not possible so far, but it cut the startup time to a third. And the approaches derived from this can be generalized to applications in the cloud.
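
The idea of a query cache that only remembers ID lists can be sketched as follows. This is not Tangram's implementation: QueryIdCache and the two functions passed in are hypothetical, and persisting the map across restarts is deliberately left out to keep the example short. It shows the split described above, where the query itself is answered from the cache while every object is still loaded by its ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a query cache that stores IDs only; not Tangram's actual code. */
final class QueryIdCache {
    private final Map<String, List<String>> idsByQuery = new HashMap<>();

    /**
     * Runs the query only on a cache miss, then loads every object by its ID.
     * The loading step still goes to the datastore on every call.
     */
    <T> List<T> list(String query,
                     Function<String, List<String>> runQuery,
                     Function<String, T> loadById) {
        List<String> ids = idsByQuery.computeIfAbsent(query, runQuery);
        List<T> result = new ArrayList<>();
        for (String id : ids) {
            result.add(loadById.apply(id));
        }
        return result;
    }

    /** Forgets all ID lists, e.g. after the application has written to the datastore. */
    void invalidate() {
        idsByQuery.clear();
    }
}
```

Calling list() twice with the same query string runs the expensive query once; only the cheaper lookups by ID are repeated.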

Tuesday, 15 July 2014

Google App Annoyance - aka Engine

Together with the missing support for the Servlet 3.0 specification in Google App Engine (and, yes, there are still too many situations where this specification level is not available), we read from Google that classpath scanning is an issue on their App Engine platform and that this is one of the reasons that kept them from supporting this version of the servlet specification.
What I didn't read from Google is that they noticed that Servlet 3.0 is (arguably) one of the most important steps since Java came to the web. There is no major framework left in the Java world that does not do some classpath scanning. The Springframework I'm using is just one example.
After some five years of work with the Google App Engine and Java as the only language in use, I changed my mind and don't consider the App Engine as one of the first choices in cloud platforms for the Java world anymore.
And this is really just because of these two small issues.
Like very many others I'm using the Springframework, with component scan and thus with classpath scanning. Any first request to an instance takes ages. But "first requests" are common in the cloud, where instances need to be shut down and brought up depending on load. And the cloud is what Google App Engine is about, isn't it?
I invested quite some effort to learn how to be fast on the first request while still using the Springframework and even developed my own stripped down, minimalistic Dependency Injection environment for the application setup (dinistiq) to only have the features I'm using at hand. But this all didn't help to make the end user experience satisfying. Things feel slow.
So in the end this all gave the push for Tangram to support that many new platforms, use the CI Features and Repositories at cloudbees, enjoy the command line access of OpenShift. This brought options of different frameworks and I learned much about cloud deployment and operation scenarios. So in that respect we should be thankful for the Google App Engine weakness.
But it still is the source of some level of complexity and number of artifacts flying around in what I call my dynamic web application framework Tangram. Also this currently renders Google App Engine the second best cloud platform for Java while still having great web based monitoring tools.

Monday, 14 July 2014

Another Tangram User

Ponton GmbH from Hamburg has now also had a web application running in production with Tangram for some time. Of course I nudged them in this direction myself in order to show results quickly, but I was not the only developer, nor was there sufficient resistance.
Apparently the project is also being developed further on this basis as new requirements come in.
As a rather conservative layout, it still uses the Springframework and relies on JPA as the persistence layer. URL formats as Groovy code in the database, along with plenty of business logic there, were precisely the winning factor for the ongoing adjustment of small details. A deployment is no longer required every time, as it was with the predecessor solution (in PHP).
But above all, thank you for letting me say this out loud.

Sunday, 13 July 2014

Byte-Code Transformation is no big deal

To avoid another language in my web projects I'm using Java as the design language for the Objects to be persisted as well. It seemed easy to use with the different ORM Standards and implementations available.

Take me from Java to the Database

Many ORM implementors tend to recommend using a byte-code transformation process to make the classes usable in their respective persistence contexts (e.g. http://www.avaje.org/doc/ebean-userguide.pdf - Chapter 15). This in fact means that after you did your job of coding and compiling the classes, some other component takes this code and transforms it into some other code which additionally deals with the ORM/Database related stuff.
The idea is to avoid runtime penalties or the generation of subclasses dealing with the additional database related issues, which would show up at runtime and potentially screw up your idea of the class hierarchy. (Which it did for me. See below.)

Why class weaving or enhancing is a big deal

Of course this still means that you are running code you don't know in detail.
The assumption of all the ORM framework authors is that the byte-code transformation process can easily be automated and, as far as possible, be hidden from the application developer. JPA based JEE applications are expected to do the transformation at deployment time to the container - so this doesn't even happen within your development tool-set.
My experience is a different story. And hiding things during development once again was not a good idea for me.

The easy Start

I started with the Eclipse IDE and the Google App Engine Plugin some years ago. It does the byte-code transformation for the JDO implementation from DataNucleus automatically at compile time. This worked fine as long as I was coding in "play-around" mode. When the code started to grow into modules, from time to time the classes were propagated to the client module unenhanced. This is where I learned what using DataNucleus feels like when in fact just the transformation is missing (of course it doesn't tell you "you forgot to transform class this-and-that"). I got around these issues with the dumb "clean nearly everything in your work environment" pattern.

Build System Integration

Things got even more complicated when I started to write build scripts, since the project grew and was supposed to be published. You don't want to give anyone a 20 page description on how to set up the IDE just as you did. You simply give friends a script which describes the necessary parts in human and machine readable form. So the project gets cloned from the source and a simple build tool call - hopefully in default mode with few or no options - will create a usable result.
But the ORM provider's promise to support me as a developer still holds true for these situations. I was just expected to change the way I was using the transformation tool. DataNucleus comes with an ANT task, a compiler plugin and so on. Since I didn't want to use the obsolete legacy tools Maven or ANT (hey, why not use make or punch cards?) I "simply" plugged in the compiler plugin, since there is no direct support for Gradle and the integration of the ANT task was not that easy at the initial try.
First of all, this compiler plugin was not able to deal with all of the versions of the Oracle Java Compiler and all language levels beyond "Java 6", so I had to prepare the source code carefully.
This gave me enhanced classes and sure, the enhancer was running, but...
When packaging the classes into JARs, as build systems tend to do nearly automatically after compilation, those classes were unusable again, with the error messages I was already familiar with.

Unit-Testing the Byte-Code Transformation instead of my Code

At this point in time I started writing JUnit tests to test whether my build environment was working, not whether my code was correct. This alone gave me the impression that something was going wrong.
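A build check of this kind can be sketched with plain reflection. The following is a minimal sketch, not the tests written at the time: it assumes that an enhanced class can be recognised by a marker interface the enhancer adds (for JDO this has traditionally been `javax.jdo.spi.PersistenceCapable`; which interface a given enhancer version really adds needs to be verified), and `EnhancementCheck` and `carriesMarker` are made-up names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: checks whether a class carries an enhancer's marker interface. */
final class EnhancementCheck {

    private EnhancementCheck() {
    }

    /**
     * Returns true if the given type, one of its superclasses, or any interface
     * reachable from them has the given fully qualified interface name.
     * Comparing by name avoids a compile time dependency on the ORM API.
     */
    static boolean carriesMarker(Class<?> type, String markerInterfaceName) {
        Deque<Class<?>> todo = new ArrayDeque<>();
        Set<Class<?>> seen = new HashSet<>();
        todo.push(type);
        while (!todo.isEmpty()) {
            Class<?> current = todo.pop();
            if (!seen.add(current)) {
                continue; // already visited via another path
            }
            if (current.isInterface() && current.getName().equals(markerInterfaceName)) {
                return true;
            }
            for (Class<?> implemented : current.getInterfaces()) {
                todo.push(implemented);
            }
            if (current.getSuperclass() != null) {
                todo.push(current.getSuperclass());
            }
        }
        return false;
    }
}
```

In a JUnit test this would boil down to one assertion per persistent class, e.g. asserting that `EnhancementCheck.carriesMarker(MyEntity.class, "javax.jdo.spi.PersistenceCapable")` is true, with `MyEntity` standing in for one of your own classes. Run against the packaged JAR rather than the compiler output, such a check fails exactly in the situation described here.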
I learned that the compiler plugin took some time after compilation before it started to "enhance" (byte-code transform) the class files. It used some sort of threading for this, so that Gradle had already packaged the jar files before the process was completed. I started to add a wait of some 10s to my build scripts. Argh...
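The race described above, and why a fixed sleep only hides it, can be illustrated in plain Java. This is a simplified sketch and neither the compiler plugin's nor Gradle's actual code: `enhance` and `packageJar` are hypothetical stand-ins, and the only point is that the packaging step waits on the enhancer's `Future` instead of guessing a delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BuildOrderSketch {

    private BuildOrderSketch() {
    }

    /** Stand-in for the byte-code transformer: marks every class name as enhanced. */
    static List<String> enhance(List<String> classNames) throws InterruptedException {
        Thread.sleep(50); // simulated transformation work on another thread
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            result.add(name + " (enhanced)");
        }
        return result;
    }

    /** Stand-in for the JAR packaging step: simply lists what it was given. */
    static String packageJar(List<String> entries) {
        return String.join(", ", entries);
    }

    /** Packages only after the asynchronous enhancement has really finished. */
    static String build(List<String> classNames) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<List<String>> enhanced = executor.submit(() -> enhance(classNames));
            // get() blocks until the enhancer is done, however long that takes
            return packageJar(enhanced.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("build interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("enhancement failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

A fixed wait either wastes time on fast machines or is too short on slow ones; an explicit dependency between the two steps has neither problem.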

Refactoring - Get the same Thing you already had

I took a second look at the DataNucleus Enhancer's ANT task to integrate it into the build process as a Gradle task without those erratic 10s of waiting. I also needed this step since I was updating DataNucleus from the old version used in the Google App Engine at that time to a newer one that is also meant for stand-alone use.

Use other APIs as well like they were simple Libraries

After all these pieces were working, I started playing around with the Java Persistence API JPA. The implementations of JPA I came across - OpenJPA, EclipseLink, and again DataNucleus - also recommended the use of byte-code transformations called Enhancement (OpenJPA and DataNucleus) or Weaving (EclipseLink).
The integration of that many APIs and byte-code transformers made things more complicated again, while the code I wrote still is not that complicated. It's just the byte-code transformation which adds to the complexity. I needed to present OpenJPA, EclipseLink, and DataNucleus versions of my single JAR archive with only very few classes, and only two of them needed to be byte-code transformed. Additionally, with JPA I have the option to use the original classes without byte-code transformation in some scenarios with certain limitations. (Only DataNucleus is really capable of an automatic discovery of available classes for database access; the others need detailed lists passed over to the implementation in different ways. This is anything but portable!)

Stop pretending it is easy and write a decent Tool to do the Job

Since not all of the implementations can be on the compile time classpath of the JPA relying portions of my project, it was now time - just because of the necessary byte-code transformations - to write a Gradle Plugin dealing with this.
It turned out to be easy to make this plugin generic enough to be used in any project using JPA, JDO, or Ebean as the ORM solution for Java and Gradle as the build tool.
Two thirds of the work on the build scripts of the Tangram dynamic web application framework over the last five years were related to the byte-code transformations.

Conclusion after some Years

So my best friend now is OpenJPA, which can relatively easily be used without transformation. Yes, it presented me with the nice subclassing issue where at runtime I am dealing with subclasses of the classes I designed myself, but this was solvable with half a dozen lines of code.
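A fix of that kind could look like the following minimal sketch; it is an illustration, not the code used in Tangram. It assumes that a subclass generated at runtime can be recognised by some predicate (how to detect such subclasses for a specific ORM is left open here and needs to be verified), and `DesignClasses` and `designClass` are made-up names.

```java
import java.util.function.Predicate;

/** Hypothetical helper: maps a runtime class back to the class written by the developer. */
final class DesignClasses {

    private DesignClasses() {
    }

    /**
     * Walks up the class hierarchy as long as the predicate reports a generated
     * class, and returns the first class that was not generated.
     */
    static Class<?> designClass(Class<?> runtimeClass, Predicate<Class<?>> isGenerated) {
        Class<?> result = runtimeClass;
        while (isGenerated.test(result) && result.getSuperclass() != null) {
            result = result.getSuperclass();
        }
        return result;
    }
}
```

Code that selects views or handlers by class would then call `designClass(bean.getClass(), ...)` instead of using `bean.getClass()` directly, so that generated subclasses do not show up in the lookup.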
My second best friend is DataNucleus where I am now able to integrate the byte-code transformer into the runtime environment of my framework and write JDO annotated classes in groovy, put the code into the JDO based database layer itself and thus be able to extend the object oriented storage at runtime. This is what adds very nicely to Stylesheets, JavaScript Codes, URL-Formats and Business Logic in Groovy in the Database layer resembling the dynamic part of Tangram. I tried this with the OpenJPA Enhancer and EclipseLink Weaver as well but with no success.
Also I now have a code base which was easily extended with another ORM solution called EBean. It was meant as an option with a smaller footprint but does not present any advantages over the other options already implemented and proven in real projects live on the web using the Tangram dynamic web application framework.
So, does anyone still think that byte-code transformation is a non-issue, as you may read on introductory web pages on ORM? Give me some 30s to make your build process break - at least every once in a while, when you don't expect it and won't easily discover the source of your pain.
But in the end, with my Gradle based plugin, things are definitely a lot easier and more reliable - again after a lot of work with things that were supposed to be easy, automatic, or hidden from me.

Monday, 23 June 2014

CoreMedia CMS and Gradle - just for the LOLs

To bring the tangram-coma module back into my focus, I always need a CoreMedia CMS Content Server as a backend for testing.

Until now this server had to be set up by hand, following a set of instructions and using the product delivered as a ZIP file. That of course doesn't really fit the Tangram examples, which are supposed to be self-contained, and the version used so far, CMS 2008 (5.2), is now irrevocably being phased out as well.

With the current versions the product is delivered handily as artifacts in a Maven repository, and Maven modules are available to build, customise, and populate complete servers from them.

Unfortunately this area is still expressed in Maven, and besides, I would then have to say goodbye to the minimal example MenuSite - and it is a bit early for that.

So I asked myself whether I was fit enough to translate CoreMedia's blueprints to Gradle on a very small scale and thus integrate the Content Management Server, build script included, into the Tangram examples. In doing so I of course once again brought in one more service from the cloud as development support, which fits Tangram's development model best: use services and components that work well. Since I can no longer use hsqldb with the current version of CoreMedia, and no longer wanted to install and use postgresql locally as in the previous example, I quickly got myself a test database at DB4Free, since the "default" database system for CoreMedia CMS currently is MySQL.

To cut it short: assembling a Content Management Server is wonderfully easy, and the Maven templates are - by Maven's standards - relatively compact (i.e. unreadable, ugly, long-winded, but not long). The prototype for a completely Maven-free and not Tangram-related Content Management Server is available as of today at

https://github.com/mgoellnitz/cm-cms-webapp

And with this tool I'm now going into the next round with Tangram. It also shows that one can slowly start down the path away from Maven (while keeping the great dependency and repository provisioning) towards Gradle. This is not a hard cut, it is a path, so I will not start any new build with Maven syntax in the CoreMedia context either. Gradle promises to support us on this path, since they have the installed Maven base in view.

Saturday, 5 April 2014

Why use JSR330 Dependency Injection annotations

I almost have to apologise for introducing another Dependency Injection Container for the Java world - dinistiq - a very minimalistic approach to the topic. It turned out to be easier to implement another one than to use others listed here. Limited in features, easy to use, and still more configurable than other options I could think of. After some months of use, I can now invite other users to take a look at it and try it in their own projects.

Also this text gives you a "why" on the use of the JSR 330 annotations for Dependency Injection. It simply makes your code even more reusable in case your development or deployment environment changes.

Since tangram is much more about glueing together proven existing software components and frameworks than writing code, I felt the need to check if the existing code base was really fully dependent on the Spring Framework.

Despite the fact that spring more or less in many ways does what I need, it sometimes feels a bit bloated and does too much magic I don't understand in detail (which I still had to learn when debugging things). So I tried to isolate the spring code during the tangram 0.9 work and present at least a second solution for all the things I did with spring so far.

For tangram, Spring does three things:

Dependency Injection to plug the whole application together
Support for a decent view layer with JSP and Apache Velocity views
A concise way to map HTTP requests to code - controller classes or methods

So I took a look at other view frameworks like Vaadin, GWT, Apache Wicket, Play, Struts, JSF/JEE, Stripes. Right at the moment I think Vaadin, GWT, Wicket, and Play are not a really good fit for tangram, Struts in my eyes is a fading technology, and only JSF/JEE is an obvious option. With Java Server Faces I only had unsatisfying project experiences and the rest of JEE goes for plain Servlet. So tangram had to be provided with a plain Servlet way of doing the view layer.

Since the modularity of tangram was achieved by the Spring way of plugging components together with Dependency Injection, the first thing to do was to mark the generic components in a Spring independent way and to look at the other options for the Dependency Injection part. Only then would it be possible to replace the Spring view layer with a Servlet view layer during the startup and wire-up of the application.

So the list of relevant DI frameworks gets shortened to those supporting the generic Dependency Injection annotations from JSR330 which are intended for JEE and can e.g. also be used with Google Guice and the Spring Framework alike.

From what I read, Google Guice seemed to be a good alternative for the proof of concept phase, but it took me so much work to get something running with it (not everything can be plugged together programmatically in my case) that I got there faster with my own Dependency Injection Container. Rather minimalistic and only suited for the setup of components.
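To make the idea of a container that only sets up components concrete, here is a minimal sketch of field injection by type. It is not dinistiq's implementation: to stay self-contained it declares a local `@Inject` stand-in instead of importing `javax.inject.Inject` from JSR 330, and `TinyContainer`, `Repository`, and `ContentService` are made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for javax.inject.Inject so the sketch compiles without the JSR 330 jar. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Inject {
}

/** Hypothetical component without dependencies. */
class Repository {
    String load(String id) {
        return "content:" + id;
    }
}

/** Hypothetical component which asks for a Repository by annotation only. */
class ContentService {
    @Inject
    Repository repository;

    String render(String id) {
        return "<p>" + repository.load(id) + "</p>";
    }
}

/** Instantiates the given classes once and wires annotated fields by type. */
final class TinyContainer {

    private final List<Object> beans = new ArrayList<>();

    TinyContainer(Class<?>... componentClasses) {
        try {
            // Step 1: create one instance per component class
            for (Class<?> componentClass : componentClasses) {
                Constructor<?> constructor = componentClass.getDeclaredConstructor();
                constructor.setAccessible(true);
                beans.add(constructor.newInstance());
            }
            // Step 2: fill every annotated field with the matching instance
            for (Object bean : beans) {
                for (Field field : bean.getClass().getDeclaredFields()) {
                    if (field.isAnnotationPresent(Inject.class)) {
                        field.setAccessible(true);
                        field.set(bean, find(field.getType()));
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot set up components", e);
        }
    }

    /** Returns the first component which is an instance of the given type. */
    <T> T find(Class<T> type) {
        for (Object bean : beans) {
            if (type.isInstance(bean)) {
                return type.cast(bean);
            }
        }
        throw new IllegalStateException("no component of type " + type.getName());
    }
}
```

With `new TinyContainer(Repository.class, ContentService.class)` the service can be fetched via `find(ContentService.class)` and used right away. The component classes themselves only depend on the annotation, which is the reason the generic JSR 330 annotations keep code portable between containers.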

Its advantage over Guice is that it's smaller and more easily configurable with properties files. Weeks later I discovered TinyDI as another option. While this container seems to be a lot cleverer about the search for annotated classes, it seems to lack the needed option of extending the configuration aspects from the annotations with properties files - defaults and overridden values and references.
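Defaults with overridden values map naturally onto `java.util.Properties`, which accepts a table of defaults in its constructor. The sketch below only illustrates that layering; the key names are invented, references between components are left out, and dinistiq's actual file layout is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper: layers override values on top of default values. */
final class LayeredConfig {

    private LayeredConfig() {
    }

    /**
     * Parses both texts in properties file syntax. Keys from the overrides win,
     * all other keys fall back to the defaults.
     */
    static Properties layer(String defaultsText, String overridesText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));
            // getProperty() on this instance consults the defaults for missing keys
            Properties effective = new Properties(defaults);
            effective.load(new StringReader(overridesText));
            return effective;
        } catch (IOException e) {
            throw new IllegalStateException("cannot read configuration", e);
        }
    }
}
```

A framework can ship the defaults next to its components while each application only lists the values it wants to change, which keeps the application specific files short.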

So right at the moment I still don't have a running tangram application but all of the tangram framework now can be used with dinistiq. This example shows that now over 90% of the classes of tangram are free of direct dependencies to the Spring Framework while still taking advantage of its features and runtime environment. The code definitely got cleaner and more reusable.