practicing techie

tech oriented notes to self and lessons learned

Category Archives: development

Web Framework Benchmarks – Round 10

TechEmpower Web Framework Benchmarks is a collaborative, open web framework benchmarking project that I blogged about last year (see An open web application framework benchmark). Since the last write-up a new benchmark round has been run with lots of new test implementations. Among others, round 10 benchmark run includes support for the Cassandra NoSQL database, as well as a new Java-based test implementation leveraging Servlet 3 asynchronous processing.

I did some non-scientific analysis of TFB round 10 results and here’re a few observations on the Peak environment, “best” concurrency level results. According to the documentation round 10 and 9 Peak hosting hardware specifications are the same, so the results should be directly comparable.

JSON serialization test

lwan has overtaken previous run’s #1 performer in the JSON serialization test by a large margin, more than doubling round 9 winner’s throughput. cpoll_cppsp, last round’s winner, also improved its throughput by nearly 700k req/s. Six contestants from round 9 have made it into top 10 in round 10, with all of them being able to improve their performance. OpenResty managed to more than double its round 9 throughput.

My new entrant, servlet3-cass, has taken 5th place in this category, which made it the second best performing Java implementation, losing only slightly to Netty. Based on this, I would conclude that Servlet container overhead doesn’t currently seem to impose a significant bottleneck in this test. It’s interesting to note that there’s about 400 req/s difference between servlet and servlet3-cass performance, though the implementations are very similar. Both have been implemented on the Servlet API, run on the Resin servlet container and use Jackson for JSON processing. The only difference between the two seems to be Jackson JSON library version numbers.

The top 10 test implementations were based on the following programming languages: C, C++, Java and Lua. Of the top 10 performers 5 are Java based, and Ur and Go got dropped out of the top 10 from the previous round.

Single database query test

Since round 9 the #1 performer’s throughput has improved only marginally in this test category. C++ based test implementations dominate the top 4 spots with Lua, Ur and Java also in top 10. Only test implementations based on relational databases made it to top 10, leaving out MongoDB and Cassandra.

Some test implementations seem to have multiple entries in the results table (e.g. cpoll_cppsp-postgres, undertow and activeweb) for this test. I guess this is because the test was run multiple times, but I would’ve expected each framework to be listed only once in the best results table. Also, what’s the exact test methodology like in these cases? Under which conditions does a framework get multiple tries? Some test implementations seem to have experienced performance degradation since round 9. One such example is OpenResty. It would be interesting to analyze the cause further. Was the degradation caused by e.g. changes in the test implementation, infrastructure setup or changes in test methodology?

I was a bit disappointed with the performance of the servlet3-cass test implementation in the database tests. Unfortunately, there’s no resource usage or profiling data available for the test run, and as I don’t have access to a real performance test environment myself, so it’s difficult to analyse potential bottlenecks further.

Because servlet3-cass was able to reach 14 times the single database query throughput with the JSON test, I would reason that no inherent app server framework bottleneck gets reached in the database test. By comparison, the Java and servlet API based test implementation that uses MySQL database, was able to achieve nearly twice the throughput. Two notable implementation differences come to mind between the two test implementations: a) different datastore and b) use of different servlet API (synchronous vs. asynchronous).

Since, Cassandra is in general very fast in reading and writing data by key, there shouldn’t be any inherent reason why it should perform worse than the MySQL based implementation. Some aspects I need to analyse in more detail: 1) Cassandra server configuration 2) Resin servlet container asynchronous processing support and/or configuration 3) check that thread pool size for Resin asynchronous processing and Cassandra driver is optimized for the test server hardware concurrency level. During the test implementation I ran into a couple of bugs in servlet container asynchronous processing support (both Resin + Tomcat. Kudos to both teams for fixing them!), which proved that asynchronous processing is a tricky issue, not only for application, but also app server developers. In order to evaluate whether some aspect of asynchronous processing has a negative performance impact it would be interesting to implement the test using traditional synchronous servlet API.

Multiple database queries test

The best two implementations in this test, start and stream, were able to hold on to their round 9 positions, while both improved throughput by more than 30 %. The top 10 test implementations are based on Dart, Java, C++ and Clojure.

MongoDB based implementations have claimed a triple win in this test with the 4 implementations that follow being PostgreSQL based.

Fortunes test

Five new test implementations made it to top 10 compared to the previous round. The top 10 implementations were implemented in C++, Ur and Java. Again, both undertow and undertow edge seem to be included three times in round 9 data table, which looks a bit strange for the best results table.

Database updates test

Nodejs based implementations that occupied the top 3 places on round 9, have seen their lead overtaken by C++ based implementations. Instead of raw processing efficiency this test type is expected to primarily stress the backend datastore, datastore driver and parallelism of the framework, so C++ as a language should not have an inherent advantage, as such. The top 10 implementation languages in this test are: C++, JavaScript, Scala, Dart and Perl.

Having done quite a bit of Perl programming years ago, I was quite fascinated to see an implementation based on the old scripting workhorse make it to top 10.

In this test there’s also been about 10 % performance degradation for the top 5 implementations.

On round 9 the top 10 implementations were all based on MySQL whereas on round 10 three PostgreSQL based implementations have entered the top 10.

Plaintext test

The top 10 in this category is occupied by test implementations based on C++, C, Java and Scala. While there’s been only a modest performance increase for the #1 position, the top place has been overtaken by ulib. Four new entrants have made their way into top 10. Netty has gained over 40% throughput increase compared to the previous round.

The development process

The TechEmpower team has again put a huge effort into the FrameworkBenchmarks project and done a great job at it! Still, as with everything, there’s always room for improvement and, from a casual test implementation contributor perspective, the following issues should be considered:

  • more predictable release schedule. I realize the TE team needs to prioritize actual customer projects over pro bono style work. However, many test implementers are in a similar position, so having a predictable release plan would help contributors schedule their work. It seems that at least part of the delays may have been caused by scope-creep, so a stricter release policy could also help make release schedule more predictable.
  • release phase change notifications. Not all contributors are able to follow the TFB Google group discussion. Having a notification mechanism for informing about release schedule phase changes could be helpful for many contributors. This could be as simple as a Github issue to subscribe to or a separate Google group for announcements.
  • enable test implementation logging during preview runs (functional test run). I found it very difficult to troubleshoot infrastructure deployment automation related bugs during the preview runs. Allowing server side logging for preview runs could be a huge help for test implementers troubleshooting their code.
  • provide resource consumption statistics for preview runs. The only performance metric a test implementer currently gets is the throughput figures per test and concurrency level. This gives the implementer very little to go by, in terms of optimization feedback.
  • provide access to app server, DB server etc. logs. Additional infrastructure logs would help in identifying functional and performance related issues.
  • performance related data gathering could even be taken further by running test implementations using a profiling tool during preview runs.

Benchmark results visualization tool updated

I’ve updated a TFB projects results visualization tool I created to also render round 10 results. As before, the tool can be found here:

A new Scala + ElasticSearch based test implementation

The project team has been hard at work and they’ve published a tentative release schedule for round 11 results, which are aimed at being released by the end of June 2015. Since round 10, I’ve contributed support for ElasticSearch search server as well as a Scala / Spray based test implementation. Both have been merged into the project repository, so the results should be available when round 11 data gets published. The code for the new test implementation can be found here:

Again, great job TFB development team and contributors – keep up the good work!
Looking forward to seeing round 11 results!

Scala def macros and Java interoperability

Most of the time Scala-Java interoperability works pretty well from a Scala application developer perspective: you can use a wealth of Java class libraries in your Scala programs with fairly little effort. Simply using JavaConversions and maybe a few custom wrappers usually gets you pretty far. Sure, there can be some friction resulting from the use of different programming paradigms and mutable data structures, but if you’re a pragmatist re-using Java code in Scala is nevertheless quite feasible. Scala version 2.10 saw the introduction of an experimental language feature called macros. Specification work on Scala macros recognizes different flavors of macros, but versions 2.10 and 2.11, as well as the future version 2.12, support only def macros. This macro variety behaves similar to methods except that def macro invocations are expanded at compile time. Here’s how EPFL Scala team member and “Scala macros guy” Eugene Burmako characterizes def macros:

If, during type-checking, the compiler encounters an application of the macro m(args), it will expand that application by invoking the corresponding macro implementation method, with the abstract-syntax trees of the argument expressions args as arguments. The result of the macro implementation is another abstract syntax tree, which will be inlined at the call site and will be type-checked in turn.

Def macros resemble C/C++ pre-preprocessor macros, the m4 macro processor and similar in that the result of macro application will be inlined at the call site. A notable difference is that Scala macros are well integrated into the language meaning e.g. that the results of macro expansion are type-checked. But let’s look at this from the code perspective. I’ve implemented a “hello, world” macro and an object called MyReusableService that defines two methods: a regular method and one implemented as a macro. Two objects, ScalaClient and JavaClient, invoke methods on MyReusableService. Here’s what happens when compiling Java code that tries to invoke a macro method on a Scala object:

~ ᐅ sbt 'runMain com.practicingtechie.gist.macros.JavaClient'
[error] /Users/marko/blog-gists/macros-interop/src/main/java/com/practicingtechie/gist/macros/ cannot find symbol
[error] symbol: method macroMethod()
[error] location: class com.practicingtechie.gist.macros.MyReusableService
[error] MyReusableService.macroMethod

Method macroMethod is defined by MyReusableService Scala object, but the method is not visible when inspecting the disassembled class file:

~ ᐅ javap -cp target/scala-2.11/classes com.practicingtechie.gist.macros.MyReusableService$
Compiled from "MyReusableService.scala"
public final class com.practicingtechie.gist.macros.MyReusableService$ {
  public static final com.practicingtechie.gist.macros.MyReusableService$ MODULE$;
  public static {};
  public void regularMethod();

After removing the macroMethod invocation JavaClient is able to compile. ScalaClient, however, is more interesting. Here’s macro debugging output from running the code (with “-Ymacro-debug-lite” argument):

~ ᐅ sbt 'runMain com.practicingtechie.gist.macros.ScalaClient'
[info] Compiling 2 Scala sources and 1 Java source to /Users/marko/blog-gists/macros-interop/target/scala-2.11/classes...
performing macro expansion MyReusableService.macroMethod at source-/Users/marko/blog-gists/macros-interop/src/main/scala/com/practicingtechie/gist/macros/ScalaClient.scala,line-7,offset=158
println("Hello macro world")
Apply(Ident(TermName("println")), List(Literal(Constant("Hello macro world"))))
[info] Running com.practicingtechie.gist.macros.ScalaClient
Hello, from regular method
Hello macro world

In the above extract we can see the location where the macro was applied, as well as results of macro expansion, both as Scala code and as well as abstract-syntax tree (AST) representation.

Finally, we can see macro expansion results expanded and compiled into bytecode at the ScalaClient call site:

~ ᐅ javap -c -cp target/scala-2.11/classes com.practicingtechie.gist.macros.ScalaClient$
  public void main(java.lang.String[]);
       0: getstatic     #19                 // Field com/practicingtechie/gist/macros/MyReusableService$.MODULE$:Lcom/practicingtechie/gist/macros/MyReusableService$;
       3: invokevirtual #22                 // Method com/practicingtechie/gist/macros/MyReusableService$.regularMethod:()V
       6: getstatic     #27                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
       9: ldc           #29                 // String Hello macro world
      11: invokevirtual #33                 // Method scala/Predef$.println:(Ljava/lang/Object;)V

Scala def macros are a very interesting language feature that’s planned to be officially supported in a future Scala version. Def macros are implemented by the Scala compiler, so a function or method whose definition is macro based won’t be accessible in Java code. This is because unlike ordinary function or method invocations the result of the macro application gets expanded at the call site. Still, Scala functions or methods that simply invoke macros from within their body can nonetheless be called from Java code. Depending on how def macros are used, they can sometimes hinder reuse of Scala code from Java or other JVM-based languages.

More info

What’s the strangest bug you’ve squashed?

As software engineers we’re tasked with creating solutions to customer’s business problems. Being complex systems, every once in a while flaws inevitably slip in the design or implementation. And sometimes flaws creep in through use of third party software, which can can make problems all the more difficult to track down. Each bug has a story to tell and the stories about hunting the most puzzling, challenging, annoying and time-consuming bugs can sometimes live with you for a long time.

What’s the strangest bug you’ve managed to squash?

Mine was quite a few years ago when we were working on a greenfield Java EE software project for a client in the health care industry. With alpha release cycle nearing I was deploying a new build in a newly created server environment and during testing we found a bug in an isolated software feature. Initially, I thought there was something wrong with the new environment setup, but after a while I realized the bug seemed to be related with the way the new release was built. During development phase we had been building the software using Oracle JDeveloper IDE, but had moved to using Apache Ant with Sun Java JDK. So, at that point I thought – this was Java EE after all – it was a packaging issue. I carefully compared the working and broken release packages, but couldn’t find any significant differences. Though it was troublesome to reproduce, the problem was fortunately nevertheless reproducible, so I started tracking it down with remote debugging. After a while I noticed that the software was executing a weird code path I couldn’t quite explain.

Puzzled by the strange behaviour I didn’t really have a clear idea how to continue troubleshooting, but I decided to take a long shot with comparing compiled bytecode from the working and broken releases. This was my first time looking at disassembled Java bytecode, which made analysis a bit slow, and all the more interesting, but fortunately in my remote debugging sessions I had been able to identify some likely places for the bug. After staring at the disassembled bytecode for a while an initially innocuous looking bit of code started to look suspect: a mutator method was present in one class in the working build, but missing in the broken one. It turned out that when the code was built with Oracle JDeveloper it automatically generated a mutator method for a subclass, which just happened to override a buggy superclass mutator method. In the broken build such a mutator method wasn’t being generated causing the buggy superclass mutator method to execute.

This story happened years ago, and while there are lots of things I do differently nowadays, including use of different technologies, design approach, unit testing, test automation, build methods and tooling etc., for me this was one of those more memorable bug squashing sessions.

Do you have a intriguing bug hunting story to share?

An open web application framework benchmark

Selecting a platform for your next application development project can be a complex and burdensome undertaking. It can also be very intriguing and a lot of fun. There’s a wide range of different approaches to take: at one end The Architect will attend conferences, purchase and study analyst reports from established technology research companies such as Gartner, and base his evaluation on analyst views. Another approach is to set up a cross-disciplinary evaluation committee that will collect a wishlist of platform requirements from around the organization and make its decision based on a consensus vote. The first approach is very autocratic, while the second can sometimes lead to lack of focus. A clear, coherent vision of requirements and prioritization is essential for the success of the evaluation. Due to these problems, a middle road and a more pragmatic approach is becoming increasingly popular: a tightly-knit group of senior propellerheads use a more empiric method of analysing requirements, study and experiment with potential solution stack elements, brainstorm to produce a short list of candidates to be validated using a hands-on architecture exercises and smell-tests. Though hands-on experimentation can lead to better results, the cost of this method can be prohibitive, so often only a handful of solutions that pass the first phase screening can be evaluated this way.

Platform evaluation criteria depend on the project requirements and may include:

  • developer productivity
  • platform stability
  • roadmap alignment with projected requirements
  • tools support
  • information security
  • strategic partnerships
  • developer ecosystem
  • existing software license and human capital investments
  • etc.

Performance and scalability are often high priority concerns. They are also among those platform properties that can be formulated into quantifiable criteria, though the key challenge here is how to model the user and implement performance tests that accurately model your expected workloads. Benchmarking several different platforms can only add to the cost of benchmarking.

A company called TechEmpower has started a project called TechEmpower Framework Benchmarks, or TFB for short, that aims to compare the performance of different web frameworks. The project publishes benchmark results that application developers can use to make more informed decisions when selecting frameworks. What’s particularly interesting about FrameworkBenchmarks, is that it’s a collaborative effort conducted in an open manner. Development related discussions take place in an online forum and the source code repository is publicly available on GitHub. Doing test implementation development in the open is important for enabling peer review and it allows implementations to evolve and improve over time. The project implements performance tests for a wide variety of frameworks, and chances are that the ones that you’re planning to use are included. If not, you can create your own tests and submit them to be included in the project code base. You can also take the tests and run the benchmarks on your own hardware.

Openly published test implementations are not only useful for producing benchmark data, but can also be used by framework developers to communicate framework performance related best practices to application developers. They also allow framework developers to receive reproducible performance benchmarking feedback and data for optimization purposes.

It’s interesting to note that the test implementations have been designed and built by different groups and individuals, and some may have been more rigorously optimized than others. The benchmarks measure the performance of the framework as much as they measure the test implementation, and in some cases suboptimal test implementation will result in poor overall performance. Framework torchbearers are expected to take their best shot in optimizing the test implementation, so the implementations should eventually converge to optimal solutions given enough active framework pundits.

Test types

In the project’s parlance, the combination of programming language, framework and database used is termed “framework permutation” or just permutation, and some test types have been implemented in 100+ different permutations. The different test types include:

  1. JSON serialization
    “test framework fundamentals including keep-alive support, request routing, request header parsing, object instantiation, JSON serialization, response header generation, and request count throughput.”
  2. Single database query
    “exercise the framework’s object-relational mapper (ORM), random number generator, database driver, and database connection pool.”
  3. Multiple database queries
    “This test is a variation of Test #2 and also uses the World table. Multiple rows are fetched to more dramatically punish the database driver and connection pool. At the highest queries-per-request tested (20), this test demonstrates all frameworks’ convergence toward zero requests-per-second as database activity increases.”
  4. Fortunes
    “This test exercises the ORM, database connectivity, dynamic-size collections, sorting, server-side templates, XSS countermeasures, and character encoding.”
  5. Database updates
    “This test is a variation of Test #3 that exercises the ORM’s persistence of objects and the database driver’s performance at running UPDATE statements or similar. The spirit of this test is to exercise a variable number of read-then-write style database operations.”
  6. Plaintext
    “This test is an exercise of the request-routing fundamentals only, designed to demonstrate the capacity of high-performance platforms in particular. The response payload is still small, meaning good performance is still necessary in order to saturate the gigabit Ethernet of the test environment.”

Notes on Round 9 results

Currently, the latest benchmark is Round 9 and the result data is published on the project web page. The data is not available in machine-readable form and it can’t be sorted by column for analysing patterns. It can, however, be imported into a spreadsheet program fairly easily, so I took the data and analyzed it a bit. Some interesting observations could be made just by looking at the raw data. In addition to comparing throughput, it’s also interesting to compare how well frameworks scale. One way of quantifying scalability is to take test implementation throughput figures for the lowest and highest concurrency level (for test types 1, 2, 4 and 6) per framework and plot them on a 2-D plane. A line can then be drawn between these two points with the slope characterizing scalability. Well-scaling test implementations would be expected to have a positive, steep slope for test types 1, 2, 4 and 6 whereas for test types 3 and 5 the slope is expected to be negative.

This model is not entirely without problems since the scalability rating is not relative to the throughput, so e.g. a poorly performing framework can end up having a great scalability rating. As a result, you’d have to look at these figures together.

To better visualize throughput against concurrency level (“Peak Hosting” environment data), I created a small web app that’s available at (the app is subject to removal without notice).

JSON serialization

The JSON serialization test aims to measure framework overhead. One could argue that it’s a bit of a micro benchmark, but it should demonstrate how well the framework does with basic tasks like request routing, JSON serialization and response generation.

The top 10 frameworks were based on the following programming languages: C++, Java, Lua, Ur and Go. C++ based CPPSP was the clear winner while the next 6 contestants were Java -based. No database is used in this test type.

The top 7 frameworks with highest throughput also have the highest scalability rating. After that, both these figures start declining fairly rapidly. This is a very simple test and it’s a bit of a surprise to see such large variation in results. In their commentary TechEmpower attributes some of the differences to how well frameworks work on a NUMA-based system architecture.

Quite many frameworks are Java or JVM based and rather large variations exist even within this group, so clearly neither the language nor the JVM is an impeding factor in this group.

I was surprised about Node.js and HHVM rankings. Unfortunately, the Scala-based Spray test implementation, as well as the JVM-based polyglot framework Vert.x implementation, were removed due to being outdated. Hope to see these included in a future benchmark round.

Single database query

This test type measures database access throughput and parallelizability. Again, surprisingly large spread in performance can be observed for a fairly trivial test case. This would seem to suggest that framework or database access method overhead contributes significantly to the results. Is the database access technology (DB driver or ORM) a bottleneck? Or is the backend system one? It would be interesting to look at the system activity reports from test runs to analyze potential bottlenecks in more detail.

Before seeing the results I would’ve expected the DB backend to be the bottleneck, but this doesn’t appear to be clear-cut based on the fact that the top, as well as many of the bottom performing test implementations, are using the same DB. It was interesting to note that the top six test implementations use a relational database with the first NoSQL based implementation taking 7th place. This test runs DB read statements by ID, which NoSQL databases should be very good at.

Top performing 10 frameworks were based on Java, C++, Lua and PHP languages and are using MySQL, PostgreSQL and MongoDB databases. Java based Gemini leads with CPPSP being second. Both use MySQL DB. Spring based test implementation performance was a bit of a disappointment.

Multiple database queries

Where the previous test exercised a single database query per request this test does a variable number of database queries per request. Again, I would’ve assumed this test would measure the backend database performance more than the framework performance, but it seems that framework and database access method overhead can also contribute significantly.

The top two performers in this test are Dart based implementations that use MongoDB.

Top 10 frameworks in this test are based on Dart, Java, Clojure, PHP and C# languages and they use MongoDB and MySQL databases.


This is the most complex test that aims to exercise the full framework stack from request routing through business logic execution, database access, templating and response generation.

Top 10 frameworks are based on C++, Java, Ur, Scala, PHP languages and with the full spectrum of databases being used (MySQL, PostgreSQL and MongoDB).

Database updates

In addition to reads this test exercises database updates as well.

HHVM wins this test with 3 Node.js based frameworks coming next. Similar to the Single database query test the top 13 implementations work with relational MySQL DB, before NoSQL implementations. This test exercises simple read and write data access by ID which, again, should be one of NoSQL database strong points.

Top performing 10 frameworks were based on PHP, JavaScript, Scala, Java and Go languages, all of which use the MySQL database.


The aim of this test is to measure how well the framework performs under extreme load conditions and massive client parallelism. Since there’s no backend system dependencies involved, this test measures platform and framework concurrency limits. Throughput plateaus or starts degrading with top-performing frameworks in this test before client concurrency level reaches the maximum value, which seems to suggest that a bottleneck is being hit somewhere in the test setup, presumably hardware, OS and/or framework concurrency.

Many frameworks are at their best with concurrency level of 256, except CPPSP which peaks at 1024. CPPSP is the only one of the top-performing implementations that is able to significantly improve its performance as the concurrency level increases from 256, but even with CPPSP throughput actually starts dropping after concurrency level hits the 4,096 mark. Only 12 test implementations are able to exceed 1 M requests per second. Some well-known platforms e.g. Spring did surprisingly poorly.

There seems to be something seriously wrong with HHVM test run as it generates only tens of responses per second with concurrency levels 256 and 1024.

Top 10 frameworks are based on C++, Java, Scala and Lua languages. No database is used in this test.

Benchmark repeatability

In the scientific world research must be repeatable, in order to be credible. Similarly, the benchmark test methodology and relevant circumstances should be documented to make the results repeatable and credible. There’re a few details that could be documented to improve repeatability.

The benchmarking project source code doesn’t seem to be tagged. Tagging would be essential for making benchmarks repeatable.

A short description of the hardware and some other test environment parameters is available on the benchmark project web site. However, the environment setup (hardware + software) is expected to change over time, so this information should be documented per round. Also, Linux distribution minor release or the exact Linux kernel version don’t appear to be identified.

Detailed data about what goes on inside the servers could be published, so that externals could analyze benchmark results in a more meaningful way. System activity reports e.g. system resource usage (CPU, memory, IO) can provide valuable clues to possible scalability issues. Also, application, framework, database and other logs can be useful to test implementers.

Resin was chosen as the Java application server over Apache Tomcat and other servlet containers due to performance reasons. While I’m not contesting this statement, but there wasn’t any mention about software versions, and since performance attributes tend to change over time between releases, this premise is not repeatable.

Neither the exact JVM version nor the JVM arguments are documented for JVM based test implementation execution. Default JVM arguments are used if test implementations don’t override the settings. Since the test implementations have very similar execution profiles by definition, it could be beneficial to explicitly configure and share some JVM flags that are commonly used with server-side applications. Also, due to JVM ergonomics different GC parameters can be automatically selected based on underlying server capacity and JVM version. Documenting these parameters per benchmark round would help with repeatability. Perhaps all the middleware software versions could be logged during test execution and the full test run logs could be made available.

A custom test implementation: Asynchronous Java + NoSQL DB

Since I’ve worked recently on implementing RESTful services based on JAX-RS 2 API with asynchronous processing (based on Jersey 2 implementation) and Apache Cassandra NoSQL database, I got curious about how this combination would perform against the competition so, I started coding my own test implementation. I decided to drop JAX-RS in this case, however, to eliminate any non-essential abstraction layers that might have a negative impact on performance.

One of the biggest hurdles in getting started with test development was that, at the time I started my project there wasn’t a way to test run platform installation scripts in smaller pieces, and you had to run the full installation, which took a very long time. Fortunately, since then framework installation procedure has been compartmentalized, so it’s possible to install just the framework that you’re developing tests for. Also, recently the project has added support for fully automated development environment setup with Vagrant, which is a great help. Another excellent addition is Travis CI integration that allows test implementation developers to gain additional assurance that their code is working as expected also outside their sandbox. Unfortunately, Travis builds can take a very long time, so you might need to disable some of the tests that you’re not actively working on. The Travis CI environment is also a bit different from the developer and the actual benchmarking environments, so you could bump into issues with Travis builds that don’t occur in the development environment, and vice versa. Travis build failures can sometimes be very obscure and tricky to troubleshoot.

The actual test implementation code is easy enough to develop and test in isolation, outside of the real benchmark environment, but if you’re adding support for new platform components such as databases or testing platform installation scripts, it’s easiest if you have an environment that’s a close replica of the actual benchmarking environment. In this case adding support for a new database involved creating a new DB schema, test data generation and automating database installation and configuration.

Implementing the actual test permutation turned out to be interesting, but surprisingly laborious, as well. I started seeing strange error responses occasionally when benchmarking my test implementation with ab and wrk, especially with higher loads. TFB executes Java based performance implementations in the Resin web container, and after a while of puzzlement about the errors, I decided to test the code in other web containers, namely Tomcat and Jetty. It turned out that I had bumped into 1 Resin bug (5776) and 2 Tomcat bugs (56736, 56739) related to servlet asynchronous processing support.

Architecturally, Test types 1 and 6 have been implemented using traditional synchronous Servlet API, while the rest of the test implementations leverage non-blocking request handling through Servlet 3 asynchronous processing support. The test implementations store their data in the Apache Cassandra 2 NoSQL database, which is accessed using the DataStax Java Driver. Asynchronous processing is also used in the data access tier in order to minimize resource consumption. JSON data is processed with the Jackson JSON library. In Java versions predating version 8, asynchronous processing requires passing around callbacks in the form of anonymous classes, which can at times be a bit high-ceremony syntactically. Java 8 Lambda expressions does away with some of the ceremonial overhead, but unfortunately TFB doesn’t yet fully support the latest Java version. I’ve previously used the JAX-RS 2 asynchronous processing API, but not the Servlet 3 async API. One thing I noticed during the test implementation was that the mechanism provided by Servlet 3 async API for generating error response to the client is much lower level, less intuitive and more cumbersome than its JAX-RS async counterpart.

The test implementation code was merged in the FrameworkBenchmarks code base, so it should be benchmarked on the next round. The code can be found here:


TechEmpower’s Framework Benchmarks is a really valuable contribution to the web framework developer and user community. It holds great potential for enabling friendly competition between framework developers, as well as, framework users, and thus driving up performance of popular frameworks and adoption of framework performance best practices. As always, there’s room for improvement. Some areas from a framework user and test implementer point of view include: make the benchmark tests and results more repeatable, publish raw benchmark data for analysis purposes and work on making test development and adding new framework components even easier.

Good job TFB team + contributors – can’t wait to see Round 10 benchmark data!

Daemonizing JVM-based applications

Deployment architecture design is a vital part of any custom-built server-side application development project. Due to it’s significance, deployment architecture design should commence early and proceed in tandem with other development activities. The complexity of deployment architecture design depends on many aspects, including scalability and availability targets of the provided service, rollout processes as well as technical properties of the system architecture.

Serviceability and operational concerns, such as deployment security, monitoring, backup/restore etc., relate to the broader topic of deployment architecture design. These concerns are cross-cutting in nature and may need to be addressed on different levels ranging from service rollout processes to the practical system management details.

On the system management detail level the following challenges often arise when using a pure JVM-based application deployment model (on Unix-like platforms):

  • how to securely shut down the app server or application? Often, a TCP listener thread listening for shutdown requests is used. If you have many instances of the same app server deployed on the same host, it’s sometimes easy to confuse the instances and shutdown the wrong one. Also, you’ll have to prevent unauthorized access to the shutdown listener.
  • creating init scripts that integrate seamlessly with system startup and shutdown mechanisms (e.g. Sys-V init, systemd, Upstart etc.)
  • how to automatically restart the application if it dies?
  • log file management. Application logs can be managed (e.g. rotate, compress, delete) by a log library. App server or platform logs can sometimes be also managed using a log library, but occasionally integration with OS level tools (e.g. logrotate) may be necessary.

There’s a couple of solutions to these problems that enable tighter integration between the operating system and application / application server. One widely used and generic solution is the Java Service Wrapper. The Java Service Wrapper is good at addressing the above challenges and is released under a proprietary license. GPL v2 based community licensing option is available as well.

Apache commons daemon is another option. It has its roots in Apache Tomcat and integrates well with the app server, but it’s much more generic than that, and in addition to Java, commons daemon can be used with also other JVM-based languages such as Scala. As the name implies, commons daemon is Apache licensed.

Commons daemon includes the following features:

  • automatically restart JVM if it dies
  • enable secure shutdown of JVM process using standard OS mechanisms (Tomcat TCP based shutdown mechanism is error-prone and unsecure)
  • redirect STDERR/STDOUT and set JVM process name
  • allow integration with OS init script mechanisms (record JVM process pid)
  • detach JVM process from parent process and console
  • run JVM and application with reduced OS privileges
  • allow coordinating log file management with OS tools such as logrotate (reopen log files with SIGUSR1 signal)


Deploying commons daemon

From an application developer point of view commons daemon consists of two parts: the jsvc binary used for starting applications and commons daemon Java API. During startup, jsvc binary bootstraps the application through lifecycle methods implemented by the application and defined by commons daemon Java API. Jsvc creates a control process for monitoring and restarting the application upon abnormal termination. Here’s an outline for deploying commons daemon with your application:

  1. implement commons daemon API lifecycle methods in an application bootstrap class (see Using jsvc directly).
  2. compile and install jsvc. (Note that it’s usually not good practice to install compiler toolchain on production or QA servers).
  3. place commons-daemon API in application classpath
  4. figure out command line arguments for running your app through jsvc. Check out bin/ in Tomcat distribution for reference.
  5. create a proper init script based on previous step. Tomcat can be installed via package manager on many Linux distributions and the package typically come with an init script that can be used as a reference.


Practical experiences

Tomcat distribution includes “”, a generic wrapper shell script that can be used as a basis for creating a system specific init script variant. One of the issues that I encountered was the wait configuration parameter default value couldn’t be overridden by the invoker of the wrapper script. In some cases Tomcat random number generator initialization could exceed the maximum wait time, resulting in the initialization script reporting a failure, even if the app server would eventually get started. This seems to be fixed now.

Another issue was that the wrapper script doesn’t allow passing JVM-parameters with spaces in them. This can be handy e.g. in conjunction with the JVM “-XX:OnOutOfMemoryError” & co. parameters. Using the wrapper script is optional, and it can also be changed easily, but since it includes some pieces of useful functionality, I’d rather reuse instead of duplicating it, so I created a feature request and proposed tiny patch for this #55104.

While figuring out the correct command line arguments for getting jsvc to bootstrap your application, the “-debug” argument can be quite useful for troubleshooting purposes. Also, by default the jsvc changes working directory to /, in which case absolute paths should typically be used with other options. The “-cwd” option can be used for overriding the default working directory value.


Daemonizing Jetty

In addition to Tomcat, Jetty is another servlet container I often use. Using commons daemon with Tomcat poses no challenge since the integration already exists, so I decided to see how things would work with an app server that doesn’t support commons daemon out-of-the-box.

To implement the necessary changes in Jetty, I cloned the Jetty source code repository, added jsvc lifecycle methods in the Jetty bootstrap class and built Jetty. After that, I started experimenting with jsvc command line arguments for bootstrapping Jetty. Jetty comes with startup script that has an option called “check” for outputting various pieces of information related to the installation. Among other things it outputs the command line arguments that would be used with the JVM. This provided quite a good starting point for the jsvc command line.

These are the command lines I ended up with:

export JH=$HOME/jetty-9.2.2-SNAPSHOT
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
jsvc -debug -pidfile $JH/ -outfile $JH/std.out -errfile $JH/std.err -Djetty.logs=$JH/logs -Djetty.home=$JH -Djetty.base=$JH -classpath $JH/commons-daemon-1.0.15.jar:$JH/start.jar org.eclipse.jetty.start.Main jetty.state=$JH/jetty.state jetty-logging.xml jetty-started.xml

This could be used as a starting point for a proper production grade init script for starting and shutting down Jetty.

I submitted my code changes as issue #439672 in the Jetty project issue tracker and just received word that the change has been merged with the upstream code base, so you should be able to daemonize Jetty with Apache commons daemon jsvc in the future out-of-the-box.

Implementing Jersey 2 Spring integration

Jersey is the excellent Java JAX-RS specification reference implementation from Oracle. Last year, when we were starting to build RESTful backend web services for a high-volume website, we chose to use the JAX-RS API as our REST framework and Spring framework for dependency injection. Jersey was our JAX-RS implementation of choice.

When the project was started JAX-RS API 2.0 specification was not yet released, and neither was Jersey 2.0. Since we didn’t see any fundamental deficiencies with JAX-RS 1.1, and because a stable Spring integration module existed for Jersey 1.1, we decided to go with the tried-and-true version instead of taking on the bleeding edge.

Still, I was curious to learn what could be gained by adopting the newer version, so I started looking at the JAX-RS 2 API on my free time and doing some prototyping with Jersey 2. I noticed that Jersey 2 lacked Spring framework integration that was available for the previous version. Studying the issue further, I found that the old Spring integration module would not be directly portable to Jersey 2. The reason was that Jersey 1 builds on a custom internal dependency injection framework while Jersey 2 had switched to HK2 for dependency injection. (HK2 is an interesting, light-weight dependency injection framework used in GlassFish.)

My original goals for Jersey-Spring integration were fairly simple:

inject Spring beans declared in application context XML into JAX-RS resource classes (using @Autowired annotation or XML configuration)

So, I thought I’d dig a bit deeper and started looking into Jersey source code. I was happy to notice that Jersey development was being done in an open and approachable manner. The source code was hosted on GitHub and updated frequently. After a while of digging, a high-level design for Jersey Spring integration started to take shape. It took quite some experimenting and many iterations before the first working prototype. At that point, being an optimist, I hoped I was nearly done and contacted the jersey-users mailing list to get feedback on the design and implementation. The feedback: add more use cases, provide sample code, implement test automation, sign Oracle Contributor Agreement 🙂 (The feedback, of course, was very reasonable from the Jersey software product point of view). So, while it wasn’t quite back to the drawing board, but at this point I realized the last mile was to be considerably longer than I had hoped for.

Eventually, though, the Jersey-Spring integration got merged in Jersey 2 code base in Jersey v2.2 release. The integration API is based on annotations and supports the following features:

  • inject Spring beans into Jersey managed JAX-RS resource classes (using org.springframework.beans.factory.annotation.Autowired or javax.inject.Inject). @Qualifier and @Named annotations can be used to further qualify the injected instance.
  • allow JAX-RS resource class instance lifecycle to be managed by Spring instead of Jersey (org.springframework.stereotype.Component)
  • support different Spring bean injection scopes: singleton, request, prototype. Bean scope is declared in applicationContext.xml.

The implementation

Source code for the Jersey-Spring integration can be found in the main Jersey source repository:

Jersey-Spring integration consists of the following implementation classes:

This ComponentProvider implementation is registered with Jersey SPI extension mechanism and it’s responsible for bootstrapping Jersey 2 Spring integration. It makes Jersey skip JAX-RS life-cycle management for Spring components. Otherwise, Jersey would bind these classes to HK2 ServiceLocator with Jersey default scope without respecting the scope declared for the Spring component. This class also initializes HK2 spring-bridge and registers Spring @Autowired annotation handler with HK2 ServiceLocator. When being run outside of servlet context, a custom org.springframework.web.context.request.RequestScope implementation is configured to implement request scope for beans.

HK2 injection resolver that injects dependencies declared using Spring framework @Autowired annotation. HK2 invokes this resolver and asks it to resolve dependencies annotated using @Autowired.

Handles container lifecycle events. Refreshes Spring context on reload and close it on shutdown.

A convenience class that helps the user avoid having to configure Spring ContextLoaderListener and RequestContextListener in web.xml. Alternatively the user can configure these in web application web.xml.

In addition to the actual implementation code, the integration includes samples and tests, which can be very helpful in getting developers started.

The JAX-RS specification defines its own dependency injection API. Additionally, Jersey supports JSR 330 style injection not mandated by the JAX-RS specification. Jersey-Spring integration adds support for Spring style injection. Both JAX-RS injection and Spring integration provide a mechanism for binding objects into a registry, so that objects can later be looked up and injected. If you’re using a full Java EE application server, such as Glassfish, you also have the option of binding objects via the CDI API. On non-Java EE environments it’s possible to use CDI by embedding a container implementation such as Weld. Yet another binding method is to use Jersey specific API. The test code includes a JAX-RS application class that demonstrates how this can be done.

Modifying Jersey-Spring

If you want to work on Jersey-Spring, you need to check out Jersey 2 code base and build it. That process is rather easy and well documented:

You simply need to clone the repository and build the source. The build system is Maven based. You can also easily import the code base into your IDE of choice (tried it with IDEA 12, Eclipse 4.3 and NetBeans 8.0 beta) using its Maven plugin. I noticed, however, that some integration tests failed with Maven 3.0, and I had to upgrade to 3.1, but apart from that there weren’t any issues.

After building Jersey 2 you can modify the Spring integration module, and build only the changed modules to save time.


Jersey-Spring integration tests have been built using Jersey test framework and they’re run under the control of maven-failsafe-plugin. Integration tests consist of actual test code and a JAX-RS backend webapp that the tests exercise. The backend gets deployed into an external Jetty servlet container using jetty-maven-plugin. Jersey-Spring tests can be executed separately from the rest of the tests. Integration tests can be found in a separate Maven submodule here:

In addition to demonstrating the basic features of Jersey-Spring, the tests show how to use different Spring bean scopes: singleton, request, prototype. The tests also exhibit using a JAX-RS application class for registering your own dependencies in the container, in different scopes.


I think the JAX-RS 2.0 API provides a nice and clean way of implementing RESTful interfaces in Java. Development of the Jersey JAX-RS reference implementation is being conducted in an open and transparent manner. Jersey also has a large and active user community.

As noted by Frederick Brooks, Jr.: “All programmers are optimists”. It’s often easy to underestimate the amount of work required to integrate code with a relative large and complex code base, and in particular when you need to mediate between multiple different frameworks (in this case Jersey, HK2, Spring framework). Also, though Jersey has pretty good user documentation, I missed high-level architectural documentation on the design and implementation. A lot of poking around was needed to be able to identify the correct integration points. Fortunately, the Jersey build system is pretty easy to use and allows building only selected parts, which makes experimenting and the change-build-test cycle relatively fast.

Both Jersey and Spring framework provide a rich set of features and you can use them together in a multitude of ways. Jersey-Spring integration in it’s current form covers a couple of basic integration scenarios between the two. If you find that your particular scenario isn’t supported, join the jersey-users mailing list to discuss it. You can also just check out the code, implement your changes and contribute them by submitting a pull request on GitHub.

Building a beacon out of Pi

Building a beacon out of Pi

iBeacon is, to quote Wikipedia, an indoor positioning system that Apple calls “a new class of low-powered, low-cost transmitters that can notify nearby iOS 7 devices of their presence. The technology is not, however, restricted to iOS devices and can currently also be used with Android devices. Many people are predicting it will change retail shopping.

At it’s core, the technology allows proximity sensing so, that a device can alert a user when moved from or to close proximity of a peer device. In a typical use case a “geo fence” is established around a stationary device (i.e. “beacon”) and mobile devices carried by users issue alerts when crossing the fence to either enter or exit the region.

After Apple’s WWDC 2013 conference we’ve seen quite a bit of a buzz about iBeacon. Interestingly, the underlying technology is based on the Bluetooth Proximity profile specification ratified in 2011. Though a lot of the buzz has been associated with Apple, the company is not attributed as a contributor to the original specification. Also, Apple is yet to reveal what it plans to do exactly with iBeacon.

A few other companies are planning to bring iBeacon compatible beacons to the market, but the beacons aren’t shipping, yet. So, if you want to start developing software right now you have to resort to other solutions. One option is to use a Bluetooth LE (BLE) capable device, equipped with the right software, as a beacon. For example BLE capable iOS devices, can act as beacons with the AirLocate application. So, if you have e.g. a new iPhone 5 to spare, you can make it into a beacon very easily. Another option is to build a beacon yourself, since iBeacons are based on standard Bluetooth LE proximity profile. A company called Radius Networks has published an article about building a beacon, so I decided to try this out.

Beacon BOM

The bill of materials for the beacon was pretty simple

  • Raspberry Pi + memory card + power cord
  • a USB Bluetooth 4.0 LE dongle

Additionally, I bought the following items to make installation easier:

  • a USB SD card reader
  • HDMI display cable
  • USB hub
beacon bill-of-materials

beacon bill-of-materials

Unfortunately, Radius was using IOGEAR GBU521 Bluetooth dongle which I couldn’t find at any of the local electronics shops. USB Bluetooth dongles aren’t very expensive, however, and since there aren’t many Bluetooth chipsets on the market, I decided to experiment a bit and buy two different dongles to try out. These were the Asus USB-BT400 and TeleWell Bluetooth 4.0 LE + EDR.

Operating system deployment

Some vendors sell memory cards with pre-installed OS for Raspberry Pi, but I couldn’t find one of those either at my local electronics shop, so I bought a blank memory card and a USB-based SD card reader, just in case. The SD card reader proved handy because it turned out the card didn’t work with the built-in reader on my MacBook Pro. Installing the Raspbian (2013-09-25-wheezy) Linux distribution was fairly straightforward using the instructions on Embedded Linux Wiki (RPi Easy SD Card Setup). The only notable issue was that on Mac OS X writing the OS image to the card was a lot faster (~ 4 min. vs. ~ 30 min.) using the raw disk device instead of the buffered one. Another issue was that the memory card disk had to be unmounted using diskutil, not ejecting it through Mac OS X Finder.

After installing the OS image on the SD card it was time to see if the thing would boot. Unfortunately, I only had a VGA monitor and no suitable HDMI-VGA adapters , so I wasn’t able to make a console connection with the Pi. Being eager to see if everything worked so far, I decided to connect the Pi to a wireless access point and power it up. After a few moments, I noticed that RasPi had acquired an IP address from the AP’s DHCP server and I was able to log in to the Pi via SSH using the default credentials. So, no console whatsoever was required to set up the Pi!

Being a Java developer, I was happy to notice that Raspbian came with a fairly recent Java Standard Edition 7 installation by default. The latest Java 8 build is also available for Raspbian.

Building the Bluetooth stack

Once the basic OS setup was done, I had to compile the BlueZ Linux Bluetooth stack which, proved to be a rather simple matter of installing the compile-time pre-requisites through RasPi package management (apt-get):

libusb-dev libdbus-1-dev libglib2.0-dev libudev-dev libical-dev libreadline-dev

and then configuring the source and building it. I used BlueZ version 5.10, which was the latest official version at the time.

After compiling the BlueZ Bluetooth protocol stack it was time to test if the Bluetooth dongles I had bought were working. Running hciconfig I noticed that the Telewell dongle was being detected, while the Asus wasn’t. During further testing it became clear that the Pi’s signal wasn’t being picked up by a demo app on an iPad. More research showed that though the TeleWell dongle did support Bluetooth 4.0 LE, it didn’t officially support the required proximity profile. After yet more googling, I found that the Asus dongle seemed to include the same Broadcom BCM20702 (A0) chipset as the IOGEAR one used by Radius Networks. However, the dongle wasn’t being detected because Asus has a vendor specific USB device ID for it that wasn’t known by the Raspbian kernel. The solution for this issue was to add the device ID in the kernel source (credit to linux-bluetooth mailing list), rebuild and install the newly built kernel.

Compiling the Linux kernel

Compiling the Linux kernel is a very time-consuming task, particularly on a resource constrained environment such as the Raspberry Pi. Fortunately, it’s possible to set up a cross-compiler environment on a more powerful system to speed things up. Again, the Embedded Linux Wiki proved to be a great resource with this task (RPi Kernel Compilation). Though, you can install a x86 / Mac OS X ⇒ ARM / Linux cross-compilation environment, I thought I’d try to go for an, arguably, more mainstream choice and set up my cross-compiler environment on a Linux Mint 15 Xfce guest virtual machine. The required packages were again available via apt-get:

gcc-arm-linux-gnueabi libncurses-dev

Additionally, Git had to be installed to be able to fetch the Raspberry Pi kernel source, build tools and firmware.

Kernel compilation for Raspberry Pi is a bit different than compiling a standard kernel for a server class machine, and is done using the following procedure:

  • fetch Raspberry Pi Linux kernel source, compiler tools and firmware
  • set up environment variables for the compilation
  • configure the kernel source (using a config file from the Pi)
  • build kernel and modules
  • package up the kernel, modules and firmware
  • deploy kernel, modules and firmware to Pi
  • reboot Pi (with fingers crossed)

Fortunately, I was able to produce and deploy a working kernel build and the Pi booted up with the fresh kernel. This time, hciconfig showed that the kernel and the Bluetooth stack were able to detect the Asus Bluetooth dongle.

Testing the beacon

There are a couple of mobile applications available that can be used for verifying that a beacon is functioning properly. iBeacon Locate is available in Google play store for Android 4.3+ users and Beacon Toolkit in Apple App Store for iOS 7.0+ users. Both applications require that the device supports Bluetooth BLE.

Sample code demonstrating how to read beacon signals is also available from multiple sources, including the AirLocate application, Apple WWDC 2013 and android-ibeacon-service. AirLocate e.g. is a complete sample app that you can build, modify and install on your iOS device, provided that you have an Apple iOS developer certificate.

I was able to pick up the Raspberry Pi’s beacon signal using AirLocate on iOS and iBeacon Locate on Android. There were some problems with the sample apps, when I configured the beacon to use a custom device UUID instead of an Apple demo UUID: the demo apps failed to detect the beacon when using custom a UUID. A custom application was, however, able to detect the beacon also with a custom UUID.

beacon ranging and proximity sensing alert

beacon ranging and proximity sensing alert

Next, I’ll have to experiment a bit more with proximity sensing accuracy, notification event delay and how different physical space topologies and interference affect proximity sensing. Also, a demo app should be developed to simulate proximity alerts in the context of a real use case.

Init scripts

Once Pi beacon was running fine, the last thing was to make the beacon automatically turn on at boot. For this I studied the existing init scripts to learn what kind of metadata is required to be able to manage the “service” using update-rc.d command. I also separated the configuration parameters in a separate file (/etc/default/ibeacon.conf) from the init script.

Final thoughts

iBeacon is based on standard Bluetooth 4.0 LE proximity profile that is not Apple proprietary technology. The technology works currently on newer iOS and Android devices, and these operating systems include APIs required to detect beacon signals. iBeacon has many interesting use cases for indoor positioning, including but not limited to, retail shopping and analytics. It’s still an emerging technology and I’m sure we’ll see it applied in many unexpected contexts in the future.

The Raspberry Pi is a device platform with intriguing possibilities. Traditionally embedded software development and server-side software development have required very different skill sets. Raspberry Pi demonstrates how the embedded platforms have evolved significantly during the last few years in terms of hardware capacity as well as software platform maturity. Consider e.g. the following points:

  • the system has “standard” support for Java as well as many other popular language runtimes, making it possible to develop software using desktop or server-side development skills and tools
  • hardware capacity is gaining on low-end server-class machines in terms of memory, CPU and storage capacity
  • the OS and other software can be updated over the network using update manager tools instead of having to flash firmware
  • it’s based on a standard Linux environment
  • the system supports remote management over secure terminal connection

With the high end embedded platforms, such as the Raspberry Pi, similar tools and techniques can be used for developing software on both platforms, so from a development and maintenance point of view these platforms are converging. This creates fascinating opportunities for software developers who don’t have a traditional embedded development skill set.

Practical NoSQL experiences with Apache Cassandra

Most of the backend systems I’ve worked with over the years have employed relational database storage in some role. Despite many application developers complaining about RDBMS performance, I’ve found that with good design and implementation a relational database can actually scale a lot further than developers think. Often software developers who don’t really understand relational databases tend to blame the database for being a performance bottleneck, even if the root cause could actually be traced to bad design and implementation.

That said, there are limits to RDBMS scalability and it can become a serious issue with massive transaction and data volumes. A common workaround is to partitioning application data based on a selected criteria (functional area and/or selected property of entities within functional area) and then distributing data across database server nodes. Such partitioning must usually be done at the expense of relaxing consistency. There are also plenty of other use cases for which relational databases in general, or the ones that are available to you, aren’t without problems.

Load-balancing and failover are sometimes difficult to achieve even on a smaller scale with relational databases, especially if you don’t have the option to license a commercial database clustering option. And even if you can, there are limits to scalability. People tend to workaround these problems with master-slave database configurations, but they can be difficult to set up and manage. This sort of configuration will also impact data consistency if master-slave replication is not synchronous, as is often the case.

When an application also requires a dynamic or open-ended data model, people usually start looking into NoSQL storage solutions.

This was the path of design reasoning for a project I’m currently working on. I’ve been using Apache Cassandra (v1.2) in a development project for a few months now. NoSQL databases come in very different forms and Cassandra is often characterized as a “column-oriented” or “wide-row” database. With the introduction of the Cassandra Query Language (CQL) Cassandra now supports declaring schema and typing for your data model. For the application developer, this feature brings the Cassandra data model somewhat closer to the relational (relations and tuples) model.

NoSQL design goals and the CAP theorem

NoSQL and relational databases have very different design goals. It’s important for application developers to understand these goals because in practice they guide and dictate the set of feasible product features.

ACID transaction guarantees provide a strong consistency model around which web applications have traditionally been designed. When building Internet-scale systems developers came to realize that strong consistency guarantees come at a cost. This was formulated in Brewer’s CAP theorem, which in its original form stated that a distributed system can only achieve two of the following properties:

  • consistency (C) equivalent to having a single up-to-date copy of the data;
  • high availability (A) of that data (for updates); and
  • tolerance to network partitions (P).

The “2 of 3” formulation was later revised somewhat by Brewer, but this realization led developers to consider using alternative consistency models, such as “Basically Available, Soft state, Eventual consistency” or BASE, in order to trade off strong consistency guarantees for availability and partition tolerance, but also scalability. Promoting availability over consistency became a key design tenet for many NoSQL databases. Other common design goals for NoSQL databases include high performance, horizontal scalability, simplicity and schema flexibility. These design goals were also shared by Cassandra founders, but it was also designed to be CAP-aware, meaning the developer is allowed to tune the tradeoff between consistency and latency.

BASE is a consistency model for distributed systems that does not require a NoSQL database. NoSQL databases that promote the BASE model also encourage applications to be designed around BASE. Designing a system that uses BASE consistency model can be challenging from technical perspective, but also because relaxing consistency guarantees will be visible to the users and requires a new way of thinking from the product owner, who traditionally are accustomed to thinking in terms of a strong consistency model.

Data access APIs and client libraries

One of the first things needed when starting to develop a Java database application is a database client library. With most RDBMS products this is straightforward: JDBC is the defacto low-level database access API, so you just download a JDBC driver for that particular database and configure your higher level data access API (e.g. JPA) to use that driver. You get to choose which higher level API to use, but there’s usually only a single JDBC driver vendor for a particular database product. Cassandra on the other hand currently has 9 different clients for Java developers. These clients provide different ways of managing data: some offer an object-relational -mapping API, some support CQL and others provide a lower level (e.g. Thrift based) APIs.

Data in Cassandra can be accessed and managed using an RPC-style (Thrift based) API, but Cassandra also has a very basic query language called CQL that resembles SQL syntactically to some extent, but in many cases the developer is required to have a much deeper knowledge of how the storage engine works below the hood than with relational databases. The Cassandra community recommended API to use for new projects using Cassandra 1.2 is CQL 3.

Since Cassandra is being actively developed, it’s important to pick a client whose development pace matches that of the server. Otherwise you won’t be able to leverage all the new server features in your application. Because Cassandra user community is still growing, it’s good to choose a client with an active user community and existing documentation. Astyanax client, developed by Netflix, currently seems to be the most widely used, production-ready and feature complete Java client for Cassandra. This client supports both Thrift and CQL 3 based APIs for accessing the data. DataStax, a company that provides commercial Cassandra offering and support, is also developing their own CQL 3 based Java driver, which recently came out of beta phase.

Differences from relational databases

Cassandra storage engine design goals are radically different from those of relational databases. These goals are inevitably reflected in the product and APIs, and IMO neither can nor should be hidden from the application developer. The CQL query language sets expectations for many developers and may make them assume they’re working with a relational database. Some important differences to take note of that may feel surprising from an RDBMS background include:

  • joins and subqueries are not supported. The database is optimized for key-oriented access and data access paths need to be designed in advance by denormalization. Apache Solr, Hive, Pig and similar solutions can be used to provide more advanced query and join functionality
  • no integrity constraints. Referential and other types of integrity constraint enforcement must be built into the application
  • no ACID transactions. Updates within a single row are atomic and isolated, but not across rows or across entities. Logical units of work may need be split and ordered differently than when using RDBMS. Applications should be designed should to be designed for eventual consistency
  • only indexed predicates can be used in query statements. An index is automatically created for row keys, indexes in other column values can be created as needed (except currently for collection typed columns). Composite indexes are not supported. Solr, Hive etc. can be used to address these limitations.
  • sort criteria needs to be designed ahead. Sort order selection is very limited. Rows are sorted by row key and columns by column name. These can’t be changed later. Solr, Hive etc. can be used to address these limitations.

Data model design is a process where developers will encounter other dissimilarities compared to the RDBMS world. For Cassandra, the recommended data modeling approach is the opposite of RDBMS: identify data access patterns, then the model data to support those access patterns. Data independence is not a primary goal and developers are expected to understand how the CQL data model maps to storage engine’s implementation data structures in order to make optimal use of Cassandra. (In practice, full data independence can be impossible to achieve with high data volume RDBMS applications as well). The database is optimized for key-oriented data access and data model must be denormalized. Some aspects of the application that can be easily modified or configured at runtime in relational databases are design time decisions with Cassandra, e.g. sorting.

A relational application data model typically stores entities of a single type per relation. The Cassandra storage engine does not require that rows in a column family contain the same set of columns. You can store data about entirely unrelated entities in a single column family (wide rows).

Row key, partition key and clustering column are data modeling concepts that are important to understand for the Cassandra application developer. The Cassandra storage engine uniquely identifies rows by row key and keys provide the primary row access path. A CQL single column primary key maps directly to a row key in the storage engine. In case of a composite primary key, the first part of the primary key is used as the row key and partition key. The remaining parts of a composite primary key are used as clustering columns.

Row key and column name, along with partitioner (and comparator) algorithm selection have important consequences for data partitioning, clustering, performance and consistency. Row key and partitioner control how data is distributed among nodes in the database cluster and ordered within a node. These parameters also determine whether range scanning and sorting is possible in the storage engine. Logical rows with the same partition key get stored as a single, physical wide row, on the same database node. Updates within a single storage engine row are atomic and isolated, but not across rows. This means that your data model design determines which updates can be performed atomically. Columns within a row are clustered and ordered by the clustering columns, which is particularly important when the data model includes wide rows (e.g. time-series data).

When troubleshooting issues with an application, it’s often very important to be able to study the data in the storage engine using ad-hoc queries. Though Cassandra does support ad-hoc CQL queries, the supported query types are more limited. Also, the database schema changes, data migration and data import typically require custom software development. On the other hand, schema evolution has traditionally been very challenging with RDBMS when large data volumes have been involved.

Cassandra supports secondary indexes, but applications are often designed to maintain separate column families that support looking up data based on a single or multiple secondary access criteria.

One of the interesting things I noticed about Cassandra was that it has really nice load-balance and failover clustering support that’s quite easy to setup. Failover works seamlessly and fast. Cassandra is also quite lightweight and effortless to set up. Data access and manipulation operation performance is extremely fast in Cassandra. The data model is schema-flexible and supports use cases for which RDMBS usually aren’t up to the task e.g. storing large amounts of time-series data with very high performance.


Cassandra is a highly available, Internet-scale NoSQL database with design goals that are very different from those of traditional relational databases. The differences between Cassandra and relational databases identified in this article should each be regarded as having pros and cons and be evaluated in the context of the your problem domain. Also, using NoSQL does not exclude the use of RDBMS – it’s quite common to have a hybrid architecture where each database type is used in different use cases according the their strengths.

When starting their first NoSQL project, developers are likely to enter new territory and have their first encounters with related concepts such as big data and eventual consistency. Relational databases are often associated with strong consistency, whereas NoSQL systems are associated with eventual consistency (even though the use of a certain type of database doesn’t imply a particular consistency model). When moving from the relational world and strong consistency to the NoSQL world the biggest mind shift may be in understanding and architecting an application for eventual consistency. Data modeling is another area where a new way of design thinking needs to be adopted.

Cassandra is a very interesting product with a wide range of use cases. I think it’s particularly well suited database option for the following use cases:

  • very large data volumes
  • very large user transaction volumes
  • high reliability requirements for data storage
  • dynamic data model. Data model may be semi structured and expected see significant changes over time
  • cross datacenter distribution

It is, however, very different from relational databases. In order to be able to make an informed design decision on whether to use Cassandra or not, a good way to learn more is to study the documentation carefully. Cassandra development is very fast paced, so many of the documents you may find could be outdated. There’s no substitute for hands-on experience, though, so you should do some prototyping and benchmarking as well.

Java VM Options

Some of the more frequently used Java HotSpot VM Options have been publicly documented by Oracle.

Of those flags the following are the ones I tend to find most useful for server-side Java applications:

  • -XX:+UseConcMarkSweepGC – select garbage collector type
  • -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$APP_HOME_DIR – when an out-of-memory error occurs, dump heap
  • -XX:OnOutOfMemoryError=<command_for_processing_out_of_memory_errors> – execute user configurable command when an out-of-memory error occurs
  • -XX:OnError=<command_for_processing_unexpected_fatal_errors> – execute user configurable command when an unexpected error occurs
  • -XX:+PrintGCDetails -XX:+PrintGCTimeStamps – increase garbase collection logging
  • -Xloggc:$APP_HOME_DIR/gc.log – log garbage collection information to a separate log file
  • -XX:-UseGCLogFileRotation -XX:GCLogFileSize=<max_file_size>M -XX:NumberOfGCLogFiles=<nb_of_files> – set up log rotation for garbage collection log

Occasionally, the following might also be helpful to gain more insight into what the GC is doing:

  • -XX:PrintHeapAtGC
  • -XX:PrintTenuringDistribution

Oracle documentation covers quite a few HotSpot VM flags, but every once in a while you bump into flags that aren’t on the list. So, you begin to wonder if there’re more JVM options out there.

And, in fact, there’s a lot more as documented by this blog entry: The most complete list of -XX options for Java JVM

That list is enough to make any JVM application developer stand in awe 🙂
There’s enough options to shoot yourself in the foot, but you should avoid the temptation of employing each and every option.

Most of the time it’s best to use only the one’s you know are needed. JVM options and their default values tend to change from release to release, so it’s easy to end up with a huge list of flags that suddenly may not be required at all with a new JVM release, or worse,  may even result in conflicting or erroneous behavior. The implications of some of the options, the GC one’s in particular, and the way that related options interact with each other may sometimes be difficult. Using a minimal list of options is easier to understand and maintain. Don’t just copy-paste options from other applications without really understanding what they’re for.

So, what’re your favorite options?

Mine would probably be -XX:SelfDestructTimer 😉
(No, there really is such an option and it does work)

Advanced PostgreSQL features

Nearly all the server-side development projects I’ve worked in over the years have stored at least part of their data in a relational database system. Even the systems using NoSQL storage have included a RDBMS in some form, whether local to a single subsystem or in a large role. In some cases the RDBMS systems have been proprietary, but increasingly they’ve been open source projects.

I’ve been using PostgreSQL in projects with RDBMS storage requirements on and off over the years. It has often impressed me with its depth of support for the SQL standard as well as wide range of non-standard extension features. With other widely used OSS RDBMS engines, I’ve often noticed that while the database claims to support feature X on paper, it only supports a subset. And then there’s a list of limitations you need to account for. Typically those limitations are something you wouldn’t expect, so they come as a surprise. Subqueries and joins are one such feature.

With PostgreSQL, I haven’t yet found a case where the database wouldn’t be able to handle a SQL standard subquery or join statement. Having an item such as comprehensive support for SQL subqueries and joins in a database product feature list may not look particularly appealing, it may even raise some suspicions. But from a developer point of view I find this “feature” a very important one, perhaps even one of PostgreSQL’s biggest selling points compared to some of it’s OSS competitors. PostgreSQL has many other advanced features that are interesting for application developers.

Common table expressions and hierarchic queries

Common table expressions or CTE is a handy standard SQL feature that allows you to split a query statement into distinct parts where results of each part will appear as a table, so you can reference the intermediate results in other parts of the statement, possibly several times. This can help make query statements more maintainable, but it also helps optimize queries in some cases, since CTE expressions are performed only once per statement execution.

In addition to allowing subquery factoring, CTE can process data as hierarchical. With small amounts of data and relatively shallow hierarchies you can implement hierarchical queries simply using joins, but this workaround may not be acceptable in all cases. A CTE hierarchical query makes it possible to process arbitrarily deep hierarchies with just one query.

Here’s an example how you can get a list of subtasks in an arbitrarily deep task tree along with path information for each task. Including path information will make it easier to build a graph representation on the receiving end.

mydb=> WITH RECURSIVE task_tree (id, name, parent_id, depth, path) AS (
mydb(>   SELECT id, name, parent_id, 1, ARRAY[]
mydb(>     FROM task t WHERE = 1
mydb(>   UNION ALL
mydb(>   SELECT,, s.parent_id, tt.depth + 1, path ||
mydb(>     FROM task s, task_tree tt WHERE s.parent_id =
mydb(> )
mydb-> SELECT * FROM task_tree
mydb-> ORDER BY depth ASC;

 id |    name     | parent_id | depth |    path
  1 | task1       |           |     1 | {1}
  4 | task1-1     |         1 |     2 | {1,4}
  5 | task1-2     |         1 |     2 | {1,5}
  6 | task1-3     |         1 |     2 | {1,6}
 13 | task1-3-1   |         6 |     3 | {1,6,13}
 14 | task1-3-1-1 |        13 |     4 | {1,6,13,14}
(6 rows)

PostgreSQL also has an extension module that allows handling hierarchic data in a less verbose, but non-standard manner:

mydb=> SELECT * FROM connectby('task', 'id', 'parent_id', '1', 0, '/')
mydb->  AS t(id BIGINT, parent_id BIGINT, level INT, branch TEXT);

 id | parent_id | level |  branch
  1 |           |     0 | 1
  4 |         1 |     1 | 1/4
  5 |         1 |     1 | 1/5
  6 |         1 |     1 | 1/6
 13 |         6 |     2 | 1/6/13
 14 |        13 |     3 | 1/6/13/14
(6 rows)

Aggregates and window functions

SQL GROUP BY lets you calculate aggregates of data over a single or multiple columns in a result set. The clause, however, can only aggregate over a single grouping, so it wouldn’t be possible e.g. to get calculate average salaries over departments and locations in a single query. Another limitation is that only aggregated data is returned and detail data is not preserved, so you can’t get both the detail records and aggregates using a single query.

Window functions make it possible to get both. Here’s how to calculate employee salary aggregates over several different groupings while preserving the detail records:

mydb=> SELECT depname, location, empno, salary,
mydb-> AVG(salary) OVER (PARTITION BY depname) avgdept,
mydb-> SUM(salary) OVER (PARTITION BY depname) sumdept,
mydb-> AVG(salary) OVER (PARTITION BY location) avgloc,
mydb-> RANK() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
mydb-> FROM empsalary;

  depname  | location | empno | salary |        avgdept        | sumdept |        avgloc         | pos
 develop   | fi       |     8 |   6000 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   1
 develop   | se       |    10 |   5200 | 5020.0000000000000000 |   25100 | 4950.0000000000000000 |   2
 develop   | fi       |    11 |   5200 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   3
 develop   | fi       |     9 |   4500 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   4
 develop   | fi       |     7 |   4200 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   5
 personnel | fi       |     2 |   3900 | 3700.0000000000000000 |    7400 | 4550.0000000000000000 |   1
 personnel | fi       |     5 |   3500 | 3700.0000000000000000 |    7400 | 4550.0000000000000000 |   2
 sales     | se       |     1 |   5000 | 4866.6666666666666667 |   14600 | 4950.0000000000000000 |   1
 sales     | se       |     3 |   4800 | 4866.6666666666666667 |   14600 | 4950.0000000000000000 |   2
 sales     | se       |     4 |   4800 | 4866.6666666666666667 |   14600 | 4950.0000000000000000 |   3
(10 rows)

Like CTE, window functions is a feature specified in the SQL standard, but it’s not supported by all OSS or proprietary RDBMS systems.

Pivoting data

Sometimes it’s nice to be able to pivot data in a properly normalized data model, so that repeating groups of related entities are folded into parent entity as columns. This can be useful e.g. for reporting purposes and ad-hoc queries. PostgreSQL can handle pivoting data using subqueries and arrays like this:

mydb=> SELECT e.*,
mydb-> (SELECT ARRAY_TO_STRING(ARRAY(SELECT emp_phone_num FROM emp_phone p WHERE e.employee_id = p.emp_id), ',')) AS phones
mydb->  FROM employees AS e;

 employee_id | last_name | manager_id |         phones
         100 | King      |            |
         101 | Kochhar   |        100 | 555-123,555-234,555-345
         108 | Greenberg |        101 | 555-111
         205 | Higgins   |        101 | 555-914,555-222
         206 | Gietz     |        205 |
(13 rows)

Another way is to use the tablefunc extension module again:

mydb=> SELECT *
mydb-> FROM crosstab(
mydb(>   'SELECT emp_id, contact_type, contact FROM emp_sm_contact ORDER BY 1',
mydb(>   'SELECT DISTINCT contact_type FROM emp_sm_contact ORDER BY 1'
mydb(> )
mydb-> AS emp_sm_contact(emp_id BIGINT, "g+" TEXT, "linkedIn" TEXT, twitter TEXT);

 emp_id |    g+     | linkedIn  | twitter
    100 | bking     | b.king    | beking
    101 | kochhar.2 | kochhar.1 | kochhar
    200 |           |           | whalen
(3 rows)

Other advanced features

Other advanced PostgreSQL features that I find of interest to application developers include:

  • pattern matching. Regular expression matching is supported
  • geolocation queries. PostGIS extension adds comprehensive support for managing and querying geospatial data
  • partitioning
  • replication

Final thoughts

Problems faced in the transition phase of the software development process, when the software has been handed over from development to operations team, have prompted the need for closer collaboration between the teams in the form of devops culture. Similarly, application developers can’t remain ignorant of database design, implementation and optimization issues, and expect the DBAs to magically fix any data tier related design issues after the system has been implemented. Application developers need to learn how to leverage database systems effectively and take responsibility of database tier design to make transitions more seamless and production deployments succeed.

While standards based object relational mapping (ORM) technologies, such as Java persistence API, can be a great help to application developers, developers should be aware of what kind of queries the ORM implementation is generating, and in particular watch out for N+1 queries issues. With higher data, transaction volumes or data access patterns advanced database features will be a significant help in optimising the application.

To quote Oracle guru Tom Kyte: it’s a database, not a data dump. PostgreSQL is an advanced relational database engine and it has a lot of features that can help application developers implement new features faster and in a more efficient and scalable manner. As with all the tools: you should learn how to use it to get the most out of it.

More info