practicing techie

tech oriented notes to self and lessons learned

Scala client for Amazon Glacier

Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup. Glacier is a cold storage archival solution, meaning the stored data is not available for immediate retrieval. You first need to request retrieval of the data, and access time can vary from minutes to several hours, depending on the service level you choose.

While cold storage may feel cumbersome at first, it also has its advantages. No one will be able to accidentally modify important, archived files. It’s also possible to prevent deletion altogether, if needed.

Glacier is designed for use cases in which retrievals are infrequent and exceptional, and data will be stored for extended periods of time.

Concepts

If you haven’t worked with AWS services or Glacier before, it’s helpful to learn a few concepts first:

AWS Region – a named set of AWS resources in the same geographical area. Regions are completely isolated from each other, so when you view your resources, you’ll only see the resources tied to the region you’ve specified. In Glacier terms, stored data is bound to a particular region. Glacier storage prices vary across regions.

Vault – a container for storing data in the form of archives. An unlimited number of archives can be stored in a vault. Vaults and their contents are available only in the region where they were created. Access permissions, notifications and compliance controls are configured at the vault level.

Archive – an archive can be any data such as a photo, video, or document and is the base unit of storage in Amazon Glacier. Each archive has a unique ID and an optional description. You may upload a single file as an archive, but your costs will be lower if you aggregate your data. Archives stored in Amazon Glacier are immutable, i.e. archives can be uploaded, downloaded and deleted, but cannot be edited or overwritten the way they can with services like Dropbox.

(Vault) Inventory – AWS Console will show you a list of vaults, but not a list of vault contents, or inventory. The inventory needs to be requested separately, and fulfilling the request can take several hours.

Job – retrieving an archive or a vault inventory (list of archives) is an asynchronous operation in Amazon Glacier. You first initiate a job, and then download the job output after Amazon Glacier completes the job. With Amazon Glacier, your data retrieval requests are queued and jobs can take hours to complete.

Notification-configuration – because jobs take time to complete, Amazon Glacier supports a notification mechanism that notifies you when a job is complete. You can configure a vault to send a notification to an Amazon Simple Notification Service (Amazon SNS) topic when jobs complete. You can specify one SNS topic per vault in the notification configuration.

More info on these concepts can be found here: Amazon Glacier data model

Glacier client

Amazon Glacier can be used with the Amazon AWS CLI, but it’s quite clumsy to use, especially for archive uploads. Some backup tools support Glacier based storage, but the ones I came across didn’t seem to be suited for server side backups or programmatic use. Amazon AWS Console allows you to e.g. create and configure vaults, but archive operations are not supported.

Glacier client is a simple tool I created for working with Amazon Glacier. It was designed to support both interactive use (with Scala REPL), as well as programmatic use with Scala or Java. It’s well suited for server side use. Glacier client is built on Amazon AWS SDK for Java.

The code can be found on GitHub: https://github.com/marko-asplund/glacier-client

Setting up Glacier

AWS configuration

To use Glacier you need to first set up an AWS user account and permissions in AWS Console as follows:

  • Create a user account in AWS IAM (Identity and Access Management)
  • Grant the user the following permissions: AmazonGlacierFullAccess, AmazonSQSFullAccess, AmazonSNSFullAccess
  • Create an access key

Some operations, such as creating a vault inventory or preparing an archive for download, are performed asynchronously. Setting up notifications is helpful with these operations: you need to enable notifications on the vault and configure a corresponding SNS topic in AWS Console.
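
Vault notification configuration can also be scripted through the AWS SDK for Java that Glacier client builds on. A minimal Scala sketch, assuming the SDK 1.x classes SetVaultNotificationsRequest and VaultNotificationConfig and a placeholder SNS topic ARN:

import com.amazonaws.services.glacier.AmazonGlacierClientBuilder
import com.amazonaws.services.glacier.model.{SetVaultNotificationsRequest, VaultNotificationConfig}

object VaultNotificationSetup extends App {
  // placeholder topic ARN; create the topic in SNS first
  val topicArn = "arn:aws:sns:us-west-2:123456789012:glacier-job-notifications"

  val glacier = AmazonGlacierClientBuilder.standard().withRegion("us-west-2").build()

  // ask Glacier to notify the topic when retrieval jobs complete
  val config = new VaultNotificationConfig()
    .withSNSTopic(topicArn)
    .withEvents("ArchiveRetrievalCompleted", "InventoryRetrievalCompleted")

  glacier.setVaultNotifications(
    new SetVaultNotificationsRequest()
      .withVaultName("test-vault-1")
      .withVaultNotificationConfig(config))
}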

Glacier client setup

Set up AWS credentials

The simplest way to set up Glacier client authorisation is to configure the “default credential profiles file” as described in Working with AWS Credentials.

The profiles file is a text file with a simple file format, so you can set it up with just a text editor by following instructions on the above mentioned page.

You can also set up the file with the AWS CLI by invoking the “aws configure” command, as described in AWS CLI configure options.
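
For reference, a minimal default credentials profile file (typically ~/.aws/credentials) looks roughly like this, with placeholder values:

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY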

Get glacier-client

To run glacier-client, you need to have Git, sbt and Java JRE installed.

git clone https://github.com/marko-asplund/glacier-client.git
cd glacier-client

Basic operation

Start up Scala REPL with sbt

~/glacier-backup-cli (master ✔) ᐅ sbt console

[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/marko/glacier-backup-cli/project
[info] Loading settings from build.sbt ...
[info] Set current project to glacier-backup-cli (in build file:/Users/marko/glacier-backup-cli/)
[info] Starting scala interpreter...
Welcome to Scala 2.11.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151).
Type in expressions for evaluation. Or try :help.

List the names of available AWS regions

scala> fi.markoa.glacier.GlacierClient.regions
res0: Array[String] = Array(us-gov-west-1, us-east-1, us-east-2, us-west-1, us-west-2, eu-west-1, eu-west-2, eu-central-1, ap-south-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, sa-east-1, cn-north-1, ca-central-1)

Create a Glacier client that connects to the us-west-2 region

scala> val c = fi.markoa.glacier.GlacierClient("us-west-2")
c: fi.markoa.glacier.GlacierClient = fi.markoa.glacier.GlacierClient@11b6e34a

Create new vault. The ID (or ARN) for the newly created vault is returned.

scala> c.createVault("test-vault-1")
res1: String = /429963740182/vaults/test-vault-1

List all vaults in the region. A sequence of vault objects is returned, in this case it includes just the vault we created above. Please note that with vault operations the results are visible immediately.

scala> c.listVaults
res2: Seq[fi.markoa.glacier.Vault] = ArrayBuffer(Vault(arn:aws:glacier:us-west-2:429963740182:vaults/test-vault-1,test-vault-1,2017-11-19T08:18:38.990Z,None,0,0))

Now we’re ready to upload an archive into the vault:

scala> c.uploadArchive("test-vault-1", "my backup archive", "my-backup.zip")
TransferStarted: transfer started
TransferProgress: transfer progress: 5% (bytes: 516096)
TransferProgress: transfer progress: 10% (bytes: 1024000)
TransferProgress: transfer progress: 15% (bytes: 1540096)
TransferProgress: transfer progress: 20% (bytes: 2048000)
TransferProgress: transfer progress: 25% (bytes: 2564096)
TransferProgress: transfer progress: 30% (bytes: 3072000)
...
TransferProgress: transfer progress: 90% (bytes: 9216000)
TransferProgress: transfer progress: 95% (bytes: 9732096)
TransferProgress: transfer progress: 100% (bytes: 10240000)
TransferCompleted: transfer completed
res3: fi.markoa.glacier.Archive = Archive(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw,Some(my-backup.zip),0c5dc86251d157e29cfadb04ac615426600a4e1177a8ac2c1134d895378b3acd,10240000,Some(my backup archive))

Note that Glacier doesn’t maintain an up-to-date list of vault contents – a list of contents needs to be requested explicitly and preparing it can take a very long time. For this reason Glacier client stores a local catalogue of archives per vault. Vault contents can be listed as follows:

scala> c.catListArchives("test-vault-1")
res4: Seq[fi.markoa.glacier.Archive] = ArraySeq(Archive(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw,Some(my-backup.zip),0c5dc86251d157e29cfadb04ac615426600a4e1177a8ac2c1134d895378b3acd,10240000,Some(my backup archive)))

Archives need to be prepared prior to retrieval, and preparation can take several hours. For this reason it’s often more convenient to retrieve them asynchronously: 1) you request archive retrieval, and after Glacier has finished preparing the archive you can 2) download it.

scala> c.prepareArchiveRetrieval("test-vault-1", "WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw")
res1: Option[String] = Some(h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv)

The archive retrieval is added to the vault’s list of jobs. You can list unfinished jobs as follows:

scala> c.listJobs("test-vault-1")
res4: Seq[fi.markoa.glacier.Job] = ArrayBuffer(Job(h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv,arn:aws:glacier:us-west-2:429963740182:vaults/test-vault-1,ArchiveRetrieval,null,2017-11-19T09:00:34.339Z,InProgress,null,None,Some(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw)))

Notice the InProgress status. Once archive preparation has finished, the job list will look something like this:

scala> c.listJobs("test-vault-1")
res8: Seq[fi.markoa.glacier.Job] = ArrayBuffer(Job(h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv,arn:aws:glacier:us-west-2:429963740182:vaults/test-vault-1,ArchiveRetrieval,null,2017-11-19T09:00:34.339Z,Succeeded,Succeeded,Some(2017-11-19T12:52:38.363Z),Some(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw)))

Setting up notifications relieves you from having to poll job completion status periodically. Notifications can be set up via AWS Console.

A prepared archive can then be downloaded from Glacier using the retrieval job ID:

scala> c.downloadPreparedArchive("test-vault-1", "h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv", "my-backup-restored.zip")
TransferStarted: transfer started
TransferProgress: transfer progress: 5% (bytes: 520869)
TransferProgress: transfer progress: 10% (bytes: 1025701)
TransferProgress: transfer progress: 15% (bytes: 1547941)
TransferProgress: transfer progress: 20% (bytes: 2052773)
TransferProgress: transfer progress: 25% (bytes: 2575013)
TransferProgress: transfer progress: 30% (bytes: 3079845)
...
TransferProgress: transfer progress: 90% (bytes: 9228965)
TransferProgress: transfer progress: 95% (bytes: 9736869)
TransferProgress: transfer progress: 100% (bytes: 10240000)
TransferCompleted: transfer completed

That’s it for the basic operations!

Some other tasks Glacier client lets you do include deleting vaults, requesting vault inventories (the list of archives a vault contains), downloading inventories and deleting archives.

Automating TLS certificate issuance with OpenSSL

Transport layer security (TLS) has long been the way to provide privacy, integrity and, optionally, authentication to protect TCP/IP traffic on the internet. TLS relies on public key infrastructure and trusted third parties (Certificate Authorities) to vouch for the identity of the communicating parties, which can complicate its use for the impatient software engineer.

Acquiring an X.509 server certificate from an established Certificate Authority (CA) is usually somewhat of a hassle. Certificate Authorities vouch for the legitimacy of the server certificate and the vouched party’s right to use it. In order to gain a level of trust in the vouched party, Certificate Authorities use different kinds of authentication methods, which from a developer perspective add to the delivery delay, cost and effort.

As a result, for non-production or internal uses self-signed certificates are often used instead of third-party signed certificates. Self-signed certificates can sometimes be good for ad-hoc use, but initially no shared chain of trust exists between the server and its clients. This means that for each client you either have to disable certificate validation or make the certificate trusted.

If you find yourself generating lots of self-signed certificates, a better solution may be to use a CA with fast turnaround time. These are usually also ones with an automated authentication process, and hence lower cost. Examples include commercial operators such as gandi.net, as well as non-profits like Let’s Encrypt. Let’s Encrypt also allows certificate requests to be fulfilled using an automated process. It’s a security vs. fast turnaround / ease-of-use tradeoff: faster turnaround usually translates to less scrutiny in the authentication process.

Customer/partner-facing systems often warrant using a certificate from an established CA. But sometimes, particularly for internal purposes, when you need fast turnaround as well as control over the certificate issuance process in order to automate it, the best solution may be to run your own CA. Running your own certificate authority is not difficult, technically. Keeping the system secure, on the other hand, will be hard work.

The big risk is that if the private key associated with your CA certificate is compromised, a malicious party can create and sign new certificates with your CA certificate. Also, if a private key associated with a certificate is compromised, a malicious party can intercept communication to/from the certificate owner or impersonate the owner. When a software-based crypto implementation is used, your best bet for securing your keys is applying operating system hardening practices. The full data lifecycle (including backups!) should be considered when planning how to secure your keys. For an added level of security, specialised crypto hardware can store private keys in a secure manner.

OpenSSL based Certificate Authority

The OpenSSL project provides a basic tool for running your own CA, in the form of the ‘openssl ca’ command. OpenSSL also includes the CA.pl wrapper script for making certificate management easier to use.

While Linux distributions and macOS ship with OpenSSL, it’s only newer versions of both OpenSSL and CA.pl that support generating certificates using just command line tools and with no user interaction during the process. The simplest way to update the script is to download it from the project’s GitHub repository: https://github.com/openssl/openssl/blob/master/apps/CA.pl.in

Make sure you get commit 022696cab014ffb94c8ef0bfc79c8955b9970eb6 or newer (not part of a released version at the time of this writing). An older version of the openssl command should suffice. (It seems that ‘openssl ca’ has supported the required passin parameter for a while, but this was only documented in commit 16e1b281b2e16ff6deb8ca431dfc5743de31d0e2.)

With CA.pl you can automate certificate generation with a simple shell script like the following:

#!/bin/bash
# note: expects the OPENSSL, CAPL, CA_ROOT and CERT_BASE environment variables to be set

cn=$1
pass=`openssl rand -hex 18`
pushd `dirname $CA_ROOT`

echo "issuing certificate for $cn"

# create certificate request
SUBJECT="/C=FI/L=Helsinki/O=Practicing techie/CN=$cn/emailAddress=info@practicingtechie.com"
OPENSSL=$OPENSSL $CAPL -newreq -extra-req "-passout pass:$pass -subj '$SUBJECT'"

# sign certificate request
OPENSSL=$OPENSSL $CAPL -sign -extra-ca "-passin file:$CA_ROOT/ca-cert-passphrase.txt -batch"
if [ "$?" -ne 0 ]; then
  echo "FATAL: failed to sign, aborting"
  exit 1
fi

# export private key unencrypted, archive files
$OPENSSL rsa -in newkey.pem -out newkey-nodes.pem -passin pass:$pass
mkdir -p $CERT_BASE/$cn
mv new*.pem $CERT_BASE/$cn

See my issue_certificate.sh Gist for more details.

Conclusions

When you need to secure TCP/IP traffic in a non-production environment or a non-customer facing system, running your own CA can be a very viable solution. Using certificates signed by a third party can prove burdensome and costly when your architecture is based on microservices and TLS is used to secure the traffic; in that case creating the certificates with your own CA can help. Other uses include securing traffic between system management and monitoring software and remote agents (e.g. Nagios, JMX etc.).

OpenSSL tools can make automating certificate creation and signing quite easy. Keeping the CA keys secure will require considerable effort, so you need to carefully weigh the risks.

Extending a LVM based file system

A file system running out of space doesn’t always have to be a major catastrophe. The problem can be resolved quite easily if you happen to be running Linux and were foresighted enough to create your file system on an LVM volume. If you also happen to have extra storage space available, the file system can likely be extended in a fairly straightforward manner. Despite the procedure’s relative simplicity, it’s quite easy to forget the details if you perform it infrequently. So, this time I’m writing it down.

In order to extend an LVM based file system, a new block device needs to be allocated for LVM. This can be done by adding a new physical disk, partitioning an existing disk or adding a virtual disk. Typically, a reboot is required for Linux to detect a new physical or virtual disk.

Once the new block device is detected, you need to identify its device path. This information can be obtained e.g. with the lsblk or sfdisk commands. If you’ve run out of disk space you know which file system is causing the problems, but you also need to determine the LVM logical volume and volume group underlying that file system. These can be determined using output from df and lvdisplay (pvscan, vgscan and lvscan may also sometimes be helpful). In the snippet below we assume the block device path to be /dev/sdb, the volume group ubuntu-vg and the logical volume /dev/ubuntu-vg/root.

After you’ve learned these parameters, the LVM based file system can be extended as follows:


# initialize physical device to be used as a physical volume with LVM
# (below we assume the newly added block device was "/dev/sdb")
pvcreate /dev/sdb
# add physical volume to volume group
vgextend ubuntu-vg /dev/sdb
# extend logical volume and file system
lvextend -r /dev/ubuntu-vg/root /dev/sdb

Recently, I bumped into a tool called Asciinema that allows recording and playback of terminal sessions. This differs from videos in that the terminal session’s input and output are recorded in textual form, allowing you to e.g. copy-paste commands during playback. In order to try out the tool, I made a recording of extending a file system.

Microservices vs. SOA at QCon New York

Based on the steady flow of recently published technology articles and blog posts, as well as upcoming tech conference session titles, it seems evident that the buzz around microservices doesn’t show signs of tailing off in the near future.

As a concept, microservices based architecture, or just microservices, is quite old by IT standards; it first appeared on ThoughtWorks’ Technology Radar back in 2012. Last year when I visited QCon New York, microservices architecture was one of the major themes, with one track and lots of sessions dedicated to the topic. Microservices also played roles of varying importance in other track sessions.

At QCon, microservices was often characterized as “SOA done right” or “SOA without the ESB” by speakers and attendees alike. Many of the microservices track sessions focused on various practical issues encountered when implementing microservices including:

  • service isolation vs. sharing
  • managing dependencies
  • importance of durable logical data model and API design
  • managing and validating interface changes
  • polyglot service API clients etc.
  • test automation
  • service API documentation

These topics were covered by plenty of speakers from different angles. Michael Bryzek of Gilt gave a good discussion of these issues in his talk on Microservices and the art of taming the Dependency Hell Monster.

In the microservices open space much of the discussion revolved around defining the boundaries of a microservice: to what extent should a microservice be an entity isolated from its environment? A microservice can form dependencies on its environment through a wide variety of aspects, including the logical data model, data model implementation, logical and physical runtime platforms, data storage systems and other microservices. To what degree should a microservice be allowed to depend on its environment or other microservices? How should aspects of reuse and isolation be balanced?

Microservices based architecture is often contrasted with SOA, with SOA being scorned and ridiculed as the old way of doing things. Nevertheless, many of the problems and solutions related to microservices based architectures remind me of the SOA literature of past years, so I decided to dust off some of my Thomas Erl SOA books to refresh my memory. According to Erl, service-orientation is a design paradigm that adheres to the following principles:

Standardized service contract – “Services within the same service inventory are in compliance with the same contract design standards.”
The same design standards are applied to related services. In particular, this means the data models are coherent across related services and some common type definitions can be shared. Standardization also applies to other aspects such as policy and SLA.

Service loose coupling – “Service contracts impose low consumer coupling requirements and are themselves decoupled from their surrounding environment.”
Service consumers are necessarily dependent on the contract that a service provides. This principle states that the API surface area, through which consumers interact with a service, should be minimized and explicitly defined, so that consumers can withstand evolution of the services. Services are also decoupled from their runtime environment.

Service abstraction – “Service contracts only contain essential information and information about services is limited to what is published in service contracts.”
A service should be considered a black box that provides the work specified by its service contract to consumers and hides other metadata and implementation details. The service contract should expose only information essential for accessing the service.

Service reusability – “Services contain and express agnostic logic and can be positioned as reusable enterprise resources.”
Organizational project delivery processes should prefer implementing business logic as reusable services and using existing services over re-implementing business logic.

Service autonomy – “Services exercise a high level of control over their underlying runtime execution environment.”
Services that depend on shared virtual or physical resources (e.g. a database, other information systems or services, physical hardware) necessarily lose some of their autonomy, which can lead to unpredictable runtime behaviour. A high level of autonomy can be costly and tradeoffs often need to be made.

Service statelessness – “Services minimize resource consumption by deferring the management of state information when necessary.”
State data management consumes system resources and can result in a significant resource burden … Therefore, the temporary delegation and deferral of state management can increase service scalability and support a wider range of reuse and recomposition over time.

Service discoverability – “Services are supplemented with communicative meta data by which they can be effectively discovered and interpreted.”
Additional metadata is used to describe services, helping humans find existing services to reuse in their project delivery.

Service composability – “Services are effective composition participants, regardless of the size and complexity of the composition.”
Services should be designed and implemented in a manner that does not limit composing them into new services in different layers.

Most of these principles are quite general, technology agnostic and very much relevant in a microservices based architecture. One thing microservices and SOA share is that at their core both are architectural styles, not technologies. Still, the two are perceived in quite a different manner. SOA started gaining momentum in the era of the enterprise architect, and arguably, the voice of the practitioners (i.e. developers) was lost as SOA got hijacked by software tool industry heavyweights. Nowadays SOA is often regarded as enterprisey and associated with heavy tooling and onerous processes.

Microservices architecture on the other hand is associated with promoting a much leaner approach overall. There are also some interesting relations between microservices and the structure of the development organization. Many leading internet companies are organizing their development work around agile, small (“2 pizza” sized), autonomous teams with well-defined end-to-end responsibilities, in order to scale their organizations better. According to Conway’s law, organizations “are constrained to produce designs which are copies of the communication structures of these organizations”. In that sense, microservices architecture reflects the organizational structures of the development teams as well as the current best-practice technological means of achieving team autonomy.

I like the pragmatism and the drive for lean practices in contemporary software development in general, and in implementing microservices in particular. However, the baby often seems to get thrown out with the bathwater when our industry is too busy reinventing things. A sign of this is that microservices developers are clearly rediscovering many of the insights that SOA architects and developers learned and documented long ago.

Web Framework Benchmarks – Round 10

TechEmpower Web Framework Benchmarks is a collaborative, open web framework benchmarking project that I blogged about last year (see An open web application framework benchmark). Since the last write-up a new benchmark round has been run with lots of new test implementations. Among other things, the round 10 benchmark run includes support for the Cassandra NoSQL database, as well as a new Java-based test implementation leveraging Servlet 3 asynchronous processing.

I did some non-scientific analysis of the TFB round 10 results; here are a few observations on the Peak environment, “best” concurrency level results. According to the documentation the round 10 and round 9 Peak hosting hardware specifications are the same, so the results should be directly comparable.

JSON serialization test

lwan has overtaken the previous run’s #1 performer in the JSON serialization test by a large margin, more than doubling the round 9 winner’s throughput. cpoll_cppsp, last round’s winner, also improved its throughput by nearly 700k req/s. Six contestants from round 9 have made it into the top 10 in round 10, all of them able to improve their performance. OpenResty managed to more than double its round 9 throughput.

My new entrant, servlet3-cass, took 5th place in this category, which made it the second best performing Java implementation, losing only slightly to Netty. Based on this, I would conclude that servlet container overhead doesn’t currently seem to impose a significant bottleneck in this test. It’s interesting to note that there’s about a 400 req/s difference between servlet and servlet3-cass performance, though the implementations are very similar. Both are implemented on the Servlet API, run on the Resin servlet container and use Jackson for JSON processing. The only difference between the two seems to be the Jackson JSON library version numbers.

The top 10 test implementations were based on the following programming languages: C, C++, Java and Lua. Of the top 10 performers 5 are Java based, while Ur and Go dropped out of the top 10 compared with the previous round.

Single database query test

Since round 9 the #1 performer’s throughput has improved only marginally in this test category. C++ based test implementations dominate the top 4 spots, with Lua, Ur and Java also in the top 10. Only test implementations based on relational databases made it into the top 10, leaving out MongoDB and Cassandra.

Some test implementations seem to have multiple entries in the results table (e.g. cpoll_cppsp-postgres, undertow and activeweb) for this test. I guess this is because the test was run multiple times, but I would’ve expected each framework to be listed only once in the best results table. Also, what’s the exact test methodology like in these cases? Under which conditions does a framework get multiple tries? Some test implementations seem to have experienced performance degradation since round 9. One such example is OpenResty. It would be interesting to analyze the cause further. Was the degradation caused by e.g. changes in the test implementation, infrastructure setup or changes in test methodology?

I was a bit disappointed with the performance of the servlet3-cass test implementation in the database tests. Unfortunately, there’s no resource usage or profiling data available for the test run, and since I don’t have access to a real performance test environment myself, it’s difficult to analyse potential bottlenecks further.

Because servlet3-cass was able to reach 14 times its single database query throughput in the JSON test, I would reason that no inherent app server or framework bottleneck is hit in the database test. By comparison, the Java and Servlet API based test implementation that uses the MySQL database was able to achieve nearly twice the throughput. Two notable implementation differences come to mind between the two test implementations: a) a different datastore and b) use of a different servlet API (synchronous vs. asynchronous).

Since Cassandra is in general very fast at reading and writing data by key, there shouldn’t be any inherent reason why it should perform worse than the MySQL based implementation. Some aspects I need to analyse in more detail: 1) Cassandra server configuration, 2) Resin servlet container asynchronous processing support and/or configuration, and 3) whether the thread pool sizes for Resin asynchronous processing and the Cassandra driver are optimized for the test server hardware concurrency level. During test implementation I ran into a couple of bugs in servlet container asynchronous processing support (in both Resin and Tomcat – kudos to both teams for fixing them!), which proved that asynchronous processing is a tricky issue, not only for application developers but also for app server developers. In order to evaluate whether some aspect of asynchronous processing has a negative performance impact, it would be interesting to implement the test using the traditional synchronous servlet API.
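
For context, Servlet 3 asynchronous processing in general looks roughly like the following Scala sketch (a generic illustration, not the actual servlet3-cass code; the backend call is a placeholder):

import javax.servlet.annotation.WebServlet
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// The request thread is released immediately; the response is completed from
// another thread once the (placeholder) backend call finishes.
@WebServlet(urlPatterns = Array("/async-example"), asyncSupported = true)
class AsyncExampleServlet extends HttpServlet {
  override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    val asyncCtx = req.startAsync()
    Future {
      "backend result" // stand-in for e.g. an asynchronous Cassandra query
    }.foreach { result =>
      asyncCtx.getResponse.getWriter.write(result)
      asyncCtx.complete()
    }
  }
}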

Multiple database queries test

The best two implementations in this test, start and stream, were able to hold on to their round 9 positions, while both improved throughput by more than 30 %. The top 10 test implementations are based on Dart, Java, C++ and Clojure.

MongoDB based implementations have claimed a triple win in this test with the 4 implementations that follow being PostgreSQL based.

Fortunes test

Five new test implementations made it into the top 10 compared to the previous round. The top 10 implementations were implemented in C++, Ur and Java. Again, both undertow and undertow edge seem to be included three times in the round 9 data table, which looks a bit strange for a best-results table.

Database updates test

Node.js based implementations, which occupied the top 3 places in round 9, have seen their lead overtaken by C++ based implementations. Instead of raw processing efficiency this test type is expected to primarily stress the backend datastore, the datastore driver and the parallelism of the framework, so C++ as a language should not have an inherent advantage as such. The top 10 implementation languages in this test are: C++, JavaScript, Scala, Dart and Perl.

Having done quite a bit of Perl programming years ago, I was quite fascinated to see an implementation based on the old scripting workhorse make it to top 10.

In this test there’s also been about a 10% performance degradation for the top 5 implementations.

In round 9 the top 10 implementations were all based on MySQL, whereas in round 10 three PostgreSQL based implementations have entered the top 10.

Plaintext test

The top 10 in this category is occupied by test implementations based on C++, C, Java and Scala. While there’s been only a modest performance increase for the #1 position, the top place has been taken over by ulib. Four new entrants have made their way into the top 10. Netty’s throughput increased by over 40% compared to the previous round.

The development process

The TechEmpower team has again put a huge effort into the FrameworkBenchmarks project and done a great job at it! Still, as with everything, there’s always room for improvement and, from a casual test implementation contributor perspective, the following issues should be considered:

  • more predictable release schedule. I realize the TE team needs to prioritize actual customer projects over pro bono style work. However, many test implementers are in a similar position, so having a predictable release plan would help contributors schedule their work. It seems that at least part of the delays may have been caused by scope-creep, so a stricter release policy could also help make release schedule more predictable.
  • release phase change notifications. Not all contributors are able to follow the TFB Google group discussion. Having a notification mechanism for informing about release schedule phase changes could be helpful for many contributors. This could be as simple as a Github issue to subscribe to or a separate Google group for announcements.
  • enable test implementation logging during preview runs (functional test run). I found it very difficult to troubleshoot infrastructure deployment automation related bugs during the preview runs. Allowing server side logging for preview runs could be a huge help for test implementers troubleshooting their code.
  • provide resource consumption statistics for preview runs. The only performance metric a test implementer currently gets is the throughput figures per test and concurrency level. This gives the implementer very little to go by, in terms of optimization feedback.
  • provide access to app server, DB server etc. logs. Additional infrastructure logs would help in identifying functional and performance related issues.
  • performance related data gathering could even be taken further by running test implementations using a profiling tool during preview runs.

Benchmark results visualization tool updated

I’ve updated the TFB results visualization tool I created to also render round 10 results. As before, the tool can be found here:

http://tfb-kippo.rhcloud.com/

A new Scala + ElasticSearch based test implementation

The project team has been hard at work and has published a tentative release schedule for round 11, with results aimed to be released by the end of June 2015. Since round 10, I’ve contributed support for the ElasticSearch search server as well as a Scala / Spray based test implementation. Both have been merged into the project repository, so the results should be available when the round 11 data gets published. The code for the new test implementation can be found here:

https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Scala/spray-es

Again, great job TFB development team and contributors – keep up the good work!
Looking forward to seeing round 11 results!

Scala def macros and Java interoperability

Most of the time Scala-Java interoperability works pretty well from a Scala application developer perspective: you can use a wealth of Java class libraries in your Scala programs with fairly little effort. Simply using JavaConversions and maybe a few custom wrappers usually gets you pretty far. Sure, there can be some friction resulting from the use of different programming paradigms and mutable data structures, but if you’re a pragmatist, re-using Java code in Scala is nevertheless quite feasible. Scala version 2.10 saw the introduction of an experimental language feature called macros. Specification work on Scala macros recognizes different flavors of macros, but versions 2.10 and 2.11, as well as the future version 2.12, support only def macros. This macro variety behaves similarly to methods, except that def macro invocations are expanded at compile time. Here’s how EPFL Scala team member and “Scala macros guy” Eugene Burmako characterizes def macros:

If, during type-checking, the compiler encounters an application of the macro m(args), it will expand that application by invoking the corresponding macro implementation method, with the abstract-syntax trees of the argument expressions args as arguments. The result of the macro implementation is another abstract syntax tree, which will be inlined at the call site and will be type-checked in turn.

Def macros resemble C/C++ preprocessor macros, the m4 macro processor and the like in that the result of macro application is inlined at the call site. A notable difference is that Scala macros are well integrated into the language, meaning e.g. that the results of macro expansion are type-checked. But let’s look at this from the code perspective. I’ve implemented a “hello, world” macro and an object called MyReusableService that defines two methods: a regular method and one implemented as a macro. Two objects, ScalaClient and JavaClient, invoke methods on MyReusableService. Here’s what happens when compiling Java code that tries to invoke a macro method on a Scala object:

~ ᐅ sbt 'runMain com.practicingtechie.gist.macros.JavaClient'
...
[error] /Users/marko/blog-gists/macros-interop/src/main/java/com/practicingtechie/gist/macros/JavaClient.java:6: cannot find symbol
[error] symbol: method macroMethod()
[error] location: class com.practicingtechie.gist.macros.MyReusableService
[error] MyReusableService.macroMethod
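
For reference, a def macro like macroMethod can be declared roughly as follows. This is a minimal sketch assuming Scala 2.11 blackbox macros and quasiquotes; the actual code in the gist may differ in details:

import scala.language.experimental.macros
import scala.reflect.macros.blackbox

object MyReusableServiceSketch {
  def regularMethod(): Unit = println("Hello, from regular method")

  // the macro definition: invocations are expanded at compile time
  def macroMethod(): Unit = macro macroMethodImpl

  // the macro implementation returns an AST that gets inlined at the call site
  def macroMethodImpl(c: blackbox.Context)(): c.Expr[Unit] = {
    import c.universe._
    c.Expr[Unit](q"""println("Hello macro world")""")
  }
}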

The macroMethod method is defined by the MyReusableService Scala object, but it is not visible when inspecting the disassembled class file:

~ ᐅ javap -cp target/scala-2.11/classes com.practicingtechie.gist.macros.MyReusableService$
Compiled from "MyReusableService.scala"
public final class com.practicingtechie.gist.macros.MyReusableService$ {
  public static final com.practicingtechie.gist.macros.MyReusableService$ MODULE$;
  public static {};
  public void regularMethod();
}

After removing the macroMethod invocation, JavaClient compiles. ScalaClient, however, is more interesting. Here’s the macro debugging output from running the code (with the “-Ymacro-debug-lite” argument):

~ ᐅ sbt 'runMain com.practicingtechie.gist.macros.ScalaClient'
...
[info] Compiling 2 Scala sources and 1 Java source to /Users/marko/blog-gists/macros-interop/target/scala-2.11/classes...
performing macro expansion MyReusableService.macroMethod at source-/Users/marko/blog-gists/macros-interop/src/main/scala/com/practicingtechie/gist/macros/ScalaClient.scala,line-7,offset=158
println("Hello macro world")
Apply(Ident(TermName("println")), List(Literal(Constant("Hello macro world"))))
[info] Running com.practicingtechie.gist.macros.ScalaClient
Hello, from regular method
Hello macro world

In the above extract we can see the location where the macro was applied, as well as the results of macro expansion, both as Scala code and as an abstract-syntax tree (AST) representation.

Finally, we can see macro expansion results expanded and compiled into bytecode at the ScalaClient call site:

~ ᐅ javap -c -cp target/scala-2.11/classes com.practicingtechie.gist.macros.ScalaClient$
...
  public void main(java.lang.String[]);
    Code:
       0: getstatic     #19                 // Field com/practicingtechie/gist/macros/MyReusableService$.MODULE$:Lcom/practicingtechie/gist/macros/MyReusableService$;
       3: invokevirtual #22                 // Method com/practicingtechie/gist/macros/MyReusableService$.regularMethod:()V
       6: getstatic     #27                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
       9: ldc           #29                 // String Hello macro world
      11: invokevirtual #33                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
...

Scala def macros are a very interesting language feature that’s planned to be officially supported in a future Scala version. Def macros are implemented by the Scala compiler, so a function or method whose definition is macro based won’t be accessible from Java code. This is because, unlike with ordinary function or method invocations, the result of the macro application gets expanded at the call site. Scala functions or methods that simply invoke macros from within their body can still be called from Java code, however. Depending on how def macros are used, they can sometimes hinder reuse of Scala code from Java or other JVM-based languages.
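
To illustrate that last point, a plain Scala method that merely calls the macro in its body is expanded inside the Scala-compiled bytecode and therefore appears to Java callers as an ordinary method (a sketch; the wrapper name is illustrative):

object MyReusableServiceJavaApi {
  // the macro expansion happens here, at this call site, so Java code can
  // invoke macroMethodForJava() like any regular method
  def macroMethodForJava(): Unit = MyReusableService.macroMethod()
}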

More info

What’s the strangest bug you’ve squashed?

As software engineers we’re tasked with creating solutions to customers’ business problems. These being complex systems, every once in a while flaws inevitably slip into the design or implementation. And sometimes flaws creep in through the use of third-party software, which can make problems all the more difficult to track down. Each bug has a story to tell, and the stories about hunting the most puzzling, challenging, annoying and time-consuming bugs can sometimes live with you for a long time.

What’s the strangest bug you’ve managed to squash?

Mine was quite a few years ago when we were working on a greenfield Java EE software project for a client in the health care industry. With the alpha release nearing, I was deploying a new build in a newly created server environment and during testing we found a bug in an isolated software feature. Initially, I thought there was something wrong with the new environment setup, but after a while I realized the bug seemed to be related to the way the new release was built. During the development phase we had been building the software with the Oracle JDeveloper IDE, but we had moved to using Apache Ant with the Sun Java JDK. So, at that point I thought – this was Java EE after all – that it was a packaging issue. I carefully compared the working and broken release packages, but couldn’t find any significant differences. Though it was troublesome to reproduce, the problem was fortunately reproducible, so I started tracking it down with remote debugging. After a while I noticed that the software was executing a weird code path I couldn’t quite explain.

Puzzled by the strange behaviour I didn’t really have a clear idea how to continue troubleshooting, but I decided to take a long shot and compare the compiled bytecode from the working and broken releases. This was my first time looking at disassembled Java bytecode, which made analysis a bit slow, and all the more interesting, but fortunately in my remote debugging sessions I had been able to identify some likely places for the bug. After staring at the disassembled bytecode for a while, an initially innocuous-looking bit of code started to look suspect: a mutator method was present in one class in the working build, but missing in the broken one. It turned out that when the code was built with Oracle JDeveloper, it automatically generated a mutator method for a subclass, which just happened to override a buggy superclass mutator method. In the broken build such a mutator method wasn’t being generated, causing the buggy superclass mutator method to execute.

This story happened years ago, and while there are lots of things I do differently nowadays, including the use of different technologies, design approaches, unit testing, test automation, build methods and tooling, for me this was one of the more memorable bug squashing sessions.

Do you have an intriguing bug hunting story to share?

An open web application framework benchmark

Selecting a platform for your next application development project can be a complex and burdensome undertaking. It can also be very intriguing and a lot of fun. There’s a wide range of different approaches to take: at one end The Architect will attend conferences, purchase and study analyst reports from established technology research companies such as Gartner, and base his evaluation on analyst views. Another approach is to set up a cross-disciplinary evaluation committee that will collect a wishlist of platform requirements from around the organization and make its decision based on a consensus vote. The first approach is very autocratic, while the second can sometimes lead to lack of focus. A clear, coherent vision of requirements and prioritization is essential for the success of the evaluation. Due to these problems, a middle road and a more pragmatic approach is becoming increasingly popular: a tightly-knit group of senior propellerheads uses a more empirical method of analysing requirements, studies and experiments with potential solution stack elements, and brainstorms to produce a short list of candidates to be validated using hands-on architecture exercises and smell tests. Though hands-on experimentation can lead to better results, the cost of this method can be prohibitive, so often only a handful of solutions that pass the first phase screening can be evaluated this way.

Platform evaluation criteria depend on the project requirements and may include:

  • developer productivity
  • platform stability
  • roadmap alignment with projected requirements
  • tools support
  • information security
  • strategic partnerships
  • developer ecosystem
  • existing software license and human capital investments
  • etc.

Performance and scalability are often high priority concerns. They are also among those platform properties that can be formulated into quantifiable criteria, though the key challenge is modelling the user and implementing performance tests that accurately reflect your expected workloads. Benchmarking several different platforms only adds to the cost.

A company called TechEmpower has started a project called TechEmpower Framework Benchmarks, or TFB for short, that aims to compare the performance of different web frameworks. The project publishes benchmark results that application developers can use to make more informed decisions when selecting frameworks. What’s particularly interesting about FrameworkBenchmarks is that it’s a collaborative effort conducted in an open manner. Development related discussions take place in an online forum and the source code repository is publicly available on GitHub. Doing test implementation development in the open is important for enabling peer review and it allows implementations to evolve and improve over time. The project implements performance tests for a wide variety of frameworks, and chances are that the ones you’re planning to use are included. If not, you can create your own tests and submit them to be included in the project code base. You can also take the tests and run the benchmarks on your own hardware.

Openly published test implementations are not only useful for producing benchmark data, but can also be used by framework developers to communicate framework performance related best practices to application developers. They also allow framework developers to receive reproducible performance benchmarking feedback and data for optimization purposes.

It’s interesting to note that the test implementations have been designed and built by different groups and individuals, and some may have been more rigorously optimized than others. The benchmarks measure the performance of the framework as much as they measure the test implementation, and in some cases a suboptimal test implementation will result in poor overall performance. Framework torchbearers are expected to take their best shot at optimizing the test implementation, so the implementations should eventually converge to optimal solutions given enough active framework pundits.

Test types

In the project’s parlance, the combination of programming language, framework and database used is termed “framework permutation” or just permutation, and some test types have been implemented in 100+ different permutations. The different test types include:

  1. JSON serialization
    “test framework fundamentals including keep-alive support, request routing, request header parsing, object instantiation, JSON serialization, response header generation, and request count throughput.”
  2. Single database query
    “exercise the framework’s object-relational mapper (ORM), random number generator, database driver, and database connection pool.”
  3. Multiple database queries
    “This test is a variation of Test #2 and also uses the World table. Multiple rows are fetched to more dramatically punish the database driver and connection pool. At the highest queries-per-request tested (20), this test demonstrates all frameworks’ convergence toward zero requests-per-second as database activity increases.”
  4. Fortunes
    “This test exercises the ORM, database connectivity, dynamic-size collections, sorting, server-side templates, XSS countermeasures, and character encoding.”
  5. Database updates
    “This test is a variation of Test #3 that exercises the ORM’s persistence of objects and the database driver’s performance at running UPDATE statements or similar. The spirit of this test is to exercise a variable number of read-then-write style database operations.”
  6. Plaintext
    “This test is an exercise of the request-routing fundamentals only, designed to demonstrate the capacity of high-performance platforms in particular. The response payload is still small, meaning good performance is still necessary in order to saturate the gigabit Ethernet of the test environment.”

Notes on Round 9 results

Currently, the latest benchmark is Round 9 and the result data is published on the project web page. The data is not available in machine-readable form and it can’t be sorted by column for analysing patterns. It can, however, be imported into a spreadsheet program fairly easily, so I took the data and analyzed it a bit. Some interesting observations could be made just by looking at the raw data. In addition to comparing throughput, it’s also interesting to compare how well frameworks scale. One way of quantifying scalability is to take test implementation throughput figures for the lowest and highest concurrency level (for test types 1, 2, 4 and 6) per framework and plot them on a 2-D plane. A line can then be drawn between these two points with the slope characterizing scalability. Well-scaling test implementations would be expected to have a positive, steep slope for test types 1, 2, 4 and 6 whereas for test types 3 and 5 the slope is expected to be negative.

This model is not entirely without problems since the scalability rating is not relative to the throughput, so e.g. a poorly performing framework can end up having a great scalability rating. As a result, you’d have to look at these figures together.
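
As a concrete illustration, the scalability rating described above boils down to a simple two-point slope; a Scala sketch (names are illustrative):

// throughput measured at the lowest and highest tested concurrency levels
case class Sample(concurrency: Int, requestsPerSecond: Double)

// slope of the line through the two samples; for test types 1, 2, 4 and 6 a
// steep positive slope suggests good scaling, while types 3 and 5 are
// expected to have a negative slope
def scalabilitySlope(low: Sample, high: Sample): Double =
  (high.requestsPerSecond - low.requestsPerSecond) / (high.concurrency - low.concurrency)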

To better visualize throughput against concurrency level (“Peak Hosting” environment data), I created a small web app that’s available at http://tfb-kippo.rhcloud.com/ (the app is subject to removal without notice).

JSON serialization

The JSON serialization test aims to measure framework overhead. One could argue that it’s a bit of a micro benchmark, but it should demonstrate how well the framework does with basic tasks like request routing, JSON serialization and response generation.

The top 10 frameworks were based on the following programming languages: C++, Java, Lua, Ur and Go. The C++ based CPPSP was the clear winner, while the next 6 contestants were Java-based. No database is used in this test type.

The top 7 frameworks with the highest throughput also have the highest scalability rating. After that, both figures start declining fairly rapidly. This is a very simple test and it’s a bit of a surprise to see such large variation in the results. In their commentary TechEmpower attributes some of the differences to how well frameworks work on a NUMA-based system architecture.

Quite a few frameworks are Java or JVM based and rather large variations exist even within this group, so clearly neither the language nor the JVM is an impeding factor in this group.

I was surprised by the Node.js and HHVM rankings. Unfortunately, the Scala-based Spray test implementation, as well as the JVM-based polyglot framework Vert.x implementation, were removed due to being outdated. I hope to see these included in a future benchmark round.

Single database query

This test type measures database access throughput and parallelizability. Again, a surprisingly large spread in performance can be observed for a fairly trivial test case. This would seem to suggest that framework or database access method overhead contributes significantly to the results. Is the database access technology (DB driver or ORM) the bottleneck? Or is it the backend system? It would be interesting to look at the system activity reports from the test runs to analyze potential bottlenecks in more detail.

Before seeing the results I would’ve expected the DB backend to be the bottleneck, but this doesn’t appear to be clear-cut based on the fact that the top, as well as many of the bottom performing test implementations, are using the same DB. It was interesting to note that the top six test implementations use a relational database with the first NoSQL based implementation taking 7th place. This test runs DB read statements by ID, which NoSQL databases should be very good at.

The top performing 10 frameworks were based on the Java, C++, Lua and PHP languages and use the MySQL, PostgreSQL and MongoDB databases. Java based Gemini leads, with CPPSP second. Both use the MySQL DB. The Spring based test implementation’s performance was a bit of a disappointment.

Multiple database queries

Where the previous test exercised a single database query per request this test does a variable number of database queries per request. Again, I would’ve assumed this test would measure the backend database performance more than the framework performance, but it seems that framework and database access method overhead can also contribute significantly.

The top two performers in this test are Dart based implementations that use MongoDB.

Top 10 frameworks in this test are based on Dart, Java, Clojure, PHP and C# languages and they use MongoDB and MySQL databases.

Fortunes

This is the most complex test that aims to exercise the full framework stack from request routing through business logic execution, database access, templating and response generation.

The top 10 frameworks are based on the C++, Java, Ur, Scala and PHP languages, with the full spectrum of databases being used (MySQL, PostgreSQL and MongoDB).

Database updates

In addition to reads this test exercises database updates as well.

HHVM wins this test, with 3 Node.js based frameworks coming next. Similar to the single database query test, the top 13 implementations work with the relational MySQL DB, ahead of the NoSQL based implementations. This test exercises simple read and write data access by ID which, again, should be one of the NoSQL databases’ strong points.

The top performing 10 frameworks were based on the PHP, JavaScript, Scala, Java and Go languages, all of which use the MySQL database.

Plaintext

The aim of this test is to measure how well the framework performs under extreme load conditions and massive client parallelism. Since there are no backend system dependencies involved, this test measures platform and framework concurrency limits. Throughput plateaus or starts degrading for the top-performing frameworks in this test before the client concurrency level reaches its maximum value, which seems to suggest that a bottleneck is being hit somewhere in the test setup, presumably in hardware, OS and/or framework concurrency.

Many frameworks are at their best at a concurrency level of 256, except CPPSP which peaks at 1024. CPPSP is the only one of the top-performing implementations that is able to significantly improve its performance as the concurrency level increases beyond 256, but even CPPSP’s throughput starts dropping after the concurrency level hits the 4,096 mark. Only 12 test implementations are able to exceed 1 M requests per second. Some well-known platforms, e.g. Spring, did surprisingly poorly.

There seems to be something seriously wrong with the HHVM test run, as it generates only tens of responses per second at concurrency levels 256 and 1,024.

The top 10 frameworks are based on the C++, Java, Scala and Lua languages. No database is used in this test.

Benchmark repeatability

In the scientific world, research must be repeatable in order to be credible. Similarly, the benchmark test methodology and relevant circumstances should be documented well enough to make the results repeatable and credible. There are a few details that could be documented better to improve repeatability.

The benchmark project’s source code doesn’t seem to be tagged. Tagging would be essential for making the benchmarks repeatable.

A short description of the hardware and some other test environment parameters is available on the benchmark project web site. However, the environment setup (hardware + software) is expected to change over time, so this information should be documented per round. Also, neither the Linux distribution minor release nor the exact Linux kernel version appears to be identified.

Detailed data about what goes on inside the servers could be published, so that outsiders could analyze the benchmark results in a more meaningful way. System activity reports, e.g. system resource usage (CPU, memory, IO), can provide valuable clues about possible scalability issues. Also, application, framework, database and other logs can be useful to test implementers.

Resin was chosen as the Java application server over Apache Tomcat and other servlet containers for performance reasons. I’m not contesting this choice, but there wasn’t any mention of the software versions compared, and since performance characteristics tend to change between releases, this premise is not repeatable.

Neither the exact JVM version nor the JVM arguments are documented for JVM-based test implementation execution. Default JVM arguments are used if test implementations don’t override the settings. Since the test implementations have very similar execution profiles by definition, it could be beneficial to explicitly configure and share a set of JVM flags commonly used with server-side applications. Also, due to JVM ergonomics, different GC parameters can be selected automatically based on the underlying server capacity and JVM version. Documenting these parameters per benchmark round would help with repeatability. Perhaps all the middleware software versions could be logged during test execution and the full test run logs made available.
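As an illustration (the flags below are purely my own example, not something TFB actually uses), documenting an explicit baseline along these lines for the JVM-based tests would remove one source of hidden variation, since heap and GC settings chosen by JVM ergonomics would no longer apply silently:

java -server -Xms2g -Xmx2g -XX:+UseParallelGC -XX:+PrintGCDetails -Xloggc:gc.log <application arguments>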

A custom test implementation: Asynchronous Java + NoSQL DB

Since I’ve recently worked on implementing RESTful services based on the JAX-RS 2 API with asynchronous processing (using the Jersey 2 implementation) and the Apache Cassandra NoSQL database, I got curious about how this combination would perform against the competition, so I started coding my own test implementation. I decided to drop JAX-RS in this case, however, to eliminate any non-essential abstraction layers that might have a negative impact on performance.

One of the biggest hurdles in getting started with test development was that, at the time I started my project, there wasn’t a way to test run platform installation scripts in smaller pieces; you had to run the full installation, which took a very long time. Fortunately, since then the framework installation procedure has been compartmentalized, so it’s possible to install just the framework that you’re developing tests for. Also, the project has recently added support for fully automated development environment setup with Vagrant, which is a great help. Another excellent addition is Travis CI integration, which allows test implementation developers to gain additional assurance that their code works as expected outside their own sandbox as well. Unfortunately, Travis builds can take a very long time, so you might need to disable some of the tests that you’re not actively working on. The Travis CI environment is also somewhat different from the development and the actual benchmarking environments, so you can bump into issues with Travis builds that don’t occur in the development environment, and vice versa. Travis build failures can sometimes be very obscure and tricky to troubleshoot.

The actual test implementation code is easy enough to develop and test in isolation, outside of the real benchmark environment, but if you’re adding support for new platform components such as databases, or testing platform installation scripts, it’s easiest if you have an environment that’s a close replica of the actual benchmarking environment. In this case, adding support for a new database involved creating a new DB schema, generating test data, and automating the database installation and configuration.

Implementing the actual test permutations turned out to be interesting, but surprisingly laborious as well. I started seeing occasional strange error responses when benchmarking my test implementation with ab and wrk, especially at higher loads. TFB executes Java-based test implementations in the Resin web container, and after puzzling over the errors for a while, I decided to test the code in other web containers, namely Tomcat and Jetty. It turned out I had bumped into one Resin bug (5776) and two Tomcat bugs (56736, 56739) related to servlet asynchronous processing support.

Architecturally, test types 1 and 6 have been implemented using the traditional synchronous Servlet API, while the rest of the test implementations leverage non-blocking request handling through Servlet 3 asynchronous processing support. The test implementations store their data in the Apache Cassandra 2 NoSQL database, which is accessed using the DataStax Java Driver. Asynchronous processing is also used in the data access tier in order to minimize resource consumption. JSON data is processed with the Jackson JSON library. In Java versions predating version 8, asynchronous processing requires passing around callbacks in the form of anonymous classes, which can at times be syntactically high-ceremony. Java 8 lambda expressions do away with some of the ceremonial overhead, but unfortunately TFB doesn’t yet fully support the latest Java version. I’ve previously used the JAX-RS 2 asynchronous processing API, but not the Servlet 3 async API. One thing I noticed during the test implementation was that the mechanism the Servlet 3 async API provides for generating error responses to the client is much lower-level, less intuitive and more cumbersome than its JAX-RS async counterpart.
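To give an idea of the programming model, here’s a minimal, simplified sketch of the approach (this is my own illustration, not the actual servlet3-cass code; the URL, table, column names and query are placeholders), combining Servlet 3 asynchronous processing with the DataStax driver’s asynchronous query execution and Jackson serialization:

import java.io.IOException;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

@WebServlet(urlPatterns = "/async-db", asyncSupported = true)
public class AsyncDbServlet extends HttpServlet {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private Session session; // assumed to be initialized in init() from a shared Cluster instance

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        final AsyncContext ctx = req.startAsync();
        // non-blocking query: the container thread is released immediately
        ResultSetFuture future = session.executeAsync(
                "SELECT id, randomNumber FROM World WHERE id = ?", 42);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet rs) {
                try {
                    Row row = rs.one();
                    HttpServletResponse response = (HttpServletResponse) ctx.getResponse();
                    response.setContentType("application/json");
                    MAPPER.writeValue(response.getOutputStream(), row.getInt("randomNumber"));
                } catch (IOException e) {
                    // error handling with the Servlet async API is rather low-level
                    ((HttpServletResponse) ctx.getResponse()).setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
                } finally {
                    ctx.complete();
                }
            }

            @Override
            public void onFailure(Throwable t) {
                ((HttpServletResponse) ctx.getResponse()).setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
                ctx.complete();
            }
        });
    }
}

The anonymous FutureCallback class above is exactly the kind of pre-Java 8 ceremony referred to earlier.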

The test implementation code was merged into the FrameworkBenchmarks code base, so it should be benchmarked in the next round. The code can be found here:
https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/servlet3-cass

Conclusions

TechEmpower’s Framework Benchmarks is a really valuable contribution to the web framework developer and user community. It holds great potential for enabling friendly competition between framework developers, as well as framework users, and thus for driving up the performance of popular frameworks and the adoption of framework performance best practices. As always, there’s room for improvement. From a framework user and test implementer point of view, some areas include: making the benchmark tests and results more repeatable, publishing raw benchmark data for analysis purposes, and making test development and the addition of new framework components even easier.

Good job TFB team + contributors – can’t wait to see Round 10 benchmark data!

Daemonizing JVM-based applications

Deployment architecture design is a vital part of any custom-built server-side application development project. Due to its significance, deployment architecture design should commence early and proceed in tandem with other development activities. The complexity of deployment architecture design depends on many aspects, including the scalability and availability targets of the provided service, rollout processes, as well as technical properties of the system architecture.

Serviceability and operational concerns, such as deployment security, monitoring, backup/restore etc., relate to the broader topic of deployment architecture design. These concerns are cross-cutting in nature and may need to be addressed on different levels ranging from service rollout processes to the practical system management details.

On the system management detail level, the following challenges often arise when using a pure JVM-based application deployment model (on Unix-like platforms):

  • how to securely shut down the app server or application? Often, a TCP listener thread listening for shutdown requests is used. If you have many instances of the same app server deployed on the same host, it’s sometimes easy to confuse the instances and shut down the wrong one. Also, you’ll have to prevent unauthorized access to the shutdown listener.
  • creating init scripts that integrate seamlessly with system startup and shutdown mechanisms (e.g. SysV init, systemd, Upstart)
  • how to automatically restart the application if it dies?
  • log file management. Application logs can be managed (e.g. rotated, compressed, deleted) by a logging library. App server or platform logs can sometimes also be managed using a logging library, but occasionally integration with OS-level tools (e.g. logrotate) may be necessary.

There are a couple of solutions to these problems that enable tighter integration between the operating system and the application or application server. One widely used and generic solution is the Java Service Wrapper, which is good at addressing the above challenges and is released under a proprietary license. A GPL v2-based community licensing option is available as well.

Apache commons daemon is another option. It has its roots in Apache Tomcat and integrates well with that app server, but it’s much more generic than that, and in addition to Java, commons daemon can also be used with other JVM-based languages such as Scala. As the name implies, commons daemon is Apache licensed.

Commons daemon includes the following features:

  • automatically restart JVM if it dies
  • enable secure shutdown of the JVM process using standard OS mechanisms (Tomcat’s TCP-based shutdown mechanism is error-prone and insecure)
  • redirect STDERR/STDOUT and set JVM process name
  • allow integration with OS init script mechanisms (record JVM process pid)
  • detach JVM process from parent process and console
  • run JVM and application with reduced OS privileges
  • allow coordinating log file management with OS tools such as logrotate (reopen log files with SIGUSR1 signal)

 

Deploying commons daemon

From an application developer’s point of view, commons daemon consists of two parts: the jsvc binary used for starting applications, and the commons daemon Java API. During startup, the jsvc binary bootstraps the application through lifecycle methods that are implemented by the application and defined by the commons daemon Java API. Jsvc also creates a control process for monitoring and restarting the application upon abnormal termination. Here’s an outline for deploying commons daemon with your application:

  1. implement the commons daemon API lifecycle methods in an application bootstrap class (see Using jsvc directly and the sketch after this list).
  2. compile and install jsvc. (Note that it’s usually not good practice to install a compiler toolchain on production or QA servers.)
  3. place the commons-daemon API JAR on the application classpath
  4. figure out the command line arguments for running your app through jsvc. Check out bin/daemon.sh in the Tomcat distribution for reference.
  5. create a proper init script based on the previous step. Tomcat can be installed via the package manager on many Linux distributions, and the package typically comes with an init script that can be used as a reference.
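For reference, a bootstrap class implementing the commons daemon lifecycle might look roughly like the following sketch (MyServer is a hypothetical placeholder for your actual application class):

import org.apache.commons.daemon.Daemon;
import org.apache.commons.daemon.DaemonContext;

public class MyServerDaemon implements Daemon {
    private MyServer server; // hypothetical application class

    @Override
    public void init(DaemonContext context) throws Exception {
        // jsvc typically calls this before privileges are dropped (when the -user option is used),
        // so privileged resources such as ports below 1024 are usually acquired here
        server = new MyServer(context.getArguments());
    }

    @Override
    public void start() throws Exception {
        server.start(); // begin serving; called after init()
    }

    @Override
    public void stop() throws Exception {
        server.stop(); // invoked when jsvc is asked to shut the daemon down
    }

    @Override
    public void destroy() {
        server = null; // release any remaining resources
    }
}

Jsvc is then pointed at this class on the command line instead of the application’s normal main class.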

 

Practical experiences

The Tomcat distribution includes “daemon.sh”, a generic wrapper shell script that can be used as a basis for creating a system-specific init script variant. One issue I encountered was that the default value of the wait configuration parameter couldn’t be overridden by the invoker of the wrapper script. In some cases Tomcat’s random number generator initialization could exceed the maximum wait time, resulting in the init script reporting a failure even though the app server would eventually start. This seems to be fixed now.

Another issue was that the wrapper script doesn’t allow passing JVM parameters that contain spaces, which can be handy e.g. in conjunction with the JVM “-XX:OnOutOfMemoryError” & co. parameters. Using the wrapper script is optional, and it could also be changed easily, but since it includes some useful functionality, I’d rather reuse it than duplicate it, so I created a feature request and proposed a tiny patch for this (#55104).

While figuring out the correct command line arguments for getting jsvc to bootstrap your application, the “-debug” argument can be quite useful for troubleshooting. Also, by default jsvc changes the working directory to /, in which case absolute paths should typically be used with other options. The “-cwd” option can be used to override the default working directory.

 

Daemonizing Jetty

In addition to Tomcat, Jetty is another servlet container I often use. Using commons daemon with Tomcat poses no challenge since the integration already exists, so I decided to see how things would work with an app server that doesn’t support commons daemon out-of-the-box.

To implement the necessary changes in Jetty, I cloned the Jetty source code repository, added the jsvc lifecycle methods to the Jetty bootstrap class and built Jetty. After that, I started experimenting with jsvc command line arguments for bootstrapping Jetty. Jetty comes with a jetty.sh startup script that has an option called “check” for outputting various pieces of information about the installation. Among other things, it outputs the command line arguments that would be passed to the JVM. This provided quite a good starting point for the jsvc command line.

These are the command lines I ended up with:

export JH=$HOME/jetty-9.2.2-SNAPSHOT
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
jsvc -debug -pidfile $JH/jetty.pid -outfile $JH/std.out -errfile $JH/std.err -Djetty.logs=$JH/logs -Djetty.home=$JH -Djetty.base=$JH -Djava.io.tmpdir=/var/folders/g6/zmr61rsj11q5zjmgf96rhvy0sm047k/T/ -classpath $JH/commons-daemon-1.0.15.jar:$JH/start.jar org.eclipse.jetty.start.Main jetty.state=$JH/jetty.state jetty-logging.xml jetty-started.xml

This could be used as a starting point for a proper production-grade init script for starting and shutting down Jetty.

I submitted my code changes as issue #439672 in the Jetty project issue tracker and just received word that the change has been merged into the upstream code base, so in the future you should be able to daemonize Jetty with Apache commons daemon jsvc out-of-the-box.

Thoughts on The Reactive Manifesto

Reactive programming is an emerging trend in software development that has gathered a lot of enthusiasm among technology connoisseurs during the last couple of years. After studying the subject last year, I got curious enough to attend the “Principles of Reactive Programming” course on Coursera (by Odersky, Meijer and Kuhn). Reactive advocates from Typesafe and elsewhere have created The Reactive Manifesto, which tries to formulate a vocabulary for reactive programming and describe what it actually aims at. This post collects some reflections on the manifesto.

According to The Reactive Manifesto, systems that are reactive:

  • react to events – event-driven nature enables the following qualities
  • react to load – focus on scalability by avoiding contention on shared resources
  • react to failure – resilient systems that are able to recover at all levels
  • react to users – honor response time guarantees regardless of load

Event-driven

Event-driven applications are composed of components that communicate through sending and receiving events. Events are passed asynchronously, often using a push based communication model, without the event originator blocking. A key goal is to be able to make efficient use of system resources, not tie up resources unnecessarily and maximize resource sharing.
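As a toy illustration of non-blocking message-passing between components, here’s a minimal sketch using Akka’s Java API (the example is my own; the manifesto itself doesn’t prescribe any particular toolkit):

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

public class Greeter extends UntypedActor {
    @Override
    public void onReceive(Object message) {
        if (message instanceof String) {
            // react to the event and reply asynchronously
            getSender().tell("Hello, " + message, getSelf());
        } else {
            unhandled(message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("example");
        ActorRef greeter = system.actorOf(Props.create(Greeter.class), "greeter");
        // tell() is fire-and-forget: the sender is not blocked while the message is processed
        greeter.tell("world", ActorRef.noSender());
    }
}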

Reactive applications are built on a distributed architecture in which message-passing provides the inter-node communication layer and location transparency for components. It also enables interfaces between components and subsystems to be based on loosely coupled design, thus allowing easier system evolution over time.

Systems designed to rely on shared mutable state require data access and mutation operations to be coordinated with some concurrency control mechanism in order to avoid data integrity issues. Concurrency control mechanisms limit the degree of parallelism in the system. Amdahl’s law formulates clearly how reducing the parallelizable portion of the program puts an upper limit on system scalability. Designs that avoid shared mutable state allow for higher degrees of parallelism, and thus higher scalability and better resource sharing.
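To make the scalability limit concrete, Amdahl’s law states that if a fraction P of a workload can be parallelized over N execution units, the maximum achievable speedup is

S(N) = 1 / ((1 − P) + P / N)

so, for example, even with P = 0.95 the speedup can never exceed 20x, no matter how many cores or nodes are added.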

Scalable

The system architecture needs to be carefully designed to scale out, as well as up, in order to exploit the hardware trends of both increased node-level parallelism (a growing number of CPUs, and of physical and logical cores within a CPU) and system-level parallelism (number of nodes). Vertical and horizontal scaling should work in both directions, so an elastic system will also be able to scale in and down, thereby allowing operational cost structures to be optimized for lower-demand conditions.

A key building block for elasticity is a distributed architecture with a node-to-node communication mechanism, provided by message-passing, that allows subsystems to be configured to run on the same node or on different nodes without code changes (location transparency).

Resilient

A resilient system will continue to function in the presence of failures in one or more parts of the system, as well as in unanticipated conditions (e.g. unexpected load). The system needs to be carefully designed to contain failures in well-defined and safe compartments to prevent them from escalating and cascading unexpectedly and uncontrollably.

Responsive

The Reactive manifesto characterizes the responsive quality as follows:

Responsive is defined by Merriam-Webster as “quick to respond or react appropriately”.

Reactive applications use observable models, event streams and stateful clients.

Observable models enable other systems to receive events when state changes.

Event streams form the basic abstraction on which this connection is built.

Reactive applications embrace the order of algorithms by employing design patterns and tests to ensure a response event is returned in O(1) or at least O(log n) time regardless of load.

Commentary

If you’ve been actively following software development trends during the last couple of years, ideas stated in the reactive manifesto may seem quite familiar to you. This is because the manifesto captures insights learned by the software development community in building internet-scale systems.

One such set of lessons stems from problems related to having centrally stored state in distributed systems. The tradeoffs of having a strong consistency model in a distributed system have been formalized in the CAP theorem. CAP-induced insights led developers to consider alternative consistency models, such as BASE, in order to trade strong consistency guarantees for availability and partition tolerance, but also for scalability. Looser consistency models have been popularized in recent years, in particular by different breeds of NoSQL databases. An application’s consistency model has a major impact on its scalability and availability, so it would be good to address this concern more explicitly in the manifesto. The chosen consistency model is a cross-cutting trait on which all the application layers should uniformly agree. This concern is mentioned in the manifesto, but since it’s such an important issue with subtle implications, it would be good to elaborate on it a bit more, or to refer to a more thorough discussion of the topic.

Event-driven is a widely used term in programming that can take on many different meanings and has multiple variations. Since it’s such an overloaded term, it would be good to define it more clearly and to characterize what exactly does and does not count as event-driven in this context. The authors clearly have event-driven architecture (EDA) in mind, but EDA is something that can be realized with different approaches. The same is true for “asynchronous communication”: in the reactive manifesto it seems to imply message-passing, as in messaging systems or the Actor model, and not asynchronous function or method invocation.

The reactive manifesto adopts and combines ideas from many movements, from the CAP theorem and NoSQL to event-driven architecture. It captures and amalgamates valuable lessons learned by the software development community in building internet-scale applications.

The manifesto makes a lot of sense, and I can subscribe to the ideas presented in it. However, in a few places the terminology could be elaborated a bit and made more approachable to developers who don’t have extensive experience with scalability issues. Sometimes the worst thing that can happen to great ideas is that they get diluted by unfortunate misunderstandings 🙂