practicing techie

tech oriented notes to self and lessons learned

Monthly Archives: August 2012

System call tracing is your friend

After downloading and installing Java SE 7 update 6 I tried running “java -version” to verify that the JDK was installed properly. To my surprise, the command reported the previous version instead of update 6. I then tried troubleshooting the problem using:

pkgutil --verbose --files com.oracle.jdk7u6
installer -dumplog -verbose -pkg '/Volumes/JDK 7 Update 06/JDK 7 Update 06.pkg' -target /

but with no effect. Then, while browsing the parent directories of the previous Java 7 installation, I noticed that with update 6 the installation path was actually

/Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk

instead of

/Library/Java/JavaVirtualMachines/1.7.0.jdk

as with the previous Java 7 update releases, so I was using the old absolute path in my “java -version” command.

Now, on Linux one of my first troubleshooting methods would’ve been to use the strace command, but for some reason this doesn’t come as instinctively to me on Mac OS X. On the Mac the equivalent command is called dtruss, and it would’ve revealed the new installation path immediately, just as strace would have:

dtruss 'installer -dumplog -verbose -pkg /Volumes/JDK\ 7\ Update\ 06/JDK\ 7\ Update\ 06.pkg -target /'
...
kevent(0x3, 0x153C67788, 0x1) = 1 0
audit_session_self(0x7FB1EB9640E0, 0x7FB1EBBEB150, 0x78) = 6659 0
kevent(0x3, 0x153C67788, 0x1) = 1 0
lstat64("/Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk", 0x153C65860, 0x1) = -1 Err#2
stat64("/Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk", 0x153C668B8, 0x0) = -1 Err#2
getattrlist("/", 0x153C665A0, 0x153C66190) = 0 0
getattrlist("/Library/Internet Plug-Ins/JavaAppletPlugin.plugin", 0x153C665A0, 0x153C66190) = 0 0
...

So, when troubleshooting OS-level problems, system call tracing is always your friend, irrespective of the operating system. This is a good case in point.

Asynchronous event-driven servers with Apache MINA

A while ago we had to do performance testing for a web application that depends on an external network service which couldn’t be tested in-place with high data volumes. We wanted to include the network protocol communication with the external service in the test (i.e. work on the “system integration testing” level) and there was no existing mock server, so I decided to spend a few hours evaluating whether we could implement one ourselves. Since the mock server could obviously become a bottleneck itself, I had to make sure it was implemented efficiently enough (in terms of I/O, threading, session and memory usage, etc.).

Implementing a server that leverages asynchronous I/O with Java NIO can be a tedious task, mainly because incoming and outgoing protocol messages get fragmented and you need to handle things like defragmentation and state management yourself. The network protocol handling code can be difficult to get right, and if you don’t design your abstractions carefully, it will get intertwined with application-level logic, resulting in unmaintainable code.

There are several prominent asynchronous event-driven network communication frameworks for Java that you can use for implementing protocol servers and clients. Among the better known are Netty, Apache MINA and GlassFish Grizzly. These frameworks let you implement scalable, high-performance and extensible network applications, and they free the application developer from much of the protocol message handling, state, session and thread management detail. All of the frameworks listed above are widely used and mature, but I had to pick one and decided to give Apache MINA 2.0 a try.

Apache MINA defines the concept of a service, which in abstract terms represents a network-accessible endpoint that a consumer can communicate with to request it to perform some well-defined task. An IoService class instance acts as an entry point to a service; it is implemented as a connector on the client side and as an acceptor on the server side. An acceptor is used when implementing servers: it acts as the communication endpoint of a service, accepting new sessions and mediating network traffic between consumers and the server-side components responsible for actual message processing. The application developer picks an appropriate acceptor type (e.g. NioSocketAcceptor for non-blocking TCP/IP) based on their requirements.

Acceptors are responsible for network communication, connection and thread management etc., but they delegate other responsibilities to interfaces that you’re free to customize and configure. As a minimum you’ll need to configure an IoHandler interface implementation that takes care of handling the different I/O events – for servers most notably receiving messages – but you can also choose to handle session and exception related events. An acceptor can also have multiple filters that perform I/O event pre- and post-processing. You’ll typically need to configure at least a protocol message encoder and decoder (ProtocolCodecFilter) that take care of message serialization and deserialization.
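To make these concepts concrete, here’s a minimal sketch of what wiring up a MINA 2.0 server might look like. This is not the actual mock server we built: the port, the text-line codec and the trivial echo handler are just illustrative placeholders.

import java.net.InetSocketAddress;
import java.nio.charset.Charset;
import org.apache.mina.core.service.IoAcceptor;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class EchoServer {
    public static void main(String[] args) throws Exception {
        // Acceptor: non-blocking TCP/IP endpoint for the service
        IoAcceptor acceptor = new NioSocketAcceptor();

        // Filter chain: the codec filter handles (de)serialization,
        // here simply as UTF-8 text lines
        acceptor.getFilterChain().addLast("codec",
            new ProtocolCodecFilter(new TextLineCodecFactory(Charset.forName("UTF-8"))));

        // IoHandler: application-level message processing,
        // decoupled from I/O and fragmentation details
        acceptor.setHandler(new IoHandlerAdapter() {
            @Override
            public void messageReceived(IoSession session, Object message) {
                session.write("echo: " + message);
            }
        });

        acceptor.bind(new InetSocketAddress(8080));
    }
}

The point of the structure is the separation of concerns: the codec filter owns message encoding and decoding, so the handler only ever sees complete protocol messages.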

I found that Apache MINA really did fulfill its promise, and implementing a high-performance, scalable and extensible network server with it was easy. MINA also helps separate network communication from application-level message processing logic very cleanly, and supporting multiple different protocols in the same server is straightforward. As a downside, the documentation for v2.0 is a bit lacking, but fortunately there are quite a few code samples that you can check out.

Using Oracle SQLDeveloper with MySQL

Oracle SQLDeveloper is a tool I’ve found very valuable in projects where I’m using the Oracle Database. Normally I like using command line tools, but many tasks, such as browsing large result sets or data in fat tables, browsing database schema metadata etc., are much faster with SQLDeveloper. SQLDeveloper supports other relational databases as well, and since I’m currently working on a project involving MySQL, I thought I’d give SQLDeveloper (v3.1.07) a little test with MySQL (v5.5).

You can install extensions in SQLDeveloper in a similar fashion as in Eclipse, and there’s a MySQL JDBC driver available (a third party SQLDeveloper extension). The extension failed to install properly on my Mac: everything looked to be going fine, but the installation failed silently for some reason. You can, however, configure JDBC drivers manually in SQLDeveloper, so I downloaded the MySQL driver and configured it (Preferences / Database / Third Party JDBC Drivers). After that, a new tab called “MySQL” appears when creating a new database connection, where you can specify DB product specific connection parameters.

I was able to successfully connect to my MySQL database, but when I tried to browse data in a table containing more than 5 million rows, the operation failed with the following error:

Task Error
Java heap space

I don’t remember running into this problem with SQLDeveloper when connecting to Oracle DB. As a workaround I modified the Java VM heap size argument that SQLDeveloper passes to the Java VM at launch (in the sqldeveloper.conf configuration file).
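The change amounts to adding (or editing) a VM option in sqldeveloper.conf along these lines; the 1 GB value below is only an example, not a recommendation:

AddVMOption -Xmx1024M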

I also wanted to test whether SQLDeveloper would run with my newly installed Java 7, but that turned out to be a bit more difficult. On Mac OS X, changing the Java path in the default SQLDeveloper configuration files had no effect, because the parameter was overridden in a platform-specific configuration file (sqldeveloper-Darwin.conf) that had to be changed in order to use an alternate Java VM. The correct configuration file to change was revealed by starting SQLDeveloper from the command line with the --verbose flag:

sqldeveloper.sh --verbose
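In the Darwin-specific file the Java path is set with the SetJavaHome directive, so the change is roughly of this form; the path below is the update 6 location mentioned in the earlier post and is only illustrative, it will vary by setup:

SetJavaHome /Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk/Contents/Home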

SQLDeveloper can help in a number of ways when you’re working with Oracle DB: it provides wizards for creating and editing table definitions, imports and exports data, and lets you view and change many aspects of database metadata. The SQL Worksheet helps you write SQL statements with its autocompletion feature. SQLDeveloper is a great tool to use with Oracle DB, but you should note that some of its features aren’t available when it’s used with other database products.

Deploying a dependency manager

It’s common for older codebases to include the dependencies required by the application, such as code libraries, picture and sound files and other artifacts, in the source code repository itself. This typically leads to the repository getting bloated. As an example, in a project where I was working recently the repository size on checkout was 450 MB. Checking out such a repository takes a lot of time, wastes disk space in each developer sandbox and makes implementing continuous integration more difficult.

Here’s where dependency management technologies such as Apache Maven and Ivy (with an artifact repository) can be a great help. Deploying such a solution in a greenfield project is simple, but when you have an existing codebase that’s used in production things get more complicated.

Suppose you have a set of libraries that the codebase currently uses and you want to manage them using a dependency manager and artifact repository. You can choose to

  • a) declare and publish your own artifacts based on artifacts in the existing codebase
  • b) use artifacts published in public repositories
  • c) use a hybrid approach

With option a you gain maximum control over your artifacts (and thus over exactly what gets included), while with b you hope to be doing less work managing artifacts in the long term. Using publicly available artifacts requires you to determine and categorize library dependencies, which can require a lot of detective work. Also, you need to trust that the people publishing the artifacts have done a good job declaring transitive dependencies. I haven’t found good automated solutions for determining dependencies, so I’ve used the following approach:

  1. Determine compile-time dependencies
    Check import statements in the source code. For each import, find the corresponding library in a public artifact repository and add the dependency to the compile-time set.
    Delete jars one by one and compile. If the build fails, add the required dependencies.
    Rebuild and iterate.
  2. Determine runtime dependencies
    Build the package, deploy and test. If any dependencies are missing, add them to the runtime set.
    Iterate.
  3. Cross-check the original set of jar files (optional)
    Remove any files that aren’t present in the original runtime set using explicit exclusions.

This is a rather frustrating and dull method. Also, in practice, in step 3 some artifacts usually bring in transitive dependencies that weren’t packaged with the application originally and that you might not want to include in the dependency set. You can solve this in two ways: a) include transitive dependencies by default and exclude the unnecessary ones, or b) exclude transitive dependencies by default and explicitly include the required ones. With option b you gain control, but you’re not using the dependency manager to its full potential. Option a, on the other hand, may require that you let go of your purism and accept the fact that the dependency set will include something that wasn’t included as a dependency originally.
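To illustrate option a, excluding an unwanted transitive dependency in a Maven POM looks roughly like this; the artifacts below are generic examples, not the actual dependencies of the project in question:

<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
  <exclusions>
    <!-- transitive dependency we don't want in the packaged application -->
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>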

Oracle Enterprise Linux now free (of charge)

Linux application software developers often face a choice between two compatible but different OS variants: Red Hat Enterprise Linux (RHEL) and CentOS. Using RHEL can sometimes be problematic for developers because typically some sort of centralized subscription management is required for enabling software updates, and depending on the organization you can get stuck for anything from hours to days. The required bureaucracy can be a really frustrating experience for software developers looking to install just a basic virtualized RHEL guest OS instance for development or QA purposes: 5 minutes and you’re done – if it just weren’t for the subscription management part! CentOS, on the other hand, can be freely downloaded and used, but the downside is that publishing updates has traditionally lagged behind RHEL. Depending on the project this may not be a big problem for QA and development purposes, but for internet-facing production platforms you’d like the security updates to be installed as soon as they are released.

Oracle Enterprise Linux is an enterprise Linux distribution similar to CentOS in that it’s binary compatible with RHEL. It’s also been made freely (as in beer) available recently. The big upside for the app dev use case above is that Oracle promises to publish updates faster than CentOS has done. For operations personnel the benefit is that you can also get paid support for the OS from Oracle as well as some interesting features, such as zero-downtime kernel updates with Ksplice.

Being a bit curious, I downloaded the Oracle Linux installation image (Oracle Linux Release 6 Update 3 for x86_64 [64 Bit], 3.5 GB) from Oracle and installed it as a virtualized guest OS instance on my laptop. The installation process worked just as it does with RHEL and CentOS, except for the different branding, logos etc., of course. Software updates also installed without problems after the initial installation.

Until now I had dismissed Oracle Linux from consideration as a niche distribution and had some doubts about its continuity, but it does look like a solid OS and it has been around for a while now, so it could be a viable option to consider when choosing an enterprise Linux platform.


Java 7 on Mac OS X – finally!

Apple doesn’t exactly have a history of timely Java releases for Mac OS X, so I didn’t expect Java 7 to be available soon after its GA release, but I was very disappointed to read Apple’s announcement in October 2010 stating that they would not be supporting Java 7 on Mac OS X. I was also quite sceptical in November, when there was a surprise announcement from Apple and Oracle saying the two companies would be working together to port OpenJDK to Mac OS X. Java 7 was published in July 2011, but patience was still required from Mac OS X Java developers.

When the OpenJDK Java 7 preview packages were finally made available in 2012, they didn’t run on Mac OS X Snow Leopard, so I had to build the JDK from source. That was fairly simple but rather time-consuming, and the build process practically rendered my laptop unusable since it used up a lot of CPU, I/O and memory resources. An operating system reinstallation is always a huge amount of work, with all the backing up, finding a suitable time slot and other arrangements, so it was only last week that I finally managed to find the time for the OS X Lion upgrade. Now, however, I’m able to use the Oracle-provided JDK 7 installation packages, which makes JDK upgrades a lot easier. So, a year after the Java 7 release I’m finally able to run it on my laptop! One nice thing about Oracle picking up Java on Mac OS X is that they’ve promised to release Java 7 updates simultaneously for Mac, Windows, Linux and Solaris (Henrik on Java). I hope the Mac OS X port code base is well integrated with the rest of the source tree and that future major Java releases like 8 and 9 will also happen in a timely fashion.