practicing techie
tech oriented notes to self and lessons learned
Category Archives: development
Advanced PostgreSQL features
2013-05-12
Nearly all the server-side development projects I’ve worked on over the years have stored at least part of their data in a relational database system. Even the systems using NoSQL storage have included an RDBMS in some form, whether local to a single subsystem or in a larger role. In some cases the RDBMS systems have been proprietary, but increasingly they’ve been open source projects.
I’ve been using PostgreSQL in projects with RDBMS storage requirements on and off over the years. It has often impressed me with its depth of support for the SQL standard as well as its wide range of non-standard extension features. With other widely used OSS RDBMS engines, I’ve often noticed that while the database claims to support feature X on paper, it actually supports only a subset, with a list of limitations you need to account for. Those limitations are typically something you wouldn’t expect, so they come as a surprise. Subqueries and joins are one such feature.
With PostgreSQL, I haven’t yet found a case where the database wouldn’t be able to handle a SQL standard subquery or join statement. An item such as comprehensive support for SQL subqueries and joins in a database product feature list may not look particularly appealing; it may even raise some suspicions. But from a developer point of view I find this “feature” a very important one, perhaps even one of PostgreSQL’s biggest selling points compared to some of its OSS competitors. PostgreSQL has many other advanced features that are interesting for application developers.
Common table expressions and hierarchic queries
Common table expressions (CTEs) are a handy standard SQL feature that allows you to split a query statement into distinct parts, where the results of each part appear as a table, so you can reference the intermediate results in other parts of the statement, possibly several times. This can help make query statements more maintainable, but it also helps optimize queries in some cases, since CTE expressions are evaluated only once per statement execution.
In addition to allowing subquery factoring, CTEs can process hierarchical data. With small amounts of data and relatively shallow hierarchies you can implement hierarchical queries simply using joins, but this workaround may not be acceptable in all cases. A recursive CTE query makes it possible to process arbitrarily deep hierarchies with just one query.
Here’s an example of how to get a list of subtasks in an arbitrarily deep task tree, along with path information for each task. Including the path information makes it easier to build a graph representation on the receiving end.
mydb=> WITH RECURSIVE task_tree (id, name, parent_id, depth, path) AS (
mydb(>   SELECT id, name, parent_id, 1, ARRAY[t.id]
mydb(>   FROM task t WHERE t.id = 1
mydb(>   UNION ALL
mydb(>   SELECT s.id, s.name, s.parent_id, tt.depth + 1, path || s.id
mydb(>   FROM task s, task_tree tt WHERE s.parent_id = tt.id
mydb(> )
mydb-> SELECT * FROM task_tree
mydb-> ORDER BY depth ASC;

 id |    name     | parent_id | depth |    path
----+-------------+-----------+-------+-------------
  1 | task1       |           |     1 | {1}
  4 | task1-1     |         1 |     2 | {1,4}
  5 | task1-2     |         1 |     2 | {1,5}
  6 | task1-3     |         1 |     2 | {1,6}
 13 | task1-3-1   |         6 |     3 | {1,6,13}
 14 | task1-3-1-1 |        13 |     4 | {1,6,13,14}
(6 rows)
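The same query is straightforward to consume from application code over JDBC. Here’s a minimal sketch (the connection URL and credentials are illustrative; it assumes the task table above and the PostgreSQL JDBC driver on the classpath):

import java.sql.*;

public class TaskTreeQuery {
    public static void main(String[] args) throws SQLException {
        String sql =
            "WITH RECURSIVE task_tree (id, name, parent_id, depth, path) AS (" +
            "  SELECT id, name, parent_id, 1, ARRAY[t.id] FROM task t WHERE t.id = ?" +
            "  UNION ALL" +
            "  SELECT s.id, s.name, s.parent_id, tt.depth + 1, path || s.id" +
            "  FROM task s, task_tree tt WHERE s.parent_id = tt.id" +
            ") SELECT * FROM task_tree ORDER BY depth ASC";
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "myuser", "secret");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, 1L); // id of the root task
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // the path column comes back as a SQL array, e.g. {1,6,13}
                    Object[] path = (Object[]) rs.getArray("path").getArray();
                    System.out.printf("%d %s %s%n", rs.getInt("depth"),
                        rs.getString("name"), java.util.Arrays.toString(path));
                }
            }
        }
    }
}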
PostgreSQL also has an extension module, tablefunc, that allows handling hierarchical data in a less verbose but non-standard manner:
mydb=> SELECT * FROM connectby('task', 'id', 'parent_id', '1', 0, '/')
mydb-> AS t(id BIGINT, parent_id BIGINT, level INT, branch TEXT);

 id | parent_id | level |  branch
----+-----------+-------+-----------
  1 |           |     0 | 1
  4 |         1 |     1 | 1/4
  5 |         1 |     1 | 1/5
  6 |         1 |     1 | 1/6
 13 |         6 |     2 | 1/6/13
 14 |        13 |     3 | 1/6/13/14
(6 rows)
Aggregates and window functions
SQL GROUP BY lets you calculate aggregates of data over one or more columns in a result set. The clause, however, can only aggregate over a single grouping, so it isn’t possible, for example, to calculate average salaries over both departments and locations in a single query. Another limitation is that only aggregated data is returned; detail data is not preserved, so you can’t get both the detail records and the aggregates using a single query.
Window functions make it possible to get both. Here’s how to calculate employee salary aggregates over several different groupings while preserving the detail records:
mydb=> SELECT depname, location, empno, salary,
mydb->   AVG(salary) OVER (PARTITION BY depname) avgdept,
mydb->   SUM(salary) OVER (PARTITION BY depname) sumdept,
mydb->   AVG(salary) OVER (PARTITION BY location) avgloc,
mydb->   RANK() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
mydb-> FROM empsalary;

  depname  | location | empno | salary |        avgdept        | sumdept |        avgloc         | pos
-----------+----------+-------+--------+-----------------------+---------+-----------------------+-----
 develop   | fi       |     8 |   6000 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   1
 develop   | se       |    10 |   5200 | 5020.0000000000000000 |   25100 | 4950.0000000000000000 |   2
 develop   | fi       |    11 |   5200 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   3
 develop   | fi       |     9 |   4500 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   4
 develop   | fi       |     7 |   4200 | 5020.0000000000000000 |   25100 | 4550.0000000000000000 |   5
 personnel | fi       |     2 |   3900 | 3700.0000000000000000 |    7400 | 4550.0000000000000000 |   1
 personnel | fi       |     5 |   3500 | 3700.0000000000000000 |    7400 | 4550.0000000000000000 |   2
 sales     | se       |     1 |   5000 | 4866.6666666666666667 |   14600 | 4950.0000000000000000 |   1
 sales     | se       |     3 |   4800 | 4866.6666666666666667 |   14600 | 4950.0000000000000000 |   2
 sales     | se       |     4 |   4800 | 4866.6666666666666667 |   14600 | 4950.0000000000000000 |   3
(10 rows)
Like CTEs, window functions are specified in the SQL standard, but they’re not supported by all OSS or proprietary RDBMS products.
Pivoting data
Sometimes it’s nice to be able to pivot data in a properly normalized data model, so that repeating groups of related entities are folded into the parent entity as columns. This can be useful e.g. for reporting purposes and ad-hoc queries. PostgreSQL can pivot data using subqueries and arrays like this:
mydb=> SELECT e.*,
mydb->   (SELECT ARRAY_TO_STRING(ARRAY(SELECT emp_phone_num FROM emp_phone p WHERE e.employee_id = p.emp_id), ',')) AS phones
mydb-> FROM employees AS e;

 employee_id | last_name | manager_id |         phones
-------------+-----------+------------+-------------------------
         100 | King      |            |
         101 | Kochhar   |        100 | 555-123,555-234,555-345
         108 | Greenberg |        101 | 555-111
         205 | Higgins   |        101 | 555-914,555-222
         206 | Gietz     |        205 |
...
(13 rows)
Another way is to use the tablefunc extension module again, this time its crosstab function:
mydb=> SELECT *
mydb-> FROM crosstab(
mydb(>   'SELECT emp_id, contact_type, contact FROM emp_sm_contact ORDER BY 1',
mydb(>   'SELECT DISTINCT contact_type FROM emp_sm_contact ORDER BY 1'
mydb(> )
mydb-> AS emp_sm_contact(emp_id BIGINT, "g+" TEXT, "linkedIn" TEXT, twitter TEXT);

 emp_id |    g+     | linkedIn  | twitter
--------+-----------+-----------+---------
    100 | bking     | b.king    | beking
    101 | kochhar.2 | kochhar.1 | kochhar
    200 |           |           | whalen
(3 rows)
Other advanced features
Other advanced PostgreSQL features that I find of interest to application developers include:
- pattern matching: regular expression matching is supported
- geolocation queries: the PostGIS extension adds comprehensive support for managing and querying geospatial data
- partitioning
- replication
Final thoughts
Problems faced in the transition phase of the software development process, when software is handed over from the development team to the operations team, have prompted the need for closer collaboration between the teams in the form of devops culture. Similarly, application developers can’t remain ignorant of database design, implementation and optimization issues, and expect the DBAs to magically fix any data-tier design issues after the system has been implemented. Application developers need to learn how to leverage database systems effectively and take responsibility for database-tier design to make transitions more seamless and production deployments succeed.
While standards-based object-relational mapping (ORM) technologies, such as the Java Persistence API, can be a great help to application developers, developers should be aware of what kind of queries the ORM implementation is generating, and in particular watch out for N+1 query issues. With higher data and transaction volumes, or more demanding data access patterns, advanced database features will be a significant help in optimising the application.
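To illustrate the N+1 issue, here’s a small sketch using hypothetical Employee and Phone entities (made up for illustration, not from any real project): lazy navigation issues one additional query per employee, while a JPQL fetch join retrieves the same data in a single query.

import java.util.List;
import javax.persistence.*;

@Entity
class Employee {
    @Id Long id;
    @OneToMany(mappedBy = "employee", fetch = FetchType.LAZY)
    List<Phone> phones;
}

@Entity
class Phone {
    @Id Long id;
    @ManyToOne Employee employee;
}

class NPlusOneDemo {
    static void listPhones(EntityManager em) {
        // N+1: one query for the employees, then one more per employee
        List<Employee> emps = em.createQuery(
            "SELECT e FROM Employee e", Employee.class).getResultList();
        for (Employee e : emps) {
            e.phones.size(); // first access triggers a separate SELECT per employee
        }

        // one query: the fetch join loads employees and their phones together
        List<Employee> fetched = em.createQuery(
            "SELECT DISTINCT e FROM Employee e LEFT JOIN FETCH e.phones",
            Employee.class).getResultList();
    }
}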
To quote Oracle guru Tom Kyte: “it’s a database, not a data dump”. PostgreSQL is an advanced relational database engine and it has a lot of features that can help application developers implement new features faster and in a more efficient and scalable manner. As with all tools, you should learn how to use it to get the most out of it.
More info
- DDL, sample data and queries used in this blog entry, see file-advanced-psql-examples-sql
- PostgreSQL 9.2 documentation
Tomcat JDBC Connection Pool
2013-03-29
tomcat-jdbc is a relatively new entrant to the Java JDBC connection pool game. It’s been designed to be a drop-in replacement for commons-dbcp.
Many of the popular Java connection pool implementations have become quite stagnant over the years, so it’s nice to see someone make a fresh start in this domain. The tomcat-jdbc code base is small and has minimal dependencies. It has configurable connection validation (validation policy, query and intervals) and can automatically close connections after they reach a configurable maximum age.
Some time ago, we started having problems with a near end-of-life legacy application that was migrated to a new environment. The application was a standalone Java app that used commons-dbcp as its JDBC connection pool implementation. After the migration, MySQL connections started failing occasionally. This appeared to happen with a somewhat regular interval of a few hours. Since tomcat-jdbc API is compatible with commons-dbcp we were able to replace the connection pool without any code changes and configure the new pool to automatically close connections once they had been open for a certain period of time. This turned out to be an effective and nonintrusive workaround for the issue.
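For the record, here’s roughly what that kind of pool configuration looks like with the tomcat-jdbc API; the URL, credentials and the two-hour limit below are illustrative, not our actual settings:

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

public class PoolSetup {
    public static DataSource createPool() {
        PoolProperties p = new PoolProperties();
        p.setUrl("jdbc:mysql://dbhost:3306/appdb"); // illustrative
        p.setDriverClassName("com.mysql.jdbc.Driver");
        p.setUsername("app");
        p.setPassword("secret");
        p.setValidationQuery("SELECT 1");  // connection validation query
        p.setValidationInterval(30000);    // validate at most every 30 s
        p.setMaxAge(2 * 60 * 60 * 1000L);  // close connections older than 2 h
        DataSource ds = new DataSource();  // API-compatible with commons-dbcp
        ds.setPoolProperties(p);
        return ds;
    }
}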
I hope the Tomcat development team will more actively promote using tomcat-jdbc outside of Tomcat as well. The pool implementation doesn’t currently seem to be available as a separate download or from Maven Central, which is likely to hinder its adoption.
A JVM polyglot experiment with JRuby
2013-01-21
The nice thing about hobby technology projects is that you get to freely explore and learn new things. Sometimes this freedom makes the project go off at a tangent, and it’s in those cases in particular that you really get to explore.
Some time ago I was working on a multi-vendor software development project. We had trouble making developers follow Git commit message guidelines and asking multiple times didn’t help, so I thought I’d implement a technological solution for this. Our repositories were hosted at GitHub, so I studied the post-receive hooks mechanism, learned a bit of Ruby and implemented my own service hook that validates commit messages against a configurable format, generates an email using a configurable template and delivers it to selected recipients. Post-receive hooks don’t prevent people from committing with invalid messages, but I chose to go with a centralized solution that would not require every developer to configure their repository. I submitted my module, test code and documentation to GitHub, but the service hook implementation was rejected.
After the dead end, I decided to try enforcing a commit message policy using a server-side hook that could actually prevent invalid commits. That solution was technically viable but, as suspected, it turned out that not all developers were willing to configure the hook in their repository. Also, every once in a while when developers do a clean clone of the repository, the configuration needs to be redone.
So, I decided to study how service hooks could be run on an external system instead of being hosted on GitHub. The “WebHook” service hook allows you to deliver the post-receive event anywhere over HTTP. GitHub also makes service hook implementations available to be run on your own servers. The easy way would’ve been to simply take my custom service hook implementation and run it on our server. In addition to being too easy, there were some limitations with this approach as well:
- a github-services server instance can only have a single configuration i.e. you can’t serve multiple repositories each with different configurations
- the github-services server dispatches data it receives to a single service based on the request URL. It’s not possible to dispatch the data to a set of services.
- you have to code service hooks in Ruby
I had heard of JRuby at that time, but didn’t have practical experience with it. After some experimenting I was able to validate my assumption that GitHub Service implementations could, in fact, be run with JRuby. At that point I started migrating the code base into a polyglot GitHub Services container that allows you to run the GitHub provided github-services as well as your own custom service implementations. Services can be implemented in different languages and run simultaneously in the same container instance. The container can be configured with an ordered set of services (a chain) to handle post-receive events from one or more GitHub repositories. It’s also possible to configure a single container with separate service chains, each bound to a different repository. The container runs in the Jetty servlet container and uses JRuby for executing Ruby code.
Below is an illustration of an example configuration scenario where two GitHub repositories are set up to deliver post-receive events to a single container. The container has been configured with a separate service chain for each repository.
The current status of the project is that a few GitHub Services as well as my custom Ruby and Java based services have been tested and seem to be working.
Lessons learned
The JVM can run code written in a large number of different programming languages and it’s a great platform for both dynamic language implementers and polyglot application developers. To quote JRuby developer Charles Nutter:
The JVM is going to be the best VM for building dynamic languages, because it already is a dynamic language VM.
Java 7 delivered vastly improved support for dynamic language implementers with JSR 292 (the invokedynamic bytecode instruction). Java 8 is expected to further improve language interoperability and performance.
JRuby is an interesting alternative Ruby implementation for the JVM. It’s mature and the performance benchmark numbers are impressive (Why JRuby) compared with Ruby MRI. Performance is expected to get even better with invokedynamic optimization work being done for Java 8.
While taking the existing Ruby based github-services and making them run on JRuby was successful and didn’t require any code changes, there were lots of small issues that took a surprising amount of time to resolve. Many of the issues were related to setting up the runtime environment in one way or another. High-level troubleshooting strategies are similar from platform to platform, but on a more detailed level the methods and tools are often quite different, and many of the problems I encountered might have been easier to crack with solid Ruby experience.
Here’re some lessons learned during the project:
- Learning how to use the JRuby embedding API. There’re 3 different APIs to choose from: Red Bridge, JSR 223 and BSF. Performing tasks like instantiating objects and passing parameters in a Java call-out was not immediately obvious using Red Bridge (see the sketch below)
- Figuring out the concurrency / thread-safety properties of different areas of the JRuby embedding API. JRuby concurrency documentation was lacking at the time the project was started.
- JRuby tooling. Tooling works somewhat differently from Ruby.
- bootstrapping GitHub services gem environment 
- in order to keep the github-services installation self-contained, I wanted to install as many gems as possible in the github-services vendor directory instead of installing them in JRuby. Some gems had to be installed in JRuby while others could be installed in vendor tree.
- some gems need to be replaced by JRuby specific ones (e.g. jruby-openssl)
- setting up Ruby requires and load paths
- Gems implemented as native extensions require a compiler toolchain as well as gem module library dependencies to be installed.
- bypassing the Sinatra web app framework that’s used by github-services
Probably most of the issues I encountered were related in some way to bootstrapping the github-services gem environment.
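To give an idea of what the embedding API looks like, here’s a minimal Red Bridge sketch (the Ruby Greeter class is made up for illustration); the context scope chosen at construction time is one of the thread-safety knobs mentioned above:

import org.jruby.embed.LocalContextScope;
import org.jruby.embed.ScriptingContainer;

public class EmbeddingDemo {
    public static void main(String[] args) {
        // the context scope controls how runtimes and variables are shared across threads
        ScriptingContainer container =
            new ScriptingContainer(LocalContextScope.THREADSAFE);
        // evaluate Ruby code (could also be loaded from a .rb file)
        Object greeter = container.runScriptlet(
            "class Greeter; def greet(name); \"hello, #{name}\"; end; end; Greeter.new");
        // call out from Java to Ruby, passing a parameter and converting the result
        String msg = container.callMethod(greeter, "greet", "JRuby", String.class);
        System.out.println(msg); // => hello, JRuby
    }
}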
Code for the experiment can be found at https://github.com/marko-asplund/github-hook-jar
Java on Mac OS X
2013-01-03
Mac OS X is a nice platform for Java development because it successfully combines a very good desktop user experience with the system’s unix heritage and tooling. There are some problems, however:
- only a limited set of JDK versions are available for current OS X releases
- the JVM implementations are not standalone and require Apple proprietary frameworks to be present
In this respect, Linux is probably the best Java development platform, because JDKs are available from many different vendors and multiple versions of the most popular JDKs run on Linux. The JVM implementations usually don’t have esoteric dependencies and only require basic OS libraries in addition to the ones bundled with the JVM.
On the other hand, only Apple- and Oracle-provided JDKs run on Mac OS X, and older Java versions aren’t available. Also, neither Apple’s Java 6 nor Oracle’s Java 7 JVM seems to run on Lion or Mountain Lion without the com.apple.pkg.JavaEssentials package, for example.
Recently, I managed to corrupt my Java installation beyond repair, and I wanted to avoid this in the future by isolating my Java 7 and 8 installations as much as possible. This turned out to be fairly simple if you resist the temptation to install by just clicking on the downloaded JDK package. The JDKs are distributed by Oracle as disk images that contain a Mac OS X installer package file. Instead of running the installer you can easily extract the contents of the package using command line tools. First mount the disk image by clicking on it, then run the following commands in a terminal session:
xar -xf '/Volumes/JDK 8/JDK 8.pkg'
cat jdk180.pkg/Payload | gunzip | cpio -i
After that, the Contents directory will include the entire JDK installation and you can move it to a location of your choosing. Then just set the JAVA_HOME environment variable and add the JVM and required tools to your shell search path.
JavaOne 2012 – Keynotes
2012-11-16
Java Strategy Keynote
The Java Strategy and JavaOne Technical keynotes were delivered at the end of the first conference day, on Sunday.
The Java Strategy keynote was kicked off with a “catchy” music video, “Coding in Java”. After the video Hasan Rizvi, EVP Middleware and Java Development, opened the more formal part of the keynote. Rizvi described how the conference theme, “make the future Java”, referred to two different aspects of building the future:
- a) ensuring the platform stays competitive. Competitiveness involves platform completeness, modernization and innovation, developer productivity as well as quality and security
- b) making sure that the collaborative process through which the platform is being developed, works well. The process needs open and transparent evolution, and active community involvement
Rizvi noted that “we have bet our business on Java and a lot of you have bet your business and careers on Java”. Oracle’s Fusion Middleware platform as well as a lot of (if not all) Oracle applications have been built on Java, so Oracle has, in fact, made a huge bet on Java.
As for the Java roadmap, Oracle stated they’re committed to more regular major platform releases. During Sun’s stewardship there was a period of Java stagnation when 4.5 years elapsed between the Java 6 and 7 releases, and Java 7 was actually finally released by Oracle, not Sun. Even though evolving Java is a collaborative effort, a lot of responsibility lies with the steward. A key duty is to produce the reference implementation. The developers, partners, clients and all the stakeholders in the Java ecosystem need to be able to rely on the steward to move things forward in a consistent and predictable manner, and timeboxed releases are an important indication to everyone that the train is moving.
Rizvi gave some highlights of Java roadmap for SE, ME, EE, JavaFX, Java Card and NetBeans. These were later described in more detail by the product development leaders. He also presented results for Oracle’s Java 2012 scorecard. The scorecard is split into three different areas: technology, community and Oracle leadership.
Rizvi then handed over to Georges Saab, VP / Development, who described the current state of Java SE 7 adoption. According to Saab they’re seeing rapid uptake of the new release, and he mentioned that Oracle supports its entire Fusion Middleware stack on JDK 7. (With the end of public Java 6 updates scheduled for February 2013, it’s time to upgrade unless you have a Java support contract.) He also emphasized support for two new platforms added in the release. Support for Linux ARM seems very much related to Oracle’s aspirations for Java in the embedded space (Saab mentioned the emerging ARM microserver market).
Java 8 is scheduled for Q3 2013 with developer preview slated for February 2013. OpenJDK 8 early builds are available already to test things like Lambda. Some of the highlights of the planned release content include Lambda expressions (closures), parallel operations on core collections API, eliminating PermGen, a new JVM based JavaScript implementation called Nashorn, language interoperability, Java ME/SE convergence and new Date & Time APIs. Oracle is planning to contribute Nashorn to the OpenJDK project. Nashorn is said to be a high performance, modern JavaScript implementation on the JVM and will probably replace the experimental Rhino JavaScript engine shipped since JDK 6. NetBeans uses Nashorn internally for its JavaScript support.
Java 9 will likely include at least Jigsaw modularity, which was deferred from Java 8, and is scheduled for 2015. While some potential development areas were listed for this release, the details were pretty scarce, as can be expected at this time.
Nandini Ramani, VP / Engineering, Java Client and Mobile Platforms, then took to the stage to describe plans for Java Client and Embedded. It’s interesting to note that JavaFX is not currently supported on all Oracle supported Java platforms, which would in theory seem to contradict the “write once, run anywhere” proposition. Ramani was briefly joined by people from Navis and Canoo to present a JavaFX in cargo management case study.
Then back to longer term plans for the JDK. Phil Rogers of AMD described Project Sumatra, which aims to bring heterogeneous computing platform to Java. Rogers described the hardware trends behind the project:
first the move from single-core to multi-core CPUs, and now to full SoCs (system on chip), a heterogeneous computing platform where a CPU and the parallel processor of the GPU are combined into a single piece of silicon with shared memory
A high level of parallelism is required from the platform by workloads such as media processing, AI and big data. With Sumatra, developers will be able to write code that takes advantage of the heterogeneous computing platform without explicitly coding for it. The JVM will decide at runtime whether to run the code on the CPU or the GPU.
Ramani then came back to tell about Java in the Embedded space. I’ve written another blog entry about this, so I won’t go into detail here. It was interesting to note, however, that Oracle seems very determined to push Java in the embedded space and they’re talking a lot about the “Internet of Things” and M2M communication. In Java Embedded their focus seems to be on small headless devices, which apparently doesn’t include smart phones. They also want to lower the barrier of entry for a Java SE developer to enter embedded development through Java ME / SE convergence mentioned earlier. This could create interesting opportunities for developers by allowing them to move between these ecosystems. Java ME / SE convergence appears to be a key driver behind JDK 9 modularization (Jigsaw). Ramani concluded her part of the keynote by introducing two more case studies: Java enabled SOC by Cinterion (Java Embedded) and MintChip by The Royal Canadian Mint (Java Card based digital currency).
Cameron Purdy, VP Fusion Middleware Development and Java EE, took to stage after Ramani to discuss Java EE status and direction. He started off by briefing on Java EE 6 adoption among application developers and JEE server vendors. He then went on to describe Oracle’s Java EE focus areas that include standardization, productivity, portability, extensibility and modularity. Like other keynote speakers, Purdy also emphasized that developing the Java EE platform and specifications is a community effort. He presented some interesting details about Java EE release dates, themes and number of specifications included up to Java EE 7. Java EE 7 is currently scheduled for Q2 2013. The release themes include HTML 5 and continued developer productivity. Features such as WebSockets, Servlet 3.1 NIO, Server Sent Events, JSON, REST are considered to fall under the HTML 5 theme umbrella while API pruning, built on Java SE 7, JCache, JMS 2.0 and batch are driven by the productivity goal. Some features that Oracle would like to see in Java EE 8 were discussed briefly, but it will be the responsibility of the eventually formed expert group to decide what will go into the actual specification. Cloud programming (multitenancy for SaaS apps, PaaS enablement) model standardization was a feature deferred from Java EE 7 and will likely be included in JEE 8. Other things being considered include NoSQL, Project Avatar, state management, JSON-B and modularity based on Jigsaw. Purdy finally invited Nicole Otto from Nike to endorse Java EE as the platform for Nike’s online services.
In the final part of the keynote, Robert Ballard, oceanographer and discoverer of RMS Titanic, talked about innovation and science education. He described how modern oceanography makes pretty advanced use of information and communications technology. He said he’s often asked what he’d like to discover next. A spaceship, he said. Why? Because then I’d never have to talk about the Titanic again 🙂
IBM Keynote
IBM was a diamond sponsor of the conference and presented its own keynote right after the strategy keynote. The IBM talk focused a lot on cloud enablement and optimization, multitenancy, tenant isolation and reducing footprint. Polyglot also appears to be on IBM’s Java platform agenda, as they discussed support for multiple JVM-based languages. A key part of IBM’s message was that hardware matters. Even if Java developers typically work at a level where the underlying hardware is abstracted away, system hardware architecture design is still crucial for mission critical applications. Somewhere deep below all the layers of indirection, hardware virtualization and the JVM’s simulated virtual machine, the code is still run by physical processors. And since IBM can deliver the whole stack from server hardware and storage to language runtime and middleware, all the pieces have been designed and optimized to work together. So, IBM was basically echoing Oracle’s “software and hardware, engineered to work together” value proposition. They also presented SPECjEnterprise and SPECpower_ssj2008 performance benchmark figures in which the IBM J9 JVM came out as the winner.
Java Technical Keynote
The Technical keynote was primarily delivered by Oracle Java Platform Chief Architect, Mark Reinhold. The technical keynote focused on Java SE (Java 8) and Java EE (JEE 7) platform releases. These releases were presented against the backdrop of sample applications (Schedule builder and Angry bids). Java Language Architect Brian Goetz dropped by on stage to show how Java 8 Lambda together with changes in the collections API can make the JavaFX Schedule builder application code more beautiful, and improve code and libraries in general. A large part of the presentation was dedicated to Jigsaw, which I think will play a really big role in the future of the platform. Jigsaw will not be included in Java 8 but Lambda, compact profiles, Nashorn, data/time API and type annotations will. In addition, various smaller things like PermGen removal, bulk data operations, parallel array sorting etc. are also scheduled for Java 8.
Arun Gupta, Java EE Technology Evangelist, then talked about Java EE in more detail than in the strategy keynote. Gupta briefly talked about Java EE history and current status in terms of release dates and themes. He then dived deeper into the Java EE 7 specification content. Some of the more interesting current candidate specification requests for Java EE 7 include JAX-RS 2.0, EL 3.0, JMS 2.0, the Java Caching API, the Java API for JSON and the Java API for WebSocket. Many other EE specifications will also get smaller updates, such as JTA, EJB, CDI and JPA. After more than ten years in the making, who would’ve thought the Caching API specification would actually get finished some day 🙂 I was happy to see that EE 7 will not only bring additions to the specification, but will also remove things by making some APIs optional. The idea of pruning was introduced already in EE 6, so it’s not new, but it’s good to see the cleanup process continuing. Gupta then moved on to detail changes to selected EE sub-specifications and demonstrated how the updates will improve productivity and reduce boilerplate code.
Thin server architecture (TSA) is an emerging architectural model for designing web applications that moves view generation from the server side to the client side. Thin server architecture is platform agnostic, and it effectively moves to the browser side a lot of code that has traditionally lived on the server side, into what used to be the domain of front-end or web developers. As the name implies, the server side gets a lot simpler and thinner, and with this change, in my opinion, comes a really big productivity challenge for the Java backend compared with dynamic languages. Project Avatar and Easel are projects that are tackling this problem and exploring what kind of infrastructure and tooling is required end-to-end to build TSA applications on the Java platform. Some of the tooling is already available in the NetBeans 7.3 beta, so it’s something you can try out right now. A TSA sample application called Angry bids, as well as the tooling for developing the app, were demoed.
Java Community Keynote
The Java community keynote was scheduled for the last day of the conference and started off with lots of thank-yous and some wave-making. After that, Gary Frost of AMD was brought up by Donald Smith (Oracle) to discuss Project Sumatra, which was mentioned earlier in the Java Strategy keynote. AMD has been working to make it possible for Java developers to take advantage of the GPU for a few years now, and they’ve released an open source project called Aparapi for doing this. Aparapi requires that code be specifically written to get it executed on a GPU, but Sumatra aims to make this unnecessary. Frost showed some interesting demos of rendering a Mandelbrot set, Game of Life and an N-body physics simulation using Aparapi. Frost said AMD is hoping to get Sumatra included in the JDK within the Java 9 timeframe.
Smith then reflected on the role of Java in innovation. His approach was to separately mindmap the strengths of Java and fostering innovation, and try to see how these two could be linked together. He invited people from Eucalyptus, Twitter, Cloudera, Eclipse and Perrone Robotics for a panel discussion on the role of Java in innovation.
After the innovation panel, Martijn Verburg of the London JUG introduced the Adopt-a-JSR program they had started. Its purpose, he said, is to prevent bad specifications, such as EJB 2.0, from happening again by engaging ordinary developers in the specification process. Verburg hosted a short panel where he asked the panelists a range of questions related to their role in the Java community and the Java specification process.
After the panel, Saab brought up Paul Perrone to discuss and demo a Java based robotics platform his company develops. Continuing on the robotics theme, Java creator James Gosling came up on stage wearing his Sun Microsystems t-shirt to talk about his current work at Liquid Robotics and how they’re using Java. Liquid Robotics builds robots that float in the ocean and gather telemetric data of different kinds for various purposes (e.g. marine mammal and pollution tracking, weather data, global warming studies etc.). Java is used for analysing the data delivered by the robots, and the newer robot generation also has an ARM processor and runs JDK 7 on Linux (ARM). They’ve built a Swing based UI for studying and drilling down into the data, e.g. the routes each robot has travelled. Gosling had evaluated all of the NoSQL databases for his use cases but felt that none of the existing ones worked well with the telemetry data they process, so he built his own NoSQL-ish database. The data they receive is really valuable, so reliability is crucial, which is why they’re using three different hosting providers. After evaluating hosting providers he confessed to being a real Jelastic fan. So, since Gosling in his role as chief software architect in his new company picked Java to build on and chose to present at JavaOne, I guess it means he still has a soft spot for the platform.
Conclusions
Oracle is a huge company, and many people in the developer and OSS communities have had reservations about what will happen to Java under Oracle’s leadership, and whether Java will be subjected to its owner’s short-term commercial ambitions. But despite its huge size Oracle is not self-sufficient, and its long term success is very much tied to the larger developer ecosystem. This means that Oracle needs to make sure Java remains a platform that developers want to invest their human capital in, also in the future.
Active community participation is absolutely vital for Java’s long term viability, and it’s reassuring to see that Oracle seems to acknowledge and commit to this. Recent changes in the Java Community Process (JCP), which governs the rules for creating Java specifications, require a more open and transparent way of working from the expert groups. By making OpenJDK the Java SE reference implementation (RI), Oracle has leveled the playing field with regard to other Java SE vendors, as now Oracle’s Java implementation is just one Java SE implementation among others that has to conform to the specification and RI. Oracle has also been able to engage IBM in OpenJDK instead of Apache Harmony, which I think will overall reduce the risk of fragmentation and benefit the whole Java community.
Under the Java Community Process, the specification lead of a particular JSR is responsible for developing the specification, but also for producing a reference implementation as well as a Technology Compatibility Kit (TCK, or test suite). For large specifications, such as Java SE and EE, this is no small task. OpenJDK is the Java SE reference implementation, while GlassFish is the Java EE RI. There’s been some speculation about whether the JDK will remain freely available, as well as about the future of some Sun Java products, such as GlassFish and NetBeans, under Oracle leadership. OpenJDK and GlassFish have a clear role to play in this picture as platform reference implementations. NetBeans, on the other hand, provides support for emerging technologies and day-0 support for new Java standards, which is important for allowing developers to actually get hands-on experience with new standards. So, currently none of these products would appear to be redundant.
Traditionally, ME and SE/EE development were regarded as very different and were typically done by people with different skill sets. The plan for ME / SE convergence on the platform and API level could change that in the short term (the Java SE 8 timeframe). Also, with the merge of the previously separate JCP executive committees for ME and SE taking place in November, work is being carried out at the process level to try and keep the platforms from diverging in the future.
Google used to be a visible and active member of the Java community before the legal dispute between Google and Oracle over Android started. Google has also released quite a few interesting Java based components as open source, so it’s been a pity to see Google withdraw from JavaOne as well as many other Java communities. No Googlers appeared to be presenting at this year’s JavaOne either. I was surprised to find out after the conference (through some googling) that Google is actually still a member of the JCP Executive Committee, and they also joined the Java SE 8 expert group in August 2012! I hope they will be able to take a more active role in the Java ecosystem in the future.
There’re a lot of interesting technology changes planned for Java. Some of the changes I’m really looking forward to include
- JDK modularization (via Project Jigsaw, JDK 9)
- thin server architecture support (via Project Avatar and Easel, NetBeans v7.3, Java EE 7 / 8)
- Java SE / ME convergence (JDK 8)
- compact profiles (JDK 8)
- heterogeneous computing platform support (via Project Sumatra, JDK 9?)
Many enhancements and changes that are clearly driven by polyglot requirements appeared on Oracle’s tentative roadmap plans, so they seem to be serious about improving polyglot support in the JVM.
Based on the conference and actual work being carried out by Oracle and the larger Java community, I think Java will remain viable as a community, technology platform and an ecosystem.
Getting started with Oracle WLS 12c
2012-09-16
I’ve been developing software for different incarnations of the Oracle Application Server in the past (Oracle OC4J 10g R3 and BEA/Oracle WebLogic v8.1 and v10.3), but it’s been quite a while since my last encounter with the server. During recent years I’ve been involved mostly with other application servers. Despite occasional hiccups, I had been reasonably satisfied with the server, so I was curious to give the latest version of Oracle’s application server a quick test drive. Having a background in software development, I thought I’d approach this first from a developer perspective, checking out what the application development workflow (including code, build, deploy) feels like with the latest version. Obviously, much of the workflow is actually about generic Java EE development (as opposed to app server specific development) as long as you adhere to the standard, but I’ve felt that trying to simulate the development workflow gives you a more complete view of what it’s like to work with a particular app server product. Instead of coding a Java EE app myself or porting an existing one, I thought I’d work with sample applications made available by others.
Installing Oracle WebLogic Server
There are several options for getting WebLogic running for development purposes: a) use Oracle JDeveloper and the embedded WLS server; b) use the IDE of your choice and the WLS zip distribution; c) use the IDE of your choice and the full WLS. Since the focus of my test was to check out WLS for development purposes (but not JDeveloper), I chose option b.
So, I downloaded the following WLS distributions from Oracle:
- WLS Zip Distribution for Oracle WebLogic Server 12.1.1.0
- WLS Supplemental Zip Distribution for Oracle WebLogic Server 12.1.1.0
The first distribution includes the application server itself and weighs approximately 184 MB. The second one includes sample code.
The installation process is pretty well documented in the WLS package README files, but there were a few small gotchas. The supplemental zip distribution also includes a nice set of documentation for the samples, including an architecture description, found in $MW_HOME/wlserver/samples/server/index.html.
Here’s the installation procedure I used:
(The text below has been written for Mac OS X and assumes WLS has been installed in $HOME/opt/wls1211_dev but it should be trivial to adapt the instructions for other configurations.)
# 1. extract WLS (see README.txt in WLS package)
mkdir wls1211_dev
cd wls1211_dev
unzip ~/Downloads/wls1211_dev.zip

# 2. set environment variables
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export USER_MEM_ARGS="-Xmx1024m -XX:MaxPermSize=256m"
export MW_HOME=$HOME/opt/wls1211_dev

# 3. run the installation script
. ./configure.sh
We’ll skip WLS domain creation for now, because the samples setup script creates one for us, and move straight to installing the WLS supplemental distribution.
# wls supplement (see README_SUPP.txt)
unzip ~/Downloads/wls1211_dev_supplemental.zip
# 64-bit environments
. wlsenv.properties
# create WLS domain, server, database etc.
./run_samples.sh
This script sets up a WLS domain, a WLS server and a database server for the sample application, configures datasources etc. When I tried to start up the sample domain at this point, I received an error about the JRE not being found, so I decided to reset the environment variables by firing up a new shell session and then setting the WLS environment variables again:
# start up WLS sample domain
export MW_HOME=$HOME/opt/wls1211_dev
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export USER_MEM_ARGS="-Xmx1024m -XX:MaxPermSize=256m"
$MW_HOME/wlserver/samples/domains/medrec/startWebLogic.sh
If you have a GUI session with your OS, a web browser should open up with the sample application page.
Sample app #1: MedRec
Oracle provides a WLS supplemental zip distribution aimed at development use. The supplement includes code samples demonstrating various aspects of using different Java EE technologies. It also includes a complete Java EE v5.0 (why not v6.0?) sample application called Avitek Medical Records or MedRec. It claims to “showcase the use of each Java EE component, and illustrates best practice design patterns for component interaction and client development“.
After I got the application server and sample application up and running, I wanted to start browsing the application source code and see how to make modifications.
You can build and deploy the sample application using the following commands:
# set up the environment for the build
export MW_HOME=$HOME/opt/wls1211_dev
. $MW_HOME/wlserver/samples/domains/medrec/bin/setDomainEnv.sh
cd $WL_HOME/samples/server/medrec

# build + deploy
ant -Dnotest=true deploy
The Ant command will build and deploy the new application version, if you have the application server up and running. (Environment variables set by the WLS installation scripts appeared to interfere somehow with the ones set by setDomainEnv.sh and I had to start a new shell session to make the build work.)
The sample application includes Eclipse project and classpath files, so you can easily import the application code in Eclipse (e.g. Juno). The application depends on different Java EE and third-party APIs that are bundled with the application, so you’ll end up seeing lots of errors in Eclipse. The easiest way to get the source code imported and classpaths set up correctly is to use the Oracle provided Eclipse distribution (Oracle Enterprise Pack for Eclipse [v12c for Eclipse Juno]). Here’s how to import the code in OEPE and create a WLS 12c runtime configuration:
- create new workspace
- configure WebLogic server runtime
  - select: window / show view / other
  - server / servers
  - new server wizard
  - select server type: Oracle / Oracle WebLogic Server 12c
  - fill in the following:
    - WebLogic home: $HOME/opt/wls1211_dev/wlserver
    - Java home: /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
    - Domain directory: $HOME/opt/wls1211_dev/wlserver/samples/domains/medrec
- import MedRec source code
  - file / import: general / existing projects into workspace
  - select root directory: $HOME/opt/wls1211_dev/wlserver/samples/server/medrec
  - select all 12 projects
- configure target server runtime for each project
  - select project / properties / server or targeted runtimes and choose “Oracle WebLogic Server 12c”; uncheck the WebLogic 10.3 version
- refresh all projects
At this point all the projects should show up in your Eclipse project explorer, and you should be able to do a full modify-build-deploy cycle.
Sample app #2: Pet Catalog
The Pet Catalog is a Java EE 6 sample application that demonstrates usage of JavaServer Faces 2.0 and the Java Persistence API. It’s based on a three-tiered architecture on a logical level, but both the presentation and logic tier components are packaged in a single WAR module.
With the first sample app, we were able to skip creating a WLS domain because the installation script created one for us, but now we’ll have to create one. In WLS, the concept of a domain refers to a logically related group of WLS servers and/or server clusters that are managed as a unit. Each domain has an administration server, which is used to configure, manage and monitor other servers and resources in that domain. Additional servers in the domain are called managed servers, which are used for deploying and executing Java EE artifacts. The administration server is meant to be used only for administration, though you can deploy applications to it in development installations.
Creating a WLS domain
# setup WLS environment
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export USER_MEM_ARGS="-Xmx1024m -XX:MaxPermSize=256m"
export MW_HOME=$HOME/opt/wls1211_dev
. $MW_HOME/wlserver/server/bin/setWLSEnv.sh

# create a new WLS domain and start WLS
mkdir -p $HOME/wls/dom1
cd $HOME/wls/dom1
$JAVA_HOME/bin/java $JAVA_OPTIONS -Xmx1024m -XX:MaxPermSize=256m weblogic.Server
Building the application
The source code links found on the sample app web pages didn’t seem to be working. However, the application source code comes bundled with NetBeans 7.2 Java EE, so you can get the source code from NetBeans by choosing:
File / New Project
choose project: samples / Java EE / Pet Catalog
Java’s “Write once, run anywhere” is a great value proposition, but especially in the Java EE space, delivering on that proposition has been lacking. Portability issues arose in this case, too, when I tried deploying the Pet Catalog app, which apparently had been tested mostly on GlassFish, to WLS. The actual issue seemed to be related more to the particular JPA implementation (EclipseLink) than to standard JPA, but I think it’s telling evidence of portability issues, since this is supposed to be a standard Java EE showcase sample application. Once I managed to find out what was causing the issue, fixing it was simple. Application servers often have their own, sometimes very unintuitive, ways of reporting issues, and troubleshooting is an area where experience with your particular application server product can really make a big difference. Also, with well-architected applications it’s typically the packaging and deployment where portability problems arise, instead of the actual code.
In this case I ran into a problem with datasource authentication. To fix the deployment issue I had to modify the persistence unit definition in persistence.xml by commenting out the eclipselink.jdbc.user and eclipselink.jdbc.password parameters.
Deploying the application
Create and initialize the database
Pet Catalog uses a MySQL database for persisting data. A database, tables and a user account must be created before deploying the application.
create database petcatalog;
GRANT ALL ON petcatalog.* TO 'pet1'@'localhost' IDENTIFIED BY 'pet1';
cat setup/catalog.sql | /usr/local/mysql/bin/mysql -h 127.0.0.1 -P 3406 -u pet1 -f -p petcatalog
Create a Data Source
Once you’ve set up the database, the database connection or datasource needs to be configured in the application server. To do this, log on to WLS console and do the following:
Choose: Services / Data Sources / New / Generic Data Source
Then on “JDBC Data Source Properties” page fill in the following:
- Name: petCatalogDS
- JNDI Name: jdbc/petcatalog
- Database Type: MySQL
- Database Driver: MySQL’s Driver (Type 4), using com.mysql.jdbc.Driver
And on “Transaction Options” page:
- Supports Global Transactions
- One-Phase Commit
Then “Connection Properties”:
- Database Name: petcatalog
- Host Name: localhost
- Port: 3406
- Database User Name: pet1
- Password: pet1
Test Database Connection
And finally on “Select Targets” page choose the server to deploy to:
- myserver
Deploy WAR
Finally, deploy the application WAR to WLS. The application should run without customizing any deployment parameters.
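As a quick way to verify the datasource configuration from code, here’s a sketch: any component running on the server should be able to resolve the JNDI name configured above (the class itself is just illustrative):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DataSourceCheck {
    public static void ping() throws NamingException, SQLException {
        // look up the pool under the JNDI name configured in the WLS console
        DataSource ds = (DataSource) new InitialContext().lookup("jdbc/petcatalog");
        Connection con = ds.getConnection();
        try {
            System.out.println("connected as: " + con.getMetaData().getUserName());
        } finally {
            con.close();
        }
    }
}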
Conclusions
In my quick test drive I focused mostly on the development workflow aspects of the WebLogic server (developer distribution), not on operational aspects such as scalability, availability, reliability, operability etc. WLS appears to be a capable, feature-rich Java EE application server, as could be expected from a major vendor, but the zip distribution was also relatively lightweight and ran quite well on my laptop.
WLS has very nice server administration capabilities: you can easily view and edit the configuration using command line tools, but a comprehensive web-based administration console is also available that allows you to perform any server administration task. The server configuration is persisted in XML files (e.g. config.xml) that are stored under a single filesystem directory tree, which makes it easy to compare and migrate configuration files. The console just enables administrators to manipulate these configuration files through a web UI. The web console has a much more comprehensive feature set than e.g. the one in JBoss EAP 5. WebLogic also features a command-line scripting environment (WLST) that you can use to create, manage and monitor WLS domains. Thanks to the XML based configuration and scripting support, backup and recovery of server configuration, as well as taking snapshots and rolling back changes, should be easy. Deploying the exact same configuration should be simple as well.
It seems odd that the sample application doesn’t showcase all the new features of the latest-and-greatest Java EE specification version that the WebLogic server supports. Also, the basic development mode installation could’ve been made simpler still, similar to some other app servers where you only need to do a simple unzip. Production installation is of course an entirely different story.
System call tracing is your friend
2012-08-26
After downloading and installing Java SE 7 update 6, I tried running “java -version” to verify that the JDK was installed properly. To my surprise, the command reported the previous version instead of update 6. I then tried troubleshooting the problem using:
pkgutil --verbose --files com.oracle.jdk7u6
installer -dumplog -verbose -pkg '/Volumes/JDK 7 Update 06/JDK 7 Update 06.pkg' -target /
but with no effect. Then, browsing through the parent directories of the previous Java 7 installation, I noticed that with update 6 the installation path was actually
/Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk
instead of
/Library/Java/JavaVirtualMachines/1.7.0.jdk
as with the previous Java 7 update releases, so I was using the old absolute path in my “java -version” command.
Now, on Linux one of my first troubleshooting methods would’ve been to use the strace command, but for some reason this doesn’t come instinctively to me on Mac OS X. On the Mac the equivalent command is called dtruss, and it would’ve revealed the new installation path immediately, just as strace would have:
dtruss 'installer -dumplog -verbose -pkg /Volumes/JDK\ 7\ Update\ 06/JDK\ 7\ Update\ 06.pkg -target /'
...
kevent(0x3, 0x153C67788, 0x1) = 1 0
audit_session_self(0x7FB1EB9640E0, 0x7FB1EBBEB150, 0x78) = 6659 0
kevent(0x3, 0x153C67788, 0x1) = 1 0
lstat64("/Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk", 0x153C65860, 0x1) = -1 Err#2
stat64("/Library/Java/JavaVirtualMachines/jdk1.7.0_06.jdk", 0x153C668B8, 0x0) = -1 Err#2
getattrlist("/", 0x153C665A0, 0x153C66190) = 0 0
getattrlist("/Library/Internet Plug-Ins/JavaAppletPlugin.plugin", 0x153C665A0, 0x153C66190) = 0 0
...
So, when troubleshooting OS level problems, system call tracing is always your friend, irrespective of the operating system. This is a good case in point.
Asynchronous event-driven servers with Apache MINA
2012-08-06
A while ago we had to do performance testing for a web application that depends on an external network service that couldn’t be tested in-place with high data volumes. We wanted to include the network protocol communication with the external service in the test (i.e. work at the “system integration testing” level), and there was no existing mock server, so I decided to spend a few hours evaluating whether we could implement one ourselves. Since the mock server could obviously become a bottleneck, I had to make sure it was implemented efficiently enough (IO, threading, session and memory usage etc.).
Implementing a server that leverages asynchronous IO with Java NIO can be a tedious task, mainly because incoming and outgoing protocol messages get fragmented and you need to handle things like defragmentation and state management. The network protocol handling code can be difficult to get right, and if you don’t design your abstractions carefully, it will get intertwined with application level logic, resulting in unmaintainable code.
There are several prominent asynchronous event-driven network communication frameworks for Java that you can use for implementing protocol servers and clients. Among the better known are Netty, Apache MINA and GlassFish Grizzly. These frameworks allow implementing scalable, high-performance and extensible network applications. The application developer is freed of much of the protocol message handling, state, session and thread management details. All of the frameworks listed above are widely used and mature, but I had to pick one and decided to give Apache MINA 2.0 a try.
Apache MINA defines the concept of a service, which in abstract terms represents a network accessible endpoint that a consumer can communicate with to request it to perform some well-defined task. An IoService class instance acts as an entry point to a service, implemented as a connector on the client side and as an acceptor on the server side. Acceptors are used when implementing servers: they act as communication endpoints to a service, accepting new sessions and mediating network traffic between consumers and the server-side components responsible for actual message processing. The application developer picks an appropriate acceptor type (e.g. NioSocketAcceptor for non-blocking TCP/IP) based on the requirements. Acceptors are responsible for network communication, connection and thread management etc., but they delegate responsibilities to other interfaces that you’re free to customize and configure. As a minimum you’ll need to configure an IoHandler interface implementation that takes care of handling different I/O events, for servers most notably receiving messages, but you can also choose to handle session and exception related events. An acceptor can also have multiple filters that do I/O event pre- and post-processing. You’ll typically need to configure at least a protocol message encoder and decoder (ProtocolCodecFilter) that will take care of message serialization and deserialization.
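To make the moving parts concrete, here’s a minimal MINA 2.0 server sketch using the bundled text line codec (the port and the echo behaviour are illustrative, not the actual mock server we built):

import java.net.InetSocketAddress;
import java.nio.charset.Charset;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class EchoServer {
    public static void main(String[] args) throws Exception {
        NioSocketAcceptor acceptor = new NioSocketAcceptor();
        // the codec filter takes care of message (de)fragmentation and (de)serialization
        acceptor.getFilterChain().addLast("codec",
            new ProtocolCodecFilter(new TextLineCodecFactory(Charset.forName("UTF-8"))));
        // the IoHandler receives fully decoded messages and other I/O events
        acceptor.setHandler(new IoHandlerAdapter() {
            @Override
            public void messageReceived(IoSession session, Object message) {
                session.write("echo: " + message); // encoded by the codec filter
            }
        });
        acceptor.bind(new InetSocketAddress(10123)); // illustrative port
    }
}

Note how the handler only ever sees complete, decoded messages; fragmentation is dealt with entirely in the filter chain.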
I found that Apache MINA really did fulfill its promise, and implementing a high-performance, scalable and extensible network server was easy using it. MINA also helps cleanly separate network communication from application level message processing logic. Supporting multiple different protocols in the same server is also well supported in MINA. As a downside, the documentation for v2.0 is a bit lacking, but fortunately there are quite a few code samples that you can check out.
Using Oracle SQLDeveloper with MySQL
2012-08-03
Oracle SQLDeveloper is a tool I’ve found very valuable in projects where I’m using the Oracle Database. Normally I like using command line tools, but many tasks, such as browsing large result sets or data in fat tables, or browsing database schema metadata, are much faster with SQLDeveloper. SQLDeveloper supports other relational databases as well, and since I’m currently working on a project involving MySQL, I thought I’d give SQLDeveloper (v3.1.07) a little test with MySQL (v5.5).
You can install extensions in SQLDeveloper in a similar fashion as in Eclipse, and there’s a MySQL JDBC driver available (a third party SQLDeveloper extension). For some reason the extension failed to install properly on my Mac: everything looked to be going fine, but the installation failed silently. You can configure JDBC drivers manually in SQLDeveloper, however, so I downloaded the MySQL driver and configured it (preferences / database / third party JDBC drivers). After that, a new tab called “MySQL” appears when creating a new database connection, where you can specify DB product specific connection parameters.
I was able to successfully connect to my MySQL database, but when trying to browse data in a table containing more than 5 million rows, the operation failed with the following error:
Task Error
Java heap space
I don’t remember running into this problem with SQLDeveloper when connecting to Oracle DB. As a workaround I increased the Java VM heap size argument that SQLDeveloper passes to the Java VM at launch (in the sqldeveloper.conf configuration file).
I also wanted to test whether SQLDeveloper would run with my newly installed Java 7, but that turned out to be a bit more difficult. On Mac OS X, changing the Java path in the SQLDeveloper default configuration files had no effect, as this parameter was overridden in a platform specific configuration file (sqldeveloper-Darwin.conf) that had to be changed in order to use an alternate Java VM. The correct configuration file to change was revealed by starting up SQLDeveloper with the --verbose flag from the command line:
sqldeveloper.sh --verbose
SQLDeveloper can help in a number of ways when you’re working with Oracle DB: it provides wizards for creating and editing table definitions, imports and exports data, and allows viewing and changing many aspects of database metadata. The SQL Worksheet helps you write SQL statements with its autocompletion feature. SQLDeveloper is a great tool to use with Oracle DB, but you should note that some of its features aren’t available in SQLDeveloper for other database products.