Tuesday, June 29th, 2010

Network versus Relational: Part III

In previous posts (original post, follow-up) I ran small database performance tests involving Neo4j (a graph, or network, database) and Apache Derby (a relational database). Both are Java-based.

How about running the same tests with trusty old PostgreSQL? Neo4j and Apache Derby were both embedded in the test application. There was no database server, although Derby may be run in client-server mode.

PostgreSQL is a fully featured RDBMS firmly founded on the classical client-server architecture. It’s been around for ages. The server is coded in C.

The PostgreSQL logo is a stylized elephant's head. If you think it's a symbol of heavy feet, you are in for a surprise.

I installed PostgreSQL 8.3.9 on the box where the previous tests were run. Tuning PostgreSQL is an art in itself: it's a big system with many parameters. To avoid getting stuck in configuration, I decided to run PostgreSQL with the default configuration, no tweaking. Server and client ran on the same box. The test programs needed only a minor touch-up.
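For a JDBC-based test program, most of such a touch-up is typically the connection URL. The following is only a sketch of that idea, not the actual test code: the class `JdbcUrls`, its method names, and the database name used in the example are all made up for illustration. The URL formats themselves are the standard ones for embedded Derby and for the PostgreSQL JDBC driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper (not from the original tests): builds JDBC URLs. */
final class JdbcUrls {
    private JdbcUrls() {}

    /** Embedded Derby: the database is a local directory, no server involved. */
    static String derbyEmbedded(String dbDirectory) {
        return "jdbc:derby:" + dbDirectory + ";create=true";
    }

    /** PostgreSQL: always client-server, even when both run on the same box. */
    static String postgres(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    /** Opens a connection; the matching JDBC driver must be on the classpath. */
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }
}
```

With a helper like this, switching the relational tests from Derby to PostgreSQL would amount to passing `JdbcUrls.postgres("localhost", 5432, "scandb")` instead of `JdbcUrls.derbyEmbedded("scandb")`, where `scandb` is an assumed database name.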

I was in a hurry, so the PostgreSQL data ended up on the OS disk. Neo4j and Derby had the luxury of storing data on a dedicated database disk.

The effect of caching in the PostgreSQL server is very visible: repeating a test improves its performance. To keep the numbers comparable, measurements were made right after stopping and starting the server, so the server's own cache started out empty. In a real application, of course, you want the benefits of server-side caching.
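The caching effect is easy to observe by timing the same test case several times in a row. Here is a minimal sketch of such a harness; `RepeatTimer` and `timeRuns` are illustrative names, not part of the original test programs, and the test case is whatever `Runnable` you hand in.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical timing harness: runs a test case repeatedly and records wall-clock time. */
final class RepeatTimer {
    private RepeatTimer() {}

    /** Returns the elapsed time of each run in seconds, in execution order. */
    static List<Double> timeRuns(Runnable testCase, int runs) {
        List<Double> seconds = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            testCase.run();
            // Convert nanoseconds to seconds.
            seconds.add((System.nanoTime() - start) / 1e9);
        }
        return seconds;
    }
}
```

If the server was restarted just before the call, the first element is the cold-cache figure, and the later elements show how much the server-side cache helps.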

Create database: PostgreSQL performance was about the same as Apache Derby on the small data sets, but noticeably faster on the long-running one. Here is the table from the original post, complemented with PostgreSQL but without the database size columns.

Files      Time (s)
           Neo4j    Derby    PostgreSQL
7 109        4.8     13.4          12.1
53 378      16.4     51.0          53.7
520 691        -    479.8         408.9

Neo4j shines in this test, except that it had a problem with large transactions.

Retrieval: PostgreSQL wiped the floor with Neo4j and Apache Derby. The original test, on a scan containing 524,000 files, took less than 0.8 s with 1.2 million records in the file table. The modified test case, testing for inclusion in a subtree, took 0.3 s, still with 1.2 million files in the database. This is an order of magnitude faster than the Java databases. I double-checked that Postgres hadn't created indexes behind my back, but no.
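One way to make such an index check is to query PostgreSQL's `pg_indexes` system view. The sketch below assumes an open JDBC connection and a table name that you pass in (the post only speaks of "the file table", so the exact name is an assumption); `IndexCheck` and `indexNames` are illustrative names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the indexes PostgreSQL knows about for a table. */
final class IndexCheck {
    private IndexCheck() {}

    // pg_indexes is a standard PostgreSQL system view with one row per index.
    static final String LIST_INDEXES_SQL =
        "SELECT indexname FROM pg_indexes WHERE tablename = ?";

    /** Returns the names of all indexes on the given table (empty if none). */
    static List<String> indexNames(Connection conn, String table) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(LIST_INDEXES_SQL)) {
            ps.setString(1, table);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("indexname"));
                }
            }
        }
        return names;
    }
}
```

An empty list means the table has no indexes at all. Keep in mind that PostgreSQL creates an index implicitly for every PRIMARY KEY or UNIQUE constraint, so those would show up here too.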

C is still faster than Java. PostgreSQL left the Java databases in the dust in the retrieval tests.
