<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Söderström Programvaruverkstad AB</title>
	<atom:link href="http://www.soderstrom.se/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.soderstrom.se</link>
	<description>Where problems are solved</description>
	<lastBuildDate>Thu, 02 Sep 2010 08:18:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Neo4j Performance Revisited and Appreciated</title>
		<link>http://www.soderstrom.se/?p=354</link>
		<comments>http://www.soderstrom.se/?p=354#comments</comments>
		<pubDate>Thu, 02 Sep 2010 08:18:43 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=354</guid>
		<description><![CDATA[My previous Neo4j performance micro-benchmark left a disturbing hole: There was no explanation why Neo4j didn&#8217;t cope well with big transactions.
A closer study uncovered the pretty obvious reason. It also turned out to be the key to zippier Neo4j performance.

Perhaps I should explain my seeming obsession with Neo4j. Once upon a time I was part [...]]]></description>
			<content:encoded><![CDATA[<p>My previous Neo4j performance micro-benchmark left a disturbing hole: There was no explanation why Neo4j didn&#8217;t cope well with big transactions.</p>
<p>A closer study uncovered the pretty obvious reason. It also turned out to be the key to zippier Neo4j performance.<br />
<span id="more-354"></span><br />
Perhaps I should explain my seeming obsession with Neo4j. Once upon a time I was part of the development of an &#8220;object oriented&#8221; database system, doing much of its design. Quotes around &#8220;object oriented&#8221; because it was really a graph database in today&#8217;s terminology. It was written in C and had strong schema support. C plus schema support made data access extremely fast, a matter of a handful of CPU cycles. EasyDB, the current name, has been further developed and is still available from <a href="http://www.basesoft.se/">Basesoft</a>. It is a NoSQL database, although the term didn&#8217;t exist at the time it was engineered.</p>
<p>The key to Neo4j performance is memory. I reran the test cases with the Java memory options -Xms128m -Xmx1g, allowing the JVM to expand to 1 GB of heap space rather than the default 64 MB.</p>
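<p>Before comparing runs it can help to confirm that the JVM really received the heap settings. The following is a minimal sketch, not part of the original test programs; the class name <code>HeapReport</code> is hypothetical and only standard <code>java.lang.Runtime</code> calls are used.</p>

```java
// Minimal sketch: report the heap limits the running JVM actually received.
// HeapReport is a hypothetical name, not part of the original test programs.
class HeapReport {
    static final long MIB = 1024L * 1024L;

    /** Upper bound on the heap, governed by -Xmx (in MiB). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently reserved from the operating system (in MiB). */
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    /** Part of the committed heap that is not currently free (in MiB). */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("committed heap (MiB): " + committedHeapMiB());
        System.out.println("used heap (MiB):      " + usedHeapMiB());
    }
}
```

<p>Run it with the same options as the test, for example <code>java -Xms128m -Xmx1g HeapReport</code>. The reported maximum is usually somewhat below the requested value because the JVM does not count all of its internal areas.</p>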
<p>With these memory settings scanning 115,735 files and writing file information to the database became 35 times (!) faster with Neo4j than with Apache Derby or PostgreSQL. Neo4j quickly claimed all of the 1 GB heap space, actively using up nearly 800 MB, almost 8 kB per node. Despite this remarkable performance it still spent a large fraction of the time doing MarkSweep garbage collection.</p>
<p>In comparison, Apache Derby had a modest memory footprint: under 100 MB and only Scavenge garbage collection.</p>
<p>MarkSweep garbage collection is very time-consuming. It seems Derby has logic to avoid incurring it. Neo4j has a tendency to trigger MarkSweep whenever it comes even remotely near the memory limit. It may escape the fatal OutOfMemory, but performance suffers badly.</p>
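<p>One way to see how much time each collector takes is to read the standard JVM management beans. This is a sketch under assumptions, not the monitoring setup used for the figures above; <code>GcReport</code> is a hypothetical name. Collector names depend on the JVM and the collector in use (for instance <code>PS Scavenge</code> and <code>PS MarkSweep</code> with the parallel collector).</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: accumulated garbage collection time per collector, read in-process.
// GcReport is a hypothetical name, not part of the original test programs.
class GcReport {
    /**
     * Maps each collector name to its accumulated collection time in
     * milliseconds. The JVM reports -1 when the value is undefined.
     */
    static Map<String, Long> collectionTimes() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), gc.getCollectionTime());
        }
        return result;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

<p>Calling <code>collectionTimes()</code> before and after a test case and taking the difference gives a rough per-collector cost for that test case.</p>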
<p>As for reading the database, more memory didn&#8217;t change the relative performance figures a lot. Derby and PostgreSQL stand up well against Neo4j, using a fraction of the memory. A few observations:</p>
<ul>
<li>Neo&#8217;s <code>getAllNodes</code> uses a lot of memory, 600 MB in this test, and seems prone to triggering MarkSweep.</li>
<li>The Groovy overhead became visible as a side effect of monitoring the test programs. Only Derby spent a sizable amount of time (20%) in Groovy-specific code. For Neo4j, garbage collection dwarfs the Groovy overhead.</li>
</ul>
<p>Let it be perfectly clear that the reading test case for Derby and PostgreSQL involves only one or two queries. I think it is relevant to compare row-to-row table scan in the relational databases to node-to-node traversal in Neo4j.</p>
<p>In my test cases the data model could be adapted to the problem. The relational databases only had to use one or two queries. There might be situations when this is not possible. In such a case the hierarchy must be traversed with one query per node, with a devastating performance hit for the relational databases. Only a specialized database like Neo4j keeps its performance without a flinch.</p>
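<p>The cost of one query per node can be illustrated without a database. The sketch below is an in-memory simulation, not the original test code: <code>queryChildren</code> stands in for a hypothetical statement such as <code>SELECT id FROM node WHERE parent = ?</code>, and the counter shows that the number of round trips equals the number of nodes visited.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory simulation of traversing a hierarchy with one query per node.
// All names here are hypothetical; no real database is involved.
class PerNodeTraversal {
    private final Map<Integer, List<Integer>> childrenByParent = new HashMap<>();
    int queryCount = 0;

    void addChild(int parent, int child) {
        childrenByParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Stands in for one round trip asking the database for a node's children. */
    List<Integer> queryChildren(int parent) {
        queryCount++;
        return childrenByParent.getOrDefault(parent, new ArrayList<>());
    }

    /** Visits the subtree under root, issuing one simulated query per visited node. */
    int countSubtree(int root) {
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(root);
        int visited = 0;
        while (!pending.isEmpty()) {
            int node = pending.pop();
            visited++;
            for (int child : queryChildren(node)) {
                pending.push(child);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        PerNodeTraversal tree = new PerNodeTraversal();
        tree.addChild(1, 2);
        tree.addChild(1, 3);
        tree.addChild(2, 4);
        int nodes = tree.countSubtree(1);
        System.out.println(nodes + " nodes visited with " + tree.queryCount + " queries");
    }
}
```

<p>With hundreds of thousands of file nodes, each simulated call would be a real round trip to the database, which is where the performance hit comes from.</p>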
<blockquote><p>Software: Neo4j 1.1, Apache Derby 10.6.1.0, PostgreSQL 8.4.4, Groovy 1.7.4, Java 1.6.0_21. Platform: 64-bit openSUSE 11.3 on a humble Dell box with a Pentium Dual-Core E5300 CPU @ 2.60GHz and 2 GB of memory.</p></blockquote>
<p>Previous posts:</p>
<ul>
<li><a href="http://www.soderstrom.se/?p=191">Network versus Relational</a></li>
<li><a href="http://www.soderstrom.se/?p=256">Network versus Relational: Part II</a></li>
<li><a href="http://www.soderstrom.se/?p=266">Network versus Relational: Part III</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=354</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>2010 U-City World Forum</title>
		<link>http://www.soderstrom.se/?p=338</link>
		<comments>http://www.soderstrom.se/?p=338#comments</comments>
		<pubDate>Fri, 27 Aug 2010 08:13:14 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Korea]]></category>
		<category><![CDATA[U as in Ubiquitous]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=338</guid>
		<description><![CDATA[The Korean Ubiquitous City Association is organizing a first U-City World Forum this year, calling for international participation.
A U-city is a city where you want to live, by definition. South Korea has seen an astounding rate of urbanization. A large part of the country&#8217;s population lives in Seoul, a megacity. No wonder building attractive and [...]]]></description>
			<content:encoded><![CDATA[<p>The Korean Ubiquitous City Association is organizing a first U-City World Forum this year, calling for international participation.</p>
<p>A U-city is a city where you want to live, by definition. South Korea has seen an astounding rate of urbanization. A large part of the country&#8217;s population lives in Seoul, a megacity. No wonder building attractive and sustainable cities is a hot topic in Korea. Large-scale projects are underway to take principles to practice.<br />
<span id="more-338"></span><br />
The Korean U-city concept includes a strong technology component. Information technology is regarded as a utility that is woven into the infrastructure from the beginning, just like water and electricity. This is not without reason. Korea has been very successful in building an IT industry, winning a big chunk of the world market. It is one of the pillars underlying Korea&#8217;s current high standard of living.</p>
<p>President Lee, who took office in 2008, quickly proclaimed &#8220;Low Carbon, Green Growth&#8221; as the national vision. The vision has been picked up by the Korean Ubiquitous City Association. The most prominent U-city keywords are <strong>green</strong> (referring to environmental considerations) and <strong>smart</strong> (referring primarily to information technology).</p>
<p>In Europe the &#8220;U&#8221;/&#8220;ubiquitous&#8221; terminology is a barrier to understanding concepts like U-city. This terminology is unique to Korea and Japan. See the <a href="http://www.soderstrom.se/?cat=39">U as in Ubiquitous</a> blog category for more information.</p>
<p>The Korean U-city concept envisions tight monitoring of public resources and the public space. In my opinion Europeans are more apprehensive of such surveillance than Koreans.</p>
<p><strong>Resources:</strong> The <a href="http://www.uwforum.org">U-City World Community</a> is a new Korean web site promoting the U-City World Forum.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=338</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Low Level File Information For Java</title>
		<link>http://www.soderstrom.se/?p=325</link>
		<comments>http://www.soderstrom.se/?p=325#comments</comments>
		<pubDate>Fri, 27 Aug 2010 06:55:02 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Geek peeks]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=325</guid>
		<description><![CDATA[This post is good news for some Java programmers. Once in a while you wish you knew a bit more about the files you access from Java. The java.io.File class offers only a bare minimum of visibility.
If you are wrestling with this problem, and if you are on a Linux/Unix platform, download the filestat package [...]]]></description>
			<content:encoded><![CDATA[<p>This post is good news for some Java programmers. Once in a while you wish you knew a bit more about the files you access from Java. The <code>java.io.File</code> class offers only a bare minimum of visibility.</p>
<p>If you are wrestling with this problem, and if you are on a Linux/Unix platform, <a href="http://www.soderstrom.se/wp-content/uploads/filestat-1.0.tgz">download the <em>filestat</em> package</a> from this website and get going. This post introduces the package briefly.<br />
<span id="more-325"></span><br />
My interest was sparked by an excellent post: <a href="http://www.onyxbits.de/content/blog/patrick/how-deal-filesystem-softlinkssymbolic-links-java" target="_blank">How to deal with filesystem softlinks/symbolic links in Java</a>. It contains complete code for a native Java method to determine if a file is a symbolic link.</p>
<p>It had been a long time since I did any JNI, so I felt inspired to dig into the more general problem of returning all of the C <code>struct stat</code> data to the Java level. The resulting <code>FileStat</code> Java class lets you access every bit of it.</p>
<p>If you wonder about the name, &#8216;stat(2)&#8217; is the system call used in Linux/Unix to get information about a file. There is also a similarly named command &#8216;stat(1)&#8217; that you may enter on the command line. It produces text output for human consumption.</p>
<p>On this level it is important to know about hard and soft (or symbolic) links. In a Linux/Unix file system any file is identified by its inode number. A file name is a <strong>hard link</strong> to an inode. Any number of names may be linked to the same inode. For instance, every directory contains entries named &#8220;.&#8221; and &#8220;..&#8221;. These are not shell conventions, but real file names, i.e. hard links. You may say that a Linux/Unix file system is a bag of inodes. Its structure is only determined by storage considerations. The directory tree structure we think is so essential is introduced by the file names linked to the inodes.</p>
<p>A <strong>symbolic link</strong> is a pointer to another file or directory. More exactly, it is a file containing the name of another file. It has a low level flag to indicate that this is a symbolic link. The flag may be accessed from Java through the filestat package.</p>
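<p>The difference between hard and symbolic links can be tried out from Java 7 and later with the standard <code>java.nio.file</code> API. Note that this is a different technique from the JNI-based filestat package described in this post. The sketch assumes a Linux/Unix file system that allows both kinds of links in the temporary directory; <code>LinkDemo</code> and the file names are hypothetical.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch using the standard java.nio.file API (Java 7+), not the filestat package.
class LinkDemo {
    /**
     * Creates a file, a hard link and a symbolic link in a temporary directory.
     * Returns { hard link names the same file as the original,
     *           soft link is a symbolic link,
     *           hard link is a symbolic link }.
     */
    static boolean[] inspectLinks() throws IOException {
        Path dir = Files.createTempDirectory("linkdemo");
        Path original = Files.write(dir.resolve("original.txt"), "hello".getBytes());
        // A hard link is a second name for the same inode.
        Path hard = Files.createLink(dir.resolve("hard.txt"), original);
        // A symbolic link is a separate small file holding the target name.
        Path soft = Files.createSymbolicLink(dir.resolve("soft.txt"), original);
        try {
            return new boolean[] {
                Files.isSameFile(hard, original),
                Files.isSymbolicLink(soft),
                Files.isSymbolicLink(hard)
            };
        } finally {
            Files.delete(soft);
            Files.delete(hard);
            Files.delete(original);
            Files.delete(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = inspectLinks();
        System.out.println("hard link names the same file: " + result[0]);
        System.out.println("soft link is a symbolic link:  " + result[1]);
        System.out.println("hard link is a symbolic link:  " + result[2]);
    }
}
```

<p>The hard link is indistinguishable from the original name, while the symbolic link carries the low level flag mentioned above.</p>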
<p><strong>Directories</strong> work in a similar way. A directory is just a file with a low level flag that indicates that its contents should be interpreted as a directory.</p>
<p>Another low level flag may indicate that a file is a <strong>socket</strong>. This kind of socket is a <em>Unix domain socket</em>, a device for interprocess communication. It is similar to network sockets, but works only for local processes. A Unix domain socket appears as a file in the file system, so it is natural that it can only be accessed locally. Unix domain sockets are heavily used by the KDE and Gnome desktop environments, for example.</p>
<p>In Linux/Unix <strong>devices</strong> appear as files, usually under <code>/dev</code>. This is a powerful metaphor. Want some high quality random numbers? It&#8217;s as easy as typing <code>cat /dev/random</code>. This example is contrived, but it shows the power of the concept. No special programming is necessary. Another example is <code>/dev/null</code>, the bottomless pit (or empty input). It may be used from the command line as well as programmatically.</p>
<p>The <em>filestat</em> package lets you check all those low level flags from Java. It also lets you access two file timestamps in addition to the last modification time present in <code>java.io.File</code>. The timestamps contain the time of last access (atime) and the time of the last status change (ctime). A status change is when you change the owner or the protection of a file, for instance. All timestamps are available with nanosecond precision, for what it&#8217;s worth.</p>
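<p>For comparison, much of this information is reachable on Java 7 and later without native code. The following sketch uses the standard <code>java.nio.file</code> API, not the filestat package, and <code>StatLike</code> is a hypothetical name. The <code>unix:ctime</code> attribute is implementation-specific, so the sketch only reads it when the default file system reports a <code>unix</code> attribute view.</p>

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Sketch using the standard java.nio.file API (Java 7+), not the filestat package.
class StatLike {
    /** Classifies a path without following a symbolic link at the end of it. */
    static String kindOf(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(
                path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        if (attrs.isSymbolicLink()) return "symbolic link";
        if (attrs.isDirectory()) return "directory";
        if (attrs.isRegularFile()) return "regular file";
        return "other"; // sockets, devices and similar special files
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        BasicFileAttributes attrs = Files.readAttributes(
                path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        System.out.println("kind:  " + kindOf(path));
        System.out.println("mtime: " + attrs.lastModifiedTime());
        System.out.println("atime: " + attrs.lastAccessTime());
        // The status change time is only offered by the implementation-specific "unix" view.
        if (FileSystems.getDefault().supportedFileAttributeViews().contains("unix")) {
            System.out.println("ctime: "
                    + Files.getAttribute(path, "unix:ctime", LinkOption.NOFOLLOW_LINKS));
        }
    }
}
```

<p>Sockets and devices are lumped together as &#8220;other&#8221; here; telling them apart is the kind of detail the full <code>struct stat</code> data provides.</p>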
<p>The <em>filestat</em> package also contains a class named TestFileStat. You may use it from the command line. Give it one or more file paths. For each path it first prints the result of invoking FileStat methods. Then it runs the stat(1) command and prints its output for comparison.</p>
<p>The only caveat is that the <em>filestat</em> package comes to you as source code. It contains native methods, so you must compile the package on your own box. You need <em>make</em> and <em>gcc</em> (or some other C compiler).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=325</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Schema-Less Database: Freedom or Bondage?</title>
		<link>http://www.soderstrom.se/?p=316</link>
		<comments>http://www.soderstrom.se/?p=316#comments</comments>
		<pubDate>Tue, 27 Jul 2010 05:59:16 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=316</guid>
		<description><![CDATA[Some of the new strain of NoSQL databases are schema-less. They also claim this is a feature that brings flexibility. If the schema is such a roadblock, why was it invented in the first place?

Let&#8217;s begin with a fundamental question: What is a database schema? Answer: The database schema is the degree to which the [...]]]></description>
			<content:encoded><![CDATA[<p>Some of the new strain of NoSQL databases are schema-less. They also claim this is a feature that brings flexibility. If the schema is such a roadblock, why was it invented in the first place?<br />
<span id="more-316"></span><br />
Let&#8217;s begin with a fundamental question: What is a database schema? Answer: The database schema is the degree to which the database system understands your conceptual model. Understands in the sense of automatically supporting and upholding the model. A lot of schema, a lot of support. No schema, no support. A schema-less database system doesn&#8217;t have a clue about your model.</p>
<p>When I say “the model” I mean the conceptual model that is the foundation of every database application. The model is usually a judicious simplification of the real world. Take an order entry system, for example. The model may stipulate that there are Products, Customers and Orders and relationships between them. This is an extremely narrow view of a business, but it may be OK for a model. The model bridges the gap between data and information. It gives meaning to the bytes stored in a database. Programmers continuously use the model when they code an application. The model is the crown jewel of every database application.</p>
<p>A modern relational database system offers several standardized mechanisms to support a conceptual model. You may tell it to check value ranges and check relationships. You may create indexes to check that some value is unique or to speed up selected types of queries. In their limited way, these mechanisms help protect the database from storing garbage.</p>
<p>So why would anyone be against having their model supported by the database system? No one, if it wasn&#8217;t for one very disturbing fact about conceptual models: They evolve. All models in practical use change. If the model changes, and here is the crux of the matter, so must the schema.</p>
<p>High-end relational database systems support many types of schema changes. Adding or dropping a column is seldom a problem. Adding an index is straightforward. In theory. In practice, if a table has half a billion rows, creating an index may take a full day. It&#8217;s doable, but it takes so much time that it may interfere with normal operation.</p>
<p>Schema changes in large databases can be a serious pain. Schema-less databases don&#8217;t have a schema, so is the pain gone? The answer is no, the integrity of the conceptual model still has to be protected. The burden is shifted from the database system to applications. More code must be written because the database system no longer stops a buggy application from filling the database with nonsense. Application development time increases.</p>
<p>In my opinion “schema-less” is not a feature but a pragmatic trade-off to achieve performance in certain very large databases at the cost of longer development time.</p>
<p>In the specific case of Neo4j, the almost total schema-less-ness seems counterproductive to me. Every node in a database may be different. An application must check every node it retrieves to find out if it has the expected attributes, leading to performance loss. In practice every application will invent its own way of managing node types. The database system would do this more efficiently than applications. Versions of node types could be used to retain flexibility.</p>
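<p>To make the burden concrete, here is a minimal sketch of the kind of check an application ends up writing when the database does not know about node types. It uses plain Java maps as stand-ins for node properties and does not use the Neo4j API; <code>NodeTypeCheck</code>, the property names and the expected types are all hypothetical.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an application-side "schema" check. Plain maps stand in for
// node properties; this is not the Neo4j API.
class NodeTypeCheck {
    /**
     * True if the node has every expected property and each value is an
     * instance of the expected class. A missing property fails the check
     * because Class.isInstance(null) is false.
     */
    static boolean conformsTo(Map<String, Object> node, Map<String, Class<?>> expected) {
        for (Map.Entry<String, Class<?>> entry : expected.entrySet()) {
            Object value = node.get(entry.getKey());
            if (!entry.getValue().isInstance(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> fileType = new HashMap<>();
        fileType.put("name", String.class);
        fileType.put("size", Long.class);

        Map<String, Object> node = new HashMap<>();
        node.put("name", "readme.txt");
        node.put("size", 1024L);

        System.out.println("node conforms: " + conformsTo(node, fileType));
    }
}
```

<p>Every retrieval path in every application needs a check of this kind, which is work a schema-aware database system would do once and centrally.</p>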
<p>In conclusion, a schema-less database brings you freedom and flexibility in the same sense that driving without a safety belt does. In some special applications you may be forced to sacrifice schema support due to overriding performance concerns. Mainstream applications should use all schema support they can get.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=316</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why NoSQL Won&#8217;t Replace Relational Databases Anytime Soon</title>
		<link>http://www.soderstrom.se/?p=286</link>
		<comments>http://www.soderstrom.se/?p=286#comments</comments>
		<pubDate>Thu, 01 Jul 2010 21:59:35 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>
		<category><![CDATA[Foundations]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=286</guid>
		<description><![CDATA[These days there is a lot of buzz about NoSQL databases. We hear that new databases are about to replace relational ones because the relational data model is so old.
I&#8217;m sure this won&#8217;t happen anytime soon. On the other hand, I don&#8217;t believe NoSQL databases will go away either. My prediction is that new NoSQL [...]]]></description>
			<content:encoded><![CDATA[<p>These days there is a lot of buzz about NoSQL databases. We hear that new databases are about to replace relational ones because the relational data model is so old.</p>
<p>I&#8217;m sure this won&#8217;t happen anytime soon. On the other hand, I don&#8217;t believe NoSQL databases will go away either. My prediction is that new NoSQL databases will keep popping up. Each one will fill a specific need for a time, and then fade away. No two NoSQL databases will ever be compatible. Standardization is fundamentally contrary to the reason why NoSQL databases exist.</p>
<p>Hang on and I will show you why.<br />
<span id="more-286"></span><br />
<strong>Cutting Time to Market</strong><br />
Let&#8217;s start here: The only thing that matters in mainstream application development is time to market. Even code quality is sacrificed to have an application hit the market before the competition. This means developer productivity is a top priority. Write as few lines of code as possible to get the job done.</p>
<p>The main way to improve developer productivity is to raise the level of abstraction. Figuratively speaking, you don&#8217;t want to build the house from nails and a heap of boards. What you want is prefab modules. In computing it means you no longer code in C. You use Java or some other contemporary language that reduces the code volume by a significant factor.</p>
<p><strong>Abstractions</strong><br />
Progress in the art of programming is all about creating abstractions. They are the prefab modules of programming. Object orientation, message passing, automatic memory management, dependency injection: all such concepts offer abstractions with one goal, to help developers get more bang per line of code. Abstractions certainly don&#8217;t help the computer. When the code hits the CPU almost all traces of classes, interfaces and methods are gone. There is only a stream of primitive byte-shuffling instructions. Abstractions are strictly for human consumption.</p>
<p>The principle of the highest possible level of abstraction also applies to data, but may be less obvious. People have generally left C, but you may still run into projects that claim that they “just need something simple” for managing data and that the file system will do.</p>
<p><strong>The Relational Data Model</strong><br />
The currently highest level of abstraction in data management is the relational data model. This is true despite the fact that the theory was proposed in the 70&#8217;s. Its solid mathematical foundation makes a lot of difference. Its most notable abstraction is the possibility of a non-procedural query language that decouples applications from their data store.</p>
<blockquote><p>Note: SQL is not inherent in the relational model. There have been other relational query languages, but SQL ended up as an ANSI standard.</p></blockquote>
<p>The high level of abstraction was a problem at the time the relational model was proposed. As an illustration, <em>Ingres</em> was a pioneering relational research effort. It typically ran under Unix on PDP-11. It had to be partitioned into several interconnected processes because the PDP-11 (a 16-bit architecture) did not support processes larger than 64 KB (that&#8217;s right, kilobytes). This was a long time ago, well before the IBM PC era.</p>
<p>The feasibility of relational databases was questioned for decades. Among the most prominent contenders back then were the so-called CODASYL databases. The network data model was specified by CODASYL by the end of the 60&#8217;s. I mention it because modern-day databases like Neo4j have a quite similar data model. It&#8217;s one of the oldest ideas in database technology.</p>
<p>You may have heard about the so-called impedance mismatch. It means that the data structures of a programming language, Java for example, cannot be immediately translated to relational table rows. Most people blame the relational data model, but I also find fault with programming languages. The data structures of current programming languages are largely low level abstractions based on one-way pointers (lists) and one-way associations (maps). They have stayed essentially the same for decades so there should be room for new levels of abstraction.</p>
<p><strong>Quantum Jumps in Computing</strong><br />
Note that doubt was cast on Java for performance reasons during its early years. It was believed to be mainly a tool for building applets (client side).</p>
<p>Today a real-time telecom system may well be implemented in Java (server side) and backed by a relational database. A real time billing system, for instance, may handle millions of subscribers and generate gigabytes of billing records every day. It took a while, but today there is no doubt that the relational model and Java are feasible for many demanding purposes.</p>
<p>My point is this: A quantum jump in computing begins with conjuring up a level of abstraction that is not yet feasible. Initially people will complain that it&#8217;s a terrible waste of CPU cycles and that performance sucks. Given time there will be enough CPU cycles and memory and gradually everyone will use the new abstractions to improve their productivity.</p>
<p><strong>Stretching the Limits</strong><br />
So much for mainstream applications. At any given time there are people who stretch the limits. This was true about electronics CAD systems in the 80&#8217;s – the relational databases of the day could not handle them. It is true now. For instance, there are extremely visible web applications with millions of users spread around the globe. Relational databases just won&#8217;t cut it.</p>
<p>The challenges are different at different points in time, but usually a combination of two principles is used to solve those hard cases:</p>
<ul>
<li>Give up finding a general solution and solve only the relevant special cases</li>
<li>Take the hit of accepting lower levels of abstraction</li>
</ul>
<p>For instance, if you are not in banking or telecom billing you may find that your application can do without airtight transactions. Eventual consistency may be good enough. Perhaps some components have to be coded in C after all. Perhaps you invent a way of partitioning the problem over a thousand computers in parallel. The important thing is to get the job done. The flip side of leaving the beaten track is a sharp increase in the volume of code you must produce. The companies behind the million-user web applications have the resources to do it.</p>
<p><strong>NoSQL Databases</strong><br />
Most of the NoSQL databases we see are designed to gain some desirable characteristic (performance, scalability etc.) at the expense of not being general-purpose or operating at a lower level of abstraction. You may find that your application needs one of them to satisfy extreme requirements. Then go for it and accept that you will pay a price in terms of more developer hours. A few years down the road the trade-off will be different. General-purpose databases will manage more complex stuff which means that the previous generation of NoSQL databases will be less needed. New ones will be necessary for those who stretch the new limits.</p>
<p>NoSQL databases may be extremely capable. They are temporary fixes nonetheless. They get their edge by not trying to solve all the problems that contemporary relational databases do and/or by operating at a lower level of abstraction. Each one strikes its own trade-offs. You should find out what they are.</p>
<p>Maybe I&#8217;m unfair. Maybe some NoSQL databases build on new theory and not just on pragmatic trade-offs.</p>
<p>Maybe one day we will see a data model on a level of abstraction higher than the relational one. For instance, I have waited a long time for “table” to appear as a valid column data type in a new strain of relational databases. It sounds conceptually simple but is very disruptive. The query language as well as fundamental principles of storage organization are challenged.</p>
<p><strong>Summing Up</strong><br />
In summary, when selecting a database system keep your head cool and don&#8217;t make decisions based on hype. Rational thinking still works. That and hands-on testing will help you steer away from costly database mistakes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=286</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Network versus Relational: Part III</title>
		<link>http://www.soderstrom.se/?p=266</link>
		<comments>http://www.soderstrom.se/?p=266#comments</comments>
		<pubDate>Tue, 29 Jun 2010 10:05:19 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=266</guid>
		<description><![CDATA[In previous posts (original post, follow up) I ran small database performance tests involving Neo4j (a graph or network database) and Apache Derby (a relational database). Both are Java-based.
How about running the same tests with trusty old PostgreSQL? Neo4j and Apache Derby were both embedded in the test application. There was no database server, although [...]]]></description>
			<content:encoded><![CDATA[<p>In previous posts (<a href="http://www.soderstrom.se/?p=191">original post</a>, <a href="http://www.soderstrom.se/?p=256">follow up</a>) I ran small database performance tests involving Neo4j (a graph or network database) and Apache Derby (a relational database). Both are Java-based.</p>
<p>How about running the same tests with trusty old <strong>PostgreSQL</strong>? Neo4j and Apache Derby were both embedded in the test application. There was no database server, although Derby may be run in client-server mode.</p>
<p>PostgreSQL is a fully featured RDBMS firmly founded on the classical client-server architecture. It&#8217;s been around for ages. The server is coded in C.</p>
<p>The PostgreSQL logo is a stylized head of an elephant. If you think it&#8217;s a symbol of heavy feet you are in for a surprise.<br />
<span id="more-266"></span><br />
I installed PostgreSQL 8.3.9 on the box where the previous tests were run.<br />
Tuning PostgreSQL is an art in itself. It&#8217;s a big system with many parameters. To avoid getting stuck in configuration I decided to run PostgreSQL with default configuration, no tweaking. Server and client ran in the same box. The test programs needed only a minor touch-up.</p>
<p>I was in a hurry, so I ended up having data on the OS disk. Neo4j and Derby had the luxury of storing data on a dedicated database disk.</p>
<p>The effect of caching in the PostgreSQL server is very visible. Repeating a test improves the test case performance. Measurements were made after stopping and starting the server. In a real application, of course, you want the benefits of server-side caching.</p>
<p><strong>Create database:</strong> PostgreSQL performance was about the same as Apache Derby in the small data sets, but noticeably faster in the long-running ones. Here is the table from the original post complemented with PostgreSQL but without the database size columns.<br />
<TABLE WIDTH=300 CELLSPACING=8 COLS=5 RULES=NONE BORDER=0><TBODY><TR><TH>Files</TH><TH COLSPAN=3>Time (s)</TH></TR><TR><TD ALIGN=RIGHT> </TD><TD ALIGN=RIGHT>Neo4j</TD><TD ALIGN=RIGHT>Derby</TD><TD ALIGN=RIGHT>PostgreSQL</TD></TR><TR><TD ALIGN=RIGHT>7&nbsp;109</TD><TD ALIGN=RIGHT>4.8</TD><TD ALIGN=RIGHT>13.4</TD><TD ALIGN=RIGHT>12.1</TD></TR><TR><TD ALIGN=RIGHT>53&nbsp;378</TD><TD ALIGN=RIGHT>16.4</TD><TD ALIGN=RIGHT>51.0</TD><TD ALIGN=RIGHT>53.7</TD></TR><TR><TD ALIGN=RIGHT>520&nbsp;691</TD><TD ALIGN=RIGHT>&#8211;</TD><TD ALIGN=RIGHT>479.8</TD><TD ALIGN=RIGHT>408.9</TD></TR></TBODY></TABLE></p>
<p>Neo4j shines in this test, except that it had a problem with large transactions.</p>
<p><strong>Retrieval:</strong> PostgreSQL wiped the floor with Neo4j and Apache Derby. The original test on a scan containing 524,000 files took less than 0.8 s with 1.2 million records in the file table. The modified test case, testing for inclusion in a subtree, took 0.3 s, still with 1.2 million files in the database. This is an order of magnitude faster than the Java databases. I double-checked that Postgres hadn&#8217;t created indexes behind my back, but no.</p>
<p>C is still faster than Java. PostgreSQL left the Java databases in the dust in the retrieval tests.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=266</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Network versus Relational: Part II</title>
		<link>http://www.soderstrom.se/?p=256</link>
		<comments>http://www.soderstrom.se/?p=256#comments</comments>
		<pubDate>Tue, 22 Jun 2010 15:40:55 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=256</guid>
		<description><![CDATA[In a recent post I ran a simplistic database performance test involving Neo4j (a graph or network database) and Apache Derby (a relational database). Both are Java-based. Relational databases are challenged by deep hierarchies, so the test was exactly that: Build a deep hierarchy and retrieve data from it.
The retrieval test was a surprise because [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.soderstrom.se/?p=191">a recent post</a> I ran a simplistic database performance test involving Neo4j (a graph or network database) and Apache Derby (a relational database). Both are Java-based. Relational databases are challenged by deep hierarchies, so the test was exactly that: Build a deep hierarchy and retrieve data from it.</p>
<p>The retrieval test was a surprise because relational Derby seemed to perform better than Neo4j.</p>
<p>Anders Nawroth, Neo Technology, correctly commented that the retrieval test case searches for any file in a hierarchy, i.e. the search is independent of the hierarchy. The relational test program took advantage of this fact while Neo4j traversed the graph.</p>
<p>In the interest of fairness this post digs somewhat deeper into the retrieval test case. The results are intriguing.<br />
<span id="more-256"></span><br />
To force the retrieval test to take the hierarchy into account, let&#8217;s restate it this way: Find the oldest file bigger than 5 MiB in a given subtree of a given scan.</p>
<p>For Neo4j the revamped test case makes no big difference. It&#8217;s still a matter of graph traversal.</p>
<p>However, the idea of traversing a hierarchy with a relational database makes me cringe. A general solution implies one query per parent node to find the children. Performance is awful. In real life I would find a way to avoid it. (Note that I already did because every file node has a pointer to the scan it belongs to.) In the revamped test case I prefer to check the file path to find out if it belongs to the given subtree. I must give up my NoSQL ambitions and use a query like this,</p>
<p><code>SELECT * FROM file WHERE scan = ? AND size > 5000000 AND path LIKE '/usr/local/%' ORDER BY mtime</code></p>
<p>The result is ordered, so the answer is the first row of the result set. Before running this test I added another file system scan to the database, rooted in <code>/usr</code>, increasing the number of records of the file table to 1.15 million.</p>
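<p>To make the logic of the query concrete, here is a minimal in-memory sketch of what it computes. The rows, the <code>oldestBigFile</code> helper and the plain numbers used for <code>mtime</code> are made up for illustration; this is not the test program and it does not touch Derby.</p>

```groovy
// Hypothetical rows standing in for the file table (not the real test data).
def files = [
    [scan: 1, path: '/usr/local/lib/big.jar',   size: 7000000L, mtime: 300L],
    [scan: 1, path: '/usr/local/share/old.iso', size: 9000000L, mtime: 100L],
    [scan: 1, path: '/usr/lib/huge.so',         size: 8000000L, mtime: 50L],
    [scan: 1, path: '/usr/local/bin/tool',      size: 20000L,   mtime: 10L],
    [scan: 2, path: '/usr/local/share/old.iso', size: 9000000L, mtime: 5L]
]

// Mirrors the WHERE clause, then picks the row that ORDER BY mtime would put first.
def oldestBigFile(List rows, int scanId, String subtree) {
    def prefix = subtree.endsWith('/') ? subtree : subtree + '/'
    def candidates = rows.findAll {
        it.scan == scanId && it.size > 5000000L && it.path.startsWith(prefix)
    }
    candidates.min { it.mtime }   // null when nothing matches
}

def oldest = oldestBigFile(files, 1, '/usr/local')
println "Oldest big file: ${oldest?.path}"
```

<p>With these rows the sketch picks <code>/usr/local/share/old.iso</code> from scan 1. Note that the hierarchy never enters the picture: membership in the subtree is decided by a string prefix on the path alone.</p>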
<p>The revamped test case applied to the <code>/usr/local</code> subtree of the <code>/usr</code> scan took Derby 3.7 s. It may be compared directly to Neo4j running the original test case on a scan rooted in <code>/usr/local</code> (3.6 s). Derby still stands up to the competition.</p>
<p>Now let&#8217;s turn the test case around a different way. In the original test we allowed Derby to do a plain table scan. A similar approach may be used with Neo4j. It has a complementary access method (<code>getAllNodes</code>) for iterating over all nodes regardless of their position in the graph.</p>
<p>There is an important difference between Neo&#8217;s <code>getAllNodes</code> and a relational table scan. Neo4j is a schema-less database (except for relationship types). The <code>getAllNodes</code> method iterates over the entire database. In the original test program the type of a node is implied by its position in the hierarchy. With <code>getAllNodes</code> there is no way to determine the position of a node returned by the iterator. The solution is to add a node type attribute to every node. We also have to tag every file/directory node with the scan it belongs to, just like the relational schema.</p>
<blockquote><p>(Note in passing: There is no such thing as a schema-free database. The question is only who does the job, the database system or you. But that&#8217;s the subject of another post.)</p></blockquote>
<p>For every node in the iteration we must now check the node type and the scan. The search time rose to somewhere around 9 s, worse than graph traversal. The database contained 430&nbsp;000 nodes at this point. Just iterating over all nodes without accessing any attributes or relationships took 3.0 s.</p>
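<p>The difference between the two access paths can be sketched with a toy in-memory model. Plain maps stand in for nodes here; the <code>type</code> and <code>scan</code> keys are the tags described above, and all names and values are invented for the example. This is not the Neo4j API and says nothing about its performance.</p>

```groovy
// Plain maps stand in for nodes; this models the idea, it is not the Neo4j API.
def node(Map props) { props + [children: []] }

def scanA = node(type: 'scan', name: '/usr/local')
def dir   = node(type: 'dir',  scan: 'A', name: 'share')
def f1    = node(type: 'file', scan: 'A', name: 'old.iso',   size: 9000000L, mtime: 100L)
def f2    = node(type: 'file', scan: 'A', name: 'big.jar',   size: 7000000L, mtime: 300L)
def scanB = node(type: 'scan', name: '/home')
def f3    = node(type: 'file', scan: 'B', name: 'movie.avi', size: 8000000L, mtime: 5L)
scanA.children.add(dir)
dir.children.add(f1)
dir.children.add(f2)
scanB.children.add(f3)

// What an iterate-over-everything access method would hand back: no positions.
def allNodes = [scanA, dir, f1, f2, scanB, f3]

// Traversal: everything reached below the start node belongs to that scan.
def subtree(Map n) {
    return [n] + n.children.collectMany { subtree(it) }
}

def oldestByTraversal(Map start) {
    subtree(start).findAll { it.size != null && it.size > 5000000L }.min { it.mtime }
}

// Flat iteration: type and scan must be checked explicitly on every node.
def oldestByIteration(List nodes, String scanId) {
    nodes.findAll { it.type == 'file' && it.scan == scanId && it.size > 5000000L }.min { it.mtime }
}

println oldestByTraversal(scanA).name
println oldestByIteration(allNodes, 'A').name
```

<p>Both calls find the same node, but the flat iteration only works because every node carries the two extra tags, and it has to look at the nodes of every scan in the database.</p>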
<p>Derby, on the other hand, solved the original problem in 2.5 s having 1&nbsp;150&nbsp;000 files in the database. Finding the oldest file over 5 MiB among 521&nbsp;000 files took 4.7 s (still with 1.15 million files in the database). Both cases involved at least one comparison for every table row.</p>
<p><strong>Conclusion:</strong> I really cannot convince myself that Neo4j graph traversal is faster than Derby table scans. Database creation with Neo4j turned out to be 3 times faster than with Derby, but most applications have many more reads than writes, making basic retrieval very important.</p>
<p><strong>Disclaimer:</strong> This judgement is based on a toy example and does not necessarily apply to your application. Please make your own tests before doing anything expensive. Please also tell me if you find evidence contrary to this test.</p>
<p>In this test I managed to avoid hierarchy navigation with the relational database. Of course there are applications where this just isn&#8217;t possible and a non-relational database is the only way to get acceptable performance. I&#8217;ll look into this general issue in an upcoming post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=256</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Network versus Relational</title>
		<link>http://www.soderstrom.se/?p=191</link>
		<comments>http://www.soderstrom.se/?p=191#comments</comments>
		<pubDate>Wed, 02 Jun 2010 09:00:10 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>
		<category><![CDATA[Groovy lessons]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=191</guid>
		<description><![CDATA[It&#8217;s time for the long-awaited Network versus Relational database head-to-head, no-mercy showdown. Network databases are represented by Neo4j 1.0, relational by Apache Derby 10.5.3.0.

This shop is biased towards Groovy, so Groovy is used for all test programs. It saves a lot of coding. To a certain extent it also allows us to run a [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s time for the long-awaited Network versus Relational database head-to-head, no-mercy showdown. Network databases are represented by Neo4j 1.0, relational by Apache Derby 10.5.3.0.<br />
<span id="more-191"></span><br />
This shop is biased towards Groovy, so Groovy is used for all test programs. It saves a lot of coding. To a certain extent it also allows us to run a relational database without writing SQL, adding a special twist to the “nosql” concept.</p>
<p>The test programs have been presented in previous posts. This is a <a href="http://www.soderstrom.se/?p=102">link to the Neo4j test case</a>. Here is a <a href="http://www.soderstrom.se/?p=140">link to the Derby test case</a>. The task is to create a database representing a file system hierarchy and then to find the oldest file with a size greater than 5 MiB from the database. A deep hierarchical data structure was chosen for the test because that&#8217;s where network databases are expected to shine. Relational databases, on the other hand, are challenged by deep hierarchies.</p>
<p>Any database evaluation is flame war fuel. A few disclaimers might ward off some of it.</p>
<blockquote><p>The simple test case used here is like peeping through a keyhole. What you see is real but very limited. There is a lot more to evaluate. Fundamentals not tested here include managing concurrency and making efficient use of multi-core CPUs.</p></blockquote>
<blockquote><p>We measure performance. But seriously, if performance really matters one should not use Groovy.</p></blockquote>
<p>The testing environment is summarized at the end of this post.</p>
<p><strong>First task:</strong> Create a database. Here are the measurements. The <em>Files</em> column contains the number of files scanned. This is also approximately the number of database records created. The execution time includes opening and closing the database, a significant portion of the total execution time, especially for Neo4j.</p>
<p><TABLE WIDTH=400 CELLSPACING=8 COLS=5 RULES=NONE BORDER=0><TBODY><TR><TH>Files</TH><TH COLSPAN=2>Time (s)</TH><TH COLSPAN=2>DB size (MB)</TH></TR><TR><TD ALIGN=RIGHT> </TD><TD ALIGN=RIGHT>Neo4j</TD><TD ALIGN=RIGHT>Derby</TD><TD ALIGN=RIGHT>Neo4j</TD><TD ALIGN=RIGHT>Derby</TD></TR><TR><TD ALIGN=RIGHT>7&nbsp;109</TD><TD ALIGN=RIGHT>4.8</TD><TD ALIGN=RIGHT>13.4</TD><TD ALIGN=RIGHT>2</TD><TD ALIGN=RIGHT>1</TD></TR><TR><TD ALIGN=RIGHT>53&nbsp;378</TD><TD ALIGN=RIGHT>16.4</TD><TD ALIGN=RIGHT>51.0</TD><TD ALIGN=RIGHT>15</TD><TD ALIGN=RIGHT>13</TD></TR><TR><TD ALIGN=RIGHT>520&nbsp;691</TD><TD ALIGN=RIGHT>&#8211;</TD><TD ALIGN=RIGHT>479.8</TD><TD ALIGN=RIGHT>&#8211;</TD><TD ALIGN=RIGHT>121</TD></TR></TBODY></TABLE></p>
<p>The table shows that Neo4j was around 3 times faster than Derby in the first two cases. In the biggest test case Neo4j blew up with an out-of-memory error. One may argue that creating half a million records in a single transaction is unreasonable, but Derby did it without complaining. Neo4j also uses slightly more disk space for its databases.</p>
<p><strong>Second task:</strong> Find the oldest file with a size greater than 5 MiB. Neither database uses any index, so a linear scan is expected. The test was only run on a database containing 53&nbsp;378 records because of the Neo4j memory problems with the bigger database. Timing in this case only includes the search. Opening and closing the database is not measured. The timing is an average of a few runs.</p>
<p>Neo4j: 3.6 seconds<br />
Derby: 2.2 seconds</p>
<p>In this task Derby outran Neo4j, a very surprising result. One possible explanation is that in the Derby test program the innermost loop never loops. It is converted behind the scenes into a single SQL query. The corresponding loop in the Neo4j test program is executed literally. Even so I find it a bit surprising that network traversal in Neo4j appears slower than a table scan in Derby.</p>
<p>In summary Neo4j was much quicker writing a database, but Derby unexpectedly won the retrieval test. If I can find some time I will dig deeper into this and follow up with a later post.</p>
<blockquote><p><strong>Testing environment:</strong> OpenSUSE 11.1 64-bit Linux on a vanilla HP xw4400 workstation with an Intel Core2 CPU 6400 at 2.13 GHz and 4 GB of memory. More software versions: Groovy 1.7.2, Java 1.6.0. Default Java memory settings.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=191</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;Ubiquitous&#8221;, a Hidden Language Trap in Korean and Japanese IT</title>
		<link>http://www.soderstrom.se/?p=182</link>
		<comments>http://www.soderstrom.se/?p=182#comments</comments>
		<pubDate>Thu, 13 May 2010 19:32:46 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Japan]]></category>
		<category><![CDATA[Korea]]></category>
		<category><![CDATA[U as in Ubiquitous]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=182</guid>
		<description><![CDATA[The word ubiquitous is a key to understanding Korean and Japanese information technology (IT). An example: U-city (U as in ubiquitous) is a concept heavily promoted in Korea. All the major Korean cities strive to earn the U-city label. Ubiquitous, according to an English dictionary, means found or seeming to be found everywhere. How can [...]]]></description>
			<content:encoded><![CDATA[<p>The word <em>ubiquitous</em> is a key to understanding Korean and Japanese information technology (IT). An example: U-city (U as in ubiquitous) is a concept heavily promoted in Korea. All the major Korean cities strive to earn the U-city label. Ubiquitous, according to an English dictionary, means <em>found or seeming to be found everywhere</em>. How can a city be found everywhere? The very ambition to be found everywhere may seem mysterious, or even suspect to a Westerner.</p>
<p>However, to Koreans and the Japanese <em>ubiquitous</em> has a different meaning. The double semantics of this word is little known. Since I couldn&#8217;t find any previous work on this subject I recently wrote an article about it, now published in the proceedings of the ICISA 2010 conference.<br />
<span id="more-182"></span><br />
ICISA 2010 (The International Conference on Information Science and Applications) was held in April 2010 in Seoul, Korea. The Icelandic ash cloud did its best to shut out all European delegates. I may have been the only one slipping through.</p>
<p>The title of my contribution is: <em><a href="http://www.soderstrom.se/wp-content/uploads/icisa2010-soderstrom-draft.pdf">Linguistic Aspects of Ubiquitous Computing: On &#8220;ubiquitous&#8221; in Japanese and Korean Information Technology</a></em>. So what is a linguistic paper doing in a conference on information science? It may seem like a long shot. In my opinion, if you are Korean or Japanese, the paper contains things you need to know to successfully promote IT concepts and products abroad. If you are not Korean or Japanese the paper contains a background you need to understand Korean and Japanese IT.</p>
<p>The key to resolving the language issue is to realize that <em>ubiquitous</em> has become a loanword in the Korean and Japanese languages. Loanwords are written in Korean and Japanese scripts (Hangul and Katakana, respectively). In this form their existence is somewhat decoupled from their English origin. The original phrase <em>ubiquitous computing</em> has been subject to lexical truncation and semantic shift. See the paper for details.</p>
<p>A draft version of the paper is available in the <a href="http://www.soderstrom.se/?page_id=26">download area</a>. The copyright of the conference proceedings is held by IEEE. The published article is available from <a href="http://ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&#038;arnumber=5480575">IEEE Xplore</a> for a fee (USD 30).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=182</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Neo4j vs. Relational: The Relational Combatant</title>
		<link>http://www.soderstrom.se/?p=140</link>
		<comments>http://www.soderstrom.se/?p=140#comments</comments>
		<pubDate>Fri, 07 May 2010 09:48:18 +0000</pubDate>
		<dc:creator>Håkan</dc:creator>
				<category><![CDATA[Database views]]></category>
		<category><![CDATA[Groovy lessons]]></category>

		<guid isPermaLink="false">http://www.soderstrom.se/?p=140</guid>
		<description><![CDATA[A previous post promised a head-to-head no-mercy Neo4j vs. relational showdown. It also provided Groovy programs to store and retrieve file system data in a Neo4j graph database. Neo4j was recently released in a 1.0 version (see the Neo4j site).
Now it's time for the relational combatant to enter the scene: Apache Derby.
We will write another [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.soderstrom.se/?p=102">previous post</a> promised a head-to-head no-mercy Neo4j vs. relational showdown. It also provided Groovy programs to store and retrieve file system data in a Neo4j graph database. Neo4j was recently released in a 1.0 version (see <a href="http://neo4j.org/">the Neo4j site</a>).</p>
<p>Now it's time for the relational combatant to enter the scene: <a href="http://db.apache.org/derby/index.html">Apache Derby</a>.<br />
We will write another Groovy program to store file system data, this time using a Derby relational database. To make the task more interesting we will try to earn a "nosql" medal by not using SQL.<br />
<span id="more-140"></span><br />
Derby is a relatively new player in the database arena. I chose it mainly because I haven't used it before. The solid documentation makes a good initial impression by presenting a fairly feature-complete SQL/JDBC database.</p>
<p>The database was defined like this (nosql has to wait a minute, this is nothing but SQL):</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-5">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> scan <span style="color:#006600; font-weight:bold;">&#40;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;id int <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> generated <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #993333; font-weight: bold;">AS</span> identity,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;path varchar<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color: #cc66cc;color:#800000;">400</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;tstamp timestamp <span style="color: #993333; font-weight: bold;">DEFAULT</span> current_timestamp <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> file <span style="color:#006600; font-weight:bold;">&#40;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;id int <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> generated <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #993333; font-weight: bold;">AS</span> identity,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;scan int <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">REFERENCES</span> scan<span style="color:#006600; font-weight:bold;">&#40;</span>id<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;parent int <span style="color: #993333; font-weight: bold;">REFERENCES</span> file<span style="color:#006600; font-weight:bold;">&#40;</span>id<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;path varchar<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color: #cc66cc;color:#800000;">400</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;size bigint,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;mtime timestamp</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#41;</span>; </div>
</li>
</ol>
</div>
</div>
</div>
<p>
We will use this schema to store file system scans in the <em>scan</em> table. Every file or directory in a file system scan will add an entry to the <em>file</em> table. Every file and directory except the top directory is linked to its parent directory through the <em>parent</em> column.</p>
<p>Scans and files are uniquely identified by an integer key. A specific file may occur more than once, but in different scans, so the file path alone is not unique.</p>
<p>The schema uses a common trick for managing hierarchies in a relational database: cross-level links. In this case every file will have a direct link (the <em>scan</em> column) to the scan it is part of.</p>
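<p>A small in-memory sketch may show what the cross-level link buys. The rows below are hypothetical and only mirror the <em>id</em>, <em>scan</em> and <em>parent</em> columns; each map lookup stands in for what would be one query against the real table.</p>

```groovy
// Hypothetical rows keyed by id, mirroring the id, scan and parent columns.
def fileTable = [
    1: [id: 1, scan: 7, parent: null, path: '/usr'],
    2: [id: 2, scan: 7, parent: 1,    path: '/usr/local'],
    3: [id: 3, scan: 7, parent: 2,    path: '/usr/local/lib'],
    4: [id: 4, scan: 7, parent: 3,    path: '/usr/local/lib/big.jar']
]

// Without the cross-level link: climb the parent chain, one lookup per level.
def lookupsToRoot(Map table, int fileId) {
    def lookups = 0
    def row = table[fileId]
    while (row.parent != null) {
        row = table[row.parent]
        lookups++
    }
    lookups
}

// With the cross-level link: the scan is read straight off the row.
def scanOf(Map table, int fileId) {
    table[fileId].scan
}

println "Parent lookups needed: ${lookupsToRoot(fileTable, 4)}"
println "Scan via direct link: ${scanOf(fileTable, 4)}"
```

<p>The deeper the hierarchy, the more lookups the parent chain costs, while the direct link stays at a single row access. The price is a redundant column that must be kept consistent.</p>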
<p>Here is a Groovy program that scans the file system and writes the database.</p>
<div class="syntax_hilite"><span class="langName">GROOVY:</span>
<div id="groovy-6">
<div class="groovy">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #a1a100;">import java.sql.*</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #a1a100;">import groovy.sql.*</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">DBURL = <span style="color: #ff0000;">'jdbc:derby:/sdb3/cur/data/derby1'</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">START = <span style="color: #aaaadd; font-weight: bold;">System</span>.<span style="color: #006600;">currentTimeMillis</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333;">int</span> LASTID</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #808080; font-style: italic;">// Collect file system data, shut down database</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000000; font-weight: bold;">def</span> <span style="color: #663399;">collect</span><span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">String</span> topDirPath<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> cnx = <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #808080; font-style: italic;">// declared outside try so that catch can see it</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">try</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; cnx = java.<span style="color: #006600;">sql</span>.<span style="color: #aaaadd; font-weight: bold;">DriverManager</span>.<span style="color: #006600;">getConnection</span><span style="color: #66cc66;">&#40;</span>DBURL<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> db = <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #993399; font-weight: bold;">Sql</span><span style="color: #66cc66;">&#40;</span>cnx<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; setNextId<span style="color: #66cc66;">&#40;</span>db<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> scanId = addScan<span style="color: #66cc66;">&#40;</span>db, topDirPath<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> fileCount = collectDir<span style="color: #66cc66;">&#40;</span>db.<span style="color: #006600;">dataSet</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'FILE'</span><span style="color: #66cc66;">&#41;</span>, scanId, <span style="color: #000000; font-weight: bold;">null</span>, <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #aaaadd; font-weight: bold;">File</span><span style="color: #66cc66;">&#40;</span>topDirPath<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; cnx.<span style="color: #006600;">commit</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"Files collected: ${fileCount}"</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #000000; font-weight: bold;">catch</span> <span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">Throwable</span> exc<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; cnx?.<span style="color: #006600;">rollback</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #808080; font-style: italic;">// cnx is null if connecting failed</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"OOPS! ${exc.message}"</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #000000; font-weight: bold;">finally</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">try</span> <span style="color: #66cc66;">&#123;</span> <span style="color: #808080; font-style: italic;">// Derby shutdown requires a new connection</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; java.<span style="color: #006600;">sql</span>.<span style="color: #aaaadd; font-weight: bold;">DriverManager</span>.<span style="color: #006600;">getConnection</span><span style="color: #66cc66;">&#40;</span>DBURL + <span style="color: #ff0000;">';shutdown=true'</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #000000; font-weight: bold;">catch</span> <span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">SQLException</span> exc<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"Shutdown message: ${exc.message}"</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> stopTime = <span style="color: #aaaadd; font-weight: bold;">System</span>.<span style="color: #006600;">currentTimeMillis</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"Processing time: ${stopTime - START} ms"</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333;">int</span> addScan<span style="color: #66cc66;">&#40;</span><span style="color: #993399; font-weight: bold;">Sql</span> db, <span style="color: #aaaadd; font-weight: bold;">String</span> topDirPath<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> ds = db.<span style="color: #006600;">dataSet</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'SCAN'</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; ds.<span style="color: #006600;">add</span><span style="color: #66cc66;">&#40;</span>id: ++LASTID, path: topDirPath<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">return</span> LASTID</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333;">int</span> addDirOrFile<span style="color: #66cc66;">&#40;</span>DataSet ds, <span style="color: #993333;">int</span> scanId, <span style="color: #aaaadd; font-weight: bold;">Integer</span> parentId, <span style="color: #aaaadd; font-weight: bold;">File</span> file<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> values = <span style="color: #66cc66;">&#91;</span>id: ++LASTID, scan: scanId, path: file.<span style="color: #006600;">path</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mtime: <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #aaaadd; font-weight: bold;">Timestamp</span><span style="color: #66cc66;">&#40;</span>file.<span style="color: #006600;">lastModified</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>parentId<span style="color: #66cc66;">&#41;</span> values.<span style="color: #006600;">parent</span> = parentId</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>!file.<span style="color: #006600;">isDirectory</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> values.<span style="color: #663399;">size</span> = file.<span style="color: #663399;">size</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; ds.<span style="color: #006600;">add</span><span style="color: #66cc66;">&#40;</span>values<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">return</span> LASTID</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #808080; font-style: italic;">// Recursively collect directory data</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333;">int</span> collectDir<span style="color: #66cc66;">&#40;</span>DataSet ds, <span style="color: #993333;">int</span> scanId, <span style="color: #aaaadd; font-weight: bold;">Integer</span> parentId, <span style="color: #aaaadd; font-weight: bold;">File</span> dir<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #993333;">int</span> currentDirId = addDirOrFile<span style="color: #66cc66;">&#40;</span>ds, scanId, parentId, dir<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #993333;">int</span> fileCount = <span style="color: #cc66cc;color:#800000;">1</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; dir.<span style="color: #FFCC33;">eachFile</span> <span style="color: #66cc66;">&#123;</span>file <span style="color: #b1b100;">-&gt;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>file.<span style="color: #006600;">isDirectory</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fileCount += collectDir<span style="color: #66cc66;">&#40;</span>ds, scanId, currentDirId, file<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; addDirOrFile<span style="color: #66cc66;">&#40;</span>ds, scanId, currentDirId, file<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fileCount++</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">return</span> fileCount</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333;">int</span> setNextId<span style="color: #66cc66;">&#40;</span><span style="color: #993399; font-weight: bold;">Sql</span> db<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> row = db.<span style="color: #006600;">firstRow</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'SELECT MAX(id) as maxid FROM file'</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #aaaadd; font-weight: bold;">Integer</span> id = row.<span style="color: #006600;">MAXID</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; LASTID = id ?: <span style="color: #cc66cc;color:#800000;">1</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #808080; font-style: italic;">// Be useful</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#40;</span>args.<span style="color: #663399;">size</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>&gt; <span style="color: #cc66cc;color:#800000;">0</span><span style="color: #66cc66;">&#41;</span>? <span style="color: #663399;">collect</span><span style="color: #66cc66;">&#40;</span>args<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;color:#800000;">0</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #993399;">println</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">"Top directory path required"</span><span style="color: #66cc66;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
The program expects a directory path on the command line. It traverses the file system beginning with that directory and stores file and directory data. Like the Neo4j equivalent, it does everything in a single transaction.</p>
<p>You will find the main logic in the recursive <code>collectDir</code> method. It runs through all entries of a directory. If an entry is a directory it invokes itself. Otherwise it just adds a record to the <em>file</em> table.</p>
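<p>To make the recursion easier to follow in isolation, here is a minimal sketch of the same traversal with the database taken out. It is not the benchmark code: a plain list (<code>records</code>) stands in for the <em>file</em> table, <code>lastId</code> stands in for <code>LASTID</code>, and the throwaway directory tree is made up for the illustration.</p>

```groovy
// Sketch only: an in-memory list replaces the DataSet-backed file table.
import java.nio.file.Files

// Hypothetical stand-ins for the LASTID counter and the file table
lastId = 0
records = []

// Record one directory or file and return the id it was given
int addDirOrFile(Integer parentId, File file) {
    def values = [id: ++lastId, path: file.path]
    if (parentId) values.parent = parentId
    if (!file.isDirectory()) values.size = file.size()
    records.add(values)
    return lastId
}

// Returns the number of entries recorded for dir, including dir itself
int collectDir(Integer parentId, File dir) {
    int currentDirId = addDirOrFile(parentId, dir)
    int fileCount = 1
    dir.eachFile { file ->
        if (file.isDirectory()) {
            // A subdirectory: recurse with the current directory as parent
            fileCount += collectDir(currentDirId, file)
        } else {
            addDirOrFile(currentDirId, file)
            fileCount++
        }
    }
    return fileCount
}

// Build a tiny throwaway tree: top/a.txt and top/sub/b.txt
File top = Files.createTempDirectory('scan').toFile()
new File(top, 'a.txt').text = 'hello'
File sub = new File(top, 'sub')
sub.mkdir()
new File(sub, 'b.txt').text = 'world!'

int count = collectDir(null, top)
println "Recorded ${count} entries"
records.each { println it }
top.deleteDir()
```

<p>Every record except the top directory carries the id of its parent directory, which is what ties the rows together into a tree.</p>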
<p>The "nosql"-ness comes from using Groovy <code>DataSet</code>. A DataSet has no way of picking up values assigned to autoincremented columns, so we had to avoid autoincrement and manage record ids explicitly. That leaves a tiny bit of SQL after all, such as the <code>SELECT MAX(id)</code> query in <code>setNextId</code>. This detail takes away some of the Groovy elegance.</p>
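<p>The explicit id handling boils down to two steps: seed a counter from the highest id already stored, then pre-increment it for every insert. A minimal sketch of that pattern follows. It assumes a list of maps (<code>table</code>) in place of the real table, and the names <code>seedLastId</code> and <code>insert</code> are invented for the illustration.</p>

```groovy
// Sketch only: a list of maps stands in for the file table.
table = []      // hypothetical in-memory table
lastId = 0      // plays the role of LASTID

// Mirrors setNextId: seed the counter from the highest id already stored
int seedLastId() {
    Integer maxId = table.collect { it.id }.max()   // null when the table is empty
    lastId = maxId ?: 1                             // Elvis operator supplies the default
    return lastId
}

// Mirrors the ds.add(id: ++LASTID, ...) calls: pre-increment, then store
int insert(Map values) {
    table.add([id: ++lastId] + values)
    return lastId
}

seedLastId()
int first = insert(path: '/tmp')
int second = insert(path: '/tmp/a.txt')
println "Assigned ids ${first} and ${second}"
```

<p>Note that the scheme is only safe with a single writer, which is the case in this benchmark. Two programs seeding from the same maximum would hand out the same ids.</p>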
<p>After making a few scans it is time to retrieve data. The obvious choice is to use <em>ij</em>, which comes with Apache Derby. It is an interactive, command-line tool. The beauty of JDBC is that there are many other JDBC-compatible tools we could use, some with a graphical user interface.</p>
<p>Listing scans and asking for the oldest file with a size > 5000000 looks like this (with a scan id provided in the second statement):</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-7">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">SELECT</span> * <span style="color: #993333; font-weight: bold;">FROM</span> scan <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> tstamp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">SELECT</span> path, size, mtime <span style="color: #993333; font-weight: bold;">FROM</span> file <span style="color: #993333; font-weight: bold;">WHERE</span> scan=<span style="color: #cc66cc;color:#800000;">570467</span> <span style="color: #993333; font-weight: bold;">AND</span> size&gt; <span style="color: #cc66cc;color:#800000;">5000000</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> mtime fetch first row only; </div>
</li>
</ol>
</div>
</div>
</div>
<p>
However, this is supposed to be a head-to-head showdown, so we also provide Groovy code to do the same thing.</p>
<div class="syntax_hilite"><span class="langName">GROOVY:</span>
<div id="groovy-8">
<div class="groovy">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #a1a100;">import java.sql.*</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #a1a100;">import groovy.sql.*</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">DBURL = <span style="color: #ff0000;">'jdbc:derby:/sdb3/cur/data/derby1'</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">DB = <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #808080; font-style: italic;">// script binding variable, visible inside the methods below</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000000; font-weight: bold;">def</span> query<span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">String</span> scanIdString<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">try</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> cnx = java.<span style="color: #006600;">sql</span>.<span style="color: #aaaadd; font-weight: bold;">DriverManager</span>.<span style="color: #006600;">getConnection</span><span style="color: #66cc66;">&#40;</span>DBURL<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; DB = <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #993399; font-weight: bold;">Sql</span><span style="color: #66cc66;">&#40;</span>cnx<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; scanIdString? findFile<span style="color: #66cc66;">&#40;</span>scanIdString <span style="color: #000000; font-weight: bold;">as</span> <span style="color: #aaaadd; font-weight: bold;">Long</span><span style="color: #66cc66;">&#41;</span> : listScans<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; DB.<span style="color: #006600;">commit</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #000000; font-weight: bold;">catch</span> <span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">Throwable</span> exc<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; DB?.<span style="color: #006600;">rollback</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"OOPS! ${exc.message}"</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #000000; font-weight: bold;">finally</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">try</span> <span style="color: #66cc66;">&#123;</span> <span style="color: #808080; font-style: italic;">// Derby shutdown requires a new connection</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; java.<span style="color: #006600;">sql</span>.<span style="color: #aaaadd; font-weight: bold;">DriverManager</span>.<span style="color: #006600;">getConnection</span><span style="color: #66cc66;">&#40;</span>DBURL + <span style="color: #ff0000;">';shutdown=true'</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #000000; font-weight: bold;">catch</span> <span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">SQLException</span> exc<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"Shutdown message: ${exc.message}"</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000000; font-weight: bold;">def</span> listScans<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> ds = DB.<span style="color: #006600;">dataSet</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'SCAN'</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; ds.<span style="color: #663399;">each</span> <span style="color: #66cc66;">&#123;</span><span style="color: #993399;">println</span> <span style="color: #ff0000;">"Scan ${it.ID}: ${it.PATH}"</span><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #000000; font-weight: bold;">def</span> findFile<span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">Long</span> scanId<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> ds = DB.<span style="color: #006600;">dataSet</span><span style="color: #66cc66;">&#40;</span>createView<span style="color: #66cc66;">&#40;</span>scanId<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> row = ds.<span style="color: #663399;">findAll</span><span style="color: #66cc66;">&#123;</span>it.<span style="color: #006600;">size</span>&gt; <span style="color: #cc66cc;color:#800000;">5000000</span><span style="color: #66cc66;">&#125;</span>.<span style="color: #663399;">sort</span><span style="color: #66cc66;">&#123;</span>it.<span style="color: #006600;">mtime</span><span style="color: #66cc66;">&#125;</span>.<span style="color: #006600;">firstRow</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span>row<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"${row.PATH} size ${row.SIZE} modified ${row.MTIME}"</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #993399;">println</span> <span style="color: #ff0000;">"File matching criteria not found"</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #aaaadd; font-weight: bold;">String</span> createView<span style="color: #66cc66;">&#40;</span><span style="color: #aaaadd; font-weight: bold;">Long</span> scanId<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #aaaadd; font-weight: bold;">String</span> viewName = <span style="color: #ff0000;">"VIEW${scanId}"</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">def</span> row = DB.<span style="color: #006600;">firstRow</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'SELECT count(*) as VCOUNT FROM sys.systables WHERE tablename=?'</span>, <span style="color: #66cc66;">&#91;</span>viewName<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #aaaadd; font-weight: bold;">Integer</span> <span style="color: #663399;">count</span> = row.<span style="color: #006600;">VCOUNT</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #663399;">count</span> == <span style="color: #cc66cc;color:#800000;">0</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #aaaadd; font-weight: bold;">String</span> q = <span style="color: #ff0000;">"CREATE VIEW ${viewName} AS SELECT * FROM file WHERE scan=${scanId}"</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; DB.<span style="color: #993399; font-weight: bold;">execute</span><span style="color: #66cc66;">&#40;</span>q<span style="color: #66cc66;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">return</span> viewName</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #808080; font-style: italic;">// Be useful</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #66cc66;">&#40;</span>args.<span style="color: #663399;">size</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>&gt; <span style="color: #cc66cc;color:#800000;">0</span><span style="color: #66cc66;">&#41;</span>? query<span style="color: #66cc66;">&#40;</span>args<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;color:#800000;">0</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span> : query<span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">null</span><span style="color: #66cc66;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
The code takes some explanation. The <code>listScans</code> method is obvious enough, but how about <code>findFile</code>?</p>
<p>The <code>findAll</code> call in the <code>findFile</code> method looks like a procedural statement where we iterate over the tuples of the <em>file</em> table. Maybe you don't believe this, but the two closures (in <code>findAll</code> and <code>sort</code>) are <strong>never executed</strong>! They are compiled into a SQL statement which is run by <code>firstRow</code>. This is stunning nosql.</p>
<p>Well, to arrive at the elegant code we have to go through some SQL hoops after all. (Maybe someone out there can show me a better solution.) The only values allowed in the <code>findAll</code> conditions are literals, so a variable such as the scan id cannot appear there. Instead we define a view for the query to operate on. This is done in the <code>createView</code> method. It is possible to use exception handling to check if a view exists, but there will be an ugly message on the console even if you catch the exception, so the method asks the <code>sys.systables</code> system table instead. One more detail: a string containing <code>${...}</code> is a GString, which keeps its embedded values separate from the literal text, and <code>DB.execute</code> intelligently turns those values into SQL parameters. The view name cannot be a parameter, so it is necessary to convert the GString to an ordinary String before submitting it as SQL. Declaring the variable <code>q</code> as <code>String</code> does exactly that.</p>
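<p>The GString point can be checked without a database. What follows is a minimal sketch in plain Groovy, not part of the program above. The scan id <code>7</code> and the variable name <code>lazy</code> are made up for the illustration.</p>
<pre><code>// Hypothetical scan id, chosen only for this illustration
Long scanId = 7L
String viewName = "VIEW${scanId}"   // declared as String, so already plain text

// With def the value stays a GString: literal parts and values are kept apart
def lazy = "CREATE VIEW ${viewName} AS SELECT * FROM file WHERE scan=${scanId}"
assert lazy instanceof GString
assert lazy.valueCount == 2          // viewName and scanId, candidates for SQL parameters
assert lazy.values[0] == 'VIEW7'

// Declaring the variable as String forces the conversion to ordinary text
String q = lazy
assert q instanceof String
assert q == 'CREATE VIEW VIEW7 AS SELECT * FROM file WHERE scan=7'
println q
</code></pre>
<p>Handing <code>lazy</code> itself to <code>execute</code> would make Groovy SQL send both embedded values as parameters, and a parameter is exactly what the view name cannot be.</p>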
<p>In summary, we have created almost-nosql counterparts to the programs exercising Neo4j in the <a href="http://www.soderstrom.se/?p=102">previous post</a>. Running the last program yields:</p>
<p><code>/usr/local/thunderbird/thunderbird-bin size 12403548 modified 2006-12-07 09:05:44.0</code></p>
<p>With a sigh of relief we note that this is the same file found by the Neo4j retrieval program. The modification time format differs only by including fractional seconds. The reason is that we use the Derby <em>Timestamp</em> datatype without formatting.</p>
<p>The two combatants are now ready for the final fierce, no-mercy showdown.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.soderstrom.se/?feed=rss2&amp;p=140</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
