Graphs and Relational Databases

Situations involving hierarchical data and relational databases are quite common in web applications. Trees lend themselves quite well to providing organizational structures a web site, such as sitemaps and breadcrumb trails. A slightly less common and different type of situation, where the application is just as useful and a solution is a bit more complex to derive, is one involving graphical data (as in graphs, not graphics) and relational databases. These situations have issues like the shortest path problem and find their solutions in graph theory such as the A* algorithm or Dijkstra's algorithm. An example of such a situation is an airline web site that requires the ability to locate connecting and round-trip flights and find the flight path with the lowest cost in terms of time or ticket price.

If you use MySQL, this chapter from "Get It Done with MySQL 5" is a fairly verbose but comprehensive guide to using MySQL to store graphical data. It includes background information such as terminology used in graph theory and has numerous implementation examples of adjacency list graph models, nested set graph models, and breadth-first and depth-first graph search algorithms.

For Oracle users, there's a slightly more application-oriented tutorial that assumes more theoretical knowledge on the part of the reader. It shows that Oracle's hierarchical data features unfortunately can't be used in cases where cycles might exist in graphs (which is handy if you're trying to detect them) and then goes on to show an implementation that uses temporary tables to store a summary of a graph analysis. A worthy side note is that part 2 of that tutorial deals with a more specialized approach using state machines that may or may not be applicable to your situation.

If you'd like more information on this topic, a good place to look is your local university. Most with a computer science program offer a course in theory of computation, which deals with topics like these as well as context-free grammars in the context of developing programming languages. Even if you never actually use this information to develop a language yourself, it can still serve a good purpose: it make you more informed when engaging in discussions about language development, and it can increase your appreciation for the beauty of a language from a user perspective.

PHP, MySQL, and Oracle: An Odd Triangle

Maggie Nelson posted a blog entry recently with a review of an article entitled "Database Design for PHP Programmers" in the February 2008 edition of php|architect magazine. In it, she remarks on the article being MySQL-oriented and how limited MySQL explain plan support is compared to Oracle. I've had some thoughts in my head for a while that are related to these points, so I finally decided to, knock on wood, put pen to paper.

First, I'll agree with Maggie that Oracle beats MySQL hands down in a number of areas. Explain plans in Oracle are significantly more detailed and informative than they are in MySQL. Oracle offers features to support hierarchical data, whereas the most MySQL can recommend is to use the nested set model. Oracle also fully supports set operators, while MySQL still only supports unions. Oracle offers Oracle XE and Oracle SQL Developer to lower the barrier of entry for new developers. MySQL itself is free, but MySQL only currently offers similar developer-targetted features in the form of four separate tools, one of which (Workbench) is still very much in beta. These are just a few places where Oracle comes out the clear victor against MySQL. From a developer perspective, it's no contest.

The story doesn't end there, unfortunately. The major issue with Oracle is licensing, which still costs a pretty penny these days. Oracle is lowering this barrier a little to include midsize companies, but obviously MySQL still comes out more cost-effective. MySQL itself isn't completely innocent of causing licensing issues, but it has at least remained free and compatible with PHP in terms of licensing up to this point. The LAMP platform became popular with good reason: it gives smaller companies a scalable, zero-cost, effective means of rapidly developing applications. At the moment, the most Oracle can claim is two out of those three points.

In addition to licensing, Oracle also comes with additional administration costs. I'll use the managed server hosting provider Rackspace, with which I've had a fair amount of experience, as an example of this. They provide hosting for some fairly big names, Apple being one of them. Take a look at the database services they offer. They support both MySQL and Oracle. If you look closely, though, you'll notice this statement on the Oracle page: "Oracle Server Support is not available to our Managed Support Level
customers. For information about our Intensive Support Level, please click here." I think that about sums up my point.

I would actually like to see us switch to Oracle where I work. As a developer, I prefer its feature set, enough so that I'm willing to overlook some of its nastier qualities. However, with the additional costs that would come with licensing and administration by Rackspace (which is still very reasonable for what they provide), I don't think the cost difference is one that I can justify. Until the licensing and administration requirements that come with it are lessened, I don't believe Oracle is going to be an ideal solution for me or the average PHP developer.

Additionally, though I'm still a bit skeptical, the Sun buyout of MySQL could accelerate MySQL's development and in time put it closer to being on par with Oracle's feature set. Only time will tell on that one, but I'd say it's just as much of a possibility as Oracle expanding its offerings to include smaller companies. It's still too early after the buyout to make any remotely definitive predictions as to its long-term effects on MySQL as a product.

But, as Dennis Miller would say, "That's just my opinion, I could be wrong." What's your take?

NULLification

I've seen some "interesting" things during my time with database systems, but the one that takes the cake by far is variations in how NULL is interpreted. I'm going to provide some examples to showcase what I'm talking about using Oracle and MySQL, being that my experience is mostly with those two particular systems. Examples given are run on Windows XP SP2 using Oracle Express Edition 10.2.0.1.0 and MySQL Community Edition 5.0.45.

The stated intention of the existence of NULL is to convey the absence of any value. Both Oracle and MySQL say as much (and I particularly like MySQL's explanation of the reasoning behind this). Oracle, however, immediately goes on to contradict that principle and maintain that a character value with a length of zero is considered to be equivalent to NULL.

Oracle: SELECT CONCAT('test', NULL) FROM dual; -> 'test'
MySQL: SELECT CONCAT('test', NULL); -> NULL

This behavior manifests itself in some other "interesting" ways.

Oracle: SELECT 'true' FROM dual WHERE LENGTH('') IS NULL; -> 'true'
MySQL: SELECT LENGTH(''); -> 0

If you're wondering why I've formatted my Oracle query differently in this example, it's because this particular release of SQL*Plus appears to not want to give me a definitive answer. If I use the equivalent Oracle query, I simply get blank space where I would expect to see NULL. If you invert the WHERE clause in the query, you'll see what I mean. I wonder why this is?

To make things worse, it is only indirectly noted in the Oracle LENGTH function documentation in the statement that passing NULL to LENGTH will result in NULL. You have to read the section on NULL to find out that the empty string is equivalent to NULL, then put two and two together in order to figure this out. Oracle has made the statement that it is possible this behavior will change in the future. Given backward compatibility-related implications, however, I highly doubt that.

Strings aren't the only area to which I have an objection with respect to NULL; numbers are, as well. Prior to MySQL 5.0.13, the following example was handled in what I believed what "the right way." In versions 5.0.13 and later, it was changed to use the same logic as Oracle.

Oracle: SELECT 'true' FROM dual WHERE GREATEST(1, NULL) IS NULL; -> 'true'
MySQL: SELECT GREATEST(1, NULL); -> NULL

What both do now is cause NULL to be the result if any one operand in the expression is NULL. In dealing with a general expression, I can understand this. When that expression deals with a logical set of associated row- or column-wise values, which this and some other functions do, the absence of a value for a single object in the set should not cause this to happen.

MySQL isn't even consistent about this; in a section on common problems with NULL in its documentation, it's stated that "Aggregate (summary) functions such as COUNT(), MIN(), and SUM() ignore NULL values." In another section on working with NULL values, it's stated that "Two NULL values are regarded as equal in a GROUP BY." NULL conveys the absence of a value, so how can this be?! Both Oracle and MySQL have a set of functions and operators for dealing with NULL. At least these treat NULL consistently.

In short, I'm not saying NULL shouldn't exist. I know it has its uses, such as rewriting queries to avoid using subqueries in order to improve efficiency. I'm saying that mainstream database systems should bear in mind the reason for the existence? of NULL when deciding how any given function call or expression should handle encounters with it. Until they do, all I can do is recommend caution when dealing with fields where creating fields that may contain NULL and querying against said fields. Be cognizant of these fields and operations involving them and make use of your respective database's functions and operators for handling these cases so as to avoid unexpected results. Don't let your data be nullified!

Page:  1