Graphs and Relational Databases

Situations involving hierarchical data and relational databases are quite common in web applications. Trees lend themselves quite well to providing organizational structures a web site, such as sitemaps and breadcrumb trails. A slightly less common and different type of situation, where the application is just as useful and a solution is a bit more complex to derive, is one involving graphical data (as in graphs, not graphics) and relational databases. These situations have issues like the shortest path problem and find their solutions in graph theory such as the A* algorithm or Dijkstra's algorithm. An example of such a situation is an airline web site that requires the ability to locate connecting and round-trip flights and find the flight path with the lowest cost in terms of time or ticket price.

If you use MySQL, this chapter from "Get It Done with MySQL 5" is a fairly verbose but comprehensive guide to using MySQL to store graphical data. It includes background information such as terminology used in graph theory and has numerous implementation examples of adjacency list graph models, nested set graph models, and breadth-first and depth-first graph search algorithms.

For Oracle users, there's a slightly more application-oriented tutorial that assumes more theoretical knowledge on the part of the reader. It shows that Oracle's hierarchical data features unfortunately can't be used in cases where cycles might exist in graphs (which is handy if you're trying to detect them) and then goes on to show an implementation that uses temporary tables to store a summary of a graph analysis. A worthy side note is that part 2 of that tutorial deals with a more specialized approach using state machines that may or may not be applicable to your situation.

If you'd like more information on this topic, a good place to look is your local university. Most with a computer science program offer a course in theory of computation, which deals with topics like these as well as context-free grammars in the context of developing programming languages. Even if you never actually use this information to develop a language yourself, it can still serve a good purpose: it make you more informed when engaging in discussions about language development, and it can increase your appreciation for the beauty of a language from a user perspective.

PHP, MySQL, and Oracle: An Odd Triangle

Maggie Nelson posted a blog entry recently with a review of an article entitled "Database Design for PHP Programmers" in the February 2008 edition of php|architect magazine. In it, she remarks on the article being MySQL-oriented and how limited MySQL explain plan support is compared to Oracle. I've had some thoughts in my head for a while that are related to these points, so I finally decided to, knock on wood, put pen to paper.

First, I'll agree with Maggie that Oracle beats MySQL hands down in a number of areas. Explain plans in Oracle are significantly more detailed and informative than they are in MySQL. Oracle offers features to support hierarchical data, whereas the most MySQL can recommend is to use the nested set model. Oracle also fully supports set operators, while MySQL still only supports unions. Oracle offers Oracle XE and Oracle SQL Developer to lower the barrier of entry for new developers. MySQL itself is free, but MySQL only currently offers similar developer-targetted features in the form of four separate tools, one of which (Workbench) is still very much in beta. These are just a few places where Oracle comes out the clear victor against MySQL. From a developer perspective, it's no contest.

The story doesn't end there, unfortunately. The major issue with Oracle is licensing, which still costs a pretty penny these days. Oracle is lowering this barrier a little to include midsize companies, but obviously MySQL still comes out more cost-effective. MySQL itself isn't completely innocent of causing licensing issues, but it has at least remained free and compatible with PHP in terms of licensing up to this point. The LAMP platform became popular with good reason: it gives smaller companies a scalable, zero-cost, effective means of rapidly developing applications. At the moment, the most Oracle can claim is two out of those three points.

In addition to licensing, Oracle also comes with additional administration costs. I'll use the managed server hosting provider Rackspace, with which I've had a fair amount of experience, as an example of this. They provide hosting for some fairly big names, Apple being one of them. Take a look at the database services they offer. They support both MySQL and Oracle. If you look closely, though, you'll notice this statement on the Oracle page: "Oracle Server Support is not available to our Managed Support Level
customers. For information about our Intensive Support Level, please click here." I think that about sums up my point.

I would actually like to see us switch to Oracle where I work. As a developer, I prefer its feature set, enough so that I'm willing to overlook some of its nastier qualities. However, with the additional costs that would come with licensing and administration by Rackspace (which is still very reasonable for what they provide), I don't think the cost difference is one that I can justify. Until the licensing and administration requirements that come with it are lessened, I don't believe Oracle is going to be an ideal solution for me or the average PHP developer.

Additionally, though I'm still a bit skeptical, the Sun buyout of MySQL could accelerate MySQL's development and in time put it closer to being on par with Oracle's feature set. Only time will tell on that one, but I'd say it's just as much of a possibility as Oracle expanding its offerings to include smaller companies. It's still too early after the buyout to make any remotely definitive predictions as to its long-term effects on MySQL as a product.

But, as Dennis Miller would say, "That's just my opinion, I could be wrong." What's your take?

Latest Oracle Gripes

I've been saving up Oracle-related gripes for a little while now. I've organized these into categories and, while each category isn't really enough to stand as a blog entry on its own, all the gripes collectively make for an entry of a decent length. So, here goes.

Oracle 10g Express Edition

A coworker of mine actually came across one of these issues. Specifically, she found out that XE by default does not include the grants necessary to allow the native UTL_FILE package to be used by PL/SQL routines. It would be nice if this was documented somewhere that didn't require a significant amount of digging to find out.

I managed to come across another one, and apparently an obscure one at that. The development team can't even figure out how to replicate this one. For no explicable reason, the Enter key stopped working within worksheet tabs. The fix is as simple as going to Accelerators in Preferences, explicitly selecting Default, and clicking OK. This was a really annoying bug to live with until I found out how to fix it.

Oracle Designer

This is the main laundry list of the post.

First off, if you perform a DDL export from a tab other than the DB Admin tab, grants are not included in the resulting deployment script. If you're deploying a nontrivial number of entities, it makes it annoying to have to switch tabs and go through the deployment process again.

This leads to the next point: all entities have to be reselected each time you want to deploy them. It would be a lot less tedious and annoying to, say, allow for sets of related entities to be created that could be deployed collectively by just selecting the set.

Going back to DDL exports, changes cannot be generated against a database unless you have the privileges necessary to deploy them. I honestly don't understand why this is, as read privileges are all that are necessary to read the data dictionary tables, which are all that should be required for the test. This limitation doesn't make the program very conducive to workflow environments.

Again in DDL exports, exported entities are not intelligently ordered in the generated deployment script such that dependencies come before reverse dependencies, which necessitates manual modifications to those deployment scripts and ergo documentation describing these modifications so that the script can be regenerated at a later time and potentially by a different developer.

Can you tell I'm not overly happy with the DDL export?

PL/SQL

I only really have one gripe for this section, but it's a big one: the first note in the Oracle Application Server mod_plsql User's Guide, section 3.6 Parameter Passing lies! It may be technically correct, but what it doesn't mention is that there is a way to return a default expression as opposed to a default value. It's a bit of a workaround, and it showcases how difficult it can be to work with aggregate data structures in PL/SQL, but it works. Documentation being this lacking is just completely substandard.

Conclusion

I think that about wraps it up. I don't do straight-up rants that often, but in these cases I felt it was necessary. It's entirely possible no one at Oracle will ever read them, but at least I can say it wasn't because they weren't stated somewhere.

NULLification

I've seen some "interesting" things during my time with database systems, but the one that takes the cake by far is variations in how NULL is interpreted. I'm going to provide some examples to showcase what I'm talking about using Oracle and MySQL, being that my experience is mostly with those two particular systems. Examples given are run on Windows XP SP2 using Oracle Express Edition 10.2.0.1.0 and MySQL Community Edition 5.0.45.

The stated intention of the existence of NULL is to convey the absence of any value. Both Oracle and MySQL say as much (and I particularly like MySQL's explanation of the reasoning behind this). Oracle, however, immediately goes on to contradict that principle and maintain that a character value with a length of zero is considered to be equivalent to NULL.

Oracle: SELECT CONCAT('test', NULL) FROM dual; -> 'test'
MySQL: SELECT CONCAT('test', NULL); -> NULL

This behavior manifests itself in some other "interesting" ways.

Oracle: SELECT 'true' FROM dual WHERE LENGTH('') IS NULL; -> 'true'
MySQL: SELECT LENGTH(''); -> 0

If you're wondering why I've formatted my Oracle query differently in this example, it's because this particular release of SQL*Plus appears to not want to give me a definitive answer. If I use the equivalent Oracle query, I simply get blank space where I would expect to see NULL. If you invert the WHERE clause in the query, you'll see what I mean. I wonder why this is?

To make things worse, it is only indirectly noted in the Oracle LENGTH function documentation in the statement that passing NULL to LENGTH will result in NULL. You have to read the section on NULL to find out that the empty string is equivalent to NULL, then put two and two together in order to figure this out. Oracle has made the statement that it is possible this behavior will change in the future. Given backward compatibility-related implications, however, I highly doubt that.

Strings aren't the only area to which I have an objection with respect to NULL; numbers are, as well. Prior to MySQL 5.0.13, the following example was handled in what I believed what "the right way." In versions 5.0.13 and later, it was changed to use the same logic as Oracle.

Oracle: SELECT 'true' FROM dual WHERE GREATEST(1, NULL) IS NULL; -> 'true'
MySQL: SELECT GREATEST(1, NULL); -> NULL

What both do now is cause NULL to be the result if any one operand in the expression is NULL. In dealing with a general expression, I can understand this. When that expression deals with a logical set of associated row- or column-wise values, which this and some other functions do, the absence of a value for a single object in the set should not cause this to happen.

MySQL isn't even consistent about this; in a section on common problems with NULL in its documentation, it's stated that "Aggregate (summary) functions such as COUNT(), MIN(), and SUM() ignore NULL values." In another section on working with NULL values, it's stated that "Two NULL values are regarded as equal in a GROUP BY." NULL conveys the absence of a value, so how can this be?! Both Oracle and MySQL have a set of functions and operators for dealing with NULL. At least these treat NULL consistently.

In short, I'm not saying NULL shouldn't exist. I know it has its uses, such as rewriting queries to avoid using subqueries in order to improve efficiency. I'm saying that mainstream database systems should bear in mind the reason for the existence? of NULL when deciding how any given function call or expression should handle encounters with it. Until they do, all I can do is recommend caution when dealing with fields where creating fields that may contain NULL and querying against said fields. Be cognizant of these fields and operations involving them and make use of your respective database's functions and operators for handling these cases so as to avoid unexpected results. Don't let your data be nullified!

More Oracle and Java Woes

Today I continued the trek toward completing the project described in my last entry. Though I don't think I ran into as many issues today as I did in the past week or so of working on the project, today certainly had it's fair share.

First up was a rather interesting exception being thrown by a JDBC operation, namely "java.sql.SQLException: SQL string is not Query." This is apparently intended to be JDBC's way of explaining that PreparedStatement.executeQuery() doesn't work for DML operations. To execute one of those, you have to use either execute() or executeUpdate(). Thankfully, a forum thread was able to point me in the right direction on that one.

Next on the list, if Oracle JDeveloper 10.1.3.3.0 tells you "The WAR file is already up to date," don't believe it! I don't know what logic it's using to decide whether or not the class files constituting a WAR file are out-of-date, but there are definitely some cases where it's flawed. I spent a better part of the morning trying to figure out why everything from undeploying and redeploying the EAR file to bouncing the OAS installation was still giving me illogical output. Come to find out, I didn't know the WAR file not being updated was relevant to the problem at the time, but it certainly proved to be in the end! Tried searching for a bug report on this, but came up empty, so maybe it's just me.

Last but not least, I take issue with the language used in the mod_plsql User's Guide to describe its process of file upload handling. Though it never explicitly states this, it seems to imply that the internal handling of performing an INSERT operation to place data for an uploaded file into the document table takes place in a separate transaction from that of the action procedure that gets executed afterward.

You have to go to the PL/SQL User's Guide to read why this is not the case. To sum it up, a transaction can span multiple procedures. A procedure being executed as a data cartridge operates within a transaction that is implicitly committed when that procedure terminates so long as no uncaught exceptions are raised. However, until that point, the effects of any DML operations executed are only visible to the procedure. This includes the INSERT procedure performed by mod_plsql on the document table. What this effectively means is that the only way something other than the procedure can see that the inserted record exists unless the procedure does an explicit COMMIT.

If you read my last post, you know that I was calling a servlet from the data cartridge. You can probably imagine the amount of aggravation this caused me when I ran my servlet locally without issue, had to backtrack to figure out where the servlet was failing when it was deployed, and then found out that a single COMMIT statement at the beginning of my data cartridge procedure made things work as expected. So, yay for lacking Oracle documentation.

I did get the servlet working, though. It can now pull data from the database, convert it from Excel binary to CSV format, and put the converted data back into the database. So, the Clean Content API, while not specifically designed for the purpose for which I'm using it, is at least a somewhat capable solution. That basically sums up my day, folks. I'll be back on Monday to do it again.

Page:  1 2