Possibly random thoughts of an oddly organized DBA with a very short attention span

9.25.2008

the Exadata Storage Server

So I'm not at Oracle Open World but I have been waiting anxiously for the expected announcement from Larry, plus all the other interesting stuff that usually gets posted about OOW. Hearing about Exadata is pretty exciting for me as we've been considering some projects that would be well suited to this type of platform.

Late last year, I began researching some of the competing vendors for very large but simply structured databases. We talked to Teradata, Netezza, Vertica and ... HP Neoview. I went as far as a proof of concept build with Vertica as I figured I needed to turn my thinking sideways to grasp the columnar stuff (actually it was very easy to use) and I just saved all the data on the other guys until something built enough momentum to start setting up more demo builds.

I thought it odd at the time, but the HP Neoview crowd was a little stand-offish and wasn't pushing to get their foot in the door like the other guys. We thought it meant their product might not be quite ready for prime time but now it looks like there could be a whole other side to that story. I mean, if a customer already using Oracle is shopping for data warehouse appliances but doesn't need one right now, and you know your company and Oracle are combining efforts on something big later that same year, that could explain the slow-down-you're-moving-too-fast feeling from the sales team.

Whatever the story, Exadata is definitely going on the list and moving right up to a top contender. My biggest concern with Vertica was the lack of instrumentation - I can't imagine going back to an uninstrumented tool for critical applications. (Vertica did promise to add retention of elapsed time, so that variance could be calculated, to their list of potential enhancements.) Exadata should give us far more speed and retain all that good stuff we've come to depend on. Very anxious to learn more and get a demo build going.

Still curious about one thing though ... will Neoview go away? Or is there something else that makes it unique enough to remain a player? The answer probably doesn't matter for us but I'd still like to know.

9.06.2008

Cause and Effect, the chancellors of God ...

... Ralph Waldo Emerson, 1856

This is the long overdue follow up to the post on the Apollo Root Cause book. It's so far overdue in fact that I had to go back and reread (well, skim) the chapter. I knew there were points I wanted to remember, just couldn't remember exactly what they were.

Anyway, according to Gano, there is a cause and effect principle and it consists of 4 characteristics:

1. Cause and effect are the same thing.
2. Causes and effects are part of an infinite continuum of causes.
3. Each effect has at least two causes in the form of actions and conditions.
4. An effect exists only if its causes exist at the same point in time and space.

Apollo Root Cause, pg 44

Characteristics one and two are related. Since causes and effects exist in an ongoing continuum, a cause creates an effect and that effect becomes the cause of the next effect, and so on and so on. This means that a cause in one analysis could be considered an effect in another analysis. Our choice to analyze the root cause of any effect is therefore contingent on us deciding that a particular effect is important enough to analyze, usually because we want to stop it from coming back.

There might be some room for debate around point 3 but not much. It may be possible to make a single change to a database that would create a specific effect, but it's more likely that one change will impact many processes, leading to a tangle of causes and effects, some good, some bad, some negligible.

And in day to day troubleshooting, the interrelationships between all possible factors in a complex system practically guarantee you'll be looking at multiple causes and conditions. Determining which one can be fixed to stop an unwanted effect, without causing further unwanted effects, is the bigger question.

Characteristic number 4 is the most thought provoking, and probably the reason I wanted to remember the chapter in the first place. Gano uses the effect of a fire, which needs the presence of fuel, oxygen and a match as the causes required to be present at the same time, in the same space. Yet in an Oracle system, how many times do we run into a problem today that results from a change that occurred weeks or even months ago? Perhaps Microsoft sent out a patch, but the server wasn't restarted until later. Or an index was added for one process that later impacted a job that is only executed once a month. In some cases, the action that creates the effect occurs well before the effect is noticeable. In other cases, a prior cause only becomes relevant after a second change occurs, like when a system wide parameter is set to resolve one problem, later a new patch set is applied, and suddenly that parameter is no longer a good thing for your database.

What is true is that all the conditions that create the effect have to exist at the same point in time and space, but the trigger that creates the conditions may have occurred at an unknown time in the past. That's one of the reasons system troubleshooting can be complex but it's also what keeps the job interesting.

Haven't decided if this concept will be useful ... it may be a little too philosophical and the Logic of Failure stuff covers it in a more relevant way, but I figured I better write it down so I didn't have to read the chapter a third time :)

disclaimer ...

The views expressed on this page are mine and do not reflect the opinions of my employer, former employers, other associates or the opinions I may hold 3 days from now.