Possibly random thoughts of an oddly organized dba with a very short attention span

5.31.2009

on the importance of a good data model ...

An article written by Bert Scalzo was published in Information Management this week. The topic is 'Is Data Modeling Still Relevant?': it's short, to the point, and well worth reading.

The article doesn't recommend a specific tool; it simply recommends the practice of capturing a model of the data at rest (the old school approach) in addition to the newer techniques that focus on capturing the data in motion (process flows, business process modeling, etc). According to Mr. Scalzo, capturing the data-at-rest will lead to a solid design, one capable of maintaining both data accuracy and performance, and I absolutely agree. If a database administrator wants to become a crucial resource to their company, knowledge of the data is the most direct path to that goal. If you take the time to understand the data, how it is used, and how it can be used by your customer, and then put that knowledge into the data structure, you will develop a system that your customer cannot live without. When the customer says, 'We didn't think of this in the initial planning, but we really need to be able to do X' (and they always, always say that), you will be prepared to provide that functionality far faster than they expected. (Of course, it's up to you and your management whether you tell them you can do it in two hours or two weeks - those decisions become all about managing workloads and expectations.) Sometimes that extra functionality creates additional billables and revenue, which further increases your personal value right along with the system's value. It's not my role to negotiate which features are billable and which features become a gift to make up for some other issue; I just tell the suits, 'Yes, that can be done easily' or 'Yes, we can do that in the next release'. Sometimes I even get to say, 'Yes, that functionality is already in there'. With a good data model, I don't have to tell them no, and I like it that way.

The inherited project I've mentioned before landed in my lap with no data model at all, and a few months ago it wasn't even capable of its promised functionality, much less anything extra. There was no real structure to the design: no constraints, no primary keys, no unique keys. (It did have duplicate records, and that's not a good thing.) It's been reworked and rebuilt, and in the last two weeks I've been able to say 'Yes' to quite a few requested new features. The customer is beginning to see the real potential of the system, which is bound to lead to even more plans for new features and tools, and even more work and interesting challenges. None of this would have been possible with the original data un-model.
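
For the curious, the first aid looked roughly like this. (A minimal sketch only; the table and column names are invented stand-ins for the customer's real ones.)

-- Step 1: remove the duplicate rows, keeping one survivor per natural key
DELETE FROM orders o
 WHERE o.rowid > (SELECT MIN(o2.rowid)
                    FROM orders o2
                   WHERE o2.cust_id    = o.cust_id
                     AND o2.order_date = o.order_date);

-- Step 2: declare the key that should have existed from day one,
-- so the duplicates can never come back
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (cust_id, order_date);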

There was a second bit of the article I found interesting:

"... people get so caught up in novel paradigms such as extreme programming, agile software development or scrum that they compromise data modeling, or even skip it entirely. The problem is that these new approaches don’t always spell out exactly how data modeling should be incorporated, so people often forego it."

A separate project that I support on an as-needed basis is in the midst of an agile development effort. They're building a new product which is similar to an existing product, so it's being built on the existing data model. I'd pointed out in the past that the data model is lacking in a few areas, but nobody has time to revamp it as it would be a large undertaking, with most of the code requiring a rewrite. I don't know much about the new app, and it may be that the data model is perfectly suited to this one. However, I have noticed that I wasn't needed for this 'agile' effort until testing resulted in memory problems, and by then the release date was right around the corner.

Does anyone have an opinion about Mr. Scalzo's statement above? Do these programming paradigms ignore the step of data modeling? If so, is it simply a failure to mention data modeling, with the expectation that it occurs elsewhere in the development effort? Or do people really believe that data models are not needed within Agile or extreme programming?

28 comments:

Noons said...

"people really believe that data models are not needed within Agile or extreme programming"

Nothing, but absolutely nothing, could surprise me anymore from that lot. Their entire modus operandi is to belittle any prior science in the name of the sacrosanct 'new'. The results are obvious and well known.

Robyn said...

Hi Noons,

I have to admit, this one surprised me. How does anyone start coding a database application without a data model?

Clearly I am old school (or just plain old.) On my project, I consider the data model to be THE most important deliverable. If the model's good and the developers are following the model, they can't go too far astray ...

John Brady said...

I work for a software house, and as I see it, the only thing their application does is manipulate data stored in a database. Financial data, at that. Yet as you say, a well designed data model comes lower on their priority list than things like application features, functionality, and hitting deadlines.

What I have seen is that newer development methods (Object Orientation) and tools (Java) let developers more easily write code that manipulates data "objects" internally, and then somehow magically persist that data to a database. So the developers focus on application functionality, assuming that the "data persistence" problem will be taken care of somehow. And when it does not happen "transparently" and there are "issues", it then becomes a DBA area problem and not a development area problem.

The developers argue that they are taking a "model led design", but unfortunately it is an "object" led design, and not a "data" led design. From my experience, the object to relational mapping is non-trivial, and many object led designs are imperfect - there are major revisions between version 1.0 and 2.0. So what happens to the database structure between version 1.0 and version 2.0? And more importantly, what happens to all of the customer's data in that database?

Also, in many Object Models there can be a lot of repetition between different objects. Customers, People, and Organisations, for instance, will all have names and addresses. If the Data Model were designed first, this commonality would be recognised and a single set of tables designed to store this data. Whether it is one shared table or separate identical tables does not matter - what matters is that the design is the same for all of them. But with separate development teams designing their own objects as they go, you run the risk of having separate Address objects for each piece of functionality developed, all differing in one way or another.
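
To illustrate with a toy sketch (invented names, nobody's real schema): model the common entity once, and let every other entity reference the same design.

-- The common entity, designed once
CREATE TABLE addresses (
  address_id   INTEGER PRIMARY KEY,
  street       VARCHAR2(100),
  city         VARCHAR2(50),
  postal_code  VARCHAR2(10)
);

-- Every entity that needs an address points at the same structure
CREATE TABLE customers (
  customer_id  INTEGER PRIMARY KEY,
  name         VARCHAR2(100),
  address_id   INTEGER REFERENCES addresses
);

CREATE TABLE organisations (
  org_id       INTEGER PRIMARY KEY,
  name         VARCHAR2(100),
  address_id   INTEGER REFERENCES addresses
);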

Data and Objects are not the same thing. They may have a lot in common, but they are not the same thing. And while developers continue to propagate this myth, we will continue to have poor data models, and poor databases, and poor performance as a result. I've gone on more than I wanted to, but you can see how I get annoyed when people say they are taking an "Object based development approach" and that the Database design can be ignored because they have an Object design and the database will simply drop out of the object design.

John

Troy P said...

I can take John's comments a step further. I work for a large telecom shop, and one of my biggest personal pet peeves is seeing new apps being rolled out internally to cover things that could've been done through existing applications/databases. For example, we have at least eight different places where the same set of call detail records reside, for eight different applications to access, albeit in slightly different ways. But it's the SAME DATA! Had the front-end developers known that the data they needed already existed somewhere else, they could've easily leveraged the existing hardware, and at least some of the existing code, and saved a ton of time and money. Instead we're stuck having 8 copies of the same data in different places every month, all because certain developers/architects didn't look at what was already there in terms of the actual data.
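
One properly modeled store with a view per application would have covered all eight. (A toy sketch with invented names; the real record layout is far wider, of course.)

-- One copy of the data at rest
CREATE TABLE call_detail_records (
  cdr_id       INTEGER PRIMARY KEY,
  call_start   TIMESTAMP,
  duration_sec NUMBER,
  origin_nbr   VARCHAR2(15),
  dest_nbr     VARCHAR2(15)
);

-- Each application gets its own 'slightly different' slice as a view
CREATE VIEW billing_cdr_v AS
  SELECT cdr_id, origin_nbr, duration_sec FROM call_detail_records;

CREATE VIEW fraud_cdr_v AS
  SELECT cdr_id, origin_nbr, dest_nbr, call_start FROM call_detail_records;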

Robyn said...

Hey John,

Thank you for stopping by and I'm very pleased that you did go on longer than you intended - you've made some excellent points.

In the case of our database that has evolved into a good data-centric model, there is another 'data model' document that is exactly what you describe - it's object oriented and class based, and it bears no resemblance at all to the database data model. When the database development was occurring with ONLY the object-level data model doc to guide it, the database was useless and, even worse, the integrity of the data was jeopardized. At this point, the developers working on the external application still refer primarily to the object-level data model, while the development team working on the internal coding relies on the data-centric model. It's not exactly nirvana (I suspect we've coded the business logic in both places on a few tasks) but it's gradually improving, and the potential of the data store is becoming evident. (And the database is processing the business logic so much faster.)

I've worked on some large third party apps, and the vendors always told me the data model was proprietary - we weren't even supposed to look at it but I always did anyway. Some were better than others but there's been a general slide away from good data models across the industry and it baffles me. Seems like everyone around me is working so much harder than they have to and ending up with a less capable product to boot.

thanks again for sharing your insights ... Robyn

Robyn said...

Hi Troy ....

Now you're talking about the enterprise wide data model - a very rare animal indeed but I have seen a few. At Lockheed, we had a model that showed just the connections between the manufacturing and engineering systems. It was affectionately referred to as the 'spaghetti chart'.

cheers ... Robyn

John Brady said...

Robyn,

Glad to hear you agreed with my points on "Object led design", and that others are seeing the same issues too.

I'll throw in another point. In my view the "Data Model" should stand in its own right. This is because the data and its meaning should be constant and consistent, regardless of which particular application is manipulating it.

If the "Data Model" is completely tied in to the application and its implementation, and cannot be separated from it, then this is admitting that it does not capture the true "meaning" of the data, and that such a database could not be used by another application, or be queried directly.

What I really mean by a "Data Model" here is a Logical or Conceptual Model, and the main data entities and the relationships between them. This should always be the same for a given set of data, regardless of which particular application is manipulating the Data itself.

Of course the details of the implementation of this Conceptual Model can differ. And all specific applications will require unique elements within the physical database - extra staging tables to hold intermediate results, flags to indicate the type of data or entry, status codes, etc. But these should be in addition to the core data elements identified in the Conceptual Model.

In other words, although the Physical Database will have lots of extra data columns and tables, you should still be able to map large parts of it back to the Entities in the Conceptual Model very easily.

I have worked on some projects where the Physical Database bore no resemblance to the Conceptual Model, because of various issues such as those already mentioned. In one case no foreign keys were defined, columns were all uniquely named, and the surrogate generated primary key column was named ID in every table. All relationships between tables had to be guessed at, as the developers did not document the Physical Database in any way.

It is things like this that wind me up, as it is much more difficult to add them after the event, and it would only have taken a little effort up front to add in proper documentation of the Conceptual Data Model.
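
A little of that up-front effort looks something like this hypothetical sketch: declare the relationship once, and it is both enforced and documented forever.

-- Declare the relationship instead of leaving it to be guessed at
ALTER TABLE order_lines
  ADD CONSTRAINT order_lines_orders_fk
  FOREIGN KEY (order_id) REFERENCES orders (order_id);

-- Anyone can then recover the model straight from the data dictionary
SELECT constraint_name, table_name, r_constraint_name
  FROM user_constraints
 WHERE constraint_type = 'R';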

John
Database Performance Blog

Robyn said...

Hi John,

You and I are in complete agreement on this one too. The data model should be able to stand on its own, and it should provide a complete picture of the data and the data relationships. The application that interacts with the data should not define the data model, nor should it be necessary to understand how the application works to understand the data model. It is important to document the flow of the data through the application, but that is best captured in a process or data flow chart, i.e. the data-in-motion view. (I like that term.)

My favorite presentation in the past year was Toon Koppelaars' "The Helsinki Declaration". Toon covers many application/data/database truths, but the BIG TRUTH is that the database is normally the most stable, longest lasting component of a system. Applications come and go, tools fall in and out of favor, but data lasts, and data is usually what the business values. Yes, some applications manage transient data that loses value at a rapid rate, and those applications are suited to the persistence models, but I would argue that it's still important to understand and document those data entities in a data-at-rest state before you start building the data-in-motion models, or the data in-and-out-of-memory application.

And this comes back to the reality I've seen in application design over and over again. If you get the data model right, so that it accurately represents all the pieces of data and the relationships between them, it becomes so much easier and faster to build a useful application and to add new functionality when it is needed. It's the antithesis of Agile development: you move much slower in the first phases because it's so critical to understand the data and the business process, but that slow start allows you to pick up speed at an exponential pace when you're actually building something. And oddly enough, this leads to real agility, which works out better anyway because customers always come up with tons of new things they want just about the time you think you've finished testing.

Plus, it makes more sense to do the heavy lifting/thinking at the beginning of the project because customers are much quieter and better behaved in the early phases of projects before they get the direct phone numbers for the technical people set up on speed dial :)

ok ... now my brain has jumped back to aircraft scheduling: learning curves, prototypes, first article builds, and the slow ramp up to steady state production. I've got to get some sleep but that might make a good parallel thread some day :)

Thank you again for the comments - very good stuff.

Robyn

Noons said...

Robyn, please re-read what I wrote.
I think you misunderstood it...

As for how does one start coding a db app without a data model, all I can say is: ask the agile/extreme mob. They specialize in that nonsense. I think they call it "data in motion" model. Sure....
;)

Robyn said...

hey Noons,

I don't think I misunderstood your comment. I was personally quite surprised to discover that Agile development was leading people to believe the data model was unimportant - clearly you've had more exposure to this Agile creature than I have. There are a fair number of Dilbert cartoons hanging on the walls in my work area, most poking fun at Agile and the related chaos. Part of my surprise was discovering that it was actually in use.

On one of our projects, there are frequent references to the 'Data Model', yet I'd seen nothing that had even a slight resemblance to such an animal. Eventually I figured out that we were using the same language to refer to completely different things, which led to some pretty baffling team discussions.

There's been a lot of blood, sweat and tears poured into the other 'Data Model' and I believe that it is useful to the front-end developers - but it couldn't take the place of what was missing. So we could start some sort of holy war between the data geeks and the object/class crowd, but the point is that both kinds of information have their place and neither can stand in the other's stead. There were some aspects of the customer's data that were NOT clear until we had a data-at-rest, ERD picture of the information. It needed to evolve through several iterations, but the detailed discussions couldn't even happen until it existed.

Unfortunately, development did move forward, only to have to fall back and regroup once the real picture was understood. On the plus side, it's all going to work out just fine. The original version of the happenstance schema is officially dead, wooden stake through the damn thing's heart as of 5:30 am this morning. I've been up for about 39 hours now, but it was worth it and honestly, I'd throw an all-night party tonight if there were enough like minded souls in the area.

If you still think I misunderstood your comment, let me know. I value your viewpoint so I'd like to be sure I get it right.

I won't be surprised next time around ...

cheers ... Robyn

Noons said...

It's I who misunderstood you, Robyn. My apologies. Been a long week...

Yes indeed: been involved with some of the precursors of agile/extreme. I thank the Gods all that is behind me for the time being!

The problem is always the same: modern development teams confuse the data flow of an application with its data model.

Mostly because few have had a complete IT education. Most just "play around" with modern development tools for the "certification", or because they once did a shopping cart, or a web site for a friend.

It's a pity that somewhere along the line the modern training mantra forgot that data modelling existed not because it created overheads and delays, but because it SOLVED them!


Data flow and data model could not be more different. The former cannot exist without the latter.

Back when IT design and development was in the hands of responsible professionals, one of the disciplines of system development was data modelling - and schema design.

It was well understood and demonstrated many, many times that one cannot create a data flow out of nothing.

There has to be a base model as the foundation of one or more data flows, each mapping to its application.

The fixation of modern developers with "rapid" non-data-model development is misguided if not downright dangerous for its later costs.

I do recall being involved with a system where not a single minute was spent developing a data model.

Sure: the screens showed up in very little time. Not everything was perfect, but at least they were usable.

And the team was lauded as having achieved great "savings" in the development process.

When it came time to amplify and expand the app with additional requirements, those responsible for the design sheepishly admitted they would have to recode the ENTIRE application in order to make it meet the new process requirements.

Data was the same, processes had just evolved.

Of course, they got out quickly and the whole remaining project came crumbling down.

Some "savings", eh?

This sort of attitude to development is what I call "hit and run": get it going as fast as possible, get out quick before someone can pin on you the oncoming disaster.

Unfortunately, in my experience it's how the vast majority of modern development is carried out. With very few exceptions...

Good blog entry, this one!

Connor McDonald said...

Ah yes... the agile world. I'm currently on an agile project, where everything gets done in 2-week cycles. We are about 3 such cycles away from "project completion" - and as I look at the schedule I see one task called "concurrency" planned for the penultimate cycle. Hmmmmm.... that seems a tad late to me :-)

Agile (without a model) is hilarious and way too common. I was once on a project (also close to completion) that had to refactor pretty much all of the code because a core entity turned out to be 2 entities (one-to-many). Minimal upfront modelling led to this sequence of events, but to my amazement, this was spun as a "positive outcome", that is, "lucky we caught this a month before go-live", whilst all the time I'm thinking quietly to myself: "Shouldn't this have been 'caught' a month before coding started?"

I get the feeling that the agile developer's secret agenda is to have all data models consisting of:

create table DATA
( key   INTEGER,
  stuff BLOB );

Robyn said...

Hi Connor,

I suspect you may be right about the secret agenda: there have been fewer and fewer tables and more and more *lobs in the development work I've seen lately. The database we just killed came very close to the secret-agenda ideal: less than 5 tables, and only 3 of those held real data.

The really scary thing is that performance and concurrency problems caused by a failure to understand the data are caught far too late in the Agile development process, making it very difficult, if not impossible, to implement real improvements.

But as you saw on your project, as long as there are spin doctors, that message will be obscured and replaced with a 'positive outcome' ...

cheers ...

John Brady said...

But one of the mantras of Agile development is the principle of delaying irreversible decisions until the last possible moment. Hence things like Data Models and database schemas of tables and columns can be left to later phases of the development process. So the problem with the lack of a consistent Data Model in many Agile development projects is actually a direct result of the Agile philosophy itself.

Sometimes I think of Agile as a case of "do anything rather than nothing, whether it is right or not". And this is deliberate for the perception of progress in the application development - "we developed X modules in the past week". Whether any of this stuff will stand up to real world workload scenarios is another thing.

Another Agile principle is of incremental development and delivery. It is very easy for a developer to go in and rip out a piece of poorly performing existing application code, in whatever language, and replace it with something that is either better in some way or has more functionality. Unfortunately you just cannot do this in a Database. You cannot throw away old data in old tables and simply create new empty tables. You get a lot of very annoyed customers when you try and do that on their databases.

Agile Development seems to completely forget about the need for delivery of incremental Database Structure Changes (as I refer to them). Unlike application source code you cannot simply do a delete and replace. You must provide SQL scripts that perform incremental changes to the tables and the data in them, to achieve whatever the new structure must be, while always maintaining the existing data in the database. And this can be tricky, but is not impossible.
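
A hypothetical example of what I mean (invented names): suppose version 2.0 replaces a free-text status with a proper status code. The change script must carry every existing row along with the new structure.

-- Add the new column alongside the old one
ALTER TABLE accounts ADD (status_code VARCHAR2(10));

-- Migrate every existing row; no data is thrown away
UPDATE accounts
   SET status_code = CASE WHEN status_text = 'Closed' THEN 'CLOSED'
                          ELSE 'OPEN' END;

-- Only then tighten the rules and retire the old column
ALTER TABLE accounts MODIFY (status_code NOT NULL);
ALTER TABLE accounts DROP COLUMN status_text;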

Again, leaving such important and critical issues to the later stages of a development project is just ignoring the oncoming storm, and then being completely unprepared for incremental database changes when you actually have to deliver the full working application software.

The only references I could find on Data Modelling in Agile Development some time ago were actually about Refactoring. This is about internal changes within the physical database, but with no change to the Data Model itself. Because the Data Model never changes, no corresponding application changes are needed.

This is the opposite of what Agile Development is doing. It is changes to the Application that drive changes to the Data Model that in turn need to be implemented in the real, physical database.

Also, Connor, don't joke about the Blob approach. ;-) Currently one of our development teams has decided to migrate a legacy non-Oracle application to Oracle using precisely this approach. Data that was stored as records in ISAM-type files will be stored in Oracle as byte sequences in Blobs, with an added surrogate generated unique primary key. When I ask what the point is of such an approach, no one can give me a sensible answer, other than "it works".

I just wonder what you call an Oracle Database where all the data in it is invisible to Oracle itself and cannot be accessed and manipulated directly using SQL? Or conversely, would you call a set of numbered binary records a database? Or just a collection of numbered binary records?

The delivery focussed people say that this conversion method is "good enough" with minimal effort and time involved. I say it isn't, and that the first customer to try it will complain very, very loudly. If any customer bothers to buy it that is.

And at some point in the future they will all realise that this method is a chocolate teapot of a database conversion (think about it if you've not come across that comparison before), and we will have to go back to square one and do another migration strategy all over again from scratch. Only one that makes sense this time.

John

Robyn said...

Hello John,

Your very first paragraph brings me back to my original question. The idea that one should delay irreversible commitments until the last possible moment, or better yet until the moment you are certain of your path, is not necessarily a bad one. I've made most of my educational and career decisions based on that approach, and it's worked rather well.

Example: I could not commit to a major in college. Every time I took a new subject, I found something else I was interested in. (ADHD.) I changed my major 5 times while working on my bachelor's degree. Fortunately, I recognized that I could explore certain topics while still making sure those classes would count toward some aspect of my general education requirements. By doing this, I ended up completing my general education classes first, i.e. the foundation of my education. In spite of all of my changes in direction, I ended up with only one class that didn't count toward a degree requirement. Had I followed the 'Agile' approach without making sure I addressed core competencies, I'd still be in school, working toward my 42nd major. (While considering yet another change.)

I agree with all of your assessments of how Agile development is currently implemented in the software world. It's chaos, and 99.99% of the people using Agile development assume that the foundation is no longer important. Thus far, every project I've seen developed using the whole Agile package has either been scrapped and rebuilt, or it desperately needs to be. My question is, did the originator of Agile expect that no one would be foolish enough to start the development process without a solid foundation architecture (and data model), and therefore expect that certain components would be in place before starting the scrums and iterative releases? Or did the Agile creator really believe that it was no longer necessary to consider the relationships of the bits of business data to be processed? In either case, the key lesson is to never underestimate the human potential for silliness, but I have seen plenty of excellent, original ideas evolve into something that the originator never intended.

Is Agile development one of those things?

As for the short sprint release thing, every developer, DBA or *nix admin that I have ever worked with ends up dealing with a production outage and mentioning 'I meant to go back and change 'x' but I:

a. forgot
b. got too busy
c. no longer had a charge code
d. no longer had access to the system

We may all have the best of intentions, but circumstances and schedules get in the way, and there are always more features being added to the mix. Next thing you know, there's a recurring problem in production, but the underlying problem is now a beast that requires an enormous amount of work.

Besides, expecting to have to go back and refactor or rework things seems like a major inefficiency to me. Yes, it happens, but isn't that something we're trying to avoid? Making it a regular part of the process makes it normal, and therefore acceptable, to release bad code.

Ken said...

Hi everyone,

I'm a newbie on this blog but am already taken with it and its commenters.

I was on a pace-setting Agile project on top of Oracle. My contract was an unplanned, emergency position to deal with performance problems in the database. The developers were using Hibernate and apparently relied on it to create their tables on an as-needed basis, simply adding a table or a column in response to the (streaming) user requirements and not worrying about the model until it crawled out of the swamp and attacked them.

It's easy to attack a method like Agile; it may work fairly well when there's no database (or only something simple or read-only). In this instance, there was no method at all before Agile, so it was an improvement.

Instead, I lay the blame at the door of technical management, as it appears to allow them to bypass requirements, modeling, documentation (though we did have a 'documentation sprint' at the end) and even testing.

It appears to give something for nothing. It puts something onto the screen quickly, which looks like progress to the naive user.

I always sound the warning if I have the chance.

Thanks for hosting good thoughts and comments. Papers like the ones discussed here give me good ammo at the project meetings.

Regards, Ken

Robyn said...

Hello Ken,

Very nice to meet you, and you've started out by making some good points. Agile development might work very well for projects without databases. It might also be very effective for projects WITH databases IF the development process includes a phase to evaluate and understand the data entities and their relationships. This should occur at the beginning of the project so that Agile development can proceed from a solid foundation. This is not to say that the database design has to be cast in stone before work can begin, but we should certainly have a good grasp of what we know and what we expect may need to change.

Soon, I plan to post more about a project that is proceeding with agility AND a data-centric approach, thanks to some very talented developers. (And if I don't, somebody call me on it.) The approach I'm taking with my management is to show them the benefits that we gain in this new system as compared to others that have neglected the database design. As you say, it's easy to attack the method. It's much more effective to prove your points with actions and facts. (A whole lot of work ... but effective.)

However, we should all keep sounding the warning every chance we get, anytime we see development proceeding without a solid database design ... somebody will listen, and even if they don't heed our advice this time, next time they might. (yes, I'm ever optimistic, even when encircled in doom)

Hope to hear from you again,

Robyn

Ken said...

Hi Robyn,

I don't know if you're taking requests, but allow me to take the liberty of setting out some issues I'd like to see you take into consideration on this vital issue.

To begin: why do we even have methodologies like Agile? Whatever happened to methodologies with requirements, designs, traceability, test plans, and the like?

I think that part of the answer lies in our cultural evolution. As a society, we are becoming more accustomed to quick feedback (and payoffs on investment), an inescapable result of the speed-up of society (think mobile phones, chat rooms, etc.). The corollary is that we are less patient with the long, complicated processes of reading, diagramming, and chewing on alternatives that go into an undertaking like database modeling. In short, 'data in motion' is more exciting and tangible than 'data at rest'.

I perceive a distinct cultural divide (something like C.P. Snow's 'two cultures') between those who grew up and went to college before commodity computing (roughly, the baby boomers) and those who are now under 40. At age 55, I'm a graybeard in the shop and find that others my age are much more likely to share my outlook. As I get older, I find that managers (and users and developers) increasingly fall into the younger camp and are impatient with what, to them, is bureaucratic gold-plating (if you're not part of the permanent staff, it can also look like padding the bill). I did my computer-science degree at two different schools. The one with the better reputation had no course offering for anything resembling software engineering, and I suspect that the trend is to drop or de-emphasize this aspect of CS education.

The second problem is financial. I recall in the 80s that projects in large organizations would spend months in requirements, modeling, prototyping, etc. Outside of high-risk domains like medicine or hardware control (and not always there), this now seems as distant as punched cards. Companies put off new development until the pain is unbearable, by which time the pressure to produce leads irresistibly to those methodologies which give the shortest time estimate.

Assuming that you share my perceptions, what I would like to see is:

1) how (or whether) traditionally labor-intensive early project activities can be accommodated to this new reality? (A requirements scrum? A modeling sprint? Animated data models?)

2) what the best means are to persuade a skeptical audience, to 'sell' the idea of looking before leaping? We know it's true, but how can we show that it's cost effective?

Thanks for the consideration.

Brian Tkatch said...

@Ken

I would like to add to your reasons for the quicker development process.

1) Many solutions are for transient problems. As such, a long development cycle--in those cases--outlives the need for the product.

2) Everybody on the team wants to make a name for themselves. Going through the design process spreads the importance around, treating everyone's contributions as equally important.

To make a name for oneself, things usually have to be either cheaper or faster (more reliable is, sadly, not recognized enough). Since design slows things down and does not make them visibly cheaper, it doesn't help with making a name for oneself. Getting a product out the door now does.

Robyn said...

Ken,

This is new ... my readers are adding to my task list. Could be the start of a dangerous trend :)

Actually, I do share your perceptions and I think your suggestions touch on key issues. There is a cultural divide between old school and new school, plus there's been a significant shift in the kinds of products being developed. Choosing the right methodology is not a one-size-fits-all proposition. Instead, we need to combine the beneficial tools from the old requirements/modeling/prototyping methodologies with the new approaches in use today. Cost effective requires time effective, and each project has its own tipping point. The amount of time and effort spent on any phase of development should depend on the specific complexities of the project, not an arbitrary schedule from the latest management book.

The 'how to sell it' question is a big one. With management, the questions will always come down to ROI and TCO. We've got to convince them that support costs can be reduced by increasing reliability. That we can be more responsive to change requests with a sound data model, and that responsiveness will lead to increased customer satisfaction and sales. But recognizing those truths takes a big picture view, and if the focus is always on the immediate sale, some won't ever see it.

So I accept your challenge, sir, and will at least propose a few ideas for everyone's consideration. Perhaps with a little audience participation, we'll come up with some sound answers.

cheers ... Robyn

p.s. to Brian ... Excellent point about solutions for transient problems. The life cycle of the product needs to be weighed against the length of the development cycle. One of the first considerations should be the life of the data within the system. Short-lived data may be fine with a minimalistic approach to the model, but the longer the data is expected to live, the more time is needed to understand it and plan for its care and feeding.

Brian Tkatch said...

@Robyn

Thanx. I can't take credit for the transient point. In our team room, I sit next to one of our BAs who cannot praise Agile enough. I took him to task on it, and he challenged me with the transient case.

I admitted that on that point, he was correct. But it usually isn't the case. As you suggested, he was only thinking of active development time, which ends when the system goes into the inactive, supported mode.

In our case here, we're replacing something like a twenty-year-old system, and the new one should last just as long. Ongoing requirements gathering has taken half a year already (this is a large application and will be used on multiple continents), and the active development phase is about two years. To him, that meant go Agile.

Robyn said...

yikes ...

Based on the 'how long the data lives' test, I'd say he was dead wrong. Based on the 2-year active development cycle, I'd say ... he's still dead wrong.

There is an Agile project in process right now. It's a new app that will be very similar to an existing app. I think their total development cycle is about 2 months, not 2 years.

That project did fail to revisit the data model, and unfortunately, I've seen scalability issues in the data model in the original app. If an additional phase for data requirements and schema design had been added to this development effort, we'd be looking at maybe 3 months of development time, tops. If the changes proved to be beneficial, we could have retrofitted them to the original system. Then we would have agility AND scalability. That would be an appropriate use of the methodology in my book.

Then my only complaint about Agile would be the crowd of developers that block the aisles every morning for the standup meetings. I could live with that ...

Robyn

Brian Tkatch said...

Robyn, luckily this isn't fully Agile. He just wishes it were that way. :)

Kevin Closson said...

Sorry to poach the thread, Robyn, but I've got to know... uh, is this the John Brady I know from the old Sequent days? If so, I want you to know that I'm still upset about not getting that Steak and Kidney Pie on that lovely Friday afternoon at the Brick Layer's... that "thing" had Steak, but no Kidney, and it was a bit shy on pie :-)

If you see Ian Cramb say, "Hi" for me! Boy oh boy that guy drove a fast car! That was fun!

Robyn said...

Hey Kevin,

You are of course welcome to poach any thread you like, but if you want to reach John, you might want to poach one of his. His blog has got some very good stuff on it and based on his description, this could be the John Brady you're looking for:

http://databaseperformance.blogspot.com/

Hope he comes through on the Kidney and Steak Pie thing. (shudder!)

cheers ... Robyn

Bryan said...

Robyn,

I am a university student in Computer Science and I have found your post and especially the subsequent comments very intriguing. You heavily discussed the importance of data modeling and the horror stories that resulted from ignoring it. Now I'm wondering, how do you know when you have a good data model? Do you use any factual measurements and/or subjective quality ratings to quantify or verify a good data model?

Thanks for your input!

Robyn said...

Hello Bryan,

That was an excellent setup :) I just happen to have some new numbers to publish that I think demonstrate the value of a good data model. I'll be posting those very soon.

Some benefits of a good data model are difficult to measure, but the results will still be visible. When the data model has been optimized for the database/application, the data will tend to mimic the real world, which makes it possible to respond to customer requests very quickly. (Dare I say 'with agility'?) I've been told that I'm too agreeable on calls with the customer - I promise to deliver everything they request. Honestly, once the data model was correct, everything they've requested has been easy to do.

However, deciding what makes a good data model for a specific application is one of those 'it depends' situations, which is why new projects should start up front by digging into the data before trying to answer that question.

I personally think you start with third normal form, and only denormalize once you know it's appropriate and necessary for your application. Even if you do decide to go with something other than third normal form in your database, the time spent understanding the data and breaking it into appropriate groups and sets will be beneficial. Time spent sifting through sets of the data, and testing different groups and results, can give us a much better understanding of what the information means to the user. And the more we understand about how users interact with the information, the better our applications will be.
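
For a concrete (if toy) illustration, with made-up names: design in third normal form first, and if measurement later proves you need a flattened shape, derive it from the model rather than building it into the base tables.

-- Third normal form: every non-key column depends on the key,
-- the whole key, and nothing but the key
CREATE TABLE departments (
  dept_id    INTEGER PRIMARY KEY,
  dept_name  VARCHAR2(50)
);

CREATE TABLE employees (
  emp_id    INTEGER PRIMARY KEY,
  emp_name  VARCHAR2(100),
  dept_id   INTEGER NOT NULL REFERENCES departments
);

-- Denormalize deliberately and on top of the model,
-- e.g. as a materialized view, if and when it proves necessary
CREATE MATERIALIZED VIEW emp_dept_mv AS
  SELECT e.emp_id, e.emp_name, d.dept_name
    FROM employees e JOIN departments d ON d.dept_id = e.dept_id;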

Data models are one of my favorite things and I will try to expand on the how and the why in the future. For now, I'll go post the measurement data ...

Cheers ... Robyn

Bryan said...

Robyn,

Thanks for the response! I am excited to see your numbers, as being able to quantify the value of a good data model sounds like it would benefit a lot of the impatient developers mentioned in comments here.

It's quite telling how a good data model lets you deliver on any inquiry. As such, would there be any metrics you would use that signify you are working with a good model?

Best,
Bryan


disclaimer ...

The views expressed on this page are mine and do not reflect the opinions of my employer, former employers, other associates or the opinions I may hold 3 days from now.