Tuesday, December 4, 2012

The Snowman Architecture Part Three: The Technical Benefits


This is the third part of a four part blog about the Snowman Architecture. The first part was The Snowman Architecture: An Overview. The second part was The Snowman Architecture: The Economic Benefits. In this part, I will be discussing the technical benefits of the Snowman Architecture. But don't read this until you have read the overview! And if you care about things like ROI, check out part two.

But whatever you do, don't miss this part! Over the next twelve pages or so (yes, I know it's a bit long) I'm going to take you through fourteen of the most compelling technical reasons why the Snowman Architecture is a huge improvement over today's approaches to large IT. You are now reading nothing less than my declaration of war on traditional IT methodologies. The first snowball has now been fired!

Review

I gave an overview of the Snowman Architecture in part one, but let's review briefly.

The Snowman Architecture breaks down a large IT system into small vertically partitioned subsystems called snowmen. These snowmen interact with each other through asynchronous messages. Snowmen are designed to be as autonomous as possible using a design methodology known as Simple Iterative Partitions (SIP) (1).

Snowmen come in three layers. The head of the snowman consists of the business functions that make up a capability. The torso of the snowman consists of the technical systems that support those business functions. The bottom of the snowman consists of the data that is used by those technical systems. Each of these layers is strongly partitioned based on the business functions that make up the head.

Snowmen reach out to each other through their arms, the asynchronous messaging system. Often this is implemented as an SOA.
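To make the idea of "arms" concrete, here is a minimal sketch of two snowmen exchanging an asynchronous message. Everything in it is an assumption made for illustration: the Snowman class, the inbox queue, and the "orders" and "billing" names are hypothetical, and a real system would use a message bus rather than an in-process queue.

```python
import queue


class Snowman:
    """Hypothetical snowman: private data plus an inbox for asynchronous messages."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()  # the "arms": the only way in from outside
        self._data = {}             # the "bottom": never touched by other snowmen

    def send(self, other, message):
        # Fire and forget: the sender does not wait for the receiver to act.
        other.inbox.put((self.name, message))

    def process_pending(self):
        # Drain the inbox and record each message in this snowman's own data.
        handled = 0
        while not self.inbox.empty():
            sender, message = self.inbox.get()
            self._data.setdefault(sender, []).append(message)
            handled += 1
        return handled


orders = Snowman("orders")
billing = Snowman("billing")
orders.send(billing, {"event": "order_placed", "order_id": 1})
handled = billing.process_pending()
```

The point of the sketch is that the sender returns as soon as the message is queued, and only the receiving snowman ever touches its own data.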



Contrast to Traditional Architectures

A traditional IT architecture is also implemented in three layers. These layers are the same as those of the Snowman Architecture: business architecture, technical architecture, and data architecture. What differentiates the Snowman Architecture from a traditional IT architecture is the strong vertical partitioning, as shown in Figure 1.

Figure 1. Traditional IT Architecture vs. Snowman Architecture

It turns out that this strong vertical partitioning has a major impact on the effectiveness of the architecture. Let's take a look at fourteen key non-functional attributes of a large IT system. As you will see, every single one of them is improved by the strong vertical partitioning that characterizes the Snowman Architecture.

In this analysis, I assume that the system we are evaluating is a large (greater than ten million dollar) system. This is the point at which traditional IT architectural methodologies are no longer able to keep up with the exponential increases in system complexity (2). I also assume that SIP was used to assign business functions to the head of the snowman, an essential step to minimizing the overall complexity of the Snowman Architecture.

Okay, given these two assumptions, let's see why the Snowman Architecture outperforms all traditional approaches to large IT system design. I'll start by listing the fourteen attributes and then go through them one by one. The attributes I will look at are these:
  • Business Alignment 
  • Regulatory Compliance 
  • Auditing 
  • Security
  • Agile Friendliness 
  • Maintainability 
  • Testability 
  • Reliability 
  • Recovery 
  • Throughput 
  • Scalability 
  • Flexibility 
  • Cloud Effectiveness 
  • Vendor Lock-in
You might as well do a quick check-point. Are any of these attributes important to your IT systems? If not, you might as well stop reading now. If one or more of these are of interest then keep reading.

Okay, now let's go through them one by one. Feel free to skip those you don't care about.

Business Alignment

A system is well aligned when it meets the needs of the business. You can think of business alignment as the Wow factor. When the system is delivered, does the business say, "Wow!" Or does it shake its collective head and reach for the nearest bottle of Tequila?

In any system design life cycle, there is a phase in which the business requirements are gathered. In the traditional approach (the left hand side of Figure 1) the requirements are gathered more or less immediately after the project has been approved and before the technical architecture is designed.

The size of the requirements document(s) is always proportional to the size of the project. Massive projects require lots of requirements documentation, often tens of thousands of pages. The larger the stack of requirements, the lower the chances are that those requirements accurately reflect what the business actually needs.



In SIP (the guiding design methodology for the Snowman Architecture) an additional project phase is introduced: the Partitioning Phase. This is when the basic shape of the Snowman is identified.  

What is important from an alignment perspective is that this partitioning of the larger system into smaller autonomous snowmen takes place before the requirements have been gathered. Since a typical snowman rarely exceeds one million dollars in cost, its requirements are modest. Since the requirements are relatively modest, it is more likely that those requirements accurately reflect the actual business need.

Since the Snowman Architecture has more accurate requirements than the traditional IT architecture, it is likely to actually meet the business need.

Regulatory Compliance 

An IT system is considered compliant when it can be shown to operate within the constraints of the relevant laws and regulations. Some industries, such as video gaming, have few if any constraints. Others, such as financial organizations, have a complex web of laws and regulations.

Our ability to show that a given IT system operates within its regulatory constraints is dependent on the complexity of the system. The more complex the system, the more difficult it is to prove compliance.

Large traditional IT systems (the left side of Figure 1) have a highly complex web of relationships between business functions, technical processes, and data. Thus it is very difficult to prove compliance.

The Snowman Architecture (the right side of Figure 1) has a number of simple relationships between business functions, technical processes, and data. The architectural simplicity is guaranteed by the SIP directed partitioning of the business functions into snowmen heads and the strong vertical partitioning that occurs once the technical and data architectures are created.

Because each snowman is relatively simple, it is correspondingly easy to prove that it operates in compliance with any relevant regulatory constraints.

Auditing 

An IT system is considered auditable when we can accurately trace data changes back to technical processes, from there back to business functions, and from there back to human beings. The more paths there are to the data, the more difficult it is to trace these paths. 

The traditional IT architecture has a large number of complex paths to the data. It is nearly impossible for an examiner to determine which of these paths resulted in a particular item of data being updated.

The Snowman Architecture has a small number of simple paths to the data. An examiner can easily get the whole picture and then determine which of these few candidate paths resulted in a particular item of data being updated. 


Security

When we talk about the security of a system, we are generally talking about our ability to protect data. Data, of course, resides in a database. So security comes down to our ability to configure the database so that unauthorized updates are not possible. In a traditional architecture, there are so many processes that need to update so many parts of the database under so many different circumstances that it is difficult to figure out a secure configuration. And even if one does manage a secure configuration, the next process that is added will change everything.

In the Snowman Architecture, database configuration is much easier. Because the partitioning of the head of the snowman (the business processes) dictates the partitioning of the technical layer and then the partitioning of the data layer, the only processes that will ever need to access the data in the snowman are the processes in the torso of the snowman. This makes configuration easy: allow the processes in the snowman torso to access the data in the snowman bottom and don't allow any process outside the snowman to access any data in the snowman. Voilà. Done.
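The access rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: SnowmanDataStore, the process names, and the in-memory dictionary are all hypothetical stand-ins, and in practice the rule would live in database permissions rather than application code.

```python
class SnowmanDataStore:
    """Hypothetical data layer that only its own torso processes may touch."""

    def __init__(self, owner, torso_processes):
        self.owner = owner
        self._allowed = frozenset(torso_processes)
        self._rows = {}

    def _check(self, process):
        # The whole policy is one membership test.
        if process not in self._allowed:
            raise PermissionError(f"{process} is outside snowman {self.owner}")

    def write(self, process, key, value):
        self._check(process)
        self._rows[key] = value

    def read(self, process, key):
        self._check(process)
        return self._rows[key]


# Hypothetical billing snowman with two torso processes.
store = SnowmanDataStore("billing", ["create_invoice", "apply_payment"])
store.write("create_invoice", "invoice-1", 100)

try:
    store.write("ship_order", "invoice-1", 0)  # a process from another snowman
    outsider_blocked = False
except PermissionError:
    outsider_blocked = True
```

Adding a new process to another snowman never changes this list, which is why the configuration stays stable.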

Agile Friendliness 

Many organizations are attracted to Agile Development Methodologies. I agree that Agile development has a lot of promise. However I also think it doesn't scale. 

A recent paper by Vikash Lalsing et al. (3) indicates that Agile projects of 0.5 person years or less are excellent candidates for Agile development. They predict such projects will be less than 10% over budget. By the time the project size reaches 3.6 person years, the budget overrun increases to 18%. And by the time the project size reaches 8.2 person years, the budget overrun increases to 66%.

The Snowman Architecture is ideally suited to projects of greater than $10M. This equates to an effort of close to 100 person years. This is more than ten times the project size that yielded the 66% budget overrun. 

In the Snowman Architecture, the larger project is broken down into relatively simple, autonomous chunks of project work. Each of these chunks becomes an individual snowman. The use of the SIP methodology ensures that not only is each snowman as simple as possible, but the relationships between snowmen are as simple as possible. 

The project size of any one snowman is unlikely to exceed $1M and in many cases will be much less. A $1M project is around 7 person years. This is still large by Agile standards, but far closer to a workable Agile number than a project that does not have the benefit of the Snowman Architecture.

Maintainability 

A system is maintainable when it is easy to locate the source of bugs. The more complex the system, the more difficult it is to find the source of bugs.

The complexity of the traditional architecture (the left side of Figure 1) is much higher than the complexity of the Snowman Architecture. The maintainability of a traditional architecture is therefore much lower. The use of the SIP methodology guarantees that not only is the overall complexity of the Snowman Architecture low, it is as low as it can possibly be (4). Simplicity is important when it comes to snowmen. A simple snowman is simple to maintain.


Testability 

System bugs can manifest themselves at any point in the system life cycle. The later in the life cycle the bug is manifest, the more problems it causes. Our goal in system testing is to find bugs in the system as early as possible and definitely before the system is delivered to customers. The most common strategy for system testing is to write code or scripts that exercise the system and ensure it is working correctly. 

To be sure a system is working correctly you must write test code that exercises every possible logical path through the system. The more logical paths there are through the system, the more difficult it will be to create the test code and the more likely it will be that you will have missed an important path. This translates to a greater likelihood that you will ship buggy code.

There are two reasons the Snowman Architecture is more testable than the traditional architecture.  

The first reason the Snowman Architecture is more testable has to do with the number of paths. 

A traditional IT architecture has many possible paths. By the time the system reaches a few million dollars in size, it effectively has an infinite number of paths and there is no way they can all be tested.

In the Snowman Architecture, each snowman can be tested independently. Since each snowman is relatively small and simple, there are relatively few paths through the snowman. Once you have tested all of the snowmen and the connections between them, you have effectively tested the system as a whole. Thus your chances of shipping buggy code are greatly reduced if you are using the Snowman Architecture.
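To get a feel for the numbers, here is a back-of-the-envelope sketch. It rests on a simplifying assumption that is mine and made purely for illustration: every independent yes/no decision in the code doubles the number of paths. The function names and the figures of 30 decisions and 6 snowmen are hypothetical, not measurements.

```python
def monolith_paths(decisions):
    # One system: every decision can combine with every other decision.
    return 2 ** decisions


def partitioned_paths(decisions_per_snowman):
    # Snowmen are tested independently, so their path counts add
    # instead of multiplying.
    return sum(2 ** d for d in decisions_per_snowman)


one_big_system = monolith_paths(30)
six_snowmen = partitioned_paths([5, 5, 5, 5, 5, 5])
```

Under this assumption, 30 decisions in one system give 1,073,741,824 paths, while the same 30 decisions split evenly across six independently tested snowmen give 192. The connections between snowmen still need their own tests, which the sketch leaves out.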

The second reason the Snowman Architecture is more testable has to do with how pieces of the system are connected together. In a traditional IT architecture, segments of code are often connected by shared data in a database. In the Snowman Architecture, snowmen are almost always connected through asynchronous messages. 

These two approaches to connections are very different from a testability perspective. Shared data connections are almost impossible to test. There are just too many ways the data can be accessed. Asynchronous messages, in contrast, are very easy to test. One need only write a messaging harness, a common practice among service-oriented architectures, and the connection points become easily tested. 
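A messaging harness can be very small. The sketch below is one possible shape, not a standard API: MessagingHarness, billing_handler, and the message fields are hypothetical names, and the handler is assumed to take a message plus a send function.

```python
class MessagingHarness:
    """Hypothetical harness: feeds messages in and records what comes out."""

    def __init__(self, handler):
        self.handler = handler
        self.outbox = []

    def deliver(self, message):
        # The handler gets a send function instead of a real message bus,
        # so everything it emits is captured in self.outbox.
        self.handler(message, self.outbox.append)


def billing_handler(message, send):
    # Stand-in for a snowman's entry point.
    if message["type"] == "order_placed":
        send({
            "type": "invoice_created",
            "order_id": message["order_id"],
            "amount": message["amount"],
        })


harness = MessagingHarness(billing_handler)
harness.deliver({"type": "order_placed", "order_id": 7, "amount": 25})
harness.deliver({"type": "unknown"})  # should be ignored
```

A test then only has to compare harness.outbox against the messages it expects. No database needs to be inspected, because the message is the whole connection.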

So we see two reasons the Snowman Architecture is so much easier to test than the traditional IT architecture. First, it has fewer code paths. Second, it uses asynchronous messages for its connection points. It is hard to test a traditional IT architecture. It is easy to test a Snowman Architecture.


Reliability 

Reliability is a measure of the typical amount of time a system will remain running before it unexpectedly drops dead. Reliability is often described as mean time between failures. 

Reliability is related to testability, the last attribute I discussed. The more testable a system is, the less likely it is to have post-delivery bugs. It is these post-delivery bugs that cause systems to fail. The fewer bugs, the less likely the system is to fail. Since the Snowman Architecture is easier to test than the traditional IT architecture, it is likely to have fewer bugs and thus will be less likely to fail.

But there is another factor that favors reliability of the Snowman Architecture. This has to do with how easy it is to quarantine a bug. Consider Figure 2, which is a blowup of the left hand side of Figure 1 with some labels added for reference.

Figure 2. Blow-up of Traditional IT Architecture.

Assume that database D crashes. We have three processes dependent on D, namely, n, o, and p. So these three processes crash. Processes g and i are both dependent on n, so they both go down. Process i is also dependent on o, but since it has already crashed, we need not worry about it further. Processes i, d, and f are all dependent on p. Process i is already down, but now d and f join the fun. So now we have D, n, o, p, g, i, d, and f all down. This can corrupt any databases they are involved with, which include C and E. This brings down their dependent processes b and h. Which in turn... you get the picture. There is no quarantine, so when one part of a system catches a bug, that bug can rapidly propagate to the entire system.

Contrast this to Figure 3, which shows a closeup of the Snowman Architecture.

Figure 3. Closeup of Snowman Architecture

Assume in Figure 3 that database B crashes. It can bring down processes d, e, f, and g. But that's it. The boundaries of the snowman have effectively quarantined the bug from spreading further. The only connections between d, e, f, g and other processes are through asynchronous messages, and these channels can easily be protected. So while the bug in B may crash the entire snowman, there is no pathway for the bug to spread further.
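The two failure walks above can be replayed as a simple graph search. The sketch below uses only the dependencies spelled out in the text for Figures 2 and 3. It stops before the corruption of databases C and E, because the text does not say which processes touch them. The function name and the dictionary layout are my own illustrative choices.

```python
from collections import deque


def failed_components(dependents, start):
    """Return everything that goes down when `start` fails (breadth-first search)."""
    down = {start}
    pending = deque([start])
    while pending:
        current = pending.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in down:
                down.add(dependent)
                pending.append(dependent)
    return down


# Figure 2 (traditional): which components depend on which, as listed in the text.
traditional = {
    "D": ["n", "o", "p"],
    "n": ["g", "i"],
    "o": ["i"],
    "p": ["i", "d", "f"],
}

# Figure 3 (snowman): database B and the processes inside its snowman.
snowman = {"B": ["d", "e", "f", "g"]}

traditional_down = failed_components(traditional, "D")
snowman_down = failed_components(snowman, "B")
```

In the traditional graph the search keeps finding new victims for as long as there are edges to follow. In the snowman graph it runs out of edges at the snowman boundary.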

The bottom line is that bugs occur less frequently with the Snowman Architecture (because it is easier to test) and when they do occur they tend to have only a local impact. In the traditional architecture, bugs occur more frequently (because it is harder to test) and when they do occur, they tend to have a global impact.

Recovery 

Recovery is related to reliability (the last section). Whereas reliability measures how often the system fails, recovery measures how long the failure lasts. In an ideal system, we have high reliability and fast recovery, meaning that the system rarely crashes and when it does, the crash doesn't last long.

It is difficult to develop an effective recovery strategy for a traditional IT architecture. There are too many databases, too many processes, and too many ways everything can be related to each other. When this web of relationships goes down, what do you do? You try to protect the entire system but this is difficult because the system is large, it is complex, and it is a moving target.

In contrast, it is easy to develop an effective recovery strategy for a Snowman Architecture. All you need to do is shadow any requests to the snowman to a backup snowman. Then if a failure occurs, reroute all new requests to the backup. This rerouting can occur as quickly as one can notice that the primary snowman has failed. This is shown in Figure 4.

Figure 4. Recovery Mechanism for Snowman
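The shadow-and-reroute idea can be sketched as a small router. This is an illustration under assumptions, not a production design: SnowmanInstance, Router, and the healthy flag are hypothetical, failure is modeled as a raised exception, and the backup is assumed to process shadowed requests without side effects visible to the outside world.

```python
class SnowmanInstance:
    """Hypothetical running copy of a snowman."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.log = []

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        self.log.append(request)
        return f"{self.name} handled {request}"


class Router:
    """Shadow every request to the backup; reroute when the primary fails."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.failed_over = False

    def submit(self, request):
        if self.failed_over:
            return self.backup.handle(request)
        # Shadow first so the backup has seen everything the primary has.
        shadow_reply = self.backup.handle(request)
        try:
            return self.primary.handle(request)
        except RuntimeError:
            # Primary is down: the backup already processed this request.
            self.failed_over = True
            return shadow_reply


primary = SnowmanInstance("primary")
backup = SnowmanInstance("backup")
router = Router(primary, backup)

first = router.submit("a")
primary.healthy = False          # simulate the primary crashing
second = router.submit("b")
third = router.submit("c")
```

Because the backup has been shadowing all along, the switch costs one failed call and nothing has to be replayed.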
Taking this and the last two sections together, I can make the following claims about the Snowman Architecture relative to the traditional IT architecture:
  • The Snowman Architecture will have fewer bugs.
  • The bugs will have less impact.
  • Recovery from that impact will be faster.

Throughput 

Throughput refers to the amount of work a system can process in a unit of time. Often we measure throughput in transactions per minute. Throughput should not be confused with response time which measures how long a single user waits for work to be completed.

Throughput is important because it directly influences cost. If a system has low throughput, then a lot of resources are needed to process a given workload. If a system has high throughput, the number of resources needed to process the same workload is much less. 

There are two architectural factors that strongly influence throughput: the number of synchronous connections and the amount of shared data. Synchronous connections slow down throughput by blocking processes until connected processes have completed their work. Shared data slows down throughput by blocking databases.

Both of these factors come together in the traditional IT architecture. These systems heavily favor synchronous connections and make extensive use of shared data. Between the two, throughput is substantially degraded.

In the Snowman Architecture, synchronous connections are only used within a snowman. All (or almost all) connections between snowmen occur through asynchronous messaging. Which means no blocked processes.

In the Snowman Architecture, shared data is the equivalent of a multi-headed snowman. This is anathema to the Snowman Architecture. The only processes that are allowed to share data are those that live within a single snowman. Since only a few processes ever share data, database blocking is kept to a minimum.

Between the judicious use of asynchronous messaging and non-shared data, the Snowman Architecture performs at a much higher throughput than does the traditional IT architecture. This means lower costs per unit of work which means lower IT costs.


Scalability 

Scalability refers to our ability to support larger and larger workloads. Say we have designed a system to support 100 concurrent customers and then our system becomes so popular that we must support 500 concurrent customers. Our ability to adapt to the higher customer load is dependent on our scalability.

In the past, scalability was seen as a hardware power problem. It was assumed that to allow a system to process larger and larger workloads, it had to run on more and more powerful hardware. When the current system could no longer support the workload, the hardware would be upgraded. This could involve faster processors, more memory, or larger disk drives. In the worst case, this involved replacing smaller cheaper machines by larger expensive machines. This is the model that served the power computer companies like IBM and Sun so well.

Today, scalability is seen as a hardware numbers problem rather than a hardware power problem. We now assume that to process a larger workload we don't replace cheap machines with expensive machines; instead, we get more cheap machines. This is the model that powers the most scalable systems in the world today, such as Google. Google runs its entire system on inexpensive throw-away hardware and has for more than a decade (5).

Given this modern view of scalability, there are three factors that determine how scalable a system is.

The first scalability factor is the compactness of the system. Smaller, more compact systems are easier to scale. Larger, more disperse systems are harder to scale.

The second scalability factor is the usage of asynchronous messages. The judicious use of asynchronous messages goes a long way toward making a system scalable. Think of an asynchronous message system as like a mailbox. As mail comes in faster, one adds more receivers. As long as any of the receivers can process the mail, scalability becomes limited only by the number of receivers you can support.
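The mailbox analogy maps directly onto a queue with interchangeable workers. The sketch below is a toy under stated assumptions: run_receivers is a hypothetical name, the receivers are threads in one process rather than separate machines, and doubling a number stands in for real work.

```python
import queue
import threading


def run_receivers(messages, receiver_count):
    """Process every message using `receiver_count` interchangeable receivers."""
    mailbox = queue.Queue()
    for message in messages:
        mailbox.put(message)

    results = []
    lock = threading.Lock()

    def receiver():
        while True:
            try:
                message = mailbox.get_nowait()
            except queue.Empty:
                return  # mailbox drained; this receiver is done
            with lock:
                results.append(message * 2)  # stand-in for real work

    threads = [threading.Thread(target=receiver) for _ in range(receiver_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


with_one_receiver = run_receivers(range(10), 1)
with_four_receivers = run_receivers(range(10), 4)
```

Changing the number of receivers changes nothing about the senders or the messages, and any receiver can take any message. That is what makes adding receivers a scaling knob.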

The third scalability factor is the size of the database on which the system depends. Because databases have such specialized hardware requirements, they are the most difficult part of a system to scale up.

To compare the scalability of the Snowman Architecture versus a traditional IT architecture, we must start by defining the unit of scalability. In the Snowman Architecture, the unit of scalability is an individual snowman. In a traditional IT architecture, it is the entire system.

In comparing the two architectures, we see the Snowman Architecture outperforming the traditional large IT architecture in all three scalability factors. First, it is much more compact. Second, it uses asynchronous messaging in all the right places, at the boundaries to the snowmen. Third, it minimizes the size of the data pool that must be scaled by enforcing the concept of strict vertical partitioning.

As a result, the Snowman Architecture is much more amenable to scaling using the modern efficient approach to scalability, scaling by numbers. The traditional IT architecture is largely consigned to the much more expensive and inefficient approach to scalability, scaling by power. Today, scaling by power seems as quaint as vinyl records.

Flexibility 

Flexibility refers to our ability to modify the system as our business needs evolve. Say we have built our payment system to take credit cards and we now want to take debit cards. How easy is it to update our system to take debit cards as well as credit cards?

Our ability to modify our system depends on how complex the system is. The more complex the system, the more difficult the modifications will be to implement. Traditional large IT systems are very complex. They are therefore very difficult to modify. Frequently changes in one part of the system cause unexpected problems in other parts of the system.

The Snowman Architecture is composed of a series of autonomous, self-contained, relatively simple snowmen. Because of the synergy algorithms used by SIP to partition business functionality across snowmen, it is highly likely that any modifications necessary for a specific business change will all be located within a single snowman. Since any given snowman is simple (certainly relative to a traditional IT system) we can expect that the modifications will be much more straightforward than they would be with a traditional architecture.

Cloud Effectiveness

The cloud is an attractive platform because of its "pay for what you eat when you eat it" pricing model. But to leverage this platform, it is important to structure your systems so that you eat the least amount possible to accomplish your work.

Traditional large IT systems are poorly organized to leverage this model. Because of their sprawling nature, all or most of the system must be running on the cloud to accomplish even the most trivial of tasks. This means that you are paying for all or most of the system even when you are using only a small part of it. Even worse, when you need to add new instances to handle a larger workload, you are adding sprawling new instances that quickly drive the cost out of sight.

The Snowman Architecture is a collection of smaller snowmen, each dedicated to a group of closely related ("synergistic") tasks. In most scenarios, a given workload will require only a single snowman. This means that you are paying only for the resources that that snowman requires. And when you add new instances, you add them in small, inexpensive, snowman sized amounts.

Figure 5 contrasts the traditional IT architecture and the Snowman Architecture running on the cloud.
Figure 5. The Cloud: Traditional IT Architecture versus The Snowman Architecture.

Vendor Lock-in

A system exhibits vendor lock-in when it is dependent on a single vendor for some aspect of its life support. Usually this vendor is the one providing the software platform.

Vendor lock-in is either good or bad, depending on your perspective. If you are the client, vendor lock-in is bad. It puts you in a weak bargaining position with your vendor. If you are the software platform provider, vendor lock-in is good. It puts you in a strong bargaining position with your customer.

The standard customer approach to avoiding vendor lock-in is through the use of standards. If the customer builds a system on a standard API, then the customer can easily port the system to another software platform that supports that same API. Or at least, that is the logic.

How do vendors achieve lock-in in the face of a plethora of standards covering everything from data storage to virtual systems? Vendors achieve lock-in through the tried and true process called embrace and extend. Embrace and extend is a two part process. First, the vendors embrace a particular standard. Then the vendor extends the standard in vendor specific ways. These extensions are the bait that draws in the customer. The goal is to make the extensions so powerful that they are irresistible. Once the customer has taken the bait, they are trapped. Lock-in is complete.

I have seen many customers try to resist the bait with corporate edicts forbidding the use of any vendor extension. In the end, resistance is futile. You will be assimilated.

The larger and the more complex the system, the more difficult it is to locate and remove the vendor extensions. This means it is more difficult to port the system to another vendor. If you can't take your code to another vendor, you are locked-in. And your future is now in the hands of a company whose main goal is wringing as much money as possible out of you in the next contract negotiation.

As I said, resisting vendor extensions is pointless. The best strategy for avoiding vendor lock-in is to make it as easy as possible to locate and rewrite those sections of a system that have used the vendor extensions. Your ability to locate and rewrite those sections is dependent on how small and simple the system is. We are dealing with the same issues I discussed in the section on Flexibility. Small, simple systems are easy to modify. Large, complex systems are not.

Thus small and simple is your best defense against vendor lock-in. And if you want small and simple, don't look to standards. Look to snowmen.

Summary

In part one of this blog, I introduced the Snowman Architecture. In part two, I discussed the non-technical advantages of this architecture. In this part, I have discussed the many technical advantages of this architectural approach.

If you are building a large IT system (say, over $10M) the Snowman Architecture offers a huge number of compelling advantages over traditional approaches. These advantages range from better security to improved reliability to lower cost to greater flexibility. In fact, there is not a single non-functional requirement that will not benefit from the Snowman Architecture.

If, at this point, you are preparing to build a large IT system and you aren't seriously considering the Snowman Architecture, then I don't know what else I can say. One of us is crazy.

Stay tuned for part four of this blog, in which I will discuss the arguments against the Snowman Architecture and why they are all flawed.

- Roger Sessions
Houston, Texas

Did you find any errors (even spelling) in this blog? Let me know. I'd love to correct them.


References

(1) See, for example, the Web Short SIP Methodology for Project Optimization by Roger Sessions. Available here.

(2) See, for example, the Web Short The Relationship Between IT Project Size and Failure Rates by Roger Sessions. Available here.

(3) PEOPLE FACTORS IN AGILE SOFTWARE DEVELOPMENT AND PROJECT MANAGEMENT by Vikash Lalsing, Somveer Kishnah and Sameerchand Pudaruth in International Journal of Software Engineering & Applications (IJSEA), Vol.3, No.1, January 2012. Available here.

(4) The Mathematics of IT Optimization by Roger Sessions. (White Paper). Available here.

(5) WEB SEARCH FOR A PLANET: THE GOOGLE CLUSTER ARCHITECTURE by Luiz André Barroso, Jeffrey Dean, and Urs Hölzle in IEEE Micro, March/April 2003. Available here.

Acknowledgements

The snowman photos are all from Flickr under Creative Commons license. The photographers are, in order of appearance: 


Legal Notices

This blog is copyright (c) 2012 by Roger Sessions. It may be copied, reposted, and printed as long as it is not modified in any way. Other than that, unauthorized usage prohibited. Ask, though. I'll probably agree.

SIP is a trademark (t) of ObjectWatch, Inc. ObjectWatch is a registered trademark of ObjectWatch, Inc. All other trademarks are owned by their respective companies.

Thursday, October 18, 2012

Snowman Architecture Part Two: Economic Benefits


This is the second part of a four part blog about The Snowman Architecture. The first part was The Snowman Architecture: An Overview. In this blog, I will be discussing the economic benefits of the architecture. But don't read this until you have read the overview!

In the next installment (part three) I will discuss The Technical Benefits of the Snowman Architecture. The fourth part, by the way, will be The Criticisms, in which I will describe the many criticisms of the Snowman Architecture and why they are all wrong.

Originally I had planned to cover all of the benefits (economic and technical) in one blog. It turns out there are just too many benefits for one blog so I have had to separate them into those that are more economic in nature (this blog) and those that are more technical in nature (the next blog.)

Review

The Snowman Architecture breaks down a large IT system into small vertically partitioned subsystems called snowmen. These snowmen interact with each other through asynchronous messages. Snowmen are designed to be as autonomous as possible from each other using a design methodology known as Simple Iterative Partitions (SIP) (1). Figure 1 shows an IT system designed using the Snowman Architecture.


Figure 1. Snowman Architecture

The Snowman Architecture is in contrast to a traditional architecture that uses a methodology such as TOGAF (2) to create a horizontally partitioned system. Figure 2 shows an IT system designed using traditional methodologies.


Figure 2. Traditional Horizontally Partitioned Architecture

Points of Contrast

There are several contrasts that immediately jump out in comparing the Snowman Architecture to the traditional approach. 

The first contrast is in the orientation of the partitioning. The Snowman Architecture uses a strong vertical orientation to the partitioning. The traditional approach uses a weak horizontal orientation to the partitioning.

The second contrast is in the number of subsets in the partition. The Snowman Architecture supports an unlimited number of vertically oriented subsets (snowmen). The traditional approach has exactly three horizontally oriented subsets (business architecture, technical/SOA architecture, and data architecture.)

The third contrast is in the strength of the partitioning. The strength of the partitioning refers to the porosity of the boundaries separating subsets. The more "stuff" that passes between subsets, the greater the porosity. Porosity weakens the partitions, so the greater the porosity, the weaker the partition. The Snowman Architecture partitioning is strong, indicated by the minimal number of connections between subsets. The traditional horizontal architecture partitioning is weak, indicated by the large number of almost random connections between subsets. 
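To make the idea of porosity concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that porosity can be scored as the fraction of connections that cross a subset boundary; that scoring rule, the function name `porosity`, and the sample component names are all hypothetical and are not part of SIP.

```python
def porosity(partition, connections):
    """Fraction of connections that cross a subset boundary.

    partition maps each component name to the subset it belongs to.
    connections is a list of (component, component) pairs.
    This ratio is an illustrative stand-in, not a formal definition.
    """
    if not connections:
        return 0.0
    crossing = sum(1 for a, b in connections if partition[a] != partition[b])
    return crossing / len(connections)


# Hypothetical components grouped into two subsets.
partition = {
    "deposit": "checking",
    "withdraw": "checking",
    "apply": "lending",
    "credit_check": "lending",
}

# Strong partitioning: most traffic stays inside a subset.
strong = [("deposit", "withdraw"), ("apply", "credit_check"), ("withdraw", "apply")]

# Weak partitioning: most traffic crosses a boundary.
weak = [
    ("deposit", "apply"),
    ("deposit", "credit_check"),
    ("withdraw", "apply"),
    ("deposit", "withdraw"),
]

print("strong:", porosity(partition, strong))
print("weak:", porosity(partition, weak))
```

The lower the score, the stronger the partition in the sense described above.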

Economic Benefits of Snowman Architecture

Okay, now that you remember the basic overview, let's look at the economic advantages of The Snowman Architecture.

Benefit 1: Linear Versus Exponential Complexity Curve

As an IT system gets larger it gets more complex. This is because complexity is driven both by the amount of functionality in a system and the number of connections in a system (3). Both the Snowman Architecture and the traditional architecture get more complex as the system increases in size, but how they increase in complexity is quite different. The complexity of the traditional system increases exponentially. The complexity of the Snowman Architecture increases linearly.

For small IT systems, the difference between an exponential increase and a linear increase of complexity is not important. But as the size of the IT system exceeds $5M in cost, the difference becomes very important. 

Figure 3 shows the relationship between complexity and project size of a traditional versus a Snowman Architecture.

Figure 3. Complexity of Traditional Architecture versus Snowman Architecture

As shown in Figure 3, the complexity of a traditional IT architecture increases exponentially. It starts low and then enters the Risk Zone (the zone in which project failure is likely) when the size hits someplace around $8M. From there it rapidly ascends into the Failure Zone (the zone in which project failure is certain) (4).

In contrast, the complexity of the Snowman Architecture starts low (as does the traditional architecture) and then increases with a shallow linear slope. There is little difference between a shallow straight line and an exponential curve at low numbers. In Figure 3, you can see that at project sizes under $1M, there is effectively no difference between the Snowman Architecture and the traditional approach.

However, this changes quickly as the project size increases. Traditional architectures are already in the Risk Zone by the time they hit $8M and by the time they hit $10M they are in the Failure Zone. In contrast, the shallow linear complexity slope allows the size of the Snowman Architecture to remain comfortably in the Success Zone until well past $100M in project size. In fact, it isn't even clear that there is a size limitation with the Snowman Architecture.

The bottom line: a traditional architecture becomes likely to fail at around $5M whereas a Snowman Architecture has a high probability of success even at $100M.
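The shape of the two curves can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the growth factor of 1.5 per $1M, the $2M snowman size, and both function names are made up, and the sketch assumes snowmen are autonomous enough that cross-snowman connections add no complexity. It is not the model behind Figure 3.

```python
GROWTH_PER_MILLION = 1.5  # hypothetical complexity multiplier per extra $1M of scope
SNOWMAN_SIZE_M = 2.0      # hypothetical size of one snowman, in $M


def monolith_complexity(size_m):
    """Complexity of one undivided system of the given size (in $M)."""
    return GROWTH_PER_MILLION ** size_m


def snowman_complexity(size_m, snowman_size_m=SNOWMAN_SIZE_M):
    """Total complexity when the same scope is split into equal snowmen.

    Each snowman is as complex as a small monolith of its own size, and
    the total is a plain sum, so it grows linearly with overall size.
    """
    if size_m <= snowman_size_m:
        return monolith_complexity(size_m)
    count = size_m / snowman_size_m
    return count * monolith_complexity(snowman_size_m)


for size in (1, 8, 20, 100):
    print(f"${size}M  traditional={monolith_complexity(size):.3g}  "
          f"snowman={snowman_complexity(size):.3g}")
```

With any growth factor above 1 the pattern is the same: the two numbers match for small projects, then the undivided system pulls away while the partitioned total only doubles when the project size doubles.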

Benefit 2: Return on Investment (ROI)

To compare the ROI of the Snowman Architecture versus the traditional horizontally partitioned architecture, let's take some reasonable project numbers for, say, a $20M project. 

Using a traditional architectural methodology (e.g. TOGAF) we can reasonably assume the $20M project will end up costing at least 200% of its budget and, once lost opportunity costs are added, about 400% of its budget (5).

Using the Snowman Architecture we won't be doing a single $20M project; we will be doing some number of smaller projects of at most a few million dollars each. Projects of this size are well within the Success Zone (as shown in Figure 3.) Projects in this zone typically have no overruns and no lost opportunity costs.

The Snowman approach requires an additional phase in the project life cycle, a pre-planning phase. This is where most of the work is done to design and plan the snowmen. In the worst case, this phase could add 10% to the overall cost of the project.

Of course, these numbers are just best guesses based on what I have seen of industry data. Feel free to plug in actual numbers from your own projects.  But based on these numbers, we can calculate the Snowman ROI.

Without using the Snowman architecture, we expect a total cost of

  $20M (planned cost)
+ $20M (cost overrun)
+ $40M (lost opportunity costs)
-----------
  $80M (total cost)

With the Snowman architecture we expect a total cost of

  $20M (planned cost)
+ $2M (10% overhead for Snowman preplanning)
---------
$22M (total cost)

The difference between the two approaches is

  $80M (Cost of traditional approach)
- $22M (Cost of Snowman approach)
---------
  $58M (Difference between approaches)

The ROI of using the Snowman approach is thus

  $58M (Difference in Costs)
/  $2M (Added cost of Snowman Approach)
x 100
---------
 2900% (Calculated ROI)

The bottom line: the Snowman approach returns a 2900% ROI. A 2900% ROI is excellent by any measure.
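If you want to plug in numbers from your own projects, the calculation above can be sketched as a small Python function. The function name `snowman_roi` and its parameter names are my own for this illustration; the default 10% preplanning overhead is the worst-case figure used above.

```python
def snowman_roi(planned, overrun, lost_opportunity, preplanning_rate=0.10):
    """Return the ROI (as a percentage) of the Snowman preplanning phase.

    All money arguments are in the same unit (for example, $M).
    Assumes, as in the text, that the Snowman projects have no overruns
    and no lost opportunity costs.
    """
    traditional_total = planned + overrun + lost_opportunity
    preplanning = planned * preplanning_rate
    snowman_total = planned + preplanning
    difference = traditional_total - snowman_total
    return difference / preplanning * 100


# The $20M example from the text: $20M overrun, $40M lost opportunity.
roi = snowman_roi(planned=20, overrun=20, lost_opportunity=40)
print(f"{roi:.0f}%")
```

With the figures used in this post, the function reproduces the 2900% result. Change the three inputs to see how sensitive the ROI is to your own overrun and lost opportunity estimates.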

Benefit 3: Non Tangible Benefits

There are many benefits to delivering a project on time other than eliminating the lost opportunity costs. It is hard to measure these benefits, but they certainly include the following:

  • Predictability of IT deliverables.
  • Increased trust between Business and IT.
  • Better ability to use IT as a strategic asset.

As you can see, there are compelling reasons favoring the Snowman Architecture over traditional approaches. The reduction in complexity is huge and the ROI would make even the most seasoned CFO salivate. But the most compelling reasons favoring the Snowman Architecture may not be economic, they may be technical. But for those benefits, you must wait for the next installment of this blog.

Footnotes

(1) SIP is a patented methodology for autonomy optimized partitioning. It is described in a number of places, including the web short SIP Methodology for Project Optimization.

(2) TOGAF® is a methodology owned by The Open Group. It is described on the TOGAF 9.1 On-Line Documentation.

(3) If you are interested in the mathematical relationship between size, connections, and complexity, see my white paper The Mathematics of IT Simplification.

(4) I have written about the relationship between traditional IT project size and failure rates in a number of places including the web short The Relationship Between IT Project Size and Failure Rates.

(5) Unfortunately, we do not have good data on what these numbers are world-wide. These particular numbers came from averaging a number of large projects discussed in the Victorian Ombudsman Investigation into ICT Enabled Projects (2011).

Acknowledgements

Snowman picture by CileSuns92

Saturday, September 1, 2012

Snowman Architecture Part One: Overview


Introduction

This is the first of a three part blog. The parts will be laid out as follows:
  • Part One: Snowman Overview. The basics of the Snowman Architecture and why I claim it is critical for enterprise architects.
  • Part Two: Snowman Benefits. Validation for the claimed benefits of the Snowman Architecture over traditional architectural approaches.
  • Part Three: Snowman Apologetics. The arguments against the Snowman Architecture and why they are wrong.

As Enterprise Architects, there is no lack of problems deserving of our attention. We need to ensure our organizations are well positioned for the Cloud, can survive disasters, and have IT systems that can chassé in perfect time with the business.

And then there is the whole area of IT failures. Too many of our systems go over budget, are delivered late, and end up depressing rather than supporting the business. If you have been reading any of my work, you know all about this.

But what if there was one approach to architecture that could meet most of our needs and solve the lion's share of our problems? I believe there is. I believe there is a single architectural style that is so important, I consider it a fundamental enterprise architectural pattern. I call this the Snowman Architecture.

In my last blog, I talked about Radical IT Transformation, a transformation that redefines the relationship between the business and IT. The Snowman Architecture is the IT side of this radical transformation.

Fundamentals

If Snowman Architecture sounds too informal to you, feel free to refer to it by its formal name: Vertically Aligned Synergistically Partitioned (VASP) Architecture. Figure 1 shows the four main segments of a VASP architecture.


Figure 1. Basic Vertically Aligned Synergistically Partitioned (VASP) Architecture

With a little imagination (or with the help of Figure 2) you can see why I refer to a VASP architecture as a Snowman Architecture. 


Figure 2. Snowman Architecture

Now your first reaction to the Snowman Architecture is probably, "Hey, that looks just like a services-oriented architecture (SOA)." A typical SOA is shown in Figure 3. And you can see that all of the components of the Snowman Architecture also appear in an SOA.


Figure 3. Typical SOA

Snowman: SOA with Constraints

The best way to think of the Snowman Architecture is that it is an SOA with some very tight constraints. It is these constraints that are critical to addressing all of the issues I mentioned earlier, so let's go through them.

Constraint 1: Vertical Alignment.

The contours of the business architecture (Snowman head) define the contours of the technical, services, and data architecture.  

In other words, there is a close relationship between the business, technical, services, and data architectures. Let's take these one by one.

At the technical level, there is a package of technical systems (Snowman torso) that implements the package of business systems (Snowman head.) The technical package is complete with respect to the business package, that is, it fully implements the business package and doesn't implement anything other than the business package.

This vertical alignment is respected down to the data level (Snowman bottom.) In other words, there is a package of data that meets the needs of the package of technical systems (Snowman torso). This package of data fully meets the needs of the business package and doesn't meet the needs of any other package.

At the services level, each messaging relationship implements one dependency at the business level. Further, all messaging relationships can be traced back to a business level dependency.
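The vertical alignment rules above can be sketched as a simple check in Python. This is only an illustration under stated assumptions: the `Snowman` class, the `alignment_problems` function, the idea of describing every layer by the business functions it covers, and the Lending example are all hypothetical and are not part of any published tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snowman:
    name: str
    head: frozenset    # business functions in the business package
    torso: frozenset   # business functions the technical package implements
    bottom: frozenset  # business functions the data package serves


def alignment_problems(snowmen, business_dependencies, message_links):
    """Return a list of human-readable violations of Constraint 1.

    business_dependencies and message_links are sets of
    (from_snowman, to_snowman) name pairs.
    """
    problems = []
    for s in snowmen:
        if s.torso != s.head:
            problems.append(f"{s.name}: technical package does not match business package")
        if s.bottom != s.head:
            problems.append(f"{s.name}: data package does not match business package")
    # Every messaging relationship must trace back to a business dependency.
    for link in sorted(message_links - business_dependencies):
        problems.append(f"message {link} has no business dependency behind it")
    return problems


checking = Snowman(
    name="Checking-Account",
    head=frozenset({"deposit", "withdraw"}),
    torso=frozenset({"deposit", "withdraw"}),
    bottom=frozenset({"deposit", "withdraw"}),
)
lending = Snowman(
    name="Lending",
    head=frozenset({"apply", "approve"}),
    torso=frozenset({"apply", "approve", "withdraw"}),  # implements a function it does not own
    bottom=frozenset({"apply", "approve"}),
)
dependencies = {("Lending", "Checking-Account")}
links = {("Lending", "Checking-Account")}

for problem in alignment_problems([checking, lending], dependencies, links):
    print(problem)
```

In this example the Lending torso reaches into a function that belongs to another head, so the check reports it as misaligned.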

Constraint 2: Synergistic Partitioning.

The functions in the business package (Snowman head) are synergistic with respect to each other. 

Since the contours of the business package (Snowman head) define the contours of the lower level packages, it is important that the "right" functions be located together. The overall choice of which business functions should co-habit with which others should be directed toward minimizing the overall system complexity. Elsewhere [1] I have shown that the least complex overall system is attained when the choice as to co-habitation is based on the mathematical concept that I call synergy.

While the concept of synergy has a precise mathematical definition, it also has a pragmatic definition. For those who don't care about the mathematics, just think of synergy as "closely related." That is, two functions are synergistic if they are closely related to each other, like deposit and withdraw. For those who do care about mathematics, see my White Paper [1].

Given these two constraints, you can see why I call this a Vertically Aligned Synergistically Partitioned Architecture. And given the complexity of that description, you can see why I prefer the term Snowman Architecture.

Terminology

I use the term capability to refer to the closely related packages of business, technical, service, and data architecture. This is somewhat similar to the way the term capability is used in various enterprise architecture methodologies, although most don't include anything other than the business architecture in the notion of capability. So if I am being precise, I will refer to one related grouping of the four package types as a capability. When I am being informal, I will refer to that same grouping as a Snowman. So I might say the Checking-Account capability or the Checking-Account Snowman. Either of these would mean the business processes that deal with checking accounts, the technical systems that support those processes, the data that feeds those technical systems, and the services that provide interoperability with the outside world.

When I want to be clear that I am talking about my understanding of a capability rather than somebody else's, I will use the term autonomous business capability (ABC). The word autonomous reflects the synergistic assignment of business functions and the word business refers to the central role of the business layer in defining the overall capability structure.

When I am discussing the business architecture of the ABC, I will refer to the business level of the ABC. Similarly I will use the terms technical, services, and data level to refer to those respective architectures.

So the business level of the ABC contains some collection of business functions that are synergistic with respect to each other. The technical level of the ABC provides the technical support needed by those functions. The data level of the ABC provides the data that fuels the technical level. And the services level of the ABC implements dependencies between ABCs.

Relating this back to the Snowman Architecture, the business level of the ABC is the head, the technical level of the ABC is the torso, the data level of the ABC is the bottom, and the service level of the ABC is the arms. 

Scaling Up

Since the Snowman architecture is a subset of an SOA, creating larger and larger systems is easy. We just add more Snowmen (or ABCs, if you prefer) into the mix and make sure they are connected through messages as shown in Figure 4.


Figure 4. Scaling Up the Snowman Architecture

Benefits

Let's go back to my original claim, that the Snowman architecture solves many of the problems that plague the enterprise architect. Now I should inject a caution here. I consider the problem space of the enterprise architect to be the delivery of large (say, greater than $1M) systems [2]. If all we are building are small systems, then many of these claims don't apply. For that matter, there should be no need for an enterprise architect.

Given this caveat, I make the following claims about the Snowman architecture in comparison to a traditional SOA or any traditional architectural approach:
  1. The Snowman architecture is cheaper to build.
  2. The Snowman architecture is more likely to be delivered on time.
  3. The Snowman architecture is more likely to satisfy the business when delivered.
  4. The Snowman architecture is easier to adapt to the changing needs of the business.
  5. The Snowman architecture is more amenable to Agile Development.
  6. The Snowman architecture is easier to debug.
  7. The Snowman architecture is more secure.
  8. The Snowman architecture is more resilient to failure.
  9. The Snowman architecture is easier to recover when system failure occurs.
  10. The Snowman architecture makes more efficient use of the Cloud.

There are a number of other benefits I could claim, but this should be sufficient to make the point. And I think it is fairly obvious that if all of my claims are true, it will be a compelling argument in favor of the Snowman Architecture.

In Part Two of this blog, I will validate each of these claims. Then in Part Three, I will discuss all of the arguments against the Snowman Architecture and show why they are wrong.

If you would like to be notified when the next installments are ready, you have two choices. If you just want to know about new blog posts, you can use the email signup on the right. If you would also like to know about my white papers, webshorts, and seminars, then use the ObjectWatch sign-up system at http://www.objectwatch.com/subscriptions.html.

Either way, stay tuned!

-------------------------------
Workshop Announcement: 
Radical IT Transformation with Roger Sessions and Sarah Runge
For my New Zealand and Australia followers, I will soon be doing a workshop with Sarah Runge, author of Stop Blaming the Software. We will be spending two days discussing our work in Radical IT Transformation, a better way to do IT.
Auckland: October 11-12 2012
Sydney: October 15-16 2012
Cairns: October 18-19 2012

Check out our Agenda or Register!
------------------------------

Notes

[1] See for example my paper, The Mathematics of IT Simplification at http://www.objectwatch.com/white_papers.htm#Math.

[2] In passing, I also note that I consider the problem space of the Enterprise Architect the delivery of the maximum possible return on IT investment. Many enterprise architects disagree with this job description. See for example the extensive discussion in LinkedIn on the subject of What is EA?

Acknowledgements

The two Snowmen pictures are by (in order of appearance) jcarwash31 and chris.corwin on Flickr; both are licensed under Creative Commons.

A Note on Comments

I welcome your questions/comments on this blog and I will try to respond quickly. A word of caution: I am not interested in comments along the lines of "This is not EA, this is EA-IT" or "EA is not concerned with delivering more value from IT." If you would like to have that conversation, I suggest you contribute to one of the discussions on LinkedIn, such as What is EA? Comments here are reserved for the topic at hand, discussing the Snowman Architecture, its claims, and the arguments against it. Thank you!


Tuesday, August 14, 2012

Radical IT Transformation


The industry has reached a consensus: IT is in trouble and is in need of a transformation. This much seems clear. But exactly what that transformation should look like is much less clear.

HP and Cisco tell us that IT transformation is about the cloud (1,2). Microsoft narrows this to the private cloud (3). IBM restricts this even further, saying IT transformation is about “consolidation, standardization and—most important—virtualization” (4).

There’s more. According to CIO Magazine, IT transformation is about the ability to show cost of services (5). CapGemini says IT transformation is about “identifying the key business drivers that impact the IT function, and their implications on IT operations [sic]” (6). And Accenture has perhaps the most interesting proposal of all: IT Transformation is about getting rid of all vendors except Microsoft (7). If only it were that easy!

Each of the opinions has a grain of truth. Certainly a transformed IT operation will make effective use of the cloud and will find ways to consolidate its servers through virtualization. A transformed IT would understand how to show its cost of services and be able to identify its key business drivers. And most would concede that Microsoft technologies can contribute cost effectively in many areas.

But each of these opinions lacks a larger perspective. IT transformation is not about adopting the latest technology or business fad. True IT transformation is about rebuilding, from the ground up, the relationship between the business and IT.

Sarah Runge and I have been discussing what such an IT transformation might look like. Sarah is the author of Stop Blaming the Software and approaches the business/IT problems from the perspective of the business. I am author of Simple Architectures for Complex Enterprises and approach these same problems from the IT side. We have found valuable synergies in our perspectives. We too are calling for a transformation, but a transformation that goes to the very heart of the business/IT relationship. We call this radical IT transformation.

Why do we need a radical IT transformation? In a nutshell, we don’t think most organizations are coming close to realizing the potential benefits of their IT investments. We see many IT organizations stretched to the breaking point just trying to maintain existing systems. We see critical new projects being shelved. The new projects that are done are often delivered late, over budget, and missing key functionality. We see many organizations in which the business doesn’t trust IT and IT feels marginalized by the business. And we believe that few large organizations are well positioned to leverage interesting new technologies such as the cloud. Does any of this sound familiar?

We believe all of these problems can be solved, but we don’t think they will be solved with piecemeal solutions. We need a radical transformation not only in how IT does its job, but in how business and IT work together.

Radical IT transformation is a foundational transformation of the entire business/IT relationship. At its core, the transformation takes us from a technology-centric business/IT relationship to a business-centric business/IT relationship. This transformation includes a number of strategic shifts, each playing a role in the bigger transformation. I’ll briefly describe each of these shifts, saving details for later presentations.

Shift 1: From IT driven to business driven solutions. Today, IT does its best to understand the business and then use that understanding to drive IT projects. In a transformed organization, business takes the lead in driving all IT projects.

Shift 2: From big to small. Today, IT often tries to deliver large far-ranging solutions. In a transformed organization, IT delivers small solutions targeted at very specific, well-defined problems.

Shift 3: From complex to simple. Today, IT projects quickly grow in complexity driving up cost and increasing risk. In a transformed organization, IT intentionally delivers the simplest possible solution that meets the business need.

Shift 4: From long-term to short-term value. Today, IT organizations focus on delivering long-term value from their projects. Unfortunately conditions and technologies evolve quickly, making long-term projections nearly useless. In a transformed organization, time-to-value is a more important metric than projected long term ROI.

Shift 5: From process focus to delivery focus. Today, many IT organizations are mired in lugubrious processes that drag on indefinitely and deliver little value. In a transformed organization, processes are slashed to the absolute minimum and delivery is rewarded.

Shift 6: From internally owned to public cloud environments. Today, most IT systems are running on costly privately owned machines that require huge operational investments. In a transformed organization, many more IT systems will be running on highly efficient leased cloud systems that require minimal operational investments.

Shift 7: From IT centric to business centric architectures. Today, most IT organizations create IT architectures that are independent of the business processes they support. This creates a major IT drag on business agility. In a transformed organization, the IT architecture intentionally mimics the business architecture, resulting in highly agile IT systems that can turn on a dime as the business evolves.

Shift 8: From design to implementation. Today, most IT organizations spend considerable time “doing design.” In a transformed organization, there is much less design done in IT, since the overall design is driven by the business architecture (see shift 7.) While IT doesn’t completely leave the design business, IT is seen as primarily responsible for implementing the design that is defined by the business rather than creating the design that will be used by the business.

Shift 9: From long to short time frames. Today, most IT organizations measure their milestones in months and their delivery dates in years. In a transformed organization, entire delivery cycles are reduced to months or less. With processes slashed, design de-emphasized, and focus shifted to small and simple solutions, time to deliver is cut to the minimum.

Do these shifts resonate with you? Perhaps you are a candidate for a radical transformation. If so, stay in touch. We’ll be discussing this more in the coming weeks.

Would you like to subscribe to notifications about my blogs, white papers, and webshorts? Sign up here:  http://www.objectwatch.com/subscriptions.html.




Citations:


(1) http://h30507.www3.hp.com/t5/Transforming-IT-Blog/bg-p/transforming-it
(2) http://www.cisco.com/assets/sol/cloud/cloudverse_videos/index.html
(3) http://www.microsoft.com/business/events/en-us/PrivateCloudExec/#fbid=J5GaDAi8tB6
(4) http://ibm.co/MCnX7s
(5) http://www.cio.com/article/663015/Transforming_IT_to_Show_Cost_of_Services_5_Best_Practices
(6) http://www.capgemini.com/services-and-solutions/challenges/transforming-it-function/overview/
(7) http://www.accenture.com/us-en/pages/success-accenture-microsoft-transforming-it-summary.aspx

Acknowledgements

Photo of Potter’s Hands is by Walt Stoneburner (http://www.flickr.com/photos/waltstoneburner/) licensed under Creative Commons

Thursday, July 5, 2012

The Misuse of Reuse

The software industry has been pursuing reuse for at least four decades. The approach has changed over that time. It started with structured programming promising reusable snippets of code. We then moved to object-oriented programming promising reuse through inheritance. Today we are focusing on service-oriented architectures promising reusable services that can be written once and then used by multiple applications.

For forty years we have been pursuing reuse and for forty years we have been failing. Perhaps it is time to reexamine the goal itself.

Let's start by reviewing the arguments in favor of reuse. They are quite simple. And, as we will soon see, they are quite flawed.

The argument goes as follows. Let's say we have three systems that all implement the same function, say Function 1. This situation is shown in Figure 1.

Figure 1. Three Systems Implementing Function 1


It seems fairly obvious that implementing Function 1 three times is an ineffective use of resources. A much better way of implementing these three systems is to share a single implementation of Function 1, as shown in Figure 2.

Figure 2. Three Systems Sharing A Single Function 1

In general, if there are S systems implementing Function 1 and it costs D dollars to implement Function 1, then the cost savings from reuse is given by

D * (S - 1)

If D is $10,000 and S is 5, then reuse should save us $40,000. Right? Not so fast.

In order to evaluate the claim of cost savings through reuse, we need to apply some principles of IT Complexity Analytics. IT Complexity Analytics tells us that the complexity of Function 1 is exponentially related to the number of systems using the Function. This is because each system is not using the exact same function, it is using some variant of the same function. Function 1 needs to be generalized for every possible system that might someday use it, not only those we know about, but those we don't know about. This adds considerable complexity to Function 1. 

If the size of the circle reflects the complexity of the functionality, then a much more realistic depiction of the reuse scenario is shown in Figure 3. 

Figure 3. Realistic Depiction of Sharing Functionality

Since system cost is directly related to system complexity (one of the axioms of IT Complexity Analytics) we can say that in most cases, the theoretical cost savings from reusing functionality is overwhelmed by the actual cost of the newly introduced complexity.
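Here is a minimal Python sketch of how the theoretical savings can be swamped. The first function is the D * (S - 1) formula from above. The second is a hypothetical model: it assumes the cost of the shared function is multiplied by a fixed growth factor for each additional system it must serve. The factor of 1.6 and all of the function names are made up for illustration; measure your own systems before drawing conclusions.

```python
def naive_reuse_savings(cost, systems):
    """Theoretical savings from reuse: D * (S - 1)."""
    return cost * (systems - 1)


def shared_function_cost(cost, systems, growth=1.6):
    """Hypothetical cost of one shared function serving `systems` systems.

    Assumes cost tracks complexity and complexity is multiplied by
    `growth` for every additional system the function must generalize to.
    """
    return cost * growth ** (systems - 1)


def realistic_reuse_savings(cost, systems, growth=1.6):
    """Savings (negative means a loss) versus building the function S times."""
    separate_cost = cost * systems
    return separate_cost - shared_function_cost(cost, systems, growth)


print("naive:", naive_reuse_savings(10_000, 5))
print("with complexity growth:", realistic_reuse_savings(10_000, 5))
```

With a growth factor of exactly 1.0 (sharing adds no complexity) the two functions agree. With the assumed factor of 1.6 and five systems, the shared function costs more than five separate implementations, so the "savings" are negative.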

However, the situation is even worse than this. Not only is the cost savings from reuse rarely achieved, but a number of additional problems are introduced. 

For example, we now have a single point of failure. If the system implementing Function 1 fails, all three of our systems fail. 

We have also compromised our security. As IT Complexity Analytics predicts, the overall security of a system is inversely related to its complexity. The more complex a system is, the lower its inherent ability to maintain security. 

And we have created a highly inefficient system for running on a Cloud. The extra cloud segments we will need to pull in to support our reuse will dramatically increase our cloud costs.

Given all of the problems we have created, we most likely would have been better off not attempting to create a reusable function in the first place. 

Now I should point out that I am not totally opposed to reuse. There are situations in which reuse can pay dividends. 

In general, a reuse strategy is indicated when the inherent complexity of the functionality being shared is high and the usage of that functionality is relatively standard. In these situations, the complexity of the functionality dominates over the complexity of the sharing of the functionality. But this situation is unusual. 

When should you pursue reuse? It all comes down to complexity. Will your overall system be more complex with or without shared functionality? This requires a careful measure of system complexity with and without the proposed sharing. If you can lower system complexity by sharing, do it. If you can't, don't. 

Complexity trumps reuse. Reuse is not our goal, it is a possible path to our goal. And more often than not, it isn't even a path, it is a distraction. Our real goal is not more reusable IT systems, it is simpler IT systems. Simpler systems are cheaper to build, easier to maintain, more secure, and more reliable. That is something you can bank on. Unlike reuse. 

...............................
Roger Sessions writes about the topic of Organizational Complexity and IT. If you would like to get email notifications about new posts, use the widget on the right.