Brandon Werner

Archive for the ‘Software Architecture’ Category

Seattle Heat, Software Transactional Memory, Cloud Computing Storage, and me

Saturday, August 16th, 2008

Here in Seattle we are in the middle of a mini heat wave, which to our sensitive, water-worn skin usually means that the temperature is above 80 degrees. However, we’ve run into a stretch of 90-degree weather these last few days - and since Seattle seems proud of its lack of air conditioning, it’s much worse here than in other places that get that hot regularly. In Cincinnati you would go from air-conditioned homes to air-conditioned cars to air-conditioned office buildings. Here that isn’t the case. The situation is slightly worse in Redmond, since the farther away from Puget Sound you get, the hotter the air. The best you can do is go out for the day, as my condo and most houses don’t have any air conditioning at all. Worse, all but the wimpiest indoor air conditioners are banned by decree of the Home Owners Association, which I have come to suspect is a front for the Communist party. After all, the McCarthy hearings were happening about the same time suburbs were getting built in the 1950s. Where were the poor communists to hide?

I’m not saying, I’m just saying.

Summer Reading While The Boss Is Away

If you are elsewhere in the country in late August, chances are you are acting like Seattle folks do the rest of the year - taking refuge inside and away from the weather outside. While you are in your air-conditioned home waiting for the last and hottest part of summer to pass, why not do some good reading to prepare for the fall, when everyone returns from vacation and you get back to the serious business of deadlines, programming and of course geeky arguments about the topics of the day? Here is a good reading list to bookmark.

Get up to speed on Generic Programming, or Programming In General

My first recommendation is the collected papers of Alexander Stepanov, which you can get from his website entitled… Collected Papers of Alexander Stepanov. For those who don’t know, Stepanov is the key person behind the C++ Standard Template Library, which he started to develop around 1993. He had earlier been working at Bell Labs close to Andrew Koenig and tried to convince Bjarne Stroustrup to introduce something like Ada Generics in C++. His papers are a treasure of thought on generic programming, logic, robotics and anything else that made you turn to the Computer Science page in your university’s catalog. Best of all, he also provides slides for his book in progress, written with Paul McJones, called Elements of Programming. This is a great book for refreshing your knowledge of abstract and concrete concepts in quick and easy PowerPoint format. Just take a look at the table of contents and I dare you not to click on at least one of the chapter links. Don’t worry, I won’t tell.

The “Core” Debate of the Community: Concurrent Programming and Software Transactional Memory

Yes, the pun was bad. It does, however, illustrate one facet of the problem that is burning up academic and commercial researchers alike and is responsible for a large number of papers flooding the ACM portal: Software Transactional Memory (STM). Well, actually, that’s a possible answer to the problem - not the problem itself. The two are often confused now. The problem is that since Intel and AMD have decided to start putting more cores onto a single chip, we have to deal with the big problem that comes along with that: managing threads on multiple cores that are trying to do the same work on behalf of the system they are working for. It also scales into the bigger problem of any type of work you may want to farm off to “locales” that may need to cross boundaries and work on the same data within a transaction (for more information on some of this, see my post below from the Google Scalability Conference regarding Cray’s work to replace MPI with a new concurrent language, Chapel, and the GIGA+ filesystem).

I think Simon Peyton Jones from Microsoft Research in Cambridge illustrates it best in his paper Composable Memory Transactions (PPoPP ’05):

The dominant programming technique is based on locks, an approach that is simple and direct, but that simply does not scale with program size and complexity. To ensure correctness, programmers must identify which operations conflict; to ensure liveness, they must avoid introducing deadlock; to ensure good performance, they must balance the granularity at which locking is performed against the costs of fine-grain locking. Perhaps the most fundamental objection, though, is that lock-based programs do not compose: correct fragments may fail when combined. For example, consider a hash table with thread-safe insert and delete operations. Now suppose that we want to delete one item A from table t1, and insert it into table t2; but the intermediate state (in which neither table contains the item) must not be visible to other threads. Unless the implementor of the hash table anticipates this need, there is simply no way to satisfy this requirement. Even if she does, all she can do is expose methods such as LockTable and UnlockTable - but as well as breaking the hash-table abstraction, they invite lock-induced deadlock, depending on the order in which the client takes the locks, or race conditions if the client forgets. Yet more complexity is required if the client wants to await the presence of A in t1, but this blocking behaviour must not lock the table (else A cannot be inserted). In short, operations that are individually correct (insert, delete) cannot be composed into larger correct operations.
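
To make the hash table example in the quote concrete, here is a minimal sketch in Python. It is not code from the paper: SafeTable, move_unsafely and the observe hook are hypothetical names I made up for illustration, and the hook simply stands in for another thread that happens to run between the two calls.

```python
import threading


class SafeTable:
    """A toy hash table whose individual operations are each thread-safe."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def insert(self, key, value):
        with self._lock:
            self._items[key] = value

    def delete(self, key):
        with self._lock:
            return self._items.pop(key, None)

    def contains(self, key):
        with self._lock:
            return key in self._items


def move_unsafely(key, src, dst, observe):
    """Compose delete + insert without any lock spanning both tables.

    `observe` is a hypothetical hook that plays the role of another
    thread scheduled between the two individually correct calls.
    """
    value = src.delete(key)
    observe()  # at this point neither table holds the item
    dst.insert(key, value)


t1, t2 = SafeTable(), SafeTable()
t1.insert("A", 1)

seen = []  # what the "other thread" saw in the intermediate state
move_unsafely(
    "A", t1, t2,
    observe=lambda: seen.append(t1.contains("A") or t2.contains("A")),
)
print(seen)  # the observer found the item in neither table
```

Each call on its own is correct, yet the observer still catches the moment when A is in neither table. Closing that gap with locks means exposing the tables’ locks to the caller, which is exactly the abstraction leak the quote complains about.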

The most that has come out of this is that we know it’s a problem and we’d love to use the keyword “atomic” to wrap our transactional code in our languages. Beyond that, it’s a lot of hand waving and PowerPoint slides. Some people, though, are actually trying to work it out. The best starting point here is the papers of the aforementioned researcher Simon Peyton Jones. His collection of papers on STM offers a good introduction to the problem and to some possible solutions. In his papers he uses Haskell, and his work has led to Concurrent Haskell. Haskell lends itself to STM for reasons I won’t go into here, and it will be quite a bit more of a challenge to get the same functionality in Java and C#. Still, there is already an API for C# Software Transactional Memory from Microsoft Research that you may want to explore.

If you don’t care about this, just don’t go naming classes atomic and you should be fine.

Storing The Cloud: How Do We Scale?

Solid State (read: Flash) drives aren’t the only thing showing the age of our old file system technologies. As we expose software as services and begin taking on large numbers of tenants for our software, cloud computing needs clusters with thousands of nodes that, with the multi-core technology mentioned above, will pose a challenge for storage systems. We will need the ability to scale to handle data generated by applications executing in parallel in tens of thousands of threads. Some solutions have been proposed, such as IBM General Parallel File System (GPFS) and Microsoft Research’s Boxwood technology.

I was lucky enough to watch a presentation on GIGA+, another solution that is being researched by Swapnil V. Patil at Carnegie Mellon University. One of its neatest ideas is leaving the header-table behind, using a bitmap instead. I got to sit down with him afterward and talk about the challenges we face in this space. It was a great time. His primary concern about GPFS and Boxwood is the use of hashing and B-trees, which can cause bottlenecks and synchronization issues. By using a bitmap, and keeping it small so that it can be shared across nodes easily, GIGA+ eliminates the need for “metanodes” or other controllers on the HPC storage architecture.
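
As a rough illustration of how a small bitmap can replace a central lookup table, here is a sketch in Python. It is only loosely modeled on my understanding of the idea and is not the algorithm from the GIGA+ paper: the function names, the use of SHA-1, and the rule that a missing partition falls back to the partition it would have split from are all assumptions made for the example.

```python
import hashlib


def name_hash(name: str) -> int:
    """Hash a file name to a 32-bit integer (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def lookup_partition(bitmap: int, depth: int, name: str) -> int:
    """Find the partition that should hold `name`.

    bitmap: an int whose bit i is set when partition i exists
            (partition 0 is assumed to always exist).
    depth:  how many low-order hash bits to consider.
    """
    index = name_hash(name) & ((1 << depth) - 1)
    while index and not (bitmap >> index) & 1:
        # Partition not created yet: clear the highest set bit to fall
        # back to the partition this one would have been split from.
        index &= ~(1 << (index.bit_length() - 1))
    return index


fresh = 0b1111  # partitions 0..3 exist
stale = 0b0011  # an out-of-date copy that only knows partitions 0 and 1

for name in ["a.txt", "b.txt", "c.txt", "d.txt"]:
    print(name, lookup_partition(stale, 2, name), lookup_partition(fresh, 2, name))
```

A node holding the stale copy can still compute an answer locally. In this sketch the answer is an ancestor of the right partition, so whoever receives the request could forward it and hand back a newer bitmap instead of everyone asking a central controller first.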

His paper, GIGA+: Scalable Directories for Shared File Systems, is a great read for those interested in the problems of both high-performance computing and storage. Their work seeks to maintain the UNIX file structure, however, so those who care about scaling Microsoft infrastructure may find less to enjoy, but the overall architecture and the problems outlined in the paper are applicable to any massively large storage cluster technology.

Enough Already

That should be enough to get you through August. When your boss comes back from his Alaskan cruise, nothing will ensure he leaves you alone more than talking about Concurrent Haskell or how much you enjoyed Chapter 9 of Elements of Programming: Algorithms on increasing ranges. Enjoy the air conditioning, you lucky bums.

Typical Architecture Roles in an Enterprise Environment

Monday, June 23rd, 2008

I created the following slide on typical architecture roles and I thought I’d share it.

Typical Architecture Team

Enterprise Architects

Primary role is to manage large-scale product and process integrations and determine which products and processes are best suited to deliver on business requirements. They control the big picture of how everything works in an organization and maintain it in a centralized location. They should be experts in software and enterprise design methodologies with experience in how large systems interact and manage data. These architects are essential to competitive and cost-effective decision making and use of technologies.

Water-Cooler Talk: The latest research into The Staged Event-Driven Architecture for Highly Concurrent Server Applications

Integration Architects

This is an emerging role in larger companies that have large and complicated deployments, particularly around Service Oriented Architecture (SOA). They are usually the ones who have the task of managing Business Processes. Put simply, they tie the software platforms the Software Architects design together on the environments the Enterprise Architects deliver and purchase. Although Enterprise Architects are typically restricted to existing thinking and technology products, it is the combination of Integration and Software Architects that differentiates an organization and provides maximum benefit.

Water-Cooler Talk: How to change the business workflow so that they can be quicker than their competitors. May need to talk to the Software Architects about how the platform can be changed for quicker processing too.

Software Architects

Primary role is to take architectural directions and artifacts and produce and manage a software platform that provides strategic and operational advantage to an organization. They are usually the ones who maintain the core frameworks of an organization and are considered the gurus of whatever technology they design for. They are very important as they tend to add order and discipline to projects and ensure that best practices, appropriate abstraction and code re-use occur. These architects are essential to good outsourcing of software development, especially near-shore and off-shore.

Water-Cooler Talk: The latest research into how Dependency Injection in Java 5 eliminates the need for the Composite Entity pattern in enterprise development.

Google Scalability Conference: Haskell with DHT for Wikipedia / GIGA+ Filesystem

Friday, June 20th, 2008

Google just published online some of the slides from the Google Scalability conference that I attended last weekend and wrote a commentary about earlier this week. The two I’d like to call out are the GIGA+ file system (for storage geeks) and the Software Transactional Memory slides (for software geeks). I also found the ideas presented in the Wikipedia with Haskell / DHT slides really interesting.

Just consider it some light reading for your geek weekend.

Thoughts On Google’s Conference on Scalability In Seattle

Monday, June 16th, 2008

If you are looking for a good collection of notes regarding the topics covered at the Seattle Conference on Scalability, you can do no better than what James Hamilton put together. Instead, I’ll write a quick commentary on what I experienced.

Scalability Is Your Problem Too

The goals of the conference are laudable. Scalability is an issue that almost all practitioners of software engineering face, especially as we move towards offering services both inside and outside the enterprise. Many are taken off guard by the sudden issues that confront them after wiring up a large-scale services-based environment, especially around distributing load, distributing the data, and writing the data quickly. Sadly, I didn’t see too many people from large companies there - most were from software companies like Microsoft, Google, MySpace and Amazon.com. The attendance may be a consequence of the subject matter. This was some intense stuff dealing with MPI at Cray and its hopeful successor, Wikipedia redone with DHT and Erlang, a B-tree vs. hash map debate and scalable storage issues when dealing with billions of files. A more fun-loving person would have done better going over to Adobe and hanging out at BarCampSeattle, which was going on at the same time.

Despite the intimidating material, there are real architectural and design issues that these discussions present that should be in the mind of anyone dealing with large datacenters that scale globally or even nationally. The approach of GIGA+ file storage, maidsafe’s new computer architecture, and NetWorkSpaces for the R language was uniform: off-loading responsibility for management of data (meta or otherwise) to all vertices in the deployment graph instead of a central repository. NetWorkSpaces in R and maidsafe even discussed computational scalability - while Cray’s new Chapel language and the discussion around Software Transactional Memory focused on scalability across processing cores as well as machines.

GIGA+ Bitmap Example

GIGA+’s approach of maintaining a small bitmap file on each node and passing that around - while anticipating and accepting stale data on a few edge nodes - was brilliant in the patterns it hinted at, including that perhaps being right all the time isn’t as important as being fast. You can be right most of the time and accept the performance hit of not being right some of the time. There are many people who would cringe at this, but at this point we’re going to have to play loose and leave a few balls up in the air as we juggle - doing the math of how often one may fall while keeping the rest going as fast as we can.

Pay No Attention To The Man Behind The Curtain

Yet if I had to sum up the content of the conference I would say it was big on strategy and architecture but short on implementation. There were a lot of things hinted at “behind the curtain”, but nothing assured hand raising from the compsci geeks in the room more than hand waving when you got to the distributed piece of your solution. For instance, one of the big benefits of Chapel - the MPI successor that Bradford Chamberlain of Cray presented - was that you could have distributed arrays and graphs that would be automatically sliced up and distributed to parallel cores or even other “locales” if desired. How the language determines where to split these large arrays and graphs and farm them out was not discussed. One of the more interesting slides showed dashed lines drawn across various nodes and vertices of a graph, symbolizing how it would be chopped and distributed. Someone in the audience raised their hand at this - but he moved on and the hand went back down. To be fair, Chapel was called a “multi-resolution” language where one could start fairly abstract and then add more detail and control to get the best desired result - something I assume you have to do to get good or intelligent chopping and distribution of the data. Given that one of his slides was a comparison of code lines between Fortran using MPI and Chapel, seeing a working code snippet of Chapel would have been helpful. It may turn out to be the same amount of work after you get past the “global view”.

This was the trend, though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I’m skeptical you can even do STM well in an imperative language with state - as I discussed before), but beyond suggesting the keyword “atomic” to replace “synchronized” in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn’t even mentioned. A better introduction and discussion is to be had by watching the O’Reilly OSCON video from Simon Peyton Jones (the writer of GHC and now at Microsoft Research) on the subject. After that, if you’re still hungry, his collection of papers on his Microsoft Research site is a delight.

Of course the point of these conferences is the discussions that occur during the breaks and in the networking event afterwards - something that I treasure having newly moved to the Seattle area from Cincinnati. Instead of just observing and blogging from afar - I get to be at the same table as Vijay Menon, Thorsten Schuett, Swapnil Patil, Paul Watson and others.

Summary of the Architectural Patterns I Saw

If I had to summarize what I took away from the conference from a high-level architectural standpoint, here they are:

  • Every node must be aware of the state of every other node without a centralized controller.
  • To do this, a mechanism should be in place to share state quickly but peer-to-peer.
  • It’s ok to let some nodes go stale.
  • Client/Server is now one thing. Pub/Sub with computation. Every node on the graph should do work.
  • As much as possible, each node should maintain its own security and state. You should be able to have anonymous resources appear in your data center and be put to use without much configuration.
  • As much as possible, abstract the distribution of processing away from programmers.
  • Key/value stores with hashes are best for scalability and distribution (they seem to have won out in all the solutions presented here). Blame MapReduce.
  • Ants can be used to demonstrate anything.
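
To show why hashed keys keep coming up in this list, here is a minimal consistent-hashing sketch in Python. None of the presenters showed this code; Ring, ring_hash and the node names are made up for illustration, and a real system would add virtual nodes and replication. The point is only that every node can work out who owns a key without asking a central controller.

```python
import bisect
import hashlib


def ring_hash(text: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


class Ring:
    """A bare-bones consistent hash ring: one point per node, no replicas."""

    def __init__(self, nodes):
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        # The owner is the first node clockwise from the key's position,
        # wrapping around to the start of the ring if necessary.
        i = bisect.bisect_right(self._hashes, ring_hash(key)) % len(self._hashes)
        return self._points[i][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes)
smaller = Ring([n for n in nodes if n != "node-c"])

keys = ["user:%d" % i for i in range(20)]
moved = [k for k in keys if ring.owner(k) != smaller.owner(k)]
print(len(moved), "of", len(keys), "keys changed owner when node-c left")
```

Any node that knows the membership list computes the same owner for a key, and when a node leaves only the keys it owned have to move.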

I hope everyone had as good a time as I did.

ACM Article: Restful web services vs. big web services: making the right architectural decision

Tuesday, June 3rd, 2008

Great article on ACM regarding when to use REST vs. WS-* standards that are in wide use in SOA architectures today. Very interesting reading for those who may want to take the light-weight approach vs. using the webservice composition and discovery tools that enterprises may find in the TIBCO and IBM SOA stack.

ABSTRACT

Recent technology trends in the Web Services (WS) domain indicate that a solution eliminating the presumed complexity of the WS-* standards may be in sight: advocates of REpresentational State Transfer (REST) have come to believe that their ideas explaining why the World Wide Web works are just as applicable to solve enterprise application integration problems and to simplify the plumbing required to build service-oriented architectures. In this paper we objectify the WS-* vs. REST debate by giving a quantitative technical comparison based on architectural principles and decisions. We show that the two approaches differ in the number of architectural decisions that must be made and in the number of available alternatives. This discrepancy between freedom-from-choice and freedom-of-choice explains the complexity difference perceived. However, we also show that there are significant differences in the consequences of certain decisions in terms of resulting development and maintenance costs. Our comparison helps technical decision makers to assess the two integration styles and technologies more objectively and select the one that best fits their needs: REST is well suited for basic, ad hoc integration scenarios, WS-* is more flexible and addresses advanced quality of service requirements commonly occurring in enterprise computing.

ACM Article: How Intuitive is Object Oriented Design?

Saturday, May 17th, 2008

There is an incredible article that was published in the Communications of the ACM entitled “How Intuitive is Object Oriented Design?” by Irit Hadar from the University of Haifa, Israel and Uri Leron from the Technion - Israel Institute of Technology.

It goes through the process of examining the disconnect between intuition and OO design for engineers and software designers.

The object-oriented programming paradigm was created partly to deal with the ever-increasing complexity of software systems. The idea was to exploit the human mind’s natural capabilities for thinking about the world in terms of objects and classes, thus recruiting our intuitive powers for building formal software systems. Indeed, it has commonly been assumed that the intuitive and formal systems of objects and classes are similar and that fluency in the former helps one deal efficiently with the latter. However, recent studies show that object-oriented programming is quite difficult to learn and practice. In this article, we document several such difficulties in the context of experts participating in workshops on object-oriented design (OOD). We use recent research from cognitive psychology to trace the sources of these difficulties to a clash between the intuitive and analytical modes of thinking.

It is currently hidden behind the ACM referred library portal but if you are an ACM member you can access it here.

Franz Responds To The Failure Of Lisp Post - What Platform Will Own Web 3.0?

Monday, May 5th, 2008

I took Franz and other Lisp companies to task a few weeks ago in a posting I wrote: The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp:

It’s hard to understand where it came from. Certainly one can argue the broader academic community had nothing to do with it; the old guard Common Lisp hackers are still as fickle and as judgmental to newcomers as ever. Also, the old standards in Lisp languages, Franz and LispWorks, have not lowered their prices to anything approachable to the casual developer.

Well, I got this email from a Franz representative in response:

Hi Brandon,

My name is Bernard… Very interesting blog, and it does look like you are still working with Lisp, and surprisingly, Semantic Web, too. Any chance you will be down in San Jose for SemTech 2008 this month?…

… I did see the post on Lisp. While we do need to run a business and stay afloat, it’s also in our best interest to have more interesting Lisp and ACL based projects out there. Give me a call if you would like to continue using ACL in your projects and we should try to work something out… I can also set up a temp license if you are interested in our RDF triple store AllegroGraph (http://agraph.franz.com/allegrograph/). The v3.0 release should be in a few weeks and will support both federation and social network analysis tools.
http://agraph.franz.com/support/documentation/3.0/reference-guide.html#header3-65

I am looking forward to talking to you.

First off, dangling the tempting temp license for v3.0 of AllegroGraph is not playing fair. I have always been impressed with Allegro’s work in the semantic space, even before it was a popular buzzword. In fact, the same thing that led me to attempt to solve the Semantic problem with Lisp is what led to the same for Franz. If the semantic web and reasoning engines are to become reality, especially on parallel processing architectures, Lisp has to be at the front of the bus. Certainly, others will try to claim they do this just fine with interpreted OO languages with some runtime tweaking - but the problems facing us in the future demand we think differently about how we even construct algorithms to solve our problems. Brute force coding and heavy stacks are not going to get us there.

However, the fact that I’ve always admired it from afar is part of the discussion in my article linked above. It seems simply out of reach for mere mortals to use and incorporate into their own development plans because of price. Certainly, Allegro deserves to be compensated for their hard work - this isn’t kids hacking PHP for the next Twitter reporting app after all - Allegro has always tackled the big problems where they can contribute value. The same cannot be said for many software companies out there.

Regardless, flirting with applications using this model harkens back to the era of big client-server installations instead of quick and nimble collaborative innovation. As much as Allegro’s marketing may say that AllegroGraph is “Web 3.0”, the principles that drive it are not going to allow its success to be pinned to large engines running in a back room of a well-funded company. If I get addicted to the software Allegro has, there is no remedy for bringing it on board in my work.

This isn’t to say that Allegro hasn’t opened up to the community - they have open-sourced good libraries - although through another license scheme, the LLGPL. It also seems they are using the IBM model of “Community Driven Development” I complained about before when IBM released Project Zero to put “PHP on Rails”. They take contributions to fold back into the Allegro products.

Although I really like working closely with Allegro and writing about their accomplishments in this space, I challenge them to think about whether this is truly the model to gain traction in the coming Web 3.0 world. I would wager the Lisp community should still make an effort to create more nimble and open components for the semantic web - the internet will demand no less when picking its platform for Web 3.0.

Service Data Objects Architecture: Business Objects with Smarts Presentation

Thursday, April 17th, 2008

This is a presentation I created to describe how SDOs can be used in the Insurance enterprise space to provide sanity in the large and diverse messages. These are increasingly being passed around as Business Objects in a Domain architecture as companies move their old object patterns to a service based approach (I refer to it as servitized business objects).

If you are looking for my particular experience on how SDOs and the IBM EMF framework that contains them work against the large ACORD schema, you can find my critique of WebSphere Process Server and ACORD here and the SDO design pattern plugin I wrote for Rational Software Architect here.

You can download the slideshow here.

The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp

Sunday, January 13th, 2008

Over at Lambda The Ultimate, the best academic programming blog on earth, there is a large debate going on regarding what the future of languages will be for 2008. The most important thing to emerge from the discussion is the larger role functional programming will play. It seems like a safe bet. This year has seen an explosion of interest in, and creation of, functional languages such as Nu on Apple’s OS X, Scala on the JVM and Microsoft Research’s .NET language F#.

I am ecstatic at this change.

The Failure Of Lisp

It’s hard to understand where it came from. Certainly one can argue the broader academic community had nothing to do with it; the old guard Common Lisp hackers are still as fickle and as judgmental to newcomers as ever. Also, the old standards in Lisp languages, Franz and LispWorks, have not lowered their prices to anything approachable to the casual developer. There are open source ANSI Lisp implementations without all the supporting engines and functionality, such as SBCL. In fact, the most linked thing I’ve ever written in my career is the walk-through I did for installing SBCL and Allegro, which includes adding your repository and packages for CLOS and automatically compiling the FASL files, especially dealing with the asdf differences between the implementations. The complexity of this in itself points to problems with portability and configuration in Lisp. However, even that project, which targeted Lisp’s bread and butter, the parsing of semantic ontologies for the Semantic Web, was met in the message boards with worries about whether there would be enough developer participation using such an odd language, and with recommendations to move it to Java.

In reality, Common Lisp showed its failure as a community by sitting out the enthusiasm that has been generated around functional programming languages. It didn’t have to be that way. I recall that my first awareness of functional programming’s growth came from the awesome work of Lemonodor’s blog and Sriram Krishnan posting “Lisp Is Sin”. I was happy at the time that Lisp was getting such attention, as well as functional language architectures in general. I imagined that as OO languages had grown so verbose and feature-dense that even the IDEs to develop your applications run into the tens of gigabytes, a new evolution “Back To The Future” was inevitable. Even more, I believe long-suffering Lisp deserves to be back in favor again; it’s certainly spent its time in purgatory. Yet it didn’t happen. You can blame the old 50-year-old men sitting on IRC channels for that. It was the most thorny and uninspiring community I’ve ever participated in, despite my extreme interest in the language. It’s jaw-dropping that a language with such promise has sat out the resurgence, and it speaks to what an unfriendly and uninviting community can do to a technology platform. I would be the first to march it off to the grave.

The Rise Of Functional Languages

The interest in functional programming actually grew up around more academic but pure languages like Scheme and Haskell. Although these languages sit on their own island and lack many of the “dirty” aspects of Lisp’s CLOS environment that make it easy to access OS and hardware resources, they are still strikingly useful for learning things that are the staple of functional languages, such as closures and lambdas. Indeed, one could argue that the movement to bring closures into OO languages (first C#, now Java) was in part due to the rise in awareness of functional languages.

Further, it seems to me that functional programming languages answered two prayers of those more ambitious engineers who don’t seem to want to stick with the script and Java worlds they were taught in college. Those two large wins, far more important than the semantic features of functional languages that have gotten all the attention, are architectural foundations of functional languages:

  • Referential Transparency / Side Effects
  • Concurrency

Referential Transparency

To those coming from a pure OO world, Referential Transparency and the restriction of side effects can be hard to get their heads around. The best way I describe this concept is by hitting at the root of their assumptions: everything they deal with is dead. The objects are dead, the variables are dead, the entire atmosphere is dead, as if something had come along and killed everything in your stack and you have to assemble your program from only what’s been given to you, nothing more. There are no instances; objects do not “come alive” and have state, a state that you have to poke into and a state that can change at any time. A function will always do what you expect, and nothing can come along and change that behavior.
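
Here is a small sketch of the difference in Python. The insurance-flavored names (premium, Policy, risk_factor) are invented for the example. The pure function depends only on its arguments, while the method depends on state that anyone holding a reference can change between calls.

```python
def premium(base: float, risk_factor: float) -> float:
    """Referentially transparent: the same inputs always give the same output."""
    return base * (1 + risk_factor)


class Policy:
    """A 'living' object: the answer depends on state that can change."""

    def __init__(self, base: float):
        self.base = base
        self.risk_factor = 0.0

    def premium(self) -> float:
        return self.base * (1 + self.risk_factor)


# The pure call can be replaced by its value anywhere in the program.
first = premium(100.0, 0.25)
second = premium(100.0, 0.25)

# The method call cannot: some other code may have poked at the object.
policy = Policy(100.0)
before = policy.premium()
policy.risk_factor = 0.25  # a side effect from somewhere else in the stack
after = policy.premium()

print(first == second, before == after)
```

The two pure calls agree no matter what happens in between. The two method calls look identical in the source but return different values, which is the “state that can change at any time” described above.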

One of the things that seems to appeal to developers most about the promise of SOA architectures happening in enterprise environments, if you’re smart enough to pry it out of them, is that they get the same referential transparency in services. No one can override a service (besides versioning, which is explicit to the developer) and a service will only return what it did earlier in your code and earlier in the year. This forces developers to design services that have the same relationship to the world that functional programmers design their functions to have. This is perhaps the trickiest part of migrating enterprise teams to a services-based model: their expectations about the mutability of the services they are accessing and their inability to anticipate what working in that world will be like. Especially for those who use tools or libraries to convert service interaction into an object, the interaction can be jarring.

However, they soon find the predictability and the safety of such an environment liberating. OO programmers were used to making their objects or variables immutable to maintain their contracts and relationships with other objects, often sacrificing many of the benefits that OO programming promised their stack. Now they have immutability and transparency in an environment where functional paradigms are key, and they do not expect to be able to “embrace and extend” services. They are what they are. This tends to cascade out to the living, instantiated code a developer writes as well, as there is no point in entering the world of the living if what you have to return to is a dead function.

This was hinted at in an article in the ACM Queue magazine by Terry Coatta, entitled “From Here to There, The SOA Way”. He states,

Objects are still a very good way to model systems and they function reasonably efficiently in the local context. But they don’t distribute well, particularly if one tries to use them in a naive way. A service-oriented architecture solves this problem by dealing with the latency issues up front. It does this by looking at the patterns of data access in a system and designing the service-layer interfaces to aggregate data in such a way as to optimize bandwidth, usage, and latency.

SOA limitations are not the only thing affecting the consciousness of a software engineer. The other issue is the sharp rise in the complexity of managing a large enterprise library written in an OO language. One of the largest pain points of any large application is the management of graph upon graph of live objects and the living data within them. When software engineers experience the lack of side-effects in functional languages, it’s a breath of fresh air.

Concurrency

A funny thing happened on the way to those multi-core processors. People loaded their applications on them and noticed nothing got much faster, particularly when it came to transaction intensive tasks. Turns out Intel and AMD left out an important fact about their Moore’s Law cheating multi-core environment: you can’t wring as much performance out of it without changing the way you manage concurrency and threads. Sequential programming could always rely on going faster as the single processor speed got faster, but as multicores come into play that isn’t always the case. You want to farm off transactions to occur on separate processors, and in the living world of mutable objects and variables, breaking out two transactions to work concurrently that operate on the same living data is a bad idea. Add structural programming’s solution to this problem, optimistic and pessimistic locking, and you have deadlocks in short order.
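The deadlock risk described above can be sketched in a few lines of Python. The Account class and transfer function are hypothetical, written only for this illustration. Each account guards its own balance with a lock, which is fine in isolation, but a transfer needs two locks at once, and the only thing keeping two opposite transfers from waiting on each other forever is a convention (always lock the lower id first) that every caller has to know about.

```python
import threading


class Account:
    """A hypothetical mutable account guarded by its own lock."""

    _next_id = 0

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = threading.Lock()
        self.id = Account._next_id
        Account._next_id += 1


def transfer(source: Account, target: Account, amount: int) -> None:
    # Locking source first and target second looks natural, but threads
    # running transfer(a, b) and transfer(b, a) could each hold one lock
    # and wait forever for the other. Acquiring the locks in one fixed
    # global order (by id) avoids that particular deadlock.
    first, second = sorted((source, target), key=lambda acct: acct.id)
    with first.lock, second.lock:
        source.balance -= amount
        target.balance += amount


def worker(source: Account, target: Account, times: int = 1000) -> None:
    for _ in range(times):
        transfer(source, target, 1)


a, b = Account(100), Account(100)
threads = [
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=worker, args=(b, a)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The code works, but only because transfer knows the locking rule for every account it touches. Nothing in the language enforces it, and that is the sense in which lock-based code stops composing as a system grows.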

Functional programming has been a natural place to explore parallel processing and new ways of doing atomic transactions because of the reasons above. More important, these atomic structures can be composable, which is lost when doing locks in structural programming. A lot of the buzz has been generated around the idea of software transactional memory, where execution blocks can be flagged and managed and built upon. The best introduction to this topic is the paper by Tim Harris entitled Concurrent Programming Without Locks. Although this used to be expressed only in the confines of Concurrent Haskell, others have shown how the same techniques can be used in other functional languages, such as F# using nothing more than PowerList.

This experimentation is one of the large reasons why functional languages have become more important as software engineers wrestle with the problems and promise of multi-core processors in transaction processing. Although not every engineer will be interested in the deeper details of STM or other strategies in concurrent programming, the fact that these libraries will emerge and only be available in the functional realm will force software engineers to learn the core concepts and bring even more visibility to the functional programming space.

Functional Hybrids: Functional Programming Is Now Approachable

The other driver for adoption of functional programming languages, besides the architectural benefits they bring to current problems, is the fact that languages such as F# and Scala have adopted a more hybrid model in their language design, where a developer isn’t forced completely outside her comfort zone. Scala is a combination of functional and deeper OO methodologies (as in Smalltalk) and has access to the entire Java library, significantly reducing the learning curve. The same can be said for F# and .NET, and for Nu and Objective-C. This does have drawbacks, however, as both F# and Scala have not been able to use more of the STM strategies that Concurrent Haskell allows because the underlying thread architecture of the VMs they run against is built for structural programming languages. It is easy to see how this can be fixed, however, and allow those using hybrid functional languages the same power as those who express their ideas in Haskell or even Lisp.

As I said, I am excited about this new resurgence in functional programming languages, and I am optimistic that 2008 will have even more to offer those who are just getting their toes wet. I personally know some college freshmen who started out using Nu as their first language, and are already contributing to the community. The future of software engineering is bright.

Facebook, Scoble and Web 2.0: It’s Not The Data, It’s The Work You’ve Put In To It

Sunday, January 6th, 2008

Way back in 2005 I wrote at length about the danger we all face putting data online. Back then, I was running The Planning Studio Inc., and we had experienced a lot of expansion and were outgrowing Salesforce.com as a Sales/Contact manager. When we became aware that extracting not just our data but also the relationships between those data was impossible, I wrote a word of warning to the blogosphere about data. I would have assumed three years later there would have been a common and demanded way to export your data. Sadly, that is not the case. Scoble became a victim of this with Facebook, and what shocked him was what occurred to me back in 2005: In Web 2.0, you don’t own your data.

It’s Not Just About Your Data, It’s About The Relationships Between Them

Although a few bring up the fact there are some ways to export your data from these various services, that misses the point badly. What Scoble and the rest lost, although they may not have been able to articulate it, was the work they put into their data. When we import our data into a social network application, we don’t expect to leave it in a static state. We usually import our address book and calendar, among other things, so that we can then begin creating relationships and making our data more valuable.

Think of the tagging feature in Flickr and Facebook. How often have you sat and painstakingly tagged your friends or photos for maximum social use? You may have even organized your photos into sets that would be more viewable and easily managed, spending hours uploading and tagging. What about the APIs that allow you to visualize, browse and draw conclusions from the metadata that exists because of the relationships you’ve made? You may have seen relationships you had never expected. By using social applications on the web, you have changed your data and made it more useful, just as if you had imported a CSV file into Excel and made graphs of that data.

Does that mean that Microsoft can claim that data and the charts you’ve generated are their property? Can they take those graphs and spreadsheets away at a whim?

The same problem applies to Google and Yahoo!, among others. While many wonder if these Web 2.0 application providers are looking at our confidential data, the real worry is that they decide you can’t look at it anymore. Although Google says publicly they have no process to look at the data we store on their servers, they most certainly have a process to remove you from it if you violate their Terms of Service, something that is at best arbitrary and a process in which you have no legal right to appeal.

The Worst Part Is, Your Hard Work Benefits The Social Network Too

The most insulting part of Facebook and others wielding that sort of power over our meta-data is that it’s through our hard work that their service is useful. Although you could call our work managing and forming relationships between our data a non-zero-sum game, as we also reap the benefits of the connections, our hard work makes their social websites a better and more fun place to be. If no one took the time to tag their friends in their photos, or enter tags on blog postings and pictures, how interesting would these places be? It seems to me that claiming complete control over that data is a slap in the face no user of social networks should tolerate. The same holds true for other Web 2.0 apps that manage our data, including Google and Yahoo!. Del.icio.us would be Just Another Bookmark Site if it wasn’t for the hard work its users put into tagging and managing their bookmark data.

Although social networks can be considered non-zero-sum, other Web 2.0 companies are decidedly zero-sum; we do all the work. Although there is no social component to these services, it is usually this data and the relationships within it that are more important. Many small businesses use Salesforce.com and have their business contacts linked to companies they bill for services, which are in turn linked to billing and payment systems. If they don’t pay their bills for a month, or worse if they wanted to take their business elsewhere, that painstaking work linking these data pieces together, notes and all, would be lost. The same is true with financial analysis of your bank account on Mint or Notebooks on Google.

How To Fix It

Being in software architecture for large enterprise systems, the solution to this seems simple and easy to me. We need a way to export this data in XML or some other format so that, even if the export doesn’t contain the content itself (pictures and files would not be practical), the relationships between the data come along. It need not be wrapped up in some long, drawn-out collaborative standards body in which Flickr, Facebook, Salesforce.com, MySpace and IBM (they join every standards body) would sit down and spend five years coming out with a standard.
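As a rough sketch of what a relationship-only export could look like, here is a short Python example using the standard library’s xml.etree.ElementTree. The element names, attribute names and ids are all hypothetical, not any real service’s format. The point is only that the links between items can travel without the photos themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: which person was tagged in which photo.
tags = [
    ("photo-17", "person-3"),
    ("photo-17", "person-8"),
    ("photo-42", "person-3"),
]


def export_relationships(pairs):
    """Build an XML document that records the links, not the content."""
    root = ET.Element("export")
    for photo_id, person_id in pairs:
        ET.SubElement(root, "tag", photo=photo_id, person=person_id)
    return ET.tostring(root, encoding="unicode")


def import_relationships(xml_text):
    """Read the links back out of an exported document."""
    root = ET.fromstring(xml_text)
    return [(tag.get("photo"), tag.get("person")) for tag in root.findall("tag")]


exported = export_relationships(tags)
restored = import_relationships(exported)  # the same pairs, in the same order
```

Any other application that understands the same handful of element names could rebuild the tagging work from this file and re-attach it to the photos by id.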

My proposal? Facebook just does it.

In the Web 2.0 world, the prime mover is usually the standard maker for the rest of the industry. When Facebook provides its users with some export tool, no matter how complicated or mangled the XML turns out to be, developers would immediately come up with tools to parse and import this data into other applications and systems. Some would even take this data and provide analytics across various networks, something that other websites are attempting to do but at a substantial risk to your privacy, as you must provide them with your login credentials to every site you belong to. Facebook may be bearable to give your login information to, but your Amazon.com profile with your One-Click Buy enabled is quite a different matter.

Other Web 2.0 environments, either through customer demand or marketing reasons, would be quick to follow. Soon, some standard schema would emerge that would be predictable or at least validated online by the services themselves. XML transformation is standard fare in the enterprise world, and there are many C# and Java developers who know how to transform some XML to another XML standard with their eyes closed. It could open up more avenues for migration and populating social networks, as well as doing personal and business analysis of your data.
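The kind of transformation meant here is not exotic. A minimal sketch, again in Python with the standard library’s xml.etree.ElementTree, shows one made-up export schema being rewritten into another. Both schemas, and the to_link_schema name, are assumptions for illustration; in practice many enterprise teams would reach for XSLT instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical export from one service: photo tags stored as attributes.
source_xml = """
<export>
  <tag photo="photo-17" person="person-3" />
  <tag photo="photo-42" person="person-3" />
</export>
"""


def to_link_schema(xml_text):
    """Rewrite <tag> elements as generic <link> elements with child nodes."""
    source = ET.fromstring(xml_text)
    target = ET.Element("links")
    for tag in source.iter("tag"):
        link = ET.SubElement(target, "link", type="appears-in")
        ET.SubElement(link, "from").text = tag.get("person")
        ET.SubElement(link, "to").text = tag.get("photo")
    return target


links = to_link_schema(source_xml)
converted = ET.tostring(links, encoding="unicode")
```

The mapping is a dozen lines because both sides describe their data. That is the whole argument for shipping some export format first and letting a common schema emerge afterwards.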

Whatever the format, as long as the data inside the file is described and well formed (what XML was designed to do, describe the data it contains so any software can make sense of it) we should be able to make quick work of migrating that data to other services or applications.

Why It’s Important

Although we often don’t see it, Scoble, you and I are still “in the bubble” when it comes to technology and social media trends. We often think that because something is important or obvious to us, the rest of the world should be up in arms about it as well, when in reality they are just sitting in the living room watching TV. There are many users of Salesforce.com, Facebook and Google Notebooks who don’t think about this problem, or worse, assume that there would have to be a way because it’s just logical there would be. Why would they take your friends and your photos away without giving you any recourse? Why wouldn’t you be able to export your entire client list and company list and maintain those relationships? Even just drag them into QuickBooks Professional Edition? Wouldn’t there be some law making sure they can do that?

The answer is no. Hopefully, those who have larger voices than I did in 2005 will take this issue to a more public sphere. The danger is that this problem fades back in to the blogosphere, where Scoble gets his traffic and his hits and moves on. I don’t want to write this article again in another three years.

We should do something now, maybe using those same social networks to organize.