Whether you call it a web service, client/server, service-oriented architecture (SOA), remote procedure call, or one of a myriad of other names, distributed computing is difficult. There have been a number of attempts at delivering flexible systems over the years. It seems, though, that no matter how hard we try, we can't get it right. Ethernet got it right for local networks, and it proliferated. Networked computing got it right with TCP/IP, and it has proliferated. When it comes to the top of the network stack, we can't seem to find a good, stable solution that just works.
This article started out to be a well-balanced view of currently available distributed technology. I apologise in advance because, as it developed, I found myself writing with amazement and frustration at the current state of communications technology. Why has it taken so long to achieve the fundamentals we require to deliver distributed applications easily?
In particular, my frustration was ignited after reading an article about the debate between REST and SOAP in web services. What amazed me is that this is a hotly debated area. Why? The debate completely misses the point that both of these communications systems are a huge step in the wrong direction for distributed computing. They oversimplify the problem and just create more work for the developer. Unless this is an evil conspiracy to increase programmers' income by requiring them to write more code, I don't see the point. Where are the loud calls and shouts from developers asking for the real problems to be solved?
There is plenty wrong with the communications systems out there. Here are the reasons why I think various distributed computing technologies are broken:
XML-RPC
Let me look at REST and the XML-RPC approach first. Having developed an application using this kind of mechanism in the past, I can safely say that it is useful for only the simplest of services.
XML-RPC requires that the programmer handle the construction of requests, the marshalling of data and the processing of responses; in general, it requires the programmer to do too much work. This mechanism offers little in terms of data typing, requiring the developer to design and implement nearly everything themselves on top of the HTTP protocol. It learns absolutely nothing from past RPC mechanisms and adds little as well.
Importantly, once a service is developed it allows little reuse and does little to solve the problem of integration with other services. In effect, the developer needs to code each and every service as an individual entity.
SOAP/WEB SERVICES/SOA
This has got to be the strangest development in the history of distributed computing. If you had told one of the first communications engineers from the 60s that in the year 2004 there would be a text-based communications system that required over 30 separate standards (does anyone have the current number? It seems to be growing), he would be in shock. Then tell him that binary data is encoded using a BASE64 algorithm because the data representation system can't handle binary data. Then tell him that, to overcome the limitations of the encoding system, people push the data through compression algorithms to attempt to reduce the data overhead. Then say that this system is being pushed by the largest corporate vendors as the silver bullet for integrating distributed computing. If I were that communications engineer, I would have given up and changed jobs.
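The cost of the Base64 step described above is easy to measure. The following is only a small sketch using Python's standard `base64` module; the 3000-byte payload is an arbitrary stand-in for binary data and is not taken from any web services standard.

```python
import base64

# Hypothetical payload: 3000 bytes of arbitrary binary data.
payload = bytes(i % 256 for i in range(3000))

# Base64 turns every 3 input bytes into 4 ASCII characters,
# so the encoded form is roughly one third larger than the input.
encoded = base64.b64encode(payload)

print(len(payload), "bytes of binary data")
print(len(encoded), "bytes after Base64 encoding")
print(f"size ratio: {len(encoded) / len(payload):.2f}")
```

Compression can claw some of this back for repetitive data, but it adds CPU work on both ends to undo an overhead that a binary-capable representation would not have introduced.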
The current Service Oriented Architecture has learnt little, if anything at all, from past communications systems. The people who developed CORBA sit back and watch with amusement as all the problems solved in CORBA are being solved again in mostly the same ways. Instead of delivering simplicity, web services have created a more complex system than CORBA or anything before it.
At the heart of web services is a lone jewel in the crown. XML Schema provides a flexible type system in which to model data. While XML did come first, XML Schema provides the cornerstone which allows web services to work.
At a recent seminar I watched a technology evangelist give an excited speech about the wonders of Web Services. Sitting close to a colleague who I knew had worked in the heyday of CORBA, I asked, "How much of this content have you seen before in the CORBA days?" His answer: "99.9%". Web Services are offering all the same messages that CORBA provided back in 1995 but, as with CORBA, they have yet to deliver the goods.
CORBA
This, of course, brings me to CORBA. Having been interested in distributed communications for a long time, I had the pleasure of watching the rise and fall of CORBA as the latest silver bullet from 1995 to around 1999. While CORBA is not a complete failure, it was never going to become the de facto standard. Its design-by-committee approach and monolithic architecture spelt inflexibility. If web services have learnt one thing, it is that a monolithic, one-size-fits-all approach does not work for distributed computing.
CORBA's Achilles heel is GIOP. As the underlying data representation system and protocol in one, it is inflexible and not exposed to developers. CORBA's interface definition language (IDL), the XML Schema of the CORBA world, is not flexible enough, forcing the developer to jump through hoops to achieve many common tasks.
ASN1
ASN1 is an often-forgotten entrant into the distributed communications arena. It is the technology that could have been. It demonstrates that binary data can be tamed and that we don't need to use text-based data representation just so developers can read the data they never see. However, it never went far enough; its developers never seemed to deliver the next step of providing ASN1 technology to object-oriented programming languages.
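To give a feel for what "tamed" binary data looks like, here is a minimal sketch of the tag-length-value layout used by ASN1's Distinguished Encoding Rules (DER) for an INTEGER. It handles only non-negative values with short-form lengths, and the function names are my own for illustration; it is not a general ASN1 library.

```python
INTEGER_TAG = 0x02  # universal tag for INTEGER in ASN.1


def der_encode_uint(value: int) -> bytes:
    """Encode a non-negative int as a DER INTEGER (short-form length only)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Minimal two's-complement length: the top bit must stay 0 for a
    # non-negative value, so a value like 128 needs a leading zero byte.
    length = value.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([INTEGER_TAG, length]) + value.to_bytes(length, "big")


def der_decode_uint(data: bytes) -> int:
    """Decode a DER INTEGER produced by der_encode_uint."""
    tag, length = data[0], data[1]
    if tag != INTEGER_TAG:
        raise ValueError("not an INTEGER")
    return int.from_bytes(data[2:2 + length], "big", signed=True)


encoded = der_encode_uint(300)
print(encoded.hex())             # tag, length, then the value bytes
print(der_decode_uint(encoded))
```

The value carries its own type tag and length, so a receiver can check and skip it without any text parsing.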
Research
I have been waiting patiently since I started university back in 1992 for a better answer. Something that works. Where is it? For the first time last year I had the opportunity to attend three distributed computing conferences. At none of these conferences did one paper stand out that challenged the underlying mechanisms of the above technologies. There were plenty of papers which built interesting services using these technologies; however, none questioned or attempted to improve the fundamentals of these systems.
This is a sad state of affairs for distributed communications research. We rely on the research community to deliver the fundamental paradigm shifts required to shake up the industry. However, the university researchers I spoke to said that to receive funding they needed to research the popular technologies. Does anyone do real research anymore?
There are, however, some good things that have been done. You just need to look hard, and have a memory longer than the latest sales pitch, to find them. I'd be interested to hear what else you think is right and works. This is a list of what I think they got right.
CORBA
The communication starts when a method call is made on the client, and finishes when a method is called on the server. Everything that happens in between is part of the communications system. This simplifies programming and allows the developer to concentrate on solving business problems, not technology problems.
XML Schema & XML
The data representation type system provided by XML Schema is a flexible method to model and constrain data formats. Its extensibility allows the developer to create new data formats to model nearly any structured XML data. It is a separate technology from the communications system, allowing it to be used to model and share data not just in communications. The same data formats can be reused in files, on the web, in distributed communications or in databases.
Message Queuing
IBM's MQ Series is becoming the standard way to move messages around large corporate networks. It separates services from clients and forces companies to think about the data they move around their organisation.
It provides a simple set of concepts and delivers them well. Let's hope that IBM doesn't get too distracted by web services and keeps improving this product without compromising it for the latest craze.
HTTP
HTTP separates the transport mechanism from the data format, which allows any data to be delivered using a simple request/response mechanism. This separation of the transport layer from the data ensures flexibility in the services it can deliver. A communications stack with distinct and well thought out layers is always going to succeed over a monolithic approach.
Of all of these things, I believe the marriage between XML and XML Schema is the most important. It creates a single data representation which is strongly typed in an environment external to a programming language.
Strong data typing is a term generally used with programming languages. The more popular programming languages, such as Java, C++ and C, are generally considered strongly typed languages. Having a strongly typed language is important because it ensures that the programmer will catch errors in their software earlier in the development cycle. Today's strongly typed languages, combined with a powerful integrated development environment (such as Eclipse), allow the developer to catch errors as they develop their code. This early catching of errors speeds development and ensures fewer errors are found in production software.
XML and XML Schema provide a strongly typed data format that can be joined with a host programming language's type system. The ability to provide a strongly typed data format which can move between languages is what gives XML Schema its strength in web services.
XML and XML Schema have spurred a shift in the computer industry which recognises the importance of data. The emphasis on the structure of data and the information that it contains is spurring research in metadata, ontologies and the software we use to manipulate data. What is surprising is that very few alternatives to XML and XML Schema are available. In fact, if you know of any, please let me know. How we represent and describe our data between applications and in communications is one of the most fundamental aspects of computer science. It is also one of the least investigated.
But why is all this so important? Why should you care about the fact that we don't seem to have succeeded in delivering a distributed computing environment that works? Because the internet, mobile and wireless computing are proliferating. If developers are to be able to deliver on the promises of pervasive computing, we need the tools to build dynamic and flexible systems.
If the largest organisations are so busy pushing the web services silver bullet, it means that they are not busy solving the fundamental problems. Web services are not appropriate for video conferencing on your mobile phone, for the latest networked game on your Sony PlayStation Portable, or for connecting your car to your city's traffic management system. They are not going to be appropriate for sensor networks, or any number of other problems in the ubiquitous computing domain. If we are going to solve these problems, we need to question the fundamentals.
What is required is that more developers take the risk of challenging the status quo. If we develop, prototype and deliver different solutions to the fundamentals, the best will rise to the top. It is important to look at what we can't do with technology like web services, and then come up with ideas as to how to fix those technologies so we can. Then we might be able to deliver real-time collaborative software, find more interesting uses for peer-to-peer networks than file sharing, and start to explore mobile data mining or a range of other possibilities.
I'm not just complaining about it; I am doing something about it. Argot and Colony are my effort at delivering an alternative solution. Do they provide a fundamental paradigm shift which learns from the past and delivers something new? I think so.
Argot is my cornerstone; it is designed to describe any well-structured binary data. The Argot software is like the type system found in a programming language, but for binary data. The difference here is that it is designed for data type agreement between platforms. Argot is also totally binary, describing itself in binary. This becomes an important concept when bootstrapping communications.
Argot is able to negotiate strong data typing for binary data between client and server, or directly to file. During communication the data types are negotiated and checked for consistency. In effect, Argot provides the flexibility that XML and XML Schema provide, but with binary data. This allows you to develop a consistent data representation for your data and then reuse it in file formats, web services, streaming communications, etc. This is all done in a very small footprint: a 40k jar for Argot in Java.
The Argot dictionary in effect provides a runtime reflective data model which can be extended. In a way it could be seen as taking some of the aspects of ASN1, providing the information at runtime, and allowing it to be compared between client and server. To make it possible to compare data types, type identifiers are negotiated and set during communications. This means that only the data types used in a particular communication session are compared and used. This method, among other things, allows security to be applied so that only specific data types can be communicated to specific clients.
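The negotiation idea can be sketched in a few lines. To be clear, this is not the Argot API: the `Session` class, the tuple-based type definitions and the `allowed` set are all hypothetical names invented for illustration, and the real system describes types in binary rather than as Python tuples.

```python
class Session:
    """Server side of one communication session (hypothetical design)."""

    def __init__(self, server_types, allowed=None):
        self.server_types = server_types  # name -> definition known to the server
        self.allowed = allowed            # optional set of names this client may use
        self.ids = {}                     # name -> identifier agreed for this session

    def negotiate(self, name, client_definition):
        """Agree on a session-local identifier for one data type."""
        if self.allowed is not None and name not in self.allowed:
            raise PermissionError(f"type {name!r} is not permitted for this client")
        if self.server_types.get(name) != client_definition:
            raise TypeError(f"client and server disagree on type {name!r}")
        if name not in self.ids:
            self.ids[name] = len(self.ids) + 1
        return self.ids[name]


# Hypothetical type definitions shared by client and server.
POINT = (("x", "int32"), ("y", "int32"))
ACCOUNT = (("owner", "string"), ("balance", "int64"))
server_types = {"point": POINT, "account": ACCOUNT}

# This client is only permitted to exchange "point" values.
session = Session(server_types, allowed={"point"})
point_id = session.negotiate("point", POINT)
print("point negotiated as id", point_id)
```

Only types that pass both checks receive an identifier, so anything else simply cannot be expressed on the wire for that session.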
In some ways Argot is comparable with XML Schema. Argot provides the same strong data typing as XML Schema does. In the same way that XML Schema describes its own schema using its own descriptors, Argot describes its own format using Argot descriptors. However, instead of being constrained to XML as its building block, Argot describes the fundamental binary data used by computers. Doing this allows Argot to be used to describe any binary data which holds structure. This is an important fundamental change and allows the information we communicate to be delivered in a more compact and concise format.
In fact, Argot has data typing as its own layer in the communications system. This can be placed on any transport mechanism: HTTP, sockets, SSL, etc. An interesting aspect of this is that separating data typing into its own layer means that the layer also handles all the marshalling. So the session and application layer concepts of the traditional OSI model work with data in the form most appropriate to the host language, be it objects, structures, or binary buffers.
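A rough illustration of that layering, using only Python's standard library: the typing layer below knows how to marshal one hypothetical `POINT` type (two big-endian 32-bit integers, my own choice for the example), and it neither knows nor cares what carries the bytes. An in-memory buffer and a temporary file stand in for the transports.

```python
import io
import struct
import tempfile

# Hypothetical type definition: a point is two big-endian signed 32-bit ints.
POINT = struct.Struct(">ii")


def write_point(stream, x, y):
    """Typing layer: marshal a point onto any writable byte stream."""
    stream.write(POINT.pack(x, y))


def read_point(stream):
    """Typing layer: unmarshal a point from any readable byte stream."""
    return POINT.unpack(stream.read(POINT.size))


# Carrier 1: an in-memory buffer.
memory = io.BytesIO()
write_point(memory, 3, -4)
memory.seek(0)
from_memory = read_point(memory)

# Carrier 2: a file on disk. The typing layer is unchanged.
with tempfile.TemporaryFile() as f:
    write_point(f, 3, -4)
    f.seek(0)
    from_file = read_point(f)

print(from_memory, from_file)
```

The application code only ever sees the `(x, y)` tuple, which is the point being made about working with data in the host language's own form.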
Colony builds on the type system provided by Argot to deliver remote method invocation in the same way that Web Services build on XML Schema. Colony, however, is not just a simple remote procedure call. Colony wraps method invocations in the concept of a network virtual machine. Combined with Argot, it allows stacks, heaps and dynamic instructions to be transferred between computers. A Colony network virtual machine can behave like an agent and make multiple hops between computers, or make multiple method invocations in a single request/response pair.
Having all the strict typing in a layer above the transport layer ensures that only the data we know how to handle is received or sent. It also makes a network virtual machine relatively easy to put together. Being able to represent and marshal stacks, heaps and instruction sets becomes simple. Any object that can be marshalled can be placed on a stack or in a heap and sent to another server.
The Application layer of the communications system then simply becomes a process of instantiating a network virtual machine, writing a few instructions to the machine and storing the values. The whole machine is then sent to the server where the instructions are executed.
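The flow just described can be sketched as follows. This is a toy model, not Colony's implementation: the `NetworkVM` class, the `push`/`call` instruction names and the `methods` table are hypothetical, and JSON is used as the wire format purely to keep the example short, where Argot would use typed binary data.

```python
import json


class NetworkVM:
    """A tiny stack machine that can be shipped between hosts (hypothetical)."""

    def __init__(self, instructions=None, stack=None):
        self.instructions = list(instructions or [])
        self.stack = list(stack or [])

    def push(self, value):
        self.instructions.append(["push", value])

    def call(self, method, argc):
        self.instructions.append(["call", method, argc])

    def to_bytes(self):
        state = {"instructions": self.instructions, "stack": self.stack}
        return json.dumps(state).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        state = json.loads(data.decode("utf-8"))
        return cls(state["instructions"], state["stack"])


def execute(vm, methods):
    """Server side: run every instruction against the locally known methods."""
    for instruction in vm.instructions:
        if instruction[0] == "push":
            vm.stack.append(instruction[1])
        elif instruction[0] == "call":
            _, name, argc = instruction
            # Pop the arguments and restore their original order.
            args = [vm.stack.pop() for _ in range(argc)][::-1]
            vm.stack.append(methods[name](*args))
        else:
            raise ValueError(f"unknown instruction {instruction[0]!r}")
    return vm


# Client: two method invocations packed into one request, (2 + 3) * 4.
vm = NetworkVM()
vm.push(2)
vm.push(3)
vm.call("add", 2)
vm.push(4)
vm.call("mul", 2)
request = vm.to_bytes()

# Server: rebuild the machine, execute it, and return its final state.
methods = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
result = execute(NetworkVM.from_bytes(request), methods)
print(result.stack)
```

The intermediate result of `add` never crosses the network; it stays on the machine's stack and feeds straight into `mul`, which is what makes a single request/response pair sufficient.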
An important aspect of runtime identifier negotiation is extensibility. The Argot dictionary provides an abstract data type. Concrete types can be mapped to abstract data types at any time. This allows new data types to be added after a set of types is defined. In Colony this is used to define instructions in the network virtual machine. A user is able to extend the instruction set of the virtual machine at any time. The strong data typing between client and server ensures that an instance using a new instruction can only be transferred to machines which have also defined the new instruction.
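A minimal sketch of that rule, under my own assumptions: the `TypeDictionary` class, its method names and the `retry` instruction are hypothetical and only model the idea of mapping concrete types onto an abstract one, not how Argot stores its dictionary.

```python
class TypeDictionary:
    """Maps abstract type names to the concrete types defined so far."""

    def __init__(self):
        self._concrete = {}  # abstract name -> set of concrete type names

    def map_concrete(self, abstract, concrete):
        """Add a concrete type under an abstract type; allowed at any time."""
        self._concrete.setdefault(abstract, set()).add(concrete)

    def defines(self, abstract, concrete):
        return concrete in self._concrete.get(abstract, set())


def can_transfer(program, target):
    """A program may move only to a host that defines every instruction it uses."""
    return all(target.defines("instruction", op) for op in program)


client = TypeDictionary()
server = TypeDictionary()
for dictionary in (client, server):
    dictionary.map_concrete("instruction", "push")
    dictionary.map_concrete("instruction", "call")

# The client extends its instruction set with a new, hypothetical instruction.
client.map_concrete("instruction", "retry")
program = ["push", "call", "retry"]

before = can_transfer(program, server)  # server has not defined "retry" yet
server.map_concrete("instruction", "retry")
after = can_transfer(program, server)   # now both sides agree
print(before, after)
```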
Argot and Colony in effect provide Web Services functionality with lower bandwidth, lower CPU and memory cost, and stronger typing than XML-based Web Services.
However, my work with Argot is only just starting. Argot used in a database could provide strongly typed data and objects that can pass easily from database to client software. A database could store more complex data structures and use the Argot type system to form the table structures. In effect, this would allow the database to handle the object/relational mapping for the software.
The Argot type system is used to describe the instructions of the network virtual machine. This provides proof that dynamic instruction sets can be created to build virtual machines with dynamic byte code interpreters. The virtual machine's byte code instructions can be extended to provide specific instructions that meet the needs of its environment or execution hardware.
The Argot type system could even be used to create a totally dynamic binary language. By removing the requirements of syntax parsing, Argot can allow new concepts to be added to the language as needed. Using an editor that redisplayed the concepts of the language as text, the programmer can have the same experience as developing in Java, C# or another high-level language. Only in this environment they can use plug-in language elements to support concepts such as aspect-oriented programming, parallel programming, or even distributed programming.
I think Argot and Colony provide a real shift in the way distributed computing is handled. I'd like nothing more than for you to want to use them in your systems. However, if you don't, use them as an example of how you can challenge the underlying technologies. Use them to question and work out what you can't currently do with distributed computing systems. Then, with that knowledge, start developing some solutions that might just work. Learn from the past, and try not to make the same mistakes that have been made before. Distributed computing has a lot of work ahead before we can safely say it just works.
