Giovanni Guardalben, HiT Syndicaat

Developing Enterprise Syndication Management Systems

Location: Affi, Italy

Sunday, April 24, 2005

Syndication Management: Smart Client or Browser-based Console?

Early in the design phase of HiT Syndicaat, we had a difficult technical decision to make: how were we going to develop our syndication management tools so that they would be simple to use and effective at the same time?

We had a few options to consider. Given the architecture of our platform (Apache Tomcat, REST APIs and Lucene as the backend database), we initially leaned towards standard JSP development. Its major pluses are the stability and maturity of the approach, as well as the definite advantage of having a browser-based management console. In fact, all syndication and blogging platforms currently available (commercial and open-source) are browser-based applications (although most of them are PHP-based rather than J2EE applications). It is commonly agreed that browser applications are extremely effective for the average user since they do not require deployment and setup. There is also a definite similarity among browser-based applications that makes the average user more comfortable once they are familiar with one of them.

Besides this option, we considered two other alternatives. In fact, we had major doubts about the JSP approach because on the one hand we figured that management consoles are complex GUI applications that over time get even more complex (by natural progression of the product). On the other hand, our typical user is not the typical "web surfer". In most instances, it is either systems management personnel or a "power user". These people are more inclined to appreciate effectiveness over GUI slickness.

So, we looked at DHTML development and smart-client development. The former approach is all the rage these days (see Gmail and Google Maps, or the discussions about AJAX). With smart DHTML development, your browser becomes a smart client to all intents and purposes (especially when combined with asynchronous HTTP/XML calls to the backend server). Our major concern with scripting development was code stability and code maintenance. Until scripting environments mature enough to become as robust as typical IDE-based development environments, we figured that quality and costs would not be acceptable for our development schedule.

In the end, we were left with smart-client development. Considering that our target was the development of a powerful syndication management console, we were comforted in our decision by the fact that, as far as we know, most management consoles for DBMSs are smart-client applications (albeit not HTTP ones). Although many backend systems are provided with browser-based tools, full-featured management consoles are all stand-alone applications.

We think we overcame implicit limitations of the smart client approach by developing in Java/SWT (operating system portability) and by providing browser-based setup environments (to simplify application deployment and setup).

Saturday, April 16, 2005

Open Search & Syndication

Open Search is a web tool that supports searches on multiple heterogeneous web data sources using a minimalist web query syntax and a minimalist XML/RSS schema for returning query result sets. Query syntax is based on query templates provided by web data sources when they register their URLs with the Open Search directory of compliant web data sources.
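
To make the template idea concrete, here is a minimal Java sketch of how a client might expand a query template into a search URL. The placeholder names `{searchTerms}` and `{count}` are assumed to follow the Open Search convention; the class name, the `expand` helper and the example.com URL are hypothetical and only serve as an illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of any Open Search library.
class OpenSearchTemplate {

    // Replaces each {name} placeholder with the URL-encoded value from params.
    // Placeholders without a matching entry are left untouched.
    static String expand(String template, Map<String, String> params) {
        String url = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            String encoded;
            try {
                encoded = URLEncoder.encode(e.getValue(), "UTF-8");
            } catch (UnsupportedEncodingException ex) {
                // UTF-8 is always available on a compliant JVM.
                throw new IllegalStateException(ex);
            }
            url = url.replace("{" + e.getKey() + "}", encoded);
        }
        return url;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("searchTerms", "enterprise syndication");
        params.put("count", "10");
        // Assumed template, as a data source might register it.
        String template = "http://example.com/search?q={searchTerms}&n={count}";
        System.out.println(expand(template, params));
    }
}
```

The point of the template approach is that the client needs no knowledge of the data source beyond this one string: every source publishes its own URL layout and the client only fills in the blanks.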

Clearly, the primary objective of this specification is to radically change the way web searches are currently supported by search engines. Rather than relying on ever-growing search repositories, web search platforms should perform concurrent searches on user-selected specialized searchable repositories. This way, multiple specialized search engines can be searched at the same time and their results coalesced into a combined XML document.
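
The fan-out-and-merge pattern described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: each data source is modeled as a plain function from a query to a list of result titles (a stand-in for a real HTTP call), the `FederatedSearch` class is hypothetical, and the merged document carries only titles, omitting the `link` and `description` elements that a complete RSS 2.0 channel would need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: queries several sources concurrently and merges the results.
class FederatedSearch {

    // Minimal escaping for text placed inside XML elements.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String search(String query, List<Function<String, List<String>>> sources) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Function<String, List<String>> source : sources) {
                tasks.add(() -> source.apply(query));
            }
            StringBuilder rss = new StringBuilder("<rss version=\"2.0\"><channel>");
            rss.append("<title>").append(escape(query)).append("</title>");
            // invokeAll runs the tasks concurrently and returns the futures
            // in the same order as the task list, so the merge is deterministic.
            for (Future<List<String>> result : pool.invokeAll(tasks)) {
                for (String title : result.get()) {
                    rss.append("<item><title>").append(escape(title)).append("</title></item>");
                }
            }
            return rss.append("</channel></rss>").toString();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Function<String, List<String>>> sources = new ArrayList<>();
        sources.add(q -> Arrays.asList("Source A result for " + q));
        sources.add(q -> Arrays.asList("Source B result for " + q));
        System.out.println(search("syndication", sources));
    }
}
```

A real implementation would also have to decide how to rank or interleave results from different sources and how long to wait for a slow source; the sketch simply concatenates them in source order.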

In the Open Search approach, RSS 2.0 (i.e., the format used for returning search result sets) is just the result set XML schema specification. In fact in most cases, the internal data structure of the heterogeneous web data sources can be anything and typically, it will be neither RSS, nor XML (nor is the semantics of web data sources linked to syndication).

Even though in Open Search, RSS is only a convenient format for returning search results, there are no intrinsic reasons for not using the Open Search specification for searching syndication content.

Blogs can be very conveniently searched using Open Search query syntax. However, should "pure" syndication content (i.e., syndication documents not tied to a blog) be searchable with Open Search?

If a syndication management system is used to complement a Web Content Management system, it should definitely support the Open Search specification. In this context, an important extension to Open Search should be support of authenticated access. This would enable searches based on named subscriptions. We also believe that more advanced search capabilities (such as the use of the RSS category element, or restrictions on all RSS elements or attributes) would improve the quality of results. It should also be possible to return data source specific extensions to RSS in the result sets.
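
As an illustration of the kind of category-based restriction mentioned above, the following Java sketch filters the items of an RSS document by their `category` element using the standard DOM parser. The `CategoryFilter` class and its method are hypothetical; this is not how Open Search or HiT Syndicaat implements such a restriction, just one way a client or server could apply it to a result set.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: keeps only the items tagged with a given RSS category.
class CategoryFilter {

    // Returns the titles of all items that have a matching <category> element.
    static List<String> titlesInCategory(String rssXml, String category) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList categories = item.getElementsByTagName("category");
                for (int j = 0; j < categories.getLength(); j++) {
                    String value = categories.item(j).getTextContent().trim();
                    if (category.equalsIgnoreCase(value)) {
                        NodeList title = item.getElementsByTagName("title");
                        titles.add(title.getLength() > 0 ? title.item(0).getTextContent() : "");
                        break; // one match per item is enough
                    }
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>demo</title>"
                + "<item><title>One</title><category>news</category></item>"
                + "<item><title>Two</title><category>sports</category></item>"
                + "</channel></rss>";
        System.out.println(titlesInCategory(rss, "news"));
    }
}
```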

In our implementation of an enterprise syndication management system (HiT Syndicaat), we decided to Open Search-enable queries from the first release. This turned out to be a fairly light task because our URL-based query syntax was already richer than the minimalist Open Search query specification.

I am hoping to hear from HiT Syndicaat users about how useful they found our Open Search support.

Tuesday, April 12, 2005

Web Content Management & Syndication

With limited effort, WCM systems can be extended (and most of them already are) to return XML feeds linked to dynamically generated web content (see for instance the CMS Matrix for a complete roundup of RSS support in CMS). In most instances, when a WCM is in place, many of the functions of an RSS Syndication Management System (SyMS) are provided by the WCM itself.

Typically, this is accomplished by creating an infrastructure for feed management (feed creation and editing) and an infrastructure for feed content generation. Basically, when creating a new node (web page) in the WCM, the same node can be assigned (automatically or manually) both to the web hierarchy and/or to the RSS feed collection of items. This approach is definitely appropriate (and probably the optimal one) whenever there is an almost one-to-one relationship between what is shown on a web page and what is referred to by an RSS item.
However, in many situations this is not the case because web content is fairly static over a period of time, whereas syndication content is generally highly dynamic. Beyond that, there are other mismatches between the two models:

- Syndication content lives long (see The Long Tail). It should also be searchable for a long time. Typically, we do not have the same requirements for web content. Given the long life of syndication content, a syndication system should provide the means for external applications to query its content (see for instance the OpenSearch specification at http://opensearch.a9.com/specification) and the means to store its content indefinitely.

- Syndication content has limited graphical requirements. In many instances, syndication content could do without HTML. In other instances, when HTML content is present (typically pointed to by links in RSS items), editing its graphical content should require only a few keystrokes (possibly starting from a set of pre-configured templates). For editors, item creation should be a simple yet powerful action.

- Management of syndication feeds should be highly dynamic. A powerful syndication system should support a dynamic model for the creation, editing and removal of feeds. Power users should be provided with the ability to affect the number of feeds and the choice of who is the potential target for the feed. Compared to typical usage patterns in web development, syndication feeds are closer to weblogs than to portals. Editors (not site administrators) are often empowered to create content and feeds.

- Another major difference between RSS feed management and WCM is the ability of RSS to carry content from heterogeneous data sources, as opposed to data coming from the central repository provided by a WCM. RSS Syndication Management Systems (at least according to our view of RSS SyMS) should be totally programmable by means of REST XML APIs. This way, content could come from ERP systems, Office applications, databases, etc. Similarly, RSS content should be searchable and accessible from heterogeneous systems (not only the web browser) such as mobile phones, multimedia players, iPods, etc.

Given the differences, we believe that a hybrid management system (i.e., WCM + Syndication Management System, which for simplicity we refer to as Syndication Content Management) should be considered, with the following components:

- a light-weight content authoring GUI interface to manually edit content and create syndication HTML templates;
- a syndication management system;
- a set of web content aggregators (to simplify the creation of automatic content to syndicate);
- a query processor (compliant with the OpenSearch architecture) to make the syndication content searchable over the internet.

Monday, April 11, 2005

Enterprise Syndication: Do We Need Syndication Management Systems (SyMS)?

This is a personal weblog about our new family of software products: HiT Syndicaat (www.hitsyndicaat.com). In this blog, rather than replicating marketing hype, I will try to provide technical background information on why we did what we did.

Please feel free to contact me at gianni@hitsyndicaat.com

And now, let us move on to the subject in the title.


In my previous life (as VP of R&D at HiT Software), I dedicated many years to database technology. During those years, it occurred to me many times that database technology followed a typical evolution path: initially, developers coded their storage routines directly in their applications. Gradually, things got so complicated that specialized libraries were used. Eventually, companies (database server vendors) created specialized servers capable of serving multiple users concurrently (concentrating advanced features server-side rather than client-side). Nowadays, developers interact with database servers by means of standardized (ODBC/JDBC/etc.) client libraries. In turn, client libraries support complex wire protocols to chat with their servers (to remotize all those advanced features that servers support).

So, the question is: will syndication technology follow the same path as database technology (or, for that matter, search technology, streaming technology, etc.)? Are developers and companies currently mostly developing ad-hoc solutions for their syndication needs?

If we look at development patterns for syndication technology, we find the following typical cases:

-web portal developers manually extending their web sites to publish syndication content that derives from database storage
-weblogs re-publishing their content as syndication content
-search sites returning their search results as RSS (see OpenSearch 1.0)
-occasionally, people manually typing their RSS content and FTPing it to web sites.

Definitely, we are in the midst of an evolving phase for syndication technology. Companies are scrambling to extend their current technology with syndication features. However, if the past is any indication for the future, perhaps we will see the birth of specialized syndication servers that will couple publishing and storage with search, administration, security, backups, etc.

So, what should syndication servers provide to their client applications? Here are a few ideas:

-storage (relational and/or text-based)
-search (field and/or text-oriented)
-query support (sql-like and/or search-like)
-security support (authentication, encryption)
-authenticated feeds
-authoring support (browser-based and/or smartclient-based)
-content management for feeds
-logging and backups

Did I miss anything here?