Giovanni Guardalben, HiT Syndicaat

Developing Enterprise Syndication Management Systems

Location: Affi, Italy

Wednesday, July 13, 2005

Interesting Article: Interview with Patrick Chanezon

Read "Interview with Patrick Chanezon". The interview discusses Enterprise Syndication, and HiT Syndicaat is mentioned as one of the two players in this space.

HiT Syndicaat is Powered by Apache Lucene

Originally, when we started development of HiT Syndicaat, we envisioned a rather heterogeneous pattern of usage. Basically, we expected content to syndicate to come from either data sources or text sources. In the former case, data is frequently updated. In the latter, content needs to be searched in sophisticated ways. For this reason, in our product architecture, we organized the repository layer to support both a text repository and a relational repository. As it turned out, for internal reasons we started developing the text repository support first. Naturally, we immediately considered using Apache Lucene as our text search engine.

Two things we noticed right away were the incredible performance of text searches (try searches at our demo RSS sites: HiT Syndicaat RSS Demo Feeds) and the relative slowness of updates (i.e., when changes must be immediately visible; if changes are cached and committed later, performance is outstanding). Soon, however, we realized that content syndication is not about real-time performance (as in online transaction processing) but rather about efficient and powerful text queries. For this reason, we decided to postpone development of the relational repository support and to stick to the Lucene-based text search engine for the foreseeable future.

Currently, as far as we know, not many web logs or RSS platforms are Lucene-based (check Powered by Lucene). This is rather surprising, considering that RSS and web log content is predominantly text and that Lucene is so efficient (let's not forget it is open-source...). What do you think?

Sunday, June 26, 2005

Anyone for A9 ?

Recently, in this web log, we described how we aggregated content obtained from BBC backstage RSS feeds, how we combined it into just one RSS feed and made that feed searchable through A9. We received a number of objections to this experiment. People told us that popular RSS search engines can already be queried using A9. Others objected that RSS content is just a fraction of the "real" content pointed to by the RSS link field (optionally) contained in each RSS item. Both objections indeed raise reasonable doubts about the approach.

However, being able to carefully select just a small set of RSS feeds (possibly filtering them with feed-specific conditions) and make them collectively searchable with A9 has the advantage of visibility. By selecting a specific "column", A9 users make the conscious decision to keep "relevant" columns (i.e., subjects or data sources) visible at all times alongside query results. This will never happen when searching with a global RSS search engine in just one column.

The other objection, about RSS content being too "slim" compared to the "real" content, is a very valid one. Indeed, we are already working on this issue: rather than storing the content pointed to by RSS link fields, we ONLY text-index it in our data store. This way, when executing a text search on RSS content, items can be returned whose links point to text satisfying the query request.
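
To make the idea concrete, here is a minimal Python sketch of "index but do not store". It is a toy inverted index, not our Lucene-based implementation; the class name, method names and the AND query semantics are assumptions made only for illustration.

```python
import re
from collections import defaultdict


class ToyFeedIndex:
    """Toy inverted index (hypothetical): RSS item fields are stored,
    the text behind each link is indexed but never stored."""

    def __init__(self):
        self.items = {}                   # item id -> stored fields (title, link)
        self.postings = defaultdict(set)  # term -> ids of items matching the term

    @staticmethod
    def _terms(text):
        # Lower-case alphanumeric tokens; a real analyzer does much more.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add_item(self, item_id, title, link, description, linked_text=""):
        # Stored fields: these are what a search returns.
        self.items[item_id] = {"title": title, "link": link}
        # Indexed-only text: the linked page is tokenized for search,
        # but its text is not kept in the store.
        for term in self._terms(" ".join([title, description, linked_text])):
            self.postings[term].add(item_id)

    def search(self, query):
        # AND semantics (an assumption): every query term must match.
        terms = self._terms(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.items[i] for i in sorted(ids)]
```

An item whose description never mentions "Paris" is still returned for that query as long as the page behind its link does, which is exactly the behavior described above.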

We wish to proceed with our experiment and are available to A9-enable, free of charge, other RSS content pointed out to us by readers of this blog. So, is there anyone for A9 out there?

Saturday, June 04, 2005

BBC Backstage, A9 and HiT Syndicaat

Recently, the BBC started an interesting initiative called backstage.

According to their web site, the motto is "Build what you want using BBC content". So, we tried a little experiment. We aggregated and stored all the provided BBC RSS feeds into a single RSS feed (note: platform support provided by the HiT Syndicaat server). Our goal was to provide a demo environment to show the performance and resource utilization of our RSS server.
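
For readers who want to try something similar, the following Python sketch shows one way to coalesce several RSS 2.0 documents into one feed, newest items first. It is only an illustration using the standard library, not the HiT Syndicaat server code; the function name and parameters are hypothetical, and it assumes every pubDate is a well-formed RFC 822 date.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(feed_xml_list, title, link, description):
    """Merge the items of several RSS 2.0 documents into one channel."""
    items = []
    for xml_text in feed_xml_list:
        root = ET.fromstring(xml_text)
        items.extend(root.iter("item"))

    def published(item):
        # Items without a pubDate sort last.
        text = item.findtext("pubDate")
        return parsedate_to_datetime(text).timestamp() if text else 0.0

    items.sort(key=published, reverse=True)  # newest first

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")
```

A real aggregator would also fetch the feeds periodically, drop duplicate items and persist the result; none of that is shown here.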

There is an interesting twist to our experiment, namely RSS search support. HiT Syndicaat supports searches and also supports the A9 search format.

So, we registered the BBC RSS feed collection with the A9 server, and now you can run your queries on BBC content (starting May 18, 2005) within A9.

To do that, follow these instructions:

Go to

Login (or Register if necessary)

Go to

In the Search Columns edit box enter: RSS BBC. There, you will find our collection of all RSS BBC feeds coalesced into one feed and provided by our server. Add that column and run your searches.

Alternatively, if you want to run your searches directly from our server, use the following URL:

followed by URL parameters, as in the following examples:

- to return all RSS items in the last 15 minutes
- to return all RSS items containing Paris in the title
- to return all RSS items containing Paris in the description
- to return all RSS items saved three days ago.

Please refer to the HiT Syndicaat documentation for further instructions on search syntax.

Tuesday, May 17, 2005

RSS Content Syndication and Location-Based Services

More and more technology commentators indicate that RSS is going to be used for a lot more than blog content syndication (see for instance RSS, Not Just for Blogs Anymore! ).
In recent weeks, major developments came from A9 (Amazon - Open Search). Basically, they proposed that public search engines support a minimalist approach to query definitions and result set descriptions based on RSS 2.0. This comes from the intuition that RSS, as a collection of categories each containing a time-sorted collection of items, is a very simple yet effective way of distributing content (whether it comes from text searches, web searches, blogs, database searches, or other data sources such as GIS sources).
A few months ago, HiT Internet Technologies SpA decided that RSS technology was a good candidate for strategic software development. For this reason, I started an initiative (called The HiT Syndicaat Initiative) to create a framework for enterprise RSS content management and authoring.
So, now we have HiT Syndicaat, which is an enterprise syndication management system (based on the RSS 2.0 XML standard) that simplifies the tasks involved in the creation and editing of content to syndicate.
However, where does this fit with Location-Based Services?
A few months ago, we received a request for proposal for a project linking a GIS system to an event/alert management system. The underlying requirement was to provide government and private institutions with modern means to distribute quasi-real-time location-based events (such as road blocks, construction, disasters, etc.) over the internet/intranet (possibly with security and authentication included).
We figured that RSS syndication was definitely the right way to support GIS-based alerts. Relying on HiT Syndicaat, this turned out to be a fairly simple task. We created a number of RSS feeds based on the event types and geographical areas to cover. Then, a web-based GIS application (MapWorld by HSC Italia MapWorld) was augmented to allow the creation of specific events in specific geographical areas.
Currently, users receive alerts both as RSS alerts (by using standard RSS aggregators or a specialized HiT Syndicaat alert) and as SMS/email messages for the syndication channels they subscribe to.
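
As a rough illustration of the "one feed per event type and geographical area" layout, here is a small Python sketch. It is not the MapWorld or HiT Syndicaat code: the class, its methods and the channel wording are hypothetical, and SMS/email delivery is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timezone
from email.utils import format_datetime


class AlertFeeds:
    """One RSS channel per (event type, area) pair; all names are illustrative."""

    def __init__(self):
        self.feeds = defaultdict(list)  # (event_type, area) -> list of <item> elements

    def publish(self, event_type, area, title, description, when):
        # 'when' must be a timezone-aware UTC datetime for usegmt=True.
        item = ET.Element("item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "description").text = description
        ET.SubElement(item, "category").text = event_type
        ET.SubElement(item, "pubDate").text = format_datetime(when, usegmt=True)
        self.feeds[(event_type, area)].insert(0, item)  # newest first

    def render(self, event_type, area):
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = f"{event_type} alerts for {area}"
        ET.SubElement(channel, "description").text = "Location-based alerts"
        channel.extend(self.feeds.get((event_type, area), []))
        return ET.tostring(rss, encoding="unicode")
```

A GIS front end would call something like publish() when an operator draws an event on the map, and subscribers would poll the rendered channel for the type and area they care about.
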
We presented our location-based solution at a recent Mobile Monday event in Rome. You can find and download a .ppt presentation at Mobile Monday LBS.
Future enhancements: we are considering extending our client alerts to let users define flexible areas to monitor for events (i.e., geographical area events are generated dynamically). Also, we are considering using the new Greasemonkey extension (for Mozilla Firefox) to inject scripts into generalist mapping engines (such as GMAP) in such a way as to make them capable of creating HiT Syndicaat-compatible syndication messages.

Sunday, April 24, 2005

Syndication Management: Smart Client or Browser-based Console?

Early on in our design phase of HiT Syndicaat, we had a difficult technical decision to make. How were we going to develop our syndication management tools in such a way as to make them simple to use and effective at the same time?

We had a few options to consider. Considering the architecture of our platform (Apache Tomcat, REST APIs and Lucene as backend database), initially we leaned towards standard JSP development. Its major pluses are the stability and maturity of the approach, as well as the definite advantage of having a browser-based management console. In fact, all syndication and blogging platforms currently available (commercial and open-source) are browser-based applications (although most of them are PHP-based rather than J2EE applications). It is commonly agreed that browser applications are extremely effective for the average user since they do not require deployment and setup. There is also a definite similarity among all browser-based applications that makes the average user more comfortable (once you are familiar with one of them).

Besides this option, we considered two other alternatives. In fact, we had major doubts about the JSP approach because on the one hand we figured that management consoles are complex GUI applications that over time get even more complex (by natural progression of the product). On the other hand, our typical user is not the typical "web surfer". In most instances, it is either systems management personnel or a "power user". These people are more inclined to appreciate effectiveness over GUI slickness.

So, we looked at DHTML development and smart-client development. The former approach is all the rage these days (see gmail and gmap by Google, or the discussions about AJAX). With smart DHTML development, your browser becomes a smart client for all practical purposes (especially when combined with asynchronous HTTP/XML calls to the backend server). Our major concern with scripting development was code stability and code maintenance. Until scripting environments mature enough to become as robust as typical IDE-based development environments, we figured that quality and costs were not acceptable for our development schedule.

In the end, we were left with smart-client development. Considering that our target was the development of a powerful syndication management console, we were comforted in our decision by the fact that as far as we know most management consoles for DBMS are smart client applications (albeit not HTTP ones). Although many backend systems are provided with browser-based tools, full-feature management consoles are all stand-alone applications.

We think we overcame implicit limitations of the smart client approach by developing in Java/SWT (operating system portability) and by providing browser-based setup environments (to simplify application deployment and setup).

Saturday, April 16, 2005

Open Search & Syndication

Open Search is a web tool that supports searches on multiple heterogeneous web data sources using a minimalist web query syntax and a minimalist XML/RSS schema for returning query result sets. Query syntax is based on query templates provided by web data sources when they register their URLs with the Open Search directory of compliant web data sources.

Clearly, the primary objective of this specification is to radically change the way web searches are currently supported by search engines. Rather than relying on ever-growing search repositories, web search platforms should perform concurrent searches on user-selected specialized searchable repositories. This way, multiple specialized search engines can be searched at the same time and their results coalesced into a combined XML document.
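
The mechanics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the placeholders {searchTerms}, {count} and {startIndex} follow the Open Search query template convention as we understand it, while the function names, the example URLs and the injected fetch callable are hypothetical. The sketch keeps results grouped per source (much like A9 columns) rather than building the combined XML document.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote_plus


def fill_template(template, search_terms, count=10, start_index=1):
    """Substitute Open Search style placeholders in a registered URL template."""
    return (template
            .replace("{searchTerms}", quote_plus(search_terms))
            .replace("{count}", str(count))
            .replace("{startIndex}", str(start_index)))


def federated_search(sources, search_terms, fetch):
    """Query every selected source concurrently; return results per source.

    sources: mapping of source name -> URL template
    fetch:   callable taking a URL and returning a list of result items
             (in practice it would download and parse the RSS response)
    """
    urls = {name: fill_template(t, search_terms) for name, t in sources.items()}
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in urls.items()}
        return {name: future.result() for name, future in futures.items()}
```

Passing fetch in as a parameter keeps the sketch runnable without network access; for example, federated_search({"bbc": "http://example.org/search?q={searchTerms}"}, "paris", lambda url: [url]) simply echoes the URL each source would be asked for.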

In the Open Search approach, RSS 2.0 (i.e., the format used for returning search result sets) is just the result set XML schema specification. In fact, in most cases the internal data structure of the heterogeneous web data sources can be anything, and typically it will be neither RSS nor XML (nor are the semantics of web data sources linked to syndication).

Even though in Open Search, RSS is only a convenient format for returning search results, there are no intrinsic reasons for not using the Open Search specification for searching syndication content.

Blogs can be very conveniently searched using Open Search query syntax. However, should "pure" syndication content (i.e., syndication documents not tied to a blog) be searchable with Open Search?

If a syndication management system is used to complement a Web Content Management system, it should definitely support the Open Search specification. In this context, an important extension to Open Search should be support of authenticated access. This would enable searches based on named subscriptions. We also believe that more advanced search capabilities (such as the use of the RSS category element, or restrictions on all RSS elements or attributes) would improve the quality of results. It should also be possible to return data source specific extensions to RSS in the result sets.

In our implementation of an enterprise syndication management system (HiT Syndicaat), we decided to Open Search-enable queries from the first release. This turned out to be a fairly light task because our URL-based query syntax was already richer than the minimalist Open Search query specification.

I am hoping to hear from HiT Syndicaat users about how useful they found our Open Search support.

Tuesday, April 12, 2005

Web Content Management & Syndication

With limited effort, WCM systems can be extended (and most of them already are) to return XML feeds linked to dynamically generated web content (see for instance the CMS Matrix for a complete roundup of RSS support in CMS). In most instances, when a WCM is in place, many of the functionalities of an RSS Syndication Management System (SyMS) are provided by the WCM itself.

Typically, this is accomplished by creating an infrastructure for feed management (feed creation and editing) and an infrastructure for feed content generation. Basically, when creating a new node (web page) in the WCM, the same node can be assigned (automatically or manually) to the web hierarchy, to the RSS feed collection of items, or to both. This approach is definitely appropriate (and probably the optimal one) whenever there is an almost one-to-one relationship between what is shown on a web page and what is referred to by an RSS item.

However, in many situations this is not the case because web content is fairly static over a period of time, whereas syndication content is generally highly dynamic. Not only that, there are other mismatches between the two models:

- Syndication content lives long (see The Long Tail). It should also be searchable for a long time. Typically, we do not have the same requirements for web content. Given the long life of syndication content, a syndication system should be provided with the means to let external applications query its content (see for instance OpenSearch) and the means to store its content indefinitely.

- Syndication content has limited graphical requirements. In many instances, syndication content could do without HTML. In other instances, when HTML content is present (typically pointed to by links in RSS items), editing of its graphical content should require only a few keystrokes (possibly starting from a set of pre-configured templates). For editors, item creation should be a simple yet powerful action.

- Management of syndication feeds should be highly dynamic. A powerful syndication system should support a dynamic model for the creation, editing and removal of feeds. Power users should be provided with the ability to affect the number of feeds and the choice of who is the potential target for the feed. Compared to typical usage patterns in web development, syndication feeds are closer to weblogs than to portals. Editors (not site administrators) are often empowered to create content and feeds.

- Another major difference between RSS feed management and WCM is the ability of RSS to carry content from heterogeneous data sources, whereas a WCM serves data from its own central repository. RSS Syndication Management Systems (at least according to our view of RSS SyMS) should be totally programmable by means of REST XML APIs. This way, content could come from ERP systems, Office applications, databases, etc. Similarly, RSS content should be searchable and accessible to heterogeneous systems (not only the web browser) such as mobiles, multi-media players, iPods, etc.

Given these differences, we believe that a hybrid management system (i.e., WCM + Syndication Management System, which for simplicity we refer to as Syndication Content Management) should be considered, with the following components:

- a light-weight content authoring GUI interface to manually edit content and create syndication HTML templates;
- a syndication management system;
- a set of web content aggregators (to simplify the creation of automatic content to syndicate);
- a query processor (compliant with the OpenSearch architecture) to make the syndication content searchable over the internet.