White Paper: An Introduction to GeoWebCache

In an interview with Susan Smith of GISCafe, Arne Kepp provides a detailed overview of the history and functionality of GeoWebCache. The entirety of the interview is provided below.

GISCafe: What was the impetus behind the creation of GeoWebCache? The history?

The motivation behind GeoWebCache was the need for a caching solution that could easily be integrated with GeoServer. OpenGeo has been focusing on an enterprise-quality, fully supported open source web application stack. GeoWebCache plays a key role in the OpenGeo Stack, sitting between OpenLayers, the JavaScript front end, and GeoServer, which serves the data from a variety of data sources like PostGIS, Oracle, ArcSDE and more. A tile cache ensures scalability and improves the user experience. Since GeoServer was written in Java, it made sense to use the same language, both to provide tight integration and to make it easy to install on all major platforms.

Through Google Summer of Code, Chris Whitney was able to spend a summer creating what became known as jTileCache. It had very basic functionality, but also original ideas like using the Java Caching System to store image objects and compress them on the fly. Over the next nine months, that code was then reworked by OpenGeo into what is today known as GeoWebCache. Through external funding we were able to add native interfaces for Virtual Earth and Google Maps, making it easy to use those clients against data served by WMS servers.

Last summer the project benefited from another generous Summer of Code grant, allowing Marius Suta to contribute code that enabled XML configuration using XStream and a RESTful configuration interface. Google's Open Source Office has also funded OpenGeo to add streaming Google Earth support to GeoServer, and GeoWebCache benefited from this, gaining the ability to tile and cache KML placemarks and vectors.

Looking forward, we are continuing to break down some of the limitations commonly associated with tile caches.

GISCafe: What is its relationship to GeoServer?

GeoServer is sort of like an older sibling. GeoWebCache has benefited tremendously from the collective experience of the GeoServer developer community and the feedback from GeoServer users. They have had a strong influence on the design and helped identify bugs. Originally, GeoWebCache was intended to be a library for GeoServer, but since the WMS standard provides a very stable interface it turned out to be just as easy to develop a separate servlet.

GeoWebCache was first included as a plugin in GeoServer 1.7.1. In the 1.7.x series it basically has the same functionality as a standalone version of GeoWebCache, but the plugin is automatically configured and has a reduced footprint since it shares many libraries with GeoServer.

In the 2.x series of GeoServer, which is currently at the alpha stage, we have started going beyond what can be achieved using standard HTTP requests. If a layer is added or reconfigured, the tile cache will be informed immediately through a callback interface and will reevaluate any existing tiles. However, GeoWebCache also has a RESTful configuration interface that allows other servers to achieve the same effect.

GISCafe: What is new in the 1.1.0 release?

The main improvement in version 1.1.0 is the support for "modifiable parameters". Other tile caches have associated one layer name with one set of tiles, meaning they only consider the bounding box and the layer name of the request; the rest is defined by the configuration of the cache. GeoWebCache has for a long time supported a separate set of tiles for each combination of spatial reference system and output format.

The new release takes this one step further, allowing you to configure filters and specify what other parameters constitute a set of tiles. For example, you can now serve the same layer with multiple styles, you can apply CQL filters, or you can use the time and elevation parameters introduced in WMS 1.3.0. One of the filter types uses regular expressions, which are extremely flexible, and the other is written for matching floating point numbers.
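As a rough illustration of the idea, and not GeoWebCache's actual code, the sketch below shows how a regular-expression filter and a floating point filter could decide which request parameters become part of the key that identifies a set of tiles. The class names, the `tile_set_key` helper, the threshold-based snapping, and the sample layer and parameter values are all assumptions made for this example.

```python
import re


class RegexFilter:
    """Hypothetical filter: accepts a value only if it fully matches a pattern."""

    def __init__(self, key, pattern, default):
        self.key = key
        self.pattern = re.compile(pattern)
        self.default = default

    def apply(self, value):
        # A missing parameter falls back to the configured default.
        if value is None:
            return self.default
        if self.pattern.fullmatch(value):
            return value
        raise ValueError(f"{self.key}={value!r} is not allowed")


class FloatFilter:
    """Hypothetical filter: snaps a number to the nearest allowed value."""

    def __init__(self, key, allowed, threshold, default):
        self.key = key
        self.allowed = list(allowed)
        self.threshold = threshold
        self.default = default

    def apply(self, value):
        if value is None:
            return self.default
        number = float(value)
        nearest = min(self.allowed, key=lambda a: abs(a - number))
        # Reject values that are too far from every allowed value, so
        # arbitrary requests cannot create an unbounded number of tile sets.
        if abs(nearest - number) > self.threshold:
            raise ValueError(f"{self.key}={value!r} is not allowed")
        return nearest


def tile_set_key(layer, srs, image_format, request_params, filters):
    """Build the key that selects one set of tiles for a request."""
    resolved = [(f.key, f.apply(request_params.get(f.key))) for f in filters]
    resolved.sort(key=lambda pair: pair[0])
    return (layer, srs, image_format, tuple(resolved))
```

With filters like these, two requests that differ only in an allowed style or elevation map to different tile sets, while requests with disallowed values are rejected before they reach the cache or the backend.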

One existing feature that many users appreciate is the automatic configuration of GeoWebCache from a WMS GetCapabilities document. The drawback of this method has been that you could not specify additional projections or output formats. In 1.1.0 this problem has been reduced: if the configuration file and the GetCapabilities document have overlapping layer names, the two configurations will simply be merged. In the long run we still hope to provide an AJAX interface to make this easy.

Very basic WFS caching is included in 1.1.0. The motivation behind this is that GeoServer's WFS supports zipped shape-files as an output format. These can be several hundred megabytes and very expensive to compute, so any public server should cache them. Again, you can limit what queries are allowed by using a regular expression, but this is definitely a feature that will be improved over time.

But most importantly, the 1.1.0 release lays a lot of the groundwork for future development. Key to this is the pluggable H2 database which stores meta information about tiles, so that it will now be possible to remove tiles that have not been accessed in a certain time period, or find tiles that have been outdated by a recent change.
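To make the idea concrete, here is a minimal sketch of a tile metadata store that records the last access time of each tile and finds tiles that have been idle for too long. It uses Python's built-in `sqlite3` module purely as a stand-in for the H2 database; the table layout, column names, and function names are assumptions for this example and do not reflect GeoWebCache's actual schema.

```python
import sqlite3


def open_metadata_store():
    """Create an in-memory metadata table (SQLite standing in for H2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tiles (tile_key TEXT PRIMARY KEY, last_access REAL NOT NULL)"
    )
    return conn


def record_access(conn, tile_key, now):
    """Insert the tile or refresh its last access time."""
    conn.execute(
        "INSERT OR REPLACE INTO tiles (tile_key, last_access) VALUES (?, ?)",
        (tile_key, now),
    )


def stale_tiles(conn, now, max_idle_seconds):
    """Return keys of tiles not accessed within the last max_idle_seconds."""
    cutoff = now - max_idle_seconds
    rows = conn.execute(
        "SELECT tile_key FROM tiles WHERE last_access < ? ORDER BY tile_key",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


def purge_stale(conn, now, max_idle_seconds):
    """Delete metadata for stale tiles and return the keys that were removed."""
    keys = stale_tiles(conn, now, max_idle_seconds)
    conn.execute(
        "DELETE FROM tiles WHERE last_access < ?", (now - max_idle_seconds,)
    )
    return keys
```

A real cache would also delete the tile files themselves for every key returned by `purge_stale`; the sketch only covers the metadata side.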

GISCafe: Does it work with other open source software? Is it mostly used by developers?

GeoWebCache works great with any WMS-compliant server, including MapServer and deegree. But there are also a number of people who use it in front of ESRI products, Ionic and even custom WMS servers. On the client side it currently works with any software that can use the OSGeo WMS-C recommendations, including OpenLayers and uDig. There are also custom clients that use the Google Maps API.

The user base is anything from professional developers to home users. Based on the activity on the mailing list, my impression is that developers are actually the minority. Most questions appear to come from end users who own an existing WMS solution and wish to improve its performance or reduce costs.

While some understanding of WMS makes life easier, there are also those that do not want to deal with OGC services and use GeoWebCache so that they can access their data in Google Earth or use the APIs that Google Maps and Virtual Earth provide.

GISCafe: How does GeoWebCache speed up delivery of geographic data from OGC Web Services?

GeoWebCache acts like a proxy between clients and one or more WMS servers. When the client makes a request, GeoWebCache first checks to see whether it already has the corresponding tile. If not, the request is forwarded to the appropriate WMS server. When the response comes back, GeoWebCache first saves a copy (caches it) and then forwards it to the client. The entire process adds only a few milliseconds to the time it takes to do the WMS request. Subsequent requests for the same tile are then answered in milliseconds using the copy, with the added benefit that this requires no resources on the WMS backend.
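The request flow described above can be sketched in a few lines. This is a simplified illustration rather than GeoWebCache's implementation: the `TileProxy` class, its in-memory dictionary, and the `fetch_from_wms` callable are hypothetical, and a real cache would persist tiles to disk.

```python
class TileProxy:
    """Hypothetical caching proxy that sits in front of a WMS backend."""

    def __init__(self, fetch_from_wms):
        # fetch_from_wms(layer, zoom, x, y) -> bytes is supplied by the caller
        # and stands in for the HTTP request to the WMS server.
        self._fetch = fetch_from_wms
        self._store = {}
        self.backend_requests = 0

    def get_tile(self, layer, zoom, x, y):
        key = (layer, zoom, x, y)
        tile = self._store.get(key)
        if tile is None:
            # Cache miss: forward the request to the backend,
            # save a copy, then return it to the client.
            self.backend_requests += 1
            tile = self._fetch(layer, zoom, x, y)
            self._store[key] = tile
        # Cache hit: answered from the stored copy, no backend work.
        return tile
```

Counting `backend_requests` makes the benefit visible: repeated requests for the same tile reach the backend only once.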

This improves the user experience and also opens up a number of new possibilities. For example, the response time becomes less important, so the WMS server can use more complex rules and render tiles that look better. You can also seed the cache in advance, using the built-in web interface, so that some or all tiles are cached before the instance is used in production.

GeoWebCache has been designed for speed and scalability. Even a laptop can serve tiles at several hundred megabits per second. I have come across blogs and emails where people assume that their instances would be limited to the throughput or seek times of their hard drives, since this is where the tiles are persisted. But this is generally not the case: most modern operating systems have a disk block cache, which effectively moves the most requested tiles into memory so they can be accessed at much higher speeds. OpenGeo hopes that this increase in capacity will also allow data providers to make their data available to a wider audience.

GISCafe: What kind of relationship do you have with Google and Microsoft (Virtual Earth) and how do users benefit from using GeoWebCache with these geographic search engines?

In addition to Summer of Code, Google has contributed to both GeoServer and GeoWebCache by funding the development of three special output formats. Two of them are more closely related to Google Earth, namely raster super-overlays and regionated vectors. The first one works like the regular Google Earth background, improving the resolution of images as you zoom in. GeoWebCache can be used with any WMS server to achieve this effect.

The vector format uses OGC KML, and the key is that we gradually show more features as you zoom in. Developing code that automatically selects what items to show at what zoom level was a major undertaking. Both of these types of hierarchies can be cached using GeoWebCache. Google Earth requests a large number of tiles while you are zooming in or spinning the globe, so caching is crucial if you want to serve more than a few simultaneous clients.

The third format is what we call “geosearch” and most of the work is actually done in GeoServer. It is basically an XML sitemap, similar to those used for normal websites, and KML files representing each feature or row in the underlying database. The KML is automatically generated from any backend that provides vector data. Googlebot reads the sitemaps and then fetches all the KML placemarks. It analyzes the description of each feature and its location. After a period of about two weeks your data then becomes visible as a user-contributed placemark on maps.google. GeoWebCache's role here is to provide fast access in case the person searching wants to view the entire dataset or download the data as a shape-file.

We have noticed that Microsoft is also entering the domain of searching geospatial information, but so far we have not found documentation on how to contribute to the index. So in this sense our relationship is limited to providing an easy-to-use interface for Virtual Earth JavaScript developers. People who use the Silverlight edition of Virtual Earth have also used GeoWebCache to publish data from WMS servers; it turns out the Silverlight code uses the same tile indexes as Google Maps.

GISCafe: What are your goals for the product?

The goal is to make tile caching as unobtrusive and easy to use as possible. I hope that the WMTS standard that OGC is working on will make it easier to develop clients and share tiles across applications. On the server side I want to make the cache more dynamic, to automatically expire tiles that are no longer accurate. This is particularly important to OpenGeo, since one of our primary goals is to create software that lets end users contribute and edit geospatial information through their web browsers. The term we use for this is “wikiable maps”. It is based on WFS with transactions, but maintains multiple versions of the same data.

On the enterprise side of things we are actively looking for clients to fund features that are particularly important for large users. These include clustering, with lateral cache synchronization, for increased scalability and reliability. We would also like to develop tools that make it easier to maintain a cache and gather detailed statistics about usage. To get there I have made a list of menu items that we hope clients will fund. That said, GeoWebCache is an exciting platform with a lot of possibilities, and the things we have listed represent only a small subset of what can be done.
