Interactive Maps Without Map Servers

On 30 September 2015, I gave a talk to the Southern Maryland GIS User Group titled "Interactive Maps Without Map Servers." This post consists of the slides from that presentation interspersed with my "talk track" in order to provide context.

Today, I’m going to discuss publishing interactive maps without using a map server. I’m going to do this by focusing on a specific case study for one of our customers, the US Commission on Civil Rights. This example is fairly simple and I chose it for its ease of illustration for today’s talk. Before I get started, I think it’s necessary to clear up some terminology.

First, what do I mean by “interactive?” I'll start by explaining what I don’t mean.

I don’t mean ArcMap in a browser. When I talk about “interactive,” I am talking about fairly focused web mapping applications that are designed to enable users to answer one or two specific questions. The map should support this by providing some geographic richness to the experience but shouldn’t attempt to deliver full-blown GIS capability. There are metrics that support a focused approach over a kitchen sink approach, so today’s talk will look at single-use or narrow-use applications.

The second thing I need to clarify is what I mean by “map server.”

I really mean anything other than your vanilla web server. That could mean ArcGIS Server, GeoServer, MapServer, or custom middleware you develop yourself. It is perfectly feasible with today’s technology to deliver interactive maps without the use of any middleware and that’s what we’ll explore today.

With those terms clarified, let’s jump into our case study. I’ll be focusing on our monthly update workflow in this talk, but it’s important to understand that there is some initial project setup required. In this case, the setup included the creation and design of the site content such as HTML, CSS, folder structure, Javascript libraries, data file locations, and other standard parts of any web site. It also included creating some back-end data structures in PostGIS that will help with data processing, such as data tables and views. For this project, I didn’t need to do anything really exotic in the database.

This is the purpose of the site. As we go along, you may notice that the data doesn’t appear to be normalized by population and you would be correct. We are still working with the user to educate them on the importance of doing this and we expect to get their buy-in soon.

Here is our basic data flow. We receive formatted Excel files from the user and manually prep them for conversion to CSV. The prep involves stripping out some of the formatting and formulas, and then exporting the undecorated data to CSV. We then import the CSV to PostGIS, join it up with geometries, export it to GeoJSON, and publish the updated site to the web.

The tool set we use for publication is fairly minimal. I’ve already discussed Excel and PostGIS, and I’ll get into more detail about how we use PostGIS in upcoming slides. GDAL (really OGR) is used to export the data from PostGIS to GeoJSON. I could do all of that directly in the database, but I prefer to use OGR because I find it easier to script the process that way. Leaflet is the Javascript mapping library that we use for visualization on the web site. It natively understands GeoJSON, so it was a good fit. D3 is the charting library that we use for the charts on the site. We happen to post our updated site to GitHub Pages, but I use a local NGINX instance for QA/QC before posting. Any standard web server should suffice, however.

Don’t worry, you didn’t blink and miss a slide. I’m jumping in with Step 2. As I mentioned, Step 1 is simply formatting an Excel document and exporting to CSV. For the sake of time, I’m not going to discuss that step in detail. I’m picking up with getting the data into PostGIS.

PostGIS is a spatial data extension for the PostgreSQL relational database. PostgreSQL has a nice GUI admin tool called PgAdmin, which provides a helpful interface for importing the CSV into the database. The import tool is a pretty wrapper around the PostgreSQL COPY statement that can be issued from the command line. Either way is perfectly valid. I’m showing PgAdmin because it makes a better looking slide. It’s worth noting that I designed the database table as part of my initial project setup so it’s just sitting there waiting for data. When I import the CSV data, I am always appending to the existing table, so it contains the data for all monthly updates. You will also notice that I am not attaching any spatial data here.
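Under the hood, the import PgAdmin performs boils down to a single COPY (or, from psql, a client-side \copy) statement. Here is a minimal sketch of what that equivalent command might look like, expressed as a Javascript string so it could be fed to psql from a script; the table name, column list, and file name are assumptions for illustration, not the project's actual schema.

```javascript
// Hypothetical equivalent of the PgAdmin CSV import: a client-side \copy
// that appends the month's rows to the existing complaints table.
// Table, column, and file names are placeholders for illustration.
const copyStatement = [
  "\\copy complaints (state_abbrev, complaint_type, complaint_count, report_month, report_year)",
  "FROM 'complaints_2015_08.csv'",
  "WITH (FORMAT csv, HEADER true)"
].join(" ");

console.log(copyStatement);
```

Because the statement appends rather than replaces, the table accumulates every monthly update, exactly as described above.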

One huge advantage of using a relational database is that it gets us out of “shapefile mode” where we have multiple working copies of geometries living throughout our organization. I long ago adopted as a best practice storing spatial data in a single reference table, with the only attributes being common identifiers that can be used as join columns. In this case, I am using US state boundary polygons that are attributed with state name, state abbreviation, and FIPS code. I then use views to dynamically join my detailed data to the geometries as needed. In this case, the diagram shows that I am joining the complaint data to the state polygons using the state abbreviation.

My database view does other work for me besides just joining the data. Views can be used to perform myriad processing and analytic functions. In this case, my view contains aggregate functions (not shown here) that calculate the year-to-date totals required for the web map. The user requires the map to show year-to-date totals for each complaint type in addition to total complaints. Those totals are never stored, as they would double the size of the data. I’m now ready to export the data.
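To make the view's job concrete, here is a minimal sketch of the join-plus-aggregate logic in plain Javascript over toy data. The field names, values, and function name are hypothetical; the real work happens in SQL inside the view, but the computation is the same idea: join complaint rows to the state reference records and compute current and year-to-date totals on the fly, so they are never stored.

```javascript
// Toy stand-ins for the geometry reference table and the complaints table.
const states = [
  { abbrev: "MD", name: "Maryland" },
  { abbrev: "VA", name: "Virginia" }
];
const complaints = [
  { state: "MD", month: 7, year: 2015, count: 4 },
  { state: "MD", month: 8, year: 2015, count: 3 },
  { state: "VA", month: 8, year: 2015, count: 5 }
];

// Hypothetical sketch of what the view computes for a given month/year:
// one row per state, joined by abbreviation, with totals derived on demand.
function viewRows(month, year) {
  return states.map(function (s) {
    const mine = complaints.filter(
      c => c.state === s.abbrev && c.year === year
    );
    const current = mine
      .filter(c => c.month === month)
      .reduce((sum, c) => sum + c.count, 0);
    const ytd = mine
      .filter(c => c.month <= month)
      .reduce((sum, c) => sum + c.count, 0);
    return { state: s.name, current: current, ytd: ytd };
  });
}
```

Calling `viewRows(8, 2015)` yields Maryland with a current total of 3 and a year-to-date total of 7, which is the kind of row the export step picks up.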

I use GeoJSON to drive the published web map, so the next step is to export the current month’s data to GeoJSON.

If you’re not familiar with GeoJSON, it’s a widely adopted community standard for encoding geographic features in Javascript Object Notation for use in web applications. JSON is the native object encoding of Javascript, so it’s readily understood by Javascript code. GeoJSON is specifically understood by the Leaflet mapping library that drives the web map being produced here. As a result, the map makes use of attributed vector objects in the browser.
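For those who haven't seen it, here is a hypothetical GeoJSON Feature of the kind this map consumes: a polygon geometry with the joined attributes carried in `properties`. The property names and the (very rough) coordinates are illustrative only.

```javascript
// A minimal, made-up GeoJSON Feature: a crude rectangle standing in for
// a state polygon, with joined attributes in "properties".
const featureJson = `{
  "type": "Feature",
  "properties": { "state": "Maryland", "total_complaints": 42 },
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-79.49, 39.72], [-75.05, 39.72],
                     [-75.05, 37.91], [-79.49, 37.91],
                     [-79.49, 39.72]]]
  }
}`;

// JSON is Javascript's native object notation, so no special parser
// is needed; the attributes are immediately available to page scripts.
const feature = JSON.parse(featureJson);
console.log(feature.properties.state); // → "Maryland"
```

This is why no middleware is needed at display time: the browser already speaks the data's language.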

At the beginning of this talk, I mentioned that I use OGR to export GeoJSON from the PostGIS database. OGR is a command-line utility that converts between many different data formats. The command line here specifies my output as GeoJSON and provides the output file name. It also provides my database connection information in the quoted string following the ‘PG:’ in the command. Finally, it specifies the SQL to be executed, with the results being dumped to the output GeoJSON file. This is where the view I previously discussed comes into play. Here, I tell it to only give me data for August of 2015. As a result, I’ll get a GeoJSON record for each US state containing its geometry and joined to only the data for the latest update. I run this command in the data directory of my web site so the data is exported in the proper location.

In practice, I generate two GeoJSON files and CSVs for each update. As a result, the above statement is run from a script with the month and year passed in as arguments. For the sake of time and simplicity, that isn’t shown here.
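The scripted version might look something like the following sketch: a small Node function that builds the ogr2ogr command for a given month and year. The view name, file naming convention, and connection details are all placeholders, not the project's actual values.

```javascript
// Sketch of a script that builds the ogr2ogr export command for a given
// month and year. View name, output naming, and the PG connection string
// are hypothetical placeholders.
function buildExportCommand(month, year) {
  const mm = String(month).padStart(2, "0");
  const sql =
    `SELECT * FROM complaint_view ` +
    `WHERE report_month = ${month} AND report_year = ${year}`;
  return [
    "ogr2ogr -f GeoJSON",
    `complaints_${year}_${mm}.geojson`,
    '"PG:host=localhost dbname=usccr user=publisher"',
    `-sql "${sql}"`
  ].join(" ");
}

console.log(buildExportCommand(8, 2015));
```

One would run the printed command (or spawn it via `child_process`) from the web site's data directory so the GeoJSON lands where the site expects it.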

After all of the work done performing data prep and export, the publication process itself can seem a bit anti-climactic.

To publish the updated site, I just need to update the site configuration file, which is written in Javascript, and then copy the files to the correct location on the web server. We use GitHub Pages, so our “copy” is really a “git push.” If we were using a regular web server, however, we could simply use a file copy or FTP operation to publish the site. I’ll take a moment here to remind you that all of the HTML, CSS, and Javascript content was prepared as part of the initial project setup. So I’m simply updating data and configuration files each month but there was a bit of design work at the outset. Let’s talk about those configuration settings.

Here is the configuration file in its entirety. For publication, I update the month and the year everywhere they appear in either text or numeric form. So, to update for August, I changed every “July” to “August” and every “07” to “08”. Next year, I’ll replace “2015” with “2016”. The site’s internal scripts read these values and the site self-configures accordingly. So there’s not any heavy Javascript knowledge needed for this part. With these updates made, I copy the updated site files to their home on the server, after doing some local QA/QC.
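To give a sense of what those monthly edits look like, here is a hypothetical configuration file of the same general shape; the actual property names in the project may differ. The monthly update is just changing these string values.

```javascript
// Hypothetical site configuration. The site's internal scripts read
// these values and configure the map and charts accordingly.
// Property names are illustrative, not the project's actual ones.
var siteConfig = {
  displayMonth: "August",                          // month name shown in titles
  monthNumber: "08",                               // used to locate data files
  year: "2015",
  dataFile: "data/complaints_2015_08.geojson"      // current month's export
};
```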

Here’s a screen shot of the resulting site. You can see the cursor over Maryland and the current and year-to-date totals in the information box. Pan and zoom tools help home in on areas where the data may be a little crowded. You can go to http://zekiah.github.io/usccr/ to see the full site in action, including the D3 charts.

Let me explain what’s going on here since it looks like any other web map, which is the point. The data that you are seeing is being streamed from the GeoJSON files that are sitting on the web server’s file system. All of the interactivity is being handled in the browser with no calls back to any map server or middleware. This is a pretty simple application. Leaflet has a lot of additional capability that we didn’t need to touch here, including a rich community of plug-ins. So the whole site lives self-contained on the web server with no additional hooks to other platforms.
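The browser-side wiring is simple enough to sketch. The choropleth style function below is plain Javascript (the field name and class breaks are illustrative, not the project's actual values); the comment at the bottom shows how such a function would typically be attached with Leaflet's `L.geoJson`, which the test environment can't exercise, so it is shown as a comment only.

```javascript
// Illustrative choropleth style: map a complaint count to a fill color.
// The property name and breakpoints are assumptions for this sketch.
function styleFeature(feature) {
  const n = feature.properties.total_complaints || 0;
  return {
    fillColor: n > 20 ? "#b30000" : n > 5 ? "#fc8d59" : "#fef0d9",
    fillOpacity: 0.7,
    weight: 1,
    color: "#666"
  };
}

// In the page, Leaflet consumes the GeoJSON file directly, with no
// map server involved, along the lines of:
//   fetch("data/complaints_2015_08.geojson")
//     .then(r => r.json())
//     .then(data => L.geoJson(data, { style: styleFeature }).addTo(map));
```

Everything after the initial file download happens in the browser, which is the whole point of the architecture.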

So I’ve explained what we did and how we did it, but I haven’t really gotten into why you would do it this way. I’ll start by saying that this approach is not a panacea and that you shouldn’t go ditching your map servers just yet. This approach is one more way of publishing interactive maps that may be right in certain situations.

Here are a few of the advantages of this approach. The architectural simplicity really can’t be argued with. Ultimately, you just need a web server. If you’ve ever set up, configured, and maintained any map server software, this is probably attractive to you.

Security is another potential benefit. Most map servers accomplish their work by maintaining some form of live connection back to source data, such as databases or internal file systems. Unless you’ve got a large budget for IT architecture, your map server is probably reaching back to your working data, which means there’s a tunnel from the internet to your working data that could potentially be exploited. With the approach I’ve outlined today, that simply doesn’t happen. Data that you’ve already approved for release sits on your public web server with no connection back to your working data. A really good hacker may still find a way in, but at least you’re not leaving bread crumbs.

Scalability is another potential advantage. Web servers and HTTP were designed around serving static content. All of the work over the years with web services, application servers, and the like has been an attempt to make dynamic data look like static resources for the web. There are still many cases where that is necessary but, if you can get away with using static content, you can let web technologies do what they were designed to do. It’s fast and very scalable.

I also like the potential for configuration control. If you’ve ever gotten caught up in geodatabase versioning, you probably still have the scars. Esri’s approach isn’t great, but they are still the only ones to have tackled the versioning problem. With the approach I’ve discussed, your published data becomes static, text-based content that you can configure and store in systems that were designed to perform that role, such as git, Subversion, or Team Foundation Server.

There’s also one other advantage that was very attractive to our resource-constrained customer:

I didn’t mention cost first because I’m not sure it should be the primary driver, but the approach I described here had no additional infrastructure cost. If the user were using a Windows server with IIS, there would be licensing costs involved but this approach wouldn’t add to them. Cost is really just one more way that this approach is streamlined.

As I mentioned, this approach is not a panacea. As with any technology implementation, it needs to be evaluated for applicability. Here are a few, but far from all, considerations to keep in mind. If you think of your work as a publication process, you may want to consider taking this approach. That means if you can visualize the steps to take to prepare and complete a finalized mapping “product,” you may want to think about doing so without map servers.

If you use your online maps to visualize highly transactional data, such as maintaining situational awareness of ongoing field data collection, this approach may not be right for you. Intermediate map server technology will probably help you with some of the heavy lifting.

If your data or products refresh frequently, such as multiple times a day, you may also want to consider sticking with map servers. That said, if your process is well defined and can be fully scripted, then running scripts using cron jobs or scheduled tasks may accomplish what you need without a map server.

The last point speaks to security again. If you want separation between your working environment and your finished products, the approach I’ve discussed here accomplishes that. Here’s where a little history is in order. Twenty years ago, your GIS production systems lived in your shop and your products were hard copy maps. The product was isolated from the GIS. Ten years later, that product was probably a PDF, but the separation was still maintained. Around the turn of the millennium, web mapping became popular but, to accomplish it, specialized map servers were needed. These servers broke the separation between the working and production environments, leading to a lot of architectural complexity to safeguard the link. Current web mapping technology has restored the possibility of maintaining a separation between working environments and production environments while still delivering a meaningful, interactive map experience. It’s something that should be added to our toolsets as a design option.

As always, the primary requirement to make such a technical decision is a detailed understanding of your own workflows, data, and products. It is no longer a question of technology. Sufficient technology now exists to make the use of static geospatial content a valid approach if it is appropriate to your needs.

This post was written by:
Bill Dollins
Senior Vice President

For more information on this post or Zekiah's geospatial integration services, please e-mail us at contact@zekiah.com