Yet Another GIS Blog
GIS, Geography, Programming, and Neogeography

NoSQL - the next step in GIS?

Thursday, 17 June 2010 15:49 by boxshapedwo

Hopefully I won't make this posting too rambling.  Full Disclosure:  I am not a database expert of any kind.  For the most part, all GISers are probably familiar enough with what a database, or more specifically a Relational Database is.  Pretty much any entry-level GIS textbook discusses it, and you might even use one like the personal geodatabase in ArcGIS or something more sophisticated like Oracle Spatial.  The idea behind the relational database (and I realize I will butcher this) is that you have a series of tables, and those tables are related to one another throu primary keys and foreign keys stored in those tables.  There is a movement, and I'm not sure how long it has been going, for a non-relational data storage system that has been called NoSQL.  Google uses a NoSQL data store called BigTable, and Facebook uses something called Cassandra by Apache.  NoSQL is a bit all-encompassing, because it really covers any of the data stores that are not the classic Relational Databases.  These cover a range of things that I don't really want to get into.  The two that are of most interest to me are the Document Store and Graph Databases.  Document stores are cool because they store, well, documents rather than tables.  Each document can have it's own set of properties and values, and you aren't tied to a database schema.  Typically the documents are stored in a format like XML or JSON (Javascript Object Notation).  It isn't hard to make a leap of storing GIS geometry in a Document-oriented Database, because there already exists a specification called GeoJSON.  Personally, I find it freeing not to be tied to a database schema, and find it difficult to design them.

 But now for the meat of this post - Graph Database.  If you don't know a graph is not a chart.  A graph is a mathematical structure to model relationships.  We GISers are most familiar with its form as a network, or transportation network.  Graphs are made up of nodes, or vertices, and edges that connect nodes.  Importantly, an edge may have direction or no direction.  For example, node1 and node2 are mutual friends and are just connected, or node1 considers node2 a friend but node2 doesn't consider node1 a friend.  As you can probably guess, graphs are used extensively in social network analysis.  A graph database is a database that stores data as a graph, or I suppose multiple graphs.  The emphasis is on the relationship between the nodes of data.  Personally, I think this type of database is the obvious direction that spatial-enabled databases should take.  A lot of our spatial analysis tasks involve searching the relationships between data.  This could really expand those functions, and potentially make them quicker.  There are at least two areas that come to mind when I think of these possibilities.  One is topology.  What is topology to us but the relationship between different geometries?  Here is graph of the topological relationship of some theoretical data:

 

 One thing that might be obvious from this is that we are used to separating out our polygons into different tables or shapefiles that group our data.  At a higher level geometry is grouped by type: polygon, point, and polyline.  But with the graph database that wouldn't be necessary and we would be able to search for data based on their relationship with each other.  This presents new analytical possibilities because data is no longer separate.  See Tim Berners-lee's Ted talk for more info about linked data.

The other possibility that I see with this, is relationships between metadata.  Metadata in a GIS is boring.  Yes it is important, but no one seems to use it, and it is tedious to create.  FGDC is a pain.  Metadata through relationships sounds a lot more interesting to me.  Searching for related information by who it was created, regions/areas, or temporality could be really useful.  

Anyway, those are my thoughts on how NoSQL should be the next step in the GIS world.

 


 
 

Spatialite

Thursday, 14 May 2009 18:00 by boxshapedwo

I've noticed more and more people talking about SQLite and in particular Spatialite.  This is typically surrounded by talk about the hopeful end of shapefiles as a defacto exchange format.  I suppose the success of shapefiles has mostly been because of the openess, so every proprietary GIS software was able to support it.  Now pretty much everything spatial is available in shapefiles.  In typical government fashion the U.S. Census Bureau is just now releasing their data as shapefiles instead of Tigerline files...that's why it seems appropriate to move to another format now.  Is there any greater sign of the death of technology when a government takes it on board?  When I worked on the FEMA map modernization one of my greatest frustrations was that they required ESRI coverages to be submitted with the final product.  Not only is ESRI stopping support for coverages, it requires an ArcInfo license...and even then you have manually add the coverages toolbox.  Of course, if you want to try and edit a coverage you have to go back into ArcInfo workstation.  Anyhoo, that's not relevant to this entry.  I decided to take a look at Spatialite and see what it was about.  First lesson learned - it's spelled with one L. 

So what is the beauty of this format?  It doesn't install ANYTHING. Even the gui interface can just be run from the exe file.  This makes it really really easy to get started.  So I created a simple spatialite db and imported some Open Street Map shapefiles which contained thousands of polygons and polylines to check out the stability.  I thought I'd start with some simple SQL commands and started with AREA for the polygon geometry.  Well this crashed the gui without any explanation.  So then I moved to IsValid (this uses the GEOS API) to see what was wrong with the polygon geometry, and this crashed the gui again.  I switched at this point to the commandline thinking the gui just couldn't handle the amount of data I was using.  At least when the commandline came up with an error it told me what the error was and where it was stopping.  The first thing I discovered was I had a dozen or so polygons that were not closed (start and end point did not match).  I extracted these and moved them to another table using this select statement:

CREATE TABLE buildings_err
AS SELECT * FROM buildings WHERE IsClosed(ExteriorRing(Geometry)) = 0
DELETE FROM buildings WHERE IsClosed(ExteriorRing(Geometry)) = 0

After this it still would crash if I ran IsValid.  So I tried this Select statment to extract all the polygons that had less than or equal to three points creating it:

INSERT INTO buildings_err SELECT * from buildings where NumPoints(ExteriorRing(Geometry)) <= 3;
DELETE from buildings where NumPoints(ExteriorRing(Geometry)) <= 3;

Finally I then removed all the invalid polygons:

INSERT INTO buildings_err SELECT * from buildings where IsValid(Geometry) <= 0;
DELETE from buildings where IsValid(Geometry) <= 0;

 

And then ran an itnersect with another polygon table (cleaned that one up as well):

 SELECT *, intersection(buildings.Geometry, natural.Geometry) FROM [buildings], [natural]

 This hung up on me too.  I'm not sure where all the polygon errors were coming from, either the shapefile, or the import has issues.  The GEOS api seemed pretty sensitive to these errors as well.  So I gave up on trying after it crashed and moved on.  One thing I discovered is the geometry column, and thus the table, can only store one type of Geometry.  This extends to MULTI as well.  So you have to separate your POLYGONs from your MULTIPOLYGONS or I guess explode the multipolygons to polygons.  I think that might get confusing...Note the David in this google group is not me.

Finally, after all this I thought to myself - "Hey wait a second, AIR uses a SQLite database!"  But the SQLite functionality is all built-in so you can't swap dlls, and actionscript doesn't reference external libraries unfortunately.  So I tried the handy built-in SQLite function load_extension, but this too is not apart of the AIR SQLite functionality!!  KAAAAAAAHHHHHNNNNNNN!!!!!   Sorry for the Star Trek reference.  So no go to use a Spatialite database with FLEX/AIR.  And it is sort of moot anyway because I don't know any commercial GIS that fully supports Spatialite.  I think FME does...ESRI certainly doesn't.  So thus was my foray into Spatialite.  Still think it is pretty cool, but I'm a visual person and need to see the data not just type in a few sql statements.

Tags:  
Categories:   GIS | Spatial Database
Actions:   E-mail | del.icio.us | Permalink | Comments (6) | Comment RSSRSS comment feed

Correction Manifold and SQL Server 2008

Saturday, 29 November 2008 18:59 by boxshapedwo
I misunderstood about Manifold and SQL Server 2008. It sounds like you actually need the Enterprise edition of the product to use the native spatial data type...A point for MapInfo on this one, minus 1 for Manifold....If anyone is keeping a tally.

You've Got the Points - Now What?

Saturday, 29 November 2008 15:34 by boxshapedwo
So in the last post I showed how I got some points into SQL Server 2008. Not the most elegant bit of code (I'm a dabbler not a developer), but it worked. Now what can you do with it. Besides the many Transact SQL (Microsoft own brand of SQL) methods available for interacting with spatial data (see OGC ones here), you could create your own GeoRSS feed. I'm fairly new to creating RSS feeds, so this might not be exactly correct, but you may find the query useful. The key here is adding a reference to Microsoft.SqlServer.Types. Then you can actually use their geometry type and the methods associated with it to retrieve your results.

This is all in an generic handler (*.ashx). The Imports:

Imports System
Imports System.Web
Imports System.Xml
Imports System.Data
Imports System.Configuration
Imports System.Collections
Imports System.Web.Security
Imports System.Web.UI
Imports System.Web.UI.WebControls
Imports System.Web.UI.WebControls.WebParts
Imports System.Web.UI.HtmlControls
Imports System.Data.SqlClient
Imports System.Net
Imports System.Threading
Imports Microsoft.VisualBasic
Imports System.IO
Imports Microsoft.SqlServer.Types

And the Class:
Public Class geo_content : Implements IHttpHandler Public Sub ProcessRequest(ByVal context As HttpContext) Implements IHttpHandler.ProcessRequest context.Response.ClearHeaders() context.Response.Clear() context.Response.ContentType = "text/xml" context.Response.ContentEncoding = System.Text.Encoding.UTF8 context.Response.AddHeader("ContentType", "text/xml; charset=utf-8") Dim xmlWriter As New XmlTextWriter(context.Response.Output) xmlWriter.WriteStartDocument() xmlWriter.WriteStartElement("rss") xmlWriter.WriteAttributeString("version", "2.0") 'xmlWriter.WriteStartElement("rdf:RDF", "http://www.w3.org/1999/02/22-rdf-syntax-ns#") 'xmlWriter.WriteAttributeString("xmlns", "rdf", Nothing, "http://www.w3.org/1999/02/22-rdf-syntax-ns#") xmlWriter.WriteAttributeString("xmlns", "geo", Nothing, "http://www.w3.org/2003/01/geo/wgs84_pos#") xmlWriter.WriteStartElement("channel") xmlWriter.WriteElementString("title", "geoContentRSS") xmlWriter.WriteElementString("description", "latitude and longitude from sql server 2008") xmlWriter.WriteElementString("language", "en-US") Dim con As SqlConnection = Create_Connection() If con.State = ConnectionState.Closed Then con.Open() End If Dim geoReader As SqlDataReader = Create_Geo_reader(con) Try Dim q As String = """" If Not IsNothing(geoReader) Then If geoReader.HasRows Then Do While geoReader.Read xmlWriter.WriteStartElement("item") Dim geom As New SqlGeometry geom = CType(geoReader("Geom_Data"), SqlGeometry) Dim lat As Double = CType(geom.STY, Double) Dim lng As Double = CType(geom.STX, Double) Dim id As Integer = CType(geoReader("GeoID"), Integer) Dim name As String = CType(geoReader("Name"), String) xmlWriter.WriteStartElement("title") xmlWriter.WriteCData(name) xmlWriter.WriteEndElement() xmlWriter.WriteStartElement("description") xmlWriter.WriteCData(Name) xmlWriter.WriteEndElement() xmlWriter.WriteElementString("link", "http://www.boxshapedworld.com") xmlWriter.WriteElementString("id", id) xmlWriter.WriteStartElement("source") xmlWriter.WriteAttributeString("url", "http://www.boxshapedworld.com") xmlWriter.WriteString(name) xmlWriter.WriteEndElement() xmlWriter.WriteElementString("geo:lat", CStr(lat)) xmlWriter.WriteElementString("geo:lng", CStr(lng)) 'close item element xmlWriter.WriteEndElement() Loop End If End If If con.State = ConnectionState.Open Then con.Close() End If Catch ex As Exception MsgBox(ex.ToString) con.Close() End Try xmlWriter.WriteEndElement() xmlWriter.WriteEndElement() xmlWriter.WriteEndDocument() xmlWriter.Close() End Sub Public ReadOnly Property IsReusable() As Boolean Implements IHttpHandler.IsReusable Get Return False End Get End Property Private Function Create_Connection() As SqlConnection Try Dim Connection As New SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings("baseBSWstr").ConnectionString) Return Connection Catch ex As Exception MsgBox(ex.ToString) Return Nothing End Try End Function Private Function Create_Geo_reader(ByVal Connection As SqlConnection) As SqlDataReader Dim programReader As SqlDataReader Dim sqlStatement As String Dim command As New SqlCommand Dim dt As New DataTable("geograph_data") sqlStatement = "SELECT * FROM geograph_data WHERE [GeoID] <> @negGeoID ORDER BY [Name]" Try If Connection.State = ConnectionState.Closed Then Connection.Open() End If With command .CommandText = sqlStatement .Parameters.AddWithValue("@negGeoID", -9999) .CommandType = CommandType.Text .Connection = Connection End With programReader = command.ExecuteReader Return programReader Catch ex As Exception MsgBox(ex.ToString) Connection.Close() Return Nothing End Try End Function End Class
This starts by creating an xmltextwriter that is set to the context output stream. I add the xml namespace for geo. There is a separate function to create the connection string, and a separate function to create the sqldatareader. In this function you see a simple SELECT statement with parameters for the where value. This is returned and the geoReader starts to cycle through each row. A variable called geom is used to house the geometry, and you can see that it is of a type SqlGeometry. Using this type gave access to the STY and STX methods. Again, not the most elegant bit of code, but it works.

SQL Server 2008 Express

Thursday, 27 November 2008 12:08 by boxshapedwo

I've finally gotten a chance to play around with SQL Server 2008 express edition. In case you haven't heard 2008 introduced native spatial support. I'm not sure how the internals store spatial data but you can add geometry (or geography) data types as Well-Known Text or Well-Known Binary. Well-Known Text is pretty easy to understand and work with. It's pretty easy to work with SQL Server using the Management Studio Express Edition. If you have MapInfo 9.5, or Manifold version 8 you can directly connect to express edition and edit geometry that way. I don't think there is support for this in ArcGIS 9.3 without SDE (personal or otherwise) which means you probably need an ArcEditor license. I happened to have an XML file with latitude and longitude locations in it that I used to store locations for a mini (personal) google maps project. Using VB.NET 3.5 I could easily connect to both the xml file and sql server to add the geometry as point locations.

Here is the code that I used. It is server side code, with the connection string stored in the web.config file.

Dim xDoc As New XmlDocumentxDoc.Load(Server.MapPath("cities.xml"))

 Dim cNodes As XmlNodeListcNodes = xDoc.GetElementsByTagName("marker")

Dim i As Integer = 0

Dim con As Data.SqlClient.SqlConnectioncon = New Data.SqlClient.SqlConnection(ConfigurationManager.ConnectionStrings _

("baseBSWstr").ConnectionString) For Each XNode As XmlNode In cNodes

Dim ac As XmlAttributeCollection = XNode.Attributes

Dim lat As Double = CDbl(ac.Item(0).InnerText.ToString)

Dim lng As Double = CDbl(ac.Item(1).InnerText.ToString) Dim name As String = CStr(ac.Item(2).InnerText.ToString)con.Open()

Dim cmd As New SqlCommand("INSERT INTO geograph_dataVALUES (" & i & "," & "geometry::STGeomFromText('POINT(" & lng & " " & lat & ")', 0), '" & name & "')", _   con)'cmd.CommandText = "INSERT INTO geograph _

dataVALUES (" & i & "," & "geometry::STGeomFromText('POINT(" & lat & " " & lng & ")', 0), '" & name &  "')"cmd.ExecuteNonQuery()con.Close()i += 1

Next
This was just run on my hard drive with a local instance, so I wasn't too concerned about security. From what I understand using parameters is meant to be more secure. Here is an example.

Now if you want to play around with sql server 2008 with just management studio here are some tutorials.
Boston GIS
JasonFollas - this describes the difference between geometry and geography...pretty good series.
Developer Fusion
MSDN