Managing a large number of entities with inter-dependencies is usually easier when you have some kind of visual representation.

I was faced with this challenge recently with QlikView scripts. A client has in place a particularly large QlikView deployment, with hundreds of QVDs, QVWs, and various data sources.

I needed some kind of visual representation of which QVWs were reading which data sources, reading and writing to which QVDs and so on, to help me maintain a logical structure. For the sake of speed, I fell back on my core coding skills: C#.

After creating a new console app in Visual Studio, the first thing was to get hold of some files:

public interface IFileScanner { FileInfo[] GetFiles(); }

I used an app.config file to store the source directory of files to analyse, and retrieved all .qvw files:

public class QlikViewFileScanner : IFileScanner
{
	public FileInfo[] GetFiles()
	{
		FileInfo[] files = null;

		if (ConfigurationSettings.AppSettings == null
			|| ConfigurationSettings.AppSettings["directory"] == null)
		{
			Console.WriteLine("Directory setting in App.config not found.");
		}
		else
		{
			files = new DirectoryInfo(ConfigurationSettings.AppSettings["directory"]).GetFiles("*.qvw");

			if (files.Length == 0)
			{
				Console.WriteLine("No QVW files found in " + ConfigurationSettings.AppSettings["directory"]);
			}
		}
		return files;
	}
}

Then extract the loader scripts from the QVW files:

public interface IScriptExtractor { IList<Script> getScriptsFromFiles(FileInfo[] files); }

After adding QlikView.exe to project references, we can then get hold of the embedded Interop.QlikView.dll, which provides a .NET wrapper around QlikView:

public class QlikViewScriptExtractor : IScriptExtractor
{
	public IList<Script> getScriptsFromFiles(FileInfo[] files)
	{
		var scripts = new List<Script>();
		Type t = Type.GetTypeFromProgID("QlikTech.QlikView");
		QlikView.Application app = (QlikView.Application)Activator.CreateInstance(t);

		foreach (FileInfo file in files)
		{
			app.OpenDocEx(file.FullName, 1, false, null, null, null, true);
			scripts.Add(new Script(file.Name, app.ActiveDocument().GetScript()));
		}
		app.Quit(1);
		return scripts;
	}
}

I then want to analyse the loader scripts, and extract a list of nodes and vertices, representing dependent QVWs, QVDs and data sources.

Note: Sorry if I randomly interchange the word ‘node’ with ‘vertex’, and ‘edge’ with ‘arc’ throughout – they are synonyms, and I’ve never decided which I prefer!

For example, nodes could represent:

  • A QVW
  • A QVD
  • A database table
  • An Excel spreadsheet
  • A web service

Arcs represent the dependency between them. For example, if a QVW loads from a QVD, there is a directed arc from the QVW to the QVD.

So what I want is:

public interface INodeArcScanner { IList<Node> GetNodesFrom(IList<Script> strings); }

I wanted a node class to represent one relationship: itself, containing an arc to another node:

public class Node
{
	public string NodeName { get; set; }
	public Arc Arc { get; set; }

	public Node(string nodeName, Arc arc)
	{
		NodeName = nodeName;
		Arc = arc;
	}

	public string FromNode
	{
		get
		{
			return Arc.ArcDirection == ArcDirection.From ?
				NodeName :
				Arc.ToNode;
		}
	}

	public string ToNode
	{
		get
		{
			return Arc.ArcDirection == ArcDirection.From ?
				Arc.ToNode :
				NodeName;
		}
	}
}

I decided to use regular expressions to extract dependencies from loader scripts. So I created the following class for doing different types of RegEx matching to detect a dependency (node and arc):

public class QlikViewNodeArcMatcher : INodeArcMatcher
{
	private readonly string _regEx;
	private readonly string _matcherName;
	private readonly ArcDirection _arcDirection;
	private readonly ArcType _arcType;

	public QlikViewNodeArcMatcher(string regEx, string matcherName, ArcDirection arcDirection,
		ArcType arcType)
	{
		_regEx = regEx;
		_matcherName = matcherName;
		_arcDirection = arcDirection;
		_arcType = arcType;
	}

	public string RegEx	{ get { return _regEx; }}

	public string MatcherName { get { return _matcherName; }}

	public ArcDirection ArcDirection { get { return _arcDirection; }}

	public ArcType ArcType { get { return _arcType; }}
}

The difficult bit was crafting the regular expressions to match various dependencies – I’m a real amateur when it comes to RegEx. I came up with the following, which I’m sure will be offensive to RegEx pros:

Note: These RegExes are seat-of-the-pants and still under development – this is still just a proof of concept stage!

matchers.Add(new QlikViewNodeArcMatcher(
	@"FROM([ a-zA-Z_0-9\$\(\)\n\r\.\[\]\\]*?)\.qvd",
	"LOAD from QVDs",
	ArcDirection.From, ArcType.LoadFromQvd));

matchers.Add(new QlikViewNodeArcMatcher(
	@"FROM([ a-zA-Z_0-9\$\(\)\n\r\.\[\]\\]*?)\.xls|\.xlsx",
   "LOAD from Excel",
	ArcDirection.From, ArcType.LoadFromExcel));

matchers.Add(new QlikViewNodeArcMatcher(
	@"BINARY(.*?)\.qvw",
	"LOAD from QVWs (Binary)",
	ArcDirection.From, ArcType.LoadFromBinary));

matchers.Add(new QlikViewNodeArcMatcher(
	@"STORE([ a-zA-Z_0-9\n\r\*]*?)INTO(.*?)\.qvd",
	"STORE into QVD",
	ArcDirection.To, ArcType.StoreIntoQvd));

matchers.Add(new QlikViewNodeArcMatcher(
	@"SQL SELECT(.*?)FROM([ \n\r]?)([a-zA-Z_0-9\.\$\n\r;]+?)([ \n\r]+?)",
	"SQL SELECT from table",
	ArcDirection.From, ArcType.LoadFromSql));

Extracting nodes and arcs from scripts is then just a matter of:

private static IList<Node> RegexNodeMatch(INodeArcMatcher matcher, Script script)
{
	IList<Node> nodes = new List<Node>();

	Regex regex = new Regex(matcher.RegEx, RegexOptions.Singleline | RegexOptions.IgnoreCase);
	var matchCollection = regex.Matches(script.Content);

	Console.WriteLine("--------" + matcher.MatcherName + "--------");
	foreach (Match match in matchCollection)
	{
		Console.WriteLine(match);
		nodes.Add(new Node(script.SourceName, new Arc(match.ToString(), matcher.ArcDirection, matcher.ArcType)));
	}
	Console.WriteLine();
	return nodes;
}

We now have a list of nodes (QVWs and QVDs) with arcs. Great, but not much use until we can see ’em! So, lets map them:

public interface INodeArcMapper { void mapNodes(IList<Node> nodes);}

I used QuickGraph to create a ‘dot’ file, which can be consumed by the very handy GraphViz. So, after adding QuickGraph.dll and QuickGraph.Graphviz.dll:

public class QuickGraphNodeArcMapper : INodeArcMapper
{
	public void mapNodes(IList<Node> nodes)
	{
		var graph = new AdjacencyGraph<String, Edge<String>>(true);

		foreach (var node in nodes)
		{
			if (!graph.ContainsVertex(node.FromNode))
				graph.AddVertex(node.FromNode);
			if (!graph.ContainsVertex(node.ToNode))
				graph.AddVertex(node.ToNode);
		}

		foreach (var node in nodes)
		{
			if (!graph.ContainsEdge(new Edge<string>(node.FromNode, node.ToNode)))
				graph.AddEdge(new Edge<string>(node.FromNode, node.ToNode));
		}

		var graphviz = new GraphvizAlgorithm<String, Edge<String>>(graph);
		graphviz.FormatVertex += new FormatVertexEventHandler<String>( FormatVertex );

		string output = graphviz.Generate(new FileDotEngine(), "graph");
	}

	private void FormatVertex(object sender, FormatVertexEventArgs<String> e)
	{
		e.VertexFormatter.Label = e.Vertex;
	}
}

Eh voila! A .dot file is produced, which when opened in GraphViz, allows you to render to .gif, .svg or whatever you like, you dependency graph in visual form.

SQL Server 2008 now comes with spatial support, even in the express version. It supports a ‘geometry’ column type (unbounded coordinate axes) and a ‘geography’ column type (spherical geodetic coordinates corresponding to the shape of the earth). Even better, it follows the OGC OpenGIS standards, so is compatible with all other OpenGIS technologies. SQL Server therefore accepts data in the form of Well-Known-Text or Well-Known-Binary forms:

CREATE TABLE #testgeography
(
    G     GEOGRAPHY
)

INSERT INTO #testgeography (G)
VALUES    ('LINESTRING(5 5, 10 10, 21 2)')

NHibernate also supports spatial data types thanks to the NHibernate.Spatial community project on NHForge. Armed with a copy of SQL 2008 Express and NHibernate.Spatial, we can build a .NET spatial information system.

Creating a Session Manager

I like to have a class responsible for wrapping NHibernate’s Session Factory, providing ways to set the configuration, and get an open session. A simple example:

public class SessionManager : ISessionManager
{
    private FluentConfiguration _configuration { get; set; }
    private ISessionFactory _sessionFactory { get; set; }
    private ISession _session { get; set; }
    public string ConnectionString { get; set; }
    public FluentConfiguration Configuration
    {
        get { return _configuration; }
        set { _configuration = value; }
    }
    public ISessionFactory SessionFactory
    {
        get
        {
            if (_sessionFactory == null)
                _sessionFactory = Configuration.BuildSessionFactory();
            return _sessionFactory;
        }
    }
    public ISession GetSession()
    {
        if(_session == null || !_session.IsOpen)
        {
            return SessionFactory.OpenSession();
        }
        return _session;
    }
    public void Close()
    {
        if (_session != null)
            _session.Close();
        if(_sessionFactory != null)
            _sessionFactory.Close();
    }
}

Configuring NHibernate

I then have a class responsible for creating the NHibernate context and giving it to the session manager. Notice that the auxilliary database object provided by NHibernate.Spatial is added to the configuration – this provides information to NHibernate on how to interpret spatial data types.

public class InitialiseNhibernate : IInitialiseNHibernate
    {
        private readonly ISessionManager _sessionManager;

        public InitialiseNhibernate(ISessionManager sessionManager)
        {
            _sessionManager = sessionManager;
        }

        public string ConnectionString { get; set; }

        public void Initialise(Configuration configuration)
        {
            configuration.AddAuxiliaryDatabaseObject(new SpatialAuxiliaryDatabaseObject(configuration));

            var cfg = Fluently.Configure(configuration)
                .Database(MsSqlConfiguration.MsSql2005
                    .ConnectionString(conn => conn.Is(ConnectionString))
                    .ProxyFactoryFactory("NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle")
                    .ShowSql()
                    .Dialect("NHibernate.Spatial.Dialect.MsSql2008GeographyDialect,NHibernate.Spatial.MsSql2008"))
                    .Mappings(m =>
                                  {
                                      m.FluentMappings.AddFromAssemblyOf<UserMapping>();
                                      m.HbmMappings.AddFromAssemblyOf<UserMapping>();
                                  });

            _sessionManager.Configuration = cfg;
            _sessionManager.ConnectionString = ConnectionString;
        }
    }

A test framework for NHibernate Spatial (using MBUnit)

To test the session provider and NHibernate configuration, I have a test base (used in all tests) to setup Castle Windsor IoC container:


public class TestBase
{
	[SetUp]
	public virtual void SetupContext()
	{
		IWindsorContainer container = new RhinoContainer("Windsor.boo");
		IoC.Initialize(container);
	}

	[TearDown]
	public virtual void TearDownContext()
	{
	}
}

A database test base on top of that for database integration tests (but which is not database vendor specific), which instantiates a session manager and provides a way to generate the database schema:

    public class BaseDatabaseFixture : TestBase
    {
        protected ISessionManager SessionManager { get; set; }

        public override void SetupContext()
        {
            base.SetupContext();
            SessionManager = IoC.Resolve(typeof (ISessionManager)) as ISessionManager;
        }

        protected void GenerateDatabase(string filePath)
        {
            StreamWriter writer = File.CreateText(filePath);
            GenerateSchemaFromMappingFiles(writer);
        }

        private void GenerateSchemaFromMappingFiles(TextWriter writer)
        {
            var sqlConnection = new SqlConnection(SessionManager.ConnectionString);
            
                sqlConnection.Open();
                SessionManager.Configuration.ExposeConfiguration(c =>
                 {
                     var exporter = new SchemaExport(c);
                     exporter.Execute(true, true, false, sqlConnection, writer);
                 });
        }

        public override void TearDownContext()
        {
            base.TearDownContext();
            SessionManager.Close();
        }
    }

And a SQL-Server-specific test base on top of that, to complete the test context setup for SQL Server.

    public abstract class BaseMsSqlDatabaseFixture : BaseDatabaseFixture
    {
        private const string OutputFile = @"..\..\database-generation\output\MsSqlDatabaseSchema.sql";

        public override void SetupContext()
        {
            base.SetupContext();
            var initialiseNHibernate = IoC.Resolve(typeof(IInitialiseNhibernate), 
                "initialiseNhibernateForIntegrationTesting") as IInitialiseNhibernate;
            if (initialiseNHibernate == null)
                throw new ApplicationException("Failed to initialise NHibernate. IInitialiseNhibernate not resolved.");
            
            var configuration = new Configuration();
            initialiseNHibernate.Initialise(configuration);

            GenerateDatabase(OutputFile);
            DataGenerator.GenerateAndSaveTestData(SessionManager.GetSession());
        }
    }

Notice that when using Castle Windsor to resolve IInitialiseNhibernate, I am using the overloaded resolve method which accepts a service name. This allows me to specify two implementations of IInitialiseNhibernate which take different connection string arguments, one for testing and one for production. I choose to use Binsor to configure the container, so my windsor.boo looks something like this:

# Connection Strings - note we have to escape slashes
connectionString as string = "Server=localhost\\SQLEXPRESS;Initial Catalog=iClimb;Integrated Security=True"
connectionStringForIntegrationTests as string = "Server=localhost\\SQLEXPRESS;Initial Catalog=iClimbTesting;Integrated Security=True"

component "initialiseNhibernate", IInitialiseNhibernate, InitialiseNhibernate:
	ConnectionString=connectionString
component "initialiseNhibernateForIntegrationTesting", IInitialiseNhibernate, InitialiseNhibernate:
	ConnectionString=connectionStringForIntegrationTests

Create an NHibernate entity with a geography data type

Say we have an Area class, which represents a specific location on the earth which a rock climber might visit. It may look something like:

    public class Area : BaseEntity<Area>
    {
        public virtual Point Centre { get; set; }
        public virtual string Name { get; set; }
    }

Point is a data structure conforming to the OpenGIS specification for a point. I use the NetTopologySuite library. To map this class to a database using NHibernate, the following fluent mapping would be valid:

    public class AreaMapping : ClassMap<Area>
    {
        public AreaMapping()
        {
            Id(c => c.Id).GeneratedBy.Identity();
            Map(c => c.Name).Not.Nullable();

            Map(x => x.Centre)
                .CustomTypeIs<MsSql2008GeographyType>()
                .Not.Nullable()
                .CustomSqlTypeIs("GEOGRAPHY");
        }
    }

Test saving a geography type

We should be ready to save spatial data. Create a test which simply saves a climbing area and retrieves it.

    [TestFixture]
    [FixtureCategory("Integration")]
    [TestsOn(typeof(Area))]
    public class AreaMappingTests : BaseMsSqlDatabaseFixture
    {
        [Test]
        public void Can_save_an_area()
        {
            const double savedX = 10;
            const double savedY = 20;
            Area savedArea = EntityHelper.GetArea(new Point(savedX, savedY), "Amazing climbing here");
            
            ISession session = SessionManager.GetSession();
            var savedId = (int)session.Save(savedArea);
            session.Flush();
            session.Close();

            session = SessionManager.GetSession();
            var retrievedArea = session.Get<Area>(savedId);

            Assert.AreEqual(savedId, retrievedArea.Id);
            Assert.AreEqual(savedArea.Name, retrievedArea.Name);
            Assert.AreEqual(new Point(savedX, savedY), retrievedArea.Centre);
        }
    }

And hey presto! You should be able to see the area being saved:

NHibernate:  create table [Area] (
        Id INT IDENTITY NOT NULL,
       Name NVARCHAR(255) not null,
       Centre GEOGRAPHY not null,
    )

NHibernate: INSERT INTO [Area] (Name, Centre) VALUES (@p0, @p1); select SCOPE_IDENTITY();@p0 = 'Amazing climbing here', @p1 = POINT (10 20)

This is really cool. We can now leverage the power of SQL Server’s spatial processing (efficient distance calculations, spatial relationship analysis, spatial indexing etc) but largely shield this underneath the abstraction that NHibernate provides. The only painful bit now is how to allow the domain entities and services to actually use these functions (e.g. STIntersects): currently I am using stored procedures.

In my first post in this series, I presented R*Trees as a structure for efficient spatial analysis of geographical coordinates. In this post, I’m going to outline an implementation of a geographical system using SQLite’s spatial extension SpatiaLite, which includes SQLite’s R*Tree module.

SpatiaLite

I found SpatiaLite a great way to start learning about storing and analysing geographic data – it’s easy to get up and running with it’s GUI tools, visualise shapes and create spatial databases. Download either spatialite.exe (command line) or spatialite-gui.exe (GUI), and the sample database of Italian towns and highways – all of which you can download from the spatialite website.

Fire up the GUI, and load the sample database. If you show columns on the tables, you will notice they have a Geometry column. All mainstream spatial databases follow standards laid out by the OpenGIS, which includes a number of standard geometric shapes. You will see for example that towns are of type POINT, regions are of type POLYGON and highways are LINESTRING. There is an excellent tutorial for further reading.

As explained in my earlier post, we can’t apply a normal sorting optimisation to geographical data (i.e. creating a standard SQL index) – instead we need to use an R*Tree. To implement this in SpatiaLite on our sample database, simply execute to following on your column with geographical data (in SpatiaLite, this must be done on a column of type GEOMETRY):

SELECT CreateSpatialIndex('Towns', 'geometry');

This creates an R*Tree spatial index on Towns using the Geometry column. Spatial analyses on this column (see examples of what you can do in this SQL reference page) will then take advantage of the clever algorithms behind R*Tree to make spatial comparisons more efficient.

.NET Implementation in C#

At the time of writing, there were a few manual steps I had to do to get a working example going. The first was to execute the init_spatialite-2.3.sql file which comes with SpatiaLite to create a database with the necessary system tables and SRID data for supporting spatial processing.

spatialite.exe MySpatiaLiteDatabase.sqlite < init_spatialite-2.3.sql

We then need to initialise the libspatialite extension (included in the SpatiaLite download) by calling SELECT load_extension(‘libspatialite-2.dll’). I found that I had to have the dll on my windows class PATH. In C# we can do this for example like this:

var connection = new SQLiteConnection(_fileConnectionString);
connection.Open();

//load spatialite-extension
new SQLiteCommand("SELECT load_extension('libspatialite-2.dll');", connection)
.ExecuteScalar();

We can then continue to create tables with spatial indices and load in our data using the appropriate SQL.
This implementation is okay for playing around, but when you’re ready for a real application, I would advise checking out the NHibernate support for spatial databases (NHibernate.Spatial) provided by the NHContrib project. My next post will take a deeper look into building a spatial application in .NET suitable for a real application.