Grasp, A .NET Analysis Engine – Part 2: Variables

February 23rd, 2012

In part 1, we identified a family of systems at whose core is a data set and its analysis. We also set out two goals for the Grasp engine: represent a structured collection of data points and the rules which analyze it. In this post, we will explore how we can represent any set of domain-specific data.

Data Points

To describe a data set, we first need to define a unit of data. The variable is a well-known concept we can leverage here: it represents a value, consisting of a name and type. In Grasp, a type is a CLR type, so like a variable in a program, a Grasp variable can represent any manner of data.

A variable is a design-time construct; it communicates that, at some point, there will be a concrete value associated with it. Just as in writing a program, this separates the rules of a system from the runtime which carries them out.

Schema

We can describe the set of all variables known to a system as its schema. This is similar in concept to a database schema, which also describes an organization of data. It is the "shape" of the data set.

A Grasp schema is similar to a database schema in another important respect: its data is always available. This is different from a program, where a variable’s scope, and thus its availability, is determined by the region of source code in which its name is visible. A schema is effectively a single scope in which all variables reside.

This poses an interesting challenge: how can we effectively partition variables if they all live in the same bucket? We can’t have two variables named, say, TotalIncome, that mean different things in different contexts. Any decent-sized data set would have conflicts pretty quickly. Relational databases solve this issue using tables: a table qualifies a piece of data, making it uniquely identifiable within the schema.

Variables, though, are more fluid than the strict structure of tables; they are more akin to organizing types within an assembly. This implies we can borrow another well-known concept: the namespace. Its hierarchical nature allows us to fully qualify any variable in a data set, allowing us to get as fine-grained as necessary in describing data.

For example, let’s say we are accrediting the Acme School of Anvil Design. We may ask the total income of the school as well as the total income of its bookstore. We can represent both of these values by qualifying them with meaningful namespaces:

Acme.TotalIncome

Acme.Bookstore.TotalIncome

This is easier to understand, and will evolve better, than if we chose arbitrary names to differentiate them, such as TotalSchoolIncome/TotalBookstoreIncome or SchoolTotalIncome/BookstoreTotalIncome. It is more obvious that the variables represent similar values, and leaves room for other values to be organized at the school or bookstore level. Perhaps the bookstore also has a coffee shop; we can further organize the data along these lines:

Acme.Bookstore.CoffeeShop.TotalIncome

This approach organizes data along the contours of the problem domain, facilitating discoverability and learnability.

Let’s See Some Code

A variable is straightforward to represent. For starters, we create properties for the namespace, name, and type. These values do not change for the lifetime of an instance, so we give them private setters, which makes them read-only to callers:

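using System;

public class Variable
{
  // The dot-separated namespace which qualifies the name, e.g. Acme.Bookstore
  public string Namespace { get; private set; }

  // The simple name, e.g. TotalIncome
  public string Name { get; private set; }

  // The CLR type of the value this variable will hold
  public Type Type { get; private set; }

  // Returns the fully-qualified name, e.g. Acme.Bookstore.TotalIncome
  public override string ToString()
  {
    return Namespace + "." + Name;
  }
}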

We also override ToString so it returns the fully-qualified name.

Next, we need to initialize these properties. The key here is to ensure the namespace and name are formatted correctly. For Grasp, this means following the .NET Framework’s definition of a namespace, which is a series of identifiers separated by the "." character. An identifier is a token composed of letters, digits, and/or the "_" character, and does not start with a digit.

We can encode these formatting rules as a set of static methods on the Variable class:

// Requires: using System.Diagnostics.Contracts;
//           using System.Text.RegularExpressions;

public static bool IsNamespace(string value)
{
  Contract.Requires(value != null);

  // One identifier, followed by zero or more ".identifier" segments
  return Regex.IsMatch(value, @"^([_A-Za-z]\w*)(\.[_A-Za-z]\w*)*$");
}

public static bool IsName(string value)
{
  Contract.Requires(value != null);

  // A single identifier
  return Regex.IsMatch(value, @"^[_A-Za-z]\w*$");
}

Phew! Those are some imposing regular expressions at first glance. They are actually pretty straightforward, though, as regular expressions go. Here is a breakdown:

Namespace

^              Start of string
(              Start a group to match the first namespace identifier
  [_A-Za-z]    Match exactly one underscore or letter to start (no digits)
  \w*          Match zero or more "word" characters (letters, digits, or underscores)
)              Match exactly one identifier to start the namespace
(              Start a group to match the subsequent identifiers
  \.           Match a single separating dot
  [_A-Za-z]    Match exactly one underscore or letter to start (no digits)
  \w*          Match zero or more "word" characters (letters, digits, or underscores)
)*             Match zero or more subsequent identifiers
$              End of string

Name

^              Start of string
[_A-Za-z]      Match exactly one underscore or letter to start (no digits)
\w*            Match zero or more "word" characters (letters, digits, or underscores)
$              End of string

Together, these checks ensure that all namespaces and names for variables follow the well-known pattern for .NET namespaces. This enables a text-based calculation editor, where we would reference variable names in a parseable manner. But we’ll get to that later.
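To make the rules concrete, here is a minimal sketch that runs a handful of sample strings through the two checks. The NameFormatSketch class and its Demo method are hypothetical names used only for illustration, and the patterns are written to accept the same strings as the ones above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration; not part of Grasp.
public static class NameFormatSketch
{
  // One identifier, followed by zero or more ".identifier" segments
  public static bool IsNamespace(string value)
  {
    return Regex.IsMatch(value, @"^[_A-Za-z]\w*(\.[_A-Za-z]\w*)*$");
  }

  // A single identifier
  public static bool IsName(string value)
  {
    return Regex.IsMatch(value, @"^[_A-Za-z]\w*$");
  }

  public static void Demo()
  {
    Console.WriteLine(IsNamespace("Acme"));                      // True
    Console.WriteLine(IsNamespace("Acme.Bookstore.CoffeeShop")); // True
    Console.WriteLine(IsNamespace("Acme..Bookstore"));           // False: empty identifier
    Console.WriteLine(IsNamespace("Acme."));                     // False: trailing dot
    Console.WriteLine(IsNamespace("2012.Acme"));                 // False: starts with a digit

    Console.WriteLine(IsName("TotalIncome"));                    // True
    Console.WriteLine(IsName("Total_Income2"));                  // True
    Console.WriteLine(IsName("Total Income"));                   // False: contains a space
    Console.WriteLine(IsName("Acme.TotalIncome"));               // False: a name has no dots
  }
}
```

One subtlety worth knowing: in .NET, "$" also matches just before a trailing newline, so a value such as "Acme\n" passes these checks. If that matters for your data, swapping "$" for "\z" anchors the match to the true end of the string.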

Now that the Variable class has the ability to validate the format of its values, we can create a constructor that initializes the Namespace, Name, and Type properties:

public Variable(string @namespace, string name, Type type)
{
  Contract.Requires(IsNamespace(@namespace));
  Contract.Requires(IsName(name));
  Contract.Requires(type != null);

  Namespace = @namespace;
  Name = name;
  Type = type;
}

In the constructor, we ensure that the namespace and name values have the correct format, and that the type is not null. (If you don’t recognize the syntax, Contract.Requires is part of .NET Code Contracts. I use it throughout Grasp for argument checking.)
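One caveat, as far as I understand the tooling: Contract.Requires is only enforced at run time when the Code Contracts rewriter processes the compiled assembly; without it, the calls are compiled away. If you are following along without Code Contracts installed, the same checks can be sketched with plain guard clauses. The GuardedVariable class below is a hypothetical stand-in for illustration, not how Grasp does it, and the decimal type in the usage note is likewise an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variant of Variable that uses guard clauses instead of contracts.
public class GuardedVariable
{
  public GuardedVariable(string @namespace, string name, Type type)
  {
    // Same formatting rules as IsNamespace and IsName
    if (@namespace == null || !Regex.IsMatch(@namespace, @"^[_A-Za-z]\w*(\.[_A-Za-z]\w*)*$"))
    {
      throw new ArgumentException("Value is not a valid namespace", "namespace");
    }

    if (name == null || !Regex.IsMatch(name, @"^[_A-Za-z]\w*$"))
    {
      throw new ArgumentException("Value is not a valid name", "name");
    }

    if (type == null)
    {
      throw new ArgumentNullException("type");
    }

    Namespace = @namespace;
    Name = name;
    Type = type;
  }

  public string Namespace { get; private set; }

  public string Name { get; private set; }

  public Type Type { get; private set; }

  // Returns the fully-qualified name
  public override string ToString()
  {
    return Namespace + "." + Name;
  }
}
```

With this in place, new GuardedVariable("Acme.Bookstore", "TotalIncome", typeof(decimal)).ToString() returns "Acme.Bookstore.TotalIncome", while passing "Acme." as the namespace throws an ArgumentException.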

I used the "@" prefix for the namespace parameter because "namespace" is the best name but also happens to be a C# keyword. In these cases, we also have the option to compromise the name somehow, e.g. "ns", "nmespace", or "theNamespace". However, each of these is an end run around the issue and does not tell the reader why that identifier was chosen. Rather than have the next developer try to change it to "namespace", realize it won’t work, and go through the same decision process, I chose to make the decision explicit. This happens frequently with "@event" as well. Your mileage may vary.
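It may help to see that the "@" is only an escape for the compiler and not part of the parameter's name. The sketch below uses a hypothetical VerbatimIdentifierSketch class (not part of Grasp) and reflection to show the name that callers and tools actually see.

```csharp
using System;
using System.Reflection;

// Hypothetical illustration of the "@" prefix on a keyword-named parameter.
public static class VerbatimIdentifierSketch
{
  // "@namespace" declares a parameter whose name is simply "namespace"
  public static string Qualify(string @namespace, string name)
  {
    return @namespace + "." + name;
  }

  // Reads the parameter name back through reflection
  public static string FirstParameterName()
  {
    MethodInfo method = typeof(VerbatimIdentifierSketch).GetMethod("Qualify");

    return method.GetParameters()[0].Name;
  }
}
```

Here FirstParameterName() returns "namespace", with no "@". Callers who use named arguments still need the escape at the call site, as in Qualify(@namespace: "Acme", name: "TotalIncome"), which returns "Acme.TotalIncome".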

Summary

We addressed the first goal of Grasp: represent the data of any data set. We were able to do this by combining the concepts of namespaces and variables to uniquely identify any piece of data. We also created a class to represent a namespace-qualified variable and ensured the namespace and name have the proper format.

Next time, we will tackle the other goal: rules which analyze the data.

Continue to Part 3: Calculations
