In a previous article I wrote, I introduced a way to compare expressions on the basis of their makeup as opposed to simple reference equality checks. This functionality was provided in the form of an IEqualityComparer<T> implementation, which itself was supported by several components.

Like all equality comparers, our expression equality comparer must be able to perform two functions:

  1. Determine the equality between two given expression instances
  2. Generate a hash code for a given expression instance

The previous article covered the first of these jobs; in this article, we’ll be covering the hash code generation aspect of the comparer.

Revisiting Our Equality Comparer

Just to make sure we’re all on board in regards to how and where our hash code generator will be used, let’s take a second to revisit the equality comparer presented in the previous article.

ExpressionEqualityComparer.cs
/// <summary>
/// Provides methods to compare <see cref="Expression"/> objects for equality.
/// </summary>
public sealed class ExpressionEqualityComparer : IEqualityComparer<Expression>
{
    private static readonly ExpressionEqualityComparer _Instance = new ExpressionEqualityComparer();

    /// <summary>
    /// Gets the default <see cref="ExpressionEqualityComparer"/> instance.
    /// </summary>
    public static ExpressionEqualityComparer Instance
    {
        get { return _Instance; }
    }

    /// <inheritdoc/>
    public bool Equals(Expression x, Expression y)
    {
        return new ExpressionComparison(x, y).ExpressionsAreEqual;
    }

    /// <inheritdoc/>
    public int GetHashCode(Expression obj)
    {
        return new ExpressionHashCodeCalculator(obj).Output;
    }
}

As we can see above, the brunt of the equality comparison work is performed by the ExpressionComparison class, which is a type that we covered in the first article in this series.

If you look at the code for the ExpressionComparison type, you’ll see that it is a derivative of the .NET provided ExpressionVisitor class. The reason why ExpressionComparison was subclassed from ExpressionVisitor is because hooking into that infrastructure is logically congruent with the structure of the expressions it would be comparing.

In order to take into account all the various types of expressions and their properties, we needed to override the majority of the virtual (Visit[x]) methods exposed by ExpressionVisitor. We did not have to override all of them, only the ones targeting expression types whose makeup was unique among all other types.

Just like we did with ExpressionComparison, our ExpressionHashCodeCalculator will also subclass ExpressionVisitor, and it will behave in much the same way, except it will be calculate a running total of the hash codes of all the various properties significant to the given type of expression.

Expression Hash Code Calculator

Now, let’s get into the meat of the matter. As usual, I need to state a disclaimer regarding the usage of the code before we go over it. Although I’ve personally tested all part of the code, I would encourage you to do so yourself before just plopping into what you’re doing.

This code has received a fair amount of scrutiny and is used in some important parts of a larger project I’ve been authoring. In fact, the hash code generation aspect of my comparer rests at the heart of the reasons why I started out on this endeavor (it was my wish to be able to use expressions as keys in a dictionary).

We’ll first take a look at the code for the ExpressionHashCodeCalculator type, and then discuss its merits afterwards.

ExpressionHashCodeCalculator.cs

Update: Thanks to Denis in the comments for pointing out that there was a lack of support for constant collections; code has now been updated to support constant collections.

/// <summary>
/// Provides a visitor that calculates a hash code for an entire expression tree.
/// This class cannot be inherited.
/// </summary>
public sealed class ExpressionHashCodeCalculator : ExpressionVisitor
{
    /// <summary>
    /// Initializes a new instance of the <see cref="ExpressionHashCodeCalculator"/> class.
    /// </summary>
    /// <param name="expression">The expression tree to walk when calculating the has code.</param>
    public ExpressionHashCodeCalculator(Expression expression)
    {
        Visit(expression);
    }

    /// <summary>
    /// Gets the calculated hash code for the expression tree.
    /// </summary>
    public int Output
    { get; private set; }

    /// <summary>
    /// Calculates the hash code for the common <see cref="Expression"/> properties offered by the provided
    /// node before dispatching it to more specialized visit methods for further calculations.
    /// </summary>
    /// <inheritdoc/>
    public override Expression Visit(Expression node)
    {
        if (null == node)
            return null;

        Output += node.GetHashCode(node.NodeType, node.Type);

        return base.Visit(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitBinary(BinaryExpression node)
    {
        Output += node.GetHashCode(node.Method, node.IsLifted, node.IsLiftedToNull);

        return base.VisitBinary(node);
    }

    /// <inheritdoc/>
    protected override CatchBlock VisitCatchBlock(CatchBlock node)
    {
        Output += node.GetHashCode(node.Test);

        return base.VisitCatchBlock(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitConstant(ConstantExpression node)
    {
        IEnumerable nodeSequence = node.Value as IEnumerable;

        if (null == nodeSequence)
            Output += node.GetHashCode(node.Value);
        else
        {
            foreach (object item in nodeSequence)
            {
                Output += node.GetHashCode(item);
            }
        }

        return base.VisitConstant(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitDebugInfo(DebugInfoExpression node)
    {
        Output += node.GetHashCode(node.Document,
                                        node.EndColumn,
                                        node.EndLine,
                                        node.IsClear,
                                        node.StartColumn,
                                        node.StartLine);

        return base.VisitDebugInfo(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitDynamic(DynamicExpression node)
    {
        Output += node.GetHashCode(node.Binder, node.DelegateType);

        return base.VisitDynamic(node);
    }

    /// <inheritdoc/>
    protected override ElementInit VisitElementInit(ElementInit node)
    {
        Output += node.GetHashCode(node.AddMethod);

        return base.VisitElementInit(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitGoto(GotoExpression node)
    {
        Output += node.GetHashCode(node.Kind);

        return base.VisitGoto(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitIndex(IndexExpression node)
    {
        Output += node.GetHashCode(node.Indexer);

        return base.VisitIndex(node);
    }

    /// <inheritdoc/>
    protected override LabelTarget VisitLabelTarget(LabelTarget node)
    {
        Output += node.GetHashCode(node.Name, node.Type);

        return base.VisitLabelTarget(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitLambda<T>(Expression<T> node)
    {
        Output += node.GetHashCode(node.Name, node.ReturnType, node.TailCall);

        return base.VisitLambda(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitMember(MemberExpression node)
    {
        Output += node.GetHashCode(node.Member);

        return base.VisitMember(node);
    }

    /// <inheritdoc/>
    protected override MemberBinding VisitMemberBinding(MemberBinding node)
    {
        Output += node.GetHashCode(node.BindingType, node.Member);

        return base.VisitMemberBinding(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Output += node.GetHashCode(node.Method);

        return base.VisitMethodCall(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitNew(NewExpression node)
    {
        Output += node.GetHashCode(node.Constructor);

        return base.VisitNew(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitParameter(ParameterExpression node)
    {
        Output += node.GetHashCode(node.IsByRef);

        return base.VisitParameter(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitSwitch(SwitchExpression node)
    {
        Output += node.GetHashCode(node.Comparison);

        return base.VisitSwitch(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitTypeBinary(TypeBinaryExpression node)
    {
        Output += node.GetHashCode(node.TypeOperand);

        return base.VisitTypeBinary(node);
    }

    /// <inheritdoc/>
    protected override Expression VisitUnary(UnaryExpression node)
    {
        Output += node.GetHashCode(node.IsLifted, node.IsLiftedToNull, node.Method);

        return base.VisitUnary(node);
    }
}

As you can see from the above code, the general method for calculating the hash code of a given expression is by visiting all its manifestations and parts and then generating hash codes from the properties significant to those parts.

The GetHashCode methods being invoked in the code are not native to the types they are being invoked on. Rather, they are extension methods, and are something I talked about in another article I wrote.

Each part of the calculate tacks on the total value of the various parts to an Output property, which holds the running total for our hash code. Calculation will end upon our Visit override being executed with a null node being passed.

There are many virtual Visit[x] methods offered by the ExpressionVisitor type. As I stated in the first article of this series, there were in general a large number of new types of expressions added to the .NET framework with 4.0.

When creating our ExpressionComparison class, we overrode many of these methods, but only the ones that held some bearing on the actual shape of the expression as far as an equality check was concerned. The same applies to our hash code calculator; it visits many of the same parts as was visited by the equality comparer, with some differences, namely, some overrides found in ExpressionComparison are not found in ExpressionHashCodeCalculator, and vice versa.

The reasons for not overriding a particular virtual method typically boil down to a lack of properties offered from which to grab hash codes that wouldn’t be offered in another, more centrally invoked override.

For example, one virtual method not overridden is the VisitBlock, method. This method accepts a single BlockExpression typed parameter. If we look at the BlockExpression type, we’ll notice that it offers no additional properties unique to what’s offered by more base Expression types. Well…at least except for the Result property. But even the presence of this property is not cause enough for us to override the method, the reason being that the Result property itself (which is an Expression) is visited by the base VisitBlock implementation, and therefore would be end up being visited by another block of our code anyways.

There are a few more methods not included, but I’ll leave them as an exercise for the reader.

If anyone finds any types of expressions that the above code does not account for, I’d appreciate your input. When constructing this code, however, I tried to be fairly exhaustive in my efforts.

 

This article is meant to serve as a reference for a particular set of functions that may be present in code snippets found in subsequent articles.

Every so often one may find themselves tasked with writing hash code generation algorithms for a particular type of object. This sort of requirement typically arises whenever we’re authoring either a value type or an implementation of an interface which requires such functionality (e.g. IEqualityComparer<T>).

While the way hash codes end up being calculated tends to differ between object types, hash code generation mechanisms can only be considered proper if the following requisites are met:

  1. If two objects are deemed equal, then the hash code generation mechanism should yield an identical value for each object.
  2. Given an instance of an object, its hash code should never change (thus using mutable objects as hash keys and actually mutating them is generally a bad idea).
  3. Although it is acceptable for the same hash code to be generated for objects instances which are not equal, the way in which the hash code is calculated should be such so that these kinds of collisions are as infrequent as possible.
  4. Calculation of the hash code should not be an expensive endeavor.
  5. The generation mechanism should never throw an exception.

Naturally, it makes sense to abstract the steps involved in satisfying such requirements into an independent function which can be used by any kind of object. Unfortunately, for the most part, it simply isn’t possible to guarantee the satisfaction of  points #1 and #2 with common code. We can, however, build something that contributes towards the success of #3.

To do this, we can create a set of functions that calculate a hash code based on the property values provided to them, with each function differing in the number of properties that they accept. A static helper class could serve as a home for these functions, but since I like to avoid static helper classes whenever possible, I thought it prudent to craft them as extension methods instead, such as the one shown below:

GetHashCode<TFirstProperty,TSecondProperty>
/// <summary>
/// Calculates the hash code for an object using two of the object's properties.
/// </summary>
/// <param name="value">The object we're creating the hash code for.</param>
/// <typeparam name="TFirstProperty">The type of the first property the hash is based on.</typeparam>
/// <typeparam name="TSecondProperty">The type of the second property the hash is based on.</typeparam>
/// <param name="firstProperty">The first property of the object to use in calculating the hash.</param>
/// <param name="secondProperty">The second property of the object to use in calculating the hash.</param>
/// <returns>
/// A hash code for <c>value</c> based on the values of the provided properties.
/// </returns>
/// ReSharper disable UnusedParameter.Global
public static int GetHashCode<TFirstProperty,TSecondProperty>([NotNull]this object value,
                                                              TFirstProperty firstProperty,
                                                              TSecondProperty secondProperty)
{
    unchecked
    {
        int hash = 17;

        if (!firstProperty.IsNull())
            hash = hash * 23 + firstProperty.GetHashCode();
        if (!secondProperty.IsNull())
            hash = hash * 23 + secondProperty.GetHashCode();

        return hash;
    }
}

Note: IsNull is another extension method that properly deals with checking if an instance is null when we don’t know whether we’re dealing with a value or reference type.

We’re using two prime numbers (17 and 23) for our seeds to aid in the effort of reducing collisions. It can be argued that they are not optimal; however, that would most likely end up being a weak argument, and such a discussion is not even in the scope of this article. The values originate from other sources out there that address the issue of collisions (e.g. stackoverflow.com). We have the code inside an unchecked block so that overflows do not result in exceptions.

Again, this could easily be placed into a helper class instead; regardless, I wanted to tap into the portability offered by extension methods. Also, given that all objects have a GetHashCode method, I don’t view the extension of the object type in this manner as harmful, even if we aren’t actually using the source object itself.

I have about six of these methods, with the number of parameters accepted ranging from one to six. All of this is neither earth shattering nor rocket science. As I stated at the beginning, I’m mainly sharing this with you, the reader, so I can refer to this if questions arise from their usage in articles subsequent to this one.

 

As I have stated elsewhere on this site, it is my intent to limit the scope of the articles I write to areas that fall within my field of expertise. Once again, however, I desire to break away from my usual routine to cover another legislative issue making the rounds in the current events sphere; [Read on for more...]

 

I’ve been swimming in a sea of COM Interop lately. Some time ago, I wrote an article that had some important tidbits regarding the nature of Runtime Callable Wrappers, or RCWs. I’d thought I’d bring to the surface one of the more important questions I answered in that article, specifically: What actions, when executed, incur [Read on for more...]

 

When working with .NET, it is lovely when everything we’re interacting with is rooted in .NET code; however, sometimes we need functionalities that can only be found in one or more COM types. Something that should cross our minds when dealing with COM types is what the implications may be for deployment efforts. Depending on [Read on for more...]

 

DataTemplates are essential tools provided by WPF that allow us to describe the visual structure of  any type of object. When declaring a DataTemplate, one typically is doing so with the intention of targeting a specific type (or valid subtype) of object. This is done by setting the DataType property of the DataTemplate to the Type of object we’re [Read on for more...]

 

Microsoft has recently released a preview version of their new Office 2013 product which adds support for their new touch screen platforms (while maintaining support for PCs). Looks like anyone can grab a copy and try it out for themselves, if interested you can head on over to the official site. If you have a [Read on for more...]

 

WPF affords developers the opportunity to create layouts that coincide greatly with how the look and behavior of a particular user interface was envisioned to be. The tricky part is, as always, knowing how to use the tools we’re given. One common layout-related issue people run into is the layout of items displayed within an [Read on for more...]

 

Resource mailboxes play host to a number of room resource specific settings and policies, such as the window (in days) in which you can “book” the resource, as well as a many-layered system of policies and permissions which affect who may use them as well as their experience in doing so. Your application may have [Read on for more...]

 

A reader recently sent me a message requesting further clarification on how one might use the generic IWeakEventListener implementation I recently wrote about (note to self: I need to disable the auto-comment disabling feature on this site). The Scenario Let’s say we have a child/parent pair of view models which serve as abstractions of a child/pair [Read on for more...]

© 2012-2013 Matt Weber. All Rights Reserved. Terms of Use.