Monthly Archives: September 2013

Imperfect Matching Problem

This little hash table problem seems appropriate given that School is about to go back into session and a new batch of fresh minds are going to be opened up to the world of Computer Science. So I'm going to try and write something more geared for that audience.

Problem: You have a set of widgets. They have a type and a compatibility mode of One, Many or Either. Given any type and compatibility mode combination, find out if you have a matching key in your hash table. Be mindful that if you have matching types where the one in your hash table is paired with compatibility mode "Either" and you're looking for "One" your table must return a hit.

Finally, to make ya think a little more, your answer must run in Θ(1) time.

Good one eh? I think so. Makes ya think how hash tables work and what they have to do in the event of a collision (hint hint to you awesome students who are going to try and puzzle out a solution before you read mine). (Those who skipped ahead, well, you skipped so you're probably not reading this. Never mind).

So, how DO hash tables work? Say you had something like a Social Security Number. You need nine digits to contain them and do something like link to someone's name right? ###-##-####. Right! Ok, well, you could store the string names in an array with the Social Security Number of that person as the index into the array but then positions in the array from 0 to 99,999,999 would all be unfilled wasting a lot of memory. That's bad. But you'd have instant access to those positions and that's good! How can we have our cake and eat it too? We can use a hash table! Granted the Θ(1) access into an array is less than the Θ(1) access into a hash table (because the hash must be calculated) but a hash table's Θ(1) is still very fast. Because we use the hash as our index into storage, we can then get away with reserving much less memory than if we just used an array.

So what about our problem? We're facing something that's not quite as nice as Social Security Number to name. A nice one to one. No, we're facing something a little more tricky.

So first, let's set the stage with our Widget.

public enum Compatibility {One = 1, Many = 2, Either = 3};
public abstract class Widget
{
  public Widget(Type type, Compatibility compatibility)
  {
    Type = type;
    Compatibility = compatibility;
  }

  public Type Type {get; private set;}
  public Compatibility Compatibility {get; private set;} 

  public override int GetHashCode(); //todo
  public override bool Equals(object obj); //todo
  protected abstract [what could I possibly be?];
}

How many hints do you catch there? I count/intend two.

Back to task, we have something to hash on. The common type of the widgets in question. Let's then write our override for GetHashCode():

public override int GetHashCode()
{
    return Type.GetHashCode();
}

That what you were expecting? As GetHashCode() is not guaranteed to generate a unique hash value, we're already running the risk of a collision. But that's okay, hash tables (or Dictionaries since we're operating in .NET) do lookups in Θ(1) time anyway.

But we are mucking with things here. The Θ(1) is guaranteed for dissimilar types and we're setting up a situation where we have to account for dissimilar types and for Compatibility mismatches. Will the solution we come up with still be Θ(1)? I'm going to let you chew on that a little more before I tell ya.

When a collision happens, hash tables need to verify if the match is actual or not. Therefore, we have to implement the Equals method. Let's get that started now.

public override bool Equals(object obj)
{
  var other as Widget;
  //Trivial check of is the supplied object this object.
  if(this == other)
    return true;

   //Types don't match, can't match (Still need this).
   if(this.Type != other.Type)
     return false;

  .
  .
  .
}

That's the easy part. Now the tricky one. If this object has Either compatibility, anything the other object is will be a match. Otherwise the Compatibility has to match. But, is this Widget in the hash table or is it a key I'm looking up?

Here's one of the hints I called out before. Note Widget is abstract. We're not going to make any of these really. Yeah yeah, I'm throwing some un-warned polymorphism in here. But I can be a stinker like that. 😉

Let's fix up our class now with the big reveal!

protected abstract [What could I possibly Be?];
protected abstract bool Equals(Widget other);

Ta-Da! Now let's make some subclasses.

public class DictionaryWidget : Widget
{
  protected override bool Equals(Widget other)
  {
    return Compatibility == Compatibility.Either;
  }
}

public class QueryWidget : Widget
{
  protected override bool Equals(Widget other)
  {
    return other.Compatibility == Compatibility.Either;
  }
}

All right. Now if the hash table's key is Either, we'll return true, hit.

One more case to go and we can do that back in the base class.

public override bool Equals(object obj)
{
  var other as Widget;
  //Trivial check of is the supplied object this object.
  if(this == other)
    return true;

   //Types don't match, can't match (Still need this).
   if(this.Type != other.Type)
     return false;

  if(Equals(other))
    return true;

  return (Compatibility & other.Compatibility) > 0;
}

What was added says if the hash table key is of compatibility type either, since we already now the types match return true. Finally if the current Widget's compatibility matches a bit the supplied Widget's compatibility, return true, Equals. Otherwise, it will return false (Compatibility.One & Compatibility.Many)  = (01b & 10b) = 0.

As for Θ(1) , yes. We're increasing the number of collisions, sure, but by another constant, the three values of the enumeration. So, we're still Θ(1) ! Yay! Albeit a slightly lower Θ(1) than we otherwise would be. 🙂

The usability on this is lower than I prefer because it depends upon developers putting DictionaryWidget's into the hash table and querying with QueryWidgets. But, that could be solved with some wrapping which is outside the scope of this problem. I just found the problem a good thought exercise on the basics of hash tables, hence the post.