Recently I’ve completed a task that required writing a complex query in Entity Framework Core. After completing said tasked I’ve decided to marvel at my creation and saw, that there’s not much to marvel at - it did the job, but it wasn’t very readable. I already felt pity for the poor soul that has to understand it, or even worse, modify it - especially because more than likely it’d be me, after I’ve long forgotten the monstrosity I created. That’s why I went out looking for a way to write more readable queries with Linq.

Demo setup

For the purpose of this demo, let’s imagine we’re working in a domain of order processing. Here’s the model of our order:

public class Order {
  public Guid Id { get; }
  public DateTime OrderDate { get; }
  public OrderState State { get; }
}

It’s rather barebones as I’ve omitted some fields not relevant to the query we’re going to perform for simplicity’s sake. Here’s the OrderState enum:

public enum OrderState {
  New,
  InProgress,
  Completed
}

Just to give ourselves a need to implement the query let’s say during tests we’ve noticed that some orders don’t process correctly due to transient errors. Because of that, we’d like to see if the order is in a certain state for too long, and if so, retrieve it from the database for reprocessing. We need to reprocess orders that stay new for more than an hour and orders that are still in progress after 12 hours from order date. Let’s see how we could go about adding a method to our repository to do that.

public IEnumerable<Order> GetOrdersForReprocessingWithLinq()
{
  return _orders.Where(o =>
    // take orders that are in state New and older than 1h
    (o.State == OrderState.New && o.OrderDate < DateTime.Now.AddHours(-1))
    // or orders that are in state InProgress and older than 12h
    || (o.State == OrderState.InProgress && o.OrderDate < DateTime.Now.AddHours(-12)))
    .ToList();
}

As you can see, the query isn’t easy to read. There’s just some parentheses to enforce the order of operations and a few logical operators but it’s already a little bit messy even in this very simple scenario. Even though we’ve added comments, understanding and modifying this query would take more time than it needs to.

Business logic in the model

How can we improve this? We’ve got a model, let’s add a method to it in a true DDD fashion and let the order decide whether it needs reprocessing.

public class Order {
  //removed fields for brevity
  public bool QualifiesForReprocessing()
  {
    return QualifiesForNewOrderReprocessing() || QualifiesForInProgressOrderReprocessing();
  }

  private bool QualifiesForNewOrderReprocessing()
  {
    return State == OrderState.New && DateTime.Now.AddHours(-1) < OrderDate;
  }

  private bool QualifiesForInProgressOrderReprocessing()
  {
    return State == OrderState.InProgress && DateTime.Now.AddHours(-12) < OrderDate;
  }
}

Let’s use it in our repository:

public IEnumerable<Order> GetOrdersForReprocessingWithEntityMethod()
{
  return _orders.Where(o => o.QualifiesForReprocessing())
    .ToList();
}

As it turns out, this approach was naive - we’ve received an exception instead of the expected results. Without going into unnecessary detail, queries in Entity Framework Core are performed with LINQ to SQL. First, the query is transformed into an expression tree, that’s then translated into SQL. The reason we’re getting an exception is that we can’t translate method calls into SQL, and the only way to perform this query would be to load all orders into memory and perform the query in C#. That wouldn’t be very performant, so we’re going to need to find another solution. You can see the message we received with the exception below.

The LINQ expression 'DbSet<Order>().Where(o => o.QualifiesForReprocessing())' could not be translated.Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'.

LINQKit

Is there any way to split our query into named parts in a way that would translate into SQL? Well, we could use expressions, but we’d have to combine them somehow. Fortunately, there’s a Nuget package called LINQKit with a few useful extensions for LINQ. We’re going to create our own expressions and use PredicateBuilder from this package to combine them.

public IEnumerable <Order> GetOrdersForReprocessingWithLinqKit()
{
  Expression<Func<Order,bool>> qualifiesForNewOrderReprocessing =
  o => o.State == OrderState.New && o.OrderDate < DateTime.Now.AddHours(-1);
  Expression<Func<Order,bool>> qualifiesForInProgressOrderReprocessing =
  o => o.State == OrderState.InProgress && o.OrderDate < DateTime.Now.AddHours(-12);

  var qualifiesForReprocessing = PredicateBuilder.New<Order>()
    .Or(qualifiesForNewOrderReprocessing)
    .Or(qualifiesForInProgressOrderReprocessing);

  return _orders.Where(qualifiesForReprocessing)
    .ToList();
}

That’s quite a lot more code than we started with, but now we’ve split our query into two variables which names imply what they’re responsible for. Then we’re using PredicateBuilder to combine them into one expression. We don’t need comments anymore and I think it’s clearer to the person reading the code what it’s doing, but we can do better.

Can we do better?

Now that we’re using expressions, we can extract them from our method and reuse in other queries. We can also create parametrized methods returning expressions to simplify things further.

private static Expression<Func<Order,bool>> IsInState(OrderState state)
{
  return o => o.State == state;
}

private static Expression<Func<Order,bool>> IsOlderThan(TimeSpan age)
{
  return o => o.OrderDate < DateTime.Now.AddMilliseconds(age.TotalMilliseconds);
}

public IEnumerable<Order> GetOrdersForReprocessingWithClearerLinqKit()
{
  var predicate = PredicateBuilder.New<Order>()
    .Or(IsInState(OrderState.New).And(IsOlderThan(TimeSpan.FromHours(1))))
    .Or(IsInState(OrderState.InProgress).And(IsOlderThan(TimeSpan.FromHours(12))));

  return _orders.Where(predicate)
    .ToList();
}

Above you can see the relevant code. Now we’ve got two private static methods accepting parameters and returning expressions - one for checking state of the order, and the other for checking if it’s old enough to be reprocessed. Then in the public method we’re combining them with PredicateBuilder. The public method reads almost like natural language - I think it conveys our intention in a clean and readable fashion. It should also be a breeze to modify it in the future. I wish we could replace DateTime.Now.AddMilliseconds(age.TotalMilliseconds) with DateTime.Substract(age), but LINQ to SQL can’t translate that.

I’m happy with the final result, but I’m not sure if I’d use it in this particular case. Our query was still rather simple and if it was the most complex query in our application I think adding LINQKit would add more complexity than it’d remove. I would happily use it in systems with multiple queries like that, which would benefit from sharing expressions, or in more complex queries where it’d be easier for a developer to learn how to use PredicateBuilder instead of understanding the query written in pure LINQ.

You can find the source code with some integration tests that allows running our repository methods here.