Why middleware may not be the right abstraction for your data policies.

Every frontend developer knows, or at least suspects, that backends are hard. But why would that be?

There is nothing intrinsically difficult with the business logic of backends: after all, it's just code and algorithms. A fizzbuzz is a fizzbuzz: equally useless on the backend as it is on the frontend.

#What makes backends harder?

Backends are hard because they deal directly with your data, and data has requirements that go beyond the data manipulation you want to do. For example:

  • Who can access this data, and under which circumstances?
  • Where can the data be stored (and cached) to comply with regulations like GDPR?
  • Does the data storage comply with all the regulations and requirements of the business? For example, is it encrypted when and where it needs to be?

To clearly separate these requirements from the task of encoding the business logic of your backend, let's call them the data policies that the backend must implement.

#Ways to code policy compliant backends

Implementing data policies can be daunting for developers without direct experience in this area. And the cost of mistakes is tremendous. But if you have to do it… you have to do it! So what are some of the strategies that one can use to deal with those requirements?

One approach is to sprinkle policy logic all over your code. Whenever you need to test if someone is logged in, who they are, what they can do, just add a bunch of if/else statements and you're good to go!

This has obvious flaws: as your code evolves, it is hard to make sure that no data policy check gets skipped through human error. Not only is this tedious, but mistakes are also hard to catch, especially in large code bases, and at some point data will escape the confines of the policy.
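To make the failure mode concrete, here is a minimal sketch of the "sprinkled checks" style. The `Session` and `Post` types and both handler names are hypothetical and not tied to any framework. The second handler stands for an endpoint added later that checks for a login but forgets the ownership check.

```typescript
// Hypothetical types for illustration; not from any framework.
interface Session { userId: string | null }
interface Post { id: number; ownerId: string; body: string }

const posts: Post[] = [
  { id: 1, ownerId: "alice", body: "Alice's draft" },
  { id: 2, ownerId: "bob", body: "Bob's draft" },
];

// Original endpoint: the policy is written inline as if/else checks.
function listMyPosts(session: Session): Post[] {
  if (session.userId === null) {
    return []; // not logged in: see nothing
  }
  return posts.filter((p) => p.ownerId === session.userId);
}

// Endpoint added later: checks login, but forgets the ownership check.
function searchPosts(session: Session, term: string): Post[] {
  if (session.userId === null) {
    return [];
  }
  // Bug: matches posts from every owner, not just the caller.
  return posts.filter((p) => p.body.indexOf(term) !== -1);
}
```

Both functions type-check and look reasonable in isolation, which is why this kind of leak tends to survive code review.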

#Is middleware a solution?

The other alternative is to use a common pattern in most frameworks: middleware. Middleware is code that executes before and/or after an endpoint is called, and can deny the request right away, as well as transform output to conform to data policies.

The problem is that middleware is tied to a particular endpoint rather than to the data. Also, to allow for stacking, middleware layers are opaque and operate at the request/response boundary.
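The request/response boundary can be sketched without any framework. In the hypothetical types below (`Req`, `Res`, `Handler`, `Middleware` and the `stack` helper are all made up for illustration), a middleware wraps a handler and only ever sees the request going in and the response coming out. It has no view of which entities the handler touched along the way.

```typescript
// Hypothetical minimal request/response types, not a real framework API.
interface Req { path: string; token?: string }
interface Res { status: number; body: unknown }
type Handler = (req: Req) => Res;
type Middleware = (next: Handler) => Handler;

// Denies the request up front when no token is present.
const requireToken: Middleware = (next) => (req) =>
  req.token ? next(req) : { status: 401, body: "unauthorized" };

// Runs after the handler and can only post-process the opaque response.
const wrapBody: Middleware = (next) => (req) => {
  const res = next(req);
  return { status: res.status, body: { data: res.body } };
};

// Stack middleware so that the first layer in the list runs outermost.
function stack(handler: Handler, ...layers: Middleware[]): Handler {
  return layers.reduceRight<Handler>((h, layer) => layer(h), handler);
}

const endpoint = stack(
  (req) => ({ status: 200, body: "hello " + req.path }),
  requireToken,
  wrapBody,
);
```

Because every layer has the same `Handler` shape, layers compose freely, but that same uniformity is what keeps them from knowing anything about the data behind the endpoint.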

But most of the time, what you really want is to protect the data in its current context. You want to be able to express things like

  • “If the user is properly authenticated, let them see their own posts. Otherwise, hide them”
  • “If this personal data is associated with a European user, store it in a location in Europe”
  • “Make sure this data goes through an encrypted connection”

#Simple vs. complex endpoints

For an endpoint that only ever touches a single piece of data (for example, an API that simply fetches a user by id), the mapping of data policy to endpoint is still manageable: authorizing the endpoint means authorizing access to that user!

But the moment it gets more complex, for example if you need to consult a task list to decide which user to fetch, things get complicated, as your data policies may be different for each piece of data.

It may also not perform well: the middleware, especially if it has to broker between many data types, may not have an opportunity to transform the queries. It operates opaquely on the requests and responses.

Because of that, middleware tends to play a role in things like user authentication and providing general access to the endpoint. But efficient and safe access to the data still happens at the business logic level, with developers having to make sure queries are correctly filtered, masked, and transformed.

This breaks the middleware abstraction, and we're back to square one, with data potentially escaping the confines of the policy.

#Breaking the middleware abstraction

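To understand how the opaque middleware abstraction is broken, let's look at the example of data filtering: once a user is authenticated, the context of the request, including the user's identity, can be used to restrict which data the user has access to. Middleware cannot push that restriction into the query itself: either it filters the response after the endpoint has already fetched the data, or the endpoint's own code has to apply the filter, and the policy is back in the business logic.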
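A rough sketch of what that restriction looks like when it is left to middleware, using hypothetical in-memory types rather than a real database: the handler has already fetched every row before the middleware gets a chance to drop the ones the caller should not see.

```typescript
// Hypothetical stand-ins for a database table and a request context.
interface UserRow { userId: string; username: string; email: string }
interface Ctx { userId: string }

const table: UserRow[] = [
  { userId: "u1", username: "alice", email: "alice@example.com" },
  { userId: "u2", username: "bob", email: "bob@example.com" },
];

let rowsRead = 0; // counts rows that left the "database"

// The handler runs an unfiltered query; it knows nothing about the policy.
function listUsersHandler(): UserRow[] {
  rowsRead += table.length;
  return table.slice();
}

// The middleware can only filter the response after the fact.
function filterResponse(ctx: Ctx, rows: UserRow[]): UserRow[] {
  return rows.filter((row) => row.userId === ctx.userId);
}

const visible = filterResponse({ userId: "u1" }, listUsersHandler());
// `visible` holds only the caller's row, yet `rowsRead` records that
// every row in the table was fetched to produce it.
```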

A much better approach is to allow the developer to write data policies that integrate directly into the backend, and yet are simple and isolated enough to be inspected at any time, giving the developers the peace of mind that rules are applied.

One example of that is the data policy feature in ChiselStrike, which is now available as a preview feature in 0.13. In ChiselStrike, the database is abstracted away and everything is TypeScript-first. For example, here's how you could model a User entity:

// in models/User.ts
import { ChiselEntity } from '@chiselstrike/api';

export class User extends ChiselEntity {
  // the external user_id (coming from JWT auth, not public data)
  userId?: string;
  // the user's username
  username: string;
  // the user's email (not public data!)
  email?: string;
}

Data policies in ChiselStrike are aware of which user is logged in. That is done through standard JWTs that you can obtain from services like Okta or Clerk. Once decoded, the payload of such a JWT would look like this:

{
  "userId": "xxxx",
  "otherFields": "yyy"
}

You can then create a source file with a name matching the entity being protected (policies/User.ts) with the following contents:

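// in policies/User.ts
import { User } from '../models/User';
import { Action, RequestContext } from '@chiselstrike/api';

export default {
  // Runs whenever a User entity is about to be created.
  create: (user: User, ctx: RequestContext) => {
    // Allow only when a token is present and its userId matches the new entity.
    if (ctx.token && user.userId && user.userId === ctx.token['userId']) {
      return Action.Allow;
    } else {
      return Action.Deny;
    }
  },
};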

The first thing to note about these policies is that they are central and tied to the data. You can inspect them in a single location, and reason about what they are doing anywhere in your backend.

In this example, nobody can ever create a User entity unless its userId matches the contents of the JWT. This will be enforced automatically any time a User is created. Similar policies are also available for read and update operations.

Policies can also be used to filter entity instances that would be returned by a query:

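// in policies/User.ts
import { User } from '../models/User';
import { Action, RequestContext } from '@chiselstrike/api';

export default {
  // Runs for each User entity a query would return.
  read: (user: User, ctx: RequestContext) => {
    if (ctx.token && user.userId && user.userId === ctx.token['userId']) {
      return Action.Allow;
    } else {
      // Skip leaves this entity out of the results instead of denying the request.
      return Action.Skip;
    }
  },
};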

The interesting part here is that the ChiselStrike compiler inspects all policy files, and works at the query level to guarantee efficiency. In the example above, the userId is added as a filter on incoming User queries automatically. The data that is not supposed to be seen never leaves the database.
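The idea of filtering at the query level can be illustrated with a small sketch. This is not how ChiselStrike is implemented; the `Query` type and the `applyReadPolicy` and `run` helpers are hypothetical, and only show what it means to turn a policy into an extra query condition before anything is read.

```typescript
// Hypothetical query representation: a list of equality conditions.
interface Condition { field: "userId" | "username"; equals: string }
interface Query { entity: "User"; where: Condition[] }
interface UserRow { userId: string; username: string }

// Rewrite the query so that the policy becomes part of it.
function applyReadPolicy(query: Query, tokenUserId: string): Query {
  return {
    entity: query.entity,
    where: query.where.concat([{ field: "userId", equals: tokenUserId }]),
  };
}

// Toy executor: only rows matching every condition are ever returned.
function run(query: Query, rows: UserRow[]): UserRow[] {
  return rows.filter((row) =>
    query.where.every((c) => row[c.field] === c.equals)
  );
}

const rows: UserRow[] = [
  { userId: "u1", username: "alice" },
  { userId: "u2", username: "bob" },
];

// The endpoint asked for all users; the policy narrows the query itself.
const result = run(applyReadPolicy({ entity: "User", where: [] }, "u1"), rows);
```

In a real database the added condition would become part of the executed query, so rows belonging to other users are never fetched in the first place.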

Data policies can do a lot more:

  • Using the create policy, you can make sure that data that lacks certain fields or has invalid values will never be saved to the database.
  • Similarly, policies can mask data by denying access to certain fields. For example, you can express that a logged-in user sees their own personal information, while it is stripped out of queries originating from anyone else.
  • You can transform records before they are saved: for example, adding fields that specify geographical placement (in databases that support it) to achieve compliance, or truncating and standardizing values.
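As a plain TypeScript sketch of the validation, masking, and transformation ideas in the list above (deliberately not using the ChiselStrike policy API, whose shape for these cases is not shown in this post), the hypothetical helpers below strip private fields for anyone but the owner, and reject or normalize records before they are saved.

```typescript
// Mirrors the User entity above as a plain interface; helper names are hypothetical.
interface UserRecord { userId?: string; username: string; email?: string }

// Masking: only the owner sees the non-public fields.
function maskUser(user: UserRecord, viewerId: string | undefined): UserRecord {
  if (viewerId !== undefined && viewerId === user.userId) {
    return user;
  }
  return { username: user.username };
}

// Validation and transformation on create: reject bad input, standardize the rest.
function prepareForSave(user: UserRecord): UserRecord {
  const username = user.username.trim().toLowerCase();
  if (username.length === 0) {
    throw new Error("username is required");
  }
  return { ...user, username };
}
```

The point of keeping such rules in one place is the same as for the create and read policies: they can be inspected on their own, independently of any endpoint.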

#What now?

An initial version of this work is available in ChiselStrike 0.13, and we are working towards stabilizing it. All of that work happens in the open, on our GitHub repository.

Have thoughts on it? We'd love to hear how you plan to use this on our Discord or Twitter.
