This post is part of a sequence I am calling automatic redis, which is my attempt to solve the cache invalidation problem.

In my previous post, I demonstrated that a library could infer cache update operations from database insert operations by performing algebraic manipulations on the queries that define the cache keys. The algebraic laws needed were the distribution laws between monoids. e.g. `count` distributes over the `Set` monoid to produce the `Sum` monoid. A library could also infer the arguments of the cache keys (e.g. `taskIds.{userId} -> taskIds.65495`) by performing functional logical evaluation on the cache key’s query. If the library’s goal became suspended during evaluation, it could proceed by unifying expressions of low multiplicity with all possible values. For instance, if the goal for a filter query became suspended, the library could proceed by considering the `true` and `false` cases of the filter separately.

In this post I would like to talk about sorting and limiting, as well as flesh out some of the data structures that might be used in an automatic redis library.

##### Set

`Set` is the simplest data structure, and forms the foundation for two of our other collection types.

``````type Set a = Data.Set.Set
``````

The monoidal operation for `Set` is simply set union.

##### List

`List` is a `Set` with an embedded sorting function. Tracking the sorting function enables us to compute redis sorted set keys if necessary.

``````data List a b = (Ord b) => List (a -> b) (Set a)
``````

A commonly used sorting function would be `x => x.modifiedDate`.

The monoidal operation for `List` is the merge operation from merge-sort, with one restriction: the sorting functions of both lists must be the same sorting function.

##### LimitedList

`LimitedList` is a `List` with an upper bound on its size.

``````data LimitedList a b = (Ord b) => LimitedList Integer (List a b)
``````

The length of the contained `List` must be less than or equal to the upper bound. Tracking the length enables us to know how to trim cache entries, e.g. when using the ZREMRANGEBYRANK command.

The monoidal operation for `LimitedList` is to merge-sort the two lists and truncate the result to the limit. Similarly to `List`, the library expects both lists to have the same upper limit.

##### First and Last

`First` and `Last` are essentially `LimitedList`s whose upper bound is `1`. Making specialized types for singleton LimitedLists makes working with non-collection redis data structures easier.

``````data First a b = (Ord b) => First (a -> b) (Maybe a)
data Last  a b = (Ord b) => Last  (a -> b) (Maybe a)
``````

Although `First` and `Last` have the same representation, they have different monoidal operations, namely `(x,y) => x` and `(x,y) => y`.

##### Maybe

The `Maybe` type is useful for queries that always generate a unique result (such as lookup by primary key), and as such the `Maybe` type does not need to contain a sorting function.

``````data Maybe a = Nothing | Just a
``````

The monoidal operation is to pick `Just` over `Nothing`, but with the restriction that both arguments cannot be `Just`s.

``````instance Monoid Maybe where
Nothing  `mappend` Nothing  = Nothing
Nothing  `mappend` (Just x) = Just x
(Just x) `mappend` Nothing  = Just x
(Just x) `mappend` (Just y) = error "This should never happen."
``````

Collision of `Just`s can happen if the application developer misuses the `The` operation (defined below). Unfortunately this error cannot be caught by an automatic redis library, because the library never actually computes the value of `mappend`. The library only tracks monoidal types so that it can know what the final redis commands will be.

Speaking of query operations, it’s about time I defined them. But first… one more monoid.

``````data Sum = Sum Integer

instance Monoid Sum where
mappend = (+)
``````

## Query operations

Query operations are parameterized over an input type and an output type.

``````-- QO = Query Operation
data QO input output where
-- The operations Where, Count, Sum, The, and SortBy are not concerned with the ordering
-- of their input, so they can work on Sets, Lists, LimitedLists, Firsts, Lasts,
-- and Maybes. In these constructor definitions, 'coll' can mean any of those types.
-- A real implementation might have multiples versions of these query operations,
--   e.g. WhereSet, WhereList, WhereLimitedList, ..., CountSet, CountList, etc.
Where :: Expr (a -> Boolean) -> QO (coll a)       (coll a)
Count ::                        QO (coll a)       Sum
Sum   ::                        QO (coll Integer) Sum

-- 'The' takes a collection which is expected to have no more than one element
-- and extracts the element.
The   :: QO (coll a) (Maybe a)

-- SortBy converts any kind of collection into a List.
SortBy :: (Ord b) => Expr (a -> b) -> QO (coll a) (List a)

-- Limit, First, and Last, are defined for any (seq)uence:
--   Lists, LimitedLists, Firsts, and Lasts.
Limit :: Integer -> QO (seq a) (LimitedList a)
First ::            QO (seq a) (First a)
Last  ::            QO (seq a) (Last a)

-- Mapping only works on Set!
Select :: Expr (a -> b) -> QO (Set a) (Set b)

-- Well technically Select also works on Maybe, but we'll make a separate
-- query operation for Maybes.
Apply :: Expr (a -> b) -> QO (Maybe a) (Maybe b)

-- Lists contain their sorting function, so we cannot allow arbitrary
-- mapping on lists. We can, however, support monotonic mappings.
SelectMonotonic        :: Expr (a -> b)          -> QO (seq a) (seq b)

-- Mappings which scramble the order are also allowed, as long as we
-- have a way to recover the order. i.e. 'a -> c' has to be monotonic,
-- even though 'a -> b' and 'b -> c' do not.
SelectReversible       :: Expr (a -> b) -> Expr (b -> c) -> QO (seq a) (seq b)
``````

A few more data structures and we will have all the pieces necessary for an application developer to define a cache schema.

``````data Table t = Table String

-- A Query is a sequence of query operations that begins with a table
data Query output where
From :: Table t -> Query (Set t)
Compose :: Query input -> QO input output -> Query output

-- convenience constructor
(+>) = Compose

data CacheKeyDefinition = CacheKeyDefinition {
keyTemplate :: String, -- e.g. "taskIds.{userId}"
}

``````

Putting it all together, we can showcase the cache schema for a simple task management website.

``````
type UserId = String

ownerId :: UserId,
title :: String,
completed :: Boolean,
dueDate :: Integer }

schema = do
-- type: String
-- expected redis commands on insert:
--   SET
Where (\t -> taskId t == tid) +>
The +>
Apply show

-- For each user, the ids of her most urgent tasks.
-- type: Sorted Set, where the keys are the dueDate and the values are the taskIds.
-- expected redis commands on insert:
--   ZREMRANGEBYRANK
Where (\t -> ownerId t == uid && not (completed t)) +>
SortBy dueDate +>
Limit 100 +>
SelectReversible (\t -> (dueDate t, taskId t)) fst

-- The number of tasks a user has successfully completed.
-- type: integer
-- expected redis commands on insert:
--   INCR
"numCompleted.{userId}" \$= \uid ->
It’s important to keep in mind that although I have made the above code look like haskell, no library in haskell could actually use the above code. The variables occuring after the `\$=` sign are logic variables, not function parameters. An EDSL could get close to something like the above, but the normal types for `==` and `&&` are unusable, and the lambdas inside the `Where` clauses would need to be reified anyway.