Every so often, a software technique takes the center stage of attention and becomes the source of countless articles, tutorials, and conference talks.
This is particularly true in the world of functional programming with techniques such as:
effect libraries:
effectful
, cats-effectIO
,ZIO
,...
Today I want to give a proper, real-life, example of using the so-called "Typed tagless final" approach. Over the years I have become quite frustrated with how this topic was presented. In the eyes of many people this is mostly a way to create interfaces where the result of each operation has an effect abstracted as a type variable:
trait CustomerRepository[F : Monad] {
def saveCustomer(c: Customer): F[()]
def getCustomers(): F[List[Customer]]
}
In practice F
ends up being a type permitting side-effects like IO
. F
might also end up being State[RepositoryState, _]
in order to write tests purely in memory for example.
I generally argue that:
This is merely using interfaces, parametrized with an effect.
In that case, this does not deserve a complicated and scary named such as "typed tagless final".
The "typed tagless final" approach is much more interesting and covers a different topic than building a system based on modules and interfaces.
What is it really?
You will find the best reference on the topic on Oleg Kiselyov's website:
This is a way to create internal Domain Specific Languages (DSL). "Internal" means that terms are embedded directly in a "host" programming language.
The DSL terms can be well-typed. The type-checker of the host language guarantees it.
The DSL terms can be interpreted in a variety of ways: evaluation, printing, optimization (in a type-preserving way), etc...
The DSL is extensible: new term types can be introduced, existing interpreters can be reused.
Why "Typed tagless final" then? In short:
Typed DSL terms can be typed
Tagless DSL terms do not require to embed tags to keep track of their type
Final DSL terms are not constructed as a specific data type but rather via functions
This possibly looks quite interesting, but still very abstract. Moreover, if you look at examples, most of the time they are about creating a DSL for:
A simplified arithmetic language, or
A version of the Lambda Calculus.
Those are hardly real-life examples.
Data serialization to JSON
Data serialization is (sadly) an essential part of many software projects, with many pitfalls in terms of correctness, performance, maintenance, and productivity.
For example the question on how to evolve serialization protocols, so that we can reuse most of our serialization code from one version to another, is not entirely trivial.
I have created a library registry-aeson
, based on the aeson
Haskell library, to propose a solution to this problem. That library is structured around two data types, Decoder a
, to decode a value of type a
from some JSON text, and Encoder a
to encode a value of type a
to a JSON value.
Encoding values seems a bit more straightforward than decoding them because there's no need for error management, but this is not the case! For performance reasons the aeson
library proposes a ToJSON
typeclass with 2 methods:
class ToJSON a where
toJSON :: a -> Value
toEncoding :: a -> Encoding
toJSON
creates a Value
which can be a String
, an Object
, an Array
etc... (all the types that we expect from a JSON data type). However, if we only had this method, we would spend time encoding to an intermediate data structure then transforming this structure to some bytes on wire. That is pretty ineffective.
This is why the toEncoding
method exists. To transform the value a
directly into bytes, using specific combinators. For example: string "hello"
or pairs (pair "name" value)
.
In the registry-aeson
library, the first version of the Encoder
data type was:
newtype Encoder a = Encoder { encode :: a -> (Value, Encoding) }
In order to define an Encoder
you need to return both a Value
and an Encoding
, like the ToJSON
typeclass does. But this is very tedious, in particular because the API to create values and encodings are very different in the aeson
library.
For example if you want to serialize a data type with both field names and values:
If you want to create a
Value
: you build anObject
with aKeyMap
from a list of keys and values.if you want to go to
ByteString
directly: you makepairs
with aSeries
of encoded values built with thepair
combinator.
What if we could have one expression and interpret it to either a Value
or an Encoding
? This is exactly what the Type Tagless Final approach gives us!
A JSON algebra
We are going to define operations that allow us to build simple JSON terms, with only strings, ints and maps:
-- | Operations used to create JsonTerms
data JsonAlgebra a = JsonAlgebra
{ string_ :: Text -> a,
int_ :: Int -> a,
object_ :: [(Key, a)] -> a,
}
-- | Polymorphic JSON term. It can be interpreted later
newtype JsonTerm = JsonTerm {term :: forall a. JsonAlgebra a -> a}
-- | Interpret a JsonTerm via a specific algebra
interpret :: JsonTerm -> JsonAlgebra a -> a
interpret (JsonTerm t) j = t j
-- | Top-level JSON DSL operations
string :: Text -> JsonTerm
string t = JsonTerm $ \ja -> string_ ja t
int :: Int -> JsonTerm
int b = JsonTerm $ \ja -> int_ ja b
object :: [(Key, forall a. JsonAlgebra a -> Pair a)] -> JsonTerm
object vs = JsonTerm $ \ja ->
object_ ja ((\(k, v) -> (k, interpret v ja)) <$> vs)
A
JsonAlgebra a
defines operations to build a concrete typea
. In our case we are eventually interested ina = Value
anda = Encoding
.A
JsonTerm
is something that uses a concreteJsonAlgebra a
and returns anya
specified by that algebra.The
string
,int
,object
functions are the top-level functions of our JSON DSL. They are the common API to build bothValues
andEncodings
.
Let's see an example of a JsonTerm
. For example the JSON value:
{ "name": "eric", "credits": 100 }
can be represented with the JsonTerm
:
let term :: JsonTerm =
object [("name", string "eric"), ("credits", int 100)]
This JsonTerm
can be interpreted via concrete JsonAlgebra
implementation. We can create such an algebra to build Values
:
valueJsonAlgebra :: JsonAlgebra Value
valueJsonAlgebra = JsonAlgebra {..}
where
string_ :: Text -> Value
string_ = String
int_ :: Int -> Value
int_ = Number . fromInteger . integerFromInt
object_ :: [(Key, Value)] -> Value
object_ = Object . fromList
This algebra replaces each invocation of a function in a JsonTerm
with an implementation building a Value
. Hence, if we apply it to our term
above we get a Value
:
let value :: Value = interpret term valueJsonAlgebra
> print value
> Object (fromList [("name",String "eric"), ("credits",Number 100.0)])
We can also define an implementation building an aeson
's Encoding
:
import Data.Aeson.Encoding as E
encodingJsonAlgebra :: JsonAlgebra Encoding
encodingJsonAlgebra = JsonAlgebra {..}
where
string_ :: Text -> Encoding
string_ = E.text
int_ :: Int -> Encoding
int_ = E.scientific . fromInteger . integerFromInt
object_ :: [(Key, Encoding)] -> Encoding
object_ = E.pairs . foldMap identity . fmap (\(k, v) -> E.pair k v)
This time we can interpret our JSON term directly as an Encoding
(more or less a ByteString
):
let encoding :: Encoding = interpret term encodingJsonAlgebra
> print encoding
> "{\"name\":\"eric\",\"credits\":100}"
Two interpretations for the same term! Now, users of the registry-aeson
library don't have to produce 2 expressions, with a different API, to generate efficient JSON ๐ค.
Putting the "typed" in "typed tagless final"
Our DSL is well-typed. It is not possible to use its API to build terms which cannot be meaningfully interpreted. However, it is not quite optimal because we are missing the point of aeson
's Encoding
a bit. Our object
operation requires building tuples and putting them in a list instead of building a ByteString
right away.
On the other hand, the documentation for Encoding
shows this example:
toEncoding (Person name age) = pairs ("name" .= name <> "age" .= age)
In the example above:
"name" .= name
creates aSeries
which directly contains the serialized string"name":"eric"
It is appended with
<>
to anotherSeries
to directly create the comma-separated string"name":"eric","age":100
We need better operations and better types!
Here is another version of the JsonAlgebra
:
data JsonAlgebra r = JsonAlgebra
{ string_ :: Text -> r (),
int_ :: Int -> r (),
pair_ :: Key -> r () -> r Key,
empty_ :: r Key,
concatenate_ :: r Key -> r Key -> r Key,
object_ :: r Key -> r (),
}
This time we introduce 3 new operations, almost as a sub-DSL:
pairs_
to create a key/value pair, ready to be added to other key/value pairsempty_
to denote the empty list of key/value pairsconcatenate_
to concat 2 lists of key/value pairs
We also use another representation, r
, parametrized by a type in order to track the type of term we are constructing:
r ()
is for regular terms denoting normal JSON valuesr Key
is for lists of key/value pairs being constructed before being passed to theobject_
function. We can choose any type that is different from()
, I just chose to reuse an existing type.
This new representation still guarantees that our DSL is type-safe, but with no need to introduce types like lists and pairs in the algebra operations.
If we add new top-level functions, we can see the new operations in action:
pair :: Key -> JsonTerm () -> (forall r. JsonAlgebra r -> r Key)
pair k v ja = pair_ ja k (interpret v ja)
(><) :: (forall r. JsonAlgebra r -> r Key) -> (forall r. JsonAlgebra r -> r Key) -> (forall r. JsonAlgebra r -> r Key)
(><) = concatenate_ ja (v1 ja) (v2 ja)
object :: (forall r. JsonAlgebra r -> r A.Key) -> JsonTerm ()
object vs = JsonTerm $ \ja -> object_ ja (vs ja)
Now we can build a term with:
let term :: JsonTerm =
object $ (pair "name" (string "eric") ><
(pair "credits" (int 100))
In this example, we are really concatenating pairs of key/values, as we go, without having to build an intermediary list of tuples (this is very similar to what happens in the Encoding
API of aeson
).
Can we still get both Values
and Encodings
from such a term? Absolutely, we just need to find the right implementation and representation for each concrete algebra.
Our previous algebras were parametrized with Value
and Encoding
, now we will use Values
and Encoded
:
data Values k where
SingleValue :: Value -> Values ()
ManyValues :: [(A.Key, Value)] -> Values A.Key
valueJsonAlgebra :: JsonAlgebra Values
valueJsonAlgebra = ...
data Encoded k where
Encoded :: Encoding -> Encoded ()
CommaSeparated :: E.Series -> Encoded A.Key
encodingJsonAlgebra :: JsonAlgebra Encoded
encodingJsonAlgebra = ...
It is not hard to implement each JsonAlgebra
with those data types:
Values
simply accumulates key/value pairs in a list when thepair_
andconcatenate_
operations are called.Encoded
re-uses theSeries
data type defined inaeson
'sEncoding
to efficiently create comma-separated string as key/value pairs are being added.
Now interpreting a JsonTerm
with a JsonAlgebra
produces either a Values
value or an Encoded
value. We need 2 more functions to produce a Value
or an Encoding
:
toValue :: Values k -> Value
toValue (SingleValue v) = v
toValue (ManyValues vs) = Object . fromList $ vs
toEncoding :: Encoded k -> Encoding
toEncoding (Encoded e) = e
toEncoding (CommaSeparated s) = E.pairs s
Wrapping-up
This whole post will probably look a bit scary if you have never read about the (proper) typed tagless final approach (or if you are new to Haskell and aeson
๐). Let me summarize:
We have defined a DSL to create simple JSON terms with strings, ints and maps.
This DSL uses the operations:
string
int
pair
object
let term :: JsonTerm = object $ (pair "name" (string "eric") >< (pair "credits" (int 100))
We can eventually interpret a DSL term to either:
a
Value
to do JSON manipulationsan
Encoding
to efficiently serialize data to binary
I hope that this blog post will encourage you to read more about this technique for creating DSLs, you never know when you might need it!