Capturing data pipeline errors functionally with Writer Monads
This article looks at the Either data type as an alternative way of dealing with error conditions. I will be using Kotlin for my examples, as I feel the syntax is easy to follow. These concepts are not unique to Kotlin, however. Any language that supports functional programming can implement them, one way or the other.

Some errors, such as a NullPointerException or an ArrayIndexOutOfBoundsException, indicate bugs. Other errors are part of the business logic. For instance, a validation failing, an expired authentication token, or a record not being present in the database.

Exceptions behave much like goto statements, which are widely considered a lousy idea. The flow of the program is broken whenever an exception occurs. You land on an unknown point up in the call chain. Reasoning about the flow becomes harder, and it is more likely that you will forget to consider all the possible scenarios.

Take this exception handler, which deals with a JWTVerificationException thrown somewhere further down the call chain:

    @ExceptionHandler(JWTVerificationException::class)
    fun handleException(exception: JWTVerificationException): ResponseEntity<ErrorMessage> {
        return ResponseEntity
            .status(HttpStatus.BAD_GATEWAY)
            .body(ErrorMessage.fromException(exception))
    }

Let's say we want to verify a JWT and turn it into a TokenAuthentication. This interface defines it:

    interface Verifier {
        /**
         * @param jwt a jwt token
         * @return authentication credentials
         */
        fun verify(jwt: String): TokenAuthentication
    }

If we dig into the implementation of Verifier, we will eventually find something like this:

    /**
     * Perform the verification against the given Token
     *
     * @param token to verify.
     * @return a verified and decoded JWT.
     * @throws AlgorithmMismatchException
     * @throws SignatureVerificationException
     * @throws TokenExpiredException
     * @throws InvalidClaimException
     */
    public DecodedJWT verifyByCallingExternalApi(String token);

Verifier is lying to us! This method might throw an exception, yet nothing in its signature says so. The only way to discover what's happening is by looking at the implementation. The fact that we have to inspect the implementation to pick this up is a sure sign that encapsulation is lacking.

So how can we improve the Verifier implementation? First of all, the verify method shouldn't throw an exception. One option is to make the return type nullable: verify would then return a TokenAuthentication?. But it has a fatal flaw: We're losing all the information about what actually went wrong. If there are different causes for the error, we want to keep that information.

Enter Either (dum dum dum...).

Before we talk about Either, what do I mean by data type? A data type is an abstraction that encapsulates one reusable coding pattern.

Either is an entity whose value can be of two different types, called left and right. By convention, Right is for the success case and Left for the error one. It's a common pattern in the functional community. It allows us to express the fact that a call might return a correct value or an error, and differentiate between the two of them. The Left/Right naming pattern is just a convention, though. It can help people who have used the nomenclature in existing libraries. You can use a different convention that makes more sense for your team, such as Error/Success, for instance.

We can implement Either as a sealed class, which lets us use the when expression to make the code cleaner and safer at the same time.

    sealed class Either<out L, out R> {
        data class Left<out L, out R>(val a: L) : Either<L, R>()
        data class Right<out L, out R>(val b: R) : Either<L, R>()
    }

    // Extension functions to lift any value into one side of an Either.
    fun <E> E.left() = Either.Left<E, Nothing>(this)
    fun <T> T.right() = Either.Right<Nothing, T>(this)
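
To make the difference between a nullable return type and Either more concrete, here is a minimal sketch. The `ParseError` type and the `parseAge` functions are hypothetical names invented for this illustration; they are not part of the JWT example. The Either definition is repeated so that the snippet runs on its own.

```kotlin
// Minimal Either, repeated here so the sketch is self-contained.
sealed class Either<out L, out R> {
    data class Left<out L, out R>(val a: L) : Either<L, R>()
    data class Right<out L, out R>(val b: R) : Either<L, R>()
}

fun <E> E.left() = Either.Left<E, Nothing>(this)
fun <T> T.right() = Either.Right<Nothing, T>(this)

// Hypothetical error type for this illustration (an assumption, not from the article).
sealed class ParseError {
    object Empty : ParseError()
    data class NotANumber(val input: String) : ParseError()
}

// Nullable version: the caller learns that parsing failed, but not why.
fun parseAgeOrNull(input: String): Int? = input.trim().toIntOrNull()

// Either version: each failure cause is preserved on the Left side.
fun parseAge(input: String): Either<ParseError, Int> {
    val trimmed = input.trim()
    if (trimmed.isEmpty()) return ParseError.Empty.left()
    val number = trimmed.toIntOrNull() ?: return ParseError.NotANumber(trimmed).left()
    return number.right()
}

fun main() {
    println(parseAgeOrNull(""))     // null
    println(parseAgeOrNull("abc"))  // null, indistinguishable from the case above
    println(parseAge("abc"))        // Left(a=NotANumber(input=abc))
    println(parseAge("42"))         // Right(b=42)
}
```

With the nullable version both failures collapse into null, whereas the Either version lets the caller tell an empty input apart from a malformed one.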

Let's put our new Either to use. The Verifier interface now returns an Either type to indicate that the computation might fail.

    interface Verifier {
        /**
         * @param jwt a jwt token
         * @return authentication credentials, or an error if the validation fails
         */
        fun verify(jwt: String): Either<JWTVerificationException, TokenAuthentication>
    }

In the implementation of Verifier, we're wrapping the problematic code with an extension method called unsafeVerify. We use the extension methods that we defined above to create both sides of an Either:

    private fun JWTVerifier.unsafeVerify(jwt: String): Either<JWTVerificationException, TokenAuthentication> =
        try {
            verifyByCallingExternalApi(jwt).right()
        } catch (e: JWTVerificationException) {
            e.left()
        }

The caller now gets an Either.Left whenever the verification doesn't succeed.

On the calling side, we can handle both cases with a when expression thanks to having defined our Either as a sealed class.

    val result = verifier.verify(jwt)
    when (result) {
        is Either.Left -> ResponseEntity.badRequest().build()
        is Either.Right -> ResponseEntity.ok("Worked!")
    }

So far, we can branch on an Either based on its two possible values (left and right). However, we also want to operate on the value throughout our application without having to unwrap and rewrap it each time, as that makes the code hard to read again.

To do that, we'll extend Either with two new methods, map and flatMap. Let's start with map:

    fun <L, R, B> Either<L, R>.map(f: (R) -> B): Either<L, B> = when (this) {
        is Either.Left -> this.a.left()
        is Either.Right -> f(this.b).right()
    }

map applies a function to the value wrapped inside an Either. Either is right biased, which means that once it becomes a Left value (i.e., an error), further computations won't be applied. Coming back to our unsafeVerify method, we want to convert the result of that call, which we'll do thanks to our new map method:

    verifier
        .unsafeVerify(jwt)
        .map { it.asToken() }
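
The right-biased behaviour of map can be checked with a small, self-contained sketch. It repeats the Either and map definitions from above and uses plain Int and String values, which are assumptions chosen only for illustration.

```kotlin
sealed class Either<out L, out R> {
    data class Left<out L, out R>(val a: L) : Either<L, R>()
    data class Right<out L, out R>(val b: R) : Either<L, R>()
}

fun <E> E.left() = Either.Left<E, Nothing>(this)
fun <T> T.right() = Either.Right<Nothing, T>(this)

// map only touches the Right side; a Left is passed along unchanged.
fun <L, R, B> Either<L, R>.map(f: (R) -> B): Either<L, B> = when (this) {
    is Either.Left -> this.a.left()
    is Either.Right -> f(this.b).right()
}

fun main() {
    val ok: Either<String, Int> = 20.right()
    val failed: Either<String, Int> = "token expired".left()

    // Both functions are applied to the success value.
    println(ok.map { it + 1 }.map { it * 2 })      // Right(b=42)

    // Neither function runs; the original error is carried through.
    println(failed.map { it + 1 }.map { it * 2 })  // Left(a=token expired)
}
```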

But what if the function that we want to apply returns an Either itself? If we use map, we'll return an Either of an Either, nesting types until it's impossible to use anymore. To prevent that, we'll add a new method, flatMap.

    fun <L, R, B> Either<L, R>.flatMap(f: (R) -> Either<L, B>): Either<L, B> = when (this) {
        is Either.Left -> this.a.left()
        is Either.Right -> f(this.b)
    }
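
The nesting problem, and how flatMap avoids it, can be sketched as follows. The `parse` and `positive` functions are hypothetical validation steps made up for this example, and errors are plain strings to keep it short.

```kotlin
sealed class Either<out L, out R> {
    data class Left<out L, out R>(val a: L) : Either<L, R>()
    data class Right<out L, out R>(val b: R) : Either<L, R>()
}

fun <E> E.left() = Either.Left<E, Nothing>(this)
fun <T> T.right() = Either.Right<Nothing, T>(this)

fun <L, R, B> Either<L, R>.map(f: (R) -> B): Either<L, B> = when (this) {
    is Either.Left -> this.a.left()
    is Either.Right -> f(this.b).right()
}

fun <L, R, B> Either<L, R>.flatMap(f: (R) -> Either<L, B>): Either<L, B> = when (this) {
    is Either.Left -> this.a.left()
    is Either.Right -> f(this.b)
}

// Hypothetical steps, each of which can fail on its own.
fun parse(input: String): Either<String, Int> {
    val n = input.toIntOrNull()
    return if (n != null) n.right() else "not a number: $input".left()
}

fun positive(n: Int): Either<String, Int> =
    if (n > 0) n.right() else "not positive: $n".left()

fun main() {
    // map wraps the second Either inside the first one.
    val nested: Either<String, Either<String, Int>> = parse("7").map { positive(it) }
    println(nested)                                 // Right(b=Right(b=7))

    // flatMap keeps a single level and stops at the first Left.
    println(parse("7").flatMap { positive(it) })    // Right(b=7)
    println(parse("-3").flatMap { positive(it) })   // Left(a=not positive: -3)
    println(parse("x").flatMap { positive(it) })    // Left(a=not a number: x)
}
```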

We have implemented our own Either as an example. A better idea is to use an existing implementation. The excellent arrow library includes an Either type, among many other functional goodies.

With it, a chain of operations that can each fail looks like this:

    request.getHeader(Headers.AUTHORIZATION)
        .toEither()
        .flatMap { header ->
            header.extractToken()
                .flatMap { jwt ->
                    verifier
                        .verify(jwt)
                        .map { token ->
                            SecurityContextHolder.getContext().authentication = token
                        }
                }
        }

The same chain can be written without the nesting by using Either.fx:

    Either.fx {
        val (header) = request.getHeader(Headers.AUTHORIZATION).toEither()
        val (jwt) = header.extractToken()
        val (token) = verifier.verify(jwt)
        SecurityContextHolder.getContext().authentication = token
    }

Kotlin's standard library has its own Result class, which is typically used in a runCatching block:

    runCatching {
        methodThatMightThrow()
    }.getOrElse { ex ->
        dealWithTheException(ex)
    }
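
For comparison, here is a small runnable sketch of runCatching using only the Kotlin standard library. `parsePort` is a hypothetical stand-in for methodThatMightThrow, and the 8080 default is an arbitrary value chosen for the example.

```kotlin
// Hypothetical function standing in for methodThatMightThrow().
// String.toInt() throws NumberFormatException on invalid input.
fun parsePort(raw: String): Int = raw.toInt()

// getOrElse supplies a fallback value when the block threw.
fun portOrDefault(raw: String): Int =
    runCatching { parsePort(raw) }.getOrElse { 8080 }

// fold handles both outcomes; the failure side is always a Throwable.
fun describe(raw: String): String =
    runCatching { parsePort(raw) }.fold(
        onSuccess = { "port $it" },
        onFailure = { "invalid port" }
    )

fun main() {
    println(portOrDefault("9000"))  // 9000
    println(portOrDefault("oops"))  // 8080
    println(describe("9000"))       // port 9000
    println(describe("oops"))       // invalid port
}
```

Note that Result fixes the failure type to Throwable, so the signature of a function returning it does not say which errors can occur, whereas the Left side of an Either can be any type you choose.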

Either integrates nicely with all the other functionality provided by Arrow. In future articles, I plan to write in more detail about other things that you can do with it, such as converting it to an Option or chaining multiple calls in more complex examples.

Either is a great way to make the error handling in your code more explicit. Interfaces are clearer about what a call can actually return. Moreover, you can safely chain multiple operations, knowing that if something goes wrong along the way, the computation will short circuit. This is especially useful if you are running data pipelines as a batch operation. There you may not want to bail out on the first error, but rather run the batch in full while accumulating errors and successes.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.