Elixir Pipes in Ruby
I’ve been taking a little time here and there to learn some Elixir over the past couple of months. I’ve really enjoyed the language itself, and it’s been a fantastic introduction to functional programming.
I recently came across a problem in a Ruby app I was working on that provided a little “ah ha!” moment. It involved taking some of what I’ve learned in Elixir–more specifically, its [pipe operator `|>`](http://elixir-lang.org/docs/stable/elixir/Kernel.html#|>/2)–and applying it to that Ruby code.
The Problem
The problem I faced required me to come up with a way to standardize some of the data in one of our databases. The gist of the problem was that I needed to take a set of strings that came from a few years of user input and normalize them to a form that we could store elsewhere in the database while minimizing duplication.
```ruby
class KeywordNormalizer
  def call(keywords)
    keywords
      .map { |kw| kw.gsub(/(^[^[:alpha:]]+|[^[:alpha:]]+$)/, '') }
      .map { |kw| LegacySpanishCorrector.new.correct(kw) }
      .map { |kw| kw.gsub(/[^[:alpha:]\d'_\-\/]/, '') }
      .reject { |kw| STOP_WORDS.include?(kw) }
  end
end
```
This is the first pass of code that “worked” (minus a few of the more mundane transformations) and had our tests passing. Functionally, this was done, but it left a little to be desired.
We have a service object called `KeywordNormalizer`. Its single public method is `#call`. It takes an array of strings as its only argument, and it performs a series of transformations on that array.
While this method mostly makes sense, we’re left with a couple of steps in the process that aren’t exactly clear. Let’s focus on the first and third steps. Both of these use regular expressions. While I love the power of regular expressions, they tend to be extremely cryptic, bordering on magical. For the sake of whoever has to touch this code next, these steps could really benefit from a little clarity, so we’ll extract them into separate, private methods and give them some decent names.
```ruby
class KeywordNormalizer
  # ...

  private

  def strip_outer_punctuation(array)
    array.map { |el| el.gsub(/(^[^[:alpha:]]+|[^[:alpha:]]+$)/, '') }
  end

  def fix_spanish(array)
    array.map { |el| LegacySpanishCorrector.new.correct(el) }
  end

  def clean_inner_punctuation(array)
    array.map { |el| el.gsub(/[^[:alpha:]\d'_\-\/]/, '') }
  end

  def remove_stop_words(array)
    array.reject { |el| STOP_WORDS.include?(el) }
  end
end
```
Now our regular-expression-laden methods have been given names that offer a little clue about their purpose: “strip outer punctuation” and “clean inner punctuation”. Since the rules around how we handle internal and external punctuation are a little different, it makes sense that these methods are separate, and their names help point us in that direction.
This refactoring definitely adds clarity around the individual steps, but even here, you can see that something is starting to smell. All four of these methods take an array, perform a transformation of some sort, and return the result.
There’s also an added awkwardness to our original `#call` method. It now looks like this.
```ruby
class KeywordNormalizer
  def call(keywords)
    keywords = strip_outer_punctuation(keywords)
    keywords = fix_spanish(keywords)
    keywords = clean_inner_punctuation(keywords)
    remove_stop_words(keywords)
  end

  # ...
end
```
While each step is a little clearer, the process and flow are more disjointed. All of the local variable assignments add noise and a certain amount of misdirection, distracting from the purpose of this method.
I’ve seen this kind of “flow”, if you can call it that, in a number of codebases (and `git blame` reminds me that I’ve written more of this kind of code than I’d care to admit). In fact, I’ve seen this pattern enough that I’ve started calling it the “tower of assignment” pattern.
We could just call this “good enough” and leave it as it is. But I think we can do better.
If we go back to our new private methods, this smelliness would normally lead us in the direction of extracting a new object. If we go down that road, we can create a new `Collection` class that we’ll just embed in the `KeywordNormalizer` for now.
```ruby
class KeywordNormalizer
  class Collection
    def initialize(array)
      @array = array
    end

    def strip_outer_punctuation
      @array.map { |el| el.gsub(/(^[^[:alpha:]]+|[^[:alpha:]]+$)/, '') }
    end

    def fix_spanish
      @array.map { |el| LegacySpanishCorrector.new.correct(el) }
    end

    def clean_inner_punctuation
      @array.map { |el| el.gsub(/[^[:alpha:]\d'_\-\/]/, '') }
    end

    def remove_stop_words
      @array.reject { |el| STOP_WORDS.include?(el) }
    end
  end
end
```
`Collection` takes an array as the only argument to `#initialize`, and each of our previously private methods is now public and does its transform on the array that the class was initialized with.
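One wrinkle worth noting: each of these methods still returns a plain `Array`, not a `Collection`. Here’s a quick sketch, with made-up values, of what that means in practice:

```ruby
collection = KeywordNormalizer::Collection.new(["hello!", "world"])

collection.strip_outer_punctuation
# => ["hello", "world"]   (a plain Array, not another Collection)

collection.strip_outer_punctuation.fix_spanish
# => NoMethodError, because Array doesn't know about #fix_spanish
```

So any caller that wants to run more than one transformation has to keep re-wrapping the result, which is exactly what our `#call` method ends up doing.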
That solves the surface-level problem that was staring us in the face, but if we go all the way back up to our `#call` method, things aren’t any better.
```ruby
class KeywordNormalizer
  def call(keywords)
    keywords = Collection.new(keywords).strip_outer_punctuation
    keywords = Collection.new(keywords).fix_spanish
    keywords = Collection.new(keywords).clean_inner_punctuation
    Collection.new(keywords).remove_stop_words
  end

  # ...
end
```
I’d suggest that, from the `#call` method’s perspective, this last refactoring step is a step backwards. We still have the disjointed flow and indirection from before, and now we have the added cognitive load of another class that we need to understand to some degree.
Elixir’s Solution
When I took a break for a minute, I was reminded of what I’d been learning in Elixir. One of the most recent exercises I’d gone through involved writing some code that would count the number of times each word occurred in a given string. You’re given a string of some sort, and you’ve got to do similar things to what we’re doing here–remove punctuation, downcase everything, etc.–to get the words into a state where you can reliably compare them against each other for counting.
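For a rough sense of that exercise, here’s the idea sketched in Ruby (a hypothetical sketch of mine, not code from the exercise itself):

```ruby
# Hypothetical word-count sketch: normalize the words, then tally them.
def word_count(sentence)
  sentence
    .downcase
    .scan(/[[:alpha:]']+/) # pull out the words, dropping punctuation
    .each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

word_count("The cat saw the other cat.")
# => {"the" => 2, "cat" => 2, "saw" => 1, "other" => 1}
```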
During that exercise, I was introduced to the pipe operator. To demonstrate its use, here’s a simplified version of our `KeywordNormalizer` class written in Elixir.
```elixir
defmodule KeywordNormalizer do
  def call(keywords) do
    strip_outer_punc(keywords) |> fix_spanish |> clean_inner_punc |> remove_stop_words
  end

  defp strip_outer_punc(list) do
    # do work
  end

  defp fix_spanish(list) do
    # do work
  end

  defp clean_inner_punc(list) do
    # do work
  end

  defp remove_stop_words(list) do
    # do work
  end
end
```
Even if you haven’t seen Elixir before, this should still feel pretty familiar. It’s very Ruby-esque in some ways. We’ve got a `KeywordNormalizer` module with a `call/1` function, which we’ll come back to. Then we’ve got our four private functions (`defp` makes a function private, as opposed to the normal `def`, which creates a public function). They’re exactly analogous to their Ruby counterparts from when we first extracted those methods. Each one takes a `List` (similar to Ruby’s `Array`) as its only argument. We’ll assume that each one does the work it needs to and returns a new, transformed `List`.
With that out of the way, let’s focus back on the `call/1` function that we’re really interested in. And really, we’re interested in these pipe operators. The way they work is that the output of the function to the left of the operator is passed to the function on the right as its first argument. It’s a sort of output-to-input flow, which is especially clean here, where each function just takes a single argument. And that’s why we don’t have to explicitly pass any arguments to the last three functions; the pipe operator handles that for us.
The result is a `call/1` function that’s very clear and concise. It’s a fantastic picture of what we’re doing.
Translating |> into Ruby
After I reminded myself about Elixir’s pipe operator, I started thinking about how we might be able to implement something like it in Ruby. To begin with, I went back to the basics of what sets these languages apart. As a functional language, Elixir is focused around functions that take input and return output. Like any functional language, it’s steeped in the input/output idea that’s at the heart of the pipe operator.
Ruby, on the other hand, as an object oriented language, is focused around sending messages to objects and having those objects do work and/or return values.
The pipe operator works based on the fact that the return value of one function can be passed as an argument to the next function. So, to get this sort of thing working in an object oriented world, we need the return value of one method to respond to the next message we want to send. Instead of the output becoming the input we need, the return value needs to become the object we need.
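Here’s a small sketch of that difference, using a couple of hypothetical helpers that aren’t from the original code:

```ruby
# Functional flavor: the output of one function is passed in as the input
# of the next, so the calls nest inside out (or pipe left to right in Elixir).
def strip_whitespace(strings)
  strings.map(&:strip)
end

def downcase_all(strings)
  strings.map(&:downcase)
end

downcase_all(strip_whitespace(["  Foo ", " BAR "]))
# => ["foo", "bar"]

# Object oriented flavor: each return value is an object that responds to the
# next message, so the calls chain left to right.
["  Foo ", " BAR "].map(&:strip).map(&:downcase)
# => ["foo", "bar"]
```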
If we go back to our `Collection` class, you can see that we’re halfway there already.
```ruby
class KeywordNormalizer
  class Collection
    def initialize(array)
      @array = array
    end

    def strip_outer_punctuation
      @array.map { |el| el.gsub(/(^[^[:alpha:]]+|[^[:alpha:]]+$)/, '') }
    end

    def fix_spanish
      @array.map { |el| LegacySpanishCorrector.new.correct(el) }
    end

    def clean_inner_punctuation
      @array.map { |el| el.gsub(/[^[:alpha:]\d'_\-\/]/, '') }
    end

    def remove_stop_words
      @array.reject { |el| STOP_WORDS.include?(el) }
    end
  end
end
```
We have an object that responds to every message we want to send in this workflow. All we need to do is make sure that, every time one of these methods is called, we return a `Collection` object rather than a plain `Array`. And since a `Collection` object is essentially a wrapper around an `Array`, this happens to be straightforward.
We can start with the `#remove_stop_words` method. Here’s the current implementation.
```ruby
class KeywordNormalizer
  class Collection
    # ...

    def remove_stop_words
      @array.reject { |el| STOP_WORDS.include?(el) }
    end

    # ...
  end
end
```
After our transformation, it looks like this.
```ruby
class KeywordNormalizer
  class Collection
    # ...

    def remove_stop_words
      new_array = @array.reject { |el| STOP_WORDS.include?(el) }
      self.class.new new_array
    end

    # ...
  end
end
```
We just assign its previous return value to a `new_array` variable and create a new instance of the `Collection` class, passing in the `new_array`.
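With that change, calling the method hands back another `Collection`. A quick, hypothetical check (assuming “the” is in `STOP_WORDS`):

```ruby
collection = KeywordNormalizer::Collection.new(["the", "cat"])

collection.remove_stop_words
# => a new Collection wrapping ["cat"], rather than a bare Array,
#    so the next message in a chain has a receiver that understands it
```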
Since we’re going to do this a few times, we’ll pull that logic out into a private method.
```ruby
class KeywordNormalizer
  class Collection
    # ...

    def remove_stop_words
      new @array.reject { |el| STOP_WORDS.include?(el) }
    end

    # ...

    private

    def new(new_array)
      self.class.new new_array
    end
  end
end
```
Lastly, we’ll add a `#to_a` method so other objects that use instances of this class can still get at the array itself.
```ruby
class KeywordNormalizer
  class Collection
    # ...

    def remove_stop_words
      new @array.reject { |el| STOP_WORDS.include?(el) }
    end

    # ...

    def to_a
      @array
    end

    private

    def new(new_array)
      self.class.new new_array
    end
  end
end
```
Once we’ve updated the rest of the methods in this class, our original `#call` method looks something like this.
```ruby
class KeywordNormalizer
  def call(keywords)
    Collection.new(keywords)
      .strip_outer_punctuation
      .fix_spanish
      .clean_inner_punctuation
      .remove_stop_words
      .to_a
  end

  # ...
end
```
We create a `Collection` with the keywords that are passed in, we call a series of transformation methods on that set, and then we get an array out the other end.
While we still have the cognitive overhead of a separate class, I’d argue that that overhead isn’t much more than understanding Elixir’s pipe operator. The result is much more readable than our initial attempt, while staying true to the procedural nature of the workflow itself.
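From the outside, usage reads something like this (made-up input and output, assuming `STOP_WORDS` includes “the” and that `LegacySpanishCorrector` leaves these particular strings untouched):

```ruby
KeywordNormalizer.new.call(["  perro!! ", "the", "gato?"])
# => ["perro", "gato"]
```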
Conclusion
As I finished writing this code, it occurred to me that I use objects in a similar way all the time. `ActiveRecord::Relation` has been using a similar pattern of chainable methods for a while now (there’s a quick example below). But there was something about the experience I had recently in Elixir that brought me back to this construct in Ruby. Also, it’s one thing to use code that implements a certain pattern, and it’s a very different thing to come across an instance where you need to implement that pattern yourself.
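For comparison, the same shape shows up in everyday ActiveRecord usage, where each call returns another relation for the next call to build on (assuming a typical `User` model):

```ruby
# Each of these methods returns an ActiveRecord::Relation, which is what
# lets the next call in the chain have somewhere to land.
User.where(active: true).order(:created_at).limit(10)
```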
While this is not a strict implementation of pipes in Ruby, I see it as the OO analogue of functional pipes. As I mentioned at the beginning, this pattern seems to fit much better within the Ruby ecosystem, though that’s almost certainly a matter of opinion.
I think the thing that’s really stuck with me through this is the power of Ruby (again). We all know that it’s a powerful language, and one of its greatest features is its flexibility. While we can use (and probably all have used) its flexibility to a fault, it’s still a fantastic feature. I love that I can experiment with other languages–more specifically, other programming paradigms–and they can inform and improve even my Ruby code. I didn’t expect that. That won’t always be the case, but it’s great when it is.