Ensure the contents of one array are in another array

January 17, 2013 Josh Bourgeois

Humans seem to be built so that there’s a point when, while learning a complex system, we begin to forget some of the more basic fundaments of the machines building that system. When I was taking my college calculus course, I could fly through things like finding the limit to a function, calculating sigma notation, or doing whatever it was that we did with those crazy fx/dx things, but if you asked me to add 55+137, I’d have to stop and think about the answer for (let’s see, two goes into the ones column, carry the minute…) a minute and two seconds.

As it is with mathematical systems, sometimes we also forget the simpler functions in our programming language of choice. Tracker does a lot of work with arrays, and in testing that one array has absorbed the contents of another, we tend to go through a loop that looks like this:


origin_array.each { |cargo| destination_array.should include?(cargo) }

It makes sense to iterate over the contents of an array to verify its contents. It’s how we first get taught to inspect an array. But the simpler way to do this in Ruby is to use Array’s intersection function, #&.


(origin_array & destination_array).should == origin_array.uniq

Whereas the iterative test’s fail case would only show you one object and that it was or wasn’t expected in the array, This code returns the array you expected next to the array you got. You might think that the iterative loop is more robust, as it would tell us if any duplicates from origin_array didn’t make it to destination_array. You’re wrong, sorry to say: Unless you’re deleting objects from destination_array as you find them, any duplicates in origin_array are just going match against the same object in destination_array as the object that got compared before it.

Of course, changing your testing paradigm isn’t worth a lick if it doesn’t make your failures more descriptive and/or more performant. It’s a matter of personal opinion whether you prefer RSpec’s array comparison or its more human-readable “expected x to include y, but didn’t”. Performance, though, is pretty cut-and-dry, so I decided I’d make a couple of arbitrarily large arrays and compare the time it took to run these different comparison methods.

I got my degree in English, so to get my arbitrarily large arrays, I made two arrays of all of the words used in The Tale of Two Cities and Great Expectations by Charles Dickens (arrays of 136,237 and 184,418 words, respectively). Then I ran them through this bit of code:

require 'benchmark'
Benchmark.bm do |x|
x.report("intersection") { two_cities & expectations }
x.report("subtraction") { two_cities - expectations }
x.report(" iteration") { two_cities.each{|word| expectations.include?(word)} }
end

I compared against array subtraction as well because I figured that while it returns the inverse of array intersection, it does it in a completely different way, and as such it deserved to be benchmarked too. To give some disclosure, this was performed on a late-2006 iMac with a 2GHz Intel Core 2 Duo processor and 2.5 GB of RAM.


user system total real
intersection 0.090000 0.010000 0.100000 ( 0.141857)
subtraction 0.080000 0.000000 0.080000 ( 0.079905) #it was the best of times
iteration 242.670000 0.570000 243.240000 (259.153025) #it was the worst of times

Anyway yeah, array intersection is a pretty cool guy. Also, it turns out putting things in code blocks means nothing on this WordPress deploy.

About the Author

Biography

Previous
It's business time!
It's business time!

Helps Anyone know how to make capybara-webkit be quiet? When we run our request specs we see this every tim...

Next
Ethics and the Data-Driven Enterprise
Ethics and the Data-Driven Enterprise

Despite the prodigious business value, civic innovation, and predictive insight yielded from the cascading ...