Tormenting Your Tests with Heckle
Posted by kev Tue, 19 Dec 2006 09:24:00 GMT
Update: Ruby2Ruby is having gem propagation issues. Feel free to download the gem directly (http://rubyforge.org/frs/download.php/15738/ruby2ruby-1.1.2.gem) and install it via gem install ruby2ruby-1.1.2.gem.
Update 2: We’ve found a bug in the loading that causes problems when you supply a method to Heckle. A bug fix has been checked into the repo and we’re preparing a release. Look for 1.1.1 soonish.
Update 3: OK, 1.1.1 is out the door. The gem server is syncing, so look for the new version this afternoon (12/20), with several bugs fixed, including the loading error.
Yes, I know what you’re thinking. “Holy crap, Kevin posted for the first time in months! I thought he died, or got eaten by a corporate zombie, or set out on an epic adventure to find himself.” But hey, good things come to those who wait, right?
So, you’ve been waiting, and I’ve been writing Heckle. It’s a good thing.
Heckle is a mutation tester. It modifies your code and runs your tests to make sure they fail. The idea is that if code can be changed and your tests don’t notice, either that code isn’t being covered or it doesn’t do anything.
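The core idea fits in a few lines of plain Ruby. This is a minimal sketch, not how Heckle itself is built: the mutants are hand-written lambdas (Heckle generates its mutants automatically), and the "test suite" is a single hypothetical check that, like the first test further down, only greets a named person.

```ruby
# The code under test, written as a lambda so that variants are easy to swap in.
original = ->(person) { person.nil? ? "Hi there!" : "Hi #{person}!" }

# Hand-written mutants (an assumption for illustration): each one differs
# from the original by a single small change.
mutants = {
  "changed the nil-case string" => ->(person) { person.nil? ? "zzz" : "Hi #{person}!" },
  "swapped the branches"        => ->(person) { person.nil? ? "Hi #{person}!" : "Hi there!" }
}

# Hypothetical test suite: returns true when every check passes.
suite = ->(impl) { impl.call("Kevin") == "Hi Kevin!" }

raise "the suite must pass against the original code" unless suite.call(original)

# A mutant "survives" when the suite still passes with it swapped in.
survivors = mutants.select { |_description, impl| suite.call(impl) }.keys
puts "Surviving mutants: #{survivors.inspect}"
```

Only the first mutant survives, which is the signal that the nil case is either untested or does nothing.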
It’s a little weird, I know, but I like to think about it as pen-testing. It’s like hiring a white-hat hacker to try to break into your server and making sure you detect it. You learn the most by trying to break things and watching the outcome.
Anyway, Heckle was inspired by Jester, and Ryan Davis wrote a proof of concept at RubyConf. As he notes, I went a little nuts: I rewrote much of the current implementation that night or on the plane home.
You can install Heckle from Ruby Gems:
gem install heckle --include-dependencies
Let’s take the new toy out for a test drive.
Saying Hello to Branch Coverage
Sometimes line-based code coverage tools can’t catch gaps. For example, let’s say we’re working on a simple greeter system. Our initial code and tests look like this:
class Greeter
  def initialize(person)
    @person = person
  end

  def greet
    "Hi #{@person}!"
  end
end

require "test/unit"

class TestGreeter < Test::Unit::TestCase
  def test_greet
    @greeter = Greeter.new('Kevin')
    assert_equal 'Hi Kevin!', @greeter.greet
  end
end
Tests pass, and for this trivial example, coverage seems to be there. Running rcov confirms that every line in the Greeter class is being executed. But what happens when we decide to make the person attribute optional?
class Greeter
  def initialize(person = nil)
    @person = person
  end

  def greet
    @person.nil? ? "Hi there!" : "Hi #{@person}!"
  end
end
With this implementation, tests still pass and rcov still reports 100% coverage. Still, we know that one branch of that conditional, the case where @person is nil, isn’t being tested. Enter Heckle.
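The post relies on rcov, which only measures lines. As a modern aside (branch counting arrived in Ruby 2.5, long after this post), Ruby's standard Coverage module can also count branches when started with branches: true. The minimal sketch below writes the optional-person Greeter to a temporary file, loads it under coverage, runs only the case that test_greet checks, and then reports both measurements.

```ruby
require "coverage"
require "tempfile"

# The optional-person Greeter from above. The single-quoted heredoc keeps
# "#{@person}" uninterpolated so it lands in the file verbatim.
source = <<~'RUBY'
  class Greeter
    def initialize(person = nil)
      @person = person
    end

    def greet
      @person.nil? ? "Hi there!" : "Hi #{@person}!"
    end
  end
RUBY

file = Tempfile.new(["greeter", ".rb"])
file.write(source)
file.flush

# Coverage only instruments files that are loaded after it starts.
Coverage.start(lines: true, branches: true)
load file.path

# Exercise only what test_greet checks: a named person.
raise "unexpected greeting" unless Greeter.new("Kevin").greet == "Hi Kevin!"

# Find the file's entry by basename, in case the reported path is normalized.
_path, report = Coverage.result.find do |path, _|
  File.basename(path) == File.basename(file.path)
end
file.close!

# Line coverage: every executable line (non-nil count) ran at least once.
full_line_coverage = report[:lines].compact.all?(&:positive?)

# Branch coverage: collect the arms of each conditional that never ran.
untaken_arms = report[:branches].values.flat_map do |arms|
  arms.select { |_arm, count| count.zero? }.keys.map(&:first)
end

puts "Every line executed: #{full_line_coverage}"
puts "Untaken branch arms: #{untaken_arms.inspect}"
```

Every line reports as executed, yet the :then arm (the "Hi there!" case) was never taken, which is exactly the gap described above.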
First, let’s take a look at what Heckle tells us about these tests, and then we can go over how it works. Usage information for Heckle is rather simple:
odysseus:~/code/heckle_demo kev$ heckle
Usage: heckle class_name [method_name]
    -v, --verbose                    Loudly explain heckle run
    -t, --tests TEST_PATTERN         Location of tests (glob)
    -h, --help                       Show this message
A simple run looks like this:
odysseus:~/code/heckle_demo kev$ heckle Greeter
Initial tests pass. Let's rumble.
**********************************************************************
*** Greeter#greet loaded with 3 possible mutations
**********************************************************************
3 mutations remaining...
2 mutations remaining...
1 mutations remaining...
The following mutations didn't cause test failures:
def greet
  if @person.nil? then
    "z#\010]\021\r\e3&TX\001z+\021fOy\016N6\t%F\acu\027\023w\024;}3Vcs>\035\017<Nc]ra\023V0\005 3UB\031]97rN1L\017\020TVJ\t\003k!l;\fA\036?[{lj;}ir2fPNaI\020\020w6$\eR*"
  else
    "Hi #{@person}!"
  end
end
Heckle replaced the string “Hi there!” with a bunch of random characters, but the tests still passed. The situation where @person is nil was never tested. If we add a new test, Heckle should quiet down:
def test_greet_nobody
  @greeter = Greeter.new
  assert_equal 'Hi there!', @greeter.greet
end
odysseus:~/code/heckle_demo kev$ heckle Greeter
Initial tests pass. Let's rumble.
**********************************************************************
*** Greeter#greet loaded with 3 possible mutations
**********************************************************************
3 mutations remaining...
2 mutations remaining...
1 mutations remaining...
No mutants survived. Cool!
Wait... What? How’d it do that?
Heckle works by using the ParseTree and RubyToRuby libraries to grab the abstract syntax tree of a method, modify it, and evaluate the redefined method before running your tests. It applies one change at a time, so each mutation can be seen individually. If you’d like to watch the action take place, supply the -v option. That last test run looks like this in verbose mode:
odysseus:~/code/heckle_demo kev$ heckle -v Greeter
Loaded suite /usr/local/bin/heckle
Started
..
Finished in 0.000447 seconds.
2 tests, 2 assertions, 0 failures, 0 errors
Initial tests pass. Let's rumble.
**********************************************************************
*** Greeter#greet loaded with 3 possible mutations
**********************************************************************
3 mutations remaining...
Replacing Greeter#greet with:
def greet
  if @person.nil? then
    "uO i\032X#mcV"
  else
    "Hi #{@person}!"
  end
end
Loaded suite /usr/local/bin/heckle
Started
.F
Finished in 0.00812000000000002 seconds.
1) Failure:
test_greet_nobody(TestGreeter) [./test/test_greeter.rb:13]:
<"Hi there!"> expected but was
<"uO i\032X#mcV">.
2 tests, 2 assertions, 1 failures, 0 errors
Tests failed -- this is good
2 mutations remaining...
Replacing Greeter#greet with:
def greet
  if @person.nil? then
    "Hi there!"
  else
    "Hi #{@person}\0204\026\036]7D\020#wC\010&=-\004\017\t7.x\036\ap07hqO\f^\025\003+P\016]<0M\vV`lbU\e"
  end
end
Loaded suite /usr/local/bin/heckle
Started
F.
Finished in 0.001194 seconds.
1) Failure:
test_greet(TestGreeter) [./test/test_greeter.rb:8]:
<"Hi Kevin!"> expected but was
<"Hi Kevin\0204\026\036]7D\020#wC\010&=-\004\017\t7.x\036\ap07hqO\f^\025\003+P\016]<0M\vV`lbU\e">.
2 tests, 2 assertions, 1 failures, 0 errors
Tests failed -- this is good
1 mutations remaining...
Replacing Greeter#greet with:
def greet
  if @person.nil? then
    "Hi #{@person}!"
  else
    "Hi there!"
  end
end
Loaded suite /usr/local/bin/heckle
Started
FF
Finished in 0.001984 seconds.
1) Failure:
test_greet(TestGreeter) [./test/test_greeter.rb:8]:
<"Hi Kevin!"> expected but was
<"Hi there!">.
2) Failure:
test_greet_nobody(TestGreeter) [./test/test_greeter.rb:13]:
<"Hi there!"> expected but was
<"Hi !">.
2 tests, 2 assertions, 2 failures, 0 errors
Tests failed -- this is good
No mutants survived. Cool!
FAQ
So what can Heckle... um... heckle?
In version 1.1, Heckle will create random replacements for Strings, Regexps, Symbols, Ranges, and the Numeric types (Fixnum, Float, Bignum). It will flip true to false and vice versa. It will also flip the branches of if and unless statements, as well as until and while statements.
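To make those mutation kinds concrete, here is a hedged sketch that applies a few of them to simplified nodes in the nested-array style of ParseTree's output. The node shapes, the helper names (mutate, random_string), and treating a while/until flip as swapping the keyword are all assumptions for illustration, not Heckle's code; Regexps, Symbols, Ranges, and Floats are left out for brevity.

```ruby
# Simplified nodes in the nested-array style of ParseTree output, e.g.
# [:str, "Hi there!"], [:lit, 42], [:true], [:if, cond, then_arm, else_arm].
# These exact shapes are an assumption for illustration.

# A random printable string that is guaranteed to differ from `avoid`.
def random_string(rng, avoid)
  loop do
    candidate = Array.new(rng.rand(1..20)) { rng.rand(32..126).chr }.join
    return candidate unless candidate == avoid
  end
end

# Return a mutated copy of one node; unknown node types come back unchanged.
def mutate(node, rng)
  type, *rest = node
  case type
  when :str   then [:str, random_string(rng, rest.first)]
  when :lit
    value = rest.first
    value.is_a?(Integer) ? [:lit, value + rng.rand(1..100)] : node
  when :true  then [:false]
  when :false then [:true]
  when :if    then [:if, rest[0], rest[2], rest[1]] # swap the two arms
  when :while then [:until, *rest]                  # flip the loop keyword
  when :until then [:while, *rest]
  else node
  end
end

rng = Random.new(42)
p mutate([:true], rng) # => [:false]
p mutate([:if, [:ivar, :@ready], [:str, "yes"], [:str, "no"]], rng)
# => [:if, [:ivar, :@ready], [:str, "no"], [:str, "yes"]]
```

Random replacements are drawn so they can never equal the original value, so every mutant really is a change.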
I used Jester and it was really slow. How’s Heckle?
Really very fast. There’s no compile step for Heckle (as there is when you modify Java code with Jester), so the bottleneck is usually your tests. Fast tests mean fast heckling.
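Because Ruby can redefine a method inside the running process, each mutant costs one method redefinition plus one test run. The sketch below shows that cycle (swap in a mutant, run the tests, restore the original) under stated assumptions: Heckle builds its mutants by rewriting the parse tree with ParseTree and RubyToRuby, whereas here the mutant bodies are hand-written source strings, and tests_pass? is a hypothetical stand-in for the real test suite.

```ruby
class Greeter
  def initialize(person = nil)
    @person = person
  end

  def greet
    @person.nil? ? "Hi there!" : "Hi #{@person}!"
  end
end

# Hypothetical stand-in for the two tests from the post.
def tests_pass?
  Greeter.new("Kevin").greet == "Hi Kevin!" && Greeter.new.greet == "Hi there!"
end

# Hand-written mutant bodies. Single quotes keep #{@person} literal here;
# it is interpolated later, when the mutant method actually runs.
MUTANTS = [
  'def greet; @person.nil? ? "uO i" : "Hi #{@person}!"; end',
  'def greet; @person.nil? ? "Hi #{@person}!" : "Hi there!"; end'
].freeze

# Swap in one mutant, run the tests, and always put the original back.
def mutant_killed?(source, original)
  Greeter.class_eval(source)
  !tests_pass?
rescue StandardError
  true # a mutant that makes the tests blow up also counts as caught
ensure
  Greeter.send(:define_method, :greet, original)
end

original = Greeter.instance_method(:greet)
results = MUTANTS.map { |source| mutant_killed?(source, original) }

puts "Killed: #{results.inspect}"             # both mutants are caught
puts "Restored: #{Greeter.new.greet.inspect}" # the original is back in place
```

Restoring through the saved UnboundMethod in an ensure clause keeps every mutation isolated, even when a mutant raises.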
What other options can Heckle take?
The other significant option Heckle takes is --tests. It takes a glob pattern matching the test files that should be loaded, and it defaults to “test/test_*.rb”. If you have lots of test files and really only care about a few for a certain class, you can name just those with --tests to speed things up.
Also, though I didn’t show it in the examples, Heckle can run against a single method if you supply the method name after the class name (for example, heckle Greeter greet).
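The --tests pattern is an ordinary file glob. As a small illustration (the file names are hypothetical, and this uses Ruby's Dir.glob rather than anything inside Heckle), here is which files the default pattern and a narrower one would select:

```ruby
require "tmpdir"
require "fileutils"

default_matches = narrow_matches = nil

Dir.mktmpdir do |root|
  # A throwaway project with two test files and one helper.
  %w[test/test_greeter.rb test/test_farewell.rb test/helper.rb].each do |relative|
    path = File.join(root, relative)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, "")
  end

  Dir.chdir(root) do
    default_matches = Dir.glob("test/test_*.rb").sort  # the default pattern
    narrow_matches  = Dir.glob("test/test_greeter.rb") # only the Greeter tests
  end
end

p default_matches # => ["test/test_farewell.rb", "test/test_greeter.rb"]
p narrow_matches  # => ["test/test_greeter.rb"]
```

The helper file is never picked up, and the narrower pattern loads a single file, which is where the speedup comes from.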
If it modifies code, can’t bad things happen?
Well, yes. Heckle could feasibly break things; it throws crap into your code on purpose. It flips unless and while statements, so infinite loops will probably occur at some point. For the next release I’m planning to add some sort of timeout to avoid that.
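A timeout of that sort could be as simple as wrapping each test run in Ruby's standard Timeout module. This is a hypothetical sketch, not Heckle's implementation: drain, drain_mutant, and run_with_timeout are illustrative names, and drain_mutant is a hand-written mutant with until flipped to while, which spins forever on an empty queue.

```ruby
require "timeout"

# Code under test: remove every item from a queue.
def drain(queue)
  until queue.empty?
    queue.shift
  end
  queue
end

# Hand-written mutant with `until` flipped to `while`. On an empty queue the
# condition never becomes false, so the loop never ends.
def drain_mutant(queue)
  while queue.empty?
    queue.shift
  end
  queue
end

# Hypothetical guard: run one test, but treat a hang as a caught mutant.
def run_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

p run_with_timeout(1) { drain([1, 2, 3]) } # => []
p run_with_timeout(1) { drain_mutant([]) } # => :timed_out (after about a second)
```

Counting a timeout as a failure fits the spirit of heckling: a mutant that hangs the tests has clearly been noticed.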
Additionally, know what your code is doing. If randomly changing a string is going to break things irrevocably in testing, you should probably be stubbing those dangerous methods (e.g., you shouldn’t run Heckle against a method that really deletes files during testing when the file name comes from a string).
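One way to do that stubbing, sketched under assumptions: LogCleaner and FakeFileSystem are hypothetical, and the file-system object is passed in so a test can substitute a double that records deletions instead of performing them. If Heckle then mangles the path string, the mangled path only ends up in the fake's list, not on your disk.

```ruby
# Hypothetical class that deletes a log file whose name is built from a string.
class LogCleaner
  def initialize(file_system = File)
    @file_system = file_system
  end

  def cleanup(name)
    @file_system.delete(File.join("tmp", "#{name}.log"))
  end
end

# Test double: records requested deletions instead of touching the disk.
class FakeFileSystem
  attr_reader :deleted

  def initialize
    @deleted = []
  end

  def delete(path)
    @deleted << path
    1 # like File.delete, report how many names were passed in
  end
end

fake = FakeFileSystem.new
LogCleaner.new(fake).cleanup("nightly")
p fake.deleted # => ["tmp/nightly.log"]
```

A test can then assert on fake.deleted, and a heckled string changes only what gets recorded.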
But, does it work with Rails?
You bet your sweet tests. However, you’ll probably want to heckle individual methods by hand, since Rails adds a whole bunch of methods on the fly (for associations, validations, and other helpers) that you wouldn’t want to heckle.
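To see why, here is a plain-Ruby stand-in with no Rails involved (FlagMacros, has_flag, and Account are hypothetical): a class macro that generates methods when the class is loaded. The generated methods show up right alongside the one written by hand, so naming the method explicitly, as in heckle Account display_name, keeps the run on code you actually wrote.

```ruby
# Hypothetical class macro standing in for Rails helpers such as associations
# and validations: it defines new methods while the class body is evaluated.
module FlagMacros
  def has_flag(name)
    define_method("#{name}?") { instance_variable_get("@#{name}") == true }
    define_method("#{name}!") { instance_variable_set("@#{name}", true) }
  end
end

class Account
  extend FlagMacros
  has_flag :active

  # The only method written by hand.
  def display_name
    "Account"
  end
end

p Account.instance_methods(false).sort
# => [:active!, :active?, :display_name]
```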
Is there rSpec support?
I used Test::Unit for my examples, but I’ve been working with Aslak Hellesoy on the rSpec team to make sure support is there. They’ve added a --heckle flag, which should ship in the next version.
Wait, so this is like… testing my tests?
Basically. Cool, huh?
Thanks
A big thanks to Ryan Davis for starting me on this whirlwind, and to him and Eric Hodel for ParseTree and RubyToRuby. Aslak Hellesoy also deserves recognition for his help refactoring the reporting system and his work with rSpec integration.
I’m really excited about this project, and I think it has a lot to offer the testing world. I’m sure there are bugs, so feel free to report them on the RubyForge tracker.
Help spread the word by digging Heckle.
Comments
Does this rely on ruby2ruby 1.1.2? gem doesn’t find it online, although it looks like one could manually download 1.1.1.
Sorry, looks like there was a problem with the ruby2ruby push last night. I’m investigating.
Yes, this will work with ruby2ruby 1.1.1, but there are bug fixes in 1.1.2 that you should have as soon as you can get it.
Great intro Kevin! See Heckle with RSpec for the RSpec version.
Ruby2Ruby is having gem propagation issues. Feel free to download the gem directly (http://rubyforge.org/frs/download.php/15738/ruby2ruby-1.1.2.gem) and install via gem install ruby2ruby-1.1.2.gem.
Ha! This is so cool! (except for the crashing on half of my classes part :))
Laurel: Thanks for the bug reports earlier! Keep it up as you find them.
I hope we can look forward to more innovations from the magical team at Powerset. If you’re a search engine, you better get ready to get rocked.
Yes, cool. But you know what would be cooler? A way to test these test tests.
I don’t understand floyd—would a program like that be testable?
Heckle will soon be able to heckle itself, if that’s what you’re looking for. It wasn’t able to previously because of a bug in ruby2ruby or ParseTree (I don’t recall) which has since been fixed.