IORCC Deobfuscation: Ruby Quiz Loader by James Edward Gray II

James says that he learned a lot and suffered even more when writing this obfuscated piece of Ruby code, and if you look at even the outermost layer of his code you will have to agree that he's not kidding about the suffering part. He's used quite a few evil tricks in this one.

Let's start by having a look at the complete code:

(The listing is numbered 1 to 21; the excerpts below refer to these line numbers.)
$_=%{q,l= %w{Ruby\\ Quiz Loader}
n,p,a= "\#{q.do#{%w{w a n c}.sort{|o,t|t<=>o}}se.d\x65l\x65t\x65(' ')}.com/",
{"bmJzcA==\n".\x75np\x61ck("m")[0]=>" ","bHQ=\n".\x75np\x61ck((?n-1).chr)[0]=>
:<,"Z3Q=\n".\x75np\x61ck("m")[0]=>:>,"YW1w\n".\x75np\x61ck((?l+1).chr)[0]=>:&},
[[/^\\s+<\\/div>.+/m,""],[/^\\s+/,""],[/\n/,"\n\n"],[/<br \\/>/,"\n"],
[/<hr \\/>/,"-="*40],[/<[^>]+>/,""],[/^ruby/,""],[/\n{3,}/,"\n\n"]];p\165ts"
\#{l[0..-3]}ing...\n\n";send(Kernel.methods.find_all{|x|x[0]==?e}[-1],
"re\#{q[5...8].downcase}re '111112101110-117114105'.scan(/-|\\\\d{3}/).
inject(''){|m,v|v.length>1?m+v.to_i.chr: m+v}");o#{%w{e P}.sort.join.downcase
}n("http://www.\#{n}"){|w|$F=w.read.sc\x61n(/li>.+?"([^"]+)..([^<]+)/)};\160uts\
"\#{q}\n\n";$F.\145\141ch{|e|i=e[0][/\\d+/];s="%2s.  %s"%[i,e[1]];i.to_i%2==0 ?
\160ut\x73(s) : #{%w{s p}[-1]}rint("%-38s  "%s)};p\x72\x69\x6et"\n?  ";e\x76al(
['puts"\n\#{l[0..3]}ing...\n\n"','$c=gets.chomp.to_i'].sort.join(";"));#{111.chr
}pen("http://www.\#{n}"+$F[$c-1][0]){|n|$_=n.read[/^\\s+<span.+/m];#{('a'.."z").
to_a[10-5*2]}.e\141ch{|(z,f)|\x67sub!(z,f)};\147sub!(/&(\\w+);/){|y|p.
ke\171\077($1)?p[$1]:y};while$_=~/([^\n]{81,})/:z=$1.dup;f=$1.dup;f[f.rindex(
" ",80),1]="\n";f.s\165b!(/\n[ \t]+/,"\n");s\165b!(/\#{R\x65g\x65xp.
\x65scap\x65(z)}/,f)end};while\040\163ub!(/^(?:[^\n]*\n){20}/, ""):puts"\#$&
--\x4dO\x52E--";g=$_;g#{"\145"}ts;;#{"excited"[0..4].delete("c")}\040if$_[0]==?q
$_=g;end;$_.d#{"Internet Service Provider".scan(/[A-Z]/).join.downcase
}lay};eval$_

We basically have a huge brace-delimited string with lots of funky embedded escaping and interpolation that gets evaluated. The escapes are fairly self-explanatory (but it's important to know that \# delays interpolation until the string is evaluated), so I'll just have a quick look at the interesting interpolation constructs:

Line 2:
%w{w a n c}.sort{|o,t|t<=>o}

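This sorts the letters "w", "a", "n" and "c" in reverse order, thus producing ["w", "n", "c", "a"]. When this array is interpolated into the string its elements are simply concatenated (that is what Array#to_s did in the Ruby versions of the time), so the method name 'downcase' is produced.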

Line 9:
%w{e P}.sort.join.downcase

Sorts the letters 'e' and 'P' in ASCII order (where the uppercase 'P' comes first), joins them together and downcases the result, thus producing 'pe', which is interpolated into the string to form the 'open' method name.

Lines 14-15:
('a'.."z").
to_a[10-5*2]

See, this is what I was talking about in the introduction. Here James could just have written "a", but that would have been way too wimpish.

Line 19:
"excited"[0..4].delete("c")

I especially enjoyed this one. What an exciting way of exiting!

Line 20:
"Internet Service Provider".scan(/[A-Z]/).join.downcase

Another good one. Extracts the capital letters and downcases them thus producing the "isp" of "display".
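The escapes and interpolation constructs discussed so far can be checked one at a time in irb. The following is only a sketch: the variable names are made up for illustration, and the explicit join on the first array is an addition for newer Ruby versions, which interpolate an array in its inspect form instead of concatenating its elements.

```ruby
# Hex and octal escapes are resolved when the literal is parsed.
hex_escaped   = "\x65v\x61l"   # "eval"
octal_escaped = "p\165ts"      # "puts"

# \# keeps "#{...}" as plain text, so interpolation happens only at eval time.
delayed = %{"\#{1 + 1}"}
puts delayed                   # the text "#{1 + 1}" including the quotes
puts eval(delayed)             # 2

# The interpolation constructs, each evaluated on its own.
reverse_sorted = %w{w a n c}.sort { |o, t| t <=> o }.join
open_middle    = %w{e P}.sort.join.downcase
first_letter   = ('a'.."z").to_a[10 - 5 * 2]
exit_name      = "excited"[0..4].delete("c")
display_middle = "Internet Service Provider".scan(/[A-Z]/).join.downcase

puts "do#{reverse_sorted}se"   # downcase
puts "o#{open_middle}n"        # open
puts first_letter              # a
puts exit_name                 # exit
puts "d#{display_middle}lay"   # display
```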

The hidden code you aren't supposed to see

So here's the code that the eval from above does actually execute — I've inserted a few line breaks to protect against too long lines:

q,l= %w{Ruby\ Quiz Loader}
n,p,a= "#{q.downcase.delete(' ')}.com/",
{"bmJzcA==
".unpack("m")[0]=>" ","bHQ=
".unpack((?n-1).chr)[0]=>
:<,"Z3Q=
".unpack("m")[0]=>:>,"YW1w
".unpack((?l+1).chr)[0]=>:&},
[[/^\s+<\/div>.+/m,""],[/^\s+/,""],[/
/,"

"],[/<br \/>/,"
"],
[/<hr \/>/,"-="*40],[/<[^>]+>/,""],[/^ruby/,""],[/
{3,}/,"

"]];puts"
#{l[0..-3]}ing...

";send(Kernel.methods.find_all{|x|x[0]==?e}[-1],
"re#{q[5...8].downcase}re '111112101110-117114105'.scan(/-|\\d{3}/).
inject(''){|m,v|v.length>1?m+v.to_i.chr: m+v}");
open("http://www.#{n}"){|w|$F=w.read.scan(/li>.+?"([^"]+)..([^<]+)/)};puts"#{q}

";$F.each{|e|i=e[0][/\d+/];s="%2s.  %s"%[i,e[1]];i.to_i%2==0 ?

puts(s) : print("%-38s  "%s)};print"
?  ";eval(
['puts"
#{l[0..3]}ing...

"','$c=gets.chomp.to_i'].sort.join(";"));
open("http://www.#{n}"+$F[$c-1][0]){|n|$_=n.read[/^\s+<span.+/m];
a.each{|(z,f)|gsub!(z,f)};gsub!(/&(\w+);/){|y|p.
key?($1)?p[$1]:y};while$_=~/([^
]{81,})/:z=$1.dup;f=$1.dup;f[f.rindex(
" ",80),1]="
";f.sub!(/
[ 	]+/,"
");sub!(/#{Regexp.
escape(z)}/,f)end};while sub!(/^(?:[^
]*
){20}/, ""):puts"#$&
--MORE--";g=$_;gets;;exit if$_[0]==?q
$_=g;end;$_.display

By the time you read this you are probably wondering why a nice Quiz-managing guy like James would all of a sudden switch to the dark side. Trust me, I'm as shocked as you are. Anyway, let's try to bring order into that chaos by reorganizing whitespace and variable assignment:

(The listing is numbered 1 to 78; the excerpts below refer to these line numbers.)
q, l = %w{Ruby\ Quiz Loader}
n = "#{q.downcase.delete(' ')}.com/"

p = {
  "bmJzcA==\n".unpack("m")[0] => " ",
  "bHQ=\n".unpack((?n - 1).chr)[0] => :<,
  "Z3Q=\n".unpack("m")[0] => :>,
  "YW1w\n".unpack((?l + 1).chr)[0] => :&
}

a = [
  [/^\s+<\/div>.+/m, ""],
  [/^\s+/,           ""],
  [/\n/,             "\n\n"],
  [/<br \/>/,        "\n"],
  [/<hr \/>/,        "-=" * 40],
  [/<[^>]+>/,        ""],
  [/^ruby/,          ""],
  [/\n{3,}/,         "\n\n"]
];

puts "\n#{l[0..-3]}ing...\n\n";

send(Kernel.methods.find_all{|x|x[0]==?e}[-1],
"re#{q[5...8].downcase}re '111112101110-117114105'.scan(/-|\\d{3}/).
inject(''){|m,v|v.length>1?m+v.to_i.chr: m+v}");

open("http://www.#{n}") { |w|
  $F = w.read.scan(/li>.+?"([^"]+)..([^<]+)/)
};

puts "#{q}\n\n";

$F.each { |e|
  i = e[0][/\d+/];
  s = "%2s.  %s" % [i, e[1]];
  i.to_i % 2 == 0 ?
    puts(s) :
    print("%-38s  " % s)
};

print "\n?  ";
eval(
  [
    'puts"\n#{l[0..3]}ing...\n\n"',
    '$c=gets.chomp.to_i'
  ].sort.join(";")
);

open("http://www.#{n}" + $F[$c - 1][0]) { |n| 
  $_ = n.read[/^\s+<span.+/m];

  a.each { |(z, f)| 
    gsub!(z, f)
  };

  gsub!(/&(\w+);/) { |y|
    p.key?($1) ? p[$1] : y
  };

  while $_ =~ /([^\n]{81,})/:
    z = $1.dup;
    f = $1.dup;
    f[f.rindex(" ", 80), 1] = "\n";
    f.sub!(/\n[ \t]+/, "\n");
    sub!(/#{Regexp.escape(z)}/, f)
  end
};

while sub!(/^(?:[^\n]*\n){20}/, ""):
  puts "#$&\n--MORE--";
  g = $_;
  gets;;
  exit if $_[0] == ?q
  $_ = g;
end;

$_.display

It's still quite a chunk of raw code and it still contains a few obscure eval calls, but it's already starting to make sense, isn't it? We'll examine it in small, easily digestible parts.

A few basic variables

Lines 1-2:
q, l = %w{Ruby\ Quiz Loader}
n = "#{q.downcase.delete(' ')}.com/"

This looks simple enough, but it does in fact contain a small gotcha you need to be aware of: That string literal in line 1 is a word list. Word lists are whitespace-delimited and get converted to an array of their words. This one contains an escaped space character, which means that it parses to ["Ruby Quiz", "Loader"], and an array on the right-hand side of a multiple assignment gets auto-splatted. Thus q will be "Ruby Quiz" and l will be "Loader".

The next line is then easily understood: It uses the "Ruby Quiz" in q to construct the string "rubyquiz.com/" and assigns it to the variable n.
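These two lines run unchanged on their own, so the values are easy to confirm. A quick check, which also shows the "Load" slice used later on:

```ruby
# The escaped space keeps "Ruby Quiz" together as one word of the list.
q, l = %w{Ruby\ Quiz Loader}
n = "#{q.downcase.delete(' ')}.com/"

p q          # "Ruby Quiz"
p l          # "Loader"
p n          # "rubyquiz.com/"
p l[0..-3]   # "Load"
```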

Lines 4-9:
p = {
  "bmJzcA==\n".unpack("m")[0] => " ",
  "bHQ=\n".unpack((?n - 1).chr)[0] => :<,
  "Z3Q=\n".unpack("m")[0] => :>,
  "YW1w\n".unpack((?l + 1).chr)[0] => :&
}

This is a hash whose keys are Base64-encoded (MIME) string literals that get decoded via more or less obfuscated unpack("m") calls: both (?n - 1).chr and (?l + 1).chr evaluate to "m". Let's decipher the keys:

Lines 4-9:
p = {
  "nbsp" => " ",
  "lt"   => :<,
  "gt"   => :>,
  "amp"  => :&,
}

Interesting. If we ignore for now the fact that symbols are used where we would expect strings, we can already see that this could be used for unescaping HTML entities.
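The decoding can be reproduced directly. This sketch assumes a newer Ruby, where ?n is a one-character string rather than an integer, so it uses "n".ord to do the same arithmetic; the names directive and keys are made up for illustration.

```ruby
# ?n was the integer 110 in the Ruby of the time; "n".ord gives the same value today.
directive = ("n".ord - 1).chr        # 109.chr, the "m" (Base64) directive
also_m    = ("l".ord + 1).chr        # 108 + 1 is 109 as well

encoded = ["bmJzcA==\n", "bHQ=\n", "Z3Q=\n", "YW1w\n"]
keys    = encoded.map { |s| s.unpack(directive)[0] }

p directive   # "m"
p keys        # ["nbsp", "lt", "gt", "amp"]
```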

Lines 11-20:
a = [
  [/^\s+<\/div>.+/m, ""],
  [/^\s+/,           ""],
  [/\n/,             "\n\n"],
  [/<br \/>/,        "\n"],
  [/<hr \/>/,        "-=" * 40],
  [/<[^>]+>/,        ""],
  [/^ruby/,          ""],
  [/\n{3,}/,         "\n\n"]
];

Hm, this seems to be an ordered regular expression substitution table tailored for converting HTML into plain old text. Note that the first regexp has a /m modifier, meaning that . will also match newlines. The rules themselves are fairly simple:

The one on line 12 will remove a </div> that is preceded by whitespace and on a line by itself, along with the whitespace before it and anything that comes after it. If you have a look at the source code of the RubyQuiz.com index and entry pages you will note that such a </div> comes after the list of entries in the index and after the actual page content on the entry pages.

The substitution on line 13 will just remove whitespace at the start of lines.

The one on line 14 converts single line breaks into paragraph breaks.

The rule on line 15 converts the HTML tag for a line break into a real line break. The page uses HTML line breaks in sample and code listings.

The one on line 16 converts the HTML tag for a horizontal ruler into alternating dashes and equal signs that are exactly 80 characters long. The quiz pages use the <hr /> tag for separating the challenge description and the quiz summary.

On line 17 there's a rule for removing the remaining HTML tags.

I think the rule on line 18 is only used for removing the Ruby source code type markers.

Finally, the last rule on line 19 collapses runs of three or more newlines into exactly two. This ensures that the output is nicely formatted into paragraphs.
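To see the rules working together, they can be applied in order to a small piece of markup. The HTML below is invented for illustration and is not taken from the real pages, and the sketch uses String#gsub on a local variable instead of the $_ tricks of the original.

```ruby
rules = [
  [/^\s+<\/div>.+/m, ""],
  [/^\s+/,           ""],
  [/\n/,             "\n\n"],
  [/<br \/>/,        "\n"],
  [/<hr \/>/,        "-=" * 40],
  [/<[^>]+>/,        ""],
  [/^ruby/,          ""],
  [/\n{3,}/,         "\n\n"]
]

# Hypothetical sample markup: two paragraphs, then the closing </div> and a footer.
html = "  <p>Hello <b>world</b></p>\n  <p>one<br />two</p>\n  </div>\n<p>footer</p>\n"

# Apply every [pattern, replacement] pair in table order.
text = rules.inject(html) { |s, (pattern, replacement)| s.gsub(pattern, replacement) }

p text                   # "Hello world\n\none\ntwo\n\n"
p ("-=" * 40).length     # 80
```

Everything from the </div> line onwards is gone, the tags are stripped, and the <br /> has become a plain line break inside the second paragraph.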

Fetching the pages

Now that we have explained the purpose of all those variables we can finally look at code that actually does something by itself!

Line 22:
puts "\n#{l[0..-3]}ing...\n\n";

This uses the "Load" part of the "Loader" in the l variable to output "Loading..." preceded and followed by an empty line.

Lines 24-26:
send(Kernel.methods.find_all{|x|x[0]==?e}[-1],
"re#{q[5...8].downcase}re '111112101110-117114105'.scan(/-|\\d{3}/).
inject(''){|m,v|v.length>1?m+v.to_i.chr: m+v}");

First of all, this piece of code depends on the order in which methods are returned by Object#methods, and I think that order is not guaranteed to be stable; as far as I know it might change depending on lots of factors. That aside, this is currently equivalent to send("eval", ...).

The code supplied to eval contains another interpolation: This time the "Qui" of the "Ruby Quiz" in q is used to construct the method name require. Let's have a look at the code this executes in more readable form:

require '111112101110-117114105'.
  scan(/-|\d{3}/).
  inject('') { |m, v|
    v.length > 1 ? m + v.to_i.chr : m + v
  }

This code utilizes the scan method to tokenize that cryptic string to ['111', '112', '101', '110', '-', '117', '114', '105']. It then uses inject to accumulate that array into a new string. If the item's length is 1 then it will just append the item to the result. (This is only the case for the dash.) Otherwise it will convert the item into the ASCII character of the item's numerical value.

The argument of require coincidentally happens to be 'open-uri'. The open-uri library lets us get web and other resources by doing open(uri) { |f| f.read } which is quite useful when doing web work.
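The decoding step can be run without the surrounding require. A small sketch, with the intermediate results kept in variables whose names are made up here:

```ruby
encoded = '111112101110-117114105'

# Split into three-digit groups and the literal dash.
tokens = encoded.scan(/-|\d{3}/)

# Turn each three-digit group into its character; keep the dash as it is.
decoded = tokens.inject('') { |m, v| v.length > 1 ? m + v.to_i.chr : m + v }

p tokens    # ["111", "112", "101", "110", "-", "117", "114", "105"]
p decoded   # "open-uri"
```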

Lines 28-30:
open("http://www.#{n}") { |w|
  $F = w.read.scan(/li>.+?"([^"]+)..([^<]+)/)
};

And that's exactly what this code does now that the open-uri library has been loaded: It reads the HTML page at http://www.rubyquiz.com/ and extracts the quiz listing sidebar's contents into the global variable $F as an array of arrays. The sub-arrays are of the form [link, title].
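The scan can be tried offline against a stand-in for the sidebar. The markup and titles below are assumptions made up for illustration; the real page may be structured differently.

```ruby
# Hypothetical sidebar markup, one list item per line.
html = <<~HTML
  <ul>
  <li><a href="quiz1.html">First Quiz</a></li>
  <li><a href="quiz2.html">Second Quiz</a></li>
  </ul>
HTML

# Group 1 is the quoted link, the two dots skip the closing quote and ">",
# and group 2 is the link text up to the next "<".
toc = html.scan(/li>.+?"([^"]+)..([^<]+)/)

p toc   # [["quiz1.html", "First Quiz"], ["quiz2.html", "Second Quiz"]]
```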

The displaying of the table of contents

Lines 32-48:
puts "#{q}\n\n";

$F.each { |e|
  i = e[0][/\d+/];
  s = "%2s.  %s" % [i, e[1]];
  i.to_i % 2 == 0 ?
    puts(s) :
    print("%-38s  " % s)
};

print "\n?  ";
eval(
  [
    'puts"\n#{l[0..3]}ing...\n\n"',
    '$c=gets.chomp.to_i'
  ].sort.join(";")
);

This is quite a bit of code so we'll again split it into smaller parts:

Line 32:
puts "#{q}\n\n";

That code only outputs "Ruby Quiz" followed by an empty line.

Lines 34-40:
$F.each { |e|
  i = e[0][/\d+/];
  s = "%2s.  %s" % [i, e[1]];
  i.to_i % 2 == 0 ?
    puts(s) :
    print("%-38s  " % s)
};

Ah, it's getting more complex now. This iterates over all the sub-arrays of $F which, as we discovered above, contains sub-arrays of quiz URIs and titles.

For every entry it first finds out the quiz number which fortunately is contained in the page URI. It does this via the str[regexp] method form which takes a Regexp and returns the first match.

It then uses format_str % args to format the quiz number and title in a nice way and assigns the result of doing so to s.

Then it checks whether the quiz number is even or odd. An odd-numbered entry is printed without a line break, left-aligned and padded with spaces to 38 characters, while an even-numbered entry is printed with puts, which ends the line. This is necessary because the index screen is two-columned.
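The two-column layout is easier to see when the output is collected into a string instead of printed piece by piece. This is a sketch with invented entries (the links and titles are hypothetical), using the same format strings as the original:

```ruby
# Hypothetical [link, title] pairs in the shape that $F has.
toc = [["quiz1.html", "First Quiz"], ["quiz2.html", "Second Quiz"], ["quiz3.html", "Third Quiz"]]

out = toc.map { |link, title|
  i = link[/\d+/]                       # quiz number taken from the link
  s = "%2s.  %s" % [i, title]
  # Even numbers end a row; odd numbers fill the 38-character left column.
  i.to_i % 2 == 0 ? s + "\n" : "%-38s  " % s
}.join

puts out
```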

Line 42:
print "\n?  ";

This simply displays an input marker ("? ") preceded by an empty line.

Lines 43-48:
eval(
  [
    'puts"\n#{l[0..3]}ing...\n\n"',
    '$c=gets.chomp.to_i'
  ].sort.join(";")
);

This is another slightly tricky part. eval executes the code in the Array sorted by ASCII order and joined together with a semicolon. Because the dollar sorts before letters we get the following code:

$c = gets.chomp.to_i;
puts "\n#{l[0..3]}ing...\n\n"

So that code first reads a number and stores it in the global $c, and then again displays "Loading..." followed by an empty line.
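The effect of the sort is easy to confirm by building the string without evaluating it. In this sketch the single quotes keep the escapes and the interpolation as plain text, just as in the original:

```ruby
snippets = ['puts"\n#{l[0..3]}ing...\n\n"', '$c=gets.chomp.to_i']

# "$" is character 36 and "p" is character 112, so the gets snippet sorts first.
code = snippets.sort.join(";")

puts code
p "$".ord < "p".ord   # true
```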

The quiz itself

Lines 50-78:
open("http://www.#{n}" + $F[$c - 1][0]) { |n| 
  $_ = n.read[/^\s+<span.+/m];

  a.each { |(z, f)| 
    gsub!(z, f)
  };

  gsub!(/&(\w+);/) { |y|
    p.key?($1) ? p[$1] : y
  };

  while $_ =~ /([^\n]{81,})/:
    z = $1.dup;
    f = $1.dup;
    f[f.rindex(" ", 80), 1] = "\n";
    f.sub!(/\n[ \t]+/, "\n");
    sub!(/#{Regexp.escape(z)}/, f)
  end
};

while sub!(/^(?:[^\n]*\n){20}/, ""):
  puts "#$&\n--MORE--";
  g = $_;
  gets;;
  exit if $_[0] == ?q
  $_ = g;
end;

$_.display

This is the largest section of the code. Again, we will use the divide and conquer strategy for understanding it.

Lines 50-51:
open("http://www.#{n}" + $F[$c - 1][0]) { |n| 
  $_ = n.read[/^\s+<span.+/m];

In line 50 the selected quiz page is retrieved: $F is an array of [link, title] tuples and zero-based, which is the reason why $c is decreased by one. So the argument of open is "http://www.rubyquiz.com/" + selected_link.

The code from line 51 then extracts everything from the page title on (marked-up as <span class="title">) to $_.

Lines 53-55:
  a.each { |(z, f)| 
    gsub!(z, f)
  };

This applies the ordered HTML to plain text substitution table a to $_. (Kernel#gsub! is equivalent to $_.gsub!, which Ruby took from Perl and which is only supposed to get used in shell one-liners and obfuscations.)

Lines 57-59:
  gsub!(/&(\w+);/) { |y|
    p.key?($1) ? p[$1] : y
  };

This processes all entities in the HTML source: If the entity unescaping hash in p contains the entity it is unescaped, else it will be left untouched. You might remember that some of the hash's values were actually symbols instead of strings. It's interesting to know that the block form of gsub! will automatically convert the result of the block to a string.
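The symbol-to-string conversion can be observed with String#gsub on an ordinary variable. A minimal sketch; the sample text is made up, and the hash is called entities here only to avoid shadowing Kernel#p:

```ruby
entities = { "nbsp" => " ", "lt" => :<, "gt" => :>, "amp" => :& }

# Hypothetical input with three known entities and one unknown one.
html = "a &lt; b &amp;&amp; c&nbsp;&copy;"

# The block may return a symbol; gsub converts the block result to a string.
text = html.gsub(/&(\w+);/) { |y| entities.key?($1) ? entities[$1] : y }

p text   # "a < b && c &copy;"
```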

Lines 61-68:
  while $_ =~ /([^\n]{81,})/:
    z = $1.dup;
    f = $1.dup;
    f[f.rindex(" ", 80), 1] = "\n";
    f.sub!(/\n[ \t]+/, "\n");
    sub!(/#{Regexp.escape(z)}/, f)
  end
}

This is fairly nice code for rewrapping lines longer than 80 characters.

The while uses a regexp to iterate over all pieces of text that contain no newlines for more than 80 characters. (By the way, James could have used ~regexp here as that will apply the regexp to $_ as well.) Note the colon at the end of the while: this is legal Ruby, but it confused the highlighter and made reformatting the code quite hard...

Lines 62 and 63 create two copies of the matched characters and assign them to z and f.

Line 64 is the interesting part. This uses String#rindex to search for a space from character 80 to the left and replaces it with a newline — which is a very nice way of wrapping text.

Line 65 then ensures that the rewrap does not cause the following line to begin with whitespace.

Because all this was done on a substring and not on the page body itself, the original overlong character sequence is now replaced with the rewrapped one in line 66. I'm not sure if James is using a regexp there just to be obscure, to be backwards compatible, or because he just has not yet gotten used to in_str.sub!(with_str, by_str).

Line 68 finally ends the open call. Note that $_ will still be available outside this block even though the assignment to it was inside the block.
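The wrapping loop also works on a local string. The following is a sketch rather than the original code: rewrap is a made-up helper name, the colon after while is dropped, and the final replacement uses a string pattern instead of an escaped regexp. Like the original, it assumes every overlong run contains a space within its first 81 characters.

```ruby
# Hypothetical helper wrapping the loop from lines 61-67 around a local string.
def rewrap(text)
  text = text.dup
  while text =~ /([^\n]{81,})/
    z = $1.dup                          # the overlong run as found
    f = $1.dup                          # the copy that gets the line break
    f[f.rindex(" ", 80), 1] = "\n"      # last space at or before index 80
    f.sub!(/\n[ \t]+/, "\n")            # no leading whitespace on the new line
    text.sub!(z) { f }                  # put the wrapped copy back
  end
  text
end

long    = ("word " * 30).strip
wrapped = rewrap(long)

puts wrapped
p wrapped.lines.map { |line| line.chomp.length }
```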

Lines 70-78:
while sub!(/^(?:[^\n]*\n){20}/, ""):
  puts "#$&\n--MORE--";
  g = $_;
  gets;;
  exit if $_[0] == ?q
  $_ = g;
end;

$_.display

Ah, the final slice of code. We're almost done understanding all this fancy stuff!

The while loop will keep iterating as long as the regexp is still able to replace exactly twenty lines with nothing.

For every twenty lines matched it will then first invoke the puts statement in line 71. It uses another of those Perlish variables, $&, which refers to the last matched string. In this case that is the string that was just replaced with nothing, which is nifty. So what this does is display the twenty lines, followed by "--MORE--" on a line by itself.

In line 72 it will then back up the $_ variable to g because the gets in line 73 will overwrite it. Line 74 checks whether the user wants to quit (by testing if the user input starts with a 'q').

Finally, it will restore $_ so that the sub! call from line 70 will operate on the right string on the next iteration.

Because the page's line count will likely not be a multiple of twenty, line 78 will display any shorter part that might still be left. (obj.display writes obj to standard output, much like print obj.)
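The paging logic can be followed without a terminal by collecting the pages instead of printing them and waiting for input. This sketch uses String#sub! on a local variable and made-up sample text of 45 lines:

```ruby
# Hypothetical page body: 45 short lines.
text = (1..45).map { |n| "line #{n}\n" }.join

pages = []
# sub! returns nil once fewer than twenty complete lines are left.
while text.sub!(/^(?:[^\n]*\n){20}/, "")
  pages << $&          # the twenty lines that were just removed
end
remainder = text       # what the final display call would print

p pages.size              # 2
p pages[0].lines.size     # 20
p remainder.lines.size    # 5
```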

So after all, this code wasn't that hard to understand, and it's quite nice to see how much you can squeeze into 21 lines of Ruby without even trying too hard.

Florian Groß