Inside IronRuby – Cory Foy

The IronRuby team has been making great progress, and the stuff they are doing is very cool. I thought it would be a neat exercise to dive into what happens behind the scenes to take a snippet of Ruby code and execute it.

First, you’ll need the latest version of the source from RubyForge (above). Go ahead and compile it by running “rake compile”, making sure you run it from a Visual Studio Command Prompt and have installed the Ruby gem pathname2 by running “gem install pathname2”.

With that out of the way, here’s the example we’ll work with, which is similar to one on the wiki:

using System;
using Ruby.Hosting;

namespace CSIronRuby
{
    class Program
    {
        static void Main(string[] args)
        {
            RubyEngine re = RubyEngine.CurrentEngine;
            string script = "puts 'Hello, World!'";
            re.ExecuteCommand(script);
            Console.ReadLine();
        }
    }
}

If we load this up, reference the Ruby and Microsoft.Scripting projects, and hit F5, we’ll see “Hello, World!” printed in a console window, followed by “=> nil”.

But what magic actually happened to cause that to print? Looking at our code, the first thing we had to do was get a RubyEngine instance. Looking at RubyEngine.CurrentEngine, we see that it calls Factory.GetInstance to find out from the current ScriptEnvironment what the correct engine is. And sure enough, if we keep digging, we see that it asks the CurrentManager of ScriptDomainManager for a language provider type, which it then passes to a GetInstance method for setup.

The interesting thing is that the engine that gets loaded isn’t just for Ruby. The engine is actually capable of passing calls for any of the 4 supported languages in the DLR: IronRuby, IronPython, JScript and VBScript.

Once our engine is initialized, we actually load the LanguageProvider, which will be Ruby in this case. Up until now we could have been working with any DLR language – and in fact, it doesn’t look like there is anything necessarily stopping us from having multiple LanguageProviders loaded – appearing to let us mix and match Ruby and Python and other DLR languages. Which is cool stuff – you as an application developer can allow customization using any of the DLR languages pretty much just by including the right DLLs.

We then start getting into options specific to the RubyEngine. The current options look like:

One of the notes is that if ClrDebugging is enabled we have to demand a high trust. However, it looks to be off by default.

Along the way, I see the following snippet of code:

_dict = dictionary ?? new SymbolDictionary();

This is the null-coalescing operator, which, according to the MSDN docs, returns the left-hand operand if it is not null, and otherwise returns the right-hand operand. It is similar to saying:

_dict = (dictionary != null) ? dictionary : new SymbolDictionary();

I learn something new every day. ;)
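For comparison, Ruby code usually reaches for `||` in the same spot. Here is a minimal sketch, with one caveat: `||` falls through on `false` as well as `nil`, so it is a close cousin of `??` rather than an exact match. The `pick_dictionary` helper is hypothetical, and a plain Hash stands in for SymbolDictionary.

```ruby
# Hypothetical helper: return the caller's dictionary, or a fresh one if none was given.
# A plain Hash stands in for SymbolDictionary here.
def pick_dictionary(dictionary)
  dictionary || {}
end

existing = { name: "IronRuby" }
p pick_dictionary(existing)  # the caller's hash comes back unchanged
p pick_dictionary(nil)       # nil falls through to a new empty hash
p pick_dictionary(false)     # unlike ??, false falls through as well
```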

Finally we load up what looks to be the binding rules for the language (although there is a “//TODO: Remove” comment, so we’ll see what happens as we get further down) and we notify the host that the engine is loaded.

We’re now on to line 2. ;) We store the script into a string – pretty boring stuff. Then we get to this:

re.ExecuteCommand(script);

and now the fun begins. The first thing that happens is that it looks to see if the code needs to be compiled, or if we should just execute it. Here, we jump into ExecuteInteractiveCode. This in turn creates a snippet from our engine and the code, and marks it as interactive code. Our string of code actually gets wrapped by a SourceStringContentProvider. We then move on to getting the RubyCompilerOptions which is…an empty class (with a TODO asking if we actually need this class). We finally get to a CompileSourceCode method where we see the juicy ParseSourceCode method. Now the real magic begins.

In order to work effectively with the code, we need to turn it into an Abstract Syntax Tree. So we grab a Parser from the Ruby.Compiler namespace and get to work. During initialization we load up Parser.y – the Yacc spec for Ruby.

With the parser initialized, we actually call Parse. We then do the following:

Enter a lexical scope
Read and Parse the Source
Leave the lexical scope

Reading is easy: our source is stored in the SourceStringContentProvider from above, so our reader is just a normal TextReader. Next we load up a Tokenizer, and call Parse(). Now we’re looping over the tokens to evaluate the actual source. It’s fun stuff to look at. Just know that RhsLenght is probably RhsLength. ;)

We do one pass to initialize some things before starting to pick through the code. I won’t step into everything that is going on, but to give you an idea, here’s a snippet of the Tokenizer:

Yep, they have cases for most every symbol or character you’d come across.

As I said above, now we’ve begun looping through the source code. We first pull out “puts” by looping over the source line until we hit a non-identchar. One thing that is interesting is that you have to take into account multi-byte characters even for source code. Fun stuff.

We’ve now pulled puts off the source, and although we know the next character isn’t an identchar, we have to make sure it isn’t a “!” or a “?”. By convention, the former marks the “dangerous” version of a method, which usually modifies the existing object instead of returning a new one. For example, this:

s = "Hello"
us = s.upcase
puts s
puts us

would output “Hello” and “HELLO”, while this:

s = "Hello"
s.upcase!
puts s

would output “HELLO”. The latter, by convention, marks query methods that return true or false, such as empty?. We have neither in our source code, so the tokenizer keeps going. After making sure we aren’t looking at a global variable ($) or an instance or class variable (@ or @@), it checks whether this is a constant (upper case first letter) or an identifier. Having decided it is an identifier, it checks whether it is a reserved word. The reserved words for Ruby are listed in Tokenizer.cs, but “puts” isn’t among them. We then mark it as a command, add it to the stack, and keep processing.
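To make those checks concrete, here is a rough Ruby sketch of the same decision sequence. It is a toy of my own, not a port of Tokenizer.cs: `scan_word`, `ident_char?` and the short `RESERVED` list are all made-up names, and it only handles ASCII identifiers, so it sidesteps the multi-byte handling mentioned earlier.

```ruby
# Toy classifier. RESERVED is a small sample, not the full list from Tokenizer.cs.
RESERVED = %w[def end if else while class module return].freeze

# ASCII-only simplification of "is this an identifier character?"
def ident_char?(ch)
  !ch.nil? && ch.match?(/[A-Za-z0-9_]/)
end

# Scan one word from the start of `source` and report [kind, text].
def scan_word(source)
  prefix = source[/\A(\$|@@|@)/] || ""   # $global, @@class or @instance sigil
  pos = prefix.length
  pos += 1 while ident_char?(source[pos])

  # A trailing ! or ? belongs to the name (upcase!, empty?), but not in "x!=1".
  if prefix.empty? && ["!", "?"].include?(source[pos]) && source[pos + 1] != "="
    pos += 1
  end
  word = source[0...pos]

  kind =
    if    prefix == "$"             then :global_variable
    elsif prefix == "@@"            then :class_variable
    elsif prefix == "@"             then :instance_variable
    elsif RESERVED.include?(word)   then :reserved_word
    elsif word.match?(/\A[A-Z]/)    then :constant
    else                                 :identifier
    end
  [kind, word]
end

p scan_word("puts 'Hello, World!'")  # => [:identifier, "puts"]
p scan_word("s.upcase!"[2..])        # => [:identifier, "upcase!"]
```

The order of the checks follows the walkthrough above: sigils first, then the reserved word lookup, then constant versus plain identifier.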

We pull off the next token, “Hello, World”, in a similar fashion. Well, that’s a bit of an understatement. We actually have a lot to pull off: "puts", " ", "'", "Hello", ",", "World", "'", and "\0" (for end of file). Each one of those has to be pulled off and then put together to make sense.
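If you want to see a token stream for the same line without a debugger, the standard Ruby interpreter ships a lexer in its `ripper` library. This is MRI's tokenizer, not IronRuby's, so treat it as a point of comparison only: the token boundaries and names will not match the list above exactly.

```ruby
require "ripper"

source = "puts 'Hello, World!'"

# Ripper.lex returns entries shaped like [[line, column], event, text, ...];
# keep just the event name and the text of each token.
tokens = Ripper.lex(source).map { |_pos, event, text| [event, text] }

tokens.each do |event, text|
  puts "#{event.to_s.ljust(20)} #{text.inspect}"
end
```

One difference you should notice: MRI hands back the body of the string as a single string-content token, with separate tokens for the opening and closing quotes.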

Once that is finished, we check for any errors (incomplete token, invalid, etc), and return the syntax tree. We’ve got the tree, so let’s Transform it using an AstGenerator. We create a code block, and then see if we need to create any local variables. None here, so we move on. In the code block, we add methods for GetRfc and GetSelf. With that complete, we bind any Closures.

As I was watching it walk the statements to bind the closures, I found the dump variable had been filled with:

Dump = "#\n# AST \n#\r\n.return (\r\n .action InvokeMember puts(\r\n .bound (#self)\r\n .call RubyOps.CreateMutableString (\r\n .args (\r\n (string) \"Hello, World!\"\r\n )\r\n )\r\n )\r\n)\r\n"

Interesting what a simple statement can be turned into. It’s also interesting how far in we’ve gone – with the closures bound, it checks the flow of the variables, and then returns the compiled statement back up the chain to ExecuteInteractiveCode as an ICompiledCode object. We then call PrintInteractiveCodeResult passing in the result of calling the Evaluate method on our code block.

Some magic happens in Evaluate where it takes the code block (seen in the dump above) and evaluates it. A lot of that magic is in generated code, so we can’t dive into it here. However, if you put a breakpoint on Kernel.PutString, you can see that the puts call gets converted to the Kernel.PutString call. You’ll also notice that by the time Evaluate is finished executing (but before we go into PrintInteractiveCodeResult) “Hello, World” has already been printed on the console.
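Since the real Evaluate lives in generated code, here is a toy stand-in that walks a tree shaped like the dump and ends up calling puts. The node types (`SelfRef`, `StringLiteral`, `InvokeMember`) and the `evaluate` method are names I made up for illustration. They are not DLR classes, and the real thing compiles the tree rather than interpreting it like this.

```ruby
# Toy node types loosely mirroring the dump. These are not DLR classes.
SelfRef       = Struct.new(:value)                 # ".bound (#self)"
StringLiteral = Struct.new(:text)                  # the "(string)" argument
InvokeMember  = Struct.new(:name, :target, :args)  # ".action InvokeMember"

def evaluate(node)
  case node
  when SelfRef
    node.value
  when StringLiteral
    node.text.dup  # hand back a fresh mutable string, in the spirit of CreateMutableString
  when InvokeMember
    receiver  = evaluate(node.target)
    arguments = node.args.map { |arg| evaluate(arg) }
    receiver.send(node.name, *arguments)  # send can reach private methods such as puts
  else
    raise ArgumentError, "unknown node: #{node.inspect}"
  end
end

tree = InvokeMember.new(:puts, SelfRef.new(self), [StringLiteral.new("Hello, World!")])
result = evaluate(tree)  # the greeting is printed while the tree is evaluated
p result                 # puts returns nil
```

As in the debugger session, the text hits the console during evaluation, and all that comes back is the nil that puts returns.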

With our message printed, we dive into PrintInteractiveCodeResult which is pretty straightforward:

protected override void PrintInteractiveCodeResult(object obj) {
    Kernel.Print(_defaultContext, null, "=> ");
    Kernel.PutString(_defaultContext,
                     null,
                     RubySites.Inspect(_defaultContext, obj));
}

The first call, Kernel.Print, outputs the =>. The second inspects the return value of our expression and outputs that. Since we called puts, it returns nil, and that’s what gets printed. I was surprised to see that Kernel.Print prints directly to Console instead of having the output hooked up via a provider. Something that will be changed, I’m sure.
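The same behaviour is easy to mimic in plain Ruby. This is a sketch of the idea, not the IronRuby code: `print_interactive_code_result` is my own name for it, and the `out` parameter is an addition so the output stream can be swapped out.

```ruby
# Hypothetical Ruby rendering of the method above. `out` is injectable so the
# output can go somewhere other than the console.
def print_interactive_code_result(value, out = $stdout)
  out.print "=> "
  out.puts value.inspect
end

result = puts("Hello, World!")         # prints the greeting and returns nil
print_interactive_code_result(result)  # prints "=> nil"
```

Passing the stream in is one way to get the provider-style hookup I was expecting to find.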

And that’s it! We’ve now got our console window staring back at us, waiting for someone to push Enter. It may as well be you.

Note: This post was written just by loading up the source from RubyForge, creating a Visual Studio project, and stepping through it with a debugger. I’m not on the IronRuby team (although, if they need someone…), and I’ll update the post to fix any glaring omissions.

Also, this code is changing every day, so it may not look like this tomorrow. Especially given the number of TODOs I saw. ;)

Enjoy!