Monday, November 19, 2012

Are you kidding me, IE Driver? Another freaking thing to download?

Every now and again, I come across someone who remembers that the IE driver used to be completely self-contained for all of the supported language bindings, and realizes they need to download IEDriverServer.exe for the driver to work in more recent versions. Often, the question comes up as to why there's now a separate executable. The question is sometimes phrased like this:
"You mean I have to download yet another component to use the IE driver now? I hated it when the Chrome driver forced me to do the same thing, and now you want me to go through even more hoops? Man, you Selenium developer guys really suck!"
Admittedly, the need to download another component is a slight inconvenience. However, there were a few good reasons why this path was chosen.

The original implementation of the native (C++) code of the IE driver was in a .dll. In most of the client language bindings, this .dll was extracted at runtime from whatever packaging solution was appropriate for the language. For Java, this meant extracting it from the .jar; for .NET, this meant extracting it from a resource packed into the WebDriver.dll assembly. Ruby and Python packaging mostly relied on files laid out on disk, so no extraction was needed, just a reference to the path of the .dll. The language binding would then use its native API (JNI for Java, P/Invoke for .NET, ctypes for Python, FFI for Ruby) to load the .dll and call the exposed API for starting the "server" portion of the IE driver.

This worked all right for a time, especially in simple scenarios, but the development team eventually started noticing subtle differences in behavior between language bindings. For example, because of the way the .NET bindings loaded and managed the native .dll, they were able to support multiple simultaneous instances of IE, while the Java bindings were not. The Ruby bindings should have supported multiple instances, because they modeled their native code management after the .NET mechanism, but the native code interface the bindings used, FFI, didn't allow a loaded native library to be unloaded (i.e., it didn't support a call to the FreeLibrary Win32 API). All of this came down to the fact that each language's native code interaction method had slightly different semantics from the others.

Something had to be done to unify the user experience across languages. As it happens, using a separate process is a much more consistent story, because each language binding can use its own process management API. Why does that result in more consistency? Because process management is defined by the operating system (the various versions of Windows, in the case of the IE driver) rather than by each language's native code interface, so every binding ends up starting and stopping the server the same way.
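To make that concrete, here's a rough Python sketch of what "use the process management API" amounts to. It is not the actual code from any of the bindings; the function names start_server and stop_server are made up for illustration, and the --port switch in the usage note is an assumption you should check against the IEDriverServer.exe documentation.

```python
import subprocess


def start_server(command):
    """Launch a driver server as a child process and return its handle.

    `command` is the executable path followed by its arguments.
    (Hypothetical helper, not part of any Selenium binding.)
    """
    return subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def stop_server(process, timeout=5.0):
    """Ask the server to exit, killing it if it ignores the request.

    Returns the exit code of the process.
    """
    if process.poll() is None:      # still running
        process.terminate()
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()          # last resort
            process.wait()
    return process.returncode
```

Usage would look something like start_server(["IEDriverServer.exe", "--port=5555"]), followed by stop_server(...) when the session ends. Every language has an equivalent of these two calls, and they all bottom out in the same operating system facilities, which is the whole point.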

A happy side effect of moving to a separate executable is in the realm of 32-bit vs. 64-bit versions of the browser. It's a limitation of Windows that a 32-bit process cannot load a 64-bit .dll, and vice versa. This meant that if your language runtime was 64-bit, you could only run your WebDriver code against the 64-bit version of IE, and if it was important to you to run your WebDriver code against the 32-bit version of IE, you were out of luck. If this sounds like a far-fetched scenario, I'll point out that this was the default situation for .NET running on a 64-bit version of Windows. However, there is no restriction on a 64-bit process launching a 32-bit process. With the introduction of the standalone executable, it is possible to run WebDriver code against a 32-bit or a 64-bit version of IE, no matter what the "bitness" of your language runtime.
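If you're not sure which "bitness" your own language runtime has, it's usually easy to find out. As a small illustration in Python (the function name runtime_bitness is just something I made up for this sketch), the size of a native pointer tells you:

```python
import struct


def runtime_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8
```

With the old in-process .dll, that number dictated which version of IE you could drive. With the standalone executable, it no longer matters.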

Also, by moving the executable outside the normal delivery mechanism of the language bindings, delivery of the IE driver core is decoupled from full Selenium releases. That means it's possible to ship fixes in the IE driver without having to wait for a full release of Selenium. Since the introduction of the standalone IEDriverServer.exe executable, we've used this ability to deliver bug fixes and functionality updates between releases of the language bindings.

Finally, the standalone executable is vastly easier to debug than a .dll loaded by the language bindings. Attaching to the standalone process is dead-simple from a debugger, and eliminates the guesswork of, "which java.exe (or ruby.exe or python.exe) process do I need to attach to?"

Of course, there is the matter of an extra thing that needs to be downloaded before you can run WebDriver code against IE. One might be tempted to try bundling the executable inside the language binding package like the .dll used to be, but extracting an executable at runtime and attempting to start it is what antivirus scanners just live to scream about. Protip: When you're designing your framework around WebDriver, use a command-line download utility like wget or curl to be able to download the IEDriverServer.exe from the web as you're setting up your environments.

Thursday, November 1, 2012

.NET Bindings: Whaddaymean "No response for URL"?

The Selenium project just pushed out version 2.26.0 for all of its language bindings, after a months-long hiatus between releases. The delay wasn't intentional, but it happened, so it's been a while since the bindings were updated. As usual, the release was accompanied by a post to the user-facing mailing lists for the project. Also as usual, the first reply was asking a question about what didn't get fixed.

The issue, number 3719 in the issue tracker, involves the .NET bindings returning an intermittent failure for some operations. The message text of this error reads, "No response from server for url". The post to the mailing list basically asked why this issue wasn't fixed in the recent release. I had to struggle quite a bit not to summon the snark and respond,
"Sigh. How would you like this particular issue to be fixed?"
The .NET bindings use the .NET Framework's System.Net.HttpWebRequest class for communicating with a remote server that speaks the WebDriver JSON Wire Protocol. I must carefully note here that the term "remote server" can refer to many things. It can refer to an instance of the Java remote WebDriver server. It can also refer to an instance of IEDriverServer.exe or chromedriver.exe, the main components of the Internet Explorer and Chrome drivers, respectively. It can also refer to an instance of the Firefox extension that the FirefoxDriver uses internally to control Firefox. Note that in any of these cases, the "remote server" may, in fact, be running on the same machine as the client bindings. At present almost all of the driver implementations use this architecture of a server component running an HTTP server talking to the client bindings.

So the .NET bindings use the HttpWebRequest class to send commands to the "server" component. To get the response back from the HTTP server, we call the GetResponse() method. Now, in the normal case, everything is just fine. The bindings get a valid response back from the server, interpret that response, and everything moves right along. Sometimes, the method throws a System.Net.WebException, for example when the server is unreachable. The bindings know about that possibility and catch the exception. The exception even has a .Response property on it to allow the bindings to continue to use a valid System.Net.HttpWebResponse to interpret what the remote WebDriver HTTP server is trying to say.

Sometimes, however, the HTTP server doesn't return any response, and it doesn't throw an exception. It just goes off into the aether, never to return. In that case, our response object is null, and here's the real question: What do you expect the .NET bindings to do in that case? The bindings have no idea of the status of the immediately preceding request. They don't know if it succeeded or failed. They don't know if the server is even still breathing or not. That means blindly attempting a retry would be futile at best, and destructive at worst. All the bindings can do is say, "Hey, we sent off a request, like you asked us to, but we didn't get a response back. Don't know what else to tell you, we tried. Sorry it didn't work out."
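The three outcomes described above (a normal response, an exception that still carries a response, and no response at all) can be sketched with Python's urllib as a loose analogy. To be clear, this is not the .NET bindings' code, and the mapping to HttpWebRequest is only approximate; send_command is a hypothetical name, and None stands in for the missing response.

```python
import json
import urllib.error
import urllib.request

# Talk to the driver server directly, ignoring any system proxy settings
# (an assumption that suits a server running on the local machine).
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def send_command(url, payload=None, timeout=10.0):
    """Send one request; return (status, body), or None if nothing came back.

    Hypothetical helper for illustration only.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        # Case 1: the server answered normally.
        with _opener.open(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Case 2: an exception was raised, but it still carries a response
        # we can interpret (HTTPError must be caught before URLError).
        return error.code, error.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Case 3: no response at all. We cannot tell whether the server
        # acted on the request, so we do not retry; we just report it.
        return None
```

Note what the third branch does not do: it doesn't resend the request. If the command was "click this button" and the server actually performed it before going silent, a blind retry would click twice.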

The worst part about this is that it looks like the bindings are at fault. The bindings are only reporting what happened, and I'm not sure what other approach any sane client could possibly take. Of course, you reading this may disagree. If so, and you have concrete ideas for how to solve the problem that don't involve a blind retry or a complete rewrite of the .NET Framework's System.Net.HttpWebRequest class, I'd love to see the implementation. Show me the code; I love receiving patches.