Tuesday, July 19, 2011

Selenium 2.0 migration lessons

At Hearsay Social, we've upgraded our testing environment to use Selenium 2. We made the switch because there was enough evidence to suggest a huge 2-4x performance increase. Having learned a few lessons along the way, we thought it would be helpful to share what we found, especially for those who are considering making the transition.
  • Since Selenium 2 is redesigned to leverage what works best for each browser, whether it's an NPAPI plugin in Firefox or a DLL module for IE, we've seen a huge performance gain, especially in Internet Explorer (IE), whose JavaScript engines are much slower. The new approach also lets us run Selenium on IE without the hassle of changing security options, which the older JavaScript-based architecture forced on us because of all the exceptions it threw.
  • Selenium 2 gets closer to simulating the behavior of a real user on a browser. In Selenium 2, the DOM element that is actually clicked is determined by the X/Y coordinates of the mouse event. Therefore, if you attempt to click a DOM element that is hidden or obstructed by another element, the event fires on the topmost element instead, and you may encounter ElementNotVisibleException errors from the Selenium server. Keep this issue in mind when rewriting your tests, since Selenium 1 did not have this restriction. (We use the Django web framework and the popular django-debug-toolbar, which adds a popup overlay to our web application; it has to be disabled during Selenium tests.)
  • We've found that the new Selenium 2 WebDriver-based API is easier to train our developers on. The documentation for Selenium 2 is still somewhat sparse, especially for the updated Python bindings, so digging into the source code (in our case, remote/webdriver.py and remote/webelement.py) is still the best way to learn what API commands are available. While Java developers have access to the WebDriverBackedSelenium class, which can run existing Selenium 1 code while leveraging the WebDriver-based API, we didn't find any similar support for Python. So we took the plunge and refactored most of our tests.
        def tag_name(self):
            """Gets this element's tagName property."""
            return self._execute(Command.GET_ELEMENT_TAG_NAME)['value']
        def text(self):
            """Gets the text of the element."""
            return self._execute(Command.GET_ELEMENT_TEXT)['value']
        def click(self):
            """Clicks the element."""
            self._execute(Command.CLICK_ELEMENT)
        def submit(self):
            """Submits a form."""
            self._execute(Command.SUBMIT_ELEMENT)
        def clear(self):
            """Clears the text if it's a text entry element."""
            self._execute(Command.CLEAR_ELEMENT)
On the server end, it's important to study how the client API sends remote commands by reviewing the JsonWireProtocol document posted on the Selenium wiki, especially since Sauce Labs provides raw logs that show which commands are actually being issued by the client.

  • While experimenting with Selenium 2, we found it much easier to test out the new WebDriver API by downloading and running the Selenium server locally. This way, your connection won't constantly time out as a result of using your Sauce Labs account, giving you more freedom to experiment with all the various commands. If you need to run browser tests against an external site while using your own machine to drive the browser interactions, you can set up a reverse SSH tunnel and then experiment with the Selenium 2 API by setting debugger breakpoints and testing out the API bindings. In the long term, though, you definitely want to use Sauce Labs to host all the virtual machines in the cloud that run your browser tests!
  • If you're interested in using Firebug to help debug your application, Selenium 2 also provides a way to inject Firefox profiles. You can create a Firefox profile with the extension installed, and Selenium 2 includes an API that will zip and base64-encode the profile so it can be sent to the remote host. Note that this approach works best if you're running the Selenium server locally, since using it over a Sauce Labs instance only gives you access to view the video.
  • Selenium 2 continues to be a moving target with its API, so you'll want to keep up to date with any release notes posted on the Selenium HQ blog. Most recently, we found that the toggle() and select() commands have not only been deprecated but removed completely from the implementation. If you try to issue these commands, the Selenium server simply doesn't recognize them and WebDriverExceptions are raised. The best thing to do is look at the Selenium version number; in this example, a three-part version number such as 2.0.0 represents the release candidate of the latest Selenium build. You can also launch the server .jar file with the -debug flag to watch how your client bindings issue API commands to the Selenium server.
20:38:02.687 INFO - Java: Sun Microsystems Inc. 20.1-b02
20:38:02.687 INFO - OS: Windows XP 5.1 x86
20:38:02.703 INFO - v2.0.0, with Core v2.0.0. Built from revision 12817
  • Selenium 1 users will find that the is_text_present() and wait_for_condition() commands no longer exist; they are replaced by a more DOM-driven approach of selecting the element first before firing click() events or retrieving attribute values through get_attribute(). You also no longer need wait_for_condition() for page loads. Instead, you set implicitly_wait() to a timeout limit and rely on find_element_by_id() to wait for DOM elements to appear between page loads.
  • Lastly, we've noticed in the Selenium discussion groups that there are often questions about how to deal with concurrent Ajax requests during your tests. In many test frameworks, there's the concept of setup and teardown of the database between each set of tests. One issue we encountered is that if your browser is issuing multiple requests, you're better off waiting in your teardown function for any outstanding Ajax requests to complete, since the requests could arrive when the database is in an unknown state. If this happens, your Selenium tests will fail and you're going to spend extra time trying to track down these race conditions. If you're using jQuery, you can check the global jQuery.active counter to determine whether to proceed between pages (i.e. execute_script("return jQuery.active === 0")), and keep looping until the condition is satisfied with your own wait_for_condition() implementation.
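Such a wait_for_condition() replacement boils down to a generic polling loop. Here's a minimal sketch (the function name, timeout, and poll interval are our own choices, not part of the Selenium API); with a real driver, the condition would wrap execute_script():

```python
import time

def wait_for_condition(condition, timeout=30, poll_interval=0.5):
    """Poll a zero-argument callable until it returns a truthy value,
    raising if the timeout expires first."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    raise RuntimeError("condition not satisfied within %s seconds" % timeout)

# With a live WebDriver session, the condition would look like:
#   wait_for_condition(lambda: driver.execute_script("return jQuery.active === 0"))
```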
Hope you find these tips helpful for migrating over to Selenium 2. Happy testing!

Issues with Vista/Windows 7 with the Kinesis Keyboard

Kinesis has a few suggestions that may help resolve issues with their keyboards on Vista/Windows 7.

The suggestions seem to be:

1. Press both Shift keys, both Ctrl keys, and both Alt keys to reset.
2. Change advanced power settings (USB) and set to disabled.
3. Replace the firmware (only through Kinesis) and/or firmware board.

2. Advantage keyboard not working after Windows computer goes to sleep mode
We have learned of a widely reported problem with Windows operating systems (XP & Vista) using third-party keyboards (though the problem has been reported with the Microsoft Natural keyboard as well) where the keyboard does not function after the computer enters sleep or hibernation mode. Fortunately there is an easy solution:
Go to Start-Settings-Control Panel, double click Keyboard. Click the Hardware Tab, highlight the "HID Keyboard device" and select Properties. Click Power Management. UNCHECK the box that says "Allow this device to bring the computer out of standby." Click OK. Now when your computer goes to standby or sleep mode, you will simply need to wake the computer by moving your pointing device. Your keyboard should now work perfectly fine.
For Vista users, go to Control Panel, Power Options, click "change plan settings" then "change advanced power settings" then go to "USB settings" and select "disabled."

8. Weird behavior with contoured keyboard
Stuck modifier key. Sometimes the computer misses the upstroke from keys that are held down in key combinations. The result is a "stuck" key. Try pressing both Shift keys, both Ctrl keys, and both Alt keys. If this problem happens to your contoured keyboard more than once every few weeks, you may need a firmware upgrade or new main circuit board.
Check your firmware version (serial numbers 20,000 and higher). Open a text editor other than Microsoft Word (e.g. Notepad, WordPad, or equivalent). Press both Shift keys plus F12. The keyboard will produce a sentence which ends with the firmware version number and version date, such as:
        copyright 1986 - 1998 by interfatron-bbc, ltd., 
rev 2.48 08/13/98.
Circuit board may need replacement. If your contoured keyboard was built before April, 1998, it may need a new main circuit board. The old boards look brown if you look past the thumb keys at the underlying circuit board.  The new circuit boards look green with a gold grid. 

Wednesday, July 13, 2011

sudo race condition in Fabric 1.0.0

If you've used Fabric with sudo, you might notice that keystrokes during the password prompt sometimes don't register. The problem with Fabric's sudo() command is tracked here:


The diff for Fabric 1.0.2 is here:

It appears that the fix basically shuts off the input_loop while the connection is using getpass() to perform an SSH connection to another machine with Fabric. The problem is
that while Fabric claims to be single-threaded, it still creates thread handlers for standard input/output:

        workers = (
            ThreadHandler('out', output_loop, channel, "recv", stdout),
            ThreadHandler('err', output_loop, channel, "recv_stderr", stderr),
            ThreadHandler('in', input_loop, channel, using_pty),
        )

The fix disables input checking so that the getpass() call inside prompt_for_password() can work correctly.
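The underlying conflict is easy to reproduce outside Fabric: two readers sharing one file descriptor race for the same bytes. This toy example (our own illustration, not Fabric's code) shows a background input loop consuming a keystroke that a later getpass()-style read on the same descriptor would never see:

```python
import os
import threading

r, w = os.pipe()
stolen = []

def input_loop():
    # Stands in for Fabric's 'in' ThreadHandler: it greedily reads
    # from the shared descriptor in the background.
    stolen.append(os.read(r, 1))

t = threading.Thread(target=input_loop)
t.start()
os.write(w, b"s")   # the first keystroke of the sudo password
t.join()

# The byte went to the background loop; a getpass()-style read on the
# same descriptor would now block, waiting for input that's already gone.
print(stolen)       # [b's']
```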

Tuesday, July 12, 2011

Pruning GitHub / merged

Tired of all your outdated branches? Now there's a way:
git branch --merged | grep -v "^* " | awk '{print "git branch -d " $1 "; git push origin :" $1}' | sh
What it boils down to:
git branch --merged = shows all the merged branches
grep -v "^* " = excludes the current branch (you can't delete a branch you're currently on)
awk '{print "git branch -d " $1 "; git push origin :" $1}' = prints the commands to delete each merged branch locally and on GitHub.
If you want to double-check, take away the last "| sh" to verify everything works. You have been warned.

You may get some errors if a branch that has already been merged no longer exists on your GitHub origin.
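You can sanity-check the pipeline's filtering against canned `git branch --merged` output without touching a repository (the branch names below are made up):

```shell
printf '  feature-a\n* master\n  hotfix-1\n' \
  | grep -v "^* " \
  | awk '{print "git branch -d " $1 "; git push origin :" $1}'
```

The `* master` line is dropped, and each remaining branch yields a local delete plus a remote delete command.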

Django and large datasets

If you've worked with large datasets in Python/Django, memory consumption can hit 2GB on dev, even with settings.DEBUG = False.

Django's documentation on QuerySets hints at the issue but does little to warn about big datasets:


Caching and QuerySets
Each QuerySet contains a cache, to minimize database access. It's important to understand how it works, in order to write the most efficient code.

In a newly created QuerySet, the cache is empty. The first time a QuerySet is evaluated -- and, hence, a database query happens -- Django saves the query results in the QuerySet's cache and returns the results that have been explicitly requested (e.g., the next element, if the QuerySet is being iterated over). Subsequent evaluations of the QuerySet reuse the cached results.

Keep this caching behavior in mind, because it may bite you if you don't use your QuerySets correctly. For example, the following will create two QuerySets, evaluate them, and throw them away:

...so if you do queries against big datasets, e.g.:

for item in MyTable.objects.all():

...memory consumption can hit the roof. The culprit is iterating through all the objects (475,000 of them): as the iteration proceeds, self._result_cache in the QuerySet object just keeps appending. Even though Django fetches 100 rows at a time from MySQL, each iteration adds to the result cache, so you end up caching all 475K objects. No clearing, nada.
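The growth is easy to model. This toy class is our own simplification, not Django's code, but it mimics the _result_cache behavior described above: rows arrive in chunks of 100, every chunk is appended to the cache, and nothing is ever evicted:

```python
import itertools

ITER_CHUNK_SIZE = 100

class CachingQuerySet(object):
    """Toy model of Django's QuerySet result caching."""
    def __init__(self, rows):
        self._iter = iter(rows)
        self._result_cache = []

    def __iter__(self):
        pos = 0
        while True:
            # Serve anything already cached first.
            while pos < len(self._result_cache):
                yield self._result_cache[pos]
                pos += 1
            # Fetch the next chunk and append it -- the cache only grows.
            chunk = list(itertools.islice(self._iter, ITER_CHUNK_SIZE))
            if not chunk:
                return
            self._result_cache.extend(chunk)

qs = CachingQuerySet(range(475))
for row in qs:
    pass
print(len(qs._result_cache))   # 475 -- every row is still held in memory
```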

Disqus approached the problem by implementing a SkinnyQuerySet to avoid storing extra data in memory, though it's not clear to me that the changes they discussed were tested against a large dataset (i.e. without using LIMIT range queries). See p. 25 at http://www.scribd.com/doc/37072033/DjangoCon-2010-Scaling-Disqus (or the source code for SkinnyQuerySet at https://gist.github.com/550438). FYI: the code also prevents you from calling list() on a QuerySet.

The code that does all this stuff is here (ITER_CHUNK_SIZE is 100):
 def _result_iter(self):
     pos = 0
     while 1:
         upper = len(self._result_cache)
         while pos < upper:
             yield self._result_cache[pos]
             pos = pos + 1
         if not self._iter:
             raise StopIteration
         if len(self._result_cache) <= pos:
             self._fill_cache()

 def _fill_cache(self, num=None):
     """
     Fills the result cache with 'num' more entries (or until the results
     iterator is exhausted).
     """
     if self._iter:
         try:
             for i in range(num or ITER_CHUNK_SIZE):
                 self._result_cache.append(self._iter.next())
         except StopIteration:
             self._iter = None
The iterator() method fixes the problem of Django caching results over time, but I realized the problem appears on the very first iteration of a QuerySet. Looking at an object reference graph (you have to install the objgraph library), I inspected one of these tuples and saw that it belongs to the rows already loaded into MySQLdb's _rows attribute after just one QuerySet iteration. So basically any QuerySet, even with iterator(), will store the entire SQL result in memory.
(Pdb) import objgraph
(Pdb) objgraph.show_most_common_types(limit=20)
tuple                      41682

(Pdb) next
-> if not result_type:
(Pdb) import objgraph
(Pdb) objgraph.show_most_common_types(limit=20)
tuple                      419858
What ultimately happens is that Django generates the SQL command and sends it over to the Python MySQLdb wrapper, which eventually calls cursor.execute(), which calls store_result(), which stores the entire result set in memory (see the store_result() reference in http://mysql-python.sourceforge.net/MySQLdb.html). The tradeoff is either to load everything into memory or to fetch one row at a time. Obviously we have more control if we do LIMIT ranges to try to balance between too much memory and too many DB queries.
        cursor = self.connection.cursor()
        cursor.execute(sql, params)

"""This is a MixIn class which causes the entire result set to be
stored on the client side, i.e. it uses mysql_store_result(). If the
result set can be very large, consider adding a LIMIT clause to your
query, or using CursorUseResultMixIn instead."""

def _get_result(self): return self._get_db().store_result()
The solution seems to be an implementation similar to query_iterator, where you apply LIMITs up front on the QuerySet (you can also use iterator() to avoid caching the result set): in other words, a combination of the Disqus approach and this snippet: http://djangosnippets.org/snippets/1949/
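Here's a sketch of the LIMIT-range idea with the Django parts abstracted away (the fetch callable is hypothetical; against a real QuerySet it would be something like qs[offset:offset + limit], which compiles to LIMIT/OFFSET SQL). Only one chunk is held in memory at a time:

```python
def limit_range_iterator(fetch, chunk_size=100):
    """Yield rows by repeatedly requesting the next LIMIT-sized slice."""
    offset = 0
    while True:
        rows = fetch(offset, chunk_size)
        for row in rows:
            yield row
        if len(rows) < chunk_size:   # short chunk means we're done
            return
        offset += chunk_size

# Stand-in for a database table; fetch() mimics qs[offset:offset + limit].
data = list(range(475))
fetch = lambda offset, limit: data[offset:offset + limit]
print(sum(1 for _ in limit_range_iterator(fetch)))   # 475
```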

Selenium 2 RC3 and Python bindings have removed toggle() and select()...

-    def toggle(self):
-        """Toggles the element state."""
-        resp = self._execute(Command.TOGGLE_ELEMENT)
-        return resp['value']
 
     def is_selected(self):
         """Whether the element is selected."""
         return self._execute(Command.IS_ELEMENT_SELECTED)['value']
 
-    def select(self):
-        """Selects an element."""
-        self._execute(Command.SET_ELEMENT_SELECTED)

IEDriver.dll changes:

Friday, July 8, 2011

How Facebook invalidates fbs_ cookies through all.js...

If you've ever had to integrate with Facebook Connect, which allows third-party web sites to rely on Facebook for user authentication, you may find yourself wanting to test whether your integration works. You can obviously use Selenium to automate the login process through the popup window, but what if you just wanted to try to skip this step and set the cookie yourself?

Each time you log in through the Facebook popup window, the JavaScript code sets an fbs_ cookie. The xxx in the fbs_xxx cookie refers to the Facebook API key, and the cookie's contents are signed with your application's secret key. If you have an infinite session key and a valid application access token, you can create your own fbs_xxx cookie. You can basically reverse the process that is normally done by the get_user_from_cookie() function from the Python Graph API library:
import cgi
import hashlib
import time

def get_user_from_cookie(cookies, app_id, app_secret):
    cookie = cookies.get("fbs_" + app_id, "")
    if not cookie: return None
    args = dict((k, v[-1]) for k, v in cgi.parse_qs(cookie.strip('"')).items())
    payload = "".join(k + "=" + args[k] for k in sorted(args.keys())
                      if k != "sig")
    sig = hashlib.md5(payload + app_secret).hexdigest()
    expires = int(args["expires"])
    if sig == args.get("sig") and (expires == 0 or time.time() < expires):
        return args
    else:
        return None
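Reversing it is just the signing step run forward: concatenate the sorted k=v pairs (excluding sig), append the app secret, and take the md5 hex digest. This is a hedged sketch in modern Python 3 (make_fbs_cookie is our own name, and a real cookie carries more session fields than shown here):

```python
import hashlib
from urllib.parse import quote

def make_fbs_cookie(args, app_secret):
    """Build a self-signed fbs_-style cookie value, signing the args
    the same way get_user_from_cookie() verifies them."""
    payload = "".join(k + "=" + str(args[k]) for k in sorted(args)
                      if k != "sig")
    sig = hashlib.md5((payload + app_secret).encode()).hexdigest()
    signed = dict(args, sig=sig)
    # Serialize back into the quoted query-string format the parser expects.
    return '"' + "&".join(k + "=" + quote(str(v))
                          for k, v in sorted(signed.items())) + '"'
```

As the rest of this post explains, a cookie like this verifies locally but Facebook still invalidates it server-side.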
However, even when using a valid cookie signed by your app, it appears that Facebook invalidates the token. You end up seeing the same login popup window, but if you remove the all.js library and use a fake substitute, you'll notice that your app all of a sudden works. So Facebook is doing something internally to disallow self-signed fbs_ cookies.
function do_nothing(a, b, c, d, e, f) {
    return true;
}

var FB = {
    'init': do_nothing,
    'getLoginStatus': function () {},
    'login': do_nothing
};
When you first run the init() function of the Facebook JavaScript SDK, it loads the cookie that you set (fbs_ + API_KEY) and sets FB._session to correspond to those cookie values. The problem is that another auth.request command is then sent to Facebook. This auth.request command tries to verify whether a session key was obtained from Facebook. If it fails, the FB._session object is cleared and that popup window appears asking you to log in. The Facebook JavaScript performs this auth.request by injecting an <iframe src="http://www.facebook.com/extern/login_status.php?"> with three different callback functions in the URL query string (no_user, no_session, and ok_session).

The result of this iframe request is used to invoke the response handler, which is either the no_user, no_session, or ok_session xdResponseWrapper. If the no_user or no_session wrapper fires, the 'connected' state will not be set internally and the session will be cleared. The code that invalidates the session lives inside the cross-domain (aka 'xd') xdResponseWrapper function inside all.js:
xdResponseWrapper: function(a, c, b) {
   return function(d) {
     try {
       b = FB.JSON.parse(d.session);
     } catch (f) {}
     if (b) c = 'connected';
     var e = FB.Auth.setSession(b || null, c);
     e.perms = d && d.perms || null;
     a && a(e);
   };
},
When the no_user response is called, d.session is null, which FB.Auth.setSession() then uses to clear the FB._session object. You can't easily set a breakpoint on this because the Facebook all.js code generates the function dynamically, so the flow isn't apparent unless you insert alert() statements. But if you do, you will observe that the no_user result is returned when you attempt to just set the cookie and log in to your site.

It looks like Facebook has implemented this policy to prevent people from using robots to log in to their site directly. It also appears that there are additional cookies that must be set (datr, lsd, c_user, h_user, lxe, and xs) in order for login_status.php to return the ok_session callback. You can observe this behavior by monitoring the cookie traffic when you log in through the popup window.

Apparently the 'xs' cookie is a well-known Facebook cookie too: http://www.duke.edu/~jyw2/wwwsecurity.html