Tuesday, May 31, 2011

Python and the multiprocessing module

Great writeup here:

A video talk about the Python Global Interpreter Lock

Monday, May 30, 2011

Using PGCEdit and ISOMaster to edit DVD chapter ordering..

If you've ever received a DVD with the chapter sequencing out of order, here's how I managed to fix things myself. One motivation was that I wanted to see whether the issue could be fixed by editing the metadata stored in the DVD's .IFO file. I didn't want to get into the business of reauthoring the DVD and degrading the video quality.

I noticed that the chapters were spliced correctly but playback did not advance from Chapter 4 to Chapter 5, so I did some probing to understand why the bug was happening. A program called PGCEdit lets you play back and trace how a DVD player would interpret the IFO file. In Trace mode you can step through the sequence, watching the data in the general purpose memory registers (GPRM) change as the player executes the commands that launch different sections of the DVD.

For whatever reason, there is an instruction (perhaps for when the DVD needs to jump to the next layer) that jumps back to "cell 4", the last clip you were watching, instead of moving on to the next clip. I changed the target to cell 5, and voilà, everything works.

I then used Ubuntu ISO Editor which makes it easier to re-create ISO files from existing ones and re-burned a copy. The advantage of using ISO Editor is that you can easily add/remove files from an existing ISO file, and then generate a new one very quickly.

Wednesday, May 25, 2011

Using celeryev

You can either start celeryd with the -E option, or enable events at runtime by using celeryctl:

celeryctl inspect enable_events

Querying for Django date ranges

MySQL provides a simple DATE() function that allows you to convert a DATETIME to a date (http://stackoverflow.com/questions/5182275/datetime-equal-or-greater-than-today-in-mysql), but how does one do so in Django?

Apparently, you have to use the __range lookup together with datetime.datetime.combine(), datetime.time.min, and datetime.time.max:

{'date_field__range': (datetime.datetime.combine(date, datetime.time.min), datetime.datetime.combine(date, datetime.time.max))}

The generated SQL then contains a clause like this:

`myapp_mytable`.`date` BETWEEN 2011-05-25 00:00:00 and 2011-05-25 23:59:59
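
As a minimal sketch of the approach above, the day boundaries can be built with the standard library alone. The helper names day_bounds and range_filter are made up for this illustration, and the code is written for Python 3.

```python
import datetime


def day_bounds(day):
    """Return (start, end) datetimes covering the whole of `day`."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = datetime.datetime.combine(day, datetime.time.max)
    return start, end


def range_filter(field_name, day):
    """Build keyword arguments suitable for QuerySet.filter(**kwargs)."""
    return {field_name + "__range": day_bounds(day)}


# 'date_field' is the field name used in the snippet above.
filter_kwargs = range_filter("date_field", datetime.date(2011, 5, 25))
print(filter_kwargs)
```

With a Django model (say a hypothetical MyModel), the result would be used as MyModel.objects.filter(**filter_kwargs).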

Sunday, May 22, 2011

Python unittest and Hudson

Hudson uses the exit code of the last command in your script to determine whether your build has failed (http://stackoverflow.com/questions/4334591/setup-a-test-build-job-in-hudson-that-detects-when-make-fails-to-compile). How does this affect things if you're running Python/Django unit tests, or more importantly, if you run Selenium tests against multiple browsers? Python's unittest sets the exit code from the wasSuccessful() method, which in turn depends on whether any failures/errors occurred:

 sys.exit(not result.wasSuccessful())

...where wasSuccessful() is defined as:

    def wasSuccessful(self):
        "Tells whether or not this result was a success"
        return len(self.failures) == len(self.errors) == 0

Now suppose you build your own test runner, most notably if you're running Selenium to launch multiple browsers:

_environments = [
    # Firefox runs faster than IE7, so check first even though our customers use the latter browser more.
    {'platform' : "WINDOWS",
     'browserName' : "firefox",
     'version' :  "3.6"
    },
    {'platform' : "WINDOWS",
     'browserName' : "iexplore",
     'version' :  "7"
    },
]

    # Method of the SeleniumTestSuiteRunner class.
    def run_suite(self, suite, **kwargs):
        r = None
        for env in _environments:
            print "Running %s" % env

            # We need Django nose to run XML coverage outputs on Hudson.  For multiple browser support, we need
            # to change the --xunit-file parameter.
            if hasattr(settings, 'NOSE_ARGS'):
                nose_utils.swap_nose_args(suite, env)

            selenium_cfg.current_env = env
            test = super(SeleniumTestSuiteRunner, self).run_suite(suite, **kwargs)
            if not r:
                r = test
            else:
                r.failures.append(test.failures)
                r.errors.append(test.errors)
        return r

If we use the code above as the test runner, Hudson will always report a build failure. Why? It turns out we need to use extend() instead of append(). With append(), each run's list is added as a single element, so we end up nesting lists within a list. Even an empty failures list then makes len(self.failures) nonzero, and wasSuccessful() returns False:

>>> a = []
>>> a.append([])
>>> a
[[]]
>>> a.append([])
>>> a
[[], []]

The fix is shown below.

             if not r:
                 r = test
             else:
-                r.failures.append(test.failures)
-                r.errors.append(test.errors)
+                r.failures.extend(test.failures)
+                r.errors.extend(test.errors)
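
The effect of the fix can be reproduced outside Hudson with plain unittest.TestResult objects. This is only an illustrative sketch in Python 3; the two merge helpers are hypothetical names, not part of the test runner above.

```python
import unittest


def merge_with_append(total, other):
    # Buggy: adds the whole list as a single element, so an empty
    # failure list still makes len(total.failures) == 1.
    total.failures.append(other.failures)
    total.errors.append(other.errors)


def merge_with_extend(total, other):
    # Correct: adds each entry individually, so two clean runs
    # leave both lists empty.
    total.failures.extend(other.failures)
    total.errors.extend(other.errors)


buggy, fixed = unittest.TestResult(), unittest.TestResult()
second_run = unittest.TestResult()  # a clean run: no failures, no errors

merge_with_append(buggy, second_run)
merge_with_extend(fixed, second_run)

print(buggy.failures, buggy.wasSuccessful())  # [[]] False
print(fixed.failures, fixed.wasSuccessful())  # [] True
```

Since the exit code comes from not result.wasSuccessful(), the append() version exits non-zero even when every test passed.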

Thursday, May 19, 2011

Injecting Firebug into a remote Firefox instance on Selenium 2

There are ample instructions about how to instantiate Firefox with Firebug on a localhost, but what if you're trying to do so with a reverse SSH tunnel (see http://hustoknow.blogspot.com/2010/10/using-selenium-tests-with-djangopython.html) and don't intend to be running things on your localhost? Well, the Selenium 2 developers provide a way, albeit not very well documented. Apparently there is an API command to base-64 encode a .zip file that can be transferred over the network to the remote browser.

1. First, create the profile on your local machine:

firefox -ProfileManager --no-remote

2. Install the Firebug add-on, enabling the Net/Console panels after the plug-in is installed (so these settings get saved inside your prefs.js file).

3. Tar up the profile directory (stored inside ~/.mozilla/firefox with a special hash prefix, e.g. ~/.mozilla/firefox/60f709x.selenium) and transfer it to the machine that will invoke the Selenium tests.

(At the time of this writing, you also must add a user.js file to the profile directory, since the Selenium Python bindings expect to find this file even though this file does not normally get created by default -- see patch submitted at http://code.google.com/p/selenium/issues/detail?id=1692 and http://kb.mozillazine.org/User.js_file)

4. Unzip the file into a directory.

Then the profile can be added as follows:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

profile = FirefoxProfile(profile_directory="/home/myuserid/.mozilla")
profile.update_preferences() # increases the num of simultaneous connections among other things

# The profile is passed to the remote browser along with the desired capabilities.
selenium_browser = webdriver.Remote(desired_capabilities=DesiredCapabilities.FIREFOX,
                                    browser_profile=profile)

The Python bindings for Selenium 2 will take the directory, zip it up, base-64 encode it, and transfer it over the network to the Selenium RC server. Note that the old Selenium 1.0 way of using java -jar selenium-server.jar -firefoxProfileTemplate "<selenium Profile Directory>" (as described in http://girliemangalo.wordpress.com/2009/02/05/creating-firefox-profile-for-your-selenium-rc-tests/) seems to be ignored for WebDriver-based Selenium 2 tests, so if you're planning on using only Selenium 2, the approach described here seems to be the way things work.
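
To make the "zip it up, base-64 encode" step concrete, here is a rough standard-library sketch of the idea in Python 3. It is not the Selenium implementation; encode_directory and decode_directory are hypothetical helpers that only illustrate how a directory can be turned into text that travels over the wire and unpacked again on the other side.

```python
import base64
import io
import os
import zipfile


def encode_directory(path):
    """Zip every file under `path` in memory and return base-64 text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the directory root.
                archive.write(full_path, os.path.relpath(full_path, path))
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def decode_directory(encoded, destination):
    """Reverse of encode_directory: unpack the archive into `destination`."""
    raw = base64.b64decode(encoded)
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        archive.extractall(destination)
```

Because the result is plain ASCII text, it can be embedded in a JSON request body, which is presumably why base-64 is used for the transfer.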

Tuesday, May 17, 2011

Migrating from Selenium 1.0 to Selenium 2.0

1. The new Selenium 2.0 should run much faster (2-4x?) because it incorporates WebDriver (Google has a good writeup explaining the differences: http://google-opensource.blogspot.com/2009/05/introducing-webdriver.html).

2. A lot of the esoteric Selenium commands are gone, replaced with a nice API exposed in http://code.google.com/p/selenium/wiki/JsonWireProtocol. A few changes to help port your code over from Selenium v1.0 to Selenium 2.0 (see example below):
  • The is_text_present() command is no longer provided. You need to use a combination of XPath selectors (i.e. a[text()='abc']) instead of attempting to scan the entire DOM tree for text. Firefox seems to crash if you attempt to do find_element_by_xpath("//html/body").text.find(txt).
  • The get_eval() command that executes any arbitrary JavaScript command is now execute_script(). You must also explicitly define what value you wish to return (i.e. execute_script("return document.activeElement")).
  • The open() command is now get()
  • There is no longer a wait_for_condition() command. You have to implement your own to wait for a condition to be satisfied:
    def wait_for_condition(self, js_cmd):
        # Selenium 2 no longer has a wait_for_condition, so we have to create our own polling routine.
        for i in xrange(30):
            ret = self.selenium.execute_script(js_cmd)
            if ret is True:
                return True
            time.sleep(1)  # pause between polls (requires "import time")
        print "JavaScript command %s did not finish completing (ret_val=%s)" % (js_cmd, ret)
        raise Exception("Timed out waiting for: %s" % js_cmd)
    You can then define a function that waits for jQuery Ajax requests to complete by checking the jQuery.active counter. This counter is incremented whenever you invoke a $.ajax() call and decremented when an Ajax call completes.
    def js_wait(self):
        self.wait_for_condition("return jQuery.active === 0")
  • You do not need wait_for_page_load(), but you do need to invoke implicitly_wait() and set a max timeout (in seconds) for detecting when certain DOM elements exist. Selenium 2.0 has no way of detecting when your JavaScript code has finished loading, so you need to write find_element_by_id() or find_element_by_xpath() statements that search for DOM elements that should appear.
3. The Java version provides a backwards-compatible API on top of the new WebDriver toolkit, but the paradigms are slightly different, so it helps simply to port things over. There also isn't a very good selenium package that uses the same commands. While you can still use the old Selenium 1.0 commands, switching to the WebDriver-based Selenium 2.0 will considerably improve your test run times.

4. If you're using Firefox on Windows, keep in mind that native extensions are used to trigger click and other mouse-related events. This means that you should try your web application in a 1024x768 window, since Selenium opens the web browser at this size and sends the X/Y coordinates to the browser to fire the mouse event. If you have DOM elements that are hidden, Selenium v2.0 also checks for this and raises an exception. One such issue is the Django debug toolbar, which can overlap your buttons and cause events not to fire (see http://hustoknow.blogspot.com/2011/05/selenium-2-and-django-toolbar.html). SauceLabs has a basic intro here: http://saucelabs.com/docs/selenium2
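
The polling idea behind wait_for_condition() in item 2 can also be written as a small framework-independent helper. This is only a sketch in Python 3 under my own assumptions: wait_until, its parameters, and the RuntimeError it raises are made up for illustration and are not part of Selenium.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Call `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.time() + timeout
    while True:
        if condition() is True:
            return True
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        # Sleep between polls so the browser is not hammered with requests.
        time.sleep(interval)
```

With a WebDriver instance named driver (an assumption here), the jQuery check would look like wait_until(lambda: driver.execute_script("return jQuery.active === 0")).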

Here's an example of migrating your Selenium 1.0 code to Selenium 2.0.

Old Selenium v1.0:
sel = selenium
sel.type("news_title", "My News")
sel.type("news_content", "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur sollicitudin faucibus magna quis varius. Aliquam erat volutpat. Morbi sit amet turpis at tellus consectetur aliquet. Tempus posuere interdum. Fusce eget orci risus.\n\nSed a diam ipsum. Duis faucibus blandit libero eget dictum. In non est justo, sit amet sollicitudin ipsum. Proin consequat, arcu sit amet venenatis dictum.")

Selenium v2.0:
sel = selenium
sel.find_element_by_id("news_title").send_keys("My News")
self.assertTrue(sel.find_element_by_xpath("//ul[@id='panels']/li[contains(@class, 'wrapper')]/H2[text()='Engagement']"))

Selenium 2 and the Django debug toolbar

One issue when using Selenium 2 with the Django debug toolbar is that Firefox on Windows uses native code to mimic user behavior. This allows Selenium to model the way your users interact with your web app.  When you signal a click on an element, the element is first scrolled into view (element.ScrollToView()) before the focus is triggered (via element.focus()).  For native implementations, Selenium then computes the X/Y coordinate of the element (using getClientRect() to get the absolute top/left positions, plus half the height/width of the element) and clicks at that point rather than just triggering an event on the actual element. If you're curious about the native implementation behavior, the code can be found in firefox-driver/js/utils.js and firefox-driver/extension/components/wrappedElement.js. You can download the Selenium source code from http://seleniumhq.org/download/source.html.

With Selenium 2's WebDriver implementation, the window that is opened can leave the Django debug toolbar overlapping your page, causing your clicks to misfire if the toolbar covers the element you are clicking. If you inspect document.activeElement, you'll notice the focus is on the Django toolbar rather than the element you intended to click. You may not notice this issue during your own manual testing, since Selenium opens a smaller window (1024px), while a larger window provides ample room for the toolbar to render without conflicting with your icons.

WebDriver offers user emulation via Javascript on all platforms, but on Windows "native events" are used. Specifically, sending keys to Firefox on Windows involves generating the Windows-specific events to make Firefox think it received user input. The same should be possible on Linux. As on Windows, it should be possible to run more than one instance of the Firefox driver at a time without the two interfering with each other. The final requirement, which is what makes native events on every platform challenging, is that the browser should not require focus in order to work properly --- it's useful to read email while the tests are running.


One way would be to use the 'djdt' cookie to keep the debug toolbar panel from showing when your page loads, but Selenium v2.02b3 has a bug in add_cookie().  As a result, if you are using the Django debug toolbar with your app, you will be unable to set the 'djdt' cookie to 'True' to prevent the toolbar from being rendered.


The temporary solution appears to be to disable the Django toolbar, or to execute hide() on the toolbar.  You can also issue a window.resizeTo() to enlarge the width of the browser, but this command does not appear to work on SauceLabs.

selenium_browser = webdriver.Remote(desired_capabilities=current_env, command_executor="http://myhost:80/wd/hub")
selenium_browser.execute_script('$("#djDebug, #djShowToolBar, #djDebugToolbarHandle").hide();')

One other approach is to set DEBUG_TOOLBAR_CONFIG inside your settings.py file (or to add some logic that checks the requesting IP address against INTERNAL_IPS).
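
A rough sketch of what such a settings.py entry could look like follows. It assumes a django-debug-toolbar release whose SHOW_TOOLBAR_CALLBACK option accepts a callable (newer releases expect a dotted-path string instead), and the hide_toolbar query parameter is a made-up convention for this example.

```python
# settings.py (sketch)

INTERNAL_IPS = ("127.0.0.1",)


def show_toolbar(request):
    """Show the toolbar only to internal IPs that did not opt out."""
    # 'hide_toolbar' is a hypothetical query parameter a Selenium test
    # could append to its URLs to suppress the toolbar.
    if "hide_toolbar" in request.GET:
        return False
    return request.META.get("REMOTE_ADDR") in INTERNAL_IPS


DEBUG_TOOLBAR_CONFIG = {
    # Assumption: this release accepts a callable here.
    "SHOW_TOOLBAR_CALLBACK": show_toolbar,
}
```

Since a remote Selenium browser usually connects from an address outside INTERNAL_IPS, the address check alone may already be enough to keep the toolbar out of the way.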


Monday, May 16, 2011

Updating Chrome on Ubuntu-64 for Angry Birds

1. apt-get install chromium-browser
2. apt-get install chromium-web-inspector (Otherwise, Web Inspector will stop working)

3. Upgrading to the latest version of Chrome breaks the Flash plug-in.  The latest version (10.2.p3) for 64-bit Flash can be located here:


4. sudo cp libflashplayer.so /usr/lib/chromium-browser/plugins/

Thursday, May 12, 2011

JavaScript quirk..

Apparently this JavaScript command works:

var a = { bla: function() { alert(a.bla); } };

Wednesday, May 11, 2011



Returns the currently focused element, that is, the element that will get keystroke events if the user types any. This attribute is read only.
Often this will return an <input> or <textarea> object, if it has the text selection at the time.  If so, you can get more detail by using the element's selectionStart and selectionEnd properties.  Other times the focused element might be a <select> element (menu) or an <input> element, of type button, checkbox or radio.
Note: On Mac, elements that aren't text input elements tend not to get focus assigned to them.
Typically a user can press the tab key to move the focus around the page among focusable elements, and use the space bar to activate it (press a button, choose a radio).
Do not confuse focus with a selection over the document, consisting mostly of static text nodes.  See window.getSelection() for that. 
When there is no selection, the active element is the page's <body>
Note: This attribute is part of the in-development HTML 5 specification.


var curElement = document.activeElement;

Tuesday, May 10, 2011

Does Python have WebDriverBackedSelenium?

2011-03-21T07:28:03   Do the python bindings have webdriverbackedselenium?
2011-03-21T07:28:15   um kinda
2011-03-21T07:28:19   lol
2011-03-21T07:28:24   kinda is the right answer
2011-03-21T07:28:32   It's got the hooks we need to start adding pieces

But WebDriverBackedSelenium doesn't even exist in 2.0b2, which Sauce Labs currently provides..

2011-03-21T07:30:58   Unfortunately 2.0b2 doesn't have the webdriverbackedselenium
2011-03-21T07:31:32  * AutomatedTester didnt realise its been deleted
2011-03-21T07:31:41   It was a mitake
2011-03-21T07:31:43   mistake

Here is some background info from Google about Web Driver:


Docs for migrating:

  • http://saucelabs.com/docs/selenium2
    SauceLabs requires the 'name' parameter to name the job, since Selenium 2 now can execute jobs in parallel.
  • webdriver/remote.py contains the commands: find_element_by_id(), get(), toggle(), click(). You have to write the code in a way that gets the DOM element and then acts on it (basically JavaScript).
  • How does one execute JavaScript commands?
  • self.execute_javascript(script, *args) (located in remote/webdriver.py)

Interesting way to allow htpasswd changes via SSH..


How CKEditor strips out Word tags

If you've ever worked with a contenteditable <div>, you know that copying/pasting from Word documents presents a major headache (it injects Calibri font-face tags and new lines). How does CKEditor, one of the more popular WYSIWYG editors, handle it? Here's a patch/diff submitted...it's not pretty:


Monday, May 9, 2011

XPath Selectors in IE7 for Selenium....

Apparently IE7 does not have a native XPath selector engine, so Selenium relies on the open-source JavaScript version at http://coderepos.org/share/wiki/JavaScript-XPath. You can confirm this by unpacking the Selenium server code (selenium-server-standalone-2.0b3) and digging through the core/xpath directory.

Suppose you have this input tag:

<input myAttr="hey"/>

And you want to use XPath selectors to find the appropriate attribute:
sel.click("//input[@myattr=%s]" % (id_selected))
sel.click("//input[@myAttr=%s]" % (id_selected))

You can look through all the possible XPath combinations by looking through the test code:

The JavaScript code also has a PathExpr.parse() function that does filter expression matching and parsing.

Sunday, May 8, 2011

Difference between attr_accessor and attr_accessible..


attr_accessor is Ruby code and is used when you do not have a column in your database, but still want to show a field in your forms. The only way to allow this is to declare attr_accessor :fieldname; you can then use this field in your View, or model, if you wanted, but mostly in your View.

attr_accessible allows you to list all the columns you want to allow Mass Assignment on, as andy alluded to above. The opposite of this is attr_protected, which means this is a field I do NOT want anyone to be allowed to Mass Assign to. More than likely it is going to be a field in your database that you don't want anyone monkeying around with. Like a status field, or the like.

Saturday, May 7, 2011

Sony XDCAM format, QuickTime, and the Calibrated scam

If you want to play Sony XDCAM files on the PC, you're pretty much out of luck. You need a QuickTime plug-in, which is only available from www.calibratedosoftware.com. The plug-in does work, but without paying for a license for EVERY single computer that will view it, you end up with the trial version with the Calibrated text sprawled across the entire video, making the trial version useless.


Ruby bundle installation on a Slicehost install (Ubuntu 10.04)

1. sudo apt-get install libsqlite3-dev
2. sudo apt-get install sqlite3
3. sudo apt-get install rubygems
4. sudo apt-get install libopenssl-ruby
5. sudo gem install rubygems-update
6. sudo /var/lib/gems/1.8/bin/update_rubygems
7. sudo gem install bundle
8. bundle install
9. rake db:schema:load

Why bundle install takes forever: it seems a basic Slicehost instance with 256MB of memory isn't sufficient for bundle install to run well. A fix went into https://github.com/carlhuda/bundler/issues/356, but it still runs too slowly on a basic Slicehost instance.


Deploying to memory-constrained servers

When deploying to a server that is memory-constrained, like Dreamhost, you should run bundle package on your local development machine, and then check in the resulting Gemfile.lock file and vendor/cache directory. The lockfile and cached gems will mean bundler can just install the gems immediately, without contacting any gem servers or using a lot of memory to resolve the dependency tree. On the server, you only need to run bundle install after you update your deployed code.

You can update to the latest 'bundler' version, but I found it still didn't fix things.

1. git clone https://github.com/carlhuda/bundler.git
2. sudo gem install rake
3. sudo gem install rake1.8-dev
4. sudo rake spec:deps
5. sudo gem install rcov
6. sudo rake


Thursday, May 5, 2011

Facebook's on.fb.me shortener on Facebook pages, hashbangs, and urlopen in Python..

Facebook's link shortener is actually a CNAME to bit.ly, and bit.ly has set up some type of nginx front-end that requires the Host: on.fb.me header to be passed in. The urlopen() command sets a Host: header for you (done in the AbstractHTTPHandler base class), but if you're testing with telnet, you need to make sure to set this Host: header explicitly.

It seems you have to add a Host: on.fb.me to the header:
telnet on.fb.me 80
Connected to cname.bit.ly.
Escape character is '^]'.
GET /[your shortened link here] HTTP/1.1
Host: on.fb.me

HTTP/1.1 301 Moved
Server: nginx

What happens though if you wish to use urlopen() on a Facebook shortened link, which points to a Facebook page? The big issue is that Facebook is starting to use #! in their Facebook pages and so urlopen() will throw 404 errors.

>>> import urllib2
>>> urllib2.urlopen('http://on.fb.me/[your shortened link here]')
Traceback (most recent call last):
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 516, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

If we want to fix this issue, we have to build a custom HTTPRedirectHandler. For now, we just strip out the entire URL fragment if it contains a '#' symbol.

import urllib2
import urlparse

class CustomHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    # If a redirect happens within a 301, we deal with it here.

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):

        parsed_url = urlparse.urlparse(newurl)

        # See http://code.google.com/web/ajaxcrawling/docs/getting-started.html
        # Strip out the hash fragment, since fragments are never (by
        # specification) sent to the server.  If you do, a 404 error can occur.
        # urllib2.urlopen() also will die a glorious death if you try, so you must
        # remove it.   See http://stackoverflow.com/questions/3798422 for more info.
        # Facebook does not really conform to the Google standard, so we can't
        # send the fragment as _escaped_fragment_=key=value.

        # Strip out the URL fragment and reconstruct everything if a hash tag exists.
        if '#' in newurl:
            newurl = "%s://%s%s" % (parsed_url.scheme, parsed_url.netloc, parsed_url.path)
        return urllib2.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, hdrs, newurl)

We can then do:
opener = urllib2.build_opener(CustomHTTPRedirectHandler())
req = urllib2.Request('http://on.fb.me/[your shortened link here]')
print opener.open(req).read()

Also, urlopen() just does not like the '#' symbol and will report a 404 error...it just isn't obvious until you step through the urllib2.py code or install this redirect handler and add breakpoints to see what's going on....
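
For reference, the fragment-stripping step on its own can be sketched in Python 3 with urllib.parse.urldefrag, which exists in the standard library. Unlike the handler above, which rebuilds the URL from scheme, host, and path only, this version keeps the query string. The function name and the example.com URL are placeholders.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Return `url` without its '#...' fragment (the query string is kept)."""
    return urldefrag(url).url


print(strip_fragment("http://example.com/page?x=1#!/other"))
# http://example.com/page?x=1
```

A redirect handler could call this on newurl before delegating to the base class.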

Wednesday, May 4, 2011


DISTINCT columns need to be specified in the SELECT field because of Postgres limitations.


The reason every field in the table is included is because that's how
GROUP BY works - any field that is in the query that isn't an
aggregate needs to be included in the group by. When you call
Table.objects...., this implies SELECT Table.id, Table.name, ... at
the database level. Some other fields are included in the GROUP BY to
accommodate the needs of order_by() and select_related().


Tuesday, May 3, 2011

Removing trailing whitespace in Emacs automatically..


To remove trailing whitespace from the entire buffer, use any of the following:
  • ‘M-x delete-trailing-whitespace’ (GnuEmacs version 21 or later). You can put this in ‘before-save-hook’ to ensure that your files have no trailing whitespace:
(add-hook 'before-save-hook 'delete-trailing-whitespace)