Wednesday, August 31, 2011

The trouble with Protected Mode in Selenium 2 for setting cookies...

If you try to set cookies using the WebDriver API through Selenium 2, you may find that Internet Explorer fails to even set the cookie. The issue has been reported here:

Selenium has a bunch of test suites to verify the behavior, so it seemed strange that there would be an issue. Examining the IEDriver code also shows that the AddCookieCommandHandler is very similar to DeleteAllCookiesHandler and the other IE command handlers, so I didn't really find an issue there. Nor did I find one in the AddCookie() method in DocumentHost.cpp, which handles the dispatching of adding cookies.

If we do:
>>> driver = webdriver.Remote(desired_capabilities=current_env,
>>> driver.execute_script("document.cookie='a=1';")
>>> driver.get_cookies()
[{u'name': u'a', u'value': u'1', u'path': u'/', u'hCode': 97, u'class': u'org.openqa.selenium.Cookie', u'secure': False}, {u'name': u'sessionid', u'value': u'e1257265399f35b5c7ae4cf630581c90', u'path': u'/', u'hCode': 607797809, u'class': u'org.openqa.selenium.Cookie', u'secure': False}]
>>> driver.delete_cookie('a')
>>> driver.get_cookies()
[{u'name': u'sessionid', u'value': u'e1257265399f35b5c7ae4cf630581c90', u'path': u'/', u'hCode': 607797809, u'class': u'org.openqa.selenium.Cookie', u'secure': False}]
>>> driver.add_cookie({'name' : 'a' , 'value' : '2', 'secure' : False})
(Pdb) driver.get_cookies()
[{u'name': u'sessionid', u'value': u'e1257265399f35b5c7ae4cf630581c90', u'path': u'/', u'hCode': 607797809, u'class': u'org.openqa.selenium.Cookie', u'secure': False}]

Both delete commands work successfully. But add_cookie() silently fails in Selenium 2.5.0 if IE7/IE8 is in Protected Mode: neither Internet Explorer nor IEDriver reports any error. However, I was able to get cookies to be set once I disabled Protected Mode.

Since Selenium 2.5.0 requires all Protected Mode settings to be consistent, you have to go into Internet Options and uncheck Protected Mode for every single zone (i.e. click through the icons for Internet, Local intranet, Trusted sites, and Restricted sites). Then cookies can be set correctly.

It appears that IE7 and IE8 have this issue. IE9 may not have this problem. With Selenium v2.0.0b3, though, I was not able to get cookies set in either Protected or non-Protected mode, so you may still need to upgrade beyond that version to get cookie support working in IE7/IE8.

Tuesday, August 30, 2011

Using add_cookie in Selenium 2

The documentation in the Python bindings for the add_cookie() function in Selenium 2 is unclear. The add_cookie() function appears to take in a simple key/value pair:

def add_cookie(self, cookie_dict):
        """Adds a cookie to your current session.
            cookie_dict: A dictionary object, with the desired cookie name as the key, and
            the value being the desired contents.
            driver.add_cookie({'foo': 'bar',})
        """
        self.execute(Command.ADD_COOKIE, {'cookie': cookie_dict})
If you're encountering NullPointerExceptions similar to those in a reported bug, it's possible the problem is that your dictionary needs to include the name, value, path, and secure keys. The tests in selenium/webdriver/common/ appear to back this point up:

self.COOKIE_A = {"name": "foo",
                         "value": "bar",
                         "path": "/",
                         "secure": False}

    def testAddCookie(self):
        self.driver.execute_script("return document.cookie")

Even the documentation section quoted below suggests that adding a cookie is just a matter of passing a key/value pair:

Before we leave these next steps, you may be interested in understanding how to use cookies. First of all, you need to be on the domain that the cookie will be valid for:

# Go to the correct domain

# Now set the cookie. This one's valid for the entire domain
cookie = {"key": "value"})

# And now output all the available cookies for the current URL
all_cookies = driver.get_cookies()
for cookie_name, cookie_value in all_cookies.items():
    print "%s -> %s", cookie_name, cookie_value

To disable the Django debug toolbar in Selenium 2, the command should be:
self.selenium.add_cookie({"name" : "djdt",
                          "value" : "true",
                          "path" : "/",
                          "secure" : False})

As of Selenium v2.5.0, it appears that the name, value, and secure keys must all be specified to avoid triggering the NullPointerException error.
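One way to avoid forgetting a key is to build the dictionary through a small helper. This is only a sketch: make_cookie is a hypothetical name, not part of the Selenium bindings, and the defaults simply mirror the values used in the examples above.

```python
def make_cookie(name, value, path="/", secure=False):
    """Build a cookie dict carrying every key used in the examples above.

    Hypothetical helper, not part of Selenium; defaults mirror the post.
    """
    return {"name": name, "value": value, "path": path, "secure": secure}


# The Django debug toolbar cookie from the post, built through the helper.
djdt_cookie = make_cookie("djdt", "true")
print(djdt_cookie)
# With a live session this would be passed on as: driver.add_cookie(djdt_cookie)
```

Keeping the defaults in one place means a call site only has to supply the name and value.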

An issue report has been filed here:

Sunday, August 28, 2011

Extracting audio clips from YouTube videos

If you use the stock Ubuntu v10.04 youtube-dl version, you may encounter this error message when trying to download a YouTube clip:
ERROR: no fmt_url_map or conn information found in video info
The solution is to git clone the youtube-dl repo and use the latest youtube-dl version:
git clone

Extracting only the audio portion means that you should set the -vn option, which disables video encoding. The -acodec option determines the output format. So you would execute one of these commands, depending on whether you want Ogg Vorbis or MP3 output:

ffmpeg -i  -vn -acodec vorbis 
ffmpeg -i  -vn -acodec mp3 

Saturday, August 27, 2011

Getting branch support to work with Nose/Coverage

While Ned Batchelder's coverage utility has supported branch measurements for some time, it hasn't been supported in the main line of the nose unit discovery utility. We can see in the upcoming v1.1.3 release that the --cover-branches option will be supported:

Currently nose v1.1.3 is still labeled as a development version, so you have to run pip install nose==dev to install this copy.

pip install --upgrade coverage
pip install --upgrade nosexcover
pip install --upgrade nose==dev  # currently 1.1.3

An alternative, which has long been suggested in discussion groups, is to create a .coveragerc file to enable branch coverage by default. This file must be placed in the directory where the tests are run, not necessarily in your home directory.
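As a rough illustration of what such a file contains, the sketch below generates a minimal .coveragerc with Python 3's configparser. It assumes coverage.py's [run] section and its branch option; build_coveragerc is a hypothetical helper name, and writing the file by hand works just as well.

```python
import configparser
import io


def build_coveragerc():
    """Return the text of a minimal .coveragerc that enables branch coverage."""
    config = configparser.ConfigParser()
    config["run"] = {"branch": "True"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


coveragerc_text = build_coveragerc()
print(coveragerc_text)
# To use it, save the text as .coveragerc in the directory the tests run from.
```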
If you've enabled things correctly, you should see the header (instead of the default) as follows:
Name                                              Stmts   Miss Branch BrPart  Cover   Missing
If you're also using --with-xunit to generate Cobertura-style XML reports, you should see the branch conditionals being tallied correctly there too!

Wednesday, August 17, 2011

Facebook's OAuth2 support for Python

Facebook recently announced that they will be phasing in OAuth 2.0 support and will require its use starting October 1, 2011. On the JavaScript SDK side, several changes have to be made, which are listed below.

You can download the Python code here:

1. FB.init has to be initialized with the Facebook APP ID instead of the API Key, though the apiKey parameter still is used.

2. The oauth: true option must be set in the FB.init() calls.

In other words, the code changes would be:
FB.init({apiKey: facebook_app_id,
             oauth: true,
             cookie: true});
3. Instead of response.session, the response should now be response.authResponse. Also, note that scope: should be used instead of perms:
FB.login(function(response) {
    if (response.authResponse) {
    {scope: 'email,publish_stream,manage_pages'}
Also, if you need to retrieve the user id on the JavaScript side, the value is stored as response.authResponse.userID instead of response.session.uid:
       { method: 'fql.query',
        query: 'SELECT ' + permissions.join() + ' FROM permissions WHERE uid=' + response.authResponse.userID},
        function (response) { });

If you find that you cannot log out, it means you haven't set the right APP ID or forgot to set oauth: true in both your login and logout code. If you're going to make the change, you should make it everywhere in your code!

On the Python/Django side, you need to implement a few helper routines. If Facebook authenticates properly, a cookie with the prefix fbsr_ will be set (instead of fbs_). This signed request includes an encoded signature and a payload, which must be separated and verified. You can look at the PHP SDK code to understand how it's implemented, or you can review this Python version of the code:
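import hashlib
import hmac
import json

def parse_signed_request(signed_request, secret):
    # The cookie value has the form "<encoded signature>.<payload>".
    encoded_sig, payload = signed_request.split('.', 1)

    sig = base64_urldecode(encoded_sig)
    data = json.loads(base64_urldecode(payload))

    # Only HMAC-SHA256 signatures are accepted.
    if data.get('algorithm').upper() != 'HMAC-SHA256':
        return None

    # Recompute the signature over the still-encoded payload.
    expected_sig = hmac.new(secret, msg=payload, digestmod=hashlib.sha256).digest()

    if sig != expected_sig:
        return None

    return data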
In the PHP SDK code, there is a base64_url_decode function that automatically adds the correct number of "=" characters to the end of the Base64 encoded string. The basic problem is that Base64 encodes every 3 bytes as 4 characters, so a properly padded encoded string has a length of roughly 4*len(string)/3, rounded up to a multiple of 4. We can use this knowledge to append the appropriate number of '=' characters to the end of the string. Facebook also appears to use the URL-safe Base64 variant, in which the '+' and '/' characters of standard Base64 are replaced by '-' and '_' respectively, and which must be translated back during the decode process. The code looks like the following:
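import base64

def base64_urldecode(data):
    # 1. Pad the encoded string with "=" until its length is a multiple of 4.
    data += "=" * ((4 - len(data) % 4) % 4)

    # 2. Decode the URL-safe alphabet ('-' and '_' instead of '+' and '/').
    return base64.urlsafe_b64decode(data)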
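To see the padding and signature check working end to end without talking to Facebook, here is a self-contained sketch for Python 3 (the post's own code targets Python 2). make_signed_request and base64_urlencode are hypothetical helpers added only so the example can produce its own input, and the secret and payload values are made up.

```python
import base64
import hashlib
import hmac
import json


def base64_urldecode(data):
    """Restore the stripped '=' padding, then decode URL-safe Base64."""
    data += "=" * ((4 - len(data) % 4) % 4)
    return base64.urlsafe_b64decode(data)


def base64_urlencode(raw):
    """URL-safe Base64 without padding (hypothetical helper for the demo)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign(payload, secret):
    """HMAC-SHA256 over the still-encoded payload."""
    return hmac.new(secret.encode("utf-8"), msg=payload.encode("ascii"),
                    digestmod=hashlib.sha256).digest()


def make_signed_request(data, secret):
    """Build '<signature>.<payload>' locally (hypothetical, demo only)."""
    payload = base64_urlencode(json.dumps(data).encode("utf-8"))
    return base64_urlencode(sign(payload, secret)) + "." + payload


def parse_signed_request(signed_request, secret):
    """Return the decoded payload, or None if verification fails."""
    encoded_sig, payload = signed_request.split(".", 1)
    sig = base64_urldecode(encoded_sig)
    data = json.loads(base64_urldecode(payload).decode("utf-8"))
    if data.get("algorithm", "").upper() != "HMAC-SHA256":
        return None
    # compare_digest avoids leaking timing information during the comparison.
    if not hmac.compare_digest(sig, sign(payload, secret)):
        return None
    return data


request = make_signed_request({"algorithm": "HMAC-SHA256", "user_id": "42"}, "s3cret")
print(parse_signed_request(request, "s3cret"))
print(parse_signed_request(request, "wrong-secret"))
```

The second call uses the wrong secret, so the recomputed signature differs and the function returns None instead of the payload.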
If you're using the old Python SDK implementation, you may wish to implement code that mimics the way in which the Python SDK implemented get_user_from_cookie, since the expires, session_key, and oauth_token can be derived from retrieving the access token. We also set an fbsr_signed parameter in case you have debugging statements in your code and want to differentiate between your old get_user_from_cookie from this code.

Note: in order to make things backward-compatible, you need to make an extra URL request back to Facebook to retrieve the access token. This code was also inspired by the Facebook PHP SDK code:
import logging
import re
import urllib
import urllib2

def get_access_token_from_code(code, redirect_url=None):
    """ OAuth2 code to retrieve an application access token. """

    data = {
        'client_id' : settings.FACEBOOK_APP_ID,
        'client_secret' : settings.FACEBOOK_SECRET_KEY,
        'code' : code,
    }

    if redirect_url:
        data['redirect_uri'] = redirect_url
    else:
        data['redirect_uri'] = ''

    return get_app_token_helper(data)


def get_app_token_helper(data=None):
    if not data:
        data = {}

    try:
        token_request = urllib.urlencode(data)

        app_token = urllib2.urlopen(BASE_LINK + "/oauth/access_token?%s" % token_request).read()
    except urllib2.HTTPError, e:
        logging.debug("Exception trying to grab Facebook App token (%s)" % e)
        return None

    # The response body looks like "access_token=...".
    matches = re.match(r"access_token=(?P<token>.*)", app_token).groupdict()

    return matches.get('token')
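If the response body carries more than one field (for example an expires value after the token), a greedy regular expression would capture the trailing fields as well. A possible alternative, sketched here for Python 3, is to parse the body as a query string. extract_access_token is a hypothetical name, and the sample body is made up for illustration.

```python
from urllib.parse import parse_qs


def extract_access_token(response_body):
    """Return the access_token value from a form-encoded body, or None."""
    fields = parse_qs(response_body)
    values = fields.get("access_token")
    return values[0] if values else None


# Made-up response body for illustration only.
sample_body = "access_token=abc123&expires=5000"
print(extract_access_token(sample_body))
```

Returning None when the field is absent also avoids the AttributeError that a failed re.match would raise.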

Tuesday, August 16, 2011

How to redirect bash time outputs..

How can I redirect the output of 'time' to a variable or file?

Bash's time keyword uses special trickery, so that you can do things like
   time find ... | xargs ...
and get the execution time of the entire pipeline, rather than just the simple command at the start of the pipe. (This is different from the behavior of the external command time(1), for obvious reasons.)
Because of this, people who want to redirect time's output often encounter difficulty figuring out where all the file descriptors are going. It's not as hard as most people think, though -- the trick is to call time in a SubShell or block, and then capture stderr of the subshell or block (which will contain time's results). If you need to redirect the actual command's stdout or stderr, you do that inside the subshell/block. For example:
  • File redirection:
       bash -c "time ls" 2>time.output      # Explicit, but inefficient.
       ( time ls ) 2>time.output            # Slightly more efficient.
       { time ls; } 2>time.output           # Most efficient.
       # The general case:
       { time some command >stdout 2>stderr; } 2>time.output

Saturday, August 13, 2011

Vizio 42" LCD TV Tivo remote control code

Tivo remote code is 0128.

Bash tests

Apache's mod_rewrite and changing URL cases...

Suppose we want to use Apache's mod_rewrite RewriteMap to convert URLs to upper case:

RewriteMap uppercase int:toupper
RewriteRule [a-z] ${uppercase:%{REQUEST_URI}} [L,R=301]
The internal functions are listed here:
Internal Function
MapType: int, MapSource: Internal Apache function
Here, the source is an internal Apache function. Currently you cannot create your own, but the following functions already exist:

  • toupper: Converts the key to all upper case.
  • tolower: Converts the key to all lower case.
  • escape: Translates special characters in the key to hex-encodings.
  • unescape: Translates hex-encodings in the key back to special characters.

You can also use this approach to map static content to a list of servers:

Rewrite map file

##  map.txt -- rewriting map

static   www1|www2|www3|www4
dynamic  www5|www6
Configuration directives

RewriteMap servers rnd:/path/to/file/map.txt

RewriteRule ^/(.*\.(png|gif|jpg)) http://${servers:static}/$1 [NC,P,L]
RewriteRule ^/(.*) http://${servers:dynamic}/$1 [P,L]
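To make the rnd: lookup concrete, here is a small Python 3 sketch that mimics what the map appears to do: each key maps to several alternatives separated by |, and one is picked at random per lookup. This is an illustration of the idea only, not Apache's implementation; parse_rewrite_map and pick_server are hypothetical names.

```python
import random


def parse_rewrite_map(text):
    """Parse 'key  value1|value2' lines, skipping blanks and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        mapping[key] = value.split("|")
    return mapping


def pick_server(mapping, key):
    """Pick one of the alternatives for key at random."""
    return random.choice(mapping[key])


MAP_TEXT = """\
##  map.txt -- rewriting map

static   www1|www2|www3|www4
dynamic  www5|www6
"""

servers = parse_rewrite_map(MAP_TEXT)
print(pick_server(servers, "static"))
```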

Tuesday, August 9, 2011

Celery v2.3.1

Celery v2.3.0 has a new setting for the CELERY_RESULT_BACKEND that allows you to store the results of your apply_async() and dispatch() calls in something other than an AMQP-based backend. In previous versions, Celery (without ignore_result=True) would store these results as messages in a separate queue corresponding to the taskset ID (a UUID). If you created a lot of tasks without consuming their results (i.e. checking the result), you would eventually exhaust the available memory.

The problem is well-described here. One of the issues was using an older version of RabbitMQ, which used a different persister that would try to keep everything in memory and would crash. With recent changes in RabbitMQ, which allow task results to be expired, the problem is largely mitigated. Nonetheless, setting ignore_result=True also helps in this respect. With the recent Celery v2.3.0 release you can also use a different backend (e.g. Redis) to store these task set results!

Note: Celery is still highly dependent on an AMQP host. Just because you can change the CELERY_RESULT_BACKEND doesn't mean you can use a completely different messaging system.

Using find with the -regex option

Note that you need to specify .*/ in the beginning because find matches the whole path.
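The whole-path behavior can be imitated in Python 3 with re.fullmatch, which likewise requires the pattern to cover the entire string. This is only an analogy: find uses its own regex dialect, so the patterns below are Python regexes, and the path is a made-up example.

```python
import re

# Made-up path of the kind find would test, including leading directories.
path = "./src/foo_bar.py"

# A pattern that describes only the file name does not cover the whole path.
name_only = re.fullmatch(r"foo.*\.py", path)
print(name_only)

# Prefixing .*/ lets the pattern account for the leading directories.
with_prefix = re.fullmatch(r".*/foo.*\.py", path)
print(with_prefix is not None)
```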

Sunday, August 7, 2011

Experimenting with the Kombu framework

The Kombu framework is an excellent way to work with RabbitMQ/AMQP message brokers. The documentation describes several ways of doing so by instantiating the exchange, the queue, and then the RabbitMQ (AMQP broker) connection. Other documentation shows how you can use py-amqplib to talk to RabbitMQ hosts, but the Kombu framework, with its ability to support multiple back-ends, provides a much more elegant approach.

Here's some simple code that we can use to talk to our queue if we also have our Celery configuration settings defined. In this example, we only use the Queue class to consume messages. We can bind the default queue and exchange to the channel and then register a callback that will dump the message to stdout.

from celery.conf import settings
from kombu.connection import BrokerConnection
from kombu.messaging import Exchange, Queue, Consumer

connection = BrokerConnection(settings.BROKER_HOST, settings.BROKER_USER, settings.BROKER_PASSWORD, settings.BROKER_VHOST)

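# RabbitMQ connection
channel = connection.channel()

default_exchange = Exchange("default", "direct", durable=True)
default_queue = Queue("default", exchange=default_exchange, key="default")
bound_default_queue = default_queue(channel)

def process_msg(msg):
    print "%s" % repr(msg)

# Register the callback on the bound queue, then wait for messages.
bound_default_queue.consume(callback=process_msg)

while True:
    connection.drain_events()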
We can also do the same by declaring a Consumer class too and calling the consume() to register the Consumer:

from celery.conf import settings
from kombu.connection import BrokerConnection
from kombu.messaging import Exchange, Queue, Consumer

connection = BrokerConnection(settings.BROKER_HOST, settings.BROKER_USER, settings.BROKER_PASSWORD, settings.BROKER_VHOST)

# RabbitMQ connection
channel = connection.channel()

default_exchange = Exchange("default", "direct", durable=True)

default_queue = Queue("default", exchange=default_exchange, key="default")

def process_msg(body, msg):
    print "body %s, msg %s" % (repr(body), repr(msg))

consumer = Consumer(channel, default_queue, callbacks=[process_msg])
# Register the consumer with the broker, then wait for messages.
consumer.consume()

while True:
    connection.drain_events()

How scheduled tasks, Celery, and Kombu interface with RabbitMQ

A really good overview of how PBS Education Technology uses Celery:

If you want to learn the AMQP interface and try building consumers, producers, exchanges, and queues, the Python code here is extremely useful for learning how the basic standard works with amqplib, which is the basis on which Celery is built. Celery handles a lot of the higher-level functionality of building task queues, but the underlying plumbing is built on the Kombu framework (originally Carrot but completely rewritten):

The internals of how scheduled tasks work are outlined in the Celery documentation:

When you create a scheduled task in Celery (using either the eta= or countdown= parameter), a message gets created that gets pickled (assuming you're using the default CELERY_TASK_SERIALIZER as pickle) that includes an 'eta' keyword:

print pickle.loads(msg.body).keys()
['retries', 'task', 'args', 'expires', 'eta', 'kwargs', 'id']
(Pdb) print pickle.loads(msg.body)['eta']

When you start up a Celery worker, it will create a Consumer that will begin to receive messages from the RabbitMQ broker. If it sees a message with an eta parameter, the task is moved into an ETA scheduler queue instead of the ready queue. The message itself is not acknowledged until the task is executed, but since it has been received by the Consumer, other workers will not receive the same task. If the Consumer disconnects or loses its connection to the RabbitMQ broker, RabbitMQ will attempt to redeliver the message. The message will not be deleted until an acknowledgement is sent back to the broker:


def ack(self):
       """Acknowledge this message as being processed.,                                     
       This will remove the message from the queue.                                         
       :raises MessageStateError: If the message has already been                           
       if is not None:
           consumer_tag = self.delivery_info["consumer_tag"]
           if consumer_tag in
       if self.acknowledged:
           raise self.MessageStateError(
               "Message already acknowledged with state: %s" % self._state)
       self._state = "ACK"

When a message is received by Celery, it invokes a function called from_message(), which then passes on to the on_task() to insert into the queue. Notice that the 'ack' function, which will be used to acknowledge a message once it has been processed, is passed along to this routine.

task = TaskRequest.from_message(message, message_data, ack,
The ready queue itself (assuming no rate limits) uses a basic Python queue and uses the process_task function, which will then call self.acknowledge() and invoke the ack function that was passed into the initial TaskRequest.from_message creation.
if disable_rate_limits:
            self.ready_queue = FastQueue()
            self.ready_queue.put = self.process_task
Also, it appears that revoked tasks are not persistent if you do not set up a CELERYD_STATE_DB (defaults to None). Without this setting, Celery appears to keep the list of revoked task IDs only in memory and skips any task whose ID is in that list, so all revoked tasks will be forgotten if you restart Celery.

Friday, August 5, 2011

Changes in Facebook's SWF code.

Yesterday evening (8/4/2011) at 9:55 pm, Facebook changed some code that affects its Flash code which is used by Internet Explorer to handle cross-domain communication:

- /*1312412724,169546110,JIT Construction: v416050,en_US*/
+ /*1312520159,169918336,JIT Construction: v416929,en_US*/
  if (!window.FB) window.FB = {
      _apiKey: null,
  ...
              return FB._domain.api_read;
          case 'cdn':
              return (window.location.protocol == 'https:' || FB._https) ? FB._domain.https_cdn : FB._domain.cdn;
+         case 'cdn_foreign':
+             return FB._domain.cdn_foreign;
          case 'https_cdn':
              return FB._domain.https_cdn;
          case 'graph':
  ...
              for (var a = 0, b = FB.Flash._callbacks.length; a < b; a++) FB.Flash._callbacks[a]();
              FB.Flash._callbacks = [];
          };
-         FB.Flash.embedSWF('XdComm', FB.getDomain('cdn') + FB.Flash._swfPath);
+         FB.Flash.embedSWF('XdComm', FB.getDomain('cdn_foreign') + FB.Flash._swfPath);
      },
      embedSWF: function(d, e, b) {
          var a = !! document.attachEvent,
  ...
          "api": "https:\/\/\/",
          "api_read": "https:\/\/\/",
          "cdn": "https:\/\/\/",
+         "cdn_foreign": "https:\/\/\/",
          "graph": "https:\/\/\/",
          "https_cdn": "https:\/\/\/",
          "https_staticfb": "https:\/\/\/",
  ...
      "_minVersions": [
          [10, 0, 22, 87]
      ],
-     "_swfPath": "rsrc.php\/v1\/yx\/r\/WFg56j28XFs.swf"
+     "_swfPath": "rsrc.php\/v1\/yK\/r\/RIxWozDt5Qq.swf"
  }, true);
  FB.provide("XD", {
      "_xdProxyUrl": "connect\/xd_proxy.php?version=3"

You can fetch the new SWF file at the _swfPath shown above (note, though, that the diff indicates that the SWF must now be downloaded by the browser from the cdn_foreign domain).
By decompiling the SWF file using Sothink's SWF Decompiler (the unregistered version allows you to export up to the first two FLA files you designate to save), you can review the changes that were made.


<         public static function extractPathAndQuery(param1:String) : String
<         {
<             return /^\w+:\/\/[^\/]+(.*)$""^\w+:\/\/[^\/]+(.*)$/.exec(param1)[1
<         }// end function
It also appears that the XdComm receiver must now be downloaded/loaded from a specific domain, or at least from a host under the expected domain suffix with an /intern/ URL specified. Otherwise, the cross-domain receiver will not initialize.

diff XDComm_old.a
<             XdComm.fbTrace("XdComm Constructor", {url:stage.loaderInfo.url});
>             XdComm.fbTrace("XdComm Initialized", {});
<             var _loc_4:String = null;
<             var _loc_2:* = stage.loaderInfo.url;
<             var _loc_3:* = PostMessage.extractDomain(_loc_2);
<             if (_loc_3 != "")
<             {
<                 XdComm.fbTrace("XdComm is not loaded from ", {swfDomain:_loc_3});
<                 if (_loc_3.substr(-13) == "")
<                 {
<                     _loc_4 = PostMessage.extractPathAndQuery(_loc_2);
<                     if (_loc_4.substr(0, 8) != "/intern/")
<                     {
<                         XdComm.fbTrace("XdComm is NOT in intern mode", {swfPat
<                         return;
<                     }
<                     XdComm.fbTrace("XdComm is in intern mode", {swfPath:_loc_4
<                 }
<                 else
<                 {
<                     return;
<                 }
<             }
<             return param3;
>             if (param2 == 0)
>             {
>             }
>             else
>             {
>                 return param3;
>             }
>             return;
>             traceObject(param2);

The different versions of both files are posted here:

What the prefetch in Celery means...

Prefetch Limits

Prefetch is a term inherited from AMQP that is often misunderstood by users.
The prefetch limit is a limit for the number of tasks (messages) a worker can reserve for itself. If it is zero, the worker will keep consuming messages, not respecting that there may be other available worker nodes that may be able to process them sooner[†], or that the messages may not even fit in memory.

The workers’ default prefetch count is the CELERYD_PREFETCH_MULTIPLIER setting multiplied by the number of child worker processes[‡].

If you have many tasks with a long duration you want the multiplier value to be 1, which means it will only reserve one task per worker process at a time.

However, if you have many short-running tasks, and throughput/roundtrip latency is important to you, this number should be large. The worker is able to process more tasks per second if the messages have already been prefetched and are available in memory. You may have to experiment to find the best value that works for you.
Values like 50 or 150 might make sense in these circumstances. Say 64, or 128.
If you have a combination of long- and short-running tasks, the best option is to use two worker nodes that are configured separately, and route the tasks according to the run-time. (see Routing Tasks).
[†]RabbitMQ and other brokers deliver messages round-robin, so this doesn’t apply to an active system. If there is no prefetch limit and you restart the cluster, there will be timing delays between nodes starting. If there are 3 offline nodes and one active node, all messages will be delivered to the active node.
[‡]This is the concurrency setting; CELERYD_CONCURRENCY or the -c option to celeryd.

Reserve one task at a time

When using early acknowledgement (default), a prefetch multiplier of 1 means the worker will reserve at most one extra task for every active worker process.
When users ask if it’s possible to disable “prefetching of tasks”, often what they really want is to have a worker only reserve as many tasks as there are child processes.
But this is not possible without enabling late acknowledgements; a task that has been started will be retried if the worker crashes mid-execution, so the task must be idempotent (see also notes at Should I use retry or acks_late?).
You can enable this behavior by using the following configuration options:
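The two settings usually given for this in the Celery documentation of that era are shown in the sketch below, together with the prefetch arithmetic described earlier (multiplier times the number of child processes). prefetch_count is a hypothetical helper for illustration, and the concurrency of 4 is a made-up value.

```python
# Settings for reserving one task at a time; they would normally live in
# celeryconfig.py (names as used by Celery 2.x, stated here as an assumption).
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1


def prefetch_count(multiplier, concurrency):
    """Prefetch count = multiplier x number of child worker processes."""
    return multiplier * concurrency


# Hypothetical worker started with 4 child processes (-c 4).
print(prefetch_count(CELERYD_PREFETCH_MULTIPLIER, 4))
# The same worker with a multiplier of 4 would reserve four times as many.
print(prefetch_count(4, 4))
```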

Thursday, August 4, 2011

Using the git completion bash script

Within the file, the description is listed here:

3) Consider changing your PS1 to also show the current branch:
#         Bash: PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ '

PS1 is the bash custom prompt:
The default bash prompt is usually:
echo $PS1

...which is defined in /etc/bash.bashrc:
# set variable identifying the chroot you work in (used in the prompt below)
if [ -z "$debian_chroot" ] && [ -r /etc/debian_chroot ]; then
    debian_chroot=$(cat /etc/debian_chroot)
fi

The ${debian_chroot} variable is set in /etc/bash.bashrc only if you are working inside a chroot, which normally isn't the case, so it is left blank. The \u@\h refers to the username and the host, \w refers to the current working directory, and \$ shows '$' if the UID is not 0 (otherwise it shows '#' to indicate root-level access).

Tuesday, August 2, 2011

time.mktime() versus calendar.timegm()

time.mktime() assumes that the passed tuple is in local time, while calendar.timegm() assumes it's in GMT/UTC. Depending on the interpretation, the tuple represents a different time, so the functions return different values (seconds since the epoch are UTC-based).
The difference between the values should be equal to the time zone offset of your local time zone.
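A quick way to see this for yourself is to feed the same tuple to both functions. The sketch below is written for Python 3, and the date is an arbitrary example. The printed difference depends on the time zone of the machine running it, so no particular value is claimed for it here.

```python
import calendar
import time

# 2 August 2011, 00:00:00 as a 9-field time tuple (DST flag set to 0).
stamp = (2011, 8, 2, 0, 0, 0, 0, 0, 0)

utc_seconds = calendar.timegm(stamp)   # tuple interpreted as UTC
local_seconds = time.mktime(stamp)     # tuple interpreted as local time

print(utc_seconds)
print(local_seconds - utc_seconds)     # offset of the local zone, in seconds
print(time.timezone)                   # seconds west of UTC (non-DST zone)
```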

Same-origin policy..

The same origin policy requires that the protocol, port, and host are the same for both pages.  Otherwise, Firefox apparently "preflights" the HTTP request:

Mozilla considers two pages to have the same origin if the protocol, port (if one is specified), and host are the same for both pages. The following table gives examples of origin comparisons to the URL
(Table columns: URL, Outcome, Reason. The reasons given for failures are a different protocol, port, or host.)
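The comparison can be sketched in a few lines of Python 3. This is only an illustration of the rule as stated above, not how any browser implements it: same_origin is a hypothetical helper, the example.com URLs are placeholders, and treating a missing port as the scheme's default (80 or 443) is an assumption made here.

```python
from urllib.parse import urlsplit

# Assumed defaults used when a URL does not spell out its port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (protocol, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def same_origin(url_a, url_b):
    """True when protocol, host, and port all match."""
    return origin(url_a) == origin(url_b)


base = "http://store.example.com/dir/page.html"
print(same_origin(base, "http://store.example.com/dir2/other.html"))
print(same_origin(base, "https://store.example.com/secure.html"))
print(same_origin(base, "http://store.example.com:81/dir/etc.html"))
print(same_origin(base, "http://news.example.com/dir/other.html"))
```

Only the first comparison shares all three components; the others differ in protocol, port, and host respectively.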