Friday, April 13, 2012

Django v1.4 sessions

We recently upgraded to Django v1.4 and noticed that the session key was no longer being captured in our logs; it appeared as "None" instead of the unique identifier string. Before the upgrade, a session key was implicitly created on the first reference to request.session.session_key in your code (assuming that django.contrib.sessions is in your INSTALLED_APPS and django.contrib.sessions.middleware.SessionMiddleware is in your MIDDLEWARE_CLASSES), but we noticed that no new session cookies were being created.

Inside django.contrib.sessions, normally request.session is created:
class SessionMiddleware(object):
    def process_request(self, request):
        engine = import_module(settings.SESSION_ENGINE)
        session_key = request.COOKIES.get(settings.SESSION_COOKIE_NAME, None)
        request.session = engine.SessionStore(session_key)
Before Django v1.4, session_key in django/contrib/sessions/backends/base.py was a property that created a session key on the first reference:
    def _get_session_key(self):
        if self._session_key:
            return self._session_key
        else:
            self._session_key = self._get_new_session_key()
            return self._session_key
Although Django v1.4 has documentation about a new signed cookie session approach, there is no mention of this change. It turns out that session keys are no longer implicitly created; a key only exists once the session has been modified or explicitly created, as shown in this diff:

https://code.djangoproject.com/changeset/17155
commit 864facf5ccc0879dde7e608a3bea3f4bee1d3a57
Author: aaugustin
Date: Sun Nov 27 17:52:24 2011 +0000

Fixed #11555 -- Made SessionBase.session_key read-only. Cleaned up code slightly. Refs #13478.

This also removes the implicit initialization of the session key on the first access in favor of explicit initialization.
In other words, it used to be that just referencing the session_key would cause its creation. The implications are that you must explicitly initialize a session key somewhere in your middleware code. If you start noticing None session keys, chances are that you were operating under this assumption. The fix is to create one somewhere in your middleware.
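To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch and not Django's actual code: the class names and the "generated-key" placeholder are made up for illustration.

```python
class LazyKeySession(object):
    """Toy model of the pre-1.4 behaviour: reading the key creates it."""

    def __init__(self):
        self._session_key = None

    @property
    def session_key(self):
        # First access generates a key as a side effect.
        if not self._session_key:
            self._session_key = "generated-key"  # placeholder value
        return self._session_key


class ExplicitKeySession(object):
    """Toy model of the 1.4 behaviour: the key stays None until create()."""

    def __init__(self):
        self._session_key = None

    @property
    def session_key(self):
        # Read-only: no key is generated here.
        return self._session_key

    def create(self):
        self._session_key = "generated-key"  # placeholder value
```

Code that only ever reads session_key, such as a logging call, behaves like the second class after the upgrade and will see None until something creates or saves the session.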

If your middleware comes after django.contrib.sessions.middleware.SessionMiddleware, then you could initialize the key with the following:
if hasattr(request, 'session') and hasattr(request.session, 'session_key') and getattr(request.session, 'session_key') is None:
    logging.debug("Creating a session key since there is none..")
    request.session.create()
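Packaged as its own middleware class, the check might look like the sketch below. The class name EnsureSessionKeyMiddleware is hypothetical, and it assumes the old-style middleware interface (a plain class with process_request) and that it is listed after SessionMiddleware in MIDDLEWARE_CLASSES.

```python
import logging

logger = logging.getLogger(__name__)


class EnsureSessionKeyMiddleware(object):
    """Hypothetical middleware: list it after SessionMiddleware."""

    def process_request(self, request):
        session = getattr(request, "session", None)
        # session_key is None until the session has been created or saved.
        if session is not None and getattr(session, "session_key", None) is None:
            logger.debug("Creating a session key since there is none.")
            session.create()
        # Returning None lets request processing continue as usual.
        return None
```

Keep in mind that calling create() for every visitor means every request without a session gets one stored on the server, which may or may not be what you want.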

This bug (with a proposed fix) has been filed in: https://code.djangoproject.com/ticket/18128

Thursday, April 12, 2012

Receiving git commit notifications by email

GitHub lets you review commits before they're merged, but what if you wanted to receive the commit diffs by email so that you knew when they actually made it into your main branch? The Ruby script git-commit-notifier serves this purpose. You can take advantage of GitHub's post-receive hooks and use a Sinatra server to receive the hook's HTTP POST requests, which then trigger email notifications such as the following:

Example

So how would you integrate git-commit-notifier with GitHub? First you need to clone the repository on a host that can receive these POST requests. Because GitHub fires a post-receive hook for each merge, your Sinatra server will often receive several requests in quick succession. The naive way is to launch the script from within the POST handler (e.g. with %x or system in Ruby), but with this approach any operation you perform has to finish before the HTTP connection is closed.

require 'rubygems'
require 'json'
require 'sinatra'

post '/' do
  if params[:payload]
    push = JSON.parse(params[:payload])
    puts "JSON response: #{push.inspect}"
  end

  # Tried to run as a daemon, fork, but maybe just doing an exec will produce fewer dup msgs.
  system("mirror_repo.sh")
end
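If blocking the request until the script finishes ever becomes a problem, the alternative is to start the script and return immediately. The sketch below shows that idea in Python rather than Ruby, using only the standard library; launch_in_background is a made-up helper name, and it does nothing to prevent the duplicate notifications mentioned in the comment above.

```python
import subprocess
import sys


def launch_in_background(command):
    """Start command without waiting for it; returns the Popen handle."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Example: a short-lived child process standing in for mirror_repo.sh.
proc = launch_in_background([sys.executable, "-c", "pass"])
proc.wait()  # a real handler would return here instead of waiting
```

The trade-off is that overlapping requests can now run the script concurrently, so the script itself would need some form of locking.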

The mirror_repo.sh script would then look something like the following:
cd /home/myrepo
git fetch github
# Record both hashes before merging so we can send the diffs between our branch and github/master
CURRENT_HASH=`git log HEAD -1 --pretty=%H`
NEW_HASH=`git log github/master -1 --pretty=%H`
# git merge will actually be a fast-forward
git merge github/master
# Feed each commit to git-commit-notifier as "parent hash ref"
git log --no-merges --pretty=format:"%P %H" $CURRENT_HASH..$NEW_HASH | awk '{print "echo " $0 " refs/heads/master | git-commit-notifier /usr/local/config/git-commit-notifier.yml"}' | sh
git checkout master
exit 0

The basic approach fetches from the remote (assuming it's named 'github'), records the current and new commit hashes, merges the branch, and then pipes each new commit, listed by its parent and its own hash, into the git-commit-notifier script. Assuming the current branch is simply an ancestor of the upstream branch, the merge should be a fast-forward.

If you plan on mirroring the repository (e.g. via Gitosis), you could set up a post-receive hook that sends the diffs each time commits are pushed to master.
while read oldrev newrev ref
do
    if [ "$ref" = "refs/heads/master" ]; then
        echo "$oldrev $newrev $ref" | git-commit-notifier /usr/local/config/git-commit-notifier.yml
    fi
done
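For reference, Git hands a post-receive hook one line per updated ref on standard input, in the form "oldrev newrev ref". The same filtering can be sketched in Python; master_updates is a made-up helper name, and the hashes in the usage note are placeholders.

```python
def master_updates(lines, wanted_ref="refs/heads/master"):
    """Return (oldrev, newrev, ref) tuples for updates to wanted_ref."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            # Skip blank or malformed lines.
            continue
        oldrev, newrev, ref = parts
        if ref == wanted_ref:
            updates.append((oldrev, newrev, ref))
    return updates
```

In a real hook you would pass sys.stdin as lines; for example, master_updates(["aaa bbb refs/heads/master", "ccc ddd refs/heads/topic"]) keeps only the first entry.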

Thursday, April 5, 2012

Tracking JavaScript exceptions with TraceKit

One of the challenges in modern browsers is trapping all the JavaScript exceptions your users might be experiencing. Internet Explorer (IE) is especially unforgiving about invalid JavaScript calls. If you call $('#myid').text() on an input HTML tag, IE will raise an "Object doesn't support this property or method" error. IE also has some security restrictions that prevent you from submitting a change to a file input HTML tag, causing an "Access Denied" error message that you won't see in other browsers such as Chrome and Firefox. You wouldn't know about these issues unless you found a way to report these errors.

The first thing you might consider is to use window.onerror to make an Ajax request that logs the message, URL, and line number on the server. This approach works, but it provides limited information, especially for minified JavaScript code, since the line number will always refer to the first few lines of the file. Without a column number, you'd be hard-pressed to find the location of the offending code. Since the stack trace in JavaScript is lost once the window.onerror function is called, you have to intercept the exception before it gets to this function.

An open-source alternative is TraceKit, which provides a cross-browser mechanism for capturing stack traces. Once it is set up, you can start receiving stack traces similar to the following:
url: https://myhost.com/static/js/tracekit.js
line: 164
context:
}, (stack.incomplete ? 2000 : 0));

throw ex; // re-throw to propagate to the top level (and cause window.onerror)
}
func:

url: https://myhost.com/static/js/my.js
column: 45
line: 1841
context:

if ($("#mydata").length === 0) {
$('#myid').text(title).attr('href', url);
$('#myid2_url').text(url).attr('href', url);
func: myfunction_name


url: https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
column: 4351
line: 3
context: None
func: HTMLDocument.

(You may notice that rethrow line at the top. For Internet Explorer, the exception needs to be thrown to the top of the browser. For other browsers, you'll have to live with this extraneous information.)

How does TraceKit work? Since only the error message, URL, and line number are reported once window.onerror is called, the key to this approach is to wrap jQuery's document.ready(), $.event.add, and $.ajax callbacks in a try/catch block, collect the stack traces, find the code snippet from the line and column numbers, and add a reporting handler that is called after all the stack traces have been pulled.
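The wrapping idea is not specific to JavaScript. As a rough analogy only (this is not TraceKit's code), here is the same pattern in Python: a decorator catches the exception while the stack is still available, hands a report to every subscriber, and re-raises. The names subscribe and report_errors are made up for this sketch.

```python
import functools
import sys
import traceback

_subscribers = []


def subscribe(handler):
    """Register a callable that receives each error report."""
    _subscribers.append(handler)


def report_errors(func):
    """Wrap func so its exceptions are reported before propagating."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Capture the frames now, while the traceback is still available.
            frames = traceback.extract_tb(sys.exc_info()[2])
            stack = [{"url": f.filename, "line": f.lineno,
                      "func": f.name, "context": f.line} for f in frames]
            for handler in _subscribers:
                handler({"message": str(exc), "stack": stack})
            raise  # re-throw so the top-level handler still sees it
    return wrapper
```

As with the re-throw line in the trace above, the wrapper's own frame shows up in the captured stack alongside the frames of the wrapped function.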

You can read through the rest of the code to see how the stack traces are derived for Internet Explorer, Chrome, Safari, and Firefox. In order to derive the code snippet, TraceKit actually performs an XMLHttpRequest call (in other words, an Ajax request) to retrieve the source file from your own server. As a result, any script that's not located on your server will be printed as "HTMLDocument." in the final stack trace output. The current version in the GitHub repo also doesn't obey this restriction, so there's a pull request that will attempt to retrieve the .js files only if they are part of the document.domain. There is also a pull request to retrieve files with https:// prefixes and to deal with jQuery 1.7+, which introduces an additional parameter in the $.event.add() function.

You can then set up a reporting handler to POST this data back to your server:
TraceKit.report.subscribe(function (stackInfo) {
    $.ajax({
        url: '/error_reporting/',
        type: 'POST',
        data: {
            browserUrl: window.location.href,
            stackInfo: JSON.stringify(stackInfo)
        }
    });
    return true; // suppress error on client
});
The information that gets passed includes a list of stack frames, each one a dictionary. You can decode and format it with something like the following code:

import json
import logging

def process_js_error(request):
    stack_info = request.POST.get('stackInfo', '')
    logging.debug("stackInfo: %s" % stack_info)
    try:
        stack_info = json.loads(stack_info)
    except ValueError:
        # Not valid JSON, so report the raw string instead.
        content = "Stack Info: %s\nBrowser URL: %s\nUser Agent: %s\n" % (
            stack_info, request.POST['browserUrl'], request.META['HTTP_USER_AGENT'])
    else:
        # Prettier formatting of the stack trace.
        if isinstance(stack_info['stack'], list):
            stack_trace = "\n"
            for stack_item in stack_info['stack']:
                for key, val in stack_item.items():
                    if isinstance(val, list) and val:
                        stack_trace += "%s: \n%s\n" % (key, "\n".join(val))
                    else:
                        stack_trace += "%s: %s\n" % (key, val)
                stack_trace += "\n"
        else:
            stack_trace = stack_info['stack']

        content = """
Error Type: %(error_type)s
Browser URL: %(browser_url)s
User Agent: %(user_agent)s
User Name: %(user_name)s
Stack Trace: %(stack_trace)s
""" % {'error_type': stack_info['message'],
       'browser_url': request.POST['browserUrl'],
       'user_agent': request.META['HTTP_USER_AGENT'],
       # Assumption: request.user is set by Django's auth middleware.
       'user_name': getattr(request, 'user', None),
       'stack_trace': stack_trace}

There are a number of open pull requests that you may want to consider when using TraceKit, some of which were discussed in this writeup. Keep in mind a couple of limitations. First, Chrome, Firefox, and Opera will for the most part output full stack frames with column numbers, but Internet Explorer will only report the top stack frame, and column numbers are not always guaranteed. Second, code snippets are only available for files hosted on your own site (i.e. not for files hosted on a CDN, such as those prefixed with https://ajax.googleapis.com/). Hope you find this tool as useful for tracking down your JavaScript exceptions as we did!

Tuesday, April 3, 2012

Git 1.7.4.2

The version of Git packaged for Ubuntu 10.04 has a major bug: it keeps a file descriptor open for every pack file. If you mirror repositories (e.g. with git clone --mirror), you can easily run out of file descriptors. You may see the following error messages:
error: cannot create pipe for index-pack: Too many open files
fatal: fetch-pack: unable to fork off index-pack
error: cannot create pipe for pack-objects: Too many open files
fatal: git pack-objects failed: Too many open files
The fix was made in Git 1.7.4.2:
https://raw.github.com/gitster/git/master/Documentation/RelNotes/1.7.4.2.txt

 * We used to keep one file descriptor open for each and every packfile that we have a mmap window on it (read: "in use"), even when for very tiny packfiles. We now close the file descriptor early when the entire packfile fits inside one mmap window.

To upgrade, add the git-core PPA (https://launchpad.net/~git-core/+archive/ppa) to your /etc/apt/sources.list file, then import its key and install the newer package:
deb http://ppa.launchpad.net/git-core/ppa/ubuntu lucid main
deb-src http://ppa.launchpad.net/git-core/ppa/ubuntu lucid main

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E1DF1F24
sudo apt-get update
sudo apt-cache search git
sudo apt-get install git=1:1.7.9.4-1.1~ppa1~lucid1