Thursday, October 27, 2011

Celerybeat and celerybeat-schedule

While trying to replace /etc/init.d/celerybeat with a version that works more reliably with Fabric, one of the things I discovered is that celerybeat may fire off all the tasks on its scheduled task list at startup. Celerybeat stores the last_run time of each scheduled task in a celerybeat-schedule file (in the default dir unless told otherwise), so if a task hasn't been run in a while (by virtue of using an older celerybeat-schedule), you may see a lot of "Sending due task" messages.

You can check last_run_at by using the shelve library from Python, which is what celerybeat uses to store all the scheduled tasks defined in your Celery configuration. Each time you restart celerybeat, this celerybeat-schedule file gets merged with your configured scheduled tasks; entries that don't exist yet are added.
sudo python
>>> import shelve
>>> a = shelve.open("/var/run/celerybeat-schedule")
>>> a['entries']['my_task'].last_run_at
datetime.datetime(2011, 10, 28, 2, 1, 57, 717454)
The key is to explicitly define where the celerybeat-schedule file lives:

/etc/default/celerybeat:
export CELERYBEAT_OPTS="--schedule=/var/run/celerybeat-schedule"
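
To see every scheduled entry and when it last ran, here's a minimal sketch along the same lines (assuming the schedule file lives at /var/run/celerybeat-schedule; adjust the path to wherever --schedule points):

import shelve

# Path is an assumption -- use whatever --schedule points at on your system.
db = shelve.open("/var/run/celerybeat-schedule")
for name, entry in db["entries"].items():
    print "%s last ran at %s" % (name, entry.last_run_at)
db.close()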

IPython: interactive Python

- who: lists the variables defined in the current namespace
- store: stores a variable (%store foo > a.txt writes it to a file)
- reset: clears the namespace
- logstart, logon, logoff: control logging of the session to a file
- lsmagic: lists all available magic commands

- run -d <file>: run a Python file under the debugger, stepping through it
- run -p <file>: run a Python file under the profiler

- xmode Context (xmode Verbose: also shows the argument values in tracebacks)
- pdb: turns on automatic pdb debugging of uncaught exceptions
- time <expr>: times a single run of an expression
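
A quick illustration of a few of these in an IPython session (a sketch; output lines are abbreviated and will vary):

In [1]: foo = 42

In [2]: %who
foo

In [3]: %time sum(range(1000000))
CPU times: ...
Wall time: ...
Out[3]: 499999500000

In [4]: %logstart mysession.py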

Django caches all its models:

http://stackoverflow.com/questions/890924/how-do-you-reload-a-django-model-module-using-the-interactive-interpreter-via-m/903943#903943

Saturday, October 22, 2011

Integrating OpenID Google Apps Single Sign On with Hudson/Jenkins....

A not-so-well-documented aspect of using Hudson is that you can integrate OpenID single sign-on (SSO) with your Google Apps domain. You could implement SSO using the Jenkins Crowd plugin that comes pre-packaged with Hudson, but then you'd have to do custom integration work. Since the Crowd protocol is entirely SOAP-based, just getting the SOAP bindings right can be a big pain. Then you'd have to either set up a Crowd identity server or create your own version via the Crowd API.

The OpenID plugin does not seem to be provided with the Hudson/Jenkins v2.1.2 release, but you can download and install it yourself. You do need the Sun version of Java (not OpenJDK), since there seem to be some sun.com dependencies in the Jenkins code base (the instructions for setting up on Ubuntu are listed here). You also need to install Maven (sudo apt-get install maven2) and configure your ~/.m2/settings.xml.

Once Java and Maven are set up, you can clone the OpenID plugin repo and compile it:

1) git clone https://github.com/jenkinsci/openid-plugin.git

2) mvn

If the compile succeeded, the openid.hpi plugin should have been built into the target/ dir. You need to copy this openid.hpi into your Hudson plugins/ dir (i.e. /var/lib/hudson/plugins). You don't appear to need to add an openid.hpi.pinned file to keep Hudson from overwriting the package, since the OpenID plugin does not come with Jenkins by default.

3) The OpenID plugin expects that the URL a user connects to on your continuous integration server ends with a trailing slash ('/'). In your Apache2 config, you may find that you need to add a rewrite rule to force connections to your server to always end with a '/'. If your server is just http://hudson.myhost.com, the rewrite rule becomes:

RewriteEngine on
  RewriteRule  ^$  /  [R]

(The major reason is that the getRootUrl() method in the Jenkins code base derives its value from the request URL. The OpenID plugin, when it concatenates the OpenID finish callback URL, assumes that there will be a trailing slash at the end. Without it, your OpenID authorization flows may not work):

src/main/java/hudson/plugins/openid/OpenIdSession.java:
receivingurl = Hudson.getInstance().getRootUrl() + finishUrl;

If you notice that the OpenID callbacks (i.e. federatedLoginService/openid/finish) are not prefixed with a '/', it means that you are missing this trailing slash!

4) Inside the Hudson configuration screen, the OpenID SSO endpoint will be https://www.google.com/accounts/o8/id. Your permissions will be keyed off the email address returned by the SSO. If you do not wish anonymous users to be able to log in, make sure they do not have any permissions.

5) Make sure to enable OpenID SSO support in your Google Apps domain. The checkbox is under "Manage this domain"->"Advanced Tools"->"Federated Authentication using OpenID".

One extra bonus: if you're using the Git plugin with Hudson, you may have noticed that, depending on the version of the Git plugin, user accounts are created based either on the full name or the e-mail username of the Git committer. If you want the user accounts associated with your Git committers to also be linked to your SSO solution, then this pull request may also be useful.

https://github.com/rogerhu/git-plugin/pull/new/fix_git_email

(If you have pre-existing users, you may wish to convert their user directories from "John Doe" to jdoe@myhost.com to be consistent.)

(Interesting note: the Git plugin used in the Jenkins/Hudson 2.1.2 release is located at https://github.com/hudson-plugins/git-plugin, whereas the older v1 versions are at https://github.com/jenkinsci/git-plugin. The code bases appear to have diverged a bit, so commit 3607d2ec90f69edcf8cedfcb358ce19a980b8f1a in https://github.com/jenkinsci/git-plugin.git, which attempted to create accounts based on the Git committer's username, is not included in the v2.1.2 Jenkins release.)

Also, if you use automated build triggers, it appears they still work even with OpenID SSO turned on!

Update: it looks like the Git plug-in will start to expose an option to use the committer's entire email address as the Hudson/Jenkins user name. See the PR below:

https://github.com/hudson-plugins/git-plugin/pull/31/files

Thursday, October 20, 2011

start-stop-daemon

Wondering what the internals of the start-stop-daemon source code are?

http://doxygen.kannel.org/d6/d8e/start-stop-daemon_8c-source.html

Crowd and WSDL

Need the latest copy of the Crowd WSDL file?

1. Visit http://www.atlassian.com/software/crowd/CrowdDownloadCenter.jspa

2. Download a copy.

3. Unpack the .jar files, and go into the atlassian-x.x.x directory.

4. vi crowd-webapp/WEB-INF/classes/crowd-init.properties

crowd.home=/home/myuser/projects/atlassian-crowd/data

5. ./start_crowd.sh

6. Go to http://localhost:8095 (or your dev server IP).

7. You should be able to connect and setup the Crowd service.

8. Go through the setup flow, and get a license key from Atlassian.

9. wget http://yourhost.com:8095/crowd/services/SecurityServer?wsdl

Need to get the WSDL working in Python? Either use ZSI (which is Google App Engine compatible) or the Python suds library:

https://jira.atlassian.com/browse/CWD-159
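
If you go the suds route instead, a minimal sketch might look like this (untested; the endpoint comes from step 9 above, and the type and method names mirror the ZSI example below -- you may need to qualify the type names with the WSDL's namespace prefix):

from suds.client import Client

# Replace the host, application name, and password with your own values.
client = Client("http://yourhost.com:8095/crowd/services/SecurityServer?wsdl")

auth = client.factory.create("ApplicationAuthenticationContext")
auth.name = "soaptest"
cred = client.factory.create("PasswordCredential")
cred.credential = "passwordGoesHere"
auth.credential = cred

token = client.service.authenticateApplication(auth)
print token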

The instructions below will show you how to do Crowd authentication using ZSI:

http://tearsoffire.org/twiki/bin/view/Projects/CrowdSoapApi

from SecurityServer_services import SecurityServerLocator, SecurityServerHttpBindingSOAP
import SecurityServer_services as sss
from SecurityServer_services_types import ns0

loc = SecurityServerLocator()
server = loc.getSecurityServerPortType()

#build up the application authentication token
r = ns0.ApplicationAuthenticationContext_Def('ApplicationAuthenticationContext')
cred = ns0.PasswordCredential_Def('_credential').pyclass()
req = sss.authenticateApplicationRequest()
r._name='soaptest'
cred._credential = 'passwordGoesHere'
r._credential=cred
req._in0 = r
token = server.authenticateApplication( req )

# Look up a principal from the 'soaptest' application
prin = sss.findPrincipalByNameRequest()
prin._in0 = token._out
prin._in1 = 'cpepe'
me = server.findPrincipalByName( prin )
for i in me._out._attributes._SOAPAttribute:
    print '%s: %s' % (str(i._name), str(i._values.__dict__))

Using Fabric with sudo

If you've ever had to use Fabric, one of the issues is that your scripts must return an exit code of 0 in order for the sudo() command to consider the command successful. Any non-zero exit code will result in an error message:

Fatal error: sudo() encountered an error (return code 1) while executing '...'

If you're using bash scripts, this means that any command that returns a non-zero exit code under "set -e" or "bash -e" will trigger an abnormal exit. The "kill -0 <pid>" command, which lets you test whether a process exists and can be signaled, suffers from a flaw: if you give it a PID that does not exist, it returns a non-zero code, which causes bash to bail out when "set -e" or "bash -e" is in effect (normally you could just check the return value with $?).

You should also check the integer value of the return code (if [ $? -eq 0 ]; then ... or if [ $? -eq 1 ]; then ...) to determine which step to take.
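
If you'd rather have Fabric keep going and inspect the return code yourself, warn_only is one way to do it. A minimal sketch (Fabric 1.x; the PID argument is a placeholder):

from fabric.api import settings, sudo

def process_alive(pid):
    # warn_only keeps Fabric from aborting when the command exits non-zero
    with settings(warn_only=True):
        result = sudo("kill -0 %s" % pid)
    # result.failed is True when the return code was non-zero,
    # e.g. when the PID does not exist
    return not result.failed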

Wednesday, October 19, 2011

Minus sign

In ${VAR:-default}, the minus sign supplies a default value when the variable isn't set. This line has two levels of fallback:

CELERYBEAT_PID_FILE=${CELERYBEAT_PID_FILE:-${CELERYBEAT_PIDFILE:-$DEFAULT_PID_FILE}}

value_for_platform

http://wiki.opscode.com/display/chef/Recipes#Recipes-valueforplatform

Chef template specificity

http://wiki.opscode.com/display/chef/Templates

Template Location Specificity

Cookbooks are often designed to work on a variety of hosts and platforms. Templates often need to differ depending on the platform, host, or function of the node. When the differences are minor, they can be handled with a small amount of logic within the template itself. When templates differ dramatically, you can define multiple templates for the same file. Chef will decide which template to render based on the following rules.
Within a Cookbook's template directory, you might find a directory structure like this:
  • templates
    • host-foo.example.com
    • ubuntu-8.04
    • ubuntu
    • default
For a node with FQDN of foo.example.com and the sudoers.erb resource above, we would match:
  • host-foo.example.com/sudoers.erb
  • ubuntu-8.04/sudoers.erb
  • ubuntu/sudoers.erb
  • default/sudoers.erb
In that order.
Then, for example: sudoers.erb placed under the templates/host-foo.example.com/ directory will only be copied to the machine with the FQDN foo.example.com. (Note the "host-" prefix on the directory name.)
So, the rule distilled:
  1. host-node[:fqdn]
  2. node[:platform]-node[:platform_version]
  3. node[:platform]
  4. default

Dealing with IOError and mod_wsgi

http://permalink.gmane.org/gmane.comp.python.django.devel/30886


Sunday, October 16, 2011

get_task_logger() in Celery...

If you look at the Celery documentation, you'll notice that get_task_logger() examples show up constantly.

@task
def add(x, y):
    logger = add.get_logger()
    logger.info("Adding %s + %s" % (x, y))
    return x + y
What does this function do? It turns out that it creates a separate logger instance tied specifically to the task name (submitted as a PR on https://github.com/ask/celery/issues/129). propagate=False is always set, so any messages passed to this logger will not move up the parent/ancestor chain.

Instead, a handler is always attached to this task logger. If you wish to adjust the logger level, you can do:

import logging
logging.getLogger('myproject.add').setLevel(logging.DEBUG)

If no loglevel is specified in get_logger(), then the default log level defined in CELERYD_LOG_LEVEL is used. Be careful though! The right way is to set the level number (not the level name) if you are modifying it directly through Python:

from celery import current_app
from celery.utils import LOG_LEVELS
current_app.conf.CELERYD_LOG_LEVEL = LOG_LEVELS['DEBUG']  # pretty much the same as logging.DEBUG

What's the purpose of get_task_logger()? The motivation appears to be to allow logging by task name. If you were to just import the standard logging module, Celery would patch the logger module to add process-aware information (ensure_process_aware_logger()), and then add formats/handlers to both the root logger and the logger defined by the multiprocessing module (the multiprocessing get_logger() does not use process-shared logs, but it does let you log things into the "multiprocessing" namespace, which adds the SUBDEBUG/SUBWARNING log levels).

def setup_logging_subsystem(self, loglevel=None, logfile=None, format=None,
                            colorize=None, **kwargs):
    if Logging._setup:
        return
    loglevel = loglevel or self.loglevel
    format = format or self.format
    if colorize is None:
        colorize = self.supports_color(logfile)

    if mputil and hasattr(mputil, "_logger"):
        mputil._logger = None
    ensure_process_aware_logger()
    receivers = signals.setup_logging.send(sender=None,
                    loglevel=loglevel, logfile=logfile,
                    format=format, colorize=colorize)
    if not receivers:
        root = logging.getLogger()

        if self.app.conf.CELERYD_HIJACK_ROOT_LOGGER:
            root.handlers = []

        mp = mputil.get_logger() if mputil else None
        for logger in filter(None, (root, mp)):
            self._setup_logger(logger, logfile, format, colorize, **kwargs)
            logger.setLevel(loglevel)
            signals.after_setup_logger.send(sender=None, logger=logger,
                                            loglevel=loglevel, logfile=logfile,
                                            format=format, colorize=colorize)
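
Based on the receivers = signals.setup_logging.send(...) check above, it appears that connecting your own receiver to the setup_logging signal keeps Celery from hijacking the root logger. A minimal sketch:

import logging

from celery import signals

def configure_logging(sender=None, loglevel=None, **kwargs):
    # Because a receiver is connected, the "if not receivers:" branch
    # above is skipped and we control the logging setup ourselves.
    logging.basicConfig(level=loglevel or logging.INFO)

signals.setup_logging.connect(configure_logging)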

Debugging Celery tasks locally

Want to make sure your Celery tasks work correctly before you deploy? Here are a few useful things you can do:

First, set the root logger and the "celery.task.default" logger to DEBUG:
import logging
logging.getLogger('celery.task.default').setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)

Set CELERY_ALWAYS_EAGER so that Celery will always invoke tasks locally instead of dispatching them to a Celery worker machine.

Set CELERY_EAGER_PROPAGATES_EXCEPTIONS so that any exceptions raised within tasks bubble up, letting you actually see any exceptions that may cause your batch calls to fail (i.e. any uncaught exception can cause a fatal error!):
from celery import current_app
current_app.conf.CELERY_ALWAYS_EAGER = True
current_app.conf.CELERY_EAGER_PROPAGATES_EXCEPTIONS = True

from celery.utils import LOG_LEVELS
current_app.conf.CELERYD_LOG_LEVEL = LOG_LEVELS['DEBUG']  # pretty much the same as logging.DEBUG

Finally, if you are invoking a task from the same Python script in which it is defined, you should import the task as if it came from another module, even though the function is declared in the same file. The reason is that when the celeryd daemon looks for registered tasks, Celery considers a task invoked directly from the script to belong to the "__main__" module. The way to get around it is to import the task from its own module, assuming your PYTHONPATH is set correctly.

from celery.decorators import task

@task
def task_name():
    print "here"
    return 1

if __name__ == "__main__":
    # "mymodule" is a placeholder for this file's own module name
    from mymodule import task_name

    task_name.apply_async()

(Note: This information has been updated to reflect Celery v2.3.3 inner-workings).

Saturday, October 15, 2011

Celery and the big instance refactor

One of the stranger parts of Celery is that if you want a logger that writes to celery.task.default instead of one named after your own task, you can do:

from celery.task import Task
logger = Task.get_logger()

The Task class appears to be a global instantiation of Celery. Normally, the task logger is set up via the get_logger() method, which calls setup_task_logger(), which in turn calls get_task_logger(). If you invoke get_logger() within a Task class, the task's name is used:

def setup_task_logger(self, loglevel=None, logfile=None, format=None,
                      colorize=None, task_name=None, task_id=None,
                      propagate=False, app=None, **kwargs):
    logger = self._setup_logger(self.get_task_logger(loglevel, task_name),
                                logfile, format, colorize, **kwargs)

If you use Task.get_logger(), no name is used and the logger namespace is set to celery.task.default.

def get_task_logger(self, loglevel=None, name=None):
    logger = logging.getLogger(name or "celery.task.default")
    if loglevel is not None:
        logger.setLevel(loglevel)
    return logger

This Task class appears to be part of the "Big Instance" refactor. It appears that there are plans to allow multiple instances of the Celery object to be instantiated.

Also, one thing to note:

http://ask.github.com/celery/userguide/tasks.html#logging

Instantiation
A task is not instantiated for every request, but is registered in the task registry as a global instance.

This means that the __init__ constructor will only be called once per process, and that the task class is semantically closer to an Actor.

If you have a task,

Friday, October 14, 2011

FQL is not being deprecated

You can now do FQL queries via the Graph API....

One of the most common misconceptions we hear from developers is the belief that FQL will be deprecated with the REST API. The primary reason for this misunderstanding is that you can only issue FQL queries using the fql.query or fql.multiquery methods. We want to make it clear that FQL is here to stay and that we do not have any plans to deprecate it. Today, we are enabling developers to issue FQL queries using the Graph API. With this change, FQL is now just an extension of Graph API.

You can issue an HTTP GET request to https://graph.facebook.com/fql?q=QUERY. The ‘q’ parameter can be a single FQL query or a multi-query. A multi-query is a JSON-encoded dictionary of queries.

http://developers.facebook.com/blog/post/579/
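
For example, here's a quick sketch of issuing such a query from Python (the query and access token are placeholders you'd supply yourself):

import json
import urllib
import urllib2

query = "SELECT uid, name FROM user WHERE uid = me()"
params = urllib.urlencode({"q": query, "access_token": "YOUR_ACCESS_TOKEN"})
response = urllib2.urlopen("https://graph.facebook.com/fql?" + params)
print json.load(response)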

Thursday, October 13, 2011

Pylint on Ubuntu

Ever see this issue?
>>> from logilab.common.compat import builtins
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name builtins

Chances are you have an old version of logilab that is stored inside /usr/lib/pymodules:
>>> import logilab
>>> logilab.common
<module 'logilab.common' from '/usr/lib/pymodules/python2.6/logilab/common/__init__.pyc'>
>>> logilab.common.compat
<module 'logilab.common.compat' from '/usr/lib/pymodules/python2.6/logilab/common/compat.pyc'>
>>> logilab.common.compat.__file__
'/usr/lib/pymodules/python2.6/logilab/common/compat.pyc'

The solution is to delete the logilab directory in /usr/lib/pymodules, or do:

sudo apt-get remove python-logilab-common
sudo apt-get remove python-logilab-astng

Then you can do:
pip install -U pylint