Saturday, March 31, 2012

One more issue in making the Hudson to Jenkins switch...

The Hudson/Jenkins continuous integration server relies on Jelly scripts, an executable XML format. The format resembles Django/Jinja2 templates, but ships with very little documentation explaining how it all works. The Hudson/Jenkins docs offer few references, so you're left to guess what statements such as those in the Violations plug-in's index.jelly actually do. (A rough analogy is sketched below.)
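If you've never seen Jelly, a loose analogy in Python/Jinja2 may help. This is an illustrative sketch with made-up objects, not code from the plugin: Jelly's <j:set> and <j:forEach> tags behave roughly like Jinja2's {% set %} and {% for %}.

# Loose Jinja2 analogy for Jelly's <j:set> and <j:forEach> tags.
# Illustrative only: Jelly evaluates its ${...} expressions against
# Java objects, while Jinja2 renders against Python objects.
from jinja2 import Template

template = Template("""
{% set model = it.fileModel %}
<h3>{{ model.displayName }}</h3>
{% for type, violations in model.typeMap.items() %}
  {{ type }}: {{ violations|length }} violation(s)
{% endfor %}
""")

# 'it' stands in for the Java object Jelly binds to the view.
class FakeModel(object):
    displayName = "example.py"
    typeMap = {"pylint": ["W0611", "E1101"]}

class FakeIt(object):
    fileModel = FakeModel()

print template.render(it=FakeIt())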

One issue encountered was with the Violations plug-in, which lets users see exactly which lines of code triggered Pylint, PEP8, or JSLint-related errors. The Violations plug-in parses the error codes from these files and renders a summary similar to the one shown below (taken from https://wiki.jenkins-ci.org/download/attachments/2916418/violations-file-1.png?version=1&modificationDate=1187153094000):



The problem in Jenkins was that the screen above rendered blank, whereas Hudson didn't have this issue. Tracing through the code, there didn't appear to have been many drastic changes to this plugin over the past 4-5 years, and reverting to a previously known working version of the Violations plugin from Hudson showed the same phenomenon on Jenkins.

Furthermore, this particular section of the Violations plug-in turns out to have problems. The template attempts to set a variable named 'model' from the ${it.fileModel} attribute, which corresponds to a getFileModel() method in the code. There is also a getDisplayName() method defined in AbstractFileModel.java, which FileModel.java inherits.
<l:main-panel>
  <j:set var="model" value="${it.fileModel}"/>

  <j:set
    var="image"
    value="${rootURL}/plugin/violations/images/48x48/dialog-warning.png"/>

  <j:set
    var="iconDir"
    value="${rootURL}/plugin/violations/images/16x16"/>

  <j:set var="href" value="${it.showLines}"/>
  <h3><img src="${image}"/> ${model.displayName}</h3>

  <j:forEach var="t" items="${model.typeMap.entrySet()}">
    <table class="pane">
      <tbody>
        <tr><td class="pane-header" colspan="5">${it.typeLine(t.key)}</td></tr>
        <j:forEach var="v" items="${t.value}">
          <tr>
            <td class="pane">
              <j:if test="${href}">
                <a href="#line${v.line}">${v.line}</a>
              </j:if>
              <j:if test="${!href}">
                ${v.line}
              </j:if>
            </td>
            <!--<td class="pane">${v.source}</td> -->
            <td class="pane">${it.severityColumn(v)}</td>
            <td class="pane" width="99%">${v.message}</td>
          </tr>
        </j:forEach>
      </tbody>
    </table>
    <p></p>
  </j:forEach>
The issue was first reported last month in this ticket. It didn't appear on any Hudson-based servers, so it seems to have been introduced by the switch from Hudson to Jenkins. The workaround was to override two methods in FileModel.java, as shown in this pull request.
    /**
     * Get the display name of this file.
     * @return the name to use when displaying the file violations.
     */
    @Override
    public String getDisplayName() {
        return super.getDisplayName();
    }

    /**
     * Get the map of types to violations.
     * @return the type to violation map.
     */
    @Override
    public TreeMap<String, TreeSet<Violation>> getTypeMap() {
        return super.getTypeMap();
    }
It's not clear to me why one needs to re-declare these methods on a subclass just to call the parent class's implementations (presumably Jelly resolves properties such as ${model.displayName} by introspecting the concrete class), but all of a sudden, the Violations summary screen came to life after making this fix and recompiling the plugin. Although the Violations plugin hasn't been updated in the past 8 months, one can only hope the Jenkins maintainers will merge the fix and re-release the binary soon!

Thursday, March 29, 2012

Using Pynliner with Hudson/Jenkins email-ext plugin...

The current version of Pynliner is case-sensitive, which makes it hard to generate inline CSS styles for XML-based documents that use different case conventions for tags and CSS selectors. An example of this issue occurs in the html.jelly file of Hudson/Jenkins' email-ext plugin, where the tags are lower-case but the CSS selectors are upper-case. In order to generate an XML document that includes these inline styles, one either has to convert everything to a single case or update Pynliner to be case-insensitive.
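As a rough sketch of the workaround: pynliner.fromString() is the library's documented entry point, but the lower-casing step is our own hack, and the regex here is deliberately naive.

# Minimal sketch: normalize case before inlining, since Pynliner
# matches CSS selectors against tags case-sensitively.
# The lower-casing step is a workaround, not a pynliner feature,
# and crudely lower-cases the whole <style> block.
import re
import pynliner

html = """
<html><head><style>TD { color: red; }</style></head>
<body><td>hello</td></body></html>
"""

def lowercase_styles(match):
    return match.group(0).lower()

# (?is) = case-insensitive, dot matches newlines
normalized = re.sub(r'(?is)<style>.*?</style>', lowercase_styles, html)

print pynliner.fromString(normalized)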

A pull request submitted here attempts to generate these inline styles automatically from the original file. The script can probably be adapted to generate any Jelly-based document, though some replace/substitution has to be done at the end.

https://github.com/rogerhu/email-ext-plugin/tree/jelly_pynliner

Hopefully the changes are incorporated into Pynliner too:

https://github.com/rennat/pynliner/pull/1

EmailMessage in Django v1.4

In Django v1.4, the EmailMessage class now defaults to 7-bit encoding unless there is an actual reason to use quoted-printable format. The reason? It appears that plain ASCII text helps prevent extra attention from spam filters.

https://code.djangoproject.com/attachment/ticket/11212/0001-Ticket-11212-default-to-7bit-email.patch

If you have any tests that attempt to compare the email-message against what you're sending, you'll need to update them too!

+        # Ticket #11212
+        # Shouldn't use quoted-printable; should detect it can represent content with 7-bit data
+        msg = EmailMessage('Subject', 'Body with only ASCII characters.', 'bounce@example.com', ['to@example.com'], headers={'From': 'from@example.com'})
+        s = msg.message().as_string()
+        self.assertFalse('Content-Transfer-Encoding: quoted-printable' in s)
+        self.assertTrue('Content-Transfer-Encoding: 7bit' in s)
+
+        # Shouldn't use quoted-printable; should detect it can represent content with 8-bit data
+        msg = EmailMessage('Subject', 'Body with latin characters: àáä.', 'bounce@example.com', ['to@example.com'], headers={'From': 'from@example.com'})
+        s = msg.message().as_string()
+        self.assertFalse('Content-Transfer-Encoding: quoted-printable' in s)
+        self.assertTrue('Content-Transfer-Encoding: 8bit' in s)
+
+        msg = EmailMessage('Subject', u'Body with non latin characters: А Б В Г Д Е Ж Ѕ З И І К Л М Н О П.', 'bounce@example.com', ['to@example.com'], headers={'From': 'from@example.com'})
+        s = msg.message().as_string()
+        self.assertFalse('Content-Transfer-Encoding: quoted-printable' in s)
+        self.assertTrue('Content-Transfer-Encoding: 8bit' in s)

Wednesday, March 28, 2012

Moving from Hudson to Jenkins

Last year's dispute with Oracle caused many of the main developers of the Hudson continuous integration server to fork the project into Jenkins. While the upgrade instructions seem relatively painless, we encountered several issues when making the switch from Hudson to Jenkins:

1. You can use the Ubuntu instructions to download the latest Debian package from the Jenkins repo. Be prepared to grep any configs that make assumptions about the old name, such as your Apache configs or build scripts, and switch references from 'hudson' to 'jenkins' (or perhaps a more neutral name). The /var/lib/hudson directory may also need to be backed up.

2. Ubuntu also installs new config files (i.e. /etc/default/jenkins), so you may need to update your port and JENKINS_HOME settings. Python in particular has problems dealing with files behind symlinks, so it's best not to create symbolic links that map the old Hudson directories to the new ones. You're better off renaming/moving the existing directories, which avoids the extra headaches symlinks cause.

3. Assuming Jenkins starts up correctly, you may go into the Manage Plugins section and be asked to upgrade your plugins. Avoid doing so until you go into the Advanced section and click "Check now" to update the database of known plugins and Jenkins versions:

Although most Hudson/Jenkins plugins are compatible, there are issues with later forks of the Git plugin (see item #4). If you don't update your plug-in data in the Advanced section, you may end up downloading plug-ins that only work on Hudson and not Jenkins, and you may see confusing messages about needing to upgrade to Hudson v2.20 when Jenkins v1.457 is the latest version, since the update database now differs between the forks.

4. If you are attempting to use the Hudson Git plugin, you may run into this error (discussed at http://stackoverflow.com/questions/6230233/hudson-build-fails-with-fatal-null-java-abstractmethoderror):
FATAL: null
java.lang.AbstractMethodError
at hudson.model.AbstractBuild.getCulprits(AbstractBuild.java:278)
at hudson.model.AbstractBuild.getCulprits(AbstractBuild.java:275)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:565)
at hudson.model.Run.run(Run.java:1386)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceControl
The issue is that incompatibilities are starting to arise between the Hudson and Jenkins forks of the Git plugin. When cloning/compiling plug-ins, remember that there are now two separate forks with very different implementations: one for Hudson (forked at https://github.com/hudson-plugins/git-plugin) and one for Jenkins (forked at https://github.com/jenkinsci/git-plugin). If you don't compile plugins yourself, you don't have to worry about this issue, but the later versions of the Git plugin in particular seem to have introduced problems.

5. If you're using OpenID/SSO integration, the OpenID plugin comes with the latest version of Jenkins, but you have to compile your own version if you use Hudson. The Git plugin for Hudson already exposes the option to create User accounts based on author or email, and this just-submitted pull request allows the option to be used on Jenkins as well.

6. The Violations plugin had problems parsing jobs configured with the older version used on Hudson, causing certain jobs to go missing. You may have to delete the XML-based sections in your job configs and reload them for the jobs to be read correctly. Your best bet for tracking down these issues is /var/log/jenkins/jenkins.log. You will also have problems viewing the Violations summary without this pull request fix, which was reported in this ticket.

7. Starting in Jenkins v1.426, API tokens are used in lieu of passwords for OpenID/Crowd-based logins. In other words, you can no longer use pre-defined passwords; you must go into the Users section and look up an individual's API token to access the Jenkins API for triggering builds and retrieving job information (i.e. wget or scripts need to use the API token instead of the user's password; see the sketch below). More detailed instructions are located at: https://wiki.jenkins-ci.org/display/JENKINS/Authenticating+scripted+clients
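For example, here is a minimal sketch of triggering a build with an API token over HTTP basic auth (the host, job, user, and token values are placeholders, not real credentials):

# Minimal sketch of triggering a Jenkins build with an API token via
# HTTP basic auth. All values below are placeholders; substitute your own.
import base64
import urllib2

JENKINS_URL = "http://jenkins.mydev.com:8080"  # hypothetical host
JOB_NAME = "my-job"                            # hypothetical job
USER = "builduser"
API_TOKEN = "0123456789abcdef"                 # from the user's Configure page

request = urllib2.Request("%s/job/%s/build" % (JENKINS_URL, JOB_NAME))
auth = base64.b64encode("%s:%s" % (USER, API_TOKEN))
request.add_header("Authorization", "Basic %s" % auth)

response = urllib2.urlopen(request)
print response.getcode()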

Friday, March 23, 2012

Moving RabbitMQ machines with the Kombu framework

If you've ever contemplated moving a RabbitMQ master node from one machine to another, there are several approaches to consider. The first would be to zip up the mnesia database directory (often located in /var/lib/rabbitmq/mnesia) and transfer the files to the new machine. Another would be to set up RabbitMQ mirroring (available since RabbitMQ 2.6.0+ for clustering support and high availability) so that one master node could be taken down. Here are some of the reasons we chose neither of those options and instead used another approach.

The first option may have issues since the hostname is often tied directly into the database. The following explanation came from a Server Fault thread:
The database RabbitMQ uses is bound to the machine's hostname, so if you copied the database dir to another machine, it won't work. If this is the case, you have to set up a machine with the same hostname as before and transfer any outstanding messages to the new machine. If there's nothing important in rabbit, you could just clear everything by removing the RabbitMQ files in /var/lib/rabbitmq.

You could follow this approach, but in our case, we had already spun up a separate machine and wanted to have both the old and new machines running at the same time without any further config changes. We also explored trying to use the Erlang interpreter to make changes to the mnesia database, but we weren't sure if there had been any changes to the Mnesia database format between the RabbitMQ versions we were using so the option seemed a bit risky.

In terms of using RabbitMQ's high availability features, we were upgrading a host machine running RabbitMQ 2.3.1, which lacked some of the clustering support added in later versions. It also seemed we would have to invest more time in learning how to implement it and in verifying that replication was happening correctly.

In addition, we thought it might be easier to point all new jobs at the new RabbitMQ host and let the Celery workers on the old RabbitMQ host drain the rest of the remaining queue. The problem with this approach is that we were using the Celery framework's eta/countdown parameter, which is extremely useful for setting up scheduled tasks that fire at relatively precise times. Scheduled tasks are implemented by keeping messages unacknowledged until their scheduled date/time: the worker that picked up the job holds it in a separate scheduled queue (for more information, click here) and then moves the task into its ready queue. If any tasks needed to be cancelled, the new Celery workers on the new host would not know how to deal with these revoked tasks.

The reason is that cancelling a task relies on sending a revoke message to all Celery workers, instructing them not to move any task matching a specific task ID from the scheduled queue to the ready queue. Without this message on the new AMQP host, none of the Celery workers there would know about the cancellations. Meanwhile, the old Celery workers connected to the old AMQP host would no longer receive these revoke messages and would fire the tasks off without realizing they had actually been revoked.

Since we were using Celery, however, there was a third option. Celery is built on top of the Kombu framework, which provides an abstraction layer for communicating with AMQP hosts. We could create two AMQP connections, one to the old broker host and one to the new, drain all the messages from the old host, and republish them to the new one without acknowledging them. If a failure occurred, we could use the camqadm utility to purge the new queue and restart.

The script below is an example of how you can move the messages stored in one AMQP host to another, assuming all the messages you enqueued were using the Celery framework. The script only works if all Celery workers have been stopped, since some of the messages may otherwise be held by Celery's scheduler queue, and it assumes the queues you are using are all AMQP direct exchanges, so you may need to tweak things if your settings differ. Finally, we specified the serializer as "pickle", which is the default mechanism used to store Python objects.

# NOTE: Before running this script, all Celery workers must be killed.
# This script will copy all messages from one RabbitMQ host to another.

from celery import conf
from kombu.connection import BrokerConnection
from kombu.messaging import Exchange, Queue, Consumer, Producer

import signal
import socket
import sys

OLD_BROKER_HOST = "oldhost.mydev.com"  # change
NEW_BROKER_HOST = "newhost.mydev.com"  # change

DEFAULT = "default"  # change to your queue name


def signal_handler(signal, frame):
    print 'You pressed Ctrl+C!'

    for connection in [old_connection, new_connection]:
        if connection:
            print "Closing broker connection %s" % (connection)
            connection.release()

    sys.exit(0)


def process_msg(body, msg):
    task = body.get('task')
    eta = body.get('eta')
    kwargs = body.get('kwargs')

    #msg.ack() # Acknowledge the message so that it gets removed from the queue

    print "Enqueuing new task to publish to Producer (task=%s, eta=%s, kwargs=%s)" % (task, eta, kwargs)
    producer.publish(body)
    print "body %s, msg %s" % (repr(body), repr(msg))


# If we don't cancel them, then the messages are still being held by the Celery workers.
char = raw_input("Can you verify that all Celery workers have been stopped? (Y/N): ")

if char != 'Y':
    print "This script will not work if there are Celery workers that are still reserving RabbitMQ messages. Exiting..."
    sys.exit(0)

old_connection = BrokerConnection(OLD_BROKER_HOST, conf.BROKER_USER, conf.BROKER_PASSWORD, conf.BROKER_VHOST)
new_connection = BrokerConnection(NEW_BROKER_HOST, conf.BROKER_USER, conf.BROKER_PASSWORD, conf.BROKER_VHOST)

signal.signal(signal.SIGINT, signal_handler)  # Ctrl-C handler

# Old RabbitMQ connection: consume from the existing queue
old_channel = old_connection.channel()
old_default_exchange = Exchange(DEFAULT, "direct", durable=True)
old_default_queue = Queue(DEFAULT, exchange=old_default_exchange, routing_key=DEFAULT)

consumer = Consumer(old_channel, old_default_queue, callbacks=[process_msg])
consumer.consume()

# New RabbitMQ connection: republish to the same exchange/queue name
new_channel = new_connection.channel()
new_default_exchange = Exchange(DEFAULT, "direct", durable=True)

# Use pickle serializer
producer = Producer(new_channel, exchange=new_default_exchange, serializer="pickle", routing_key=DEFAULT)

while True:
    try:
        old_connection.drain_events(timeout=10)  # 10 seconds is an acceptable timeout
    except socket.timeout:
        print "No more events came down the pipeline after 10 seconds...exiting."
        old_connection.release()
        sys.exit(0)
    except socket.error:
        print "Socket error...exiting."
        old_connection.release()
        sys.exit(0)
You can verify that all messages have been transferred by running sudo rabbitmqctl -p <vhost> list_queues on each host and checking that the message counts on the old and new AMQP brokers match. If you really want to make sure the messages were copied successfully, you can tweak this same script to print all the messages on the new AMQP host (instead of initiating connections to both hosts), as well as run Celery against the new AMQP host to verify the messages can be executed as tasks.
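Here is a minimal sketch of that verification step, adapted from the script above (same placeholder host and queue names): drain the new host without acknowledging anything, print each task, and release the connection so the messages get requeued.

# Minimal sketch: inspect the messages on the new host without
# consuming them. Because nothing is acknowledged, releasing the
# connection at the end requeues every message.
from celery import conf
from kombu.connection import BrokerConnection
from kombu.messaging import Exchange, Queue, Consumer

import socket

NEW_BROKER_HOST = "newhost.mydev.com"  # change
DEFAULT = "default"                    # change to your queue name

def print_msg(body, msg):
    # Deliberately no msg.ack() here.
    print "task=%s, eta=%s" % (body.get('task'), body.get('eta'))

connection = BrokerConnection(NEW_BROKER_HOST, conf.BROKER_USER, conf.BROKER_PASSWORD, conf.BROKER_VHOST)
channel = connection.channel()
exchange = Exchange(DEFAULT, "direct", durable=True)
queue = Queue(DEFAULT, exchange=exchange, routing_key=DEFAULT)

consumer = Consumer(channel, queue, callbacks=[print_msg])
consumer.consume()

while True:
    try:
        connection.drain_events(timeout=10)
    except socket.timeout:
        break

connection.release()  # unacknowledged messages go back to the queue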

To reiterate, shut down all your Celery workers before trying this approach. After the Celery tasks have all been transferred, you can shut down the old RabbitMQ server, point your Celery workers at the new RabbitMQ host, and start them up again. Good luck!

Monday, March 12, 2012

SQLAlchemy and Connection Pooling

Since Django opens a MySQL connection for every ORM access, we recently decided to implement connection pooling with SQLAlchemy. The basic instructions, which are Django version-specific, are posted here. It works fairly well, though you have to put custom code around the Database.connect() routine, since some of the parameters normally passed are not serializable, and SQLAlchemy needs them to generate a unique key within its connection pool.
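As a rough sketch of the general approach: sqlalchemy.pool.manage() is SQLAlchemy's documented DB-API wrapper, but the connection arguments here are placeholders, and the instructions linked above contain the actual Django-specific glue.

# Rough sketch of pooling MySQLdb connections through SQLAlchemy.
# pool.manage() wraps a DB-API module so that each connect() call is
# served from a pool keyed on its arguments, which is why parameters
# that can't serve as part of that key (e.g. Django's 'conv' type
# conversion dict) have to be filtered out first.
import MySQLdb
from sqlalchemy import pool

Database = pool.manage(MySQLdb)

# Placeholder connection arguments.
conn = Database.connect(host="localhost", user="username",
                        passwd="password", db="testdb")
cursor = conn.cursor()
cursor.execute("SELECT 1")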

Another issue is the use of "CREATE DATABASE" and "DROP DATABASE" on the database you are currently connected to. If you issue these statements while connected to that same database, you may find subsequent SELECT statements fail with a "No database selected" error. For instance, this set of commands fails:

import sqlalchemy
engine = sqlalchemy.create_engine('mysql://username:password@localhost/testdb')
connection = engine.connect()

connection.execute("DROP DATABASE testdb")
connection.execute("CREATE DATABASE testdb")
connection.execute("SHOW TABLES")


But this sequence, which re-selects the database after recreating it, works:

import sqlalchemy
engine = sqlalchemy.create_engine('mysql://username:password@localhost/testdb')
connection = engine.connect()

connection.execute("DROP DATABASE testdb")
connection.execute("CREATE DATABASE testdb")
connection.execute("USE testdb")
connection.execute("SHOW TABLES")

The problem also happens with raw MySQLdb:

import MySQLdb
conn = MySQLdb.connect(host="localhost", user="username", passwd="password", db="testdb")
cursor = conn.cursor()
cursor.execute("DROP DATABASE testdb")
cursor.execute("CREATE DATABASE testdb")
cursor.execute("SHOW TABLES")

Django's test runner avoids this issue by requiring that the original database connection be used when issuing the CREATE/DROP on the test database inside the create_test_db() function. It then closes the connection and sets the test database name, which forces a reconnect.

django/db/backends/creation.py:

    def create_test_db(self, verbosity=1, autoclobber=False):
        """
        Creates a test database, prompting the user for confirmation if the
        database already exists. Returns the name of the test database created.
        """
        test_database_name = self._create_test_db(verbosity, autoclobber)
        self.connection.close()
        self.connection.settings_dict["NAME"] = test_database_name

    def _create_test_db(self, verbosity, autoclobber):
        ...
        cursor.execute("DROP DATABASE %s" % qn(test_database_name))
        cursor.execute("CREATE DATABASE %s %s" % (qn(test_database_name), suffix))

It appears the connection needs to be invalidated in SQLAlchemy using connection.invalidate(); otherwise, the issue can occur. Normally Django will close the connection and reconnect, but SQLAlchemy won't know to do this unless we explicitly invalidate it. Since the pooled connection gets reused, the issue appears to be caused by not issuing a "USE" (or forcing a fresh connection) after the CREATE/DROP. We encountered this issue when rolling out django-nose, which was posted here.
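For instance, here is a minimal sketch using SQLAlchemy's Connection.invalidate(), with the same throwaway credentials as the examples above:

# Minimal sketch: invalidate the pooled connection after recreating the
# database, forcing SQLAlchemy to hand back a fresh connection (which
# re-selects the database named in the URL).
import sqlalchemy

engine = sqlalchemy.create_engine('mysql://username:password@localhost/testdb')
connection = engine.connect()

connection.execute("DROP DATABASE testdb")
connection.execute("CREATE DATABASE testdb")

# Mark the underlying DB-API connection as invalid...
connection.invalidate()

# ...so the next connect() checks out a brand-new connection.
connection = engine.connect()
connection.execute("SHOW TABLES")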

Wednesday, March 7, 2012

GitHub and SSH keys

GitHub is requiring people to re-approve their SSH keys but doesn't provide much context about how to verify that your SSH public keys match the hex digests shown. You can check here for how to derive the SSH fingerprint:

http://stackoverflow.com/questions/6682815/deriving-an-ssh-fingerprint-from-a-public-key-in-python


If you want to check manually, here's another approach (courtesy of Sean Conaty for writing this up):

> cat ~/.ssh/id_rsa.pub

Copy everything after the "ssh-rsa " through and including the "==".

> python

>>> import base64
>>> import md5
>>> decoded = base64.b64decode(key_goes_here)
>>> md5.md5(decoded).hexdigest()
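If you'd rather have the colon-separated form GitHub displays, here's a small helper using hashlib instead of the deprecated md5 module (the key material in the usage comment is made up):

# Compute the colon-separated MD5 fingerprint of an SSH public key.
# Pass in the base64 blob from id_rsa.pub (the part between
# "ssh-rsa " and the trailing comment).
import base64
import hashlib

def ssh_fingerprint(b64_key):
    digest = hashlib.md5(base64.b64decode(b64_key)).hexdigest()
    return ':'.join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Usage (hypothetical key material):
# print ssh_fingerprint("AAAAB3NzaC1yc2EAAA...==")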

Tuesday, March 6, 2012

Android branches

List all branches in an Android repo checkout:

git --git-dir .repo/manifests/.git/ branch -a

Checking out 2.3.6:

git --git-dir .repo/manifests/.git/ checkout android-2.3.6_r1

To update cacerts.bks on the emulator, make sure you set the partition size. Otherwise, you may see "Out of Memory" exceptions:

~/projects/android/android-sdk-linux/tools/emulator-arm -avd <.avd image> -partition-size 128

Then you can pull the cert store:

./adb pull /system/etc/security/cacerts.bks android23_cacerts.bks  

To use keytool to add a cert, you need to install the BouncyCastleProvider (bcprov-jdk16-146.jar) into the /usr/lib/jvm/java-6-openjdk/jre/lib/ext directory.

keytool -keystore android22_cacerts.bks -storetype BKS -provider org.bouncycastle.jce.provider.BouncyCastleProvider -storepass changeit -import -v -file bla.cer

./adb shell mount -o remount,rw /system

./adb push android22_cacerts.bks /system/etc/security/cacerts.bks


Monday, March 5, 2012

FileZilla Fail..

If you get a GnuTLS error -12 (a TLS fatal alert has occurred) with vsftpd, chances are you have an SSL cipher suite negotiation problem. The cause appears to be that FileZilla 3.5.3 (and not FileZilla 3.5.2) changed its supported encryption schemes.

http://trac.filezilla-project.org/ticket/7873

The changes in FileZilla altered which cipher suites are allowed: the new gnutls_priority_set_direct() call explicitly disables 3DES-CBC. In addition, the SECURE256 flag was renamed to SECURE192 (according to the ChangeLog in GnuTLS 3.0.9), which defines these suites:

static const int sign_priority_secure192[] = {
  GNUTLS_SIGN_RSA_SHA384,
  GNUTLS_SIGN_ECDSA_SHA384,
  GNUTLS_SIGN_RSA_SHA512,
  GNUTLS_SIGN_ECDSA_SHA512,
  0
};

A higher security setting is used in the latest version of FileZilla:
http://svn.filezilla-project.org/filezilla/FileZilla3/trunk/src/engine/tlssocket.cpp?r1=4384&r2=4383&pathrev=4384

The diff changed from using SECURE256:

gnutls_dh_set_prime_bits(m_session, 512);          
res = gnutls_priority_set_direct(m_session, "SECURE256:+CTYPE-X509:-CTYPE-OPENPGP", 0);

...to NORMAL with 3DES-CBC disabled:

res = gnutls_priority_set_direct(m_session, "NORMAL:-3DES-CBC:-MD5:-SIGN-RSA-MD5:+CTYPE-X509:-CTYPE-OPENPGP", 0);
gnutls_dh_set_prime_bits(m_session, 2048);

The workaround is to set ssl_ciphers=HIGH inside /etc/vsftpd.conf, since the default is DES-CBC3-SHA (i.e. 3DES). This restricts SSL clients to high-strength ciphers, letting newer FileZilla clients negotiate something stronger than the 3DES suite they now disable.

Thursday, March 1, 2012

Using Facebook's Test Account API...

Using Facebook's Test Account system is amazing. You can be given up to 500 test accounts with which you can log in, create profile/business pages, and do just about anything in a controlled, sandboxed environment.

One issue they don't mention: when implementing the API for resetting passwords, it turns out you need to specify a minimum of 6 characters for the password. If trying to change the password consistently returns false (instead of true), chances are your password isn't long enough.

We also found that in order to get the emails, you should make the request with the read_email permission. The other catch is that when you hit /test-users, you only get back an access token and the ID. In order to get the emails, you may have to use Facebook's Batch API to grab a bunch of them at once, as sketched below.
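Here is a rough sketch of that flow, based on the Graph API endpoints as documented at the time. APP_ID and APP_ACCESS_TOKEN are placeholders, and we assume each /test-users entry carries its own access_token as described above.

# Rough sketch: list an app's test users, then batch-fetch their
# emails using each test user's own access token.
import json
import urllib
import urllib2

APP_ID = "your_app_id"                      # placeholder
APP_ACCESS_TOKEN = "your_app_access_token"  # placeholder

# /{app-id}/accounts/test-users returns only ids and access tokens.
url = "https://graph.facebook.com/%s/accounts/test-users?access_token=%s" % (
    APP_ID, APP_ACCESS_TOKEN)
test_users = json.load(urllib2.urlopen(url))["data"]

# The Batch API accepts up to 50 sub-requests per call; each one can
# carry its own access token so the email field is visible.
batch = [{"method": "GET",
          "relative_url": "%s?fields=id,email&access_token=%s" % (
              user["id"], user["access_token"])}
         for user in test_users[:50]]

data = urllib.urlencode({"batch": json.dumps(batch),
                         "access_token": APP_ACCESS_TOKEN})
responses = json.load(urllib2.urlopen("https://graph.facebook.com", data))

for response in responses:
    body = json.loads(response["body"])
    print body.get("id"), body.get("email")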