Hus to Know?: March 2013

Sunday, March 31, 2013

Turbo Tax and SEP IRA

For some odd reason, Turbo Tax 2012 doesn't have a step-by-step wizard for handling SEP IRA contributions. You can search for the tax form and find it yourself within the software. The main issue that isn't covered is how to calculate how much you're allowed to deduct with a self-employed SEP IRA. Normally you are allowed to deduct up to 20% of self-employment income, but you have to use the worksheet in IRS Publication 560 to subtract the deduction for self-employment tax first. If you scroll down the contribution rate table, it maxes out at 25%, which the reduced rate table appears to push down to 20%.

If you divide the amount you're allowed to contribute to the SEP IRA by your total self-employment income, you should come up with the 18.587045% mentioned in the Wiki article.
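The 18.587045% figure can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes the standard 15.3% self-employment tax rate, the 92.35% net-earnings factor, and income below the Social Security wage base, and the names are made up for illustration.

```python
# Assumed inputs (not taken from Turbo Tax): standard self-employment tax figures.
SE_TAX_RATE = 0.153           # assumed combined Social Security + Medicare rate
NET_EARNINGS_FACTOR = 0.9235  # assumed share of net profit subject to SE tax
REDUCED_RATE = 0.20           # 25% plan rate after the reduced rate table


def sep_ira_limit(net_profit):
    """Estimate the maximum deductible SEP IRA contribution."""
    se_tax = net_profit * NET_EARNINGS_FACTOR * SE_TAX_RATE
    se_tax_deduction = se_tax / 2  # half of the SE tax is deductible
    return (net_profit - se_tax_deduction) * REDUCED_RATE


net_profit = 100000.0  # hypothetical net self-employment income
limit = sep_ira_limit(net_profit)
effective_rate = limit / net_profit
print("limit=%.2f effective rate=%.6f%%" % (limit, effective_rate * 100))
```

The effective rate does not depend on the income chosen, as long as the assumptions above hold.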

Note: this isn't tax advice, but it does seem to be an oddity in the tax code that can only be understood by correlating multiple sources.

Friday, March 29, 2013

More Python 2.7 Unicode strangeness

Apparently in Python 2.7 you can't mix Unicode strings with UTF-8 encoded byte strings that contain non-ASCII characters:

>>> x = "sdds"
>>> y = u'\u2013t'.encode('utf-8')
>>> y
'\xe2\x80\x93t'
>>> x = u"sdds"
>>> x + y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

But you can do:

>>> x.encode('utf-8') + y
'sdds\xe2\x80\x93t'
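Going the other way also works: decode the byte string so that both operands are Unicode text before concatenating. A minimal sketch, reusing the x and y values from the session above:

```python
x = u"sdds"
y = u"\u2013t".encode("utf-8")  # UTF-8 encoded bytes, as in the session above

# Decode the bytes first so both operands are Unicode text.
combined = x + y.decode("utf-8")
print(repr(combined))
```

Keeping text as Unicode internally and encoding only at the boundaries (files, sockets) avoids the implicit ASCII decode that triggered the error.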

Wednesday, March 27, 2013

What 536871023 means..

Using the Celery framework and the librabbitmq package, we started seeing this error when we specified an invalid host:

Type: 
Value: Error opening socket: Unknown error 536871023

What does 536871023 mean? To understand the code, you can clone the librabbitmq code base and look inside the AMQP-related code.

git clone git://github.com/celery/librabbitmq.git

Within amqp_open_socket(), the negated error code is returned:

int amqp_open_socket(char const *hostname,
                     int portnumber) {

.
.
.

    return -amqp_socket_error();   

}

Where amqp_socket_error() returns the errno code combined (bitwise OR) with a constant:

int                                                                                                     
amqp_socket_error(void)                                                                                 
{                                                                                                      
    return errno | ERROR_CATEGORY_OS;                                                                  
}

This constant is defined in rabbitmq-c/librabbitmq/amqp_private.h:

#define ERROR_CATEGORY_OS (1 << 29) /* OS-specific error codes */  

>>> x = "536871023"
>>> int(x)
536871023
>>> hex(int(x))
'0x2000006f'

So we ignore the high bit (bit 29, the ERROR_CATEGORY_OS flag) and convert the remaining 0x6f:

>>> int(0x6f)
111

Error 111 is ECONNREFUSED (Connection refused) on Linux, which indicates a problem connecting to the RabbitMQ host.
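The same decoding can be wrapped in a small helper. This is a sketch rather than anything shipped with librabbitmq: decode_socket_error is a made-up name, and the only value taken from the source is the 1 << 29 constant.

```python
import errno
import os

ERROR_CATEGORY_OS = 1 << 29  # same value as the #define quoted above


def decode_socket_error(code):
    """Split an error number into (is_os_error, errno). Hypothetical helper."""
    is_os_error = bool(code & ERROR_CATEGORY_OS)
    os_errno = code & ~ERROR_CATEGORY_OS  # clear the category bit
    return is_os_error, os_errno


is_os_error, os_errno = decode_socket_error(536871023)
# errno names and messages are platform-specific, so look them up at run time.
name = errno.errorcode.get(os_errno, "unknown")
print("%s %d %s %s" % (is_os_error, os_errno, name, os.strerror(os_errno)))
```

On a Linux machine the lookup should report the connection-refused error; other platforms may number their errno values differently.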

Saturday, March 23, 2013

Multi-threading in Python

What do you think the code below will print?

import threading

class A(threading.local):
    def __init__(self, *args, **kwargs):
        self.val = 1

bla = A()


class Example(threading.Thread):

    def __init__(self, thread_id, value):
        self.thread_id = thread_id
        self.value = value
        super(Example, self).__init__()

    def run(self):
        bla.val = self.value
        print "Thread: %s, id=%s" % (self.thread_id, bla.val)


example1 = Example(1, "A")
example2 = Example(2, "B")

example1.start()
example2.start()

example1.join()
example2.join()

print "Final value %s" % bla.val

The output is:

Thread: 1, id=A
Thread: 2, id=B
Final value 1

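The final value is 1 because every thread, including the main thread, gets its own copy of the attributes on bla, and the main thread never changes its copy. Native thread-local storage (TLS) is implemented by the compiler, linker, and threading library, with help from the hardware. This Stack Overflow article provides a link to how TLS is implemented on Linux.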
http://people.redhat.com/drepper/tls.pdf

This link provides a good general explanation of the Python thread locals library.
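To see the per-thread copies more directly, here is a small sketch (written with a plain function target rather than a Thread subclass; Counter, shared, and seen are names made up for this example). Each thread records the value it finds before and after writing to the thread-local object.

```python
import threading


class Counter(threading.local):
    def __init__(self):
        # Runs again in every thread that first touches the instance.
        self.val = 1


shared = Counter()
seen = {}


def worker(name):
    before = shared.val  # fresh per-thread copy, initialized by __init__
    shared.val = name    # only changes this thread's copy
    seen[name] = (before, shared.val)


threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_val = shared.val  # the main thread's copy was never modified
print(seen, main_val)
```

Each worker starts from the initial value and the main thread still sees it afterwards, which matches the "Final value 1" line in the output above.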

Thursday, March 7, 2013

RSA-OAEP and OpenSSL

The XML Encryption spec details the use of RSA-OAEP padding. The best explanation is available on Wikipedia, which includes a diagram of the scheme. Section 9.1.1 of the PKCS #1 v2.0 standard (RFC 2437) also explains how this padding algorithm works. The relevant section of the spec is at http://www.w3.org/TR/xmlenc-core/#sec-RSA-OAEP:

5.4.2 RSA-OAEP

Identifier:
http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p (REQUIRED)
The RSAES-OAEP-ENCRYPT algorithm, as specified in RFC 2437 [PKCS1], takes three parameters. The two user specified parameters are a MANDATORY message digest function and an OPTIONAL encoding octet string OAEPparams. The message digest function is indicated by the Algorithm attribute of a child ds:DigestMethod element and the mask generation function, the third parameter, is always MGF1 with SHA1 (mgf1SHA1Identifier). Both the message digest and mask generation functions are used in the EME-OAEP-ENCODE operation as part of RSAES-OAEP-ENCRYPT. The encoding octet string is the base64 decoding of the content of an optional OAEPparams child element . If no OAEPparams child is provided, a null string is used.

An example of an RSA-OAEP element is:

     9lWu3Q==

If you want to use OpenSSL to decode RSA-OAEP padded data, however, the OAEPparams option cannot be used. Normally, when a message is prepared, the encoding parameters P are hashed and the hash is placed at the front of the data block. In Section 9.1.1.1 of RFC 2437 this is step 4 (pHash = Hash(P)), which follows step 3, "Generate an octet string PS consisting of emLen-||M||-2hLen-1 zero octets. The length of PS may be 0.".

But according to http://www.openssl.org/docs/crypto/RSA_public_encrypt.html, OpenSSL's OAEP padding mode is "EME-OAEP as defined in PKCS #1 v2.0 with SHA-1, MGF1 and an empty encoding parameter" (EME stands for encoding method for encryption). Therefore, the encoding parameter string P must be empty. The zero padding PS is unaffected.
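To illustrate where the encoding parameter enters, here is a rough sketch of the data block construction from EME-OAEP-Encode in RFC 2437 (DB = pHash || PS || 01 || M). It is not OpenSSL's code: build_data_block is a made-up helper, the masking steps with MGF1 are left out, and the 127-octet emLen assumes a 1024-bit key.

```python
import base64
import hashlib

H_LEN = hashlib.sha1().digest_size  # 20 octets for SHA-1


def build_data_block(message, params, em_len):
    """Build DB = pHash || PS || 01 || M (hypothetical helper, no masking)."""
    if len(message) > em_len - 2 * H_LEN - 1:
        raise ValueError("message too long")
    ps = b"\x00" * (em_len - len(message) - 2 * H_LEN - 1)  # zero padding
    p_hash = hashlib.sha1(params).digest()  # only place P is used
    return p_hash + ps + b"\x01" + message


em_len = 127  # assumed: 1024-bit modulus, so emLen = k - 1
openssl_style = build_data_block(b"secret", b"", em_len)
with_params = build_data_block(b"secret", base64.b64decode("9lWu3Q=="), em_len)

empty_p_hash = hashlib.sha1(b"").hexdigest()
print(empty_p_hash)
print(openssl_style[:H_LEN] == with_params[:H_LEN])
```

The two blocks differ only in their first hLen octets, so a decoder that expects the hash of an empty parameter string will reject a message that was encoded with a non-empty OAEPparams value.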

Friday, March 1, 2013

Why Python, Ruby, and JavaScript are slow

Jinja2, which can be used in lieu of Django's templating engine, recommends the MarkupSafe library in its documentation:

More Speed with MarkupSafe

As of version 2.5.1 Jinja2 will check for an installed MarkupSafe module. If it can find it, it will use the Markup class of that module instead of the one that comes with Jinja2. MarkupSafe replaces the older speedups module that came with Jinja2 and has the advantage that is has a better setup script and will automatically attempt to install the C version and nicely fall back to a pure Python implementation if that is not possible.

The C implementation of MarkupSafe is much faster and recommended when using Jinja2 with autoescaping.

>>> import markupsafe
>>> markupsafe

If we don't use the markupsafe package, Jinja2 has to rely on a pure Python implementation, which basically means that for each character it escapes, it needs to expand the buffer by the number of extra characters, move the data, and repeat the process all over again. The C extension seems to be smarter: it takes a BYOB (bring your own buffer) approach, allocating the output buffer at its final size up front, and block-copies the data when no escaping is needed.
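The size-first idea can be sketched in Python, although the speed benefit only really shows up in C. This is an illustration of the technique, not MarkupSafe's actual code; escape_two_pass and ESCAPES are names made up for this example.

```python
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&#34;", "'": "&#39;"}


def escape_two_pass(text):
    """Escape HTML by sizing the output first, then filling it in one pass."""
    # Pass 1: count how many extra characters escaping will add.
    extra = sum(len(ESCAPES[ch]) - 1 for ch in text if ch in ESCAPES)
    if extra == 0:
        return text  # nothing to escape, so hand back the input untouched

    # Pass 2: allocate the output at its final size and fill it in place.
    out = [None] * (len(text) + extra)
    pos = 0
    for ch in text:
        rep = ESCAPES.get(ch, ch)
        out[pos:pos + len(rep)] = rep  # same-length slice, no resizing
        pos += len(rep)
    return "".join(out)


print(escape_two_pass("<a href='x'>T&C</a>"))
```

The first pass costs an extra scan, but it lets the second pass write into a buffer that never has to grow.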

This deck helps to sum up why the C implementation is much faster: https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow