Sunday, January 20, 2013

The deflate algorithm

The deflate algorithm, the compression approach used in the original Pkzip 2.0 version, is a fascinating read defined in the RFC1951 standard.  In order to understand the spec, it's helpful to work through the "hello" examples discussed in this article.

Section 3.2.2. in the RFC defines the basic approach to creating Huffman trees, but this article provides a solid explanation about how it works and shows how it's implemented.   The key insight is that Huffman codes are generated by counting the lengths of the variable-length codes (For more background/context, see Stanford lectures on algebraic error control codes and/or the textbook used in the class).

Gzip is essentially using the deflate algorithm but adds a header/trailer to the end.  RFC1952 provides a breakdown of how the header is generated, but you can also test for yourself:

$echo "hello" > hello
$gzip hello
>>> data = open("hello.gz", "r").read()
>>> data[0:2]
>>> data[3:4]
>>> [hex(ord(a)) for a in data[4:8]]
['0x9e', '0x2e', '0xfa', '0x50']
>>> 0x50fa2e9e
>>> datetime.datetime.fromtimestamp(1358573214)
datetime.datetime(2013, 1, 18, 21, 26, 54)

>>> data[10:16]

More related articles too:

Thursday, January 10, 2013

RSA block type is not 02 error

Ever seen the error message "block type is not 02" when performing RSA decryption via M2Crypto?  It's a direct result of an RSA decryption not succeding.  Before data is encrypted, it's padded with a header and a block type of 01 or 02, depending on whether the public or private key was used.  If the private key was used, the RFC spec ( says that the padded header should be 01 (though it also says 00 can be used).  If the public key was used to encrypt the data, the RFC header will be set the BT block to 02:

A block type BT, a padding string PS, and the data D shall be
   formatted into an octet string EB, the encryption block.

              EB = 00 || BT || PS || 00 || D .           (1)

   The block type BT shall be a single octet indicating the structure of
   the encryption block. For this version of the document it shall have
   value 00, 01, or 02. For a private- key operation, the block type
   shall be 00 or 01. For a public-key operation, it shall be 02.

During the decryption phase, the RSA encryption algorithm first converts things back to a block of data (ultimately each block is converted from an integer) and then does one additional verification process against this block type.   If the block type doesn't match 01 or 02, it's likely the wrong key was used to decrypt.   You have the wrong decryption key, may be not using the private key if the public key was used to encrypt, or the public key if the private key was used.  (In the latter two cases, just the other pair has to be used to reverse the original operation.)

Saturday, January 5, 2013

Negative modulo's

This function from basically tries to add extra padding characters before attempting to base64-decode a string:

  # Pad the encoded string with "+".
  data += "=" * (4 - (len(data) % 4) % 4)
  return base64.urlsafe_b64decode(data)

Django has the equivalent function in its django.core.signing library:

def b64_decode(s):
    pad = '=' * (-len(s) % 4)
    return base64.urlsafe_b64decode(s + pad)
The code from Django's function uses negative modulo to compute the vaule. How do negative modulos work?    See this excerpt from

The mod function is defined as the amount by which a number exceeds the 
largest integer multiple of the divisor that is not greater than that 
number. In this case, -340 lies between -360 and -300, so -360 is the 
greatest multiple LESS than -340; we subtract 60 * -6 = -360 from -340 
and get 20:

-420 -360 -300 -240 -180 -120  -60   0   60   120  180  240  300  360
       | |                                                    |  |
   -360| |-340                                             300|  |340
       |=|                                                    |==|
        20                                                     40

Working with a positive number like 340, the multiple we subtract is 
smaller in absolute value, giving us 40; but with negative numbers, we 
subtract a number with a LARGER absolute value, so that the mod 
function returns a positive value. This is not always what people 
expect, but it is consistent.

If you want the remainder, ignoring the sign, you have to take the 
absolute value before using the mod function.