Section 3.2.2. in the RFC defines the basic approach to creating Huffman trees, but this article provides a solid explanation about how it works and shows how it's implemented. The key insight is that Huffman codes are generated by counting the lengths of the variable-length codes (For more background/context, see Stanford lectures on algebraic error control codes and/or the textbook used in the class).
Gzip is essentially using the deflate algorithm but adds a header/trailer to the end. RFC1952 provides a breakdown of how the header is generated, but you can also test for yourself:
$echo "hello" > hello
>>> data = open("hello.gz", "r").read()
>>> [hex(ord(a)) for a in data[4:8]]
['0x9e', '0x2e', '0xfa', '0x50']
>>> 0x50fa2e9e 1358573214 >>> datetime.datetime.fromtimestamp(1358573214) datetime.datetime(2013, 1, 18, 21, 26, 54)
>>> data[10:16] 'hello\x00'
More related articles too: