Finding Patterns in Data Sequences

Some time ago I was presented question if there would be easy and quick way to recognize certain usage patterns in huge amount of event data. The events in question were common user activity like clicking buttons, waiting certain time between and so on. This data could then used to improve user experience or catch cheaters that write bots to play their game for them and so on.

There are multiple ways to approach this problem but I came up with relatively simple method that seems to have flexibility. We can think the events of interest as sequence of symbols and in case also time between events is of interest, quantize the time as symbols for short, long and medium time. Then, sequence of events can be presented as list of symbols


Here A could be button click to open window, B a tab click in window, C close button, 1 a second pause before moving mouse, 2 a few second pause before moving mouse and X is collect resources etc..

Now we can approach problem with pattern finding. First let’s define function that is supposed to find recurring patterns and output them and input as sequence of those patterns. (Following is written in Python)

def findpattern(s, trivial):
	patterns = {}
	symbolmap = {}

	p = patterns
	acc = []
	output = []
	for c in s:
		if not c in p:
			if not c in patterns or (not trivial and acc[-1] == c):
				p[c] = {}
				p = p[c]
				acc = ''.join(map(lambda x: str(x),acc))
				if not acc in symbolmap:
					symbolmap[acc] = true

				acc = [c]
				p = patterns[c]
			p = p[c]

	if acc:
		acc = ''.join(map(lambda x: str(x),acc))
		if not acc in symbolmap:
			symbolmap[acc] = true

	return (symbolmap.keys(), output)

Function accepts two arguments, the list of symbols and flag if trivial pattern should be accepted. This is needed to determine if you consider a input list “AAAAA” to be one instance of pattern “AAAAA” or 5 instances of pattern “A”.

When this function is applied on the example string above, we get list of core patterns and the input as pattern list.

> s = "ABC123ABC123ABC123ABC123XABC123ABC123ABC123ABC123XABC123ABC123ABC123ABC123X"
> findpattern(s, False)
['ABC123', 'ABC123X'] ['ABC123', 'ABC123', 'ABC123', 'ABC123X', 'ABC123', 'ABC123', 'ABC123', 'ABC123X', 'ABC123', 'ABC123', 'ABC123', 'ABC123X']

Here it’s easy to see that the string is composed of two main patterns, ABC123 and ABC123X. The second output list is the input string split as instances of pattern.
Real data is noisier than the example here, so proper data cleaning and event definition / filtering are vital for any kind of useful pattern recognition.

In this case one could guess that the player might be bot, as the same sequence occur regularly with too much precision and too little variability for human player. (Remember that the symbols 1, 2 and 3 presented also time in this example.)

Going forward we can also define function that recursively uses the output of the findpattern to find patterns of patterns up to the “prime” pattern of the input, i.e. the pattern that when repeated N times produces the original input string.

def findprimepattern(s, trivial=False):
	o = s
	while True:
		s, o = findpattern(o, trivial)
		print s,o
		if len(s) == 1:
			return s[0]

Example with more complicated pattern string

> s = (((("ABC"*2)+"DE")*3+"X")*2+"O")*4
> findprimepattern(s)

And indeed the input string is the prime pattern ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO concatenated 4 times.

Note effect of parameter trival on the function.

> findprimepattern('AAAA', False)
> findprimepattern('AAAA', True)
['A'] ['A', 'A', 'A', 'A', 'A']

Former is single instance of pattern “AAAA” and latter is 4 instances of pattern “A”.

Reduce mp3 file size for app and web use

Typical 2 minutes piece of music is roughly 2-4MB in common mp3 format. This is size for usual quality that is stereo audio 256 kbps and 44.1 kHz sample rate. Tough the file size is not that large, it will slow down loading HTML5 game or mobile app, especially on wireless networks.

Web HTML5 or mobile game applications do not need nearly as high quality and the size can be reduced to ~400-700kB without too much audible quality penalty. Difference can be noticed if you listen for it, but it’s unlikely that users do notice anything, as they have nothing to compare it against. This also depends lot of the music, sharp high frequency sounds tend to distort first.

MP3/WAV to reduced size MP3

Easy way to convert mp3s is lame. A flexible command line mp3 decoder and encoder.

Here we use 1:45 minutes 1.7MB music piece Lake Hylia from Zelda Reorchestrated. I don’t own any rights to this music and it’s used purely for demonstration purposes. I selected this piece because it’s melodic and instrumental like most game music is.

Key to reducing file size is lower sample and bitrate, and most importantly converting file to mono. Example below is how to do this with lame 3.99.5 on OS/X. I installed lame using Macports.

$ lame -hv -mm --resample 22.05 "Zelda Reorchestrated - Lake Hylia.original.mp3" -B 32 "Zelda Reorchestrated - Lake Hylia.small.mp3"
LAME 3.99.5 64bits (
Autoconverting from stereo to mono. Setting encoding to mono mode.
Resampling:  input 44.1 kHz  output 22.05 kHz
polyphase lowpass filter disabled
Encoding Zelda Reorchestrated - Lake Hylia.mp3
      to Zelda Reorchestrated - Lake Hylia.small.mp3
Encoding as 22.05 kHz single-ch MPEG-2 Layer III VBR(q=4)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  3996/3996  (100%)|    0:01/    0:01|    0:01/    0:01|   82.375x|    0:00
  8 [  36] **
 16 [  24] *
 24 [   4] *
 32 [3932] **************************************************************************************************************************************
   kbps       mono %     long switch short %
   31.7      100.0        99.9   0.1   0.1
Writing LAME Tag...done
ReplayGain: +3.7dB

This converts the input mp3 (lame also accepts .wav files) to 22.05kHz 64kbs monoaural mp3. The output file size is 412kB, resulting to nearly 75% savings. You can reduce file further by using even lower bitrate, (experiment -B 24 or -B 16), but you may start hearing distortions.

$ du -hs Zelda\ Reorchestrated\ -\ Lake\ Hylia.*
1.6M	Zelda Reorchestrated - Lake Hylia.original.mp3
404K	Zelda Reorchestrated - Lake Hylia.small.mp3

Test play the files here and listen for any differences.

Original: Zelda Reorchestrated – Lake Hylia.original.mp3
Reduced: Zelda Reorchestrated – Lake Hylia.small.mp3

MP3/WAV to reduced OGG Vorbis

This is main for Firefox as it still supports only Ogg Vorbis. You can convert mp3/wav file to reduced mono ogg vorbis with ffmpeg tool.

This example uses ffmpeg 2.1, and like lame I installed is on OS/X using Macports.

$ ffmpeg -i "Zelda Reorchestrated - Lake Hylia.original.mp3" -strict experimental -acodec libvorbis -ab 32k -ar 22050 -ac 1 "Zelda Reorchestrated - Lake Hylia.small.ogg"
ffmpeg version 2.1 Copyright (c) 2000-2013 the FFmpeg developers
  built on Oct 30 2013 23:24:10 with Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
  configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-avresample --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --enable-libass --enable-libbluray --enable-gnutls --enable-libfreetype --disable-outdev=xv --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid --enable-nonfree --enable-libfaac
  libavutil      52. 48.100 / 52. 48.100
  libavcodec     55. 39.100 / 55. 39.100
  libavformat    55. 19.104 / 55. 19.104
  libavdevice    55.  5.100 / 55.  5.100
  libavfilter     3. 90.100 /  3. 90.100
  libavresample   1.  1.  0 /  1.  1.  0
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mp3, from 'Zelda Reorchestrated - Lake Hylia.original.mp3':
    title           : Lake Hylia
    artist          : Zelda Reorchestrated
    album           : Twilight Princess
    track           : 11
  Duration: 00:01:44.39, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s
Output #0, ogg, to 'Zelda Reorchestrated - Lake Hylia.small.ogg':
    title           : Lake Hylia
    artist          : Zelda Reorchestrated
    album           : Twilight Princess
    track           : 11
    encoder         : Lavf55.19.104
    Stream #0:0: Audio: vorbis (libvorbis), 22050 Hz, mono, fltp, 32 kb/s
      title           : Lake Hylia
      artist          : Zelda Reorchestrated
      album           : Twilight Princess
      TRACKNUMBER     : 11
      encoder         : Lavf55.19.104
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 -> libvorbis)
Press [q] to stop, [?] for help
size=     398kB time=00:01:44.38 bitrate=  31.2kbits/s
video:0kB audio:387kB subtitle:0 global headers:3kB muxing overhead 1.866299%

Resulting OGG file size and quality is comparable to reduced mp3.

$ du -hs Zelda\ Reorchestrated\ -\ Lake\ Hylia.*p3
400K	Zelda Reorchestrated - Lake Hylia.small.ogg
1.6M	Zelda Reorchestrated - Lake Hylia.original.mp3

Compare here:

Original: Zelda Reorchestrated – Lake Hylia.original.mp3
Reduced: Zelda Reorchestrated – Lake Hylia.small.ogg

Minesweeper clone in HTML5

In my two previous block entries I wrote about one possible ways to do simple Slots and Car games on HTML5 technologies. This writeup combines some of those methods and introduces new ones to implement Minesweeper game with a twist.


This game is actually “reverse” minesweeper, or should we say Applesweeper. Players mission is to find all the apples without clicking on of the empty tiles. Score is increased when player finds an apple and is decreased when he misses. Game ends when all the apples are found.

Try out the game here:

As in previous games, hud is a html div, and the game grid is drawn on a canvas. Grey slab that hides tiles content is procedurally generated on offscreen canvas at the startup.

this.slab = document.createElement('canvas');
var ctx = this.slab.getContext('2d');
this.slab.width = this.resolution;
this.slab.height = this.resolution;

ctx.fillStyle = 'grey';
ctx.fillRect(0, 0, this.resolution, this.resolution);

ctx.fillStyle = 'white'
ctx.moveTo(0, 0);
ctx.lineTo(this.resolution, 0);
ctx.lineTo(this.resolution, this.resolution);
ctx.lineTo(0, 0);

ctx.fillStyle = 'lightgrey';
ctx.fillRect(4, 4, this.resolution-8, this.resolution-8);

The this.slab holds off-screen canvas that contains the generated slab image.

Game area size and number of apples are function of screen size, to adapt to different screensizes.

var width  = window.innerWidth;
var height = window.innerHeight;

GRID_W = Math.min( 12, ~~(width / GRID_RESOLUTION));
GRID_H = Math.min( 12, ~~(height / GRID_RESOLUTION)) - 1;
var APPLE_COUNT = ~~((GRID_W * GRID_H) / 8);

For example, Here is the game on iPhone 3Gs.


Game main loop is passive, when user clicks on the screen click handler sets the location on object that holds the clicked tile x and y coordinates

$('#container').click( function( e ){
    var p = $('#canvas').offset();    = {

Main update handler is called on each frame and it checks if button has been clicked and updates the grid and redraws it if required. Grid is simple array where x and y are mapped as position.

Game.prototype.pos = function( x, y ) {
    return y*this.width+x;
Game.prototype.xy = function( pos ) {
    return {

Grid array values are integers where content is bit masked. Higher bits are used to flag if grid location has slab and apple, empty or number.

var SLAB_MASK = Math.pow(2, 16);
var APPLE_MASK = Math.pow(2, 15);

// grid location (5, 6) has slab and number 3
var pos = this.pos(5, 6);
this.grid[pos] = 3;
this.grid[pos] |= SLAB_MASK;

The slab is removed from location just by negating it

this.grid[pos] &= ~SLAB_MASK;

Draw loop just checks for each position and checks with mask what it contains

for ( var y=0; y < this.height; y++ ) {
    for ( var x=0; x < this.width; x++ ) {
        // Draw each tile
        var s = this.pos(x,y);
       if (s & SLAB_MASK) {
          // Still covered tile
         this.ctx.drawImage( this.slab, x * this.resolution, y * this.resolution )

       } else if (s & APPLE_MASK) {
         // Uncovered apple
         this.ctx.drawImage( tile, x * this.resolution + 2, y * this.resolution + 2 )
       } else if (s > 0) {
         // Neighbour number
         this.ctx.fillText( '' + s ,
                            x * this.resolution + this.resolution/2,
                            y * this.resolution + this.resolution/2)

When player clicks on empty tile, recursive function walks through the grid clearing slabs from adjacent empty tiles.

function _empty( x, y, force ) {
    if ( x < 0 || x >= that.width || y < 0 || y >= that.height ) return;

    var pos = that.pos(x, y);
    var d = that.grid[pos];

    if (d && (d & SLAB_MASK) && (force || !(d & APPLE_MASK))) {

        that.grid[pos] &= ~SLAB_MASK; // clear out slab

        // Clear next neighbor if this is empty tile
        if (that.grid[pos] == 0) {
            _empty(x, y - 1) // north
            _empty(x, y + 1) // south
            _empty(x - 1, y) // west
            _empty(x - 1, y - 1) // north west
            _empty(x - 1, y + 1) // south east
            _empty(x + 1, y) // east
            _empty(x + 1, y - 1) // north east
            _empty(x + 1, y + 1) // south east

Code is available at GitHub.

Apple Push Notifications with Haskell

In series of language evaluation it’s this time Push notifications with Haskell. Haskell is a pure functional language with strong static typing and as such is not ideally suited for IO code (networking) with dynamic data (JSON). Let’s see how it compares with the others. So far in the series

Step 1. Prerequisites

This code uses GHC 7.4.2 (Haskell compiler). On OS/X its easiest to install with port (or homebrew)

    $ sudo port install ghc
    $ ghc --version
    The Glorious Glasgow Haskell Compilation System, version 7.4.2

This installs the compiler and interpreter. You will also need cabal package manager. Base Haskell installation has surprisingly few tricks out of the box and lots of libraries are needed.

    $ sudo port install hs-cabal
    $ cabal-0.14.0 update
    $ cabal-0.14.0 install cabal-install

After this cabal command can be run from ~/.cabal/bin/cabal

  • Read introduction to Apple Push here and get application and private key sandbox certificates as .pem files.
  • And of course you need to have 32 byte push token from your iOS application.

Step 1. The Utilities.

Hexadecimal to binary

Haskell like many similar languages encourage coding style where programs are written as composition of small simple functions. We’ll start with one that encodes hexadecimal string to ByteString. ByteString is Haskell’s practical presentation of binary data.

import qualified Data.ByteString as B
import GHC.Word (Word8)
import Data.Convertible (convert)
import Data.Char (ord, chr, toUpper)
import Data.Bits (shift, (.|.), (.&.)) 

hexToByteString :: String -> B.ByteString
hexToByteString s
  | null s = B.empty
  | otherwise = B.pack . hexToWord8 $ s
    hexToWord8 :: String -> [Word8]
    hexToWord8 [] = []
    hexToWord8 [x] = error "Invalid hex stream"
    hexToWord8 (x:y:xs) = [ hn .|. ln ] ++ hexToWord8 xs
        hn = (shift (decodeNibble x) 4)
        ln = decodeNibble y
        decodeNibble c
          | o >= oA && o <= oF = convert (o - oA + 10) :: Word8
          | o >= o0 && o <= o9 = convert (o - o0) :: Word8
          | otherwise = error $ "Invalid hex: " ++ [c]
          where o = ord . toUpper $ c
                oA = ord 'A'
                oF = ord 'F'
                o0 = ord '0'
                o9 = ord '9'

Note the imports for Word8 and convert. Haskell as statically typed language requires that every single item has to have unambiguous data type. Imported functions are needed to manipulate and convert data so we can get from String (that is list of Char’s) to list of 8bit bytes (list of Word8’s) finally to ByteString

Install these packages to import Data.Convertible and ByteString

~/.cabal/bin/cabal install convertible
~/.cabal/bin/cabal install bytestring

Save file as Hex.hs and try it out

$ ghci
GHCi, version 7.4.2:  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :load Hex.hs
[1 of 1] Compiling Util.Hex         ( Hex.hs, interpreted )
Ok, modules loaded: Util.Hex.
*Util.Hex> hexToByteString "AABBCC"
*Util.Hex> :type hexToByteString "AABBCC"
hexToByteString "AABBCC" :: B.ByteString
*Util.Hex> hexToByteString "Something"
"*** Exception: Invalid hex stream
*Util.Hex> hexToByteString "AA"

Seems to be working.

JSON encoding

Next step is function that produces a UTF8 encoded JSON object that contains our push notification payload. We use simple object that contains only the alert message.

Save this in file Push.hs

import Text.JSON as JSON
import qualified Data.ByteString.UTF8 as BU

getJSONWithMessage :: String -> JSObject (JSValue)
getJSONWithMessage msg =
  let jmsg = JSString (toJSString msg) in
  toJSObject [("aps",
               JSObject (toJSObject [("alert", jmsg)]))]

Install packages for UTF8 ByteString and Text.JSON

~/.cabal/bin/cabal install json
~/.cabal/bin/cabal install utf8-string

Check that we get correctly formatted JSON string

Prelude> :load Push.hs
[1 of 1] Compiling Main             ( Push.hs, interpreted )
Ok, modules loaded: Main.
*Main> getJSONWithMessage "Hello World"
*Main> getJSONWithMessage "Hello World"
JSONObject {fromJSObject = [("aps",JSObject (JSONObject {fromJSObject = [("alert",JSString (JSONString {fromJSString = "Hello World"}))]}))]}
*Main> JSON.encode . getJSONWithMessage $ "Hello World"
"{\"aps\":{\"alert\":\"Hello World\"}}"

Lets also check that messages with non-ASCII character can be supported

*Main BU> BU.fromString . JSON.encode . getJSONWithMessage $ "Mötörhead"
*Main BU> 

The encoded JSON looks fine.

Building the PDU

For this purpose we use Put that supports building Lazy BinaryString’s with content

import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as BU
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Put
import GHC.Word (Word32, Word16)
import Data.Convertible (convert)

buildPDU :: B.ByteString -> BU.ByteString -> Word32 -> Put
buildPDU token payload expiry
  | (B.length token) /= 32 = fail "Invalid token"
  | (B.length payload > 255) = fail "Too long payload"
  | otherwise = do
    putWord8 1 -- command
    putWord32be 1 -- transaction id, can be anything
    putWord32be expiry  -- expiry time as seconds from epoch
    putWord16be ((convert $ B.length token) :: Word16) -- length of token
    putByteString token  -- push token
    putWord16be ((convert $ B.length payload) :: Word16) - payload length
    putByteString payload -- the json encoded as utf-8 string

We need also simple function to compute expiry time as relative to now

import Data.Time.Clock.POSIX (getPOSIXTime)

getExpiryTime :: IO (Word32)
getExpiryTime = do
  pt <- getPOSIXTime
  -- One hour expiry time
  return ( (round pt + 60*60):: Word32)

Install packages for Data.Binary.Put

~/.cabal/bin/cabal install binary

Step 2. Connecting to the Server

Make sure you have the certificate files (cert.pem and key-noenc.pem).

import Network.Socket
import Network.BSD (getHostByName, hostAddress, getProtocolNumber)
import OpenSSL
import OpenSSL.Session as SSL (

main = withOpenSSL $ do
  -- Prepare SSL context
  ssl <- context
  contextSetPrivateKeyFile ssl "key-noenc.pem"
  contextSetCertificateFile ssl "cert.pem"
  contextSetDefaultCiphers ssl
  contextSetVerificationMode ssl SSL.VerifyNone

  -- Open socket
  proto <- (getProtocolNumber "tcp")
  he <- getHostByName ""
  sock <- socket AF_INET Stream proto
  Network.Socket.connect sock (SockAddrInet 2195 (hostAddress he))

  -- Promote socket to SSL stream
  sslsocket <- connection ssl sock
  SSL.connect sslsocket  -- Handshake

  -- we'll send pdu here

Install OpenSSL package

~/.cabal/bin/cabal install HsOpenSSL

Step 3. Send the PDU

Now we’re finally ready to send the actual push notification message. Replace the push token to your own in following example.

  -- Promoto socket to SSL stream
  sslsocket <- connection ssl sock
  SSL.connect sslsocket  -- Handshake

  expiration <- getExpiryTime
  -- we send pdu here
  let token = "6b4628de9317c80edd1c791640b58fdfc46d21d0d2d1351687239c44d8e30ab1"
      message = "Hello World"
      btoken = hexToByteString token
      payload = BU.fromString . JSON.encode . getJSONWithMessage $ message
      lpdu = runPut $ buildPDU btoken payload expiration  -- build binary pdu
      pdu = toStrict lpdu  -- from lazy bytestring to strict
    in do
    SSL.write sslsocket pdu
    SSL.shutdown sslsocket Unidirectional -- Close gracefully
    toStrict = B.concat . BL.toChunks

Then compile and run program

$ ghc -threaded -o Push Push.hs
[1 of 2] Compiling Hex              ( Hex.hs, Hex.o )
[2 of 2] Compiling Main             ( Push.hs, Push.o )
Linking Push ...
$ ./Push

If everything went fine, the program exits within few seconds and you’ll see your push notification appear on your iOS device.

Full source of this example available here:

Interpreting Go Socket Errors

Go sockets returns error variables when something goes wrong, and the different error codes are documented here in the Go documentation. However I was not able to find coherent example that would show how the error variable is supposed to be used. Canonical way seems to be just checking it against nill and dump it out in case it’s something else, like this:

n, err := conn.Read(buffer[:])
if err != nill {
    fmt.Printf("%v\n", err)

Real applications (especially system applications) need to branch based on error to recover properly, so just error description is not enough. I made here example what it’s possible to deduct from the error variable.

conn, err := net.Dial("tcp", "", "")
n, err := conn.Read(buffer[:])

if err != nil {

    // print error string e.g.
    // "read tcp resource temporarily unavailable"
    fmt.Printf("reader %v\n", err)

    // print type of the error, e.g. "*net.OpError"
    fmt.Printf("%T", err)

    if err == os.EINVAL {
      // socket is not valid or already closed
    if err == os.EOF {
      // remote peer closed socket

    // matching rest of the codes needs typecasting, errno is
    // wrapped on OpError
    if e, ok := err.(*net.OpError); ok {
       // print wrapped error string e.g.
       // "os.Errno : resource temporarily unavailable"
       fmt.Printf("%T : %v\n", e.Error, e.Error)
       if e.Timeout() {
         // is this timeout error?
       if e.Temporary() {
         // is this temporary error?  True on timeout,
         // socket interrupts or when buffer is full

      // specific granular error codes in case we're interested
     switch e.Error {
        case os.EAGAIN:
           // timeout
       case os.EPIPE:
          // broken pipe (e.g. on connection reset)
          // just write raw errno code, can be platform specific
          // (see syscall for definitions)
          fmt.Printf("%d\n", int64(e.Error.(os.Errno)))

For example in case read times out, the code would print following

read tcp resource temporarily unavailable
os.Errno : resource temporarily unavailable

Apple Push Notifications with Go language

I started to familiarize myself to the Go language, and decided to do the usual try out, i.e. sending Apple Push Notifications. It’s my personal usability benchmark for new programming environments. So far in the series

Step 1. Prerequisites

Get and build Go. Example here was done on Ubuntu 10.04 LTS x64 with Go installed based on instructions here at Go getting started guide.

  • Read introduction to Apple Push here and get application and private key sandbox certificates as .pem files.
  • And of course you need to have 32 byte push token from your iOS application.

Step 2. The Code.

The code here is complete, copy it to file apn.go or get it from Github.

Make sure you change the certificate files (cert.pem and key-noenc.pem) to point to your own certificate files. Also, replace the push token with your own push token, it’s written as hexadecimal string in this example for clarity.

package main

import (

func main() {

   // load certificates and setup config
   cert, err := tls.LoadX509KeyPair("cert.pem", "key-noenc.pem")
   if err != nil {
       fmt.Printf("error: %s\n", err.String())
   conf := &tls.Config {
        Certificates: []tls.Certificate{cert},

   // connect to the APNS and wrap socket to tls client
   conn, err := net.Dial("tcp", "", "")
   if err != nil {
      fmt.Printf("tcp error: %s\n", err.String())
   tlsconn := tls.Client(conn, conf)

   // Force handshake to verify successful authorization.
   // Handshake is handled otherwise automatically on first
   // Read/Write attempt
   err = tlsconn.Handshake()
   if err != nil {
      fmt.Printf("tls error: %s\n", err.String())
   // informational debugging stuff
   state := tlsconn.ConnectionState()
   fmt.Printf("conn state %v\n", state)

   // prepare binary payload from JSON structure
   payload := make(map[string]interface{})
   payload["aps"] = map[string]string{"alert": "Hello Push"}
   bpayload, err := json.Marshal(payload)

   // decode hexadecimal push device token to binary byte array
   btoken, _ := hex.DecodeString("6b4628de9317c80edd1c791640b58fdfc46d21d0d2d1351687239c44d8e30ab1") 

   // build the actual pdu
   buffer := bytes.NewBuffer([]byte{})
   // command
   binary.Write(buffer, binary.BigEndian, uint8(1))

   // transaction id, optional
   binary.Write(buffer, binary.BigEndian, uint32(1))

   // expiration time, 1 hour
   binary.Write(buffer, binary.BigEndian, uint32(time.Seconds() + 60*60))

   // push device token
   binary.Write(buffer, binary.BigEndian, uint16(len(btoken)))
   binary.Write(buffer, binary.BigEndian, btoken)

   // push payload
   binary.Write(buffer, binary.BigEndian, uint16(len(bpayload)))
   binary.Write(buffer, binary.BigEndian, bpayload)
   pdu := buffer.Bytes()

   // write pdu
   _, err = tlsconn.Write(pdu)
   if err != nil {
      fmt.Printf("write error: %s\n", err.String())

   // wait for 5 seconds error pdu from the socket

   readb := [6]byte{}
   n, err := tlsconn.Read(readb[:])
   if n > 0 {
     fmt.Printf("received: %s\n", hex.EncodeToString(readb[:n]))


Step 3. Compile and Run


$ 6g apn.go
$ 6l apn.6
$ ./6.out
conn state {true 47}

If everything went fine, the program exits within few seconds and  you’ll see your push notification appear on your iPhone.

Socket binary data stream parsing in Erlang

Erlang code has two different ways of reading from the socket, active and passive. In passive mode, your code calls recv to the socket to receive bytes. In active mode you install controlling process to the socket and receive data as Erlang messages.

Binary data parsing is more complicated on the latter, as application code doesn’t have any control over size of the packets (or flow control for that matter) that it receives. It could be 1 byte, few kilobytes or whatever. Naive pattern matching fails if you don’t get exactly the amount of bytes you want.

Lets assume simple binary protocol, where you need to read packets that are varying in length and formatted as following. Timestamp is defined in first 4 bytes, next 2 bytes define length of payload followed by the payload itself.

|    Timestamp   |  Len  |    Payload      |
|  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... |

Incidentally, this is the data format used by Apple Push notification feedback service.

First lets define a function that accepts binary Data and process Pid where it sends parsed result. This tail recursive function simply matches as many packets from data as possible, and returns with remaining unparsed data.

match_data(Data, Parent) ->
   case Data of
      <<Timestamp:32/big, Size:16/big, PushToken:Size/binary-unit:8, Rest/binary>> ->
         Parent ! {Timestamp, PushToken}, % notify parent
         %% parse rest of the data
        match_data(Rest, Parent);
      Rest ->
         %% no match

Then function that actually receives the data from the socket. It receives arbitrary pieces of data, concatenates it to existing unparsed data and calls the match_data handler to make actual packet matching. Then it loops again with unparsed portion of data.

loop(Bin, Parent) ->
      {_, _Sock, Data} ->
         loop(match_data(erlang:list_to_binary([Bin, Data]), Parent), Parent);
      {ssl_closed, _Sock} -> ok;
      {_event, _Event} ->
         error_logger:error_msg("Unexpected", [_event, _Event])

Install the loop as controlling process, with initially empty “seed” data.

Pid = self(),
ssl:setopts(Sock, [{active, true}, {mode, binary}]),
ssl:controlling_process(Sock, spawn(fun() -> loop(<<>>, Pid) end)).

This way data is parsed correctly, no matter what size of chunks data is returned from the socket.

Apple Push Notifications with Erlang

Continuing from the Node.js based example I wrote earlier, here is example how to do the same with Erlang. You can check more details from the previous post, but as reminder the Apple Push Notification interface is simple binary based protocol that you use over SSL authenticated socket.

1. Prerequisites

I assume you have erlang installed, the version I’m using here is Erlang R13B03 (erts-5.7.4).

Check instructions here at Node.js based example how to get the push certificates as .pem files.

Install mochiweb package in your erlang environment.

Check that you’ve all set.

$ ERL_LIBS=. erl
Erlang R13B03 (erts-5.7.4)  [64-bit] [rq:1] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
2> mochijson:encode("cat").
3> application:start(ssl).

Later releases of Erlang may require you to start ‘crypto’ and ‘public_key’ applications before starting ssl.

2. Sending Push Notification

First code to convert hexadecimal strings to binary format. This is mainly for readability for the example. I don’t remember where I snacked that code, but it seems to be found from several sites around the Intertubes.


bin_to_hexstr(Bin) ->
   lists:flatten([io_lib:format("~2.16.0B", [X]) ||
                  X <- binary_to_list(Bin)]).

hexstr_to_bin(S) ->
   hexstr_to_bin(S, []).
hexstr_to_bin([], Acc) ->
hexstr_to_bin([X,Y|T], Acc) ->
   {ok, [V], []} = io_lib:fread("~16u", [X,Y]),
   hexstr_to_bin(T, [V | Acc]).

Then the code to actually connect to APN and send the PDU


sendpush() ->
  Address = "",
  Port = 2195,
  Cert = "cert.pem",
  Key = "key-noenc.pem",  

  %Options = [{cacertfile, CaCert}, {certfile, Cert}, {keyfile, Key}, {mode, binary}],
  Options = [{certfile, Cert}, {keyfile, Key}, {mode, binary}],
  Timeout = 1000,
  {ok, Socket} = ssl:connect(Address, Port, Options, Timeout),

Open SSL socket to the APN server with application certificate and private key.

  Payload = mochijson:encode({struct, [{"aps", {struct, [{"alert", "This is Message"}]}}]}),
  BPayload = erlang:list_to_binary(Payload),
  PayloadLen = erlang:byte_size(BPayload),

Convert JSON payload to binary

  Token = "7518b1c2c7686d3b5dcac8232313d5d0047cf0dc0ed5d753c017ffb64ad25b60",
  BToken = hex:hexstr_to_bin(Token),
  BTokenLength = erlang:byte_size(BToken),

Convert token from hexadecimal string to binary

  SomeID= 1,
  {MSeconds,Seconds,_} = erlang:now(),
  Expiry = MSeconds * 1000000 + Seconds + 3600*1,

Transaction id (can be always 0) and 1 hour  expiration time

  Packet = <<1:8, SomeID:32/big, Expiry:32/big, BTokenLength:16/big, BToken/binary, PayloadLen:16/big, BPayload/binary>>,

Construct the binary packet.

  ssl:send(Socket, Packet),

Send the PDU and close the socket

3. Listening for Errors

In case something went wrong, Apple will send you back single error packet for the first error and closes the socket. You need to read that one error code. The packet that triggered error is identified by the ID you set when sending it.

See table 5-1 at Apple documentation to interpret error codes.

Example error listener

recv(Parent) ->
       {ssl, Sock, <<Command, Status, SomeID:32/big>>} ->
                                  [Command, Status, SomeID]),
           Parent ! {error, SomeID}; % notify parent
      {ssl_closed, _Sock} -> ok  %

And remember to spawn process and set it as the controlling process after creating the socket

  Pid = self(),
  ssl:controlling_process(Sock, spawn(fun() -> recv(Pid) end)),

Note that you need to implement also poller application to read feedback info from Apple Feedback server. This is very similar to the receiver above as it only needs to connect and wait for packets from Apple server until it closes the socket. See Apple documentation for more in depth explanation.


Facebook Places v.s. FourSquare API

I’ve been doing lately lots of work related to location and geocoding and had change to do integration to both Facebook Places API and Foursquare API. Here is simple side-by-side comparison that you find useful.

Both API’s are JSON based HTTP REST interfaces. They offer place search by query string (e.g ‘pizza’), coordinates (lat & lon) and radius of search.  Authentication is based on OAuth, though FourSquare can be be also used without user token, within rate limit.

Facebook Places API

Places API is a part of the larger Graph API and enables places query in addition to people, objects, etc…

Simple example query with keyword pizza around New Jersey.,-74.1&distance=1000&access_token=<<oauth access token>>

Example return value JSON

{u'data': [{u'category': u'Local business',
               u'id': u'154150281291249',
               u'location': {u'city': u'Wallington',
                   u'latitude': 40.843026999999999,
                   u'longitude': -74.105144999999993,
                   u'state': u'NJ',
                   u'street': u'435 Paterson Ave',
                   u'zip': u'07057-2202'},
               u'name': u"Marina's Pizza & Restaurant"},
             {u'category': u'Local business',
              u'id': u'116347165056160',
              u'location': {u'city': u'Carlstadt',
                   u'latitude': 40.841267000000002,
                   u'longitude': -74.101149000000007,
                   u'state': u'NJ',
                   u'street': u'326 Garden St',
                   u'zip': u'07072-1626'},
              u'name': u'Garden Pizza Ice Cream & Cafe'},

Note that as of time of writing,  Places API does not work outside United States. if you try to call it from IP address outside US  you’ll probably get following cryptic error response:

   "error": {
      "type": "OAuthException",
      "message": "(#603) The table you requested does not exist"

API is pretty fast but data quality is mediocre as its best and you pretty much get only the name, location and some sparse address data that might be enough.

  • API does not support unauthenticated requests
  • Results have often duplicates, typos and include strange places that seem to have been automatically scraped from some  database.
  • Rate limit is not problem. Facebook does endorse some kind of (high) rate limit, but it’s not documented. If you keep it less than 1-2s  per access token you are clear.
  • Every location, even mountains seem to be categorized as ‘Local business’
  • Currently can not be used outside US and it rarely returns results for locations outside US.
  • Can not be used anonymously, you need the users OAuth access token.
  • Results can be paged

Before Facebook adds proper category info, this API is useful only for finding places by name.

FourSquare v1 API (will be deprecated mid-2011)

Corresponding query from FourSquare v1 API is the ‘venues’ call that is slower but the data quality is much better and denser. You get also category information that is pretty good for filtering and can be used also as human readable description for the venue.

Example query

Example result JSON

{'groups': [{'type': 'Matching Places',
             'venues': [{'address': '85 Rte 17 South',
                        'city': 'East Rutherford',
                        'distance': 1299,
                        'geolat': 40.830497762250729,
                        'geolong': -74.093245267868042,
                        'id': 370040,
                        'name': "CiCi's Pizza Buffet",
                        'phone': '2014388200',
                        'primarycategory': {
                            'fullpathname': 'Food:Pizza',
                            'iconurl': '',
                            'id': 79081,
                            'nodename': 'Pizza'},
                        'state': 'NJ',
                        'stats': {'herenow': '0'},
                        'verified': False,
                        'zip': '07073'},
                        {'address': '271 Main St.',
                         'city': 'Belleville',
                         'distance': 5168,
                         'geolat': 40.789736300000001,
                         'geolong': -74.146524900000003,
                         'id': 1206994,
                         'name': u'Pizza Village Caf\xe9 II',
                         'phone': '9734501818',
                         'primarycategory': {
                             'fullpathname': 'Food:Pizza',
                             'iconurl': '',
                             'id': 79081,
                             'nodename': 'Pizza'},
                         'state': 'New Jersey',
                         'stats': {'herenow': '0'},
                         'verified': False,
                         'zip': '07109'},

Data quality is  better than Facebooks and results seem much more relevant, at least for now.

  • Query latency varies a lot
  • Little duplicates, some because of user typos
  • Lots of results also for non-US locations
  • Strict rate limit, you can only do few requests per second and daily total is few hundred. (computed per ip + authenticated user)
  • You should OAuth authenticate your users from Foursquare, otherwise rate limit will make it impossible to use their API.
  • Maximum number of results per query is 50
  • Sometimes returns pretty irrelevant places like homes and such, as category can not be used as query filter

FourSquare v2 API (Beta, work in progress)

v2 API promises improvement over the old v1 and is now available for public use. However it’s still work in progress and there is no guarantee that it stays backwards compatible. Official info here:

Unlike v1, v2 does not allow completely anonymous access, so you have to register your application here. Also, only HTTPS is supported. User authentication is OAuth2 that is much simpler to implement than OAuth1 used in v1.

Example (unauthenticated) query.<<my app secret>>&ll=60.205065%2C24.654196&client_id=<<my app key>>&l=100&llAcc=100

Example response

{'meta': {'code': 200},
 'response': {'groups':
                              [{'icon': '',
                                'id': '4bf58dd8d48988d1ca941735',
                                'name': 'Pizza',
                                'parents': ['Food'],
                                'primary': True},
                               {'icon': '',
                                'id': '4bf58dd8d48988d123941735',
                                'name': 'Wine Bar',
                                'parents': ['Nightlife']},
                               {'icon': '',
                                'id': '4bf58dd8d48988d116941735',
                                'name': 'Bar',
                                'parents': ['Nightlife']}],
                         'contact': {'phone': '4155589991',
                                     'twitter': 'patxispizza'},
                         'hereNow': {'count': 0},
                         'id': '43d7e5dff964a5205d2e1fe3',
                         'location': {'address': '511 Hayes St',
                                      'city': 'San Francisco',
                                      'crossStreet': 'at Octavia Blvd.',
                                      'distance': 474,
                                      'lat': 37.776435900000003,
                                      'lng': -122.425003,
                                      'postalCode': '94102',
                                      'state': 'CA'},
                         'name': "Patxi's Chicago Pizza",
                         'specials': [{'description': 'every check-in',
                                       'id': '4c06d44386ba62b5945588b3',
                                       'message': 'Check-in, show your server, and your first fountain beverage (w/ purchase) is free or your first PBR is only $1. Free 10-inch half-baked pizza of any type, once a week, for the Mayor.',
                                       'type': 'frequency'}],
                         'stats': {'checkinsCount': 2269,
                                   'usersCount': 1115},
                                   'verified': True},

Improvements and changes over v1 API

  • Rate limited as v1
  • Maximum number of results is now over 100
  • Better category structure
  • Venue Twitter handle
  • Address detail includes crossStreet info
  • Specials, etc..

They also changed venue id format from number to hash string, that makes things difficult for those who have already used the old API. I also wish there would be way to filter out private homes from results.

Socket Pooling on Node.js

UPDATE: This post is pretty old and nowadays there are quite a few socket pooling implementations for Node.js. I recommend taking look into Jackpot ( It’s simple and does the job.

I was looking for connection pooling for my Apple Push notification proxy but couldn’t find proper pool implementation, just some experiments that didn’t do proper error handling or could not handle basic real life requirements. I also needed a pool that could be called by both blocking and non-blocking mode.

Basic functionality requirements of a typical connection pool

  • Ensure that connections are available and no more than maximum number of parallel connections exists
  • Do not keep unnecessary connections open for long. Less traffic, less connections.
  • Handle connection decay. For example if connection gets closed by remote peer while it’s waiting in pool.
  • Sane retry logic. Most importantly do not retry connections as fast as CPU allows in case of error.
  • Can either guarantee waiting time, or supports way of checking connection availability.

Lets define first the interface as Node.js module that exports reserve and release methods.

exports.ConnectionPool = function(factory) {
 var self = {};
 var waitlist = [] ;   // callbacks waiting for connection
 var connections = [];  // unused connections currently in pool
 var ccount = 0;  // number of current connections

 // called to reserve connection from pool. Calls callback without connection if wait is false
 self.reserve = function(callback, wait) { ... }
 // returns connection to the pool. Connection is destroyed if destroy is true
 self.release = function(connection, destroy) { ... }

Pool requires user provided factory that is dictionary that defines 3 functions and one property.

factory = {
create : function(callback) { ... }  // create connection and call callback with it
validate: function(validate) { .. } // return true or false for connection
destroy: function(connection) { .. }  // destroy connection
max: 5

Pool implementation needs several methods for house keeping. checkWaiters() is called to create new connections for waiting callbacks.

function checkWaiters() {
  if(waitlist.length > 0 && ccount < factory.max) {
     factory.create(function(connection) {          
       if(!connection) {
         ccount--; // failed                    
       } else {
         if(waitlist.length > 0)

Then its counterpart, destroyConnection() that removes connection from pool for good. This function can be called from several places and situations so it adds “deleted” flag to the connection to avoid duplicate processing. It also tries to recreate new connection immediately (if needed) by calling checkWaiters().

function destroyConnection(connection) {
    if(connection.destroyed) {
    connection.destroyed = true;
    for(var i=0; i < connections.length; i++) {
        // remove from pool if it's there                                
        if(connection == connections[i]) {

    // connection was lost, we need to create new one if there are       
    // waiting requests                                                  

Then the actual reserve interface method. Function has two modes, if wait is false it returns immediately if it fails to give connection, otherwise the callback goes to the waiting list.  Connections from pool are validated with factory before they are passed to the callback.

self.reserve = function(callback, wait) {
    if (wait == undefined) {
        wait = true;
    if(connections.length > 0) {  // pool has available connections
        connection = connections.shift();
        if(factory.validate(connection)) {  // is it still valid
           clearTimeout(connection.timeoutid);  // cancel the cleanup timeout
        } else {
            destroyConnection(connection);  // stale connection
    if(ccount >= factory.max) { // maximum number of connections created
       if(!wait) {
       } else {
    ccount++;  // try to create connection
    factory.create(function(connection) {
        if(!connection) {
           ccount--; // failed                                          
           if(!wait) {
           } else {
        } else {
          callback(connection); // connection created successfully

And the release method. Release method destroys connection if requested and forgets about it then completely. Otherwise it tries to find callback from waiting list and passes the connection for immediate reuse. In case there is nobody waiting, it puts the connection back  to pool and times the connection cleanup event in 10 seconds.

self.release = function(connection, destroy) {
    if(destroy) {
    } else {
       if(waitlist.length > 0) {
       connection.timeoutid = setTimeout(function() {
       }, 10000);

And finally the background polling that is responsible mainly of connection retry logic. It polls the waiting list once per second, as you remember that function creates connections if there is anyone waiting for it. The connections are normally created on demand in reserve() call.

function poll() {
   setTimeout(poll, 1000);
setTimeout(poll, 1000);  // start poller

How this then handles the different error cases?

  • Idle connections are cleaned by timeout call that is set on release()
  • User code can request connection delete by setting the destroy flag to true in call to release()
  • Connections that go bad while in pool are (hopefully) intercepted by user provided factory.validate()
  • Counter keeps track of maximum number of created connections and because its increased before creating connection it also limits maximum number of parallel connection attempts!
  • When connection creation fails, the callback goes to waiting list and poll per second tries to create new parallel connections.
  • User code that needs process immediately can set wait flag to false when reserving connection. Code could be also changed to timed out callback call, so wait could be defined as milliseconds instead of instant fail v.s. infinite wait as its done now.

Then how to use it.

HTTP Pool Example

var pool = require('./pool');
 var http = require('http');
var httppool = pool.ConnectionPool({
 create: function(callback) { callback(http.createClient(80, "")); },
 validate: function(connection) { return true; /* no need to validate  */ },
 destroy : function(httpclient) {  /* nothing to destroy */},
 max: settings.couchdbmax

Using the pool

 httppool.reserve(function(connection) {
   var req = connection.request("GET", "/index.html");
   req.on('error', function(error) {
     httppool.release(connection, true);
   req.on('response', function(response) {
     body = '';
     response.on('data', function(data) {
     body += data;
   response.on('end', function() {

Client socket pool

var pool = require('./pool');
var net = require('net');
var apnpool = pool.ConnectionPool({
 create: function(callback) {
   function errorcb(error) {  // error handler
   connection = net.createConnection(12345, '');
   connection.once('error', errorcb); 
   connection.on('connect', function() {
     connection.removeListener('error', errorcb); // clear error handler before passing forward
 validate: function(connection) { return connection.writable; },
 destroy : function(connection) { connection.end(); },
 max: 5,

Reuse is little bit tricky with sockets, as you need probably set and clear ‘error’ and  ‘data’ event handlers for reading the responses in each worker.