Finding Patterns in Data Sequences

Some time ago I was presented question if there would be easy and quick way to recognize certain usage patterns in huge amount of event data. The events in question were common user activity like clicking buttons, waiting certain time between and so on. This data could then used to improve user experience or catch cheaters that write bots to play their game for them and so on.

There are multiple ways to approach this problem but I came up with relatively simple method that seems to have flexibility. We can think the events of interest as sequence of symbols and in case also time between events is of interest, quantize the time as symbols for short, long and medium time. Then, sequence of events can be presented as list of symbols

ABC123ABC123ABC123ABC123XABC123ABC123ABC123ABC123XABC123ABC123ABC123ABC123X

Here A could be button click to open window, B a tab click in window, C close button, 1 a second pause before moving mouse, 2 a few second pause before moving mouse and X is collect resources etc..

Now we can approach problem with pattern finding. First let’s define function that is supposed to find recurring patterns and output them and input as sequence of those patterns. (Following is written in Python)

def findpattern(s, trivial):
	patterns = {}
	symbolmap = {}

	p = patterns
	acc = []
	output = []
	for c in s:
		if not c in p:
			if not c in patterns or (not trivial and acc[-1] == c):
				p[c] = {}
				p = p[c]
				acc.append(c)
			else:
				acc = ''.join(map(lambda x: str(x),acc))
				if not acc in symbolmap:
					symbolmap[acc] = true
				output.append(acc)

				acc = [c]
				p = patterns[c]
		else:
			p = p[c]
			acc.append(c)

	if acc:
		acc = ''.join(map(lambda x: str(x),acc))
		if not acc in symbolmap:
			symbolmap[acc] = true
		output.append(acc)

	return (symbolmap.keys(), output)

Function accepts two arguments, the list of symbols and flag if trivial pattern should be accepted. This is needed to determine if you consider a input list “AAAAA” to be one instance of pattern “AAAAA” or 5 instances of pattern “A”.

When this function is applied on the example string above, we get list of core patterns and the input as pattern list.

> s = "ABC123ABC123ABC123ABC123XABC123ABC123ABC123ABC123XABC123ABC123ABC123ABC123X"
> findpattern(s, False)
['ABC123', 'ABC123X'] ['ABC123', 'ABC123', 'ABC123', 'ABC123X', 'ABC123', 'ABC123', 'ABC123', 'ABC123X', 'ABC123', 'ABC123', 'ABC123', 'ABC123X']

Here it’s easy to see that the string is composed of two main patterns, ABC123 and ABC123X. The second output list is the input string split as instances of pattern.
Real data is noisier than the example here, so proper data cleaning and event definition / filtering are vital for any kind of useful pattern recognition.

In this case one could guess that the player might be bot, as the same sequence occur regularly with too much precision and too little variability for human player. (Remember that the symbols 1, 2 and 3 presented also time in this example.)

Going forward we can also define function that recursively uses the output of the findpattern to find patterns of patterns up to the “prime” pattern of the input, i.e. the pattern that when repeated N times produces the original input string.

def findprimepattern(s, trivial=False):
	o = s
	while True:
		s, o = findpattern(o, trivial)
		print s,o
		if len(s) == 1:
			return s[0]

Example with more complicated pattern string

> s = (((("ABC"*2)+"DE")*3+"X")*2+"O")*4
"ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXOABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXOABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXOABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO"
> findprimepattern(s)
['ABCDEXO', 'ABC', 'ABCDE', 'ABCDEX'] ['ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEX', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEXO', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEX', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEXO', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEX', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEXO', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEX', 'ABC', 'ABCDE', 'ABC', 'ABCDE', 'ABC', 'ABCDEXO']
['ABCABCDE', 'ABCABCDEX', 'ABCABCDEXO'] ['ABCABCDE', 'ABCABCDE', 'ABCABCDEX', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEXO', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEX', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEXO', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEX', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEXO', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEX', 'ABCABCDE', 'ABCABCDE', 'ABCABCDEXO']
['ABCABCDEABCABCDEABCABCDEX', 'ABCABCDEABCABCDEABCABCDEXO'] ['ABCABCDEABCABCDEABCABCDEX', 'ABCABCDEABCABCDEABCABCDEXO', 'ABCABCDEABCABCDEABCABCDEX', 'ABCABCDEABCABCDEABCABCDEXO', 'ABCABCDEABCABCDEABCABCDEX', 'ABCABCDEABCABCDEABCABCDEXO', 'ABCABCDEABCABCDEABCABCDEX', 'ABCABCDEABCABCDEABCABCDEXO']
['ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO'] ['ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO', 'ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO', 'ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO', 'ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO']
ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO

And indeed the input string is the prime pattern ABCABCDEABCABCDEABCABCDEXABCABCDEABCABCDEABCABCDEXO concatenated 4 times.

Note effect of parameter trival on the function.

> findprimepattern('AAAA', False)
['AAAAA'] ['AAAAA']
AAAAA
> findprimepattern('AAAA', True)
['A'] ['A', 'A', 'A', 'A', 'A']
A

Former is single instance of pattern “AAAA” and latter is 4 instances of pattern “A”.

Reduce mp3 file size for app and web use

Typical 2 minutes piece of music is roughly 2-4MB in common mp3 format. This is size for usual quality that is stereo audio 256 kbps and 44.1 kHz sample rate. Tough the file size is not that large, it will slow down loading HTML5 game or mobile app, especially on wireless networks.

Web HTML5 or mobile game applications do not need nearly as high quality and the size can be reduced to ~400-700kB without too much audible quality penalty. Difference can be noticed if you listen for it, but it’s unlikely that users do notice anything, as they have nothing to compare it against. This also depends lot of the music, sharp high frequency sounds tend to distort first.

MP3/WAV to reduced size MP3

Easy way to convert mp3s is lame. A flexible command line mp3 decoder and encoder.

Here we use 1:45 minutes 1.7MB music piece Lake Hylia from Zelda Reorchestrated. I don’t own any rights to this music and it’s used purely for demonstration purposes. I selected this piece because it’s melodic and instrumental like most game music is.

Key to reducing file size is lower sample and bitrate, and most importantly converting file to mono. Example below is how to do this with lame 3.99.5 on OS/X. I installed lame using Macports.

$ lame -hv -mm --resample 22.05 "Zelda Reorchestrated - Lake Hylia.original.mp3" -B 32 "Zelda Reorchestrated - Lake Hylia.small.mp3"
LAME 3.99.5 64bits (http://lame.sf.net)
Autoconverting from stereo to mono. Setting encoding to mono mode.
Resampling:  input 44.1 kHz  output 22.05 kHz
polyphase lowpass filter disabled
Encoding Zelda Reorchestrated - Lake Hylia.mp3
      to Zelda Reorchestrated - Lake Hylia.small.mp3
Encoding as 22.05 kHz single-ch MPEG-2 Layer III VBR(q=4)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  3996/3996  (100%)|    0:01/    0:01|    0:01/    0:01|   82.375x|    0:00
  8 [  36] **
 16 [  24] *
 24 [   4] *
 32 [3932] **************************************************************************************************************************************
-------------------------------------------------------------------------------------------------------------------------------------------------
   kbps       mono %     long switch short %
   31.7      100.0        99.9   0.1   0.1
Writing LAME Tag...done
ReplayGain: +3.7dB

This converts the input mp3 (lame also accepts .wav files) to 22.05kHz 64kbs monoaural mp3. The output file size is 412kB, resulting to nearly 75% savings. You can reduce file further by using even lower bitrate, (experiment -B 24 or -B 16), but you may start hearing distortions.

$ du -hs Zelda\ Reorchestrated\ -\ Lake\ Hylia.*
1.6M	Zelda Reorchestrated - Lake Hylia.original.mp3
404K	Zelda Reorchestrated - Lake Hylia.small.mp3

Test play the files here and listen for any differences.

Original: Zelda Reorchestrated – Lake Hylia.original.mp3
Reduced: Zelda Reorchestrated – Lake Hylia.small.mp3

MP3/WAV to reduced OGG Vorbis

This is main for Firefox as it still supports only Ogg Vorbis. You can convert mp3/wav file to reduced mono ogg vorbis with ffmpeg tool.

This example uses ffmpeg 2.1, and like lame I installed is on OS/X using Macports.

$ ffmpeg -i "Zelda Reorchestrated - Lake Hylia.original.mp3" -strict experimental -acodec libvorbis -ab 32k -ar 22050 -ac 1 "Zelda Reorchestrated - Lake Hylia.small.ogg"
ffmpeg version 2.1 Copyright (c) 2000-2013 the FFmpeg developers
  built on Oct 30 2013 23:24:10 with Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
  configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-avresample --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --enable-libass --enable-libbluray --enable-gnutls --enable-libfreetype --disable-outdev=xv --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid --enable-nonfree --enable-libfaac
  libavutil      52. 48.100 / 52. 48.100
  libavcodec     55. 39.100 / 55. 39.100
  libavformat    55. 19.104 / 55. 19.104
  libavdevice    55.  5.100 / 55.  5.100
  libavfilter     3. 90.100 /  3. 90.100
  libavresample   1.  1.  0 /  1.  1.  0
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mp3, from 'Zelda Reorchestrated - Lake Hylia.original.mp3':
  Metadata:
    title           : Lake Hylia
    artist          : Zelda Reorchestrated
    album           : Twilight Princess
    track           : 11
  Duration: 00:01:44.39, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s
Output #0, ogg, to 'Zelda Reorchestrated - Lake Hylia.small.ogg':
  Metadata:
    title           : Lake Hylia
    artist          : Zelda Reorchestrated
    album           : Twilight Princess
    track           : 11
    encoder         : Lavf55.19.104
    Stream #0:0: Audio: vorbis (libvorbis), 22050 Hz, mono, fltp, 32 kb/s
    Metadata:
      title           : Lake Hylia
      artist          : Zelda Reorchestrated
      album           : Twilight Princess
      TRACKNUMBER     : 11
      encoder         : Lavf55.19.104
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 -> libvorbis)
Press [q] to stop, [?] for help
size=     398kB time=00:01:44.38 bitrate=  31.2kbits/s
video:0kB audio:387kB subtitle:0 global headers:3kB muxing overhead 1.866299%

Resulting OGG file size and quality is comparable to reduced mp3.

$ du -hs Zelda\ Reorchestrated\ -\ Lake\ Hylia.*p3
400K	Zelda Reorchestrated - Lake Hylia.small.ogg
1.6M	Zelda Reorchestrated - Lake Hylia.original.mp3

Compare here:

Original: Zelda Reorchestrated – Lake Hylia.original.mp3
Reduced: Zelda Reorchestrated – Lake Hylia.small.ogg

Minesweeper clone in HTML5

In my two previous block entries I wrote about one possible ways to do simple Slots and Car games on HTML5 technologies. This writeup combines some of those methods and introduces new ones to implement Minesweeper game with a twist.

mine

This game is actually “reverse” minesweeper, or should we say Applesweeper. Players mission is to find all the apples without clicking on of the empty tiles. Score is increased when player finds an apple and is decreased when he misses. Game ends when all the apples are found.

Try out the game here: http://ikonen.me/examples/mine/

As in previous games, hud is a html div, and the game grid is drawn on a canvas. Grey slab that hides tiles content is procedurally generated on offscreen canvas at the startup.

this.slab = document.createElement('canvas');
var ctx = this.slab.getContext('2d');
this.slab.width = this.resolution;
this.slab.height = this.resolution;

ctx.fillStyle = 'grey';
ctx.fillRect(0, 0, this.resolution, this.resolution);

ctx.beginPath();
ctx.fillStyle = 'white'
ctx.moveTo(0, 0);
ctx.lineTo(this.resolution, 0);
ctx.lineTo(this.resolution, this.resolution);
ctx.lineTo(0, 0);
ctx.closePath();
ctx.fill();

ctx.fillStyle = 'lightgrey';
ctx.fillRect(4, 4, this.resolution-8, this.resolution-8);

The this.slab holds off-screen canvas that contains the generated slab image.

Game area size and number of apples are function of screen size, to adapt to different screensizes.

var width  = window.innerWidth;
var height = window.innerHeight;

GRID_W = Math.min( 12, ~~(width / GRID_RESOLUTION));
GRID_H = Math.min( 12, ~~(height / GRID_RESOLUTION)) - 1;
var APPLE_COUNT = ~~((GRID_W * GRID_H) / 8);

For example, Here is the game on iPhone 3Gs.

mine_iphone

Game main loop is passive, when user clicks on the screen click handler sets the location on object that holds the clicked tile x and y coordinates

$('#container').click( function( e ){
    var p = $('#canvas').offset();        
    game.click = {
        x:parseInt((e.pageX-p.left)/game.resolution),
        y:parseInt((e.pageY-p.top)/game.resolution)};
    }
);

Main update handler is called on each frame and it checks if button has been clicked and updates the grid and redraws it if required. Grid is simple array where x and y are mapped as position.

Game.prototype.pos = function( x, y ) {
    return y*this.width+x;
}
Game.prototype.xy = function( pos ) {
    return {
        x:parseInt(pos%this.width),
        y:parseInt(pos/this.width)
    }
}

Grid array values are integers where content is bit masked. Higher bits are used to flag if grid location has slab and apple, empty or number.

var SLAB_MASK = Math.pow(2, 16);
var APPLE_MASK = Math.pow(2, 15);

// grid location (5, 6) has slab and number 3
var pos = this.pos(5, 6);
this.grid[pos] = 3;
this.grid[pos] |= SLAB_MASK;

The slab is removed from location just by negating it

this.grid[pos] &= ~SLAB_MASK;

Draw loop just checks for each position and checks with mask what it contains

for ( var y=0; y < this.height; y++ ) {
    for ( var x=0; x < this.width; x++ ) {
        // Draw each tile
        var s = this.pos(x,y);
       if (s & SLAB_MASK) {
          // Still covered tile
         this.ctx.drawImage( this.slab, x * this.resolution, y * this.resolution )

       } else if (s & APPLE_MASK) {
         // Uncovered apple
         this.ctx.drawImage( tile, x * this.resolution + 2, y * this.resolution + 2 )
       } else if (s > 0) {
         // Neighbour number
         this.ctx.fillText( '' + s ,
                            x * this.resolution + this.resolution/2,
                            y * this.resolution + this.resolution/2)
       }
    }
}

When player clicks on empty tile, recursive function walks through the grid clearing slabs from adjacent empty tiles.

function _empty( x, y, force ) {
    if ( x < 0 || x >= that.width || y < 0 || y >= that.height ) return;

    var pos = that.pos(x, y);
    var d = that.grid[pos];

    if (d && (d & SLAB_MASK) && (force || !(d & APPLE_MASK))) {

        that.grid[pos] &= ~SLAB_MASK; // clear out slab

        // Clear next neighbor if this is empty tile
        if (that.grid[pos] == 0) {
            _empty(x, y - 1) // north
            _empty(x, y + 1) // south
            _empty(x - 1, y) // west
            _empty(x - 1, y - 1) // north west
            _empty(x - 1, y + 1) // south east
            _empty(x + 1, y) // east
            _empty(x + 1, y - 1) // north east
            _empty(x + 1, y + 1) // south east
        }
    }
}

Code is available at GitHub.

Apple Push Notifications with Haskell

In series of language evaluation it’s this time Push notifications with Haskell. Haskell is a pure functional language with strong static typing and as such is not ideally suited for IO code (networking) with dynamic data (JSON). Let’s see how it compares with the others. So far in the series

Step 1. Prerequisites

This code uses GHC 7.4.2 (Haskell compiler). On OS/X its easiest to install with port (or homebrew)

    $ sudo port install ghc
    $ ghc --version
    The Glorious Glasgow Haskell Compilation System, version 7.4.2

This installs the compiler and interpreter. You will also need cabal package manager. Base Haskell installation has surprisingly few tricks out of the box and lots of libraries are needed.

    $ sudo port install hs-cabal
    $ cabal-0.14.0 update
    $ cabal-0.14.0 install cabal-install

After this cabal command can be run from ~/.cabal/bin/cabal

  • Read introduction to Apple Push here and get application and private key sandbox certificates as .pem files.
  • And of course you need to have 32 byte push token from your iOS application.

Step 1. The Utilities.

Hexadecimal to binary

Haskell like many similar languages encourage coding style where programs are written as composition of small simple functions. We’ll start with one that encodes hexadecimal string to ByteString. ByteString is Haskell’s practical presentation of binary data.

import qualified Data.ByteString as B
import GHC.Word (Word8)
import Data.Convertible (convert)
import Data.Char (ord, chr, toUpper)
import Data.Bits (shift, (.|.), (.&.)) 

hexToByteString :: String -> B.ByteString
hexToByteString s
  | null s = B.empty
  | otherwise = B.pack . hexToWord8 $ s
  where
    hexToWord8 :: String -> [Word8]
    hexToWord8 [] = []
    hexToWord8 [x] = error "Invalid hex stream"
    hexToWord8 (x:y:xs) = [ hn .|. ln ] ++ hexToWord8 xs
      where
        hn = (shift (decodeNibble x) 4)
        ln = decodeNibble y
        decodeNibble c
          | o >= oA && o <= oF = convert (o - oA + 10) :: Word8
          | o >= o0 && o <= o9 = convert (o - o0) :: Word8
          | otherwise = error $ "Invalid hex: " ++ [c]
          where o = ord . toUpper $ c
                oA = ord 'A'
                oF = ord 'F'
                o0 = ord '0'
                o9 = ord '9'

Note the imports for Word8 and convert. Haskell as statically typed language requires that every single item has to have unambiguous data type. Imported functions are needed to manipulate and convert data so we can get from String (that is list of Char’s) to list of 8bit bytes (list of Word8’s) finally to ByteString

Install these packages to import Data.Convertible and ByteString

~/.cabal/bin/cabal install convertible
~/.cabal/bin/cabal install bytestring

Save file as Hex.hs and try it out

$ ghci
GHCi, version 7.4.2: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :load Hex.hs
[1 of 1] Compiling Util.Hex         ( Hex.hs, interpreted )
Ok, modules loaded: Util.Hex.
*Util.Hex> hexToByteString "AABBCC"
"\170\187\204"
*Util.Hex> :type hexToByteString "AABBCC"
hexToByteString "AABBCC" :: B.ByteString
*Util.Hex> hexToByteString "Something"
"*** Exception: Invalid hex stream
*Util.Hex> hexToByteString "AA"
"\170"

Seems to be working.

JSON encoding

Next step is function that produces a UTF8 encoded JSON object that contains our push notification payload. We use simple object that contains only the alert message.

Save this in file Push.hs

import Text.JSON as JSON
import qualified Data.ByteString.UTF8 as BU

getJSONWithMessage :: String -> JSObject (JSValue)
getJSONWithMessage msg =
  let jmsg = JSString (toJSString msg) in
  toJSObject [("aps",
               JSObject (toJSObject [("alert", jmsg)]))]

Install packages for UTF8 ByteString and Text.JSON

~/.cabal/bin/cabal install json
~/.cabal/bin/cabal install utf8-string

Check that we get correctly formatted JSON string

Prelude> :load Push.hs
[1 of 1] Compiling Main             ( Push.hs, interpreted )
Ok, modules loaded: Main.
*Main> getJSONWithMessage "Hello World"
*Main> getJSONWithMessage "Hello World"
JSONObject {fromJSObject = [("aps",JSObject (JSONObject {fromJSObject = [("alert",JSString (JSONString {fromJSString = "Hello World"}))]}))]}
*Main> JSON.encode . getJSONWithMessage $ "Hello World"
"{\"aps\":{\"alert\":\"Hello World\"}}"

Lets also check that messages with non-ASCII character can be supported

*Main BU> BU.fromString . JSON.encode . getJSONWithMessage $ "Mötörhead"
"{\"aps\":{\"alert\":\"M\195\182t\195\182rhead\"}}"
*Main BU> 

The encoded JSON looks fine.

Building the PDU

For this purpose we use Put that supports building Lazy BinaryString’s with content

import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as BU
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Put
import GHC.Word (Word32, Word16)
import Data.Convertible (convert)

buildPDU :: B.ByteString -> BU.ByteString -> Word32 -> Put
buildPDU token payload expiry
  | (B.length token) /= 32 = fail "Invalid token"
  | (B.length payload > 255) = fail "Too long payload"
  | otherwise = do
    putWord8 1 -- command
    putWord32be 1 -- transaction id, can be anything
    putWord32be expiry  -- expiry time as seconds from epoch
    putWord16be ((convert $ B.length token) :: Word16) -- length of token
    putByteString token  -- push token
    putWord16be ((convert $ B.length payload) :: Word16) - payload length
    putByteString payload -- the json encoded as utf-8 string

We need also simple function to compute expiry time as relative to now

import Data.Time.Clock.POSIX (getPOSIXTime)

getExpiryTime :: IO (Word32)
getExpiryTime = do
  pt <- getPOSIXTime
  -- One hour expiry time
  return ( (round pt + 60*60):: Word32)

Install packages for Data.Binary.Put

~/.cabal/bin/cabal install binary

Step 2. Connecting to the Server

Make sure you have the certificate files (cert.pem and key-noenc.pem).

import Network.Socket
import Network.BSD (getHostByName, hostAddress, getProtocolNumber)
import OpenSSL
import OpenSSL.Session as SSL (
  context,
  contextSetPrivateKeyFile,
  contextSetCertificateFile,
  contextSetCiphers,
  contextSetDefaultCiphers,
  contextSetVerificationMode,
  contextSetCAFile,
  connection,
  connect,
  shutdown,
  write,
  read,
  SSL,
  VerificationMode(..),
  ShutdownType(..)
  )

main = withOpenSSL $ do
  -- Prepare SSL context
  ssl <- context
  contextSetPrivateKeyFile ssl "key-noenc.pem"
  contextSetCertificateFile ssl "cert.pem"
  contextSetDefaultCiphers ssl
  contextSetVerificationMode ssl SSL.VerifyNone

  -- Open socket
  proto <- (getProtocolNumber "tcp")
  he <- getHostByName "gateway.sandbox.push.apple.com"
  sock <- socket AF_INET Stream proto
  Network.Socket.connect sock (SockAddrInet 2195 (hostAddress he))

  -- Promote socket to SSL stream
  sslsocket <- connection ssl sock
  SSL.connect sslsocket  -- Handshake

  -- we'll send pdu here

Install OpenSSL package

~/.cabal/bin/cabal install HsOpenSSL

Step 3. Send the PDU

Now we’re finally ready to send the actual push notification message. Replace the push token to your own in following example.


...
  -- Promoto socket to SSL stream
  sslsocket <- connection ssl sock
  SSL.connect sslsocket  -- Handshake

  expiration <- getExpiryTime
  -- we send pdu here
  let token = "6b4628de9317c80edd1c791640b58fdfc46d21d0d2d1351687239c44d8e30ab1"
      message = "Hello World"
      btoken = hexToByteString token
      payload = BU.fromString . JSON.encode . getJSONWithMessage $ message
      lpdu = runPut $ buildPDU btoken payload expiration  -- build binary pdu
      pdu = toStrict lpdu  -- from lazy bytestring to strict
    in do
    SSL.write sslsocket pdu
    SSL.shutdown sslsocket Unidirectional -- Close gracefully
  where
    toStrict = B.concat . BL.toChunks

Then compile and run program

$ ghc -threaded -o Push Push.hs
[1 of 2] Compiling Hex              ( Hex.hs, Hex.o )
[2 of 2] Compiling Main             ( Push.hs, Push.o )
Linking Push ...
$ ./Push

If everything went fine, the program exits within few seconds and you’ll see your push notification appear on your iOS device.

Full source of this example available here: https://github.com/tikonen/blog/tree/master/apn-haskell

Interpreting Go Socket Errors

Go sockets returns error variables when something goes wrong, and the different error codes are documented here in the Go documentation. However I was not able to find coherent example that would show how the error variable is supposed to be used. Canonical way seems to be just checking it against nill and dump it out in case it’s something else, like this:

n, err := conn.Read(buffer[:])
if err != nill {
    fmt.Printf("%v\n", err)
}

Real applications (especially system applications) need to branch based on error to recover properly, so just error description is not enough. I made here example what it’s possible to deduct from the error variable.

conn, err := net.Dial("tcp", "", "example.com:80")
n, err := conn.Read(buffer[:])

if err != nil {

    // print error string e.g.
    // "read tcp example.com:80: resource temporarily unavailable"
    fmt.Printf("reader %v\n", err)

    // print type of the error, e.g. "*net.OpError"
    fmt.Printf("%T", err)

    if err == os.EINVAL {
      // socket is not valid or already closed
      fmt.Println("EINVAL");
    }
    if err == os.EOF {
      // remote peer closed socket
      fmt.Println("EOF");
    }

    // matching rest of the codes needs typecasting, errno is
    // wrapped on OpError
    if e, ok := err.(*net.OpError); ok {
       // print wrapped error string e.g.
       // "os.Errno : resource temporarily unavailable"
       fmt.Printf("%T : %v\n", e.Error, e.Error)
       if e.Timeout() {
         // is this timeout error?
         fmt.Println("TIMEOUT")
       }
       if e.Temporary() {
         // is this temporary error?  True on timeout,
         // socket interrupts or when buffer is full
         fmt.Println("TEMPORARY")
       }

      // specific granular error codes in case we're interested
     switch e.Error {
        case os.EAGAIN:
           // timeout
           fmt.Println("EAGAIN")
       case os.EPIPE:
          // broken pipe (e.g. on connection reset)
          fmt.Println("EPIPE")
       default:
          // just write raw errno code, can be platform specific
          // (see syscall for definitions)
          fmt.Printf("%d\n", int64(e.Error.(os.Errno)))
     }
 }

For example in case read times out, the code would print following

read tcp 192.0.32.10:80: resource temporarily unavailable
*net.OpError
os.Errno : resource temporarily unavailable
TIMEOUT
TEMPORARY
EAGAIN

Apple Push Notifications with Go language

I started to familiarize myself to the Go language, and decided to do the usual try out, i.e. sending Apple Push Notifications. It’s my personal usability benchmark for new programming environments. So far in the series

Step 1. Prerequisites

Get and build Go. Example here was done on Ubuntu 10.04 LTS x64 with Go installed based on instructions here at Go getting started guide.

  • Read introduction to Apple Push here and get application and private key sandbox certificates as .pem files.
  • And of course you need to have 32 byte push token from your iOS application.

Step 2. The Code.

The code here is complete, copy it to file apn.go or get it from Github.

Make sure you change the certificate files (cert.pem and key-noenc.pem) to point to your own certificate files. Also, replace the push token with your own push token, it’s written as hexadecimal string in this example for clarity.

package main

import (
   "crypto/tls"
   "fmt"
   "net"
   "json"
   "os"
   "time"
   "bytes"
   "encoding/hex"
   "encoding/binary"
)

func main() {

   // load certificates and setup config
   cert, err := tls.LoadX509KeyPair("cert.pem", "key-noenc.pem")
   if err != nil {
       fmt.Printf("error: %s\n", err.String())
       os.Exit(1)
   }
   conf := &tls.Config {
        Certificates: []tls.Certificate{cert},
   }

   // connect to the APNS and wrap socket to tls client
   conn, err := net.Dial("tcp", "", "gateway.sandbox.push.apple.com:2195")
   if err != nil {
      fmt.Printf("tcp error: %s\n", err.String())
      os.Exit(1)
   }
   tlsconn := tls.Client(conn, conf)

   // Force handshake to verify successful authorization.
   // Handshake is handled otherwise automatically on first
   // Read/Write attempt
   err = tlsconn.Handshake()
   if err != nil {
      fmt.Printf("tls error: %s\n", err.String())
      os.Exit(1)
   }
   // informational debugging stuff
   state := tlsconn.ConnectionState()
   fmt.Printf("conn state %v\n", state)

   // prepare binary payload from JSON structure
   payload := make(map[string]interface{})
   payload["aps"] = map[string]string{"alert": "Hello Push"}
   bpayload, err := json.Marshal(payload)

   // decode hexadecimal push device token to binary byte array
   btoken, _ := hex.DecodeString("6b4628de9317c80edd1c791640b58fdfc46d21d0d2d1351687239c44d8e30ab1") 

   // build the actual pdu
   buffer := bytes.NewBuffer([]byte{})
   // command
   binary.Write(buffer, binary.BigEndian, uint8(1))

   // transaction id, optional
   binary.Write(buffer, binary.BigEndian, uint32(1))

   // expiration time, 1 hour
   binary.Write(buffer, binary.BigEndian, uint32(time.Seconds() + 60*60))

   // push device token
   binary.Write(buffer, binary.BigEndian, uint16(len(btoken)))
   binary.Write(buffer, binary.BigEndian, btoken)

   // push payload
   binary.Write(buffer, binary.BigEndian, uint16(len(bpayload)))
   binary.Write(buffer, binary.BigEndian, bpayload)
   pdu := buffer.Bytes()

   // write pdu
   _, err = tlsconn.Write(pdu)
   if err != nil {
      fmt.Printf("write error: %s\n", err.String())
      os.Exit(1)
   }

   // wait for 5 seconds error pdu from the socket
   tlsconn.SetReadTimeout(5*1E9)

   readb := [6]byte{}
   n, err := tlsconn.Read(readb[:])
   if n > 0 {
     fmt.Printf("received: %s\n", hex.EncodeToString(readb[:n]))
   }

   tlsconn.Close()
}


Step 3. Compile and Run

Simple

$ 6g apn.go
$ 6l apn.6
$ ./6.out
conn state {true 47}
$

If everything went fine, the program exits within few seconds and  you’ll see your push notification appear on your iPhone.

Socket binary data stream parsing in Erlang

Erlang code has two different ways of reading from the socket, active and passive. In passive mode, your code calls recv to the socket to receive bytes. In active mode you install controlling process to the socket and receive data as Erlang messages.

Binary data parsing is more complicated on the latter, as application code doesn’t have any control over size of the packets (or flow control for that matter) that it receives. It could be 1 byte, few kilobytes or whatever. Naive pattern matching fails if you don’t get exactly the amount of bytes you want.

Lets assume simple binary protocol, where you need to read packets that are varying in length and formatted as following. Timestamp is defined in first 4 bytes, next 2 bytes define length of payload followed by the payload itself.

|    Timestamp   |  Len  |    Payload      |
|  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... |

Incidentally, this is the data format used by Apple Push notification feedback service.

First lets define a function that accepts binary Data and process Pid where it sends parsed result. This tail recursive function simply matches as many packets from data as possible, and returns with remaining unparsed data.

match_data(Data, Parent) ->
   case Data of
      <<Timestamp:32/big, Size:16/big, PushToken:Size/binary-unit:8, Rest/binary>> ->
         Parent ! {Timestamp, PushToken}, % notify parent
         %% parse rest of the data
        match_data(Rest, Parent);
      Rest ->
         %% no match
        Rest
   end.

Then function that actually receives the data from the socket. It receives arbitrary pieces of data, concatenates it to existing unparsed data and calls the match_data handler to make actual packet matching. Then it loops again with unparsed portion of data.

loop(Bin, Parent) ->
   receive
      {_, _Sock, Data} ->
         loop(match_data(erlang:list_to_binary([Bin, Data]), Parent), Parent);
      {ssl_closed, _Sock} -> ok;
      {_event, _Event} ->
         error_logger:error_msg("Unexpected", [_event, _Event])
   end.

Install the loop as controlling process, with initially empty “seed” data.

Pid = self(),
ssl:setopts(Sock, [{active, true}, {mode, binary}]),
ssl:controlling_process(Sock, spawn(fun() -> loop(<<>>, Pid) end)).

This way data is parsed correctly, no matter what size of chunks data is returned from the socket.