CouchDB cleanup script for purging old docs
May 31, 2012
CouchDB does not have a straightforward way to clean up old data. This is one simple way to delete entries by date, but it requires that
- Your documents have a date or timestamp property
- There is a view in each database that fetches documents by that property
Prerequisites
- Node.js
- jss module, i.e. 'npm install jss'
1. Prepare views for Cleanup
Define a view in each database that needs regular cleanup. Use something like this, where the emitted key is a timestamp in seconds and the emitted value is the document revision.
views: {
    created: {
        map: function(doc) {
            if (doc.created) {
                emit(doc.created, doc._rev);
            }
        }
    }
    ....
}
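To see what such a view produces, the map function can be run outside CouchDB against a few sample documents. This is only a sketch: the `runMap` helper, the explicit `emit` parameter and the sample documents are made up for illustration. CouchDB itself supplies `emit` and sorts rows by key when it builds the view index.

```javascript
// Hypothetical stand-in for CouchDB's view engine: runs a map function
// over an array of documents and returns the emitted rows sorted by key.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB provides emit() as a global; here it is passed in explicitly.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key);
}

// Same logic as the "created" view above, with emit as a parameter.
function createdMap(doc, emit) {
  if (doc.created) {
    emit(doc.created, doc._rev);
  }
}

// Made-up sample documents; "created" is a Unix timestamp in seconds.
const sampleDocs = [
  { _id: "b", _rev: "1-bbb", created: 1338000000 },
  { _id: "a", _rev: "2-aaa", created: 1322000000 },
  { _id: "c", _rev: "1-ccc" } // no timestamp, so the view skips it
];

const rows = runMap(createdMap, sampleDocs);
console.log(rows);
```

Each row carries the document id, the timestamp as key and the revision as value, which is everything a later delete request needs.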
2. The Cleanup script
The script queries the IDs of old documents from the cleanup view and marks them as deleted. The documents are not removed immediately; they are physically removed at the next CouchDB compaction. CouchDB 1.2.0 supports automatic compaction, so just enable it and don't worry about it.
#!/bin/bash

DBHOST=localhost

# Get key for entries that are over about 6 months (24 weeks) old.
# This assumes that the created view can be queried using timestamps as keys.
TODAY=$(date '+%Y-%m-%d')
if uname -a | grep -i darwin > /dev/null
then
    # BSD date (OS X)
    MONTHSAGO=$(date -v -24w '+%Y-%m-%d')
    MONTHSAGO_E=$(date -v -24w '+%s')
else
    # GNU date
    MONTHSAGO=$(date -d '24 weeks ago' '+%Y-%m-%d')
    MONTHSAGO_E=$(date -d '24 weeks ago' '+%s')
fi

PATH=$PATH:/usr/local/bin

# JSON scripting tool
JSS=$(npm bin)/jss

cleanup() {
    DATABASE=$1
    DESIGN=$2

    echo "Cleaning $DATABASE/$DESIGN"
    # Fetch expired rows, build the bulk delete request, post it to the same
    # host the rows came from, then print only the entries that failed.
    curl --silent -S "http://$DBHOST:5984/$DATABASE/_design/$DESIGN/_view/created?endkey=$MONTHSAGO_E" | \
    $JSS --bulk_docs '$.id' '{_id: $.id, _rev:$.value, _deleted:true}' | \
    curl --silent -S -X POST -d @- -H "Content-Type:application/json" "http://$DBHOST:5984/$DATABASE/_bulk_docs" | \
    sed 's/\({[^}]*}\),/\1\n/g' | tr -d '[]' | \
    $JSS '$.ok != true'
}

echo "STATS CLEANUP <= $MONTHSAGO - Start" `date`

# Put databases and design document names here
cleanup somedb1 someview
cleanup somedb1 otherview
cleanup somedb2 alsoview

echo "STATS CLEANUP - Done" `date`
The script does the following:
- Get expired docs, e.g. curl 'http://localhost:5984/mydatabase/_design/mydesign/_view/created?endkey=1337049581'
- Build the bulk delete request (jss)
- Issue the bulk delete request (curl POST)
- Sanitize the CouchDB output, i.e. add newlines and remove brackets (sed, tr)
- Print the entries that failed (jss)
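The data handling in these steps can be sketched in plain JavaScript, without jss or a running CouchDB. The function names and the sample responses below are hypothetical. The sketch assumes that the view emits the document revision as the row value and that successful `_bulk_docs` entries carry `ok: true`, which is what the script's filter relies on.

```javascript
// Cutoff key: Unix timestamp (seconds) for 24 weeks before the given time.
function cutoffSeconds(nowMs) {
  const WEEK_SECONDS = 7 * 24 * 60 * 60;
  return Math.floor(nowMs / 1000) - 24 * WEEK_SECONDS;
}

// Turn a view result into a _bulk_docs body that marks each doc as deleted.
// Assumes the view emits the document revision as the row value.
function buildBulkDelete(viewResult) {
  return {
    docs: viewResult.rows.map((row) => ({
      _id: row.id,
      _rev: row.value,
      _deleted: true
    }))
  };
}

// Keep only the entries of a _bulk_docs response that did not succeed.
function failedEntries(bulkResponse) {
  return bulkResponse.filter((entry) => entry.ok !== true);
}

// Made-up view result and bulk response for illustration.
const viewResult = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "a", key: 1322000000, value: "2-aaa" },
    { id: "b", key: 1322000500, value: "1-bbb" }
  ]
};
const bulkResponse = [
  { ok: true, id: "a", rev: "3-ccc" },
  { id: "b", error: "conflict", reason: "Document update conflict." }
];

console.log(cutoffSeconds(Date.now()));
console.log(JSON.stringify(buildBulkDelete(viewResult)));
console.log(failedEntries(bulkResponse));
```

An empty view result yields `{"docs":[]}`, which is still valid JSON to post, so a run with nothing to delete does not need special handling in this sketch.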
Note that the default version of jss doesn't output proper JSON if no documents are found. Use my fork to work around this problem if you don't want to see errors in the logs.
npm install https://github.com/tikonen/jss/tarball/master