CouchDB cleanup script for purging old docs
May 31, 2012
CouchDB does not have a straightforward way to clean up old data. This is one simple way to delete entries by date, but it requires that
- Your documents have a date or timestamp property
- Each database has a view that returns documents keyed by that property
Prerequisites
- Node.js
- jss module, i.e. ‘npm install jss’
- curl
1. Prepare views for cleanup
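Define a view in each database that needs regular cleanup. Use something like the following, where the emitted key is the timestamp in seconds (Unix epoch) and the emitted value is the document's current revision, which the cleanup script needs in order to delete the document.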
views: {
    created: {
        map: function(doc) {
            if (doc.created) {
                emit(doc.created, doc._rev);
            }
        }
    }
    ....
}
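The view above is written as a JavaScript object, but CouchDB itself stores design documents as JSON, with the map function as a string. A minimal sketch of uploading such a design document with curl, assuming a CouchDB on port 5984; `design_doc_json` and `upload_design_doc` are hypothetical helper names, and the database and design names are whatever you later pass to the cleanup script.

```shell
# Hypothetical helper: print a design document whose "created" view emits
# the document's created timestamp as key and its current _rev as value.
design_doc_json() {
  cat <<'EOF'
{
  "views": {
    "created": {
      "map": "function(doc) { if (doc.created) { emit(doc.created, doc._rev); } }"
    }
  }
}
EOF
}

# Hypothetical helper: PUT the design document into a database.
# Usage: upload_design_doc DBHOST DATABASE DESIGN
# A PUT to an existing design document is rejected with a conflict
# unless the body also includes that document's current _rev.
upload_design_doc() {
  design_doc_json | curl --silent -S -X PUT \
    -H "Content-Type: application/json" --data-binary @- \
    "http://$1:5984/$2/_design/$3"
}
```

For example, `upload_design_doc localhost somedb1 someview` would create `_design/someview` in `somedb1`.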
2. The cleanup script
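The script queries the IDs of old documents from the cleanup view and marks them as deleted. Documents are not removed immediately: their contents are physically removed at the next CouchDB compaction, although CouchDB keeps a small deletion record (tombstone) so that the deletion can replicate. CouchDB 1.2.0 supports automatic compaction, so just enable it and don’t worry about it.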
#!/bin/bash
DBHOST=localhost
# Cutoff for entries older than 24 weeks (roughly six months). This assumes
# that the "created" view can be queried using epoch timestamps as keys.
if uname -s | grep -i darwin > /dev/null
then
    # BSD date (macOS)
    TODAY=$(date '+%Y-%m-%d')
    MONTHSAGO=$(date -v -24w '+%Y-%m-%d')
    MONTHSAGO_E=$(date -v -24w '+%s')
else
    # GNU date (Linux)
    TODAY=$(date '+%Y-%m-%d')
    MONTHSAGO=$(date -d '24 weeks ago' '+%Y-%m-%d')
    MONTHSAGO_E=$(date -d '24 weeks ago' '+%s')
fi
PATH=$PATH:/usr/local/bin
# JSON scripting tool
JSS=$(npm bin)/jss
cleanup() {
    DATABASE=$1
    DESIGN=$2
    echo "Cleaning $DATABASE/$DESIGN"
    # Fetch expired rows, turn them into a bulk delete request, post it,
    # split the response into one object per line and print the failures.
    # (The \n in the sed replacement requires GNU sed.)
    curl --silent -S "http://$DBHOST:5984/$DATABASE/_design/$DESIGN/_view/created?endkey=$MONTHSAGO_E" | \
        $JSS --bulk_docs '$.id' '{_id: $.id, _rev:$.value, _deleted:true}' | \
        curl --silent -S -X POST -d @- -H "Content-Type: application/json" "http://$DBHOST:5984/$DATABASE/_bulk_docs" | \
        sed 's/\({[^}]*}\),/\1\n/g' | tr -d '[]' | \
        $JSS '$.ok != true'
}
echo "STATS CLEANUP <= $MONTHSAGO - Start" `date`
# Put databases and design documents here; each design document
# must contain a "created" view like the one above.
cleanup somedb1 someview
cleanup somedb1 otherview
cleanup somedb2 alsoview
echo "STATS CLEANUP - Done" `date`
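The `uname` branch exists because BSD `date` (macOS) and GNU `date` (Linux) use different options for date arithmetic. A hedged alternative, not what the script above does: compute the cutoff with shell arithmetic on the current epoch time, which only needs `date '+%s'` (supported by both GNU and BSD `date`). `cutoff_epoch` is a hypothetical name. It counts a week as exactly 7 × 86400 seconds, so across a daylight-saving change it can differ by an hour from `date`'s calendar arithmetic, which rarely matters for a cleanup cutoff.

```shell
# Hypothetical helper: print the Unix timestamp (seconds) of the moment
# WEEKS weeks before now, using plain shell arithmetic.
# Usage: cutoff_epoch WEEKS
cutoff_epoch() {
  now=$(date '+%s')
  echo $(( now - $1 * 7 * 24 * 60 * 60 ))
}

# Same 24-week window as the script above.
MONTHSAGO_E=$(cutoff_epoch 24)
```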
The script does the following:
- Get expired docs, e.g. curl ‘http://localhost:5984/mydatabase/_design/mydesign/_view/created?endkey=1337049581’
- Build a bulk delete request (jss)
- Issue the bulk delete request (curl POST)
- Sanitize the CouchDB output, i.e. put each result object on its own line and remove the array brackets (sed, tr)
- Print the entries that failed (jss)
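To make the bulk request concrete: `_bulk_docs` takes a body of the form `{"docs": [...]}`, and a deletion is a document carrying its `_id`, its current `_rev` and `"_deleted": true`, which is what the jss template in the script produces from each view row. The sketch below is a hedged alternative to jss using awk, assuming one `id rev` pair per line on input and IDs without whitespace, quotes or backslashes (real IDs may contain these, which is a good reason to use a JSON-aware tool). `bulk_delete_body` is a hypothetical name.

```shell
# Hypothetical helper: read "id rev" pairs (one per line) on stdin and
# print a _bulk_docs request body that marks each document as deleted.
# Assumes IDs and revs contain no whitespace, quotes or backslashes.
bulk_delete_body() {
  awk '
    BEGIN { printf "{\"docs\":[" }
    NF >= 2 {
      if (n++) printf ","
      printf "{\"_id\":\"%s\",\"_rev\":\"%s\",\"_deleted\":true}", $1, $2
    }
    END { print "]}" }
  '
}
```

For example, `printf 'doc1 1-abc\n' | bulk_delete_body` prints `{"docs":[{"_id":"doc1","_rev":"1-abc","_deleted":true}]}`, and empty input yields `{"docs":[]}`.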
Note that the default version of jss doesn’t output proper JSON if no documents are found. Use my fork to work around this problem if you don’t want to see errors in the logs:
npm install https://github.com/tikonen/jss/tarball/master
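Because deleted documents only leave the disk when the database is compacted, you may want to trigger compaction by hand if automatic compaction is not enabled. A minimal sketch, assuming the same CouchDB host and port as the script; `compact_url` and `compact_db` are hypothetical helper names. CouchDB rejects `_compact` requests that lack a `Content-Type: application/json` header.

```shell
# Hypothetical helper: build the compaction URL for a database.
# Usage: compact_url HOST PORT DATABASE
compact_url() {
  printf 'http://%s:%s/%s/_compact\n' "$1" "$2" "$3"
}

# Hypothetical helper: ask CouchDB to compact a database. Compaction
# runs in the background; the response only confirms it was started.
# Usage: compact_db HOST PORT DATABASE
compact_db() {
  curl --silent -S -X POST -H "Content-Type: application/json" \
    "$(compact_url "$1" "$2" "$3")"
}
```

For example, `compact_db localhost 5984 somedb1` could run right after the cleanup calls in the script.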