imap2maildir: a tool for mirroring IMAP to maildir
Link: http://github.com/rtucker/imap2maildir/
For a while now, I've had that paranoia kicking in about my online data. Almost all of us have a lot of useful information out there that is entirely under someone else's control: if someone messes up, or the wrong component fails, or the wrong business process fails, or a company goes out of business with your data as their asset, poof! It's gone. The only way to ensure that your data has a chance to survive is to back it up properly.

I am a heavy user of Gmail, Google's gift to those who love e-mail. It's got a solid web interface, spam filtering that lets you forget spam exists, and (best of all) IMAP support. It's also got a huge-ass quota, so you can keep your mail around forever. This has resulted in me accumulating over 80,000 e-mails so far. That's a lot. In the unlikely event Gmail loses my mailbox, I'd be right miffed. However, I'd at least have my mail, thanks to... imap2maildir!

Faced with a lack of any quick and reliable way to do exactly what I wanted, I wrote a quick script to incrementally back up any IMAP mailbox (defaulting to Gmail) to a local maildir store. Maildir stores each message in its own file, and it's readable by pretty much any IMAP server one might need to deploy in a hurry.

This script kind of sucks, but that's what open source is all about, right? It's got a few limitations and is lacking some important stuff, and there's no companion maildir2imap tool yet, but I suppose if Gmail does eat my mailbox, I'll be motivated to create one.

So, give it a spin and let me know what you think. It will definitely work fine under Linux, and in theory you should be able to use it with Windows too (assuming NTFS doesn't eat the maildir filenames, which normally contain colons). You can download it from http://github.com/rtucker/imap2maildir/, or you can clone the repository with git clone git://github.com/rtucker/imap2maildir.git.
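The job described above, incrementally copying an IMAP mailbox into a local maildir, can be sketched in Python 3 with the standard imaplib and mailbox modules. This is a minimal sketch under stated assumptions, not imap2maildir's actual code: the function names and the plain-text state file listing already-copied UIDs are hypothetical. It also ignores UIDVALIDITY, which a real tool would have to check, because IMAP UIDs are only stable while that value stays the same.

```python
import mailbox
import os

# Hypothetical helpers; a sketch of incremental IMAP -> maildir backup,
# not the code in the imap2maildir repository.


def parse_uids(search_data):
    """Turn imaplib's UID SEARCH data (e.g. [b'101 102 105']) into a set of ints."""
    return {int(u) for u in search_data[0].split()}


def new_uids(remote, seen):
    """UIDs present on the server but not yet backed up, oldest first."""
    return sorted(set(remote) - set(seen))


def load_seen(state_path):
    """Read UIDs saved by earlier runs (one per line); empty set if no state yet."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path) as f:
        return {int(line) for line in f if line.strip()}


def backup(imap, folder, maildir_path, state_path):
    """Copy messages not yet recorded in `state_path` into a local maildir.

    `imap` is an already logged-in imaplib.IMAP4 / IMAP4_SSL connection.
    Returns the number of messages copied on this run.
    """
    md = mailbox.Maildir(maildir_path, create=True)
    typ, _ = imap.select(folder, readonly=True)  # EXAMINE: \Seen flags untouched
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}")
    typ, data = imap.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError(f"UID SEARCH failed: {typ}")
    todo = new_uids(parse_uids(data), load_seen(state_path))

    copied = 0
    with open(state_path, "a") as state:
        for uid in todo:
            typ, msg = imap.uid("FETCH", str(uid), "(RFC822)")
            if typ != "OK" or not msg or not isinstance(msg[0], tuple):
                continue  # message vanished since the SEARCH
            md.add(msg[0][1])        # written to tmp/, then moved into new/
            state.write(f"{uid}\n")  # record the UID only after the file exists
            state.flush()
            copied += 1
    return copied
```

After logging in with something like imap = imaplib.IMAP4_SSL("imap.gmail.com") and imap.login(user, password) (credentials hypothetical), a call such as backup(imap, "INBOX", "Mail/gmail", "gmail.uids") run repeatedly from cron only downloads messages whose UIDs are not yet in the state file.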
Comment from: jd [Visitor]
Thanks for sharing this.
I was able to grab a few messages from Gmail, but eventually I get a MemoryError in Python's imaplib, on line 1150:
data = self.sslobj.read(size-read)
I understand this isn't your code, but do you have any ideas on how to fix this? This is using Python 2.5 (on Windows, but that shouldn't matter).
Thanks...
08/07/09 @ 18:30
I have noticed that on my end, too. I've been working on a number of major changes in a new branch, which should help with memory performance... not yet ready for prime time (and it kinda took a back seat to some other projects recently), but I think it will be better in the long run.
Glad to know it at least tries to work under Windows! :-)
The new branch is at: http://github.com/rtucker/imap2maildir/tree/newiterator
08/08/09 @ 12:16
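The MemoryError jd reports surfaces inside imaplib while it reads a server response. One general way to keep memory flat on a large mailbox, offered here as an assumption rather than a description of what the newiterator branch actually does, is to stream messages through a generator that issues one UID FETCH per message, so only a single message is held in memory at a time. The helper names below are hypothetical.

```python
import functools


def fetch_one(imap, uid):
    """Fetch a single message by UID; return its raw bytes, or None if it is gone."""
    typ, data = imap.uid("FETCH", str(uid), "(RFC822)")
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        return None  # e.g. deleted on the server between SEARCH and FETCH
    return data[0][1]  # data[0] is (b'<n> (RFC822 {size}', raw_message)


def iter_messages(fetch, uids):
    """Lazily yield (uid, raw) pairs; nothing is fetched until it is consumed."""
    for uid in uids:
        raw = fetch(uid)
        if raw is not None:
            yield uid, raw


def iter_mailbox(imap, uids):
    """Bind fetch_one to a logged-in imaplib connection."""
    return iter_messages(functools.partial(fetch_one, imap), uids)
```

The caller writes each message to disk and drops it before asking for the next one. The trade-off is one network round trip per message, which is slower than fetching many messages in a single command.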
OK, I've reworked a lot of the IMAP handling stuff in the newiterator branch, and am able to process my mailbox of great girth (92,470 messages) with no abnormal memory issues.
It is slower, alas. It took about 10 hours on my mailbox (!), so I've added some additional caching and verification of the local files (aka "turbo mode"). This has reduced the time on my mailbox down to a less-unreasonable ~3 hours.
I think I might be able to do better, but this is a huge mailbox, and that's why we have cron, right?
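The "turbo mode" above is described only as extra caching and verification of the local files. Below is a minimal sketch of one way such a cache could work, assuming a JSON file that maps each IMAP UID to the key mailbox.Maildir.add() returned for it. On each run, only UIDs that are uncached, or cached but missing from the local maildir, need a trip to the server. The file format and function names are assumptions, not imap2maildir's implementation.

```python
import json
import os


def load_cache(path):
    """Load the {uid: maildir_key} map written by a previous run, if any."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        # JSON object keys are always strings; turn them back into int UIDs.
        return {int(uid): key for uid, key in json.load(f).items()}


def save_cache(path, cache):
    """Write to a temp file, then swap it in, so a crash mid-write keeps the old cache."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({str(uid): key for uid, key in cache.items()}, f)
    os.replace(tmp, path)


def uids_to_fetch(remote_uids, cache, local_keys):
    """Server UIDs that are uncached, or cached but whose local file is gone.

    `local_keys` is the set of keys currently in the maildir, e.g.
    set(mailbox.Maildir(path).keys()); checking it is a local operation,
    unlike a round trip to the IMAP server.
    """
    return sorted(uid for uid in remote_uids if cache.get(uid) not in local_keys)
```

After fetching a message, the caller would record cache[uid] = md.add(raw) and call save_cache() at the end of the run.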
On a 192-message mailbox:
master branch: 5.0 seconds
newiterator branch, no turbo mode: 33.0 seconds
newiterator branch, turbo mode: 2.1 seconds
But more importantly: no MemoryError crash!
I'll probably merge newiterator into master early next week, unless something bursts into flame... -rt
08/14/09 @ 12:38
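The timings in the last comment are easier to compare per message. This short script only redoes that arithmetic on the reported figures (192 and 92,470 messages; 5.0, 33.0 and 2.1 seconds; roughly 10 and 3 hours). It adds no new measurements.

```python
def ms_per_message(seconds, messages):
    """Average wall-clock cost of one message, in milliseconds."""
    return 1000.0 * seconds / messages


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


SMALL = 192      # messages in the small test mailbox
LARGE = 92_470   # messages in the author's full mailbox

small_runs = {
    "master": 5.0,
    "newiterator, no turbo": 33.0,
    "newiterator, turbo": 2.1,
}

if __name__ == "__main__":
    for name, secs in small_runs.items():
        print(f"{name:22} {secs:5.1f} s  {ms_per_message(secs, SMALL):6.1f} ms/message")
    # The full-mailbox durations were given only approximately (~10 h, ~3 h).
    for name, hours in (("no turbo", 10), ("turbo", 3)):
        print(f"full mailbox, {name:8} ~{ms_per_message(hours * 3600, LARGE):4.0f} ms/message")
    print(f"turbo speedup: {speedup(33.0, 2.1):.1f}x small, {speedup(10, 3):.1f}x full")
```

Worked through, turbo mode cuts the per-message cost from about 172 ms to 11 ms on the small mailbox (about 15.7x), and from about 389 ms to 117 ms on the full one (about 3.3x).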