The best of a bad lot

imap2maildir: a tool for mirroring IMAP to maildir

07/04/09 09:25, by admin, Categories: Geekery, Howto

Link: http://github.com/rtucker/imap2maildir/

For a while now, I've had that paranoia kicking in about my online data. Almost all of us have a lot of useful information out there that is entirely under someone else's control: if someone messes up, or the wrong component fails, or the wrong business process fails, or a company goes out of business with your data as their asset, poof! It's gone. The only way to ensure that your data has a chance to survive is by properly backing it up.

I am a heavy user of Gmail, Google's gift to those who love e-mail. It's got a solid web interface, spam filtering that lets you forget spam exists, and (best of all) IMAP support. It's also got a huge-ass quota, so you can keep your mail around forever. This has resulted in me accumulating over 80,000 e-mails so far. That's a lot. In the unlikely event Gmail loses my mailbox, I'd be right miffed. However, I'd at least have my mail, thanks to... imap2maildir!

Faced with a lack of any quick and reliable way to do exactly what I wanted to do, I wrote a quick script to incrementally back up any IMAP mailbox (defaulting to Gmail) to a local maildir store. Maildir keeps each message in its own file, and is readable by pretty much any IMAP server one might need to deploy in a hurry.

This script kind of sucks, but that's what open source is all about, right? It's got a few limitations and is lacking some important stuff, and there's no companion maildir2imap tool yet, but I suppose if Gmail does eat my mailbox, I'll be motivated to create one.

So, give it a spin and let me know what you think. It will definitely work fine under Linux, and in theory you should be able to use it on Windows too (assuming the maildir filenames aren't eaten by NTFS). You can download it from http://github.com/rtucker/imap2maildir/, or you can clone the repository with git clone git://github.com/rtucker/imap2maildir.git.
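To give a feel for what "incrementally back up an IMAP mailbox to a maildir" means in practice, here is a minimal sketch using Python's standard imaplib and mailbox modules. This is not the code from imap2maildir; the function name mirror_new_messages and the in-memory seen_uids set are made up for illustration, and a real tool would have to persist that set between runs and cope with the server changing its UIDVALIDITY.

```python
import imaplib   # only needed by the caller, to build the connection
import mailbox


def mirror_new_messages(conn, maildir, seen_uids):
    """Copy messages whose UID is not in seen_uids into maildir.

    conn is assumed to be an already logged-in imaplib.IMAP4 or
    IMAP4_SSL object with a mailbox selected. seen_uids is a
    hypothetical set of UIDs (bytes) from earlier runs; it is
    updated in place. Returns the UIDs fetched during this call.
    """
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed: %r" % (data,))

    fetched = []
    for uid in data[0].split():
        if uid in seen_uids:
            continue  # already mirrored on a previous run
        # Fetch one message per request so memory use stays bounded.
        # BODY.PEEK[] avoids setting the \Seen flag on the server.
        typ, msg_data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        if typ != "OK" or not msg_data or not isinstance(msg_data[0], tuple):
            continue  # message vanished or unexpected reply; skip it
        maildir.add(msg_data[0][1])  # raw message bytes
        seen_uids.add(uid)
        fetched.append(uid)
    return fetched
```

To try it, you would create the connection with imaplib.IMAP4_SSL("imap.gmail.com"), call login() and select("INBOX", readonly=True) on it, open the destination with mailbox.Maildir(path, create=True), and pass both in along with an empty set.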

8 comments

Comment from: jd [Visitor]
****-
Thanks for sharing this.

I was able to grab a few messages from Gmail, but eventually I get a MemoryError in Python's imaplib, on line 1150:

data = self.sslobj.read(size-read)

I understand this isn't your code but do you have any ideas on how to fix this? This is using Python 2.5 (on Windows, but that shouldn't matter).

Thanks...
08/07/09 @ 18:30
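One workaround often suggested for this kind of error is to never ask the SSL layer for the whole remaining payload at once, and to read it in capped chunks instead. The sketch below shows only the general technique on any read(n)-style callable; read_exactly and the 16 KB chunk size are illustrative assumptions, and this is not a patch to imaplib itself.

```python
import io


def read_exactly(read_fn, size, chunk_size=16384):
    """Read exactly `size` bytes, asking for at most chunk_size at a time.

    read_fn is any callable that takes a byte count and returns up to
    that many bytes (a socket or file object's read method, say).
    """
    chunks = []
    remaining = size
    while remaining > 0:
        data = read_fn(min(remaining, chunk_size))
        if not data:
            raise EOFError("stream ended with %d bytes still expected" % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


# Stand-in for a network stream: 100000 bytes held in memory.
stream = io.BytesIO(b"x" * 100000)
payload = read_exactly(stream.read, 100000)
print(len(payload))  # 100000
```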
Comment from: Ryan [Member]
I have noticed that on my end, too. I've been working on a number of major changes in a new branch, which should help with memory performance... not yet ready for prime time (and it kinda took the back seat to some other projects recently), but I think it will be better in the long run.

Glad to know it at least tries to work under Windows! :-)

The new branch is at: http://github.com/rtucker/imap2maildir/tree/newiterator
08/08/09 @ 12:16
Comment from: jd [Visitor]
Looking forward to your changes...

08/08/09 @ 23:03
Comment from: Ryan [Member]
OK, I've reworked a lot of the IMAP handling stuff in the newiterator branch, and am able to process my mailbox of great girth (92,470 messages) with no abnormal memory issues.

It is slower, alas. It took about 10 hours on my mailbox (!), so I've added some additional caching and verification of the local files (aka "turbo mode"). This has reduced the time on my mailbox down to a less-unreasonable ~3 hours.

I think I might be able to do better, but this is a huge mailbox, and that's why we have cron, right?

On a 192-message mailbox:

master branch: 5.0 seconds
newiterator branch, no turbo mode: 33.0 seconds
newiterator branch, turbo mode: 2.1 seconds

But more importantly: no MemoryError crash!

I'll probably merge newiterator into master early next week, unless something bursts into flame... -rt
08/14/09 @ 12:38
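The comment above does not say how "turbo mode" works beyond "caching and verification of the local files", so the following is only a guess at what such a cache could look like: a JSON file mapping IMAP UIDs to maildir keys, with any entry dropped when its local file has gone missing. The names load_verified_cache and save_cache, and the cache format, are hypothetical and not taken from the newiterator branch.

```python
import json
import mailbox
import os


def load_verified_cache(cache_path, maildir):
    """Return {uid: maildir_key}, keeping only entries still on disk.

    A UID whose message file has disappeared from the maildir is
    dropped, so the caller will fetch it again from the server.
    """
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path) as fh:
        cache = json.load(fh)
    return {uid: key for uid, key in cache.items() if key in maildir}


def save_cache(cache_path, cache):
    """Write the cache via a temp file so a crash cannot truncate it."""
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(cache, fh)
    os.replace(tmp_path, cache_path)
```

With something like this, a run only has to ask the server for UIDs that are missing from the verified cache, which would be consistent with the large speedup reported on a mailbox that is already mostly mirrored.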
Comment from: jd [Visitor]
Sent you a msg via gmail regarding your recent changes.
08/15/09 @ 13:43
Comment from: Miblon [Visitor]
*****
Excellent work and way to go! The best open source products evolved because people needed them, and by putting your code on github you created a good starting point!

I will give it a spin later next week or so. I intend to have it running on one of my own servers, which then backs up to another one of my servers :D. I am a bit of a python programmer myself, so if I come across something I think can help enhance your code, you will find it in github too.

Respect!

06/26/10 @ 22:01
Comment from: Ryan [Member]
Thanks! :-)

I read last night about Quickly in the latest Linux Journal... it's a framework for something the article referred to as "opportunistic programming." Putting a name on, and giving respect to, my style and motivation for programming was pretty awesome. (The framework looks pretty awesome, too.)
06/29/10 @ 13:47
Comment from: Buy Online [Visitor]
Unlike you, I do not have the same paranoia; I trust Google's ability to keep all my data safe. You have a great idea there, by the way, and it would work for those who have the same fears.
02/27/11 @ 12:33

    © 1962-2012 by Ryan Tucker (Public Key)