Read my inbox..

October 25, 2016| Mike Stott

What? What’s this crazy title..  read my inbox? Isn’t that me giving away all of my messages?

OK… so maybe not read my inbox. More like analyse my inbox and you can read the outcome. Why am I doing this?

Check out how many messages I have…  16,764..  that’s quite a lot. I’m sure it’s not that many by most standards, but the size of my inbox (well over 1GB when including attachments) was starting to scare me.

Oh. That’s just my inbox for mike@epicplugins.comby the way. What I if lost access. How many customer emails have I missed storing in my CRM through only starting recently with Zero BS CRM.

These are all questions which I set out to answer. How many times were these people emailing me. Was I missing some customer value metric (contact points) which I could add to my customer records.

Email Address Book

This is a simple way of getting at the contact emails from your inbox. But for that you actually need to have been adding people to your address book. Scratch that..

Hidden data in my inbox

screen-shot-2016-10-24-at-17-42-24

I’m sat in the airport waiting for my connecting flight so I wanted to do something quirky. Something which is fun for me. Something which might be useful for you to do too.

Do you know how many emails you get to your inbox each day? Do you know which senders send you the most? How about what time of the day do your emails come through? All this can be gleaned from analysing your inbox. What else can you find out?

  • Can you cut down the amount of email hitting your inbox through the use of automatic filters
  • Are you getting notifications (such as a new user notification) clogging up your email

I include the script I used below if you want to do this yourself…

I didn’t know I could export my full inbox

Whoops. My bad. FacePalm. I’ve always used RoundCube for my business email webmail. I’m also pretty bad at filtering or moving things out of my inbox.

Using Horde I can export my full inbox. Wicked. It downloads in a massive data file (it downloads all the attachment data too).

So what do I naturally do with a lot of data. I wanna crunch some numbers on it. But like any problem of this kind, exporting a big inbox. Here’s some problems I started to run into

  1. 16,000+ emails is a lot of data, especially when you include attachments
  2. Attachments can be pretty big (my largest email was 30MB from peopleperhour – they attached a ton of videos)

Combine those two and the size of the resulting MBOX file was pretty scary and started to break things like trying to download the full inbox. Ouch. OK thinking outside the box time. Literally outside the inbox.. Once I had the data out I could run my script on it and start analysing the data. But getting the data out was proving difficult (this is over not so great WiFi).

Some email sorting is required first…

To get around the problems above I did a bit of email sorting starting with creating a new Folder for emails with attachments. Roundcube let me filter emails ‘with attachments’ then I moved these into an ‘Attachments’ folder (and out of my inbox). Then heading into the Roundcube ‘Settings’ you can see how big your resulting folder is.

attachment-size

But even after this step I had a TON of large emails in my inbox (still c850MB). So onto another look at options to manage the inbox and get the data OUT to run some analysis. Help.. OK… website backup. That’ll have the info I need..

Website Backup

A good place to access your email backup is within your website backup via cpanel. This gives you all your data including all your emails (in the homedir->mail->[domain]->mike->cur). This gave me all my messages in a /dir/ which I then needed a script to run through.

I decided to put the basic information into a simple DB so I can then run various date based queries. I didn’t store any subject lines (but it would be relatively easy to do so) if I wanted to analyse keywords, or anything else in my email history.

The Script

OK, so first thing first is to create the DB table, run this simple SQL

CREATE TABLE `messages` 
( `id` int(10) unsigned NOT NULL, 
`email` varchar(255) COLLATE latin1_general_ci NOT NULL, 
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) 
ENGINE=InnoDB AUTO_INCREMENT=72576 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

Next is to loop through the messages in the folder (I moved these to my localhost)

ini_set('max_execution_time', 300);  //one off... in case of a lot of emails..
$messages = array();
$files = scandir('maildir');
$i=0;
foreach($files as $file){
	if($file != '.' && $file != '..'){
		$handle = fopen("maildir/" . $file , "r");
		if ($handle) {
		    while (($buffer = fgets($handle)) !== false) {
				if (strpos($buffer,'Delivery-date: ')  !== false){
					$messages[$i]['date'] = substr($buffer, 15);
					$timestamp = strtotime($messages[$i]['date']);
				}
				if (strpos($buffer,'Return-path: ')  !== false){
					$em = substr($buffer, 13);
					$em = extract_email_address($buffer);
					$messages[$i]['email'] = $em[0];  //should only be one email.
				}
		    }
		    mySQL_add_record($messages[$i]['email'], $timestamp);
		    if (!feof($handle)) {
		        echo "Error: unexpected fgets() fail\n";
		    }
		    fclose($handle);
		}
	}
	$i++;
}

The function is hopefully straightforward. It scans the folder where your individual email messages are (I called this maildir) it then analyses each file to get the date it was sent and the email (from the Return-path). To do this it extracts the email it finds on that line using the following function

function extract_email_address ($string) {
    foreach(preg_split('/\s/', $string) as $token) {
        $email = filter_var(filter_var($token, FILTER_SANITIZE_EMAIL), FILTER_VALIDATE_EMAIL);
        if ($email !== false) {
            $emails[] = $email;
        }
    }
    return $emails;
}

I then do a simple mySQL store (with mySQL_add_record) which is below…  note I’ve been a bit naughty here and put the mySQL connect in with the code to add the new record. Plus I’m in control of the data so I don’t need to do any validation before inserting into my DB.

function mySQL_add_record($email, $timestamp){
	/**
	  * Connect to the database
	  * http://php.net/manual/en/mysqli.real-connect.php
	*/
	$user = '***';
	$pass = '***';
	$db   = '[db_name]';    //add your database name here...
	$host = 'localhost';
	$port = 3306;
	$link = mysqli_init();
	if (!$link) {
	    die('mysqli_init failed');
	}

	if (!$link->options(MYSQLI_INIT_COMMAND, 'SET AUTOCOMMIT = 0')) {
	    die('Setting MYSQLI_INIT_COMMAND failed');
	}

	if (!$link->options(MYSQLI_OPT_CONNECT_TIMEOUT, 5)) {
	    die('Setting MYSQLI_OPT_CONNECT_TIMEOUT failed');
	}

	if (!$link->real_connect(
	  $host,
	  $user,
	  $pass,
	  $db
	)) {
	    die('Connect Error (' . mysqli_connect_errno() . ') ' . mysqli_connect_error());
	}

	// Quick test
	// echo 'Success... ' . $link->host_info . "\n";

	$connection = $link->real_connect(
	  $host,
	  $user,
	  $pass,
	  $db
	);

	/**
	  * More testing
	*/
	// echo '$link: '; print_r($link); echo "\n";
	// echo '$connection: '; print($connection); echo "\n";
	// echo '$connection _r: '; print_r($connection); echo "\n";

	if (!$connection) {
	  die('Database connection failed.');
	}

	/**
	  * Insert data into table
	*/

	$d = date('Y-m-d H:i:s',$timestamp);

	$query  = "INSERT INTO messages (email, timestamp)";
	$query .= "VALUES ('$email', '$d')";

	echo "<pre>";
	echo $query;
	echo "</pre>";

	// $result = mysqli_query($connection, $query);
	$result = mysqli_query($link, $query);

	if (!$result) {
	  die('Error: ' . mysqli_error($link));
	}

	/**
	  * Close connection
	*/
	$link->close();
}

Finally, this is all running on my localhost, so none of my emails are spinning around on public servers.

The results..

So with less than a day of code and analysis I’ve been able to see roughly how many emails I get a month. In October I put my out of office on and directed people to the support inbox. This cut down my personal emails quite drastically. I also did less “work” and emails since I was on holiday.

screen-shot-2016-10-25-at-16-34-11

There’s more insights I can glean from this analysis that I’ll tweak and update the analysis in a few months…

  • I get a number of ‘you have a new user at epic plugins’ from ‘wordpress@epicplugins.com’ these clog up my inbox
  • Bounces from my convertkit sends hit this inbox (i.e. I get away from office emails a lot)
  • PayPal sends me a lot – I’ll filter these out into a separate folder

So. A useful exercise. If I can cut the number of emails down to under a 100 a month from almost 6x that it’ll really help my email management.

Some caveats

OK, the maildir extracting process from the host takes a fair bit of time. In future I’ll just do a targetted MBOX download from horde and analyse that. I don’t have too many years of data in my mailbox. If you’ve been in the game for years this might be painful to analyse using the maildir method.

But… give it a go. Let me know how you get on in the comments.

 

Categories: Epic Plugins

Leave a Reply

Your email address will not be published. Required fields are marked *

Rating*