19 Mar 2013

At this time of writing, Posterous is an elegant, easy way to keep a blog. In 2011, I set up a simple blog to post articles I came across that had some angle on the interconnectedness of energy systems. I loved how I could simply send an email to <blog-name>@posterous.com and the article was posted.

However, Twitter bought Posterous as a modern way to hire their staff and then decided to shut Posterous down. All content will die end of April 2013. This is another hint that if you make something that lives online and there is a possibility you want to keep it, own the domain and be aware that you or someone you can ask for help needs to be able to curate the content over time (I started that blog just posting links, but then also wrote a summary or an opinion here or there - sometimes you don't know you'll want to keep something around for longer when you just started it).

Anyway - so I had to export the content from Posterous and import it into the MySQL database which currently underlines this website. Widely-used blogging software like Wordpress offers an importing tool, but everyone else is probably thinking how to get their Posterous data into their web publishing software.

Posterous offers an export in XML form, but it could have made it easier to deal with it. For instance,  it is hardly parsable XML, and the creation date is in a format used in emails. It took me a bit to get a simple Python script working, and I thought I'd share it here for anyone who needs to do something similar:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#!/usr/bin/python

from datetime import datetime
import re
import os
from bs4 import BeautifulSoup
from email.utils import parsedate_tz


def san(unicode_str):
    '''
    for sanitizing strings:
     - getting rid of all non-utf8 things
     - escape single quotations
    '''
    s = unicode_str.encode('utf-8', 'ignore')
    s = s.replace("\\'", "'") # first unescape escaped ones
    s = re.sub('''(['"])''', r'\\\1', s) # now we escape all
    return s


psql = open('posterous-import.sql', 'w')

xmls = [xf for xf in os.listdir('posts') if xf.endswith('.xml')]
for xf in xmls[:-2]:
    xfp = open('posts/{}'.format(xf), 'r')
    soup = BeautifulSoup(xfp.read())
    
    title = soup.find('title').text
    tt = parsedate_tz(soup.find('pubdate').text)
    tdt = datetime(tt[0], tt[1], tt[2], tt[3], tt[4], tt[5])
    sql_date = tdt.strftime('%Y-%m-%d %H-%M-%S')
    content = soup.find('content:encoded').text
    
    psql.write("INSERT INTO my_table \
                (title, content, inputdate) VALUES \
                ('{t}', '{c}', '{ed}');\n\n"\
               .format(t=san(title), c=san(content),
                ed=sql_date))

psql.close()

Well, it seemed to work for the contents of my Posterous blog, at least. I only cared about title, content and creation date of each post. 

Place the script above (download) into the folder you get from exporting your Posterous data (mine is called "space-3179959-energy-systems-ticker-0e4cc18feed3e8982ddae1ef4537b529") under the name of import.py (or whatever you like) and then call it via

python import.py

Then, you should find a file called posterous-import.sql which you can use to populate your Posterous posts into your own (MySQL) database. I used BeautifulSoup to be able to parse the XML, so you'll have to install that python library first, e.g. by

pip install beautifulsoup4

Also, you probably first would edit the INSERT statement in line 35 to match your specific table structure which you're importing to.

# lastedited 19 Mar 2013
You are seeing a selection of all entries on this page. See all there are.