Fixing broken MT links after server move/database crash

This entry is for anyone who's tried to move their MovableType installation or had their Berkeley DB crash and discovered that once they resurrect their blog, all of the URLs for their entries have changed. MovableType 3.0 fixes this problem by changing how URLs for entries are created, but there are plenty of people out there who still feel the pain (MovableType 2.x users may wish to inoculate themselves).

meta's database crashed a couple of weeks ago, which meant that a lot of her older entries no longer displayed properly and comments on those entries stopped working, so I wrote a quick script that is similar to the inoculation technique above. This script can be used to 'fix' broken links after a Berkeley DB crash, or it can be used to move your entries from one server/host to another. It requires that you still have the old monthly archive pages from before the crash/move, and once it's done it creates a single file that you have to upload to your Web server. That's it.

I've included the source code of this program in the full entry, but I don't expect people who find this entry to know how to pull apart the code; if they did, they would probably be able to write it themselves. So feel free to leave me a comment if you find yourself in a situation similar to the one described here and I'll tailor it to your needs.

I mostly did this because it took less than 30 minutes, and I really need to practice my Python.

WARNING: I don't expect anyone to read anything after this point. The description and instructions are mostly for myself so I remember what this does, though I wrote enough that the particularly adventurous might be able to get it to work as well.

Description: compares your old monthly archive pages to your new monthly archive pages and generates an Apache .htaccess file that will seamlessly redirect requests from your old entry pages to your new pages.
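
To make the idea concrete, here is a minimal sketch of just the pairing step, written for Python 3. The function name build_redirects and the sample paths are made up for illustration; it assumes the old and new archive pages list the same entries in the same order, which is what the full script further down relies on as well.

```python
def build_redirects(old_paths, new_urls):
    """Pair old entry paths with new entry URLs, in page order.

    Hypothetical helper for illustration: assumes both lists come from the
    same month and list the entries in the same order.
    """
    if len(old_paths) != len(new_urls):
        raise ValueError("old and new archives list different numbers of entries")
    return ["Redirect 301 %s %s" % (old, new)
            for old, new in zip(old_paths, new_urls)]


# Made-up sample data, not taken from a real blog
old_paths = ["/blog/oldstuff/000948.html", "/blog/oldstuff/000949.html"]
new_urls = [
    "http://www.example.com/blog/archives/2004/07/first_post.html",
    "http://www.example.com/blog/archives/2004/07/second_post.html",
]

for line in build_redirects(old_paths, new_urls):
    print(line)
```

Each printed line is one Apache Redirect directive: a permanent (301) redirect from the old server-relative path to the full new URL.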

Instructions (requires Python):

Prepping your MovableType installation

  1. After re-importing your MT entries, make sure that you do not do a rebuild. You will need your old monthly archive pages and we don't want to overwrite them.
  2. If you are not running MovableType 3.0, follow Step 1 of the future-proofing MT tutorial, titled "Fix the Archive File Templates." If you are still running MovableType 2.x, you may want to consider upgrading to MovableType 3.0 at this time anyway.
  3. Now rebuild your blog.

Running the script

  1. Save the code below as mtmonthly.py.
  2. Edit url_base, archive_old_base_short, archive_old_base, and archive_new_base to match your site. The years are hardcoded in generate_htaccess (2003 and 2004), so change those too if your archives cover other years.
  3. Run python -i mtmonthly.py
  4. At the Python prompt, type generate_htaccess(".htaccess")
  5. After it's done, upload the .htaccess file to your old archives directory.
  6. Try loading one of your old entry pages -- it should automatically be redirected to your new entry page.
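
Before uploading, it can be worth sanity-checking the generated file. The sketch below (Python 3) parses Redirect lines and reports any old path that shows up more than once, which would suggest something went wrong while matching entries. The helper name check_redirects is hypothetical, and it only assumes the "Redirect 301 OLD NEW" line format that the script writes.

```python
from collections import Counter


def check_redirects(lines):
    """Return (pairs, duplicates) for lines of the form 'Redirect 301 OLD NEW'.

    Hypothetical helper: lines that do not match that shape are ignored.
    """
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[0] == "Redirect" and parts[1] == "301":
            pairs.append((parts[2], parts[3]))
    counts = Counter(old for old, _ in pairs)
    duplicates = sorted(old for old, n in counts.items() if n > 1)
    return pairs, duplicates


# Made-up sample lines; in practice, read them from the generated .htaccess
sample = [
    "Redirect 301 /blog/oldstuff/000948.html http://www.example.com/blog/archives/2004/07/a.html",
    "Redirect 301 /blog/oldstuff/000949.html http://www.example.com/blog/archives/2004/07/b.html",
    "Redirect 301 /blog/oldstuff/000948.html http://www.example.com/blog/archives/2004/07/c.html",
]
pairs, duplicates = check_redirects(sample)
print(len(pairs), "redirects,", len(duplicates), "duplicated old paths")
```

To check a real file, pass open(".htaccess").read().splitlines() instead of the sample list.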

Python code:

import urllib
import re
import sys

#your web server
url_base = "http://www.metamanda.com/"

#location of the archive for the old monthly pages. omit the server name
archive_old_base_short = "blog/oldstuff/"

#location of the archive for the old monthly pages
archive_old_base = "http://www.metamanda.com/blog/archives/"

#location of the archive for the new monthly pages
archive_new_base = "http://www.metamanda.com/blog/archives/"

example1 = 'Posted by metamanda at <a href="http://www.metamanda.com/blog/oldstuff/000948.html">10:51 PM</a>'

re_posted = re.compile('Posted by (.*?) at <a href="'+url_base+'([^>]*?).html">(.*?)</a>')

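def get_monthly(url):
    """get an array representing the relative URLs of all the posts for
    the specified month"""
    #urlopen lives in urllib.request on Python 3 and in urllib on Python 2
    try:
        from urllib.request import urlopen
    except ImportError:
        urlopen = urllib.urlopen
    try:
        f = urlopen(url)
    except IOError:
        #Python 3 raises on a missing page; treat it as a month with no posts
        return []
    post_urls = []
    buff = f.readline()
    while buff:
        if not isinstance(buff, str):
            #Python 3 reads bytes, but re_posted is a text pattern
            buff = buff.decode("utf-8", "replace")
        group = re_posted.search(buff)
        if group:
            post_urls.append("/"+group.groups()[1]+".html")
        buff = f.readline()
    return post_urls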

def get_month_year(month, year):
    """month is an int (1..12) as is year (2004)"""

    #fetch the old array
    if month < 10:
        str_month = '0' + str(month)
    else:
        str_month = str(month)
    old_archive_url_short = archive_old_base_short + str(year) + '_' + str_month + ".html"
    old_archive_url = archive_old_base + str(year) + '_' + str_month + ".html"
    print(old_archive_url)
    old_post_urls = get_monthly(old_archive_url)
    if not old_post_urls:
        print("No archives for %s/%s"%(month, year))
        return []

    #fetch the new array
    new_archive_url = archive_new_base + str(year) + '/' + str_month + '/index.html'
    print(new_archive_url)
    new_post_urls = get_monthly(new_archive_url)

    #compare the counts by value; "is not" tests object identity, not equality
    if len(old_post_urls) != len(new_post_urls):
        print("ERROR: monthly archive for %s/%s does not contain matching set of entries"%(month, year))
        return []

    redir_lines = []
    for (old, new) in zip(old_post_urls, new_post_urls):
        redir_lines.append("Redirect 301 %s %s"%(old, url_base+new[1:]))
    redir_lines.append("Redirect 301 /%s %s"%(old_archive_url_short, new_archive_url))
    return redir_lines

def get_year(year):
    redir_lines = []
    for month in range(1, 13):
        redir_lines.extend(get_month_year(month, year))
    return redir_lines

def generate_htaccess(output_file):
    redir_lines = get_year(2003)
    redir_lines.extend(get_year(2004))
    output = open(output_file, "w")
    for line in redir_lines:
        output.write(line+'\n')
    output.close()
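
The example1 string near the top of the script is never used by the code; it appears to be there as a sample of the byline that re_posted is meant to match. As a quick check of the pattern, this standalone Python 3 sketch runs the same kind of regular expression against that sample. It assumes the archive page source contains a literal <a href="..."> tag, and the re.escape call and the escaped dot before "html" are my additions, not part of the script.

```python
import re

url_base = "http://www.metamanda.com/"

# Same shape as re_posted in the script; re.escape and the escaped dot before
# "html" are additions for this sketch, not part of the original pattern.
re_posted = re.compile(
    'Posted by (.*?) at <a href="' + re.escape(url_base) + r'([^>]*?)\.html">(.*?)</a>'
)

example1 = ('Posted by metamanda at '
            '<a href="http://www.metamanda.com/blog/oldstuff/000948.html">10:51 PM</a>')

match = re_posted.search(example1)
author, path, time_posted = match.groups()

# get_monthly keeps only the second group and rebuilds a server-relative path
relative_url = "/" + path + ".html"
print(author, relative_url, time_posted)
```

If your templates word the byline differently (a different "Posted by" phrase, or a link that is not on one line), the pattern will find nothing and get_monthly will return an empty list, so this is the first thing to adapt.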

This page contains a single entry from kwc blog posted on August 1, 2004 12:23 AM.
