Impenetrable title, no? The occasion is my completion of a 40-hour intensive Introduction to Hindi course. The title is the romanized rendition of a statement I really wanted to learn, thinking I would be the first person to ever render it into Hindi. Oh, what does it mean, you ask? By way of a hint, the phrase (in English) was first uttered here:
Alas, being first was a naïve hope. Omniglot has already published translations in many languages, including not only Hindi but also Marathi, Punjabi and Tamil, just some of the different “state languages” of India. Since its inception, this phrase has gone on to be a kind of touchstone for the often comically inaccurate results of automatic translation.
The course was a great experience. Obviously not much can be reinforced in just 40 hours, but I think I internalized the important basics. I can count – sort of – say what my name is, where I live, who is in my family, how many rooms are in my house, things like that. Word order is different from English and many Western languages; English is subject-verb-object, while Hindi is subject-object-verb. For example, in inquiring about a purchase you might say:
Maiṁ jahāṁ ēka hōvarakrāphṭa kharīda sakatē haiṁ?
which roughly translated is: “I where one hovercraft buy able to is?”
Need to look into that when I get to Pune.
Long, long ago, in the infancy of blogging – or, at the very least, the petulant, awkward middle-school of blogging – I decided to host a blog. The blog I mean is String-or-Nothing, which belongs to my wife Kim. She started her blog in 2004; back then it was hosted on Blog-City. In 2006 I thought there had to be better systems than the then-lame Blog-City; I also thought maybe we could get some ad revenue out of a self-hosted blog and finally – to be fair – I felt like fooling around with blogging software. We were already running another site, based on Microsoft classic ASP, with a hosting provider. Searching around, I came across dasBlog – a blogging system written for ASP.NET 2.0. Anyway, after a few evenings dinking around, I had it running. A few more evenings saw the creation of a converter that transformed the Blog-City export files into dasBlog-compatible XML. In Sept. 2006 we launched the re-vamped blog, with a cool template, my own ads and a pretty current feature set, blogging-wise.
Now it’s 2012. The years work changes on us all, blogs not the least. I started with a 0.9 version of dasBlog; I’m now on 2.0.7226.0. Despite that, it lags in features compared to the state of the art. Also – I admit it – I am getting tired of playing IT guy for this venture. So I pitched to Kim the idea of moving to WordPress.com, which you’re looking at right now. Great features, great templates and 10 GB storage, all for $99 a year. “Great idea,” says Kim. “I can’t wait to see my old content imported.”
Aye, there’s the rub. She’s right, of course: you can’t just dump 8 years’ worth of high-quality content like you find at String. But I assume you are not surprised when I tell you there is no “Import DasBlog to WordPress” command just sitting around on the WordPress dashboard.
There’s evidence on Google of folks having done this before. Most examples, like this one, assume you are importing into a WordPress instance that you yourself host or control. I’m trying to get out of the hosting-my-own-blog business, thank you. What to do? Part of the reason I got into the dasBlog thing in the first place was to provide an excuse for a little coding. I quickly saw it was some coding that would get me out of this.
One of the ways you can import a blog into WordPress.com is to export from an existing WordPress blog into a WordPress eXtended RSS (WXR) file. There’s no documentation I could find – certainly nothing definitive – on WXR. But it is a form of RSS. Also, I just exported my own blog and started looking at that output. Turns out the WXR format is simple. Up top is a list of categories – these are all the categories in the blog. Then any images you want to import get represented by an <item> tag of post_type attachment. Each post is an <item>, with nested <category> and <comment> tags. There was some brute-force text processing I had to do, to find all <img> src and <a> href attributes, but after that the program came together pretty easily – 1,500 or so lines of Java (including comments) that make, as far as I can see, valid WXR. The early results are on the new WordPress String-Or-Nothing. It will take a week or two to migrate everything; there are 250 MB of images and 6 MB of text, after all.
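For the curious, here is roughly what that brute-force text processing can look like. This is a minimal sketch, not the code from my converter: the class name LinkExtractor, the method extractUrls and the sample URLs are all made up for illustration, and it assumes the attribute values are quoted. It uses a regular expression rather than a real HTML parser, which is good enough for well-behaved blog posts but will miss unquoted attributes and other oddities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the URLs a post refers to, so that
// images and links can be rewritten to point at the new blog.
public class LinkExtractor {

    // Matches src=... inside an <img> tag or href=... inside an <a> tag.
    // Group 1 is the quote character (single or double), group 2 the URL.
    private static final Pattern URL_ATTR = Pattern.compile(
            "<(?:img\\b[^>]*?\\bsrc|a\\b[^>]*?\\bhref)\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every <img> src and <a> href value, in document order.
    public static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_ATTR.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample post body; the paths are placeholders.
        String post = "<p><img alt=\"hat\" src=\"content/binary/hat.jpg\"> "
                + "<a href='http://example.com/pattern.pdf'>pattern</a></p>";
        for (String url : extractUrls(post)) {
            System.out.println(url);
        }
    }
}
```

Once you have the list, each image URL can become an attachment <item> and each occurrence in the post body can be replaced with its new location.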
Anyway, looks like a happy case of a problem solved by a tried-and-true method: Coding. Is there nothing it can’t do?