sgmllib.py parser woes

This post is older than 6 months, which means the opinions it contains were mine at the time and any technical information is most likely obsolete.
Please contact me if you want a version I would still sign and not merely acknowledge, or if the post got broken during one of the many server upgrades. I will be most grateful.

Does anybody have problems with sgmllib.py?

After I spent way too much time hunting a bug in my code, I gave up, read the goahead function in sgmllib.py and I’m certain now that its parser is broken.

Let’s say you’re handling a web page with inline JavaScript code which also includes HTML tags. Even if you use the setliteral method to skip processing of data inside <script> tags, sgmllib.py will start processing again as soon as it encounters the first </. It interprets this as the start of an end tag and tries to close it. Even though the code handles both known and unknown tags, it fails to do the right thing because it simply doesn’t expect a scenario where this isn’t a tag at all.
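
A minimal sketch of the difference, in present-day Python. This is not sgmllib’s code and does not use it; script_bodies, SCRIPT_CLOSE and NAIVE_CLOSE are names made up for this illustration. It only shows what changes when the content of a <script> element ends at the first </ rather than at the matching </script>.

```python
import re

# Hypothetical helper patterns, not part of sgmllib.
SCRIPT_OPEN = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
SCRIPT_CLOSE = re.compile(r"</script\s*>", re.IGNORECASE)  # expected terminator
NAIVE_CLOSE = re.compile(r"</")                            # any "</" ends the script


def script_bodies(html, close_pattern=SCRIPT_CLOSE):
    """Return the raw text found between <script> tags and their terminator."""
    bodies = []
    pos = 0
    while True:
        opened = SCRIPT_OPEN.search(html, pos)
        if opened is None:
            break
        closed = close_pattern.search(html, opened.end())
        if closed is None:
            # Unterminated script: take everything up to the end of input.
            bodies.append(html[opened.end():])
            break
        bodies.append(html[opened.end():closed.start()])
        pos = closed.end()
    return bodies


page = '<p>hi</p><script>document.write("<b>bold</b>");</script><p>bye</p>'

# Ends only at </script>: the whole script survives.
print(script_bodies(page))
# Ends at the first "</": the script is cut off inside the string literal.
print(script_bodies(page, NAIVE_CLOSE))
```

With the naive terminator the script body stops in the middle of the string literal, which is the kind of breakage described above.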

The other possibility is that I’m simply a dolt who should do stuff like this only when rested. By the way, where’s the proper place to make an ass out of myself complaining about the standard library?

Single post trackback spam

I’m a coeditor of another blog and, like most blogs, we have some problems with comment spam (if you’re using WordPress, you really ought to give Akismet a go). In general I don’t mind it too much and take it as the price of self-expression.

But we get spam that drives me nuts, because it’s something that could be permanently fixed. You see, ALL of our trackback spam is directed at the same blog post. So, if I could turn trackbacks off on that post only, it would cut our spam by more than half and leave my nerves intact.

I don’t think it’s doable with a default installation of WordPress 1.5, but if any of you are using a plugin that supports this functionality, I’d appreciate it greatly if you could let me know.

Localized interfaces in globalized world

There’s an issue I’ve been struggling with for months now. Well, more than a year really. We built Marela so it would be used in Slovenia by people who prefer Slovene over other languages. It’s not the only thing that distinguishes it from similar services found on the web, but it’s an important one.

However, the Internet and low-fare airlines certainly shrank our planet and it’s not uncommon for many of us to have friends who don’t speak our language but with whom we’d still like to share parts of our lives and our creations. And this is a problem for Marela, which was (also) built to fill this need for sharing.

It’s simply not possible to use Marela in any language but Slovene.

We decided to do this to avoid the Orkut syndrome, where Brazilian users (legitimately) subverted a global service into a mostly Portuguese-speaking one. We didn’t want to risk making our site unfriendly to members who speak only Slovene by creating an environment in which a large portion of our community neither could nor would speak Slovene.

I believe our interfaces are mostly self-explanatory and easy to use, but it’s difficult to tell how challenging they are for those who don’t speak the language. Judging by my personal experience of using a Dutch Windows 95 years ago, I’d say they are probably not easy enough.

So, what can be done?

There was an idea to enforce Slovene only for logged-in users, but this can work only as long as unregistered users are not allowed to contribute, and the idea itself doesn’t feel natural to me. It seems such an artificial restriction, and that never amounts to a good thing. And that was one of the more promising ones.

I ran out of my own ideas a long time ago. Any ideas you might have are therefore more than welcome.

Customizable interfaces

There’s a vast number of potential Marela features that we are able, but not willing, to implement, since they are detrimental to our design goals. There are also many that we are willing to implement and will, but that doesn’t change the fact that there will always be parts where we won’t compromise.

Still, even though it seems natural that benefits to many should be more important than benefits to few, I never got quite comfortable with this. That’s why we also want to make Marela as open as possible to let our members customize it without affecting other people.

First, there are public APIs. They tend to work well for building new tools or interfaces, but offer little help in customizing existing ones, which is what our users want most of the time.

Every modern browser lets users define a custom style sheet to change the presentation of websites. That’s why we recently added an ID to the body element, which lets users more easily customize the look of Marela alone.

Then there are tools like Greasemonkey, which let users of Mozilla-based browsers customize existing interfaces. Greasemonkey works quite well, if you’re proficient in JavaScript.

I think the biggest problem with Greasemonkey (and custom CSS for that matter) is that it’s tied to a browser and a computer. If you change either of them, your customization won’t work anymore without significant effort on your part. Sometimes, in restricted environments, it’s not even possible to make it work. It’s a bit easier to make and transfer CSS customizations, but you’re still forced to carry your file around.

I’m not too bothered with CSS. I simply don’t see much need for it beyond what Marela’s design and modern browsers already offer, and I’m still biased against the skin-deep themes offered by some programs. In any case, a possible solution for it would be practically the same as what I have in mind for the case that does trouble me.

The idea is pretty simple. Define JavaScript hooks that, when they exist, get called after the window.onload event. If there’s an easy way to add your JavaScript file to a page (say, by uploading it to our server), then you can create a simple, but cross-browser and cross-computer, Greasemonkey.

The question that remains is: is this appealing enough for you to use it?

Universal Encoding Detector

A few months ago, while exporting vCards from Apple’s Address Book (which uses UTF-16 instead of UTF-8, which is more common to me), I discovered that there’s really no general agreement on which encoding should be used for storing vCards. It was quite a disheartening discovery, since you can’t get this information from the filesystem and it’s difficult to convert files to a uniform encoding if you don’t know the encoding of the source.

I decided to tackle this problem once other problems were solved and I’m happy to say my procrastination paid off. Mark Pilgrim wrote another excellent module which solves my problem better than I ever could.

Universal Encoding Detector is a Python port of the code used by Mozilla to accomplish the same thing and is really very simple to use. Obviously it can’t be perfect, since it’s not possible to detect an encoding completely reliably. But it works quite well and if you need such functionality, you should really give it a try.
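
A rough sketch of how this could look for the vCard case, in present-day Python. It assumes the module is installed as the third-party chardet package and uses its chardet.detect() call; guess_encoding and to_utf8 are hypothetical helper names, and the byte order mark check and the UTF-8 fallback are additions of this sketch, not part of the detector. If chardet is missing, the sketch still runs but relies on the byte order mark alone.

```python
import codecs

try:
    import chardet  # third-party package; assumed to be installed
except ImportError:
    chardet = None

# UTF-32 marks are checked before UTF-16 because the little-endian
# UTF-32 mark begins with the same two bytes as the UTF-16 one.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw):
    """Guess the encoding of raw bytes; the result is never guaranteed."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    if chardet is not None:
        guess = chardet.detect(raw)["encoding"]  # may be None
        if guess:
            return guess
    return "utf-8"  # assumption of this sketch when nothing better is known


def to_utf8(raw):
    """Re-encode raw bytes as UTF-8, replacing undecodable bytes."""
    text = raw.decode(guess_encoding(raw), errors="replace")
    return text.encode("utf-8")


# A vCard-like sample encoded as UTF-16 with a byte order mark.
card = "BEGIN:VCARD\nFN:Črt Žagar\nEND:VCARD"
print(to_utf8(card.encode("utf-16")).decode("utf-8"))
```

Files without a byte order mark are where the detector does the real work, and there its answer is a statistical guess, so it is worth spot-checking the converted output.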

And people say laziness doesn’t pay off.
