A notch above a monkey

Updated sgmllib

I wrote about an sgmllib problem a few days ago. I may still be a dolt, and my code certainly needed fixing, but the bug remained nevertheless.

Hence I’ve made some small changes to sgmllib that fix the problems I’ve had. The new version, which passes all the unit tests included in the Python distribution, can be found here, and I’d really appreciate it if users of sgmllib could give it a go (that includes users of htmllib and BeautifulSoup).

Update: As suggested, I’ve added an updated version of test_sgmllib.py, which includes an example where the old library fails and the new one doesn’t.

Update 2: It seems this is valid, even required, SGML behavior.

Outage

This blog and other content of a less dubious nature were inaccessible for the last three days. A freak, previously unseen and therefore unforeseen scenario happened: two disks forming a RAID-1 array, which hadn’t shown a single sign of a problem, suddenly died simultaneously. As has been noted, million-to-one chances crop up nine times out of ten, so firm steps have been taken to prevent something like this from ever occurring again.

I’ve also rediscovered how much I detest system administration, and I’m happy to have competent friends to absolve me of it. I used to earn a part of my salary doing such work, which reminds me a lot of war. Prolonged periods of preparation and sheer boredom are pierced by short episodes full of adrenaline and, occasionally, the terror of a fuck-up.

Another lesson from another unexpected discovery: I missed my blog as much as I did my email. A sad discovery indeed, showing I don’t prefer listening over talking as much as I should. However, there’s a silver lining in knowing that I haven’t longed for either too much.

So, this is it. Everything should be back to normal, but if you find a dead link or something not working quite as it should, I’d appreciate a line or two describing the problem.

sgmllib.py parser woes

Does anybody have problems with sgmllib.py?

After spending way too much time hunting for a bug in my code, I gave up and read the goahead function in sgmllib.py, and I’m now certain that its parser is broken.

Let’s say you’re handling a web page with inline JavaScript code which also includes HTML tags. Even if you use the setliteral method to skip processing of the data inside <script> tags, sgmllib.py will resume processing as soon as it encounters the first </. It interprets this as the start of an end tag and tries to close it. Even though the code handles both known and unknown tags, it fails to do the right thing because it simply doesn’t expect a scenario where this isn’t a tag at all.
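To make the scenario concrete, here is a toy sketch of the two ways a scanner could decide where the literal data in a <script> element ends. It is not sgmllib code: the literal_data helper, the two patterns and the sample page are all made up for illustration, and the real goahead function is more involved.

```python
import re

# Toy illustration only: these names are hypothetical, not part of sgmllib.
ANY_END_TAG = re.compile(r"</")                                # stop at the first "</"
SCRIPT_END_TAG = re.compile(r"</script\s*>", re.IGNORECASE)    # stop only at </script>


def literal_data(page, terminator):
    """Return the text between the first <script> tag and the first
    match of `terminator` that follows it."""
    start = page.index("<script>") + len("<script>")
    match = terminator.search(page, start)
    end = match.start() if match else len(page)
    return page[start:end]


# Assumed sample input: inline JavaScript that writes an HTML tag.
PAGE = '<script>document.write("<b>bold</b>");</script>'

# Ending literal mode at the first "</" cuts the script in half.
naive = literal_data(PAGE, ANY_END_TAG)

# Ending literal mode only at the matching end tag keeps it whole.
lenient = literal_data(PAGE, SCRIPT_END_TAG)

print(repr(naive))
print(repr(lenient))
```

With the first pattern the script data stops right before </b>, and everything after it would be handed back to the tag-handling code; with the second the whole JavaScript statement survives as data.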

The other possibility is that I’m simply a dolt who should do stuff like this only when rested. By the way, where’s the proper place to make an ass of myself complaining about the standard library?