Speed test of PyRSS2Gen, kid and atomixlib

This post is older than 6 months, which means the opinions in it were mine at the time and any technical information is most likely obsolete.
Please contact me if you want a version I would still sign rather than merely acknowledge, or if the post got broken during one of the many server upgrades. I will be most grateful.

I’ve spent this evening building RSS2 and Atom feeds with PyRSS2Gen, kid and atomixlib, as proposed by helpful people a few days ago.

We’d like to add feeds promiscuously to our service (right now we have exactly one). But before we can decide how to tackle this, we need to know how fast we can generate a feed on average.

DISCLAIMER: I tried to generate feeds from the same data using what seemed like reasonable Python code, but I didn’t try to save every millisecond, as I only cared about crude speed approximations. I have no problem believing that someone else might get completely different results. No serious statistical analysis was done, and a rigorous scientific approach was almost completely absent. Better run your own tests if speed (or anything else, really) is important to you. But if dubious numbers delight you, then please continue.
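For anyone who wants to run their own tests, a crude harness could look roughly like the sketch below. It is not the code used for this post: build_feed is a hypothetical stand-in written with the standard library's ElementTree, meant to be swapped for the PyRSS2Gen, kid or atomixlib code under test, and time_builder and feeds_per_second are names made up for the illustration.

```python
import time
import xml.etree.ElementTree as ET


def build_feed(n_entries=10):
    """Hypothetical stand-in builder; replace with the toolkit code being measured."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example feed"
    for i in range(n_entries):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = "Entry %d" % i
        ET.SubElement(item, "description").text = "Body of entry %d" % i
    return ET.tostring(rss, encoding="unicode")


def time_builder(builder, loops=10, repeats=3):
    """Return the best average time per call, in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            builder()
        elapsed = time.perf_counter() - start
        # Keep the fastest run; slower runs are mostly noise from the machine.
        best = min(best, elapsed / loops)
    return best * 1000.0


def feeds_per_second(ms_per_feed):
    """Convert a per-feed time in milliseconds into feeds per second."""
    return 1000.0 / ms_per_feed


if __name__ == "__main__":
    ms = time_builder(build_feed)
    print("%.2f ms per feed, about %.0f feeds/s" % (ms, feeds_per_second(ms)))
```

Taking the best of a few repeats is one common way to reduce noise; it still says nothing about behaviour under real load.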

So, here are my results. Time to generate a feed with 10 entries on a 1 GHz G4 PowerBook:

  • with PyRSS2Gen somewhere around 70ms
  • with atomixlib around 120-140ms
  • and with kid around 30-35ms

In other words, you can generate between 8 and 30 feeds per second on my notebook and (I guess) 2-3x as many on a modern server. This is more than enough for most cases, but I’m afraid it probably won’t be enough for us. That means either producing feeds by gluing strings together or taking a more intelligent approach than building a new one on every request.
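One possible shape for the "more intelligent approach" is to keep each rendered feed around for a while and rebuild it only when it has expired or been invalidated. The following is a minimal sketch under that assumption, not something we run: FeedCache, its ttl parameter and the build callable are all hypothetical names.

```python
import time


class FeedCache:
    """Keep each rendered feed for ttl seconds instead of rebuilding it per request."""

    def __init__(self, build, ttl=60.0, clock=time.monotonic):
        self._build = build      # callable: key -> feed XML string (assumed)
        self._ttl = ttl          # seconds a rendered feed stays fresh
        self._clock = clock      # injectable clock, handy for testing
        self._entries = {}       # key -> (expires_at, xml)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: skip the expensive build
        xml = self._build(key)
        self._entries[key] = (now + self._ttl, xml)
        return xml

    def invalidate(self, key):
        """Drop a feed, for example right after a new entry is published."""
        self._entries.pop(key, None)
```

With something like this the toolkit's build time matters only once per expiry window per feed, rather than once per request. The sketch is not thread-safe and never evicts old keys, so a real service would need more care.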

As a side note, I found all three packages easy to work with.

Update: Sylvain has released atomixlib 0.3, which makes it even easier to create Atom feeds and also brings significant speed improvements. On my computer it now takes around 60-65ms to build a feed.

Another update: I made a couple of quick tests, with mixed results, on a fairly new Opteron server. PyRSS2Gen was actually slower at 80-85ms, and I have no idea why. kid was blazing fast, with times between 8 and 9ms. Definitely good enough.

But I couldn’t get atomixlib to work, because 4Suite failed to build, so it will have to wait until I can figure out why it chokes on a perfectly legitimate XSLT.

Update 3: 4Suite has been promptly fixed (thanks!) and atomixlib 0.3 now takes 18-19ms. I believe this is an excellent time.

8 Comments »

  1. Hello Marko,

    Interesting. I had never really tested atomixlib performance-wise, but being 2 or 3 times slower is significant enough for me to have a look at it and try to improve the picture a bit.

    Two things could help me out:

    1. Which version of atomixlib have you used?
    2. What was the code of your test?

    Would you mind sending me those details via email please?

    Thanks :)
    - Sylvain

    Comment by Sylvain Hellegouarch — November 5, 2005 @ 10:07 am

  2. Rock!

    Comment by Ryan Tomayko — November 5, 2005 @ 11:29 am

  3. Thanks for this analysis Marko. Out of curiosity I took a closer look at the atomixlib code and ended up giving Sylvain performance pointers. Seems he worked pretty quickly on those because he’s now released atomixlib 0.3.0. According to my timings it is almost three times faster than 0.2.0, which is the version you tried. I used a derivative of timeit.py and it posted the following summary for 0.2.0:

    10 loops, best of 3: 32.1 msec

    and for 0.3.0:

    10 loops, best of 3: 13.2 msec

    I’d be curious to see whether you notice a similar speed-up.

    It seems that you need *much* more speed, however, than even the new atomixlib or Kid would provide, which makes me wonder what the characteristics are of the Atom feeds you are creating. Maybe, as you suggest, there are portions of the problem in which you can use caching. I just worry that if you take the stitch-strings-together route, it’s really hard to ensure well-formed and valid Atom. Using a toolkit is much safer, and I’d love to see less broken Atom in the world :-)

    Anyways, keep us posted in this analysis. This is the sort of thing I’m covering in my new XML.com column[1], so I’m very interested in what practitioners like you are up to.

    Thanks.

    [1] http://www.xml.com/pub/at/39

    Comment by Uche — November 6, 2005 @ 7:03 am

  4. Forgot to give an atomixlib 0.3.0 link:

    http://www.defuze.org/oss/blog/entry/2005/11/05/atomixlib-0.3

    Comment by Uche — November 6, 2005 @ 7:05 am

  5. I’ve noticed a speed-up, but a slightly smaller one than you did. My test code for atomixlib is pretty much Sylvain’s example, but changed so it creates 10 entries, with the addition of a few time.time calls.

    So, pretty basic really.

    I’ll definitely test all of them today on more modern hardware, since your results are much better and are in fact where I’d consider them fast enough for our needs.

    I also agree about stitching strings together. It’s probably not worth risking broken feeds, and caching them should be simple enough.

    Thanks.

    Comment by markos — November 6, 2005 @ 11:45 am

  6. Marko, We’re wondering whether your build problem may be an endianness problem with some of our customizations of expat. If so, it’s a pretty serious matter, and we’ll investigate, get to the bottom of it, and release a 4Suite update if need be. Again, your patient investigation is proving very useful. I’ll keep you posted.

    Comment by Uche — November 7, 2005 @ 2:49 am

  7. Re ’stitching strings’: I wonder if Uche or others would comment on what they see as the biggest dangers with that approach. In particular, Uche’s comment “I just worry that if you take the stitch-strings-together route, it’s really hard to ensure well-formed and valid Atom. Using a toolkit is much safer, and I’d love to see less broken Atom in the world” caught my eye. Perhaps here, or in a column or on his blog, he could give some real-world examples of the sorts of gotchas people are likely to run into.

    Comment by Mike Watkins — November 8, 2005 @ 4:38 pm

  8. I can’t speak for Uche or anybody else apart from myself, but I think the main problem with stitching is not so much getting the feed structure wrong, although that can happen too. The bigger and, at least in my experience, more frequent problem is getting the payload wrong.

    It’s simply too easy to break XML and that’s something that XML toolkits can help you immensely with.

    I’m also sure as an XML lightweight that I’m missing other, possibly more subtle problems.

    Comment by markos — November 8, 2005 @ 8:52 pm
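The payload problem from the last comment is easy to reproduce. The sketch below uses only the Python standard library (ElementTree and xml.sax.saxutils.escape) rather than any of the three toolkits tested above, and the title string and the is_well_formed helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical entry title containing characters that are special in XML.
title = "Fish & Chips <review>"

# Naive string stitching: the raw "&" and "<" make the document ill-formed.
stitched = "<entry><title>%s</title></entry>" % title

# Stitching with explicit escaping of the payload.
stitched_escaped = "<entry><title>%s</title></entry>" % escape(title)

# Letting a toolkit serialize the text, so escaping happens automatically.
entry = ET.Element("entry")
ET.SubElement(entry, "title").text = title
built = ET.tostring(entry, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Here is_well_formed(stitched) is false, while the escaped and toolkit-built versions parse fine. Stitching can work if every payload is escaped, but a single forgotten call is enough to break the feed, which is presumably the kind of mistake a toolkit protects against.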
