Hmm … I’ve read the original Twitter blog post, but I can’t find a strong direct statement pointing the finger at XMPP not scaling. Perhaps the Jabber folks or someone else can clarify this entry (or perhaps the point is simply being stretched a bit to make the case for Gnip). Is it the Twitter design, or a wall that XMPP hits for this type of social application? My inclination so far has been to think that XMPP itself was not the problem, and that it had to do with the design, topology, and infrastructure related to Twitter. But the folks at Gnip seem to imply that XMPP might be an issue when used in this manner, at least – thoughts?
Let’s address the first issue: How we would benefit Twitter and anyone that wants to integrate with Twitter data.
Twitter has found that XMPP doesn’t scale for them and as a result, people are forced to poll their API *a lot* to get updates for their users. MyBlogLog has over 25,000 Twitter users that they throw against the Twitter API every 15 minutes. This results in nearly 2.5 million queries against the API every day, for maybe 250K updates. Now add millions of pings from Plaxo and SocialThing and Lijit and heaven forbid Yahoo starts beating up their API…
If Twitter starts pushing updates to us, via our dead simple API or Atom or their XMPP server, we can immediately reduce by an order of magnitude the number of requests that some very large sites are making against their API. At the same time, we reduce the latency between when someone Tweets and when it shows up on consuming sites like Plaxo. From 15 minutes or more to 60 seconds or less.
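A quick back-of-the-envelope sketch of those numbers (using the MyBlogLog figures quoted above as illustrative inputs, not Twitter's actual traffic):

```python
# Rough comparison of polling load vs. actual update volume,
# using the figures from the post above (illustrative only).
users = 25_000                          # MyBlogLog's Twitter users
polls_per_day = 24 * 60 // 15           # one poll per user every 15 minutes
poll_requests = users * polls_per_day   # API hits per day under polling
updates_per_day = 250_000               # actual new updates, per the post

print(f"{poll_requests:,}")             # 2,400,000 requests per day
print(poll_requests / updates_per_day)  # 9.6 -- roughly an order of magnitude of wasted work
```

With push delivery, the consuming site receives only the ~250K real updates instead of making ~2.4M speculative requests, which is where the "order of magnitude" reduction comes from.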

I think you have it right -- there's really no information about XMPP not scaling (and there really shouldn't be since it's about the perfect protocol for this type of use case). The issue is that Twitter sending out every single update on its network is an *enormous* amount of data that would stretch the boundaries no matter what technology was being used.
The typical solution to this problem is to provide feeds for just the data that's relevant to each party. In the case of the Zappos integration mentioned -- it would be listening for updates from just Zappos employees and not the entire network. In other words, a pubsub architecture. Building out a robust pubsub stack isn't trivial, but it's probably the best option for Twitter and will get them out of the horrible polling problem.
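The idea in the comment above can be sketched as a toy in-memory hub: subscribers register interest in a topic, and the publisher pushes each update only to the parties who asked for it, rather than broadcasting the whole firehose. (This is a minimal illustration of the pubsub pattern, not Twitter's or XMPP's actual implementation; the topic names are made up.)

```python
from collections import defaultdict

class PubSubHub:
    """Toy publish/subscribe hub illustrating topic-filtered delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        # e.g. Zappos subscribes only to its employees' update stream
        self._subscribers[topic].append(callback)

    def publish(self, topic, update):
        # Push each update only to subscribers of that topic,
        # instead of sending the entire feed to everyone.
        for callback in self._subscribers[topic]:
            callback(update)

hub = PubSubHub()
received = []
hub.subscribe("zappos-employees", received.append)
hub.publish("zappos-employees", "update from a Zappos employee")
hub.publish("unrelated-topic", "some other tweet")  # not delivered above
print(received)  # ['update from a Zappos employee']
```

A real deployment (e.g. XMPP's pubsub extension) adds persistence, federation, and access control on top of this basic routing idea, but the filtering principle is the same.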
Posted by: Matt Tucker | July 11, 2008 at 10:52 AM
Thanks Matt ... that's how I felt. I don't see XMPP per se as the problem; rather, it's the design pattern and implementation of XMPP specific to Twitter's needs.
Posted by: Mike Gotta | July 11, 2008 at 03:01 PM
I read that Twitter uses Openfire and I wonder if Ejabberd wouldn't be better for it.
Posted by: kael | July 12, 2008 at 08:52 AM
Actually, what we've heard is that ejabberd wasn't coming close to scaling for them and that they are now experimenting with Openfire.
Posted by: Matt Tucker | July 12, 2008 at 10:48 AM
Heya -- Sorry for the confusion. I wasn't pointing the finger at XMPP at all. I was simply pointing out that in the Twitter post I linked to, Biz said this:
Despite delivery over a faster and cheaper technology, this entire public feed of Twitter updates is resource intensive—we had to be very careful about giving it out.
Posted by: Eric Marcoullier | July 12, 2008 at 05:24 PM
Actually, I am quite familiar with the discussion, so let me provide a few points:
- XMPP is not at the core of the architecture, which means that XMPP does not have any influence on Twitter's scalability. XMPP is used on the side of the Twitter architecture, very much like the SMS gateway, for example. Twitter is built around a website and a database. It is not event-based but stateful.
- Twitter is a Java / Scala shop. They are experimenting with a Java XMPP server because they want to reuse their internal workforce's knowledge.
Posted by: Mickael Rémond | July 13, 2008 at 05:34 AM
I think Mickael hits the nail on the head -- XMPP is not at the core of how Twitter built things out. So even though it may seem to us now that Twitter is a messaging / pubsub system, I don't think it was designed that way from the beginning. Thus some of the scaling challenges.
Posted by: Peter Saint-Andre | July 14, 2008 at 11:56 AM
Thanks everyone for providing additional info and clarification!
Posted by: Mike Gotta | July 16, 2008 at 09:31 AM