Twitter Search History Dwindling, Now at Four Days

Adam DuVander
Aug. 12 2010, 03:23PM EDT

Developers looking for trends in Twitter search are finding it harder now that the micro-blogging site has cut its search history to four days. The history once reached back weeks or months, but it has steadily shrunk and is now too short for some types of applications. Meanwhile, the newer streaming APIs have become the go-to choice for use cases that once relied on search.

Back when you searched Twitter through Summize, the history went back several months, and that continued after Twitter acquired Summize. As Twitter grew more popular, however, the history shrank. It appears to fluctuate depending on how long it takes the index to reach its maximum storage. Twitter stores tweets in MySQL and recently dropped a planned move to Cassandra, a database open sourced by Facebook, which some believe could improve Twitter's search performance.
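The fluctuation follows from simple arithmetic: if the index can hold only a fixed number of tweets, the span of time it covers is roughly that capacity divided by the daily tweet rate, so more tweets means a shorter history. The Python sketch below illustrates this with a hypothetical fixed-capacity index that evicts the oldest tweet when full; the class, the capacities and the rates are illustrative assumptions, not Twitter's actual design or figures.

```python
from collections import deque


class FixedCapacityIndex:
    """Hypothetical search index holding at most `capacity` tweets.

    When full, the oldest tweet is evicted, so the span of time the index
    covers shrinks as the tweet rate grows. Illustrative only; this is not
    how Twitter's index was actually built.
    """

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest (leftmost) item when full.
        self.tweets = deque(maxlen=capacity)

    def add(self, timestamp_days: float, text: str) -> None:
        self.tweets.append((timestamp_days, text))

    def history_days(self) -> float:
        """Span between the oldest and newest indexed tweet, in days."""
        if len(self.tweets) < 2:
            return 0.0
        return self.tweets[-1][0] - self.tweets[0][0]


def simulate(capacity: int, tweets_per_day: int, days: int) -> float:
    """Feed evenly spaced tweets into the index and report its history span."""
    index = FixedCapacityIndex(capacity)
    for i in range(tweets_per_day * days):
        index.add(i / tweets_per_day, f"tweet {i}")
    return index.history_days()


print(round(simulate(1000, 100, 30), 2))  # just under 10 days
print(round(simulate(1000, 250, 30), 2))  # about 4 days
```

With room for 1,000 tweets, 100 tweets a day yields just under 10 days of history, while 250 tweets a day cuts it to about 4 days, matching the rough rule of capacity divided by rate.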

Twitter declined to comment, citing a policy against making user statistics public. However, its own documentation lists the limit as 1.5 weeks. In early 2009, the same page claimed a history of one month, then three weeks; it was last updated in March. The drop to less than one week most likely happened just last month. Damon Cortesi noted the change in a tweet.

Cortesi's company makes RowFeeder, a social media monitoring tool that adds references to a spreadsheet. The service draws on Twitter, among other sources, and now uses the streaming API for the bulk of its work. However, "when new customers sign up," Cortesi told us, "they ask if we can get back data." For that backfill, RowFeeder uses Twitter search, which is subject to what is now a four-day limit. Cortesi said it sometimes goes back as far as five days.

Twitter streams, which we've covered previously, are a "push" technology. Rather than an application polling for the latest data, it registers to receive specific searches automatically. Then, when there is new content, Twitter sends it over a persistent connection.
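On the client side, the push model means keeping one connection open and handling each message as it arrives, rather than re-running a search on a timer. Here is a minimal Python sketch, assuming the server sends one JSON object per line and uses blank lines as keep-alives, with the search terms registered when the connection was opened. The function name and sample data are hypothetical, and an in-memory `io.StringIO` stands in for the network connection so the example runs offline.

```python
import io
import json
from typing import Iterator, TextIO


def read_pushed_tweets(conn: TextIO) -> Iterator[dict]:
    """Yield tweets as the server pushes them over one long-lived connection.

    Assumptions for this sketch: search terms were registered when the
    connection was opened, the server sends one JSON object per line, and
    blank lines are keep-alives. `conn` can be any line-iterable text stream.
    """
    for raw in conn:        # each iteration waits for the next pushed line
        line = raw.strip()
        if not line:        # keep-alive: nothing new yet, keep waiting
            continue
        yield json.loads(line)


# Offline stand-in for the network connection (hypothetical data).
fake_conn = io.StringIO(
    '{"id": 1, "text": "first matching tweet"}\n'
    "\n"
    '{"id": 2, "text": "second matching tweet"}\n'
)

for tweet in read_pushed_tweets(fake_conn):
    print(tweet["id"], tweet["text"])
```

Run as-is, this prints the two sample tweets and skips the blank keep-alive line. A polling client, by contrast, would repeat the search periodically and discard results it had already seen.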

Update. Twitter's Matt Harris chimed in on the dev list:

To answer your question about the search index history, we don't publish that information. The size of the index fluctuates based on the number of Tweets being made which means, the more Tweets there are the shorter the index period is. We're working to improve the duration of the search index and improve the relevance of the results.

Adam DuVander -- Adam heads developer relations at Orchestrate, a database-as-a-service company. He's spent many years analyzing APIs and developer tools. Previously he worked at SendGrid, edited ProgrammableWeb and wrote for Wired and Webmonkey. Adam is also the author of mapping API cookbook Map Scripting 101.



[...] but the results that come back from a search aren’t very comprehensive, since the index is limited to only a few days’ worth of tweets. That’s created a market for search engines like Topsy, which says it now has the [...]

[...] As TechCrunch wrote, Google's realtime search results are a fairly good move by Google. True, we can already search content on Twitter, for example through the Twitter website, but one advantage of Google's realtime search is the time span of the content it returns: Google digs up content going back to February, early 2010, whereas Twitter covers only a four-day span. [...]

I have to admit, I hope Twitter doesn't extend the search period. Nobody needs to know what I ate for dinner three weeks ago, much less three years ago. I can imagine a premium service where individuals can pay to archive their own tweets and/or the tweets of the people they follow, but I'd really like to see Twitter take a stand against the default "keep-it-forever" approach to web communications. I'm still a little annoyed that they're donating my tweets to the Library of Congress (though not annoyed enough to quit Twitter, obviously).

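[...] For a company like Google, which places search at the core of its competitiveness, this can be considered a smart move. Although Twitter currently has a near-monopoly as the source of real-time information, Twitter's own search only reaches back four days. Google's search can go back as far as February of this year, and in my experience it is generally more reliable than Twitter's as well. [...]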

We've actually been archiving Twitter and over 100 other 140-character sites. We just released our real-time livestream feed and will be releasing our search feed shortly. We don't limit based on time or number of records. Check out the implementation at