2001/6/11

Most applications, such as Word and Excel, have a little LRU cache on the File menu that keeps track of the most recently used files. The algorithm for maintaining this list is trivial: every time a file is accessed, move it from its current location to the top of the list.
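To make the baseline concrete, here is a minimal sketch of that move-to-top rule. The function name `touch` and the `limit` of four menu slots are my own assumptions for illustration, not anything Word or Excel actually exposes.

```python
def touch(recent, name, limit=4):
    """Move `name` to the top of the recently-used list, in place."""
    if name in recent:
        recent.remove(name)   # pull it out of its current slot
    recent.insert(0, name)    # put it at the top of the menu
    del recent[limit:]        # drop whatever falls off the bottom (limit is an assumed menu size)
    return recent


recent = []
for name in ["a.doc", "b.doc", "c.doc", "a.doc"]:
    touch(recent, name)
print(recent)  # ['a.doc', 'c.doc', 'b.doc']
```

The list only ever remembers order of last use, which is exactly the limitation discussed next.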

However, this does not strike me as optimally useful. The problem is that files tend to be accessed in clumps: for example, all the documents relevant to the proxy belong in a cluster that is quite different from the group of stories that I am writing or reading. I believe that there is a better and not much more complicated way. Suppose one had a list of every file access ever made. From that list it is easy to extract digram frequencies, that is, how often each file was accessed immediately after each other file. This tells you, given a file, which file you most often accessed next. You could then not only move the accessed file to the top of the LRU cache but also move its digram buddy to the next slot on the list, then that buddy's own digram buddy to the slot after that, and so on.

This is only a bit more work, and it will cause information that truly is clumped together from an access standpoint to remain clumped together in the LRU cache.
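The digram idea can be sketched roughly as follows. This is only one way to read it, under some assumptions of mine: the whole access history is kept as a plain list, back-to-back reopenings of the same file are ignored, ties go to whichever successor was seen first, and any slots left over once the buddy chain runs out are filled by plain recency. The names `digram_counts` and `clustered_menu` and the file names are made up for the example.

```python
from collections import Counter, defaultdict


def digram_counts(history):
    """For each file, count which file was accessed immediately after it."""
    follows = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        if current != nxt:            # assumption: skip repeated opens of the same file
            follows[current][nxt] += 1
    return follows


def clustered_menu(history, limit=4):
    """Latest file first, then its digram buddy, then that buddy's buddy, etc."""
    if not history:
        return []
    follows = digram_counts(history)
    menu = [history[-1]]
    while len(menu) < limit:
        # successors of the last menu entry that are not on the menu yet
        candidates = [(f, n) for f, n in follows.get(menu[-1], {}).items()
                      if f not in menu]
        if not candidates:
            break
        buddy, _ = max(candidates, key=lambda pair: pair[1])
        menu.append(buddy)
    # assumption: fill any remaining slots by plain recency
    for name in reversed(history):
        if len(menu) >= limit:
            break
        if name not in menu:
            menu.append(name)
    return menu


history = ["spec.doc", "budget.xls", "spec.doc", "budget.xls",
           "story.doc", "notes.doc", "story.doc", "notes.doc", "spec.doc"]
print(clustered_menu(history, limit=2))  # ['spec.doc', 'budget.xls']
```

With the same history, the plain move-to-top list would show spec.doc followed by notes.doc, since notes.doc was simply the file touched just before. The digram version puts budget.xls second because that is what usually follows spec.doc.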

Note: this idea is relevant not only to file access but could also serve as an organizational principle for other information services. For example, consider a portal to the web. I like to get up, read the Wall Street Journal headlines, read some articles, look at the stock prices for MSFT and SNHK, see if there are any new viruses posted to the web, and then check my email. Different shit, same order every day. Why not have the portal page configure itself, not once statically with links to the Wall Street Journal, Quote.com, and Hotmail, but dynamically, the way the cache does? That way, when I am looking at the Wall Street Journal, the system knows from previous use that this access is most often followed by a visit to Quote.com, so it moves Quote.com to second place on my LRU list.