Junk Filter
Several people on various forums have asked for more details about the LimeWire junk filter, so here's a short summary of how it works.
The main point is that the filter actually learns in two ways. First of all, it learns patterns of words, file sizes, IP addresses, etc. that keep being associated with results that the user marks as junk. Secondly, the filter learns which words, file sizes, and IP addresses tend to be in search results together. It uses these two types of patterns to make guesses about which results are likely junk.
For every search result, LimeWire internally generates a set of hints called "tokens": one for each word in the file name, one for the size of the file, one for the IP address of the host that sent the result, and so on. For the sake of description, let's say each token has an associated "goodness" value from 0.0 (junk) to 1.0 (not-junk). Each time the user marks a result as junk, the goodness of each token associated with that result goes down a little. Each time the user marks a result as not-junk, its tokens go back to a goodness of 1.0. This is how the filter learns from the user. (The code actually keeps track of 1.0 minus the goodness, which it calls the "rating", but it's easier to explain in terms of goodness. As a result, the code subtracts a lot of values from 1.0 so that it can work with goodness in the middle of expressions.)
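Here's a minimal sketch of the tokenizing and user-marking steps described above. It is not LimeWire's code: the token string format, the whitespace-based word splitting, and the multiplicative JUNK_STEP factor of 0.9 are all assumptions made for illustration, and it tracks goodness directly rather than the "rating" the real code stores.

```python
from dataclasses import dataclass

# Assumed value: each "junk" mark multiplies a token's goodness by this factor.
JUNK_STEP = 0.9

@dataclass
class Result:
    file_name: str
    size: int
    ip: str

def tokens(result: Result) -> list[str]:
    """One token per word in the file name, plus one for size and one for host IP."""
    words = [f"word:{w}" for w in result.file_name.lower().split()]
    return words + [f"size:{result.size}", f"ip:{result.ip}"]

class TokenTable:
    def __init__(self) -> None:
        # Tokens never seen before default to full goodness (1.0).
        self.goodness: dict[str, float] = {}

    def get(self, token: str) -> float:
        return self.goodness.get(token, 1.0)

    def mark_junk(self, result: Result) -> None:
        # A user "junk" mark lowers the goodness of every token a little.
        for t in tokens(result):
            self.goodness[t] = self.get(t) * JUNK_STEP

    def mark_not_junk(self, result: Result) -> None:
        # A user "not-junk" mark resets every token back to full goodness.
        for t in tokens(result):
            self.goodness[t] = 1.0
```

Marking the same result as junk twice takes a token from 1.0 to 0.9 to roughly 0.81; a single not-junk mark puts it straight back at 1.0.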
Now, when a result comes in, it gets assigned a goodness equal to the product of the goodness values of all its associated tokens. (1.0 minus this goodness, multiplied by 100, is the junk rating displayed to the user.)
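The scoring rule above fits in a couple of lines. This sketch assumes the token goodness values have already been looked up; the function names are hypothetical.

```python
import math

def goodness(token_goodness: list[float]) -> float:
    """A result's goodness is the product of its tokens' goodness values."""
    return math.prod(token_goodness)

def junk_rating(token_goodness: list[float]) -> float:
    """Junk rating shown to the user: (1.0 - goodness) * 100."""
    return (1.0 - goodness(token_goodness)) * 100
```

For example, a result whose tokens have goodness 1.0, 0.9, and 0.5 gets a goodness of 0.45 and a displayed rating of about 55. Because the score is a product, a token at 1.0 never changes it, and any token below 1.0 can only pull it down.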
The interesting thing is that the filter also learns from the network itself. Whenever the filter determines that a result has a very high goodness (not-junk, low rating), all of its tokens have their goodness increased by a small amount. Likewise, a very low goodness (junk, high rating) causes the associated tokens to have their goodness decreased by a small amount. These changes are smaller than the ones made when the user explicitly marks a result as junk or not-junk, but they still help the filter learn which tokens tend to show up together.
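Here's a rough sketch of this network-learning step. The thresholds HIGH_GOODNESS and LOW_GOODNESS and the additive NETWORK_STEP are invented values, not LimeWire's; the only property taken from the description is that the nudge is small compared to a user's mark. The token table is a plain dict from token to goodness.

```python
import math

# Assumed values, chosen only for illustration.
HIGH_GOODNESS = 0.95   # results at or above this look clearly not-junk
LOW_GOODNESS = 0.05    # results at or below this look clearly junk
NETWORK_STEP = 0.02    # small nudge, weaker than a user's explicit mark

def learn_from_network(table: dict[str, float], result_tokens: list[str]) -> float:
    """Score a result, then nudge its tokens toward the verdict if it is extreme."""
    g = math.prod(table.get(t, 1.0) for t in result_tokens)
    if g >= HIGH_GOODNESS:
        # Clearly good: raise every token's goodness a little, capped at 1.0.
        for t in result_tokens:
            table[t] = min(1.0, table.get(t, 1.0) + NETWORK_STEP)
    elif g <= LOW_GOODNESS:
        # Clearly junk: lower every token's goodness a little, floored at 0.0.
        for t in result_tokens:
            table[t] = max(0.0, table.get(t, 1.0) - NETWORK_STEP)
    return g
```

The co-occurrence learning comes from the fact that every token in an extreme result is nudged together, so a token that keeps appearing alongside known-junk tokens slowly loses goodness even if the user never marks it.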
It's also important to note that there's a lower limit on goodness (actually an upper limit on rating) below which a result will not be used to learn about the network. The reason for this is that a single token with very low goodness can cause a result to be marked as extreme junk, which would then lower the goodness of all the other tokens associated with that result. The highest-goodness tokens (1.0 goodness, 0.0 rating) actually have no effect on a result's rating, since multiplying by 1.0 leaves the product unchanged. Tokens can only have a negative effect, and this limit on learning from the network helps reduce the influence of tokens with extremely low goodness (high rating).
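The sketch below illustrates this lower limit on the junk side only. LEARN_FLOOR, LOW_GOODNESS, and NETWORK_STEP are hypothetical values; the point is just that a result scored below the floor is skipped, so one extreme token can't drag down every innocent token it happens to appear with.

```python
import math

# Assumed values, chosen only for illustration.
LEARN_FLOOR = 0.01    # below this goodness, don't learn from the result at all
LOW_GOODNESS = 0.2    # at or below this goodness, the result counts as junk
NETWORK_STEP = 0.02   # small downward nudge applied to each token

def learn_from_junk(table: dict[str, float], result_tokens: list[str]) -> bool:
    """Nudge tokens down for a junk-looking result unless it falls below the floor.

    Returns True if the table was updated.
    """
    g = math.prod(table.get(t, 1.0) for t in result_tokens)
    if g < LEARN_FLOOR or g > LOW_GOODNESS:
        # Either too extreme to trust, or not junk enough to learn from.
        return False
    for t in result_tokens:
        table[t] = max(0.0, table.get(t, 1.0) - NETWORK_STEP)
    return True
```

With a table of `{"ip:6.6.6.6": 0.001}`, a result tokenized as `["word:holiday", "word:photos", "ip:6.6.6.6"]` scores 0.001, which is under the floor, so "holiday" and "photos" keep their full goodness instead of being penalized for sharing a result with one bad IP.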

