[mdlug] ASSP update...

Carl T. Miller carl at carltm.com
Tue Jan 22 09:49:45 EST 2008


Mark Thuemmel wrote:
> You must have a powerful server.  I'm running one server on a p550 512mb
> and it takes a while and puts a load on it to rebuild the databases.
> I run it once a day and early in the morning before users wake up.

All but one of them are not very powerful.  Most are Pentium IIs
with a gig of RAM.  If the process takes a long time to run, I
decrease the number of files I save in the spam, okmail and
notspam folders.  I also use nice to lower the priority of the
rebuild process.  And I check to make sure the last rebuild
process has finished before starting a new one.
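
For illustration, the "lower the priority and don't overlap runs"
part could be sketched roughly as below.  This is only a sketch under
assumptions, not the script actually in use: the function name
run_exclusive, the lock file path and the rebuild command are all
hypothetical placeholders.

```python
import os
import subprocess


def run_exclusive(cmd, lock_path, niceness=19):
    """Run cmd at lowered priority unless a previous run holds lock_path.

    Returns the command's exit code, or None if the lock file already
    exists (taken here to mean the last run has not finished).
    """
    try:
        # O_EXCL makes the create fail if the lock file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    try:
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        # Raise the child's nice value before it starts, which has the
        # same effect as launching it through the nice command (Unix only).
        return subprocess.call(list(cmd), preexec_fn=lambda: os.nice(niceness))
    finally:
        # Release the lock so the next scheduled run can start.
        os.remove(lock_path)


# Hypothetical usage; the command and lock path are placeholders:
# run_exclusive(["perl", "rebuildspamdb.pl"], "/tmp/rebuild.lock")
```

One caveat with a plain lock file: if the process is killed outright,
the lock stays behind and has to be removed by hand.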

> I'd think it would be better to run the spam/notspam rebuild frequently
> to keep the Bayesian fresh, but in the grand scheme of things, in the
> long run, Bayesian already knows most all the tricks is my experience.
> The more properly your corpus is categorized and the number of examples,
> the better.

Well, that depends on the spam you're receiving.  Currently
someone is sending spam that has a different subject every
time, and the body consists of a single URL which changes
frequently.  The URL is the only thing that distinguishes
these messages, so you continue to receive them until the
first time a new URL is marked as spam and the database is
updated.  That's why I update more frequently rather than
depending on having lots of files in the various folders.
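
A toy model may make the point clearer.  This is not how ASSP scores
mail; it is a deliberately simplified, smoothed per-token count, and
the class name and URLs are made up for the example.  It only shows
that a token never seen before scores neutral until one message
containing it has been trained as spam.

```python
from collections import Counter


class ToyTokenScore:
    """Very simplified per-token spam score (illustration only)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        # Count each token once per message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, token):
        # Add-one smoothing: an unseen token gets exactly 0.5 (neutral).
        s, h = self.spam[token], self.ham[token]
        return (s + 1) / (s + h + 2)


model = ToyTokenScore()
new_url = "http://example.com/abc"      # hypothetical spam URL
before = model.score(new_url)           # neutral: nothing known yet
model.train([new_url], is_spam=True)    # first report, database rebuilt
after = model.score(new_url)            # now leans toward spam
```

Until that first train step happens, nothing in the message counts
against it, which is why a shorter rebuild interval helps here.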

> I was curious why the ls -T.  My man page says -T is expand tabs and
> when I literally type in ls -T I get an error on my Ubuntu 7.10 box.

Oops.  My memory is getting foggy.  I meant -t, which sorts
by modification time, newest first.
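
For anyone who would rather not parse ls output, the same newest-first
ordering can be used to trim a folder down to a fixed number of files.
A minimal sketch, assuming the goal is to keep only the most recent
messages; the function name prune_oldest and the folder and count in
the usage line are hypothetical.

```python
import os


def prune_oldest(folder, keep):
    """Delete all but the `keep` most recently modified files in folder.

    Returns the names of the files that were removed.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    paths = [entry.path for entry in os.scandir(folder) if entry.is_file()]
    # Newest first, the same order "ls -t" lists them in.
    paths.sort(key=os.path.getmtime, reverse=True)
    removed = []
    for path in paths[keep:]:
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed


# Hypothetical usage; folder name and count are placeholders:
# prune_oldest("spam", 2000)
```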

c