[mdlug] ASSP update...
Carl T. Miller
carl at carltm.com
Tue Jan 22 09:49:45 EST 2008
Mark Thuemmel wrote:
> You must have a powerful server. I'm running one server on a p550 512mb
> and it takes a while and puts a load on it to rebuild the databases.
> I run it once a day, early in the morning before users wake up.
All but one are not very powerful. Most are Pentium IIs with a
gig of RAM. If the process takes a long time to run, I decrease
the number of files I save in the spam, okmail and notspam
folders. I also use nice to lower the priority of the rebuild
process. And I check to make sure the last rebuild process has
finished before starting a new one.
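
A minimal sketch of the last two steps (lowered priority, plus a check
that the previous rebuild has finished), not the exact script used here.
The function name run_rebuild and the lock directory path are
hypothetical, and the rebuild command is passed in as arguments so the
sketch makes no assumption about where ASSP is installed.

```shell
#!/bin/sh
# run_rebuild LOCKDIR COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND at the lowest CPU priority unless a previous run
# still holds LOCKDIR.
run_rebuild() {
    lockdir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists,
    # which means an earlier rebuild has not finished yet.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous rebuild still running, skipping" >&2
        return 1
    fi
    # nice -n 19 lowers the priority of the rebuild process.
    nice -n 19 "$@"
    status=$?
    # Release the lock so the next scheduled run can start.
    rmdir "$lockdir"
    return $status
}
```

From cron this might be called as
run_rebuild /tmp/assp-rebuild.lock perl rebuildspamdb.pl
from the ASSP directory (both paths are assumptions). If the rebuild is
killed before it reaches rmdir, the lock directory has to be removed by
hand.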
> I'd think it would be better to run the spam/notspam rebuild frequently
> to keep the Bayesian fresh, but in the grand scheme of things, in the
> long run, my experience is that Bayesian already knows almost all the
> tricks. The better your corpus is categorized and the more examples it
> has, the better.
Well, that depends on the spam you're receiving. Currently
someone is sending spam that has a different subject every
time, and the body consists of a single URL which changes
frequently. The URL is the only thing that distinguishes
these messages, so you continue to receive these messages
until the first time a new URL is marked as spam and the
database is updated. That's why I update more frequently
and don't depend on having lots of files in the various folders.
> I was curious why the ls -T. My man page says -T is expand tabs and
> when I literally type in ls -T I get an error on my Ubuntu 7.10 box.
Oops. My memory is getting foggy. I meant -t, which sorts
based on time.
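
For illustration, one way ls -t can be used to cap the number of files
kept in a corpus folder. This is a sketch only: the function name
prune_oldest is hypothetical, and it assumes file names contain no
newlines.

```shell
#!/bin/sh
# prune_oldest DIR KEEP   (hypothetical helper)
# Deletes all but the KEEP most recently modified files in DIR.
prune_oldest() {
    dir=$1
    keep=$2
    # ls -t lists newest first; tail -n +N starts output at line N,
    # so this yields everything after the first KEEP entries.
    ls -t "$dir" | tail -n +"$((keep + 1))" | while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}
```

For example, prune_oldest spam 5000 would keep the 5000 newest files in
the spam folder (the count is an arbitrary example).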
c