[mdlug] ASSP update...

Mark Thuemmel ldaphelp at thuemmel.com
Mon Jan 21 21:40:48 EST 2008


Carl T. Miller wrote:
> Mark Thuemmel wrote:
>> ASSP has a cronjob that updates once a day.  I was manually running the
>> update if I moved a bunch of files around, but these days it is usually
>> just one or two a month so I don't bother anymore.
> 
> I forgot to mention that I run a cron job that does several things.
> It checks to see if users have moved any messages to a folder named
> "spam" and processes the messages.  It checks how many files are in
> each of the folders for okmail and spam and deletes the oldest ones
> if there are too many.  It also rebuilds the spam database.  I run
> this from 1 to 6 times an hour, depending on the system.
> 
> I also do not let the messages get saved with a number.  That makes
> the subject become the name of the file.  That way I can ls -T to
> see the recent newcomers to okmail and spam and then move any that
> are misidentified.

You must have a powerful server.  I'm running one server on a p550 with
512 MB of RAM, and rebuilding the databases takes a while and puts a
load on it.  I run the rebuild once a day, early in the morning before
users wake up.

You are right about cleaning up older files.  I have been waiting
until the spam rebuild takes forever and then running a find/xargs
command to clean up based on age when necessary, about every 3 months
on my hard drive.  I think last time I set it to run every day and
delete anything over 180 days old.  The drive has not been full lately,
and the rebuild is finished before users get on.
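
For anyone who wants to try the age-based cleanup, here is a minimal
sketch of the find/xargs idea.  It is not my exact command: the function
name prune_old_files and the /usr/local/assp path in the comment are
made up for illustration, and it assumes GNU find and xargs (for
-print0, -0 and -r), as found on a typical Linux box.

```shell
#!/bin/sh
# prune_old_files DIR DAYS
# Delete regular files under DIR last modified more than DAYS days ago.
# Hypothetical helper, not part of ASSP.
prune_old_files() {
    dir=$1
    days=$2
    # -mtime +N matches files whose age in whole days is greater than N.
    # -print0 with xargs -0 keeps names with spaces intact, and
    # xargs -r skips rm entirely when nothing matches.
    find "$dir" -type f -mtime +"$days" -print0 | xargs -0 -r rm -f --
}

# Example call from a daily cron script (the path is an assumption,
# adjust it to wherever your corpus folders live):
#   prune_old_files /usr/local/assp/spam 180
#   prune_old_files /usr/local/assp/notspam 180
```

Run it against a copy of the folder first, or swap "rm -f --" for
"ls -l --" to see what would be deleted.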

I'd think it would be better to run the spam/notspam rebuild frequently
to keep the Bayesian database fresh, but in the grand scheme of things,
my experience is that in the long run the Bayesian filter already knows
most of the tricks.  The more accurately your corpus is categorized and
the more examples it contains, the better.

I would repeat, the biggest thing I did with ASSP to stop spam was
enable the greylist/delaying for 5 minutes.  I had this before with just
Postfix, but left it off while training ASSP.  The reduction was very
noticeable, since few spammers will actually try again after 5 minutes.
I know it sucks to have to wait for strangers to email you, but having
my users email them first whitelists them, so it is no problem unless
the people sending you email don't know what email address they are
using to send mail to you.  For them, I say wait.  Email from strangers
is not considered time sensitive in my world.
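
To make the delaying idea concrete, here is a rough sketch of the
decision a greylister makes.  This is not ASSP's code or its on-disk
format: the function name greylist_check, the tab-separated database
file, and the "ip|sender|recipient" key are all assumptions for
illustration.  The first attempt from an unknown sender is deferred,
and a retry is accepted only once 300 seconds have passed.

```shell
#!/bin/sh
# greylist_check DB KEY NOW
# DB  : file with one "key<TAB>first_seen_epoch" line per sender (assumed format)
# KEY : e.g. "ip|sender|recipient"
# NOW : current time in epoch seconds
# Prints "defer" or "accept".  Illustration only.
greylist_check() {
    db=$1
    key=$2
    now=$3
    delay=300   # 5 minutes

    # Look up when this key was first seen, if ever.
    first=$(awk -F'\t' -v k="$key" '$1 == k { print $2; exit }' "$db" 2>/dev/null)

    if [ -z "$first" ]; then
        # Never seen before: remember it and ask the sender to retry.
        printf '%s\t%s\n' "$key" "$now" >> "$db"
        echo defer
    elif [ $((now - first)) -lt "$delay" ]; then
        # Retried too soon.
        echo defer
    else
        # Waited out the delay, so let the mail through.
        echo accept
    fi
}
```

A real mail server retries on its own after a deferral, so legitimate
mail gets through a few minutes late, while most spam software never
comes back.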


I was curious why the ls -T.  My man page says -T is for expanding tabs,
and when I literally type ls -T I get an error on my Ubuntu 7.10 box.
