[mdlug] ASSP update...
Mark Thuemmel
ldaphelp at thuemmel.com
Mon Jan 21 21:40:48 EST 2008
Carl T. Miller wrote:
> Mark Thuemmel wrote:
>> ASSP has a cronjob that updates once a day. I was manually running the
>> update if I moved a bunch of files around, but these days it is usually
>> just one or two a month so I don't bother anymore.
>
> I forgot to mention that I run a cron job that does several things.
> It checks to see if users have moved any messages to a folder named
> "spam" and processes the messages. It checks how many files are in
> each of the folders for okmail and spam and deletes the oldest ones
> if there are too many. It also rebuilds the spam database. I run
> this from 1 to 6 times an hour, depending on the system.
>
> I also do not let the messages get saved with a number. That makes
> the subject become the name of the file. That way I can ls -T to
> see the recent newcomers to okmail and spam and then move any that
> are misidentified.
You must have a powerful server. I'm running one server on a p550 with
512 MB, and rebuilding the databases takes a while and puts a real load
on it. I rebuild once a day, early in the morning before users wake up.

You are right about cleaning up older files. I had been waiting until
the spam rebuild took forever and then running a find/xargs command to
clean up based on age, which worked out to about every 3 months with my
hard drive. I think the last time, I set it to run every day and delete
anything over 180 days old. The drive has not been full lately, and the
rebuild is finished before users get on.
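
For anyone who wants the same age-based cleanup without find/xargs, here
is a minimal Python sketch of the idea. It is a stand-in, not the actual
command I ran: the function name prune_old_files, its parameters, and
the 180-day default are assumptions for illustration, and it only looks
at files directly inside one folder.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def prune_old_files(folder, max_age_days=180, now=None, dry_run=False):
    """Delete regular files in `folder` last modified over max_age_days ago.

    Hypothetical helper for illustration. Returns the list of paths that
    were removed (or would be removed when dry_run is True).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in Path(folder).iterdir():
        # Only plain files are candidates; skip subdirectories and symlinks.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Calling prune_old_files("/path/to/corpus/spam", dry_run=True) first (the
path is a placeholder) shows what would go before anything is deleted.
Run daily from cron, it keeps the corpus folder from growing to the
point where the rebuild drags.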

I'd think it would be better to run the spam/notspam rebuild frequently
to keep the Bayesian filter fresh, but in the long run, in my
experience, it already knows most of the tricks. The more accurately
your corpus is categorized and the more examples it has, the better.

I would repeat: the biggest thing I did with ASSP to stop spam was
enabling the greylisting/delaying for 5 minutes. I had this before with
just Postfix, but left it off while training ASSP. Turning it back on
made a very noticeable reduction, since few spammers will actually try
again after 5 minutes. I know it sucks to make strangers wait before
their email reaches you, but having my users email them first
whitelists them, so it is no problem unless the people sending you
email don't know what address they are sending from. For them, I say
wait. In my world, email from strangers is not considered time
sensitive.
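
To make the delaying idea concrete, here is a toy Python model of it. It
is not ASSP's or Postfix's implementation: the Greylist class, its
method names, and the choice to key on (client IP, sender, recipient)
are all assumptions for illustration. The first attempt from an unknown
sender gets a temporary failure, a retry after the delay is accepted,
and anyone a local user has already written to skips the delay.

```python
import time


class Greylist:
    """Toy greylist: tempfail unknown senders until the delay has passed."""

    def __init__(self, delay_seconds=300):
        self.delay_seconds = delay_seconds
        # (client_ip, sender, recipient) -> time of the first attempt
        self.first_seen = {}
        # Addresses that local users have sent mail to
        self.whitelist = set()

    def note_outgoing(self, recipient):
        """Record that a local user mailed `recipient`, whitelisting them."""
        self.whitelist.add(recipient.lower())

    def check(self, client_ip, sender, recipient, now=None):
        """Return 'accept' or 'tempfail' for one delivery attempt."""
        now = time.time() if now is None else now
        if sender.lower() in self.whitelist:
            return "accept"
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember the first attempt; later retries are measured against it.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay_seconds:
            return "accept"
        return "tempfail"
```

A real mail server retries after a temporary failure, so legitimate mail
arrives a few minutes late. Spam software that never retries is dropped
without the content filter having to look at the message at all.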

I was curious about the ls -T. My man page says -T is for expanding
tabs, and when I literally type ls -T I get an error on my Ubuntu 7.10
box.