spugspam - a confirmation-based mail filter
spugspam is a flexible sender-confirmation mail filter: it is
designed to fight spammers who use fake e-mail addresses by requiring an
unrecognized sender to confirm that they were, in fact, the sender by replying
to a confirmation message. It is designed to be used with procmail as
part of a more general mail-filtering pipeline.
In its standard mode of operation, spugspam reads an e-mail message
from standard input, evaluates it, marks it with a special header
(X-SpugSpam-State), and writes it back out to standard output.
spugspam is not a delivery agent, nor does it preempt the delivery of
messages: it is a mail filter designed for use with other mail tools, notably
procmail.
As of version 1.1, spugspam supports the Sender Policy Framework (SPF). This
is a DNS convention that allows domains to identify hosts that are permitted to
send mail for them. SPF is an important feature for a confirm-response filter
because it prevents you from issuing confirmation requests to senders who could
not legitimately have sent the message, based on the contents of their SPF
records.
Only check the rules and the whitelists: don't do confirmation processing.
Add all recipients (in the to, cc and bcc headers) to
the whitelist if they are not there already. A recipient will not be added if
they are on the blacklist.
Forces spugspam to accept a state header with a possibly invalid signature.
This allows it to reread a message that was previously processed by spugspam and
has since been altered by some other filtering agent.
Identifies the root directory of all spugspam housekeeping and
configuration files. This defaults to $(HOME)/.spugspam.
Adds the address to the whitelist if it is not already there. Removes it
from the blacklist if it is currently blacklisted.
Adds the address to the blacklist if it is not already there. Removes it
from the whitelist if it is currently whitelisted.
Removes the address from the whitelist.
Removes the address from the blacklist.
Write all of the addresses from the given list to standard output.
list is "white" or "black".
Purges all pending addresses and messages that are older than the given time.
Time can be specified with a suffix of "d" indicating days, "h" indicating
hours, "m" indicating minutes, or "s" indicating seconds. If no suffix is
specified, the value is assumed to be in days.
Write the list of all pending messages (messages awaiting confirmation).
Display a message in the pending message list (ids are on the second line for
each message line displayed by show-pending).
Disables the SPF check. This is intended to be used for situations where
spugspam is invoked multiple times. To permanently disable SPF, use the
option in the config file instead.
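As an illustration of the time format accepted by the purge option above, the
following Python sketch converts such a value into seconds. The function name
parse_age is hypothetical, and the restriction to whole numbers is an
assumption; spugspam's own parsing may differ.

```python
import re

# Seconds per unit, keyed by the suffixes listed in the option description.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(text):
    """Convert an age such as '12h' or '7' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([dhms]?)", text.strip())
    if match is None:
        raise ValueError("bad age: %r" % text)
    count, suffix = match.groups()
    # A value without a suffix is taken to be in days.
    return int(count) * _UNIT_SECONDS[suffix or "d"]
```

With this sketch, "7" and "7d" both mean one week, while "90m" means an hour
and a half.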
spugspam stores all of its configuration and housekeeping files in a
single directory tree which is, by default, $(HOME)/.spugspam. This
can be overridden with the --root-dir option.
This directory contains the following files and subdirectories:
(required) The contents of this file are your "inner key" - this is the value
that is combined with an e-mail address or the entire message text to generate
an MD5 signature that is used to identify a confirmation and verify that no one
has spoofed spugspam's state header.
This file should contain something unique, and its permissions should make it
unreadable by anyone other than the owner. You should not change the value of
this file while there are pending confirmation messages: doing so will leave you
unable to confirm them.
This file contains a list of "open addresses" - if these addresses are
discovered in the "to" header of an incoming message, they are automatically
marked "allow,openaddr". This lets you define special purpose e-mail addresses
where you can receive mail without the benefit of spugspam's filtering.
This is a list of whitelisted addresses. When spugspam encounters a
message from a whitelisted address, it marks the message "allow,whitelist". You
should not edit this file directly: spugspam modifies it, and your changes
could conflict with those of a spugspam instance running at the same time.
Use the command line options ("--whitelist", "--unwhitelist", "--list white")
instead.
This is a list of blacklisted addresses. When spugspam encounters a
message from a blacklisted address, it marks the message "deny,blacklist". As
with whitelist, you should not edit this file directly.
(required) The confirmation message template. This specifies the
confirmation message that will be sent to an unrecognized sender, and it should
generally be of the standard RFC822 message format - consisting of a list of
headers, followed by a blank line, followed by the message body. The original
message will be appended to the end of the confirmation message prior to sending
it, and all variables will be expanded (see Template
Files below).
confmsg can contain the following variables:
The e-mail address of the recipient of the confirmation message (which is the
unrecognized "from" address of the sender of the original message).
The e-mail address that should be used as the "from" address of the
confirmation. This is the first address in the recvaddrs file that is
listed among the recipients of the original message.
This is the spugspam signature, which is used to identify the
confirmation reply. This variable must be included in the message, and
should generally be in both the "Subject" header and the body of the message.
(required) This is the local message delivery script template: it is executed
in the shell to deliver messages queued from a previously unknown sender when a
confirmation reply is received. All variables are expanded prior to execution
(see Template Files below).
deliver can contain the following variables:
The full destination address of the message: this will be the e-mail address
of the first address in the recvaddrs file that is listed in the
recipients for the message.
This is just the destination address with the domain name information
removed: if the destination address is "john@doe.com", the
localDestAddr is simply "john".
This is the spugspam logfile. You may want to take a look at this
every now and then to make sure everything is working ok. You may also want to
add a cron job to archive this, as spugspam itself will continue adding
to it indefinitely.
(required) This is the list of e-mail addresses that it is legal to receive
messages at, separated by whitespace. spugspam determines the message
recipient by scanning an incoming message's recipients for each entry in
recvaddrs until it finds a match, so the designated recipient of an
incoming message will be the first matching entry in recvaddrs.
This is a list of rules that is processed prior to performing any other
checks on the message: it allows you to circumvent the normal processing of a
message if the headers or the body contain a combination of specified regular
expressions. See The Rules File below.
(required) This is the "send message" script template used to send
confirmation messages (see Template Files below).
sender can contain the following variables:
The recipient of the confirmation message (the sender of the original
message).
This is a directory full of files identifying e-mail addresses which are
awaiting confirmation. The name of each file is an e-mail address; its contents
are a list of message file names (relative to the spugspam root directory).
These files hold the messages from that address that will be delivered once a
confirmation is received.
This is a directory where the actual messages which are awaiting confirmation
are stored, one per file. File names are constructed from the date and time
that the message was processed and a serial number.
This file contains a list of regular expressions used to parse the "received"
header on your system. See The Received
Patterns File below.
A general purpose configuration file. This file is directly parsed by the
python interpreter, and contains simple configuration variable definitions. See
The Config File below.
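To illustrate how the recvaddrs file drives the choice of recipient, and how
localDestAddr relates to the destination address, here is a minimal Python
sketch. The function names are hypothetical and the case-insensitive comparison
is an assumption, not necessarily what spugspam does.

```python
def pick_recipient(recvaddrs_text, recipients):
    """Return the first recvaddrs entry found among the message recipients."""
    present = {addr.lower() for addr in recipients}
    # recvaddrs entries are separated by whitespace; file order decides priority.
    for candidate in recvaddrs_text.split():
        if candidate.lower() in present:
            return candidate
    return None


def local_dest_addr(address):
    """Strip the domain information: 'john@doe.com' becomes 'john'."""
    return address.split("@", 1)[0]
```

Because the scan follows the order of recvaddrs rather than the order of the
message's recipients, listing your preferred address first makes it the
destination whenever it appears among the recipients.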
"Template files" can contain variables which are expanded when the files are
used. A variable is specified as a Python formatting sequence:
"%(var-name)s". Since variable expansion is actually
implemented using Python formatting strings, the full power of these forms is
available.
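The expansion can be tried directly in a Python interpreter. The sketch below
uses the to, from and sig variables with made-up values; it shows only the
formatting mechanism, not spugspam's own code.

```python
# A two-line template fragment in the style of a confirmation message.
template = "To: %(to)s\nSubject: Confirmation request %(sig)s\n"

# Made-up values; extra keys that the template does not use are ignored.
values = {
    "to": "stranger@example.com",
    "from": "me@example.com",
    "sig": "abc123",
}

expanded = template % values

# A literal percent sign in a template must be doubled.
literal = "100%% yours, %(from)s" % values
```

A template that names a variable missing from the mapping raises KeyError, so
a misspelled variable name shows up as an error rather than as silent empty
text.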
The rules file contains a set of rules. A rule consists of a sequence of
entries identifying matching criteria followed by an "action" entry identifying
actions to take on the message if all of the criteria are met.
Entries have an appearance similar to rfc822 headers. They must each be on
their own line and they each consist of a tag followed by a colon and then a
value which is a regular expression to be matched against a line in the message.
All regular expressions are matched case-insensitively, and they begin matching
at the beginning of the line - to match elsewhere in the line, start the
expression with ".*".
You may include comments in the file by starting a line with a '#', but a
comment must be on its own line and must be the only thing on that line - you
cannot mix comments and rule lines.
The following entry headers are supported:
Identifies a regular expression to be checked against a header
Identifies a regular expression to be checked against each line of the
message body (including any MIME attachments)
Indicates an action to be taken if all of the preceding tests succeed.
Actions are:
causes the message to be marked with the state "allow,rule-match" and aborts
all further processing.
causes the message to be marked "deny,rule-match", also aborting all further
processing.
has the same effect as "allow", but also adds the sender address to the
whitelist (if it is not already there).
Treats the message as a "control request", writing a signed instruction sheet
to standard output instead of the message body. See Control Messages.
Treats the message as if it were a signed "control command". Evaluates
commands in the body and writes their output to standard output instead of the
message body.
A rule consists of a sequence of one or more header and body
entries followed by a single action entry.
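The rule format described above can be sketched in Python as follows. This is
an illustration only: parse_rules and first_action are hypothetical names, and
the choice that the first matching rule wins is an assumption based on the
fact that the actions abort further processing.

```python
import re


def parse_rules(text):
    """Parse rules-file text into a list of (criteria, action) pairs."""
    rules, criteria = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments occupy a whole line
        tag, _, value = line.partition(":")
        tag, value = tag.strip().lower(), value.strip()
        if tag == "action":
            if not criteria:
                raise ValueError("action without any header or body entry")
            rules.append((criteria, value))
            criteria = []
        elif tag in ("header", "body"):
            # Expressions are case-insensitive.
            criteria.append((tag, re.compile(value, re.IGNORECASE)))
        else:
            raise ValueError("unknown entry: %r" % line)
    return rules


def first_action(rules, header_lines, body_lines):
    """Return the action of the first rule whose criteria all match, else None."""
    lines = {"header": header_lines, "body": body_lines}
    for criteria, action in rules:
        # re.match anchors each expression at the beginning of a line.
        if all(any(rx.match(line) for line in lines[tag]) for tag, rx in criteria):
            return action
    return None
```

Each criterion has to match at least one line of its kind, and all criteria of
a rule have to be satisfied before its action is returned.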
The following rule file might be used to allow you to receive mailings from
the "spugspam" and "SpamAssassin" mailing lists, and to block chain letters from
a particularly annoying relative:
SPF checking uses the "received" headers to determine where the message was
sent from. "Received" is a header added by MTA's to record the routing of the
message. Unfortunately, its form is not entirely consistent across different
MTA's. Fortunately, you only need to be concerned with the MTA's that you have
control over: those between you and the outside world.
Received headers are parsed from top to bottom in a message. Generally there
are one or more received headers that you want to ignore (the "local" headers)
followed by one header that you're very interested in (the "remote" header, the
address of the foreign host that actually is sending the e-mail). Local headers
are inserted by your delivery agent and any MTA's that are either within your
domain or allowed to forward to your domain. Remote headers are inserted by the
outermost trusted MTA, and these contain the address information that you want
to do SPF verification on.
The recvdpat file has two keywords to accommodate these different kinds of
headers: "local" and "remote". Each is specified on its own line and is followed
by a colon and the regular expression that you wish to match. Trailing and
leading whitespace is ignored.
The local keyword may also include an integer indicating the maximum number
of repetitions of the header to match, or an asterisk indicating that an
unlimited number of repetitions may be matched. If neither is specified, the
pattern will match at most one repetition. It is always possible for a pattern
to be ignored if it does not match: there is no way to indicate that a pattern
must match some minimum number of headers.
There can be only one instance of the remote keyword, and it must follow all
"local" lines.
As an example, let's say that your mail is received by myhost.com, and you
also have a forwarding account on friend.com. Your recvdpat file might
look something like this:
There are two special groups in the remote expression,
(?P<ip>[^)]+) and (?P<host>[^)]+). These match the
IP address and host name (HELO/EHLO domain name, actually) of the remote
address. These must be present in any remote pattern that you supply:
their values are used as arguments to the SPF check. (They need not match such
lame expressions, though; (?P<ip>\d+(\.\d+){3}) might be more
appropriate for an IP address.)
If friend.com had a number of internal relay hosts, we might want to change
the second rule to look more like this:
As stated earlier, the "local*" usage allows an unlimited number of matches.
If we wanted to limit this to, say 3 matches, we could have used "local3"
instead.
It is completely legal for a message to contain only local received headers:
in this case it will be assumed that the message originated locally and no SPF
check will be performed.
Certain configuration options to spugspam are specified in the config
file (named config). If it is present, config is parsed and
executed by the python interpreter, so you have the full power of the
programming language in it. That said, it only supports two variables at this
time, so doing anything fancy with it probably doesn't make very much sense.
The variables supported in config are:
[boolean, default = True] Turns on the SPF check.
It is rather inconsiderate to disable SPF checking, because it causes
confirmation messages to be sent for spams that could not have legitimately come
from the sender.
[int, default = 5] Number of seconds before timeout on an SPF check. Since
SPF records are served from the domain's nameserver, this is essentially a DNS
timeout.
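A loader for such a file could look like the following sketch. The function
name load_config is hypothetical; the variable names spf_enabled and
spf_timeout and their defaults are the ones documented above. This is an
illustration of the idea, not spugspam's own loader.

```python
# Documented defaults for the two supported variables.
DEFAULTS = {"spf_enabled": True, "spf_timeout": 5}


def load_config(source):
    """Execute config-file source and return the supported variables."""
    namespace = {}
    # The file is ordinary Python, so it is simply executed.
    exec(source, namespace)
    return {name: namespace.get(name, default)
            for name, default in DEFAULTS.items()}
```

Because the file is executed as code, it should be writable only by its owner,
just like the rest of the configuration directory.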
Example config file:
After analyzing a message, spugspam brands the message with a
message state. The message state is stored in the
X-SpugSpam-State header, followed by an md5 signature created from the
rest of the message and the user's inner key:
The md5 signature is very important because spugspam reads this state
header looking for information conveyed from a previous spugspam instance
- without the signature, a spammer could simply add an X-SpugSpam-State:
allow header and get a free pass through the system.
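To make the idea of the signature concrete, here is a minimal Python sketch
that signs a state with an inner key. The exact way spugspam combines the key,
the state and the message is not documented here, so the concatenation below
is an assumption, as are the function names. The Base64 encoding is also an
assumption, although a Base64-encoded MD5 digest is 24 characters long and
ends in "==", which is the shape of the signatures that appear in the header.

```python
import base64
import hashlib
import hmac


def sign(inner_key, state, message):
    """Return a Base64 MD5 signature over key, state and message (assumed layout)."""
    digest = hashlib.md5(inner_key + state.encode("ascii") + message).digest()
    return base64.b64encode(digest).decode("ascii")


def state_header(inner_key, state, message):
    """Build the header line: state, a colon, then the signature."""
    return "X-SpugSpam-State: %s:%s" % (state, sign(inner_key, state, message))


def verify(inner_key, state, message, signature):
    """Check a signature without leaking timing information."""
    expected = sign(inner_key, state, message)
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))
```

Anyone who does not know the inner key cannot produce a signature that
verifies, which is why the innerkey file must stay private.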
Other programs on the mail pipeline (e.g. procmail) should scan for
the state header and make a decision as to what to do with the message based on
it. In general, you want to deliver messages with an "allow" state, ignore
messages with a "deny" or "unrecognized" state, and possibly do special stuff
with the others.
Some of the states are followed by a comma and a substate: the substate gives
more information as to how the message state was determined.
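A program further down the pipeline could split the header into its parts as
in this Python sketch. The function name parse_state is hypothetical; the
layout (state, optional comma and substate, then a colon and the signature)
follows the form of the header described above.

```python
def parse_state(header_line):
    """Split an X-SpugSpam-State header into (state, substate, signature)."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "x-spugspam-state":
        raise ValueError("not a state header: %r" % header_line)
    value = value.strip()
    if ":" in value:
        # The signature follows the last colon.
        state_part, _, signature = value.rpartition(":")
    else:
        state_part, signature = value, None
    state, _, substate = state_part.partition(",")
    return state, substate or None, signature
```

Note that this only parses the header; it does not verify the signature, which
requires the inner key.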
This is the set of all message states:
A state header was found with a bad signature. This can mean one of three
things:
The message has been modified since the last time that spugspam processed it.
The user's inner key has changed.
Someone has tried to circumvent spugspam.
The message sender is unrecognized, so a confirmation has been sent.
The message is a confirmation reply. You may choose to filter these out or
deliver them depending on whether or not you are interested in seeing
confirmation replies.
Allow the message to go through. "substate" is a reason code. Known reason
codes are:
sender is whitelisted
confirmation was received. You will see these when the initial confirmation
is received for all messages queued for the sender. After this, the sender is
added to the whitelist and messages from them will be marked "allow,whitelist".
receive address was in the "allowrecv" file
the message is interesting enough to merit investigation by the programmer.
This was used initially to investigate the various "bounce" formats used by
mailer daemons. You should never receive it.
A rule-file rule matched with an action of "allow"
deny the message. "substate" is a reason code. Reason codes are:
sender is blacklisted
A rule-file rule matched with an action of "deny"
The message failed an SPF check: the host that provided it is not authorized
to send messages for the domain of the "from" address.
Indicates that the "from" address is invalid (i.e. a message to that address
bounced).
error occurred during message processing. Hopefully this will be very rare.
This is an internal state used to indicate that the message should not be
marked with state information
the message is a control message
This is an internal state used to track the fact that a state has not been
determined. You should never see this state in a message that has been
processed.
In addition to command line options, spugspam supports "control
messages" as a management technique. These are messages which spugspam
recognizes as containing control information.
There are two kinds of control messages, a control request and a
control command. A control command is a message containing commands to
be executed. A control request performs no actions of its own, but it replaces
the body of the input message with a body containing a special signature
(actually, a signed timestamp) and instructions listing all of the available
commands. The recipient edits this, inserting actual commands to be executed,
and sends back a reply which is a control command.
The rationale for splitting control messages into this request/command set
is:
To allow the user to get help text (specifically, to see the commands
available)
To prevent an attacker from submitting controls to someone else's account. To
be accepted as a control command, a message must have a signature generated as
the result of a control request message (or match a rule specifying a
control-command action).
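The protection offered by a signed timestamp can be sketched in Python as
follows. Everything here is an assumption made for illustration: the token
layout (timestamp, colon, signature), the function names, and the idea that a
token expires after max_age seconds are not taken from spugspam itself.

```python
import base64
import hashlib
import hmac
import time


def make_token(inner_key, now=None):
    """Return 'timestamp:signature', signed with the inner key (assumed layout)."""
    stamp = str(int(time.time() if now is None else now))
    digest = hashlib.md5(inner_key + stamp.encode("ascii")).digest()
    return stamp + ":" + base64.b64encode(digest).decode("ascii")


def check_token(inner_key, token, max_age, now=None):
    """Accept a token only if it is correctly signed and not too old."""
    stamp, _, _signature = token.partition(":")
    if not (stamp.isascii() and stamp.isdigit()):
        return False
    # Recompute the token for the claimed timestamp and compare in full.
    expected = make_token(inner_key, int(stamp))
    if not hmac.compare_digest(expected.encode("ascii"), token.encode("utf-8")):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(stamp) <= max_age
```

An attacker who has never received a control request reply cannot forge such a
token without the inner key, and altering the timestamp invalidates the
signature.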
There are two ways of causing a message to be treated as a control message:
This should probably be done if you have a dedicated address for receiving
control messages. In this case, you can set up a separate pipeline that runs
spugspam with this option.
This allows you to control the system through special markups in your e-mail.
For example, if you wanted to do direct control without having to first do a
request, you could use something like the following rule:
The text identified in the body header would serve as an indicator
that you were sending a command message and the password would protect your
account from control by an attacker. The pound sign ("#") at the beginning
would be useful in this context because spugspam ignores "#" commented
lines when processing a command message.
First of all, you'll need a fairly recent version of the Python interpreter.
spugspam was developed on Python 2.2, and has been tested on Python 2.3
and 2.4.
Edit the spugspam script so that the first line contains an acceptable
method of bootstrapping your local python interpreter. Now copy the script to
some place on your $PATH ( is the recommended location).
You should probably be able to run the tester script at this point
and see "0 tests failed" at the end of it (no guarantees here, as tester is a
shell script and I am not certain of its portability).
Create a ".spugspam" directory under your home directory (you can put it
somewhere else if you want to, if you do be sure to use the "--root-dir" option
to identify this location in the places where you run spugspam).
Create all of the required files identified in Configuration Files in your .spugspam
directory. Examples follow:
The innerkey file need only consist of a string of data that is not
easily guessed. If you are particularly paranoid, and have on your
system, you might want to just do this:
Alternately, creating the file with a unique passphrase in your favorite
editor should work just as well.
Be sure that your innerkey file is not world readable so that nobody can get
your inner key and use it to trick their way through the system. In fact, you
might want to make your entire .spugspam directory unreadable, as it is
likely to contain some of your e-mail at various points in time.
The "confmsg" file is just a template for your confirmation message. An
example follows:
The deliver file usually just invokes your mail delivery agent. Be
warned that if you use procmail as a delivery agent, and are running
spugspam from procmail, you will get a nasty hang if spugspam is
invoked recursively - your best bet is probably to specify an alternate
.procmailrc file. Given this, your deliver file would look
something like this:
The sender file should normally just invoke your mail transfer
agent. If you are using sendmail, it would look something like this:
And that's all. If you have special needs, you can set up the other files as
well.
If you are using spugspam with procmail, you will want to invoke it
from your .procmailrc file and filter its messages afterwards.
In the most simple case, your .procmailrc file should look something
like this:
This example assumes that you trust spugspam implicitly: you receive only the
messages that it allows, and everything else goes into the bit-bucket.
In reality, you probably want to just use spugspam to mark messages
for a week or two so you can see the results.
If you are using spugspam with other mail filters that modify the
message (e.g. SpamAssassin), you may want to run spugspam twice:
Michael A. Muller
Report bugs to mmuller@enduden.com
Name
Synopsis
spugspam [options]
Description
Options
Configuration Files
Template Files
The Rules File
Example
# allow all messages from the spugspam list
header: list-id:.*spugspam
action: allow
# allow all messages from the SpamAssassin list
header: list-id:.*spamassassin
action: allow
# ignore cousin Fred's silly chain letters
header: from:.*cuzinfred@hotmail.com
body: .*send this message to (3|three) people
action: deny
# all messages from me with a special body signature are control
# messages - evaluate commands in their body
header: from:.*myaddress@myhost.com
body: password=z3cr3t
action: control-command
The Received Patterns File
# match for your MTA
local: \(delivered to myaccount by foomailer\)
# optional match for messages received from friend.com
local: from .*\.friend\.com \(192\.168\..*\) \(HELO friend\.com\)
# match for external hosts
remote: from .* \((?P<ip>[^)]+)\) \(HELO (?P<host>[^)]+)\)
# optional match for messages received from friend.com - relayed by any
# number of hosts within friend.com.
local*: from .*\.friend\.com \(192\.168\..*\) \(HELO friend\.com\)
The Config File
# enable SPF (which is the default anyway)
spf_enabled = True
# ten second timeout
spf_timeout = 10
Message States
From: "Test Dude" <test1@bogus.com>
To: mmuller@enduden.com
Subject: test message
X-SpugSpam-State: allow,whitelist:vKsIrl5UACqA9Xr5giyuqA==
This is a test message
Control Messages
body: ^#Spugspam password=123doremi
action: control-command
Installation
Building your Configuration Directory
(umask 077; head -c 64 </dev/urandom >~/.spugspam/innerkey)
From: %(from)s
To: %(to)s
Subject: Confirmation request %(sig)s
Hi, this is a one-time confirmation message to verify that you are the
sender of the message below. If you are, please reply with the following
text in the subject or body of the message:
%(sig)s
In most cases, just hitting "reply" should work.
Thank you,
procmail /home/myname/.procmailrc-inner
/usr/sbin/sendmail %(to)s
Modifying .procmailrc
:0fw
| spugspam
:0
* ^X-SpugSpam-State: allow
mymailfile
:0
/dev/null
# only do the basic checks of the rules and white/black lists
:0fw
| spugspam --check-lists
# anything that spugspam recognizes gets to go right on through
:0
* ^X-SpugSpam-State: allow
mymailfile
# run it through another spam filter
:0fw
| spamc
# filter out stuff that other spam filter marked as spam
:0
* x-spam-state: SPAM
/dev/null
# --force-accept-state makes it read the state information from the first
# pass even though the message may have been modified.
:0fw
| spugspam --force-accept-state
Author
Reporting Bugs