spugspam 1 November 28, 2005

Name

spugspam - a confirmation-based mail filter

Synopsis

   spugspam [options]

Description

spugspam is a flexible sender-confirmation mail filter: it is designed to fight spammers who use fake e-mail addresses by requiring an unrecognized sender to confirm that they were, in fact, the sender by replying to a confirmation message. It is designed to be used with procmail as part of a more general mail-filtering pipeline.

In its standard mode of operation, spugspam reads an e-mail message from standard input, evaluates it, marks it with a special header (X-SpugSpam-State), and writes it back out to standard output.

spugspam is not a delivery agent, nor does it preempt the delivery of messages: it is a mail filter designed for use with other mail tools, notably procmail.

As of version 1.1, spugspam supports the Sender Policy Framework (SPF). This is a DNS convention that allows domains to identify hosts that are permitted to send mail for them. SPF is an important feature for a confirm-response filter because it prevents you from issuing confirmation requests to senders who could not legally have sent the message based on the contents of their SPF records.

Options

--check-lists

Only check the rules and the whitelists: don't do confirmation processing.

--whitelist-recipients

Adds all recipients (in the to, cc and bcc headers) to the whitelist if they are not there already. A recipient will not be added if it is on the blacklist.

--force-accept-state

Forces spugspam to accept a state header with a possibly invalid signature. This allows it to reread a message that was previously processed by spugspam and has since been altered by some other filtering agent.

--root-dir directory

Identifies the root directory of all spugspam housekeeping and configuration files. This defaults to $(HOME)/.spugspam.

--whitelist address

Adds the address to the whitelist if it is not already there. Removes it from the blacklist if it is currently blacklisted.

--blacklist address

Adds the address to the blacklist if it is not already there. Removes it from the whitelist if it is currently whitelisted.

--unwhitelist address

Removes the address from the whitelist.

--unblacklist address

Removes the address from the blacklist.

--list list-name

Writes all of the addresses from the given list to standard output. list-name is "white" or "black".

--purge time

Purges all pending addresses and messages that are older than the given time. Time can be specified with a suffix of "d" indicating days, "h" indicating hours, "m" indicating minutes, or "s" indicating seconds. If no suffix is specified, the value is assumed to be in days.

--show-pending

Writes the list of all pending messages (messages awaiting confirmation).

--show-msg message-id

Displays a message from the pending message list (ids are on the second line of each message entry displayed by --show-pending).

--no-spf

Disables the SPF check. This is intended to be used for situations where spugspam is invoked multiple times. To permanently disable SPF, use the spf_enabled option in the config file instead.
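
The time format accepted by --purge can be illustrated with a short Python sketch. This is not spugspam's own code: the function name parse_age is hypothetical, the sketch assumes whole-number values, and it is written for a current Python 3 interpreter.

```python
# Hypothetical helper: convert a --purge style time into seconds.
# Suffixes follow the description above: d, h, m, s; no suffix means days.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(spec):
    spec = spec.strip()
    if spec and spec[-1] in UNIT_SECONDS:
        # Assumption: the numeric part is a whole number.
        return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]
    # No suffix: the value is taken to be in days.
    return int(spec) * UNIT_SECONDS["d"]
```

With this reading, "12h" and "7" stand for twelve hours and seven days respectively.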

Configuration Files

spugspam stores all of its configuration and housekeeping files in a single directory tree which is, by default, $(HOME)/.spugspam. This can be overridden with the --root-dir option.

This directory contains the following files and subdirectories:

innerkey

(required) The contents of this file are your "inner key" - this is the value that is combined with an e-mail address or the entire message text to generate an MD5 signature that is used to identify a confirmation and verify that no one has spoofed spugspam's state header.

This file should contain something unique, and its permissions should make it unreadable by anyone other than the owner. You should not change the value of this file while there are pending confirmation messages: doing so will leave you unable to confirm them.

allowrecv

This file contains a list of "open addresses" - if these addresses are discovered in the "to" header of an incoming message, they are automatically marked "allow,openaddr". This lets you define special purpose e-mail addresses where you can receive mail without the benefit of spugspam's filtering.

whitelist

This is a list of whitelisted addresses. When spugspam encounters a message from a whitelisted address, it marks the message "allow,whitelist". You should not edit this file directly: spugspam modifies it, and your changes could collide with those of a running spugspam instance. Use the command line options ("--whitelist", "--unwhitelist", "--list white") instead.

blacklist

This is a list of blacklisted addresses. When spugspam encounters a message from a blacklisted address, it marks the message "deny,blacklist". As with whitelist, you should not edit this file directly.

confmsg

(required) The confirmation message template. This specifies the confirmation message that will be sent to an unrecognized sender, and it should generally be of the standard RFC822 message format - consisting of a list of headers, followed by a blank line, followed by the message body. The original message will be appended to the end of the confirmation message prior to sending it, and all variables will be expanded (see Template Files below).

confmsg can contain the following variables:

to

The e-mail address of the recipient of the confirmation message (which is the unrecognized "from" address of the sender of the original message).

from

The e-mail address that should be used as the "from" address of the confirmation. This is the first address in the recvaddrs file that is listed among the recipients of the original message.

sig

This is the spugspam signature, which is used to identify the confirmation reply. This variable must be included in the message, and should generally be in both the "Subject" header and the body of the message.

deliver

(required) This is the local message delivery script template: it is a single command executed to deliver messages queued from a previously unknown sender when a confirmation reply is received. All variables are expanded prior to execution (see Command Files below).

deliver can contain the following variables:

destAddr

The full destination address of the message: this will be the e-mail address of the first address in the recvaddrs file that is listed in the recipients for the message.

localDestAddr

This is just the destination address with the domain name information removed: if the destination address is "john@doe.com", the localDestAddr is simply "john".

log

This is the spugspam logfile. You may want to take a look at this every now and then to make sure everything is working ok. You may also want to add a cron job to archive this, as spugspam itself will continue adding to it indefinitely.

recvaddrs

(required) This is the list of e-mail addresses that it is legal to receive messages at, separated by whitespace. spugspam determines the message recipient by scanning an incoming message's recipients for each entry in recvaddrs until it finds a match, so the designated recipient of an incoming message will be the first matching entry in recvaddrs.

rules

This is a list of rules that is processed prior to performing any other checks on the message: it allows you to circumvent the normal processing of a message if the headers or the body contain a combination of specified regular expressions. See The Rules File below.

sender

(required) This is the "send message" script template used to send confirmation messages (see Command Files below).

sender can contain the following variables:

to

The recipient of the confirmation message (the sender of the original message).

pending

This is a directory full of files identifying e-mail addresses which are awaiting confirmation. The name of each file is an e-mail address; its contents are a list of message file names (relative to the spugspam root directory) for the stored messages from that address, which are delivered when a confirmation is received.

queue

This is a directory where the actual messages which are awaiting confirmation are stored, one per file. File names are constructed from the date and time that the message was processed and a serial number.

recvdpat

This file contains a list of regular expressions used to parse the "received" header on your system. See The Received Patterns File below.

config

A general purpose configuration file. This file is directly parsed by the python interpreter, and contains simple configuration variable definitions. See The Config File below.
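
The recipient selection described for recvaddrs above (first matching entry wins) might be sketched as follows. The function name pick_recipient is hypothetical, and the case-insensitive comparison is an assumption, not something the description states.

```python
# Hypothetical sketch of "first matching entry in recvaddrs wins".
def pick_recipient(recvaddrs_text, message_recipients):
    # Assumption: addresses are compared case-insensitively.
    wanted = {addr.lower() for addr in message_recipients}
    # recvaddrs entries are separated by whitespace, in priority order.
    for addr in recvaddrs_text.split():
        if addr.lower() in wanted:
            return addr
    return None
```

The order of entries in recvaddrs therefore matters when a message is addressed to more than one of your addresses.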

Template Files

"Template files" can contain variables which are expanded when the files are used. A variable is specified as a Python formatting sequence: "%(var-name)s". Since variable expansion is actually implemented using Python formatting strings, the full power of these forms is available.

Command Files

"Command files" are similar to template files, only they are used to invoke a single command and they use a stripped-down shell syntax. The contents of a command file are a sequence of words separated by whitespace. Words can be any of:

A single-quoted string (example 'this is a string')
A double-quoted string (example "this is also a string")
Any sequence of non-whitespace characters not beginning with a single or double quote (example this-is-a-word).

Note that single and double quotes are ignored anywhere but the beginning of a word, so the text this-is-"a command" would expand to two words: 'this-is-"a' and 'command"', not 'this-is-a command'.

Like template files, command files can also include Python formatting sequences. They are expanded in unquoted words and in double-quoted strings, but not in single-quoted strings.

In general, single quoted and double-quoted strings behave very much like their shell counterparts: double-quoted strings support the full set of C-style escape characters (including hex and octal character representations), single quoted strings support only "'" and "\".

So, putting it all together, we might have a deliver script that looks something like this:

   /usr/bin/weird-deliver-program "--deliver-to=%(localDestAddr)s"
      '--extra-text=here is some extra text'
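
The word-splitting and expansion rules above can be approximated in a few lines of Python. This is a simplified sketch, not spugspam's parser: the name split_command is hypothetical, and escape sequences inside quoted strings are deliberately not handled.

```python
import re

# A word is a single-quoted string, a double-quoted string, or a run of
# non-whitespace characters. Quotes only count at the start of a word.
# Simplification: backslash escapes inside quotes are NOT processed.
WORD = re.compile(r"""'([^']*)'|"([^"]*)"|(\S+)""")


def split_command(text, variables):
    words = []
    for match in WORD.finditer(text):
        single, double, bare = match.groups()
        if single is not None:
            words.append(single)              # single quotes: no expansion
        elif double is not None:
            words.append(double % variables)  # double quotes: expanded
        else:
            words.append(bare % variables)    # unquoted word: expanded
    return words
```

Applied to the deliver example above with localDestAddr set to "john", this yields three words: the program path, "--deliver-to=john", and the unexpanded extra-text argument.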

The Rules File

The rules file contains a set of rules. A rule consists of a sequence of entries identifying matching criteria followed by an "action" entry identifying actions to take on the message if all of the criteria are met.

Entries have an appearance similar to RFC822 headers. They must each be on their own line and they each consist of a tag followed by a colon and then a value which is a regular expression to be matched against a line in the message. All regular expressions are matched case-insensitively, and they begin matching at the beginning of the line - to match elsewhere in the line, start the expression with ".*".

You may include comments in the file by starting a line with a '#', but a comment must be on its own line and must be the only thing on that line - you can not mix comments and rule lines.

The following entry headers are supported:

header

Identifies a regular expression to be checked against a header

body

Identifies a regular expression to be checked against each line of the message body (including any MIME attachments)

action

Indicates an action to be taken if all of the preceding tests succeed. Actions are:

allow

causes the message to be marked with the state "allow,rule-match" and aborts all further processing.

deny

causes the message to be marked "deny,rule-match", also aborting all further processing.

whitelist

has the same effect as "allow", but also adds the sender address to the whitelist (if it is not already there).

control-request

Treats the message as a "control request", writing a signed instruction sheet to standard output instead of the message body. See Control Messages.

control-command

Treats the message as if it were a signed "control command". Evaluates commands in the body and writes their output to standard output instead of the message body.

A rule consists of a sequence of one or more header and body entries followed by a single action entry.

Example

The following rule file might be used to allow you to receive mailings from the "spugspam" and "SpamAssassin" mailing lists, and to block chain letters from a particularly annoying relative:


   # allow all messages from the spugspam list
   header: list-id:.*spugspam
   action: allow
   
   # allow all messages from the SpamAssassin list
   header: list-id:.*spamassassin
   action: allow
   
   # ignore cousin Fred's silly chain letters
   header: from:.*cuzinfred@hotmail.com
   body: .*send this message to (3|three) people
   action: deny
   
   # all messages from me with a special body signature are control
   # messages - evaluate commands in their body
   header: from:.*myaddress@myhost.com
   body: password=z3cr3t
   action: control-command
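
To make the matching semantics concrete, here is a rough Python sketch of how a rules file like the one above could be parsed and evaluated. The function names parse_rules and first_action are hypothetical, and the sketch covers only what this section describes: case-insensitive matching anchored at the start of each line, with every criterion required to match at least one line.

```python
import re


def parse_rules(text):
    """Return a list of (criteria, action) pairs from rules-file text."""
    rules, criteria = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        tag, _, value = line.partition(":")
        tag, value = tag.strip().lower(), value.strip()
        if tag == "action":
            rules.append((criteria, value))  # an action closes the rule
            criteria = []
        else:
            criteria.append((tag, re.compile(value, re.IGNORECASE)))
    return rules


def first_action(rules, header_lines, body_lines):
    """Return the action of the first rule whose criteria all match."""
    for criteria, action in rules:
        def met(tag, pattern):
            lines = header_lines if tag == "header" else body_lines
            # re.match anchors at the beginning of the line.
            return any(pattern.match(line) for line in lines)
        if all(met(tag, pattern) for tag, pattern in criteria):
            return action
    return None
```

Note that a body pattern without a leading ".*" only matches text at the very start of a body line, which is why the chain-letter rule begins with ".*".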

The Received Patterns File

SPF checking uses the "received" headers to determine where the message was sent from. "Received" is a header added by MTAs to record the routing of the message. Unfortunately, its form is not entirely consistent across different MTAs. Fortunately, you only need to be concerned with the MTAs that you have control over: those between you and the outside world.

Received headers are parsed from top to bottom in a message. Generally there are one or more received headers that you want to ignore (the "local" headers) followed by one header that you're very interested in (the "remote" header, the address of the foreign host that actually is sending the e-mail). Local headers are inserted by your delivery agent and any MTAs that are either within your domain or allowed to forward to your domain. Remote headers are inserted by the outermost trusted MTA, and these contain the address information that you want to do SPF verification on.

The recvdpat file has two keywords to accommodate these different kinds of headers: "local" and "remote". Each is specified on its own line and is followed by a colon and the regular expression that you wish to match. Trailing and leading whitespace is ignored.

The local keyword may also include an integer indicating the maximum number of repetitions of the header to match, or an asterisk indicating that an unlimited number of repetitions may be matched. If neither is specified, the pattern will match at most one repetition. It is always possible for a pattern to be ignored if it does not match: there is no way to indicate that a pattern must match some minimum number of headers.

You can have as many "remote" keywords as you like. They must follow all "local" lines, and the program will stop processing received headers at the first matching "remote" line. If no remote rules are specified, two common styles of received header are used as defaults:

   remote: from \S+ \((?P<ip>\d+(\.\d+){3})\) \(HELO (?P<host>\S+)\)
   remote: from (?P<host>\S+) \(\[(?P<ip>\d+(\.\d+){3})\]\)

As an example, let's say that your mail is received by myhost.com, and you also have a forwarding account on friend.com. Your recvdpat file might look something like this:

   # match for your MTA
   local: \(delivered to myaccount by foomailer\)

   # optional match for messages received from friend.com
   local: from .*\.friend\.com \(192\.168\..*\) \(HELO friend\.com\)

   # match for external hosts - first match will be used
   remote: from .* \((?P<ip>[^)]+)\) \(HELO (?P<host>[^)]+)\)
   remote: from (?P<host>\S+) \(\[(?P<ip>\d+(\.\d+){3})\]\)

There are two special groups in the remote expression, (?P<ip>[^)]+) and (?P<host>[^)]+). These match the IP address and host name (HELO/EHLO domain name, actually) of the remote address. These must be present in any remote pattern that you supply: their values are used as arguments to the SPF check. (They need not match such loose expressions, though: (?P<ip>\d+(\.\d+){3}) might be more appropriate for an IP address.)
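
The named groups can be tried out directly in Python before a pattern goes into recvdpat. The sketch below uses the second default remote pattern; the header text and the host and address in it are made-up illustrations.

```python
import re

# Second default "remote" pattern from this section.
REMOTE = re.compile(r"from (?P<host>\S+) \(\[(?P<ip>\d+(\.\d+){3})\]\)")

# Illustrative header value only; real Received headers vary by MTA.
header = "from mail.example.org ([192.0.2.10]) by myhost.com"

# The pattern is anchored at the start of the header value.
m = REMOTE.match(header)
if m:
    print(m.group("host"), m.group("ip"))
```

If a pattern you write leaves either group unmatched for your MTA's headers, it will not supply the values the SPF check needs.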

If friend.com had a number of internal relay hosts, we might want to change the second rule to look more like this:

   # optional match for messages received from friend.com - relayed by any
   # number of hosts within friend.com.
   local*: from .*\.friend\.com \(192\.168\..*\) \(HELO friend\.com\)

As stated earlier, the "local*" usage allows an unlimited number of matches. If we wanted to limit this to, say 3 matches, we could have used "local3" instead.

It is completely legal for a message to contain only local received headers: in this case it will be assumed that the message originated locally and no SPF check will be performed.

The Config File

Certain configuration options to spugspam are specified in the config file (named config). If it is present, config is parsed and executed by the python interpreter, so you have the full power of the programming language in it. That said, it only supports two variables at this time, so doing anything fancy with it probably doesn't make very much sense.

The variables supported in config are:

spf_enabled

[boolean, default = True] Turns on the SPF check.

It is rather inconsiderate to disable SPF checking, because it causes confirmation messages to be sent for spams that could not have legitimately come from the sender.

spf_timeout

[int, default = 5] Number of seconds before timeout on an SPF check. Since SPF records are served from the domain's nameserver, this is essentially a DNS timeout.

Example config file:

   # enable SPF (which is the default anyway)
   spf_enabled = True

   # ten second timeout
   spf_timeout = 10
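
Because config is ordinary Python, reading it amounts to executing the file and picking out the known variables. The following is only a sketch of that idea under stated assumptions: load_config and DEFAULTS are hypothetical names, and spugspam's actual loading code may differ.

```python
# Defaults as documented above.
DEFAULTS = {"spf_enabled": True, "spf_timeout": 5}


def load_config(source):
    """Execute config text and return the supported variables."""
    namespace = {}
    # The config file is plain Python, so it is simply executed.
    exec(source, namespace)
    # Any variable the file does not set falls back to its default.
    return {name: namespace.get(name, default)
            for name, default in DEFAULTS.items()}
```

One consequence of this design is that a syntax error in config is a Python error, so it is worth keeping the file as simple as the example above.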

Message States

After analyzing a message, spugspam brands the message with a message state. The message state is stored in the X-SpugSpam-State header, followed by an md5 signature created from the rest of the message and the user's inner key:

   From: "Test Dude" <test1@bogus.com>
   To: mmuller@enduden.com
   Subject: test message
   X-SpugSpam-State: allow,whitelist:vKsIrl5UACqA9Xr5giyuqA==

   This is a test message

The MD5 signature is very important because spugspam reads this state header looking for information conveyed from a previous spugspam instance - without the signature, a spammer could simply add an X-SpugSpam-State: allow header and get a free pass through the system.
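
The signature in the example is 24 base64 characters, which is the size of a base64-encoded MD5 digest. A signature of that shape could be produced as sketched below. How spugspam actually combines the inner key with the message text is not documented here, so the simple concatenation, and the names sign and verify, are assumptions for illustration.

```python
import base64
import hashlib
import hmac


def sign(inner_key: bytes, text: bytes) -> str:
    # Assumption: key and text are simply concatenated before hashing.
    digest = hashlib.md5(inner_key + text).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(inner_key: bytes, text: bytes, signature: str) -> bool:
    # Compare in constant time to avoid leaking how much matched.
    return hmac.compare_digest(sign(inner_key, text), signature)
```

Any change to either the key or the signed text produces a different signature, which is why an altered message or a changed innerkey file invalidates the state header.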

Other programs on the mail pipeline (e.g. procmail) should scan for the state header and make a decision as to what to do with the message based on it. In general, you want to deliver messages with an "allow" state, ignore messages with a "deny" or "unrecognized" state, and possibly do special stuff with the others.

Some of the states are followed by a comma and a substate: the substate gives more information as to how the message state was determined.
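
A downstream tool written in Python could pull the state, substate and signature out of the header roughly as follows. The function name read_state is hypothetical, and the sketch assumes the header always has the "state[,substate]:signature" layout shown in the example above.

```python
from email import message_from_string


def read_state(raw_message):
    """Return (state, substate, signature), or None if unmarked."""
    msg = message_from_string(raw_message)
    value = msg.get("X-SpugSpam-State")
    if value is None:
        return None
    # Assumption: the signature follows the last colon in the value.
    state_part, _, signature = value.strip().rpartition(":")
    state, _, substate = state_part.partition(",")
    return state, substate or None, signature
```

This only parses the header: it does not check the signature, so it should be used on messages that spugspam itself has just written.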

This is the set of all message states:

illegal

A state header was found with a bad signature. This can mean one of three things:

The message has been modified since the last time that spugspam processed it.
The user's inner key has changed.
Someone has tried to circumvent spugspam.

awaiting-conf

The message sender is unrecognized, so a confirmation has been sent.

confirmed

The message is a confirmation reply. You may choose to filter these out or deliver them depending on whether or not you are interested in seeing confirmation replies.

allow,substate

Allow the message to go through. "substate" is a reason code. Known reason codes are:

whitelist

sender is whitelisted

confirmed

confirmation was received. You will see these when the initial confirmation is received for all messages queued for the sender. After this, the sender is added to the whitelist and messages from them will be marked "allow,whitelist".

openaddr

receive address was in the "allowrecv" file

interesting

the message is interesting enough to merit investigation by the programmer. This was used initially to investigate the various "bounce" formats used by mailer daemons. You should never receive it.

rulematch

A rule-file rule matched with an action of "allow"

deny,substate

Deny the message. "substate" is a reason code. Known reason codes are:

blacklist

sender is blacklisted

rulematch

A rule-file rule matched with an action of "deny"

spffail

The message failed an SPF check: the host that provided it is not authorized to send messages for the domain of the "from" address.

bad-address

Indicates that the "from" address is invalid (i.e. a message to that address bounced).

error

An error occurred during message processing. Hopefully this will be very rare.

unmarked

This is an internal state used to indicate that the message should not be marked with state information

control

The message is a control message.

unknown

This is an internal state used to track the fact that a state has not been determined. You should never see this state in a message that has been processed.

Control Messages

In addition to command line options, spugspam supports "control messages" as a management technique. These are messages which spugspam recognizes as containing control information.

There are two kinds of control messages, a control request and a control command. A control command is a message containing commands to be executed. A control request performs no actions of its own, but it replaces the body of the input message with a body containing a special signature (actually, a signed timestamp) and instructions listing all of the available commands. The recipient edits this, inserting actual commands to be executed, and sends back a reply which is a control command.

The rationale for splitting control messages into this request/command set is:

To allow the user to get help text (specifically, to see the commands available)
To prevent an attacker from submitting controls to someone else's account. To be accepted as a control command, a message must have a signature generated as the result of a control request message (or match a rule specifying a control-command action).

There are two ways of causing a message to be treated as a control message:

Specifying the --control command line option

This should probably be done if you have a dedicated address for receiving control messages. In this case, you can set up a separate pipeline that runs spugspam with this option.

Specifying a rule with a control-request or control-command action

This allows you to control the system through special markups in your e-mail. For example, if you wanted to do direct control without having to first do a request, you could use something like the following rule:

      body: ^#Spugspam password=123doremi
      action: control-command

The text identified in the body header would serve as an indicator that you were sending a command message and the password would protect your account from control by an attacker. The pound sign ("#") at the beginning would be useful in this context because spugspam ignores "#" commented lines when processing a command message.

Installation

First of all, you'll need a fairly recent version of the Python interpreter. spugspam was developed on Python 2.2, and has been tested on Python 2.3 and 2.4.

Edit the spugspam script so that the first line contains an acceptable method of bootstrapping your local python interpreter. Now copy the script to some place on your $PATH (/usr/local/bin is the recommended location).

You should probably be able to run the tester script at this point and see "0 tests failed" at the end of it (no guarantees here, as tester is a shell script and I am not certain of its portability).

Building your Configuration Directory

Create a ".spugspam" directory under your home directory. You can put it somewhere else if you want to; if you do, be sure to use the "--root-dir" option to identify this location wherever you run spugspam.

Create all of the required files identified in Configuration Files in your .spugspam directory. Examples follow:

The innerkey file need only consist of a string of data that is not easily guessed. If you are particularly paranoid, and have /dev/urandom on your system, you might want to just do this:

   (umask 077; head -c 64 </dev/urandom >~/.spugspam/innerkey)

Alternately, creating the file with a unique passphrase in your favorite editor should work just as well.

Be sure that your innerkey file is not world readable so that nobody can get your inner key and use it to trick their way through the system. In fact, you might want to make your entire .spugspam directory unreadable, as it is likely to contain some of your e-mail at various points in time.
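
If you would rather create the key from Python than from the shell, something along these lines should behave similarly on a POSIX system. The function name write_inner_key is hypothetical; the sketch refuses to overwrite an existing key, since changing the key invalidates pending confirmations.

```python
import os


def write_inner_key(path, size=64):
    # O_EXCL makes this fail instead of replacing an existing key;
    # mode 0o600 keeps the file unreadable by group and others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as key_file:
        key_file.write(os.urandom(size))
```

For example, write_inner_key(os.path.expanduser("~/.spugspam/innerkey")) would create the file in the default location.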

The "confmsg" file is just a template for your confirmation message. An example follows:

   From: %(from)s
   To: %(to)s
   Subject: Confirmation request %(sig)s

   Hi, this is a one-time confirmation message to verify that you are the
   sender of the message below.  If you are, please reply with the following
   text in the subject or body of the message:

      %(sig)s

   In most cases, just hitting "reply" should work.

   Thank you,
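
Since template variables are ordinary Python formatting sequences, the expansion of a template like the one above can be previewed with the % operator and a dictionary. The addresses and signature value below are made up for illustration.

```python
# A shortened version of the confmsg headers shown above.
template = (
    "From: %(from)s\n"
    "To: %(to)s\n"
    "Subject: Confirmation request %(sig)s\n"
)

# Illustrative values only; spugspam supplies the real ones.
values = {
    "from": "me@example.com",
    "to": "stranger@example.org",
    "sig": "abc123",
}

expanded = template % values
print(expanded)

# Other formatting forms work too, e.g. left-justifying in 10 columns.
padded = "[%(sig)-10s]" % values
```

A practical consequence is that a literal percent sign in a template has to be written as "%%".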

The deliver file usually just invokes your mail delivery agent. Be warned that if you use procmail as a delivery agent, and are running spugspam from procmail, you will get a nasty hang if spugspam is invoked recursively - your best bet is probably to specify an alternate .procmailrc file. Given this, your deliver file would look something like this:

   procmail /home/myname/.procmailrc-inner

The sender file should normally just invoke your mail transfer agent. If you are using sendmail, it would look something like this:

   /usr/sbin/sendmail -t

And that's all. If you have special needs, you can set up the other files as well.

Modifying .procmailrc

If you are using spugspam with procmail, you will want to invoke it from your .procmailrc file and filter its messages afterwards.

In the most simple case, your .procmailrc file should look something like this:

   :0fw
   | spugspam
   
   :0
   * x-spugspam-state: allow
   mymailfile
   
   :0
   /dev/null

This example assumes that you trust spugspam implicitly and only want to receive messages that it allows, with everything else going into the bit bucket.

In reality, you probably want to just use spugspam to mark messages for a week or two so you can see the results.

If you are using spugspam with other mail filters that modify the message (e.g. SpamAssassin), you may want to run spugspam twice:

   # only do the basic checks of the rules and white/black lists
   :0fw
   | spugspam --check-lists

   # anything that spugspam recognizes gets to go right on through
   :0
   * x-spugspam-state: allow
   mymailfile

   # run it through another spam filter
   :0fw
   | spamc

   # filter out stuff that the other spam filter marked as spam
   :0
   * ^X-Spam-Flag: YES
   /dev/null
   
   # --force-accept-state makes it read the state information from the first 
   # pass even though the message may have been modified.
   :0fw
   | spugspam --force-accept-state

Author

Michael A. Muller

Portions contributed by Sam Lantinga.

Reporting Bugs

Report bugs to mmuller@enduden.com
