Comment: [Original edit by JustinMason]

...

  • "enterprise-class anti-spam filter", but aren't we all (wink)
  • centralized filter with personalized performance
  • includes a "Bulk Mail Manager" for outbound *bulk* mail, interesting
  • uses a "DNS analysis" step which sounds like it performs SPF checks
  • DNS and domain analysis: check open relays, reverse DNS lookups and static IP tables; mail from dynamic IPs; recency of domain registration; probabilistic analysis of Received trail
  • Bayes learning also feeds blacklist/whitelist; AWL is actually probabilistic
  • "plagiarism detection": signature based really: "fast analysis of common k-grams"; learns from few examples; almost guaranteed not to be a FP; high FN rate though
  • text classifier: Linear Discriminant: regularized linear classifier; approximates SVM
  • Chung-Kwei (which rocks): really really effective: 86% with < 0.01% FPs on their test corpus
  • test: corpus: 173k msgs, 130k spam, 42k good
  • spam defn: UCE (not UBE). cleaned repeatedly
  • combining algorithms: right in line with SpamAssassin dogma (wink)
  • nice graph of aggregated performance; 96% with < 0.01% FPs
  • SpamAssassin TODO: we need to add short-circuiting again!
  • http://www.research.ibm.com/spam
  • q: "what period, who were the 100 users?" a: users at IBM Watson
  • q: how do you get your "recency of domain registration" data? a: straight from WHOIS
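
A minimal sketch of the k-gram idea behind the "plagiarism detection" bullet above, in Python. This is not IBM's implementation: the function names, the character-level k-grams, the whitespace/case normalisation and k=8 are all assumptions made for illustration. It only shows why a signature built from a few spam examples rarely matches unrelated mail (low FP) yet misses anything reworded (high FN).

```python
def kgrams(text, k=8):
    """Set of overlapping character k-grams in case- and whitespace-normalised text.

    Hypothetical helper; k=8 is an arbitrary illustrative choice.
    """
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def overlap(message, known_spam_grams, k=8):
    """Fraction of the message's k-grams that also occur in known spam."""
    grams = kgrams(message, k)
    if not grams:
        return 0.0  # message shorter than k characters
    return len(grams & known_spam_grams) / len(grams)


# "Learning" from a single example: just store its k-grams.
known = kgrams("Buy cheap meds online now, no prescription needed!")

near_copy = overlap("buy CHEAP meds online now,   no prescription needed!!", known)
unrelated = overlap("Minutes from Tuesday's project meeting are attached.", known)
```

A near-verbatim copy scores close to 1 and unrelated text scores close to 0; where to put the cutoff between them is a tuning decision the notes do not record.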

Richard Clayton: Stopping Spam by Extrusion Detection

  • from demon.co.uk
  • ISPs can spot smarthost load going up, and suspect that there's a spammer active
  • insecure customers main problem for UK ISPs
  • ISP's real problem: blacklisting of IP ranges and smarthosts; rapid action is req'd
  • hard problem to solve: expensive to examine outgoing content; legal issues with blocking, and FP may cost you customers; volume is not good indicator!; "incorrect" sender domain doesn't indicate spam
  • solution: spot delivery failure errors (due to user unknown, remote blocks) in smarthost logs
  • heuristics: "too many" delivery failures (40/day sufficient); ignore "bounces" – have null <> return-path; ignore "mailing lists" (most dests work, few fail)
  • when first turned on, was finding 40 infected customers *per day*!
  • http://www.cl.cam.ac.uk/~rnc1/
  • q: "direct-to-MX spam? trapping port 25?" a: no we don't do that and don't mind about that, as much as spammers using our smarthost and getting that blocklisted
  • q: "sending outbound (or parts thereof) through SpamAssassin?" a: SpamAssassin is too expensive (in terms of load)
  • q: "hair-trigger nature of listing?" a: it's not automatic. there's always a manual verification, and it's usually very obvious at that step
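
The heuristics above can be sketched as a small daily log pass. This is an illustration, not Demon's code: the record layout (customer, return_path, delivered_ok) and the function name are hypothetical, and the 10% failure-ratio cutoff used to excuse mailing lists is an assumption; only the 40-failures-per-day figure comes from the talk.

```python
from collections import defaultdict

FAILURE_THRESHOLD = 40     # failures per day, the figure quoted in the talk
LIST_FAILURE_RATIO = 0.1   # assumption: at or below this, "most dests work"


def suspects(deliveries):
    """Return customers worth a manual look.

    deliveries: iterable of (customer, return_path, delivered_ok) tuples
    covering one day of smarthost logs (hypothetical record layout).
    """
    ok = defaultdict(int)
    failed = defaultdict(int)
    for customer, return_path, delivered_ok in deliveries:
        if return_path == "<>":
            continue  # bounces carry a null return-path; ignore them
        if delivered_ok:
            ok[customer] += 1
        else:
            failed[customer] += 1

    flagged = []
    for customer, nfail in failed.items():
        total = nfail + ok[customer]
        # Many failures AND a high failure ratio: a mailing list has many
        # destinations but only a small fraction of them fail.
        if nfail >= FAILURE_THRESHOLD and nfail / total > LIST_FAILURE_RATIO:
            flagged.append(customer)
    return sorted(flagged)


day = (
    [("cust-spam", "x@example.com", False)] * 50
    + [("cust-spam", "x@example.com", True)] * 10
    + [("cust-list", "list@example.org", False)] * 45
    + [("cust-list", "list@example.org", True)] * 2000
    + [("cust-bounce", "<>", False)] * 100
)
flagged_today = suspects(day)
```

As in the talk, the output is a list for manual verification, not an automatic block.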

Resisting Spam Delivery through TCP damping:

  • by default, TCP allows sender to control rate of flow; sender can achieve highest speed permitted by network
  • TCP damping tries to reduce net efficiency at the receiver side; more time, more bandwidth, more CPU cycles
  • low pain for recipients, high aggregated pain to spammers
  • need to do this at TCP layer; higher and lower aren't useful
  • even with tarproxy or similar, a smart spammer can blast the entire message to your TCP layer in one blat, even if you're tarpitting at the application layer
  • damping: increase sending time (delaying TCP packets); consume network bandwidth (request more packets)
  • increase delay: set adv_win = 0; fake congestion; delay outgoing ACKs (TCP conn terminates after 14 retries). cost at receiver: long idle TCP conn
  • increase bandwidth costs: request more retrans.; request more ACKs – reuse sequence numbers, use seqs that won't be used in this conn; send packets in reverse order. cost: about 1:1 ratio
  • used SpamAssassin at delivery time to estimate spamminess! mostly headers during early SMTP conversation, but you can use body rules before "250 Message Accepted for Delivery"
  • q: economics. "increases sender's costs, but not a transfer to the recipient." a: there are no existing techniques to do this, and TCP damping must work in the existing system.
  • q: if I was a spammer, and I figured out you were TCP damping, I'd ignore your advertised windows and blat entire message, hurting the network overall. a: sure, but hurting the spammer's bandwidth like this is worth it
  • q: but this encourages broken TCP implementations. a: but a broken TCP stack still won't get their spam delivered
  • q from John Levine: TurnTide uses exactly this technique, narrowing the TCP window on the spammer's connections.
  • q: why not just use delayed ACKs? a: because it's not quite as effective as the other techniques
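
A back-of-envelope model, in Python, of why shrinking the advertised window stretches sending time. It is not from the talk: it assumes a sender that honours the advertised window and can therefore have at most one window of data in flight per round trip, and it ignores slow start, the handshake and retransmissions. The function name and the example figures are made up.

```python
import math


def min_transfer_seconds(message_bytes, adv_window_bytes, rtt_seconds):
    """Lower bound on the time a window-respecting sender needs.

    At most adv_window_bytes can be unacknowledged at once, so the
    message needs ceil(size / window) round trips.
    """
    if adv_window_bytes <= 0:
        return math.inf  # adv_win = 0: the sender stalls until it reopens
    round_trips = math.ceil(message_bytes / adv_window_bytes)
    return round_trips * rtt_seconds


# A 20 kB message over a 100 ms round trip (illustrative numbers):
normal = min_transfer_seconds(20_000, 65_535, 0.1)   # one round trip
damped = min_transfer_seconds(20_000, 100, 0.1)      # 200 round trips
stalled = min_transfer_seconds(20_000, 0, 0.1)
```

The receiver's cost in this model is only the long-lived, mostly idle connection, which matches the "low pain for recipients" point; a sender that ignores the window, as raised in the questions, falls outside the model.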