Comment: [Original edit by JustinMason]

...

Social network talk:

  • pretty useless, spam-filtering-wise at least; no spam orientation at all

...

  • corpus analysis, from Hotmail's feedback loop
    • volunteers classify random samples of their mail as spam or good; tens of thousands of hand-classified messages per day; large "unbiased" (???) sample of spam
  • additional analysis on two sets of spam:
    • about a year between the two
    • products sold, exploits used, trends
  • viagra types: 17% 2003, to 34% 2004
  • graphic porn down: 13% to 7%
  • exploits: increasing rapidly, from 1.33 exploits per spam in 2003 to 1.73 in 2004
  • word obscuring: up to 20% in 2004
  • URL chaffing, adding good URLs to spam: not there in 2003, 10% in 2004 – anti-SURBL attack (wink)
  • Spammers are putting more work into each spam
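
To put the year-on-year figures above on a common footing, here is a minimal sketch that converts each pair into a relative change. The numbers are the ones noted from the talk; the helper name `rel_change` is made up for illustration, and treating the exploit figures as a per-spam average is my assumption.

```python
def rel_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0

# (2003 value, 2004 value) as noted from the talk
trends = {
    "viagra types (% of spam)": (17, 34),
    "graphic porn (% of spam)": (13, 7),
    "exploits (assumed: average per spam)": (1.33, 1.73),
}

for name, (y2003, y2004) in trends.items():
    print(f"{name}: {rel_change(y2003, y2004):+.0f}%")
```

Word obscuring and URL chaffing are left out because the notes give no usable 2003 baseline for them (chaffing was simply absent).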

Introducing the Enron Corpus:

  • 1.3 million messages originally; removed msgs with "integrity problems", replaced usernames etc
  • http://www-2.cs.cmu.edu/~enron
  • 200,399 useful, non-dupe messages
  • 158 users, 1,268 msgs/user
  • missing message headers, so not much use for spam filtering; Exchange-mangled; no HTML. still, maybe good for "body" rules and FP avoidance
  • no mention how much of the corpus was spam (wink)
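
A quick sanity check on the corpus figures above, assuming the 158 is a count of users (mailboxes) and the per-user figure is a plain mean over the 200,399 de-duplicated messages:

```python
messages = 200_399   # useful, non-duplicate messages
users = 158          # assumption: number of users in the cleaned corpus

mean_per_user = messages / users
print(round(mean_per_user))  # 1268
```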

Larry Lessig:

  • extraordinary amount going to tech fixes; very little going to how the law could address it
  • compares govt attention to "pirate radio" creating static for large commercial stations, vs the spam problem
  • multiple types of regulators: the law, social norms, the market, and architecture (example: windows in lecture theatre are closed to enforce paying attention to speakers)
  • the law also regulates the other three
  • (that was the wrong talk! starts again!)
  • 1. "regulation is always multiple modalities"
  • 2. "interests will react"
  • 3. "special interests defeat general interests"
  • in the old days, we had norms to defeat spam; that failed
  • using code to fix; so far that's failed
  • "the market will fix the problem"; ISPs trying to be the spam-free email provider; that's also failed
  • CAN-SPAM: totally failed – even displaced effective state legislation
  • no single modality alone can fix it
  • regulation is a restriction, plus somebody to enforce it
  • CAN-SPAM: wanted truthful headers
  • opt-out doesn't provide any way for you to know if you've really been opted-out
  • enforcement: state AGs, ISPs, federal - centralised; too big though. they have better things to do with their time than bust spammers
  • solution: marries legal/architectural/market
  • legal: has two parts: (1) labels ("ADV" in the subject line)
  • (2) a bounty
  • (q: SEXUALLY-EXPLICIT tag is a label, already massively flouted by spammers. other labels would be flouted just as much.)
  • architecture: filter code then blocks mails with "ADV"
  • market: spammers would then have to incentivise people to receive their mail by sending offers they want (yeah right (wink))
  • enforcement: spam will only be sent if you can be paid, so "follow the money" – part of CAN-SPAM states "the business that benefits is responsible"
  • market in enforcement: bounty hunters who identify label-less spam (ah). amateurs, not law enforcement, large population
  • during CAN-SPAM development: labels were undesirable. Reason: "labels are too effective", because e.g. Amazon would have to have labelled their ads (because there was no distinction between opt-in and opt-out) and would be filtered
  • fundamental problem: corruption due to vested interests lobbying (cf CAN-SPAM)
  • sees difficulties in differentiating
  • q: tracing spam to the business that benefits often involves getting forwarding addresses from e.g. a CGI script running on a server in the Ukraine. *needs* law-enforcement power to get that IMO. a: "yes, and law-enforcement power is available, and jurisdiction problems are easy" (not sure about that! at least for the non-LE bounty-hunter case)
  • q: opt-in would have fixed it, like it has in Australia; but DMA keeps emasculating the laws into YOU-CAN-SPAM. a: agrees that there are multiple answers, but prefers not requiring opt-in across the board and uses the UCE definition as it allows political speech without adding to their costs. (I disagree, personally; the "UBE" definition works for me --jm)
  • Jon Praed: enforcement requires tremendous resources, and in some cases you've got to get to that IP address within 7 days to get those logs, with LE power. This is not easy. Notes that spammer margins are incredibly low, and those bounties as a result would be small and/or hard to get.
  • JP again: also suggests using labels to mark "good" commercial mail and personal mail, leaving the remaining "unknown" mail, which is then suspect. also suggests that the *headers* are the labelling, in reality.
  • q: "special interests always seem to wipe out general interest on this issue in laws. what can we do, law-wise?" "my brand is pessimism", "there was this moment, when they passed CAN-SPAM, when legislators were keen to fix it – then the special interests came in".
  • observation from audience: spots the parallel with the UK and pirate radio in the late 60s; the UK also passed a McCain-style anti-advertiser provision to deal with it.
  • Dave Crocker: believes that the suggestion would result in little real effect on spammers, and quite a heavy hit on legit businesses
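
The "architecture" half of Lessig's proposal (filter code that blocks mail labelled "ADV") is simple to sketch. This is only an illustration, not anything shown in the talk: the exact label format is an assumption (here, the token "ADV" at the start of the Subject line, case-insensitive), and `is_labelled_ad` is a made-up helper name.

```python
import re
from email import message_from_string

# Assumption: the label is the standalone token "ADV" at the very start of
# the Subject header. The talk did not pin down an exact format.
ADV_LABEL = re.compile(r"^\s*ADV\b", re.IGNORECASE)

def is_labelled_ad(raw_message: str) -> bool:
    """Return True if the message's Subject starts with the ADV label."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    return bool(ADV_LABEL.match(subject))

labelled = "Subject: ADV: cheap pills\n\nbuy now"
unlabelled = "Subject: Advice on filter rules\n\nhi all"

print(is_labelled_ad(labelled))    # True
print(is_labelled_ad(unlabelled))  # False
```

Note that this only catches senders who comply with the labelling rule; the bounty and enforcement side of the proposal is what is supposed to deal with everyone who does not, which is exactly where most of the objections above are aimed.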