
...

Social network talk:

pretty useless, spam-filtering-wise at least; no spam orientation at all

...

corpus analysis, from Hotmail's feedback loop
- volunteers classify random samples of their mail as spam or good; tens of thousands of hand-classified messages per day; large "unbiased" (???) sample of spam
additional analysis on two sets of spam:
- about a year between the two
- products sold, exploits used, trends
viagra types: up from 17% in 2003 to 34% in 2004
graphic porn: down from 13% to 7%
exploits: increasing rapidly, from 1.33 exploits in 2003 to 1.73 in 2004
word obscuring: up to 20% in 2004
URL chaffing, adding good URLs to spam: not there in 2003, 10% in 2004 – anti-SURBL attack
Spammers are putting more work into each spam

Introducing the Enron Corpus:

1.3 million messages originally; removed msgs with "integrity problems", replaced usernames etc
http://www-2.cs.cmu.edu/~enron
200,399 useful, non-dupe messages
158 users, 1,268 msgs/user
missing message headers, so not much use for spam filtering; Exchange-mangled; no HTML. still, maybe good for "body" rules and FP avoidance
no mention how much of the corpus was spam

Larry Lessig:

extraordinary amount going to tech fixes; very little going to how the law could address it
compares govt attention to "pirate radio" creating static for large commercial stations, vs the spam problem
multiple types of regulators: the law, social norms, the market, and architecture (example: windows in lecture theatre are closed to enforce paying attention to speakers)
the law also regulates the other three
(that was the wrong talk! starts again!)
1. "regulation is always multiple modalities"
2. "interests will react"
3. "special interests defeat general interests"
in the old days, we had norms to defeat spam; that failed
using code to fix; so far that's failed
"the market will fix the problem"; ISPs trying to be the spam-free email provider; that's also failed
CAN-SPAM: totally failed – even displaced effective state legislation
no single modality alone can fix it
regulation is a restriction, plus somebody to enforce it
CAN-SPAM: wanted truthful headers
opt-out doesn't provide any way for you to know if you've really been opted-out
enforcement: state AGs, ISPs, federal - centralised; too big though. they have better things to do with their time than bust spammers
solution: marries legal/architectural/market
legal: has two parts: (1) labels ("ADV" in the subject line)
(2) a bounty
(q: SEXUALLY-EXPLICIT tag is a label, already massively flouted by spammers. other labels would be flouted just as much.)
architecture: filter code then blocks mails with "ADV"
market: spammers would then have to incentivise people to receive their mail by sending offers they want (yeah right)
enforcement: spam will only be sent if you can be paid, so "follow the money" – part of CAN-SPAM states "the business that benefits is responsible"
market in enforcement: bounty hunters who identify label-less spam (ah). amateurs, not law enforcement, large population
during CAN-SPAM development: labels were undesirable. Reason: "labels are too effective", because e.g. Amazon would have to have labelled their ads (because there was no distinction between opt-in and opt-out) and would be filtered
fundamental problem: corruption due to vested interests lobbying (cf CAN-SPAM)
sees difficulties in differentiating
q: tracing spam to the business that benefits often involves getting forwarding addresses from e.g. a CGI script running on a server in the Ukraine. *needs* law-enforcement power to get that IMO. a: "yes, and law-enforcement power is available, and jurisdiction problems are easy" (not sure about that! at least for the non-LE bounty-hunter case)
q: opt-in would have fixed it, like it has in Australia; but DMA keeps emasculating the laws into YOU-CAN-SPAM. a: agrees that there are multiple answers, but prefers not requiring opt-in across the board and uses the UCE definition as it allows political speech without adding to their costs. (I disagree, personally; the "UBE" definition works for me --jm)
Jon Praed: enforcement requires tremendous resources, and in some cases you've got to get to that IP address within 7 days to get those logs, with LE power. This is not easy. Notes that spammer margins are incredibly low, and those bounties as a result would be small and/or hard to get.
JP again: also suggests labels to label "good" commercial mail, personal mail, and then leave over "unknown" mail – which is then suspect. also suggests that the *headers* are the labelling, in reality.
q: "special interests always seem to wipe out general interest on this issue in laws. what can we do, law-wise?" "my brand is pessimism", "there was this moment, when they passed CAN-SPAM, when legislators were keen to fix it – then the special interests came in".
observation from audience: spots the parallel with pirate radio in the UK in the late '60s; the UK also passed a McCain-style anti-advertiser provision to deal with it.
Dave Crocker: believes that the suggestion would result in little real effect on spammers, and quite a heavy hit on legit businesses
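
A rough sketch of the "architecture" half of Lessig's proposal (filter code that blocks mail labelled "ADV"), just to make it concrete. Everything here is an assumption for illustration: the talk only said "ADV" in the subject line, so the exact "ADV:" prefix form and the names `is_labelled_adv` and `filter_message` are hypothetical, not anything from the talk or from CAN-SPAM.

```python
from email import message_from_string
from email.message import Message

# Assumed label format: the notes only say '"ADV" in the subject line'.
ADV_LABEL = "ADV"


def is_labelled_adv(msg: Message) -> bool:
    """Return True if the Subject is, or starts with, the ADV label."""
    subject = str(msg.get("Subject") or "").strip().upper()
    return subject == ADV_LABEL or subject.startswith(ADV_LABEL + ":")


def filter_message(raw: str) -> str:
    """Decide 'block' or 'deliver' for one raw RFC 822 style message."""
    msg = message_from_string(raw)
    return "block" if is_labelled_adv(msg) else "deliver"
```

Note that a filter like this only catches mail that carries the label; unlabelled spam passes straight through, which is the gap the bounty part of the proposal is meant to cover.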
