...
Social network talk:
- pretty useless, spam-filtering-wise at least; no spam orientation at all
...
- corpus analysis, from Hotmail's feedback loop
- volunteers classify random samples of their mail as spam or good; tens of thousands of hand-classified messages per day; large "unbiased" (???) sample of spam
- additional analysis on two sets of spam:
- about a year between the two
- products sold, exploits used, trends
- viagra types: 17% in 2003, to 34% in 2004
- graphic porn down: 13% to 7%
- exploits: increasing rapidly, 1.33 exploits in 2003 to 1.73 in 2004
- word obscuring: up to 20% in 2004
- URL chaffing, adding good URLs to spam: not there in 2003, 10% in 2004 – anti-SURBL attack
- Spammers are putting more work into each spam
Introducing the Enron Corpus:
- 1.3 million messages originally; removed msgs with "integrity problems", replaced usernames etc
- http://www-2.cs.cmu.edu/~enron
- 200,399 useful, non-dupe messages
- 158 users, 1,268 msgs/user
- missing message headers, so not much use for spam filtering; Exchange-mangled; no HTML. still, maybe good for "body" rules and FP avoidance
- no mention how much of the corpus was spam
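A quick arithmetic check on the corpus figures above, as a minimal sketch. It assumes the 158 figure counts users (mailboxes) rather than messages, which is the reading under which the quoted per-user average works out; the function name is hypothetical.

```python
def messages_per_user(total_messages: int, users: int) -> int:
    """Average messages per user, rounded to the nearest whole message."""
    return round(total_messages / users)


# Figures as noted from the talk; treating 158 as a user count is an assumption.
useful_messages = 200_399
users = 158

per_user = messages_per_user(useful_messages, users)
print(per_user)
```

This lines up with the 1,268 msgs/user quoted in the talk.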
Larry Lessig:
- extraordinary amount going to tech fixes; very little going to how the law could address it
- compares govt attention to "pirate radio" creating static for large commercial stations, vs the spam problem
- multiple types of regulators: the law, social norms, the market, and architecture (example: windows in lecture theatre are closed to enforce paying attention to speakers)
- the law also regulates the other three
- (that was the wrong talk! starts again!)
- 1. "regulation is always multiple modalities"
- 2. "interests will react"
- 3. "special interests defeat general interests"
- in the old days, we had norms to defeat spam; that failed
- using code to fix; so far that's failed
- "the market will fix the problem"; ISPs trying to be the spam-free email provider; that's also failed
- CAN-SPAM: totally failed – even displaced effective state legislation
- no single modality alone can fix it
- regulation is a restriction, plus somebody to enforce it
- CAN-SPAM: wanted truthful headers
- opt-out doesn't provide any way for you to know if you've really been opted-out
- enforcement: state AGs, ISPs, federal - centralised; too big though. they have better things to do with their time than bust spammers
- solution: marries legal/architectural/market
- legal: has two parts: (1) labels ("ADV" in the subject line)
- (2) a bounty
- (q: SEXUALLY-EXPLICIT tag is a label, already massively flouted by spammers. other labels would be flouted just as much.)
- architecture: filter code then blocks mails with "ADV"
- market: spammers would then have to incentivise people to receive their mail by sending offers they want (yeah right)
- enforcement: spam will only be sent if you can be paid, so "follow the money" – part of CAN-SPAM states "the business that benefits is responsible"
- market in enforcement: bounty hunters who identify label-less spam (ah). amateurs, not law enforcement, large population
- during CAN-SPAM development: labels were undesirable. Reason: "labels are too effective", because e.g. Amazon would have to have labelled their ads (because there was no distinction between opt-in and opt-out) and would be filtered
- fundamental problem: corruption due to vested interests lobbying (cf CAN-SPAM)
- sees difficulties in differentiating
- q: tracing spam to the business that benefits often involves getting forwarding addresses from e.g. a CGI script running on a server in the Ukraine. *needs* law-enforcement power to get that IMO. a: "yes, and law-enforcement power is available, and jurisdiction problems are easy" (not sure about that! at least for the non-LE bounty-hunter case)
- q: opt-in would have fixed it, like it has in Australia; but DMA keeps emasculating the laws into YOU-CAN-SPAM. a: agrees that there are multiple answers, but prefers not requiring opt-in across the board and uses the UCE definition as it allows political speech without adding to their costs. (I disagree, personally; the "UBE" definition works for me --jm)
- Jon Praed: enforcement requires tremendous resources, and in some cases you've got to get to that IP address within 7 days to get those logs, with LE power. This is not easy. Notes that spammer margins are incredibly low, and those bounties as a result would be small and/or hard to get.
- JP again: also suggests labels to label "good" commercial mail, personal mail, and then leave over "unknown" mail – which is then suspect. also suggests that the *headers* are the labelling, in reality.
- q: "special interests always seem to wipe out general interest on this issue in laws. what can we do, law-wise?" "my brand is pessimism", "there was this moment, when they passed CAN-SPAM, when legislators were keen to fix it – then the special interests came in".
- observation from audience: spots the parallel with pirate radio in the UK in the late 60's; the UK also passed a McCain-style anti-advertiser provision to deal with it.
- Dave Crocker: believes that the suggestion would result in little real effect on spammers, and quite a heavy hit on legit businesses
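
The "architecture" half of Lessig's proposal (filter code that blocks mail labelled "ADV" in the subject line) could look roughly like the sketch below. This is only an illustration, not anything shown in the talk: the exact label form (a leading "ADV:" prefix, case-insensitive) and the function names are assumptions. Note it does nothing about label-less spam, which is what the bounty part is meant to cover.

```python
from email import message_from_string
from email.message import Message

# Assumed label form: the talk only says "ADV" in the subject line;
# requiring a leading "ADV:" prefix is a choice made for this sketch.
ADV_LABEL = "ADV"


def is_labelled_adv(msg: Message, label: str = ADV_LABEL) -> bool:
    """True if the Subject header starts with the label followed by a colon."""
    subject = (msg.get("Subject") or "").strip()
    return subject.upper().startswith(label.upper() + ":")


def filter_inbox(raw_messages):
    """Split raw RFC 822 message strings into (kept, blocked) lists."""
    kept, blocked = [], []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if is_labelled_adv(msg):
            blocked.append(msg)
        else:
            kept.append(msg)
    return kept, blocked


sample = [
    "Subject: ADV: cheap pills\n\nbuy now",
    "Subject: Advice on lunch\n\nsee you at noon",
    "From: someone@example.com\n\nno subject header at all",
]
kept, blocked = filter_inbox(sample)
print(len(kept), len(blocked))
```

Matching on a prefix rather than a bare substring keeps ordinary subjects such as "Advice on lunch" from being blocked.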