
You can combine Traffic Server with Linux iproute2 to shape traffic between the proxy and origin or between the proxy and client. Or you can use BSD ALTQ or a separate device like a capable router or network switch. Traffic Server marks the traffic and iproute2 etc. do the actual traffic shaping. The two can be on the same machine or they can be separate devices. This is sometimes called bandwidth management, traffic shaping or QoS.

Here are some ways Traffic Server can communicate with iproute2 etc.

  • Packet or connection marks (on the same machine only)
  • Type of Service (ToS) or Differentiated Services (DiffServ) Field (IP packet)
  • Priority Code Point (PCP) Field (Ethernet frame)

Traffic Server offers more than one way to set some of these marks, and some ways don't work in all scenarios:

...

mark_in and tos_in mark traffic destined for the client (the packets that make up a client response) and mark_out and tos_out mark traffic destined for the origin (the packets that make up an origin request). Sometimes you can mark traffic sent *from* the origin with the Netfilter CONNMARK iptables/ip6tables target (the packets that make up an origin response). mark_in and mark_out set the packet mark and tos_in and tos_out set the ToS/DiffServ Field. Configuration variables for the connection mark and PCP Field haven't been implemented.

...

mark_out and tos_out are overridable but mark_in and tos_in are not. In addition to configuring them globally in records.config, you can override them per transaction with the conf_remap plugin, the set-config header_rewrite operator, or TSHttpTxnConfigIntSet(). *However*, they are only consulted when the socket is created, which happens before SEND_REQUEST_HDR_HOOK, so you must set them in READ_REQUEST_HDR_HOOK or at some other earlier time. See Connection::open(). It might be possible to update an existing socket when you change the configuration variables by implementing a callback.

API Functions

ClientPacketMarkSet() and ClientPacketTosSet() mark traffic destined for the client and ServerPacketMarkSet() and ServerPacketTosSet() mark traffic destined for the origin. Sometimes you can mark traffic sent *from* the origin with the Netfilter CONNMARK iptables/ip6tables target. ClientPacketMarkSet() and ServerPacketMarkSet() set the packet mark and ClientPacketTosSet() and ServerPacketTosSet() set the ToS/DiffServ Field. API functions for the connection mark and PCP Field haven't been implemented but you can call TSHttpSsnClientFdGet() or TSHttpTxnClientFdGet() to get the client socket and call setsockopt() to set the option yourself.

Unlike the configuration variables, the API functions work whether or not the socket has already been created. They immediately update an existing socket, and the options are also applied when a new socket is eventually created.

...

The set-conn-dscp operator immediately updates an existing socket. (The client socket will already have been created because a transaction doesn't exist without one.)

Packet or Connection

...

Marks

These marks work only when Traffic Server and iproute2 etc. are on the same machine; they aren't communicated to a separate device. However, a separate device might want to copy marks from the ToS/DiffServ or PCP fields to the connection mark to mark traffic sent *from* the origin. The packet mark socket option is setsockopt(sockfd, SOL_SOCKET, SO_MARK, &optval, optlen).

iproute2, iptables/ip6tables, and ebtables all support the packet mark:

  • iproute2 can match marked traffic directly: the mark u32 filter selector applies to the packet mark.
  • The -m mark --mark iptables/ip6tables match extension matches traffic based on the packet mark. You can apply an iproute2 class with the CLASSIFY target or set the connection mark with the CONNMARK target.
  • The mark_m ebtables match extension (--mark) matches traffic based on the packet mark. ebtables loads match extensions automatically, so it has no -m option.

You can match traffic based on the connection mark with the -m connmark --mark iptables/ip6tables match extension.

ToS/DiffServ Field

This is an 8-bit field in IPv4 and IPv6 packets, but the 2 least significant bits are now reserved for Explicit Congestion Notification (ECN). You can't set them! The values you can set are 0x00, 0x04, 0x08, ... 0xfc. If you set the field to 0xff the effective value will be 0xfc. Furthermore:

  • The values XXX000XX have special meaning for backwards compatibility with the IP Precedence Field (see RFC 2474 section 4.2).
  • The values XXXXX0XX are reserved for standards action (see RFC 2474 section 6).
  • The values XXXX01XX are initially available for experimental or local use but future standards should preferentially claim them if other values are exhausted.

The standardized values are listed in the IANA registry along with their applicable specifications, and the differentiated services Wikipedia article discusses them in greater detail. Only the values XXXX11XX (0x0c, 0x1c, 0x2c, ... 0xfc) are reserved for experimental or local use.

These marks work even if a separate device does the actual traffic shaping. iproute2, iptables/ip6tables, and ebtables all support the ToS/DiffServ Field:

  • iproute2 can match marked traffic directly with the ip tos and ip6 priority u32 filter selectors.
  • iptables/ip6tables can match marked traffic with any of the following match extensions: -m dscp --dscp, -m dscp --dscp-class, and -m tos --tos. You can apply an iproute2 class with the CLASSIFY target or set the connection mark with the CONNMARK target.
  • ebtables can match marked traffic with the ip (--ip-tos) and ip6 (--ip6-tclass) match extensions.

The ToS Field was originally specified in RFC 791. Both it and the IPv6 Traffic Class Field were superseded by the DiffServ Field specified in RFC 2474. The IPv4 socket option is setsockopt(sockfd, IPPROTO_IP, IP_TOS, &optval, optlen) and the IPv6 socket option is setsockopt(sockfd, IPPROTO_IPV6, IPV6_TCLASS, &optval, optlen).

PCP Field

This is a 3-bit field in the Ethernet frame. The field is specified in IEEE 802.1Q and the values are the 8 priority levels specified in IEEE P802.1p.

These marks work even if a separate device in the same broadcast domain does the actual traffic shaping. You can match marked traffic with the ebtables vlan match extension (--vlan-prio). The socket option is setsockopt(sockfd, SOL_SOCKET, SO_PRIORITY, &optval, optlen).

Nothing related to the PCP Field has been added to Traffic Server but you can call TSHttpSsnClientFdGet() or TSHttpTxnClientFdGet() to get the client socket and call setsockopt() to set the option yourself. There's a feature request (TS-3037) to implement the PCP Field in Traffic Server.

Here's an example of a plugin that can set the PCP Field on traffic sent to the client (the packets that make up a client response) based on the request target. You could adapt it to instead mark traffic sent to the origin (the packets that make up an origin request) or to support additional criteria like the user agent.

Use the tsxs utility to compile it:

Code Block

  $ tsxs -o priority.so priority.cc

Add lines like the following to remap.config to configure it:

Code Block (remap.config)

  map http://example.com http://example.com @plugin=priority.so @pparam=4
  regex_map http://.*\.example\.com http://$0 @plugin=priority.so @pparam=4

Traffic Sent *From* the Origin

You can copy marks from the ToS/DiffServ or PCP fields to the connection mark with the CONNMARK iptables/ip6tables target to mark traffic sent *from* the origin, even if Traffic Server is on a separate device. In the following example, if you set the ToS/DiffServ Field (0x0c, 0x1c, etc.) on traffic sent to the origin, it will add both the origin request and response to the same iproute2 class (2:1, 2:3, etc.):

Code Block

  iptables -t mangle -A POSTROUTING -m tos --tos 0x0c -j CONNMARK --set-mark 1
  iptables -t mangle -A POSTROUTING -m connmark --mark 1 -j CLASSIFY --set-class 2:1
  iptables -t mangle -A POSTROUTING -m tos --tos 0x1c -j CONNMARK --set-mark 2
  iptables -t mangle -A POSTROUTING -m connmark --mark 2 -j CLASSIFY --set-class 2:3

Zero Penalty Hit

Zero Penalty Hit is a Squid feature that marks traffic sent to the client based on the cache lookup status (hit, miss, etc.). The use case is managing upstream bandwidth without limiting access to content that's already cached. Here's an example of a Traffic Server plugin that does the same thing, but I suspect it's better to directly manage the bandwidth between the proxy and origin. What effect does constricting traffic between the proxy and client have on the upstream traffic? See background fill and Read While Writer. The CONNMARK iptables/ip6tables target and transparency can help directly manage the bandwidth between the proxy and origin.

Use the tsxs utility to compile the plugin:

Code Block

  $ tsxs -o tos.so tos.cc

If the content was already cached (cache hit) then the plugin sets the ToS/DiffServ Field on the client response to 0x0c. Edit the source code to change the value or implement a configuration variable to set it. You could adapt it to mark other cache lookup statuses (see TSHttpTxnCacheLookupStatusGet()) or instead set the packet mark, connection mark, or PCP Field.
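The 0x0c value used by the plugin comes from the ToS/DiffServ pool reserved for experimental or local use. To make the bit rules from the ToS/DiffServ Field section concrete, here is a small sketch that classifies a ToS byte by its RFC 2474 pool. The names effective_tos(), dscp_pool(), and is_class_selector() are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

enum class DscpPool {
  Standards,         // XXXXX0XX: reserved for standards action
  ExperimentalLocal, // XXXX11XX: experimental or local use
  Reclaimable        // XXXX01XX: local use for now, may be claimed by standards
};

// The 2 least significant bits belong to ECN, so they are dropped.
constexpr uint8_t effective_tos(uint8_t requested)
{
  return requested & 0xfc;
}

// Bits 3..2 of the ToS byte are the two least significant DSCP bits,
// which decide the pool.
constexpr DscpPool dscp_pool(uint8_t tos)
{
  switch ((tos >> 2) & 0x3) {
  case 0x3:
    return DscpPool::ExperimentalLocal;
  case 0x1:
    return DscpPool::Reclaimable;
  default:
    return DscpPool::Standards;
  }
}

// XXX000XX: class selector codepoints kept for IP Precedence compatibility.
constexpr bool is_class_selector(uint8_t tos)
{
  return (tos & 0x1c) == 0;
}
```

With these rules, 0x0c and 0x1c both land in the experimental or local use pool, which is why the examples in this document use them.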

...

This is handy for communicating more detail to iproute2 etc. For example here's how to divide upstream bandwidth equally among all clients with iproute2 and SFQ:

Code Block


  # The source of origin requests and destination of origin responses is
  # the address of the client
  CONFIG proxy.config.http.server_ports STRING 8080:tr-out

Code Block


  # Remember if traffic originated from our internet connection
  iptables -t mangle -A PREROUTING -i eth0.2 -j MARK --set-mark 1/1

  ifconfig ifb0 up

  # A qdisc is required before we can add a filter
  insmod sch_prio
  tc qdisc add dev br-lan root handle 1 prio

  # Shape only traffic originating from our internet connection
  # (packet mark 1/1)
  insmod cls_u32
  insmod act_mirred
  tc filter add dev br-lan parent 1: protocol ip pref 1 u32 match mark 1 1 flowid 1:1 action mirred egress redirect dev ifb0

  # Don't shape traffic (reorder/delay/drop) while there's available
  # capacity.  Unfortunately available capacity must be manually
  # configured and fine-tuned.  The following assumes isolated
  # up/downstream capacity (full-duplex).
  insmod sch_tbf
  tc qdisc add dev eth0.2 root handle 1 tbf rate .5mbit burst 5k latency 70ms
  tc qdisc add dev ifb0 root handle 1 tbf rate 2.5mbit burst 5k latency 70ms

  # Schedule an equal amount of traffic for each client
  insmod sch_sfq
  tc qdisc add dev eth0.2 parent 1: handle 2 sfq
  tc qdisc add dev ifb0 parent 1: handle 2 sfq

  # Divide downstream traffic into clients by destination IP address.
  # Divide upstream traffic into clients by *Netfilter connection
  # tracking* source IP address (after NAT all upstream traffic shares the
  # same source IP address).
  insmod cls_flow
  tc filter add dev eth0.2 parent 2: pref 1 handle 1 flow hash keys nfct-src divisor 1024
  tc filter add dev ifb0 parent 2: protocol ip pref 1 handle 1 flow hash keys dst divisor 1024
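The flow filters above give each client its own SFQ bucket by hashing one address per packet into 1024 slots. As a rough mental model only, the sketch below does the same with std::hash; the kernel's flow classifier uses its own hash function, and flow_bucket() and kDivisor are hypothetical names.

```cpp
#include <cstdint>
#include <functional>

// Mirrors "divisor 1024" in the tc filter commands.
constexpr uint32_t kDivisor = 1024;

// Map a client IPv4 address (as a 32-bit integer) to a bucket number.
// std::hash stands in for the kernel's hash; only the idea is the same.
uint32_t flow_bucket(uint32_t client_ipv4)
{
  return static_cast<uint32_t>(std::hash<uint32_t>{}(client_ipv4) % kDivisor);
}
```

Every packet belonging to one client lands in the same bucket, and SFQ serves the buckets in turn, so a client with many connections gets no more bandwidth than a client with one. Two clients can occasionally collide in a bucket, which is the price of not enumerating clients in advance.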

Background

Doing the actual traffic shaping with standard tools means you can combine all of their existing features with Traffic Server, you don't need to know and maintain another system specifically for Traffic Server, and you can shape the aggregate of proxy and non-proxy traffic. For example you can limit the sum of all traffic except access to Wikipedia and Khan Academy.

Something you can't yet do with iproute2 is specify a bandwidth for each connection or each client (without enumerating all of the clients in advance). MikroTik RouterOS offers this feature and calls it PCQ.

If you need to distinguish one transaction from another you're required to use configuration variables for origin traffic but header_rewrite operators or API functions for client traffic. I think this asymmetry unnecessarily burdens the administrator with implementation details. It would be more consistent and more user friendly to just implement configuration variables for each socket option and make them work in all scenarios (origin and client traffic, records.config, the conf_remap plugin, the set-config header_rewrite operator, and TSHttpTxnConfigIntSet()). It might be possible to update an existing socket when you change the configuration variables by implementing a callback.

I wonder how the current implementation works with persistent connections. For example, I suspect that if you set a socket option per transaction and the connection is then reused, either for another origin request or by the client for another request, the option doesn't get reset.

TS-1090 and commit b77838991531d6cb402618c3d690b83e95b92d63 originally added the packet mark and ToS/DiffServ Field configuration variables and API functions. TS-3002 added the set-conn-dscp header_rewrite operator.

Example

Here's a full example of how to shape traffic between the proxy and origin based on the request target. It assigns websites one of three priorities. After the upstream bandwidth is scheduled by priority, each client gets an equal portion of each priority.

Traffic Server and iproute2 can be on different devices. The example communicates the priority to iproute2 in the ToS/DiffServ Field, which it overrides with the tos_out configuration variable and the conf_remap plugin. It shapes traffic sent *from* the origin by copying the priority from the ToS/DiffServ Field to the connection mark. The origin connection is transparent so iproute2 can tell which client the traffic belongs to. The client connection is transparent so Traffic Server can get the address of the origin in the rare case that a client neglects to send a Host header (HTTP/1.0 doesn't require it). You could alternatively configure each client to use the proxy explicitly.
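To see why copying the ToS/DiffServ Field to the connection mark also classifies the response, here is a small model of the four CONNMARK/CLASSIFY rules used in this document. It is a sketch under assumptions: Connection and classify() are hypothetical names, the tos match is treated as an exact comparison, and "default" stands for "no class applied".

```cpp
#include <cstdint>
#include <string>

// Stand-in for a Netfilter connection tracking entry.
struct Connection {
  uint32_t connmark = 0;
};

// Walk one packet through the four POSTROUTING rules in order.
// Neither CONNMARK nor CLASSIFY stops rule traversal, so all four run.
std::string classify(Connection &conn, uint8_t tos)
{
  std::string cls = "default";
  if (tos == 0x0c) {
    conn.connmark = 1; // -m tos --tos 0x0c -j CONNMARK --set-mark 1
  }
  if (conn.connmark == 1) {
    cls = "2:1"; // -m connmark --mark 1 -j CLASSIFY --set-class 2:1
  }
  if (tos == 0x1c) {
    conn.connmark = 2; // -m tos --tos 0x1c -j CONNMARK --set-mark 2
  }
  if (conn.connmark == 2) {
    cls = "2:3"; // -m connmark --mark 2 -j CLASSIFY --set-class 2:3
  }
  return cls;
}
```

The origin request carries the ToS value and sets the connection mark. The origin response arrives with no ToS value set, but it belongs to the same connection, so it inherits the class.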

Be careful of ICMP redirects: they can sometimes cause clients to route non-web traffic to the proxy!

Code Block (records.config)

  # The source of client responses and destination of client requests is
  # the address of the origin.  The source of origin requests and
  # destination of origin responses is the address of the client.
  CONFIG proxy.config.http.server_ports STRING 8080:tr-full

Code Block (remap.config)

  # Give high priority to Wikipedia, low priority to YouTube
  map http://wikipedia.org http://wikipedia.org @plugin=conf_remap.so @pparam=proxy.config.net.sock_packet_tos_out=0x0c
  regex_map http://.*\.wikipedia\.org http://$0 @plugin=conf_remap.so @pparam=proxy.config.net.sock_packet_tos_out=0x0c
  map http://youtube.org http://youtube.org @plugin=conf_remap.so @pparam=proxy.config.net.sock_packet_tos_out=0x1c
  regex_map http://.*\.youtube\.org http://$0 @plugin=conf_remap.so @pparam=proxy.config.net.sock_packet_tos_out=0x1c

Code Block

  # Remember if traffic originated from our internet connection
  iptables -t mangle -A PREROUTING -i eth0.2 -j MARK --set-mark 1/1

  # Route web traffic to the proxy server except traffic already
  # originating from it.  Matching web traffic by port number isn't
  # perfect but it's good enough.  This is the MAC address of the proxy
  # server.  Because it's configured to make origin connections
  # transparent this is the only way to match traffic already originating
  # from it:
  # http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.general/45405
  iptables -t mangle -A PREROUTING -m mac --mac-source 00:22:15:d2:1e:61 -j RETURN
  iptables -t mangle -A PREROUTING -p tcp --dport 80 -j MARK --set-mark 2/2
  iptables -t mangle -A PREROUTING -i eth0.2 -p tcp --sport 80 -j MARK --set-mark 2/2

  # Web traffic is medium priority by default but the proxy server further
  # breaks down some high/low priority traffic.  It communicates this by
  # setting the ToS/DiffServ Field (it uses the pool of codepoints reserved
  # for experimental or local use, 0x0c/0x0c).  Mark the connection to
  # remember the priority and apply the same classification to response
  # traffic (on which the ToS/DiffServ Field is not set).
  iptables -t mangle -A POSTROUTING -m tos --tos 0x0c -j CONNMARK --set-mark 1
  iptables -t mangle -A POSTROUTING -m connmark --mark 1 -j CLASSIFY --set-class 2:1
  iptables -t mangle -A POSTROUTING -m tos --tos 0x1c -j CONNMARK --set-mark 2
  iptables -t mangle -A POSTROUTING -m connmark --mark 2 -j CLASSIFY --set-class 2:3

  # Route web traffic to the proxy server
  ip route add table 1 via 192.168.1.2
  ip rule add fwmark 2/2 table 1

  ifconfig ifb0 up

  # A qdisc is required before we can add a filter
  insmod sch_prio
  tc qdisc add dev br-lan root handle 1 prio

  # Shape only traffic originating from our internet connection
  # (packet mark 1/1)
  insmod cls_u32
  insmod act_mirred
  tc filter add dev br-lan parent 1: protocol ip pref 1 u32 match mark 1 1 flowid 1:1 action mirred egress redirect dev ifb0

  # Don't shape traffic (reorder/delay/drop) while there's available
  # capacity.  Unfortunately available capacity must be manually
  # configured and fine-tuned.  The following assumes isolated
  # up/downstream capacity (full-duplex).
  insmod sch_tbf
  tc qdisc add dev eth0.2 root handle 1 tbf rate .5mbit burst 5k latency 70ms
  tc qdisc add dev ifb0 root handle 1 tbf rate 2.5mbit burst 5k latency 70ms

  # Schedule traffic according to three priorities
  tc qdisc add dev eth0.2 parent 1: handle 2 prio
  tc qdisc add dev ifb0 parent 1: handle 2 prio

  # For each priority schedule an equal amount of traffic for each client
  insmod sch_sfq
  tc qdisc add dev eth0.2 parent 2:1 handle 3 sfq
  tc qdisc add dev ifb0 parent 2:1 handle 3 sfq
  tc qdisc add dev eth0.2 parent 2:2 handle 4 sfq
  tc qdisc add dev ifb0 parent 2:2 handle 4 sfq
  tc qdisc add dev eth0.2 parent 2:3 handle 5 sfq
  tc qdisc add dev ifb0 parent 2:3 handle 5 sfq

  # Divide downstream traffic into clients by destination IP address.
  # Divide upstream traffic into clients by *Netfilter connection
  # tracking* source IP address (after NAT all upstream traffic shares the
  # same source IP address).
  insmod cls_flow
  tc filter add dev eth0.2 parent 3: pref 1 handle 1 flow hash keys nfct-src divisor 1024
  tc filter add dev ifb0 parent 3: protocol ip pref 1 handle 1 flow hash keys dst divisor 1024
  tc filter add dev eth0.2 parent 4: pref 1 handle 1 flow hash keys nfct-src divisor 1024
  tc filter add dev ifb0 parent 4: protocol ip pref 1 handle 1 flow hash keys dst divisor 1024
  tc filter add dev eth0.2 parent 5: pref 1 handle 1 flow hash keys nfct-src divisor 1024
  tc filter add dev ifb0 parent 5: protocol ip pref 1 handle 1 flow hash keys dst divisor 1024
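The script relies on value/mask pairs (--set-mark 1/1, --set-mark 2/2, fwmark 2/2, match mark 1 1) so that two independent flags can share one packet mark. The sketch below models that arithmetic as I understand it from the iptables MARK target documentation; set_mark() and mark_matches() are hypothetical names.

```cpp
#include <cstdint>

// MARK --set-mark value/mask: zero the bits in mask, then OR in value.
constexpr uint32_t set_mark(uint32_t current, uint32_t value, uint32_t mask)
{
  return (current & ~mask) | value;
}

// fwmark value/mask and u32 "match mark value mask": compare only the
// masked bits.
constexpr bool mark_matches(uint32_t mark, uint32_t value, uint32_t mask)
{
  return (mark & mask) == value;
}
```

A web response arriving on eth0.2 first gets bit 0 set (mark 1) and then bit 1 set (mark 3). The routing rule fwmark 2/2 and the filter match mark 1 1 each test their own bit, so both still match.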

Resources

The best source of iproute2 documentation is the Linux Advanced Routing and Traffic Control project.

...