Quality of Service (QoS) is an attempt to give priority to a packet type or data connection on a per-session basis. Hierarchical Fair Service Curve (HFSC) takes QoS to the next level over CBQ by focusing on guaranteed real-time, adaptive best-effort, and hierarchical link-sharing service.
Though this may sound difficult, it is really easy to use once you understand the basics.
What HFSC means without technical jargon is that you have the ability to set up rules to govern how data leaves the system. For example...
- You may choose to put ack packets in the queue that has the most bandwidth and shares the least with other queues, to guarantee those packets go out first. Ack packets are the way you tell the remote system you have received the latest payload and that it should send the next. This will make sure your data transfers go as fast as they can even on a saturated connection.
- What if you are an avid gamer and other users on your network are slowing your connection down or causing you to lose your connection? You choose to give priority to your gaming traffic over normal web traffic. This way you can play games without slowing down and keep your latency low while other users on the network browse the web and download files.
- What if you are running a web server and you find the majority of your data is text based and is less than 10KB per page, but you do have a few larger data files around 5MB? You decide you want to serve out data quickly in the beginning of the connection and slow down after a few seconds. This is called a nonlinear service curve (NLSC or just SC). You can set up HFSC to serve out the first few seconds of a connection at full speed, let's say 100KB/sec, and then slow the connection down after 5 seconds to 25KB/sec. This allows you to serve out your HTML page at full speed and still allow people to download the 5MB files at slow speed, saving bandwidth for other web clients.
Quality of Service gives you the tools you need to shape traffic.
HFSC is _not_ a perfect solution. First, HFSC is currently built upon ALTQ, and ALTQ is very inefficient. Simply by enabling ALTQ on an interface in pf.conf you will lose as much as ten (10) percent of the available interface bandwidth due to inefficiencies in the ALTQ code base. If your network interface is already hitting the upper end of its transmit bandwidth limit then ALTQ will be a hindrance. If you are not pushing one (1) gigabit on a one (1) gigabit interface then you will not notice much of a slowdown (perhaps only a bit of additional latency) and HFSC is fine to use.
The second note is ALTQ will probably be removed from OpenBSD as soon as v5.3 is released. The "replacement" priority scheduler in OpenBSD is called "prio" and is based on a simpler priority queuing. OpenBSD's prio is a lot more efficient, but it lacks much of the functionality of HFSC on ALTQ. The initial version of prio is available in OpenBSD v5.1 so check the man page for pf(4) for more details and make an informed decision. FreeBSD Pf is going to be SMP friendly in FreeBSD 10 and significantly more efficient compared to OpenBSD's implementation. FreeBSD's ALTQ will not change though.
Let's take a look at the basic set of commands in HFSC and why you would use them in the real world:
bandwidth: has two (2) slightly different meanings depending on whether it is defined in the parent or child queue lines. Take care not to confuse the two.
In the parent "altq" line this defines the maximum bit rate for all queues on an interface. Use the total aggregate upload bandwidth allowed by the ISP, not the negotiated speed of the NIC. It is important to specify a value slightly less than the maximum amount of upload bandwidth available so altq can queue the data and not some upstream router (97% of the upload speed of 20Mb/sec in our example).
In the child "queue" line(s) this directive specifies the maximum bit rate to be processed by the queue at any one time. This directive is actually the same as using "linkshare" with a value in (m2). This value must not exceed the value of the parent queue and can be specified as an absolute value or a percentage of the parent queue's bandwidth. If not specified, this defaults to 100% of the parent queue's bandwidth. It is advisable to assign a percentage of your total bandwidth to each child queue, up to a total of no more than 100%.
priority: the level specifies the order in which a service is to occur relative to other queues and is used in CBQ and PRIQ, but not HFSC. The higher the number, the higher the priority. Remember, HFSC does _NOT_ use priority at all even if you add this directive to your queues. This directive is a simple way of specifying which packets are first out of the gate compared to others. Priority does _not_ define an amount of bandwidth, but the order in which packets are buffered before being sent out of the interface. For example, say you have gaming data and bulk web data. You want gaming data to go first since it is interactive, and bulk web traffic can wait. Set the gaming data queue at least one (1) priority level higher than the bulk web traffic queue. The priority range for cbq is 0 to 7. Priority 0 is the lowest priority, for the least important data. When not specified, a default of 1 is used. Priq type queues with a higher priority are always served first. CBQ type queues with a higher priority are preferred when the link is saturated.
qlimit: the number of packets to buffer and queue when the amount of available bandwidth has been exceeded. This value is 50 packets by default. When the total amount of upload bandwidth has been reached on the outgoing interface, or higher queues are taking up all of the bandwidth, no more data can be sent. The qlimit will put the packets the queue can not send out into slots in memory in the order that they arrive. When bandwidth is available the qlimit slots will be emptied in the order they arrived; first in, first out (FIFO). If the queue fills all of its qlimit slots, any further packets will be dropped.
Look at qlimit slots as "emergency use only," but as a better alternative to dropping the packets outright. Understand that dropping packets is the proper way TCP knows it needs to reduce bandwidth, so dropping packets is not bad. The problem is that TCP Tahoe or Reno methods will slow down the connection too severely and it takes a while to ramp back up after a dropped packet. A small qlimit buffer helps smooth out the connection, but "buffer bloat" works against TCP's congestion control. Also, do not think that setting the qlimit really high will solve the problem of bandwidth starvation and packet drops. What you want to do is set up a queue with the proper bandwidth boundaries so that packets only go into the qlimit slots for a short time (no more than a second), if ever.
If the qlimit is too large then you will run into a common issue called buffer bloat. Search on Google for "buffer bloat" for more information. A good idea is to set the qlimit to the number of packets you want to buffer (not drop) in no more than a given amount of time. Take the total amount of upload bandwidth you have for your connection. Let's say that is 25 megabit upload speed. Now decide how much time you are willing to buffer packets before they get sent out. Let's say we will buffer 0.5 seconds, which is quite long. So, 25 megabit divided by 8 is 3.125 megabytes per second. The average maximum segment size is 1460 bytes. 3.125 MB/sec divided by 0.001460 MB is 2140.41 packets per second. Now, we decided that we want to queue 0.5 seconds, which is 2140.41 packets per second times 0.5 seconds, or 1070 packets. Thus, we set the qlimit at 1070. 1070 packets at a MSS of 1460 bytes is a 1.562 megabyte buffer. This is just a rough model, but you get the idea. We prefer to set our buffer a little high so that network spikes get buffered for 0.5 to one (1) second and then sent out. This method smooths out upload spikes, but does add some buffer bloat to our external network connection. In _our_ tests on _our_ network a larger buffer worked better in the real world than the default qlimit of 50 packets set by OpenBSD. Do your own tests and make an informed decision.
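The qlimit arithmetic above can be sketched in a few lines of Python. This is only the same rough model: the function name qlimit_for is made up for illustration, and it assumes every packet is a full 1460 byte segment.

```python
def qlimit_for(upload_bits_per_sec, buffer_seconds, mss_bytes=1460):
    """Number of full-size packets that fit in `buffer_seconds` of upload.

    Rough model only: assumes every packet carries `mss_bytes` of payload.
    """
    bytes_per_sec = upload_bits_per_sec / 8        # 25 megabit -> 3.125 MB/sec
    packets_per_sec = bytes_per_sec / mss_bytes    # about 2140.41 packets/sec
    return int(packets_per_sec * buffer_seconds)   # round down to whole packets


# The example from the text: 25 megabit upload, 0.5 seconds of buffering.
qlimit = qlimit_for(25_000_000, 0.5)
buffer_bytes = qlimit * 1460

print("qlimit:", qlimit)
print("buffer size in bytes:", buffer_bytes)
```

Plug in your own upload speed and the buffering time you are willing to accept, then test the result on your own network.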
realtime: the amount of bandwidth that is guaranteed to the queue no matter what any other queue needs. Realtime can be set from 0% to 80% of total connection bandwidth. Let's say you want to make sure that your web server gets 25KB/sec of bandwidth no matter what. Setting the realtime value will give the web server queue the bandwidth it needs even if other queues want to share its bandwidth.
upperlimit: the amount of bandwidth the queue can _never_ exceed. For example, say you want to set up a new mail server and you want to make sure that the server never takes up more than 50% of your available bandwidth. Or let's say you have a p2p user you need to limit. Using the upperlimit value will keep them from abusing the connection.
linkshare (m2): this value has the exact same use as "bandwidth" above. If you decide to use both "bandwidth" and "linkshare" in the same rule, pf (OpenBSD) will override the bandwidth directive and use "linkshare m2". This may cause more confusion than it is worth especially if you have two different settings in each. For this reason we are not going to use linkshare in our rules. The only reason you may want to use linkshare _instead of_ bandwidth is if you want to enable a nonlinear service curve.
nonlinear service curve (NLSC or just SC): The directives realtime, upperlimit and linkshare can all take advantage of a NLSC. In our example below we will use this option on our "web" queue. The format for service curve specifications is (m1, d, m2). m2 controls the bandwidth assigned to the queue. m1 and d are optional and can be used to control the initial bandwidth assignment. For the first d milliseconds the queue gets the bandwidth given as m1, afterwards the value given in m2.
default: the default queue. Any data connections or rules which are not specifically put into another queue will be put into the default queue. This directive must appear in exactly one rule. You can _not_ have two (2) default directives in any two (2) rules.
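As a minimal sketch of how the (m1, d, m2) triple reads, the Python function below returns the rate that applies at a given moment. The name curve_rate is hypothetical, and the sketch only models the two-step shape described above, not how ALTQ schedules packets. The sample numbers are the 100KB/sec and 25KB/sec web server figures from the introduction.

```python
def curve_rate(elapsed_ms, m2, m1=None, d=0):
    """Rate given by a (m1, d, m2) service curve `elapsed_ms` after the start.

    m1 and d are optional, as in the curve format: without them the
    curve is linear and m2 applies the whole time.
    """
    if m1 is not None and elapsed_ms < d:
        return m1   # initial rate for the first d milliseconds
    return m2       # long-term rate afterwards


# Web server idea from the introduction: 100KB/sec for 5 seconds, then 25KB/sec.
early = curve_rate(1_000, m2=25, m1=100, d=5_000)
late = curve_rate(8_000, m2=25, m1=100, d=5_000)
linear = curve_rate(1_000, m2=25)

print(early, late, linear)
```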
ecn: In ALTQ, ECN (Explicit Congestion Notification) works in conjunction with RED (Random early detection). ECN allows end-to-end notification of network congestion without dropping packets.
ECN is an optional feature which is used when both endpoints support it and are willing to use it. OpenBSD has ecn disabled by default and Ubuntu has it turned on only if the remote system asks for it first. Traditionally, TCP/IP networks signal congestion by dropping packets. When ECN is successfully negotiated, an ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion. The receiver of the packet echoes the congestion indication to the sender, which must react as though a packet was dropped. ALTQ's version of RED is similar to Weighted RED (WRED) and RED In/Out (RIO) which provide early detection when used with ECN. The end result is a more stable TCP connection over congested networks.
Be very careful when enabling ECN on your machines. Remember that any router or ECN enabled device can notify both the client and server to slow the connection down. If a machine in your path is configured to send ECN when its congestion is low then your connection's speed will suffer greatly. For example, telling clients to slow their connections when the link is 90% saturated would be reasonable. The connection would have a 10% safety buffer instead of dropping packets. Some routers are configured incorrectly and will send ECN when they are only 10%-50% utilized. This means your throughput speeds will be painfully low even though there is plenty of base bandwidth available. Truthfully, we do not use ECN or RED due to the ability of routers, misconfigured or not, to abuse congestion notification.
Now, let's take a look at a custom HFSC queue setup. The following group of rules splits data into 8 subsets and gives each one of them specific data tasks and limits. You do not have to follow this example exactly, especially since you have the definitions above. Let's go through what each line does and why it is used, then you can decide for yourself.
Cut and paste this set if you want. It works perfectly fine.
### FIOS Upload = 20Mb/s (queue at 97%)
altq on $ExtIf bandwidth 19.40Mb hfsc queue { ack, dns, ssh, web, mail, bulk, bittor, spamd }
 queue ack        bandwidth 30% qlimit 500 hfsc (realtime 20%)
 queue dns        bandwidth  5% qlimit 500 hfsc (realtime  5%)
 queue ssh        bandwidth 20% qlimit 500 hfsc (realtime 20%) {ssh_login, ssh_bulk}
  queue ssh_login bandwidth 50% qlimit 500 hfsc
  queue ssh_bulk  bandwidth 50% qlimit 500 hfsc
 queue bulk       bandwidth 20% qlimit 500 hfsc (realtime 20% default, ecn)
 queue web        bandwidth  5% qlimit 500 hfsc (realtime (10%, 10000, 5%))
 queue mail       bandwidth  5% qlimit 500 hfsc (realtime  5%)
 queue bittor     bandwidth  1% qlimit 500 hfsc (upperlimit 95%)
 queue spamd      bandwidth  1% qlimit 500 hfsc (upperlimit 1Kb)
The first line is simply a comment. It reminds one that our total upload bandwidth is 20Mb/s (megabits per second). You never want to use exactly the total upload speed, but slightly less.
Why? You want to use your queue as the limiting factor in the connection. When you send out data and you saturate your link, the router you connect to will decide what packets go first, and that is what we want HFSC to do. You can _not_ trust your upstream router to queue packets correctly.
So, we limit the upload speed to just under the total available bandwidth. "Doesn't that waste some bandwidth then?" Yes, in this example we are not using 600Kb/s (about 75KB/s), but remember we are making sure the upstream router sends out the packets in the order we want, not the order it decides. This makes all the difference with ACK packets and will actually increase the available bandwidth on a saturated connection.
The second line is the parent queue for the external interface ($ExtIf). It shows we are using "hfsc queue" and lists all eight (8) of the child queues (ack, dns, ssh, web, mail, bulk, bittor, spamd). This is where we specify the bandwidth limit at 97% of the total 20Mb/sec, or 19.40Mb/sec.
The next set of lines specifies the eight (8) child queues and also two sub-child queues in the ssh rule. All of these rules use the external interface and are limited by the parent queue's bandwidth limitations.
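To see what the percentages in the queue lines work out to, here is a small Python sketch that converts them into kilobits per second. The dictionary layout and the helper name share are ours for illustration; the percentages and the 19.40Mb parent come from the rules above.

```python
PARENT_KBIT = 19_400  # 19.40Mb from the altq line, in kilobits per second

# "bandwidth" percentage of each child queue, as a share of the parent
children = {"ack": 30, "dns": 5, "ssh": 20, "bulk": 20,
            "web": 5, "mail": 5, "bittor": 1, "spamd": 1}

# the ssh sub-child queues take a percentage of the ssh queue, not of the link
ssh_children = {"ssh_login": 50, "ssh_bulk": 50}


def share(parent_kbit, percent):
    """Absolute rate in Kb/s for a percentage of the parent queue."""
    return parent_kbit * percent // 100


rates = {name: share(PARENT_KBIT, pct) for name, pct in children.items()}
ssh_rates = {name: share(rates["ssh"], pct) for name, pct in ssh_children.items()}
total_percent = sum(children.values())

for name, kbit in {**rates, **ssh_rates}.items():
    print(f"{name:10s} {kbit}K")
print("children use", total_percent, "percent of the parent")
```

The child queues add up to 87% of the parent, under the advised 100%, and the absolute values are the same ones shown in the BW column of the "systat queues" output later on.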
This is the ack queue. It can process as much as 30% of the total link bandwidth and has a very high queue limit of 500 slots. The realtime of 20% means this queue is guaranteed at least 20% of the total bandwidth no matter what any other rule wants.
The queue is for ack (acknowledge) packets without payload. Ack packets are how your system tells the remote servers you have received the payload they sent and to send the next one. By prioritizing these packets you can keep your transfer rates high even on a highly saturated link. For example, if you are downloading a file and you receive a chunk of data, the remote system will not send you the next chunk of data until you send them an OK. The OK is the ack packet. When you send the ack packet the remote system knows you got the packet and it has checked out, thus it will send the next one. If on the other hand you delay ack packets, the transfer rate will diminish quickly because the remote system won't send anything new until you respond.
This is the dns queue. It is allowed to process as much as 5% of the total bandwidth and has a high queue limit of 500 slots. The realtime of 5% means this queue is guaranteed at least 5% of the total bandwidth no matter what any other rule wants.
This queue is simply to make sure dns packets get out on time. Though this is not really necessary, your web browsing users will be thankful. When you go to a site or enter a URL the clients need the ip of the server. This rule simply allows dns queries to go out before other traffic.
This is the ssh parent and its child queues. The parent queue can process as much as 20% of the total bandwidth and has a very high queue limit of 500 slots. The realtime of 20% means this queue is guaranteed at least 20% of the total bandwidth.
The two (2) child queues are for ssh's interactive logins (ssh_login) and bulk transfer data like scp/sftp (ssh_bulk). These two queues are under the parent queue and both divide the parent's bandwidth of 20% of the total aggregate nic bandwidth. In this example we want to make sure interactive ssh traffic, like authentication, has at least 50% of the bandwidth. The rest of the bandwidth is used for bulk transfers like scp and sftp. Both child queues do have the ability to share bandwidth with each other. The priorities of the ssh child queues are independent of all of the other queues. We could have picked any other priorities as long as ssh_login was higher than ssh_bulk.
Normally only one queue name is given with the queue keyword, but if a second name is specified that queue will be used for packets with a Type of Service (ToS) of low-delay (tos 0x10) and for TCP ACK packets with no data payload. A good example of this is found when using SSH. SSH login sessions will set the ToS to low-delay while SCP and SFTP sessions will not. PF can use this information to queue packets belonging to a login connection in a different queue than non-login connections. This can be useful to prioritize login connection packets over file transfer packets.
REMEMBER: when setting up your pass rules for ssh traffic you need to have the two queues in the correct order. For ssh, the first queue listed is for bulk traffic and the second is for interactive traffic. For example, "queue (ssh_bulk, ssh_login)" is the correct order for your pass rules. Check our OpenBSD Pf Firewall "how to" (pf.conf) Guide for a working example.
This is the bulk queue. The bulk queue can process as much as 20% of the total bandwidth and has a very high queue limit of 500 slots. The realtime of 20% means this queue is guaranteed at least 20% of the total bandwidth no matter what any other rule wants.
This queue is where all of the general traffic will go. If one does not specify a queue for a rule, that traffic will go here. Notice the directive "default" after the realtime tag. You must specify one and only one "default" queue.
Notice we have included the ecn (Explicit Congestion Notification) directive. This is just an example as you can add ecn to any rule. Normally we do _NOT_ use ecn on any of our rules.
This is the web queue. It is an example showing the use of a nonlinear service curve (nlsc or just sc) with the realtime directive. We could assign this queue to the traffic coming into the external network interface and accessing our public web server.
In this example we are using three (3) variables to shape the bandwidth over time. The format for service curve specifications is (m1, d, m2). m2 controls the bandwidth assigned to the queue. This is what we used for all of the previous realtime variables. m1 and d are optional and can be used to control the initial bandwidth assignment. For the first d milliseconds the queue gets the bandwidth given as m1, afterwards the value given in m2.
So, our web queue will guarantee bandwidth up to 10% of the parent queue (19.40Mb/sec times 10%, or 1.94Mb/sec) for at least 10000 milliseconds (10 seconds) after the transfer starts. Then after 10000 milliseconds the guaranteed bandwidth will go down to 5%, or 970Kb/sec. This might be useful to keep short interactive transfers fast, but slow down big downloads which might otherwise monopolize your bandwidth.
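To get a feel for how much data the web queue's realtime curve covers, the Python sketch below adds up the guaranteed bits over time. The rates are worked out from the 19.40Mb parent in the altq line (10% and 5% of it); the function name guaranteed_bits is made up, and this is simple arithmetic on the curve, not a model of the scheduler.

```python
PARENT_BPS = 19_400_000          # 19.40Mb/sec parent queue
M1_BPS = PARENT_BPS * 10 // 100  # first segment: 10% of the parent
M2_BPS = PARENT_BPS * 5 // 100   # long-term rate: 5% of the parent
D_MS = 10_000                    # length of the first segment in milliseconds


def guaranteed_bits(elapsed_ms, m1_bps, d_ms, m2_bps):
    """Bits covered by a (m1, d, m2) curve after `elapsed_ms` milliseconds."""
    first_ms = min(elapsed_ms, d_ms)      # time spent at the m1 rate
    rest_ms = max(elapsed_ms - d_ms, 0)   # time spent at the m2 rate
    return (m1_bps * first_ms + m2_bps * rest_ms) // 1000


after_10s = guaranteed_bits(10_000, M1_BPS, D_MS, M2_BPS)
after_20s = guaranteed_bits(20_000, M1_BPS, D_MS, M2_BPS)

print("guaranteed in first 10 seconds:", after_10s // 8, "bytes")
print("guaranteed in first 20 seconds:", after_20s // 8, "bytes")
```

Under these assumptions a small HTML page fits well inside the fast first segment, while a multi-megabyte download spends part of its time at the lower long-term rate.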
This is the mail queue. It can be used for incoming mail server connections once they have passed your spamd checks. It can borrow as much as 5% of the total bandwidth and has a high queue limit of 500 slots. The realtime of 5% means this queue is guaranteed at least 5% of the total bandwidth no matter what any other rule wants.
This is the bittor queue. The bittor queue can borrow as much as 1% of the total bandwidth and has a very high queue limit of 500 slots. Notice this rule does not have a realtime directive. This is because we have decided that bittorrent traffic is expendable and we want to make sure this queue gives up all bandwidth to other queues if they need it. The upperlimit directive makes sure this rule will never use more than 95% of the total bandwidth, even when bandwidth from other queues is available.
This rule is here to show that one can use peer sharing tools and still have control of their network. You will notice that remote clients using p2p sharing tools and connecting to you will hammer your connection. This rule will allow the data to transfer at up to 95% of your full speed, but if another queue needs the bandwidth, the bittor queue will be pruned almost instantly to 1%.
Imagine if you are getting the latest OpenBSD distro through a torrent and then you want to browse the web. Normally, you would experience a slow connection because you are fighting for bandwidth with p2p clients. With this rule your browsing traffic gets the bandwidth it needs instantly since it is using the bulk queue. The bittor queue on the other hand gets reduced and starts using the qlimit slots until you are done using the bandwidth browsing. Best of both worlds.
This is the spamd queue. The spamd queue can process as much as 1% of the total parent bandwidth and has a very high queue limit of 500 slots. Notice this rule does not have a realtime directive. This is because we have decided that spamd traffic is expendable and we want to make sure this queue gives up all bandwidth to the queues that need it. The upperlimit directive makes sure this rule will _never_ borrow or use more than 1Kb/s (1 kilobit per second) of the total bandwidth from any other queue.
This rule is used for spammers and grey listed smtp hosts. It is linked to the spamd daemon to annoy spammers or figure out if a smtp host is valid or not. Since the traffic on this queue has very low bandwidth requirements we have decided to set the upper bound at 1Kb/s (125 bytes/sec). If we set up spamd to stutter connections at 1 packet per ip per 3 seconds (spamd -s3) we will need around 17.6 bytes per second of bandwidth per smtp connection. A single spamd smtp data payload is 53 bytes per packet per ip. Our limit of 1Kb/sec should be able to handle around 7.1 concurrent smtp connections. Even if you had more spammers connected the queue would never use more than 1Kb/s of the bandwidth. Any extra packets would go into the qlimit bucket and if that fills, packets would be dropped. This is not a problem since the data is expendable and real mail clients will always reconnect if they are RFC compliant. If you need more information about spamd check out our Spamd tarpit and grey listing anti-spam "how to" (spamdb).
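The spamd numbers above are easy to re-run for your own stutter setting. This Python sketch only repeats the arithmetic from the text; the variable names are ours and the 53 byte payload is the figure quoted above, not something we measure here.

```python
PAYLOAD_BYTES = 53        # one spamd smtp data payload per packet per ip
STUTTER_SECONDS = 3       # spamd -s3: one packet per ip every 3 seconds
UPPERLIMIT_BITS = 1000    # upperlimit 1Kb on the spamd queue

bytes_per_conn = PAYLOAD_BYTES / STUTTER_SECONDS   # bandwidth per connection
limit_bytes = UPPERLIMIT_BITS / 8                  # 1Kb/s is 125 bytes/sec
max_conns = limit_bytes / bytes_per_conn           # connections the limit covers

print(f"per connection: {bytes_per_conn:.1f} bytes/sec")
print(f"queue limit:    {limit_bytes:.0f} bytes/sec")
print(f"concurrent smtp connections: {max_conns:.1f}")
```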
Now that we have taken a detailed look at the queue rules and directives, we now need to look at a way to apply those queues to our pf rules.
Here we have two(2) examples of rules you can use queuing on. Notice the queue names we used above like ack, bulk, ssh_login, and ssh_bulk at the end of the rules. Also, notice the order that we have put the two queues in on each rule. The first queue name in "bulk, ack" is for general data and the second "ack" is for special short length packets (TOS).
pass out on $ExtIf inet proto tcp from ($ExtIf) to any flags S/SA modulate state queue (bulk, ack)
pass out on $ExtIf inet proto tcp from ($ExtIf) to any port ssh flags S/SA modulate state queue (ssh_bulk, ssh_login)
The first rule is passing out bulk traffic on the external interface and prioritizing ack packets. The second rule is passing out data on port 22(ssh) and prioritizing the interactive ssh traffic. This traffic is originating on our internal network or on the firewall itself.
If we decided to have a rule with only one queue directive it would look like so.
pass out on $ExtIf inet proto tcp from ($ExtIf) to any flags S/SA modulate state queue (bulk)
You can also queue data on the return trip of an external stateful connection. Remember you can _not_ queue data coming into the box, only going out. Let's say you have a web server, clients from the outside connect to you, and you want the responses sent back to them to be queued. The following works perfectly.
pass in on $ExtIf inet proto tcp from any to ($ExtIf) port www flags S/SA modulate state queue (web, ack)
So, now you have read all about queuing and you have applied the queue tags to your rules. Now you need to verify that what you set up actually does what you think it should do. OpenBSD has a built in tool for this purpose. If you are using OpenBSD 4.4 or later, use the command "systat queues".
The following is an example output from "systat queues".
QUEUE       BW     SCH   PRIO  PKTS    BYTES      DROP_P  DROP_B  QLEN  BORROW  SUSPEN  P/S   B/S
root_em0    19M    hfsc  0     0       0          0       0       0                     0     0
 ack        5820K  hfsc  8     375231  23477741   0       0       0                     101   8532
 dns        970K   hfsc  7     12780   1139485    0       0       0                     0     0
 ssh        3880K  hfsc  6     0       0          0       0       0                     0     0
  ssh_login 1940K  hfsc  6     167985  14010918   0       0       0                     0.6   144
  ssh_bulk  1940K  hfsc  5     809262  1074556K   0       0       0                     0     0
 bulk       3880K  hfsc  5     682211  78703328   0       0       0                     0     0
 web        970K   hfsc  4     387738  375064K    0       0       0                     270   37645
 mail       970K   hfsc  3     2519    338557     0       0       0                     0     0
 bittor     194K   hfsc  2     15      59K        0       0       0                     0     0
 spamd      194K   hfsc        40507   2347010    0       0       0                     0.2   10
The output above is similar to what you are looking for. You need to test each type of queue you set up to make sure you see the packets being added to the correct queue. For example, you could ssh to another machine going out the external interface, and as you do so you should see interactive packets (like typing) being added to the "ssh_login" queue. If you scp/sftp a file you should see packets being added to the "ssh_bulk" queue. Then do a dns query and see if the packets are put into the dns queue. Take some time and test to make sure everything works like it should.
Do you have a "how to" about PF?
Yes, we sure do. Check out our OpenBSD Pf Firewall "how to" (pf.conf) which covers PF and includes this HFSC quality of service (QoS) setup in the config.
Can I use Pf's HFSC with my ISP's implementation of SpeedBoost or PowerBoost ?
Yes, you can. Some Internet service providers (ISP) offer an upload burst for the first few megabytes of data before the upload speed is reduced for the rest of the data.
A few ISPs that support "upload bursting" are Comcast with PowerBoost, SpeedBoost or Blast!, Road Runner Turbo with Power Boost or Speed Boost and Shaw Power Boost. Every ISP will have their own name for the same implementation and might change the marketing names in the future, so these are just a few examples.
For this example we will use Comcast SpeedBoost. Comcast on the east coast of the USA offers an upload burst at 12 megabits per second (12Mb/s or 1.5MB/s) for the first 15 megabytes (15MB). We can transfer 15MB at 1.5MB/sec in 10 seconds. The upload speed then drastically drops to 2 megabits per second (2Mb/s or 250KB/s) for the remainder of the transfer. Currently, Verizon FIOS does not offer any upload bursting. Every ISP in every area is going to offer different speeds no matter what they actually say. You need to test your speeds independently and adjust the "upperlimit" values accordingly.
In order to take advantage of the initial burst in speed we can make a parent queue called "speedboost" which will use all the upload bandwidth of 12Mb/s for the first 10 seconds (10000 milliseconds) and then we will limit the connection to 2Mb/sec for the remainder. This way we get to take advantage of the available bandwidth in the beginning of the transfer and still allow our firewall to properly queue the packets.
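If your ISP's burst figures differ, the d value for the upperlimit curve can be worked out the same way as the 10 seconds above. A minimal Python sketch, assuming the burst is a fixed number of megabytes sent at a fixed burst rate (the variable names are ours, and your measured speeds should replace the numbers):

```python
BURST_MEGABYTES = 15   # data the ISP lets through at the burst rate
BURST_MBIT = 12        # burst upload speed in megabits per second
SUSTAINED_MBIT = 2     # upload speed after the burst is used up

# 15MB * 8 bits at 12Mb/s, expressed in milliseconds
burst_ms = BURST_MEGABYTES * 8 * 1000 // BURST_MBIT

# the (m1, d, m2) triple for the "speedboost" parent queue
upperlimit = f"upperlimit ({BURST_MBIT}Mb, {burst_ms}, {SUSTAINED_MBIT}Mb)"

print("burst lasts", burst_ms, "milliseconds")
print(upperlimit)
```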
### Comcast Upload with PowerBoost = 12Mb/s to 2Mb/s
altq on $ExtIf bandwidth 12Mb hfsc queue { speedboost }
 queue speedboost  bandwidth 12Mb hfsc (upperlimit (12Mb, 10000, 2Mb) ) {ack, dns, ssh, bulk}
  queue ack        bandwidth 30% qlimit 500 hfsc (realtime 20%)
  queue dns        bandwidth  5% qlimit 500 hfsc (realtime  5%)
  queue ssh        bandwidth 20% qlimit 500 hfsc (realtime 20%) {ssh_login, ssh_bulk}
   queue ssh_login bandwidth 50% qlimit 500 hfsc
   queue ssh_bulk  bandwidth 50% qlimit 500 hfsc
  queue bulk       bandwidth 20% qlimit 500 hfsc (realtime 20% default)