Friday, February 9, 2007

A comedy of errors... (IP tuning for the perverse)

As some may know, my afe ethernet drivers are currently being tested by Sun prior to integration into Solaris.

I've been spending time with Alan DuBoff at Sun to diagnose a certain set of problems. I believe there is a hardware problem with a certain configuration that causes many frames to be dropped (large, steadily increasing values of the fcs_errors statistic). At one point it looked like ~5-10% of the packets were getting dropped. The device seems to function, but it performs poorly.

That in itself isn't too interesting. But add to this an obscene level of testing, where very large UDP packets (~60K) are sent to the device at wire speed.

Well, if you do the math, you realize that very few such UDP packets will get through: each one is split into dozens of IP fragments, and losing any single fragment loses the whole datagram. But more importantly, while this is occurring, IP has to track all these frames for packet reassembly. At 100Mbps this adds up to roughly 12MB/sec. How long does IP hold these packets before giving up on them and discarding them?
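To put rough numbers on "do the math", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: a 1500-byte MTU, a 60000-byte UDP payload standing in for "~60K", independent frame losses, and a 60-second hold time (I did not check what timeout this system actually uses). The function names are made up for this post.

```python
import math

MTU = 1500        # assumed Ethernet MTU in bytes
IP_HEADER = 20    # IPv4 header without options
UDP_HEADER = 8    # UDP header, carried in the first fragment


def fragments_needed(udp_payload, mtu=MTU):
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Fragment data must be a multiple of 8 bytes (except the last one).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


def survival_probability(udp_payload, frame_loss):
    """Chance that every fragment arrives, assuming independent losses."""
    return (1.0 - frame_loss) ** fragments_needed(udp_payload)


def pending_bytes(link_bits_per_sec, hold_seconds):
    """Upper bound on bytes parked in reassembly queues at line rate."""
    return link_bits_per_sec / 8 * hold_seconds


if __name__ == "__main__":
    print(fragments_needed(60000))
    for loss in (0.05, 0.10):
        print(loss, survival_probability(60000, loss))
    print(pending_bytes(100_000_000, 60) / 1e6, "MB")
```

Under those assumptions each datagram needs 41 fragments, so roughly 12% of datagrams survive at 5% frame loss and a bit over 1% at 10%. A 60-second hold at line rate would park up to about 750 MB, which is at least in the same ballpark as the "hundreds of megabytes" Alan saw.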

On this particular system, Alan noticed that literally hundreds of megabytes of memory were in use. No, they weren't leaked. They were just waiting to be discarded!

I think this is a particular "opportunity" for some more tuning in TCP/IP. Perhaps a maximum quota on outstanding fragments, after which they are discarded in FIFO order?
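The quota idea could look something like the toy model below. This is only a sketch of the policy in Python, not how the Solaris IP stack is (or would be) implemented; the class name, the byte-based quota, and the eviction of whole datagrams (oldest first) are all my assumptions.

```python
from collections import OrderedDict


class ReassemblyQuota:
    """Toy model: cap bytes held in partial datagrams, evict oldest first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.held_bytes = 0
        # key -> list of fragment lengths; insertion order is arrival order
        # of each datagram's first fragment, which gives us FIFO eviction.
        self.queues = OrderedDict()

    def add_fragment(self, key, length):
        """Record a fragment; return keys of datagrams evicted to make room."""
        self.queues.setdefault(key, []).append(length)
        self.held_bytes += length
        evicted = []
        while self.held_bytes > self.max_bytes and self.queues:
            old_key, lengths = self.queues.popitem(last=False)
            self.held_bytes -= sum(lengths)
            evicted.append(old_key)
        return evicted

    def complete(self, key):
        """Datagram fully reassembled (or timed out): release its bytes."""
        lengths = self.queues.pop(key, [])
        self.held_bytes -= sum(lengths)
```

Here `key` would be whatever identifies a datagram being reassembled, for example a (source, destination, protocol, IP ident) tuple. Evicting whole datagrams rather than single fragments seems reasonable, since a datagram missing one fragment is useless anyway.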

Alan has promised to pick up another NIC, so we can follow up with further testing.

In the meantime, I'm very, very grateful for kstats. Without fcs_errors, we would almost certainly never have figured out what was going on. One possible side benefit of all this is that I have added some improvements to the error handling (including a full chip reset), so in perverse scenarios like this the chip has a better chance of recovering from errors.

PS: This also suggests an opportunity for a test version of a driver that has a random "drop" tunable, which drops packets at a specified percentage. Who knows what other bugs might turn up?
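As a sketch of what such a tunable would do, here is a small Python model of a receive path with a drop percentage. A real version would live in the driver's receive routine in C; the class and attribute names here are hypothetical and nothing like this exists in afe today.

```python
import random


class DropFilter:
    """Model of a receive path that drops a given percentage of frames."""

    def __init__(self, drop_percent, seed=None):
        if not 0 <= drop_percent <= 100:
            raise ValueError("drop_percent must be between 0 and 100")
        self.drop_percent = drop_percent
        self.rng = random.Random(seed)  # seedable, so test runs can be repeated
        self.dropped = 0
        self.passed = 0

    def receive(self, frame):
        """Return the frame, or None if it was deliberately dropped."""
        if self.rng.random() * 100 < self.drop_percent:
            self.dropped += 1
            return None
        self.passed += 1
        return frame
```

Keeping dropped and passed counters mirrors what kstats would give you in a real driver: you want to be able to confirm that the observed loss rate matches what was asked for.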

PPS: I think this may be a potential DoS attack: just flood a system with IP fragments. Of course, this might be much harder to do if the attacker does not have access to a local network. Any firewall should be secure against this kind of abuse.
