As I said in my last post, I'm trying out the new rt73usb drivers from compat-wireless on ARM, running linux-188.8.131.52. I'm encountering a problem whenever I upload a large amount of data.
I can ping the device, and it responds. I can use something like scp to copy data to the device, but if I try to transmit a large amount of data from the device, the card "locks up" and I have to pull it out to get it working again. It gets anywhere between 200k and 500k before freezing. If I look at iwconfig, I can see the "Link Quality" number adjusting, and the register values don't change between the time it works and the time it has failed:
    00000000: 30783136 0a                  0x16.
    00000000: 30783030 30323537 33610a     0x0002573a.
    00000000: 30783235 37330a              0x2573.
    00000000: 30783030 30303263 30630a     0x00002c0c.
The output from iw shows that packets are still being received and sent, even though I can't do anything over the network, not even ping:
Station 00:1d:46:24:b8:d1 (on wlan0)
    inactive time: 53380 ms
    rx bytes: 296083
    rx packets: 2440
    tx bytes: 751157
    tx packets: 635
    signal: -54 dBm
    tx bitrate: 54.0 MBit/s
The kernel I'm using could be contributing to this behavior. What's particularly interesting, though, is that while this card (its chip is labeled RT2571WF) crashes, an rt2500 card (chip RT2571F) that I can use on this exact same setup doesn't exhibit the behavior. Nor does any other wireless driver.
Where should I begin looking to try and track down this issue?
Some more information:
This problem occurs both in today's latest compat-wireless and in the drivers in linux-184.108.40.206.
It seems to be a failure of the TX queue: if I run tcpdump on the device and then ping it from outside, I can see the incoming pings continue for a few seconds after a disconnect occurs. (I'm guessing that the AP disconnects me after a few seconds of not receiving a response.)
Around the time the disconnect occurs, I start seeing odd failures in rt2x00usb_kick_tx_entry(), where test_and_clear_bit(ENTRY_DATA_PENDING, &entry->flags) starts returning false.
I'm going to dig into the queue code some more, but that's what I've found so far.
I think the problem is caused by a faulty USB EHCI controller. The controller sends out short packets and retries in rapid succession. The rt73 seems the most susceptible to failure from these packets, while the rt2500 seems the most robust.
Fixing the controller, unsurprisingly, has fixed the rt73 drivers.