[rt2x00-users] rt2x00: status update regarding txdone handling

Ivo Van Doorn ivdoorn at gmail.com
Wed Aug 4 08:39:45 UTC 2010


On Wed, Aug 4, 2010 at 10:37 AM, Gertjan van Wingerde
<gwingerde at gmail.com> wrote:
> On Wed, Aug 4, 2010 at 10:29 AM, Ivo Van Doorn <ivdoorn at gmail.com> wrote:
>> On Wed, Aug 4, 2010 at 10:25 AM, Helmut Schaa
>> <helmut.schaa at googlemail.com> wrote:
>>> On Monday 02 August 2010, Ivo Van Doorn wrote:
>>>> On Mon, Aug 2, 2010 at 10:38 AM, Helmut Schaa
>>>> <helmut.schaa at googlemail.com> wrote:
>>>> > On Thursday 29 July 2010, Ivo Van Doorn wrote:
>>>> >> On Thu, Jul 29, 2010 at 6:30 PM, Helmut Schaa
>>>> >> <helmut.schaa at googlemail.com> wrote:
>>>> >> > On Thursday 29 July 2010, Ivo Van Doorn wrote:
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> > I'm still fighting with the txdone handling in rt2800pci. Sometimes the tx
>>>> >> >> > queues get stuck. It took me a few days, but it seems as if the TX_STA_FIFO
>>>> >> >> > register's limit of at most 16 entries is causing this problem.
>>>> >> >>
>>>> >> >> Weird, for rt2800usb this doesn't seem to be the problem. The queue locks
>>>> >> >> up even when I continuously read the TX_STA_FIFO register.
>>>> >> >
>>>> >> > I should add that this behaviour only happens when running iperf
>>>> >> > from the SoC board to an associated client, with the host CPU
>>>> >> > utilization near 100%.
>>>> >>
>>>> >> Ah ok, well I haven't managed to get iperf working on my system yet,
>>>> >> so I can't test that. But since the queue locks up regardless of CPU load and
>>>> >> TX_STA_FIFO status, it might be a different problem (perhaps even for SoC).
>>>> >
>>>> > Just for the record: since I use OpenWrt for development, I cannot simply
>>>> > run rt2x00 git on the board, so I normally use the current
>>>> > compat-wireless plus some rt2x00 patches (if any).
>>>>
>>>> Well, I use the compat-wireless package as well; I really don't want to
>>>> reinstall the kernel for each update. :)
>>>>
>>>> > The funny thing is: I can easily trigger the stuck tx queue problem with
>>>> > 2.6.34 + compat-wireless, whereas it does not seem to happen with
>>>> > 2.6.32 + compat-wireless.
>>>> >
>>>> > I've just double-checked, and on 2.6.32 the TX_STA_FIFO also sometimes
>>>> > contains 16 entries. So maybe my conclusion regarding the TX_STA_FIFO
>>>> > overflow is incorrect... not sure, though. I can also see some non-freed
>>>> > entries in the tx queue on 2.6.32, but I cannot get it completely stuck.
>>>> >
>>>> > /me is confused now.
>>>>
>>>> Well, not me. :P I had doubts about the TX_STA_FIFO explanation anyway,
>>>> since it would mean that rt2800pci and rt2800usb suffer from the same queue
>>>> lockup but with apparently different causes. Now that the rt2800pci queue
>>>> lockup seems not to be caused by a TX_STA_FIFO overflow, it might be the
>>>> same bug as in rt2800usb. Which is actually nice, since once this issue is
>>>> fixed, it is fixed for all hardware. :)
>>>>
>>>> Perhaps we should check the TX(WI) descriptor; I understood from Ralink that
>>>> the HW queue handler might lock up when it finds an unexpected value. At
>>>> least in rt2800usb, the values of TXINFO_W0_USB_DMA_NEXT_VALID and
>>>> TXINFO_W0_SW_USE_LAST_ROUND could cause problems, although rt2800usb
>>>> does send the correct values (always 0). But maybe some TXWI field is
>>>> wrong...
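One way to act on the descriptor suggestion is a debug check that flags any TXINFO word 0 with either of the two fields set, since both are expected to be 0. A minimal sketch in C, assuming the bit masks below (written from memory of rt2800usb.h and to be verified there); the helper name is hypothetical, and in a driver the message would more likely be a WARN_ON_ONCE() than an fprintf().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit masks for TXINFO word 0; verify against rt2800usb.h. */
#define TXINFO_W0_SW_USE_LAST_ROUND_MASK  0x08000000u
#define TXINFO_W0_USB_DMA_NEXT_VALID_MASK 0x40000000u

/* Hypothetical debug check, run just before a frame is handed to the
 * hardware: both fields are expected to be 0, so report any descriptor
 * word that sets either of them. Returns true if the word looks wrong. */
static bool txinfo_w0_suspicious(uint32_t word0)
{
	const uint32_t must_be_zero = TXINFO_W0_SW_USE_LAST_ROUND_MASK |
				      TXINFO_W0_USB_DMA_NEXT_VALID_MASK;

	if (word0 & must_be_zero) {
		fprintf(stderr, "suspicious TXINFO w0: 0x%08x\n",
			(unsigned int)word0);
		return true;
	}
	return false;
}
```

The same pattern could be extended to individual TXWI fields once it is known which values the hardware treats as unexpected.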
>>>
>>> Ok, I found some time to investigate a little further. I used my patches
>>> to read the TX_STA_FIFO from hard irq context and process it in the
>>> interrupt thread; this should reduce the average number of tx status
>>> entries read from the register per pass. Second, I now use all DMA_DONE
>>> and TX_DONE interrupts for reading the TX_STA_FIFO.
>>>
>>> The most notable difference is that with these changes a single pass over
>>> TX_STA_FIFO never comes close to 16 status reads, which means no overflow
>>> should happen at all. However, even then the tx status of some frames seems
>>> to get lost :(. In short, the number of tx status entries read from
>>> TX_STA_FIFO is smaller than the number of transmitted frames, which again
>>> leaves some frames remaining in the tx queue.
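The split described above (empty TX_STA_FIFO in hard irq context, process the entries later in the interrupt thread) can be sketched as a small software FIFO. This is a userspace approximation under stated assumptions: bit 0 of a TX_STA_FIFO word is taken to be the valid flag, the register accessor is passed in as a function pointer, and all names are hypothetical. It deliberately omits the locking or memory barriers a real driver would need (the kernel's kfifo could serve this role).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 of a TX_STA_FIFO word flags a valid entry. */
#define STA_FIFO_VALID 0x00000001u
/* Power of two, so the free-running indices below stay correct on wrap. */
#define SW_FIFO_SIZE 64u

/* Hypothetical software FIFO: filled from the hard irq handler and
 * drained by the interrupt thread. No locking or barriers here. */
struct sw_fifo {
	uint32_t buf[SW_FIFO_SIZE];
	unsigned int head; /* free-running write index */
	unsigned int tail; /* free-running read index */
};

/* Hard irq part: copy valid words out of the 16-entry hardware FIFO
 * so it cannot fill up. read_reg stands in for the real register
 * accessor. Stops when the hardware FIFO reports no valid entry or the
 * software FIFO is full (checked before reading, so no word is lost).
 * Returns the number of words copied. */
static size_t hardirq_drain(struct sw_fifo *f,
			    uint32_t (*read_reg)(void *ctx), void *ctx)
{
	size_t n = 0;

	while (f->head - f->tail < SW_FIFO_SIZE) {
		uint32_t word = read_reg(ctx);

		if (!(word & STA_FIFO_VALID))
			break; /* hardware FIFO is empty */
		f->buf[f->head++ % SW_FIFO_SIZE] = word;
		n++;
	}
	return n;
}

/* Thread part: pop one saved status word; returns false when empty. */
static bool thread_pop(struct sw_fifo *f, uint32_t *word)
{
	if (f->head == f->tail)
		return false;
	*word = f->buf[f->tail++ % SW_FIFO_SIZE];
	return true;
}

/* Stand-in register for testing: returns queued words, then 0. */
struct fake_reg {
	const uint32_t *words;
	size_t len;
	size_t pos;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	return r->pos < r->len ? r->words[r->pos++] : 0;
}
```

Draining early keeps the hardware FIFO short, which matches the observation that it no longer approaches 16 entries; as the thread notes, it does not by itself explain statuses that never arrive.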
>>>
>>> So, I'm trying to find the answers to these questions now:
>>>
>>> - Can a TX_STA_FIFO overflow happen at all?
>>
>> I think not.
>>
>>> - Does every frame get a tx status? Or in which cases might a tx status get
>>>  lost and why? Is that easily detectable?
>>
>> Well, I think I mentioned this a couple of times before, but this issue is
>> the same as in rt61pci, where frames are being lost as well. If we can map
>> a TX_STA_FIFO entry correctly to a particular entry in the queue, then we
>> can report all missed entries with the UNKNOWN state. There is hardly any
>> alternative to this approach...
>>
>> The reason for the loss of TX status reports is unknown, and Ralink
>> couldn't give any answers on this issue either.
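If each status report did identify its queue entry, the approach above (complete every skipped entry as UNKNOWN) could look roughly like this. A minimal sketch, assuming a hypothetical ring of QUEUE_LEN entries and a report that carries the ring index of its frame, which, as noted later in the thread, rt61pci can provide but rt2800pci cannot. None of these names are rt2x00 API.

```c
#define QUEUE_LEN 8u /* hypothetical ring size */

enum tx_result { TX_PENDING, TX_SUCCESS, TX_FAILURE, TX_UNKNOWN };

struct tx_queue {
	enum tx_result result[QUEUE_LEN];
	unsigned int done;  /* ring index of the oldest entry awaiting status */
	unsigned int count; /* number of entries awaiting status */
};

/* Apply one status report for ring index idx. Older pending entries
 * that never got their own report are completed as TX_UNKNOWN.
 * Returns how many entries were marked TX_UNKNOWN, or -1 if idx is not
 * a pending entry (a stale or out-of-order report). */
static int txq_report(struct tx_queue *q, unsigned int idx,
		      enum tx_result res)
{
	unsigned int dist;
	int skipped = 0;

	if (idx >= QUEUE_LEN)
		return -1;
	/* Distance from the oldest pending entry, modulo the ring size. */
	dist = (idx + QUEUE_LEN - q->done) % QUEUE_LEN;
	if (dist >= q->count)
		return -1;

	while (q->done != idx) {
		q->result[q->done] = TX_UNKNOWN;
		q->done = (q->done + 1) % QUEUE_LEN;
		q->count--;
		skipped++;
	}
	q->result[idx] = res;
	q->done = (q->done + 1) % QUEUE_LEN;
	q->count--;
	return skipped;
}
```

Rejecting reports that fall outside the pending window keeps a single bad report from completing the whole queue.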
>>
>
> Just an "out-of-the-box thinking" idea. Would it be possible that the
> hardware sometimes reports the TX statuses in a different order than we
> uploaded the frames to the hardware?
>
> I can easily see how our driver gets lost if that happens.
>
> (Just an idea; I have no idea if this actually happens).

This behavior hasn't been seen in rt61pci. The status reports were
always sequential, but sometimes with a gap in them.
For rt2800pci I am not sure, since I haven't looked at the TX_STA_FIFO reports;
however, with rt61pci we could pass the index of the queue, which is
not possible with rt2800pci.
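To tell the two cases apart (reports arriving out of order, versus in order but with gaps as described above for rt61pci), a logged sequence of reported indices could be classified offline. A minimal sketch, assuming each report carries a ring index below qlen (possible with rt61pci, not with rt2800pci) and using an arbitrary half-ring threshold to separate a forward gap from a backward jump; all names are hypothetical.

```c
#include <stddef.h>

struct report_stats {
	size_t in_order;  /* report directly followed its predecessor */
	size_t gaps;      /* report skipped ahead past some entries */
	size_t missing;   /* total entries skipped over by gaps */
	size_t reordered; /* report went backwards or repeated */
};

/* Hypothetical diagnostic: walk ring indices taken from status reports
 * in arrival order and classify each step. Assumes every index is
 * below qlen. A forward jump of up to half the ring counts as a gap;
 * anything else counts as reordering (an arbitrary heuristic). */
static struct report_stats classify_reports(const unsigned int *idx,
					    size_t n, unsigned int qlen)
{
	struct report_stats s = {0, 0, 0, 0};

	for (size_t i = 1; i < n; i++) {
		unsigned int dist = (idx[i] + qlen - idx[i - 1]) % qlen;

		if (dist == 1) {
			s.in_order++;
		} else if (dist != 0 && dist <= qlen / 2) {
			s.gaps++;
			s.missing += dist - 1;
		} else {
			s.reordered++;
		}
	}
	return s;
}
```

For a log of 0, 1, 2, 4, 5, 3 on an 8-entry ring this counts three in-order steps, one gap (one missing entry) and one reordering.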

Ivo


