[rt2x00-users] rt2x00: status update regarding txdone handling
Gertjan van Wingerde
gwingerde at gmail.com
Wed Aug 4 08:37:20 UTC 2010
On Wed, Aug 4, 2010 at 10:29 AM, Ivo Van Doorn <ivdoorn at gmail.com> wrote:
> On Wed, Aug 4, 2010 at 10:25 AM, Helmut Schaa
> <helmut.schaa at googlemail.com> wrote:
>> On Monday 02 August 2010, Ivo Van Doorn wrote:
>>> On Mon, Aug 2, 2010 at 10:38 AM, Helmut Schaa
>>> <helmut.schaa at googlemail.com> wrote:
>>> > On Thursday 29 July 2010, Ivo Van Doorn wrote:
>>> >> On Thu, Jul 29, 2010 at 6:30 PM, Helmut Schaa
>>> >> <helmut.schaa at googlemail.com> wrote:
>>> >> > On Thursday 29 July 2010, Ivo Van Doorn wrote:
>>> >> >> Hi,
>>> >> >>
>>> >> >> > I'm still fighting with the txdone handling in rt2800pci. Sometimes the tx
>>> >> >> > queues get stuck. It took me a few days but it seems as if the TX_STA_FIFO
>>> >> >> > register limitation to hold max 16 entries is causing this problem.
>>> >> >>
>>> >> >> Weird, for rt2800usb this doesn't seem to be the problem. The queue locks
>>> >> >> up even when I continuously read the TX_STA_FIFO register.
>>> >> >
>>> >> > Maybe I should add that this behaviour only happens when running iperf
>>> >> > from the SoC board to an associated client, and the host CPU utilization
>>> >> > is near 100%.
>>> >>
>>> >> Ah ok, well I haven't managed to get iperf working on my system yet,
>>> >> so I can't test that. But since the queue locks up regardless of CPU and
>>> >> TX_STA_FIFO status, it might be a different problem (perhaps even for SoC).
>>> >
>>> > Just for the record, since I use OpenWrt for development I cannot simply run
>>> > rt2x00 git on the board, hence I normally use the current compat-wireless +
>>> > some rt2x00 patches (if any).
>>>
>>> Well I use the compat-wireless package as well, I really don't want to reinstall
>>> the kernel for each update. :)
>>>
>>> > The funny thing is: I can easily trigger the stuck tx queue problem with
>>> > 2.6.34 + compat-wireless whereas it seems to not happen with 2.6.32 +
>>> > compat-wireless.
>>> >
>>> > I've just double checked, but on 2.6.32 the TX_STA_FIFO also sometimes contains
>>> > 16 entries. So maybe my conclusion regarding the TX_STA_FIFO overflow is
>>> > incorrect ... Not sure though. I can also see some non-freed entries in the
>>> > tx queue when using 2.6.32 but I cannot make it stuck completely.
>>> >
>>> > /me is confused now.
>>>
>>> Well not me. :P I doubted the TX_STA_FIFO explanation anyway, since it
>>> would mean that rt2800pci and rt2800usb suffer from the same queue lockup,
>>> but with apparently different causes. Now that it seems that the rt2800pci
>>> queue lockup isn't caused by TX_STA_FIFO overflow, it might be the same bug
>>> as rt2800usb. Which actually is nice, since if this issue is fixed, it is
>>> fixed for all hardware. :)
>>>
>>> Perhaps we should check the TX(WI) descriptor. I understood from Ralink that
>>> the HW queue handler might lock up when it finds an unexpected value. At least in
>>> rt2800usb the values for TXINFO_W0_USB_DMA_NEXT_VALID and
>>> TXINFO_W0_SW_USE_LAST_ROUND could cause problems, although rt2800usb
>>> does send the correct values (always 0). But maybe there is a wrong
>>> TXWI field....
>>
>> Ok, I found some time to investigate a little bit further. First, I used my
>> patches to read the TX_STA_FIFO from hard irq context and process it in the
>> interrupt thread, which should reduce the average number of tx statuses read
>> from the register. Second, I now used all DMA_DONE and the TX_DONE interrupts
>> for reading the TX_STA_FIFO.
>>
>> The most notable difference is that with these changes a read of TX_STA_FIFO
>> never comes close to 16 status entries, which means an overflow shouldn't
>> happen at all. However, even in that case it seems as if the tx status of
>> some frames gets lost :(. In short, the number of tx statuses read from
>> TX_STA_FIFO is smaller than the number of tx'ed frames, which again leaves
>> some frames remaining in the tx queue.
>>
>> So, I'm trying to find the answers to these questions now:
>>
>> - Can a TX_STA_FIFO overflow happen at all?
>
> I think not.
>
>> - Does every frame get a tx status? Or in which cases might a tx status get
>> lost and why? Is that easily detectable?
>
> Well, I think I mentioned this a couple of times before, but this issue is the
> same as in rt61pci: frames are being lost there as well. If we can map a
> TX_STA_FIFO entry correctly to a particular entry in the queue, then we can
> report all missed entries with the UNKNOWN state. There is hardly any
> alternative to this approach...
>
> The reason for the loss of TX status reports is unknown, and Ralink couldn't
> give any answers on this issue either.
>
Just an "out-of-the-box thinking" idea: would it be possible that the hardware
sometimes reports the TX statuses in a different order than we uploaded the
frames to the hardware? I can easily see how our driver gets lost if that
happens. (Just an idea; I have no idea if this actually happens.)
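
To make the two ideas in this thread concrete (mapping a status report to a
queue entry, and what reordering would do to that mapping), here is a minimal
sketch in plain C. It is not rt2x00 code: the struct and function names, the
ring size, and the per-frame `id` tag are all hypothetical, and it assumes each
status report echoes back a tag that is unique among the frames still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 8 /* hypothetical ring size, chosen for illustration */

enum tx_state { TX_FREE, TX_PENDING, TX_SUCCESS, TX_FAILED, TX_UNKNOWN };

struct tx_entry {
	unsigned int id;     /* hypothetical tag the status report echoes back */
	enum tx_state state;
};

struct tx_queue {
	struct tx_entry ring[QUEUE_LEN];
	size_t head;  /* next slot to fill */
	size_t tail;  /* oldest entry still waiting for a status */
	size_t count; /* number of entries waiting for a status */
};

/* Queue one frame; returns false when the ring is full. */
bool tx_queue_push(struct tx_queue *q, unsigned int id)
{
	assert(q != NULL);
	if (q->count == QUEUE_LEN)
		return false;
	q->ring[q->head].id = id;
	q->ring[q->head].state = TX_PENDING;
	q->head = (q->head + 1) % QUEUE_LEN;
	q->count++;
	return true;
}

/*
 * Handle one status report. Older pending entries that were skipped are
 * completed as TX_UNKNOWN. Returns the number of entries completed, or 0
 * when the id matches no pending entry (for example a report that arrives
 * after its entry was already given up on).
 */
size_t tx_queue_status(struct tx_queue *q, unsigned int id, bool success)
{
	size_t i, k, pos = 0;

	assert(q != NULL);

	/* Search from the oldest pending entry towards the newest. */
	for (i = 0; i < q->count; i++) {
		pos = (q->tail + i) % QUEUE_LEN;
		if (q->ring[pos].id == id)
			break;
	}
	if (i == q->count)
		return 0;

	/* Everything older than the match never got a report. */
	for (k = 0; k < i; k++)
		q->ring[(q->tail + k) % QUEUE_LEN].state = TX_UNKNOWN;

	q->ring[pos].state = success ? TX_SUCCESS : TX_FAILED;
	q->tail = (q->tail + i + 1) % QUEUE_LEN;
	q->count -= i + 1;
	return i + 1;
}
```

With frames 1, 2, 3 queued, a report for frame 2 completes frame 1 as UNKNOWN
and frame 2 with its real status. If the report for frame 1 then shows up late,
the function returns 0. In this sketch a run of such unmatched reports would be
one hint that statuses arrive out of order rather than get lost, although it
cannot distinguish the two cases for certain.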
---
Gertjan.