[rt2x00-users] rt2x00 workqueue problems

Ivo Van Doorn ivdoorn at gmail.com
Tue Dec 14 03:03:40 EST 2010


Hi,

>>> With the recent rt2x00 code (with the queue refactoring) a user has reported
>>> some problems which I at first assumed to be unrelated. I seem to have found
>>> the cause, and I want to discuss the possible solutions.
>>>
>>> But first the bugs:
>>>  - Association time with unencrypted or WPA encrypted networks can
>>>    take up to 2 seconds.
>>>  - Latency problems in xterm where keystrokes are recognized with a delay,
>>>    or sometimes ignored altogether.
>>>  - Errors in the log about failed TX queue flush operations
>>>
>>> These errors all disappeared by simply deactivating the
>>> flush() callback function from mac80211.
>>>
>>> The reason why this removal works is simple: because flush() stalls for a long
>>> time, we get long association times (wpa_supplicant always scans
>>> before associating).
>>> Blocking the kernel threads will also cause the xterm latency problems.
>>>
>>> But why is flush() stalling? As I understand it now, flush() is being called
>>> from the mac80211 workqueue. But flush() depends on the mac80211 workqueue
>>> being available for processing the TX and RX status. This was a similar issue
>>> with the watchdog, which for that specific reason I moved to the kernel
>>> workqueue. However, if my assumption is correct then the proper assignment
>>> must be:
>>>
>>> watchdog -> mac80211 workqueue
>>> flush -> mac80211 workqueue
>>> tx/rx status handling -> kernel workqueue
>>>
>>> Does anybody have an alternative idea about the cause of this problem,
>>> or are there any objections against this reassignment of the workqueues?
>>>
>>
>> Your analysis seems to be correct. There are some instances of flush() being
>> called from the mac80211 workqueue (at least the flush() calls that are
>> done when starting a scan).
>>
>> So, I guess that moving the flush to a different workqueue than where tx/rx
>> status handling is occurring should help this issue.
>
> Maybe it would even make sense to use a private workqueue for rx/tx to
> reduce latency due to workqueue contention? So, as you suggest:
>
> watchdog -> mac80211 workqueue
> flush -> mac80211 workqueue
> tx/rx -> private workqueue
>
> Or, if we take the option Johannes was playing with (async register read
> for tx status), we could move the rx/tx handling to tasklets. I'm just
> coming up with this because I've got some beaconing refactoring + pci
> driver to tasklet conversion patches in my queue, and the interrupt thread
> to tasklet conversion improves performance on embedded systems
> considerably (>25% improvement in throughput).
>
> Nevertheless, to fix the initial issue I agree that moving the rx/tx
> handling into its own workqueue (maybe using the kernel workqueue as a
> first step) seems like the way to go.


Moving to tasklets sounds like a fine idea as well. However, I've done some
experimenting with moving the TX and RX status handlers to the other workqueue,
and it didn't resolve the problem for me. In fact, I seem to have hit a problem
where the queue wasn't flushed correctly because the last entry was sent,
but the status for this entry never became available in the registers.

Ivo
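
For readers following the thread, the stall suspected in the quoted analysis
(flush() running on the same workqueue that must also process the TX status it
is waiting for) can be sketched in user space. This is only a model, not rt2x00
or mac80211 code: the workqueue is assumed to behave like a single worker
thread, and run_flush(), tx_status_work(), flush_work() and FLUSH_TIMEOUT are
hypothetical names made up for the illustration. It shows the hypothesized
mechanism only; as noted above, moving the status handlers did not resolve the
problem in practice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

FLUSH_TIMEOUT = 0.2  # seconds; arbitrary value chosen for this demo


def run_flush(flush_wq, status_wq):
    """Run a flush on flush_wq while TX status work is queued on status_wq.

    Returns True if the flush saw the TX status before its timeout expired.
    """
    status_done = threading.Event()

    def tx_status_work():
        # Stands in for the deferred TX status handling.
        status_done.set()

    def flush_work():
        # The frame completes while flush is already running, so the
        # status work is queued behind (or beside) the flush itself.
        status_wq.submit(tx_status_work)
        return status_done.wait(FLUSH_TIMEOUT)

    return flush_wq.submit(flush_work).result()


def demo():
    # Case 1: flush and status handling share one single-threaded queue.
    with ThreadPoolExecutor(max_workers=1) as shared_wq:
        same_queue = run_flush(shared_wq, shared_wq)

    # Case 2: status handling runs on a separate queue.
    with ThreadPoolExecutor(max_workers=1) as flush_wq, \
            ThreadPoolExecutor(max_workers=1) as status_wq:
        split_queues = run_flush(flush_wq, status_wq)

    return same_queue, split_queues


if __name__ == "__main__":
    same_queue, split_queues = demo()
    print("flush completed on shared queue:", same_queue)
    print("flush completed with split queues:", split_queues)
```

In the shared case the only worker is busy inside flush_work(), so the status
work cannot run until the flush gives up; with split queues the status work
runs concurrently and the flush returns promptly.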


