[rt2x00-users] [OpenWrt-Devel] rt61pci performance
Helmut Schaa
helmut.schaa at googlemail.com
Wed Oct 27 08:25:44 UTC 2010
Hi,
[Removed openwrt-devel from CC]
On Friday, 22 October 2010, Ivo Van Doorn wrote:
> Hi,
>
> > On Thursday, 21 October 2010, Scott Nicholas wrote:
> >> On Thu, Oct 21, 2010 at 12:25 PM, Helmut Schaa
> >> <helmut.schaa at googlemail.com> wrote:
> >> > I fully agree. The same also applies to rt2800pci SoCs (rt305x) which are
> >> > working quite well already (in terms of WiFi) but also show more CPU usage
> >> > than the legacy drivers.
> >> >
> >> > IMO it would really make sense to work on improving rt2x00 & mac80211 instead
> >> > of putting much work in a maybe-better-behaving but unmaintainable driver.
> >> >
> >> > For example you could start by profiling rt61pci (if your platform supports
> >> > perf counters) to see if you can find any obvious bottlenecks.
> >>
> >> My platform SoC is MIPS32 4Kc core which does not seem to do
> >> profiling... I wonder if some reports made on different machine would
> >> help steer me in the correct direction, and if anyone has access to
> >> rt61pci that could do this..
In order to optimize rt2x00 we should first concentrate on the hot paths (RX
and TX).
I've just reviewed the TX code, and the TX descriptor handling seems to consume
more CPU than necessary. At least, all TX descriptor fields are always filled
even though a device won't need all of them (for example, rt2800 doesn't care
about some older fields like the PLCP settings).
Furthermore, the tx descriptor gets filled with data like cw_min and aifs, which
are also available to the drivers through the queue struct and are not used
by all drivers.
Third, some bits in the tx descriptor flags are just "copied" from the tx
info provided by mac80211. For example:
    if (!(tx_info->flags & IEEE80211_TX_CTL_NO_ACK))
        __set_bit(ENTRY_TXD_ACK, &txdesc->flags);
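As a rough illustration of that copying (a standalone sketch, not the actual rt2x00 code): the structs, constants and function names below are simplified stand-ins invented for this example, and the bit values are arbitrary. It only shows that the ACK decision read back from the intermediate descriptor is the same one a driver could take directly from the tx info.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
#define SKETCH_TX_CTL_NO_ACK (1u << 2) /* plays the role of IEEE80211_TX_CTL_NO_ACK */
#define SKETCH_ENTRY_TXD_ACK 0         /* bit index, plays the role of ENTRY_TXD_ACK */

struct sketch_tx_info { uint32_t flags; };      /* simplified tx info from mac80211 */
struct sketch_txdesc  { unsigned long flags; }; /* simplified intermediate descriptor */

/* Current pattern, step 1: translate the mac80211 flag into a descriptor flag. */
static void sketch_fill_txdesc(struct sketch_txdesc *txdesc,
                               const struct sketch_tx_info *tx_info)
{
    txdesc->flags = 0;
    if (!(tx_info->flags & SKETCH_TX_CTL_NO_ACK))
        txdesc->flags |= 1ul << SKETCH_ENTRY_TXD_ACK;
}

/* Current pattern, step 2: the driver later tests the copied bit. */
static bool sketch_ack_from_txdesc(const struct sketch_txdesc *txdesc)
{
    return (txdesc->flags >> SKETCH_ENTRY_TXD_ACK) & 1ul;
}

/* Possible alternative: the driver asks the tx info directly, with no copy. */
static bool sketch_ack_from_tx_info(const struct sketch_tx_info *tx_info)
{
    return !(tx_info->flags & SKETCH_TX_CTL_NO_ACK);
}
```

Both paths give the same answer for any tx info flags; the direct one simply skips the extra store and load per frame.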
The reason for generating the tx descriptor first is that different hw drivers
share parts of the tx descriptor handling and it doesn't make sense for example
to calculate the plcp settings in each driver on its own.
Hence, I'd suggest the following (at least in the long run):
1) Remove information from the tx descriptor that can be easily retrieved from
some other struct (like the queue parameters) -> easy.
2) Remove the tx descriptor but provide functions in rt2x00queue that generate
the fields in a central place, so that each driver calls only the ones it
needs.
Btw., is the tx info provided by mac80211 still available when calling the
driver's write_tx_desc function?
Then we could easily do:
3) Remove values and flags from the tx descriptor that are already present in
tx info.
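To make suggestion 1) a bit more concrete, here is a minimal standalone sketch of a driver reading the queue parameters straight from the queue struct when it builds a hardware descriptor word. All names are hypothetical, the structs are heavily simplified, and the 4-bit field layout is invented for the example; it does not match any real chipset.

```c
#include <stdint.h>

/* Simplified, hypothetical queue struct holding the per-queue parameters. */
struct sketch_queue {
    unsigned short aifs;
    unsigned short cw_min;
    unsigned short cw_max;
};

/* Simplified, hypothetical queue entry that points back at its queue. */
struct sketch_entry {
    const struct sketch_queue *queue;
};

/*
 * Build one descriptor word from the queue parameters.
 * Invented layout: aifs in bits 0-3, cw_min in bits 4-7, cw_max in bits 8-11.
 * Nothing is copied into an intermediate tx descriptor first.
 */
static uint32_t sketch_write_queue_word(const struct sketch_entry *entry)
{
    const struct sketch_queue *q = entry->queue;
    uint32_t word = 0;

    word |= (uint32_t)(q->aifs & 0xf);
    word |= (uint32_t)(q->cw_min & 0xf) << 4;
    word |= (uint32_t)(q->cw_max & 0xf) << 8;
    return word;
}
```

A driver that does not need these values would simply never touch them, so nothing is computed or stored on its behalf.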
I don't expect huge performance gains from these but at least on embedded
devices every CPU cycle is worth it ;)
Helmut