PHY_CSR3 register busy

Live forum: http://rt2x00.serialmonkey.com/viewtopic.php?t=4192

mcornils

20-08-2007 20:36:27

Hmm, while scanning/associating works fine, I just ssh'ed into the box with the rt73usb stick, worked for a while editing with vim, when suddenly the connection hung.

On the console, I get the following messages (without the [x]):
[1] phy0 -> rt73usb_init_bbp: Error - BBP register access failed, aborting.
[2] phy0 -> rt73usb_enable_radio: Error - Register initalization failed.
[3] phy0 -> rt73usb_bbp_read: Error - PHY_CSR3 register busy. Read failed.
[4] phy0 -> rt2x00usb_vendor_request: Error - Vendor request 0x06 failed for offset 0x308c with error -110.
[5] phy0 -> rt2x00usb_vendor_request: Error - Vendor request 0x07 failed for offset 0x308c with error -110.

The [4,5] lines occur quite often, seemingly with a 50% chance of [4] versus [5]. About every 20 [4,5] lines, there's the [3] message, about every 60 [4,5] lines there's the [1] line followed by the [2] line.

Any ideas? I'd be glad to provide further info and/or debugging.

Yours,
-Malte Cornils

mcornils

20-08-2007 20:44:36

By the way: my changes to my previous setup (in addition to the reverted patch IvD pointed out in this thread: http://rt2x00.serialmonkey.com/phpBB2/v ... php?t=4188) are limited to disabling the debug messages in the mac80211/rt2x00 drivers.

IvD

20-08-2007 21:27:35

You are using the latest version, right? And did you manually revert the patch, or use git to jump back to before that patch was applied?

E.g. your version contains the following patch: http://git.kernel.org/?p=linux/kernel/g ... 0884c085d2

mcornils

20-08-2007 21:45:02

Yes, I "unapplied" the IV encryption flag patch manually while staying on current git. So (and I just rechecked) my tree does contain the mentioned patch "rt2x00: Clear MAC and BSSID when non-monitor interface goes down".

Or should I revert to an older git version?

-Malte Cornils

IvD

20-08-2007 22:11:28

(I assume you mean the patch "rt2x00: Check return value of usb_control_msg()")

Anyway, no, unapplying that patch manually was what I intended. :)

Could you edit rt2x00usb.c and change the function rt2x00usb_vendor_request()

From:
[code]
if (status == -ETIMEDOUT)
	time *= 2;
[/code]

To:
[code]
if (status == -ETIMEDOUT)
	time *= 3;
[/code]

And see if that produces better results?
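
For readers following along: the change above only affects how fast the per-attempt timeout grows after each -ETIMEDOUT. The following is a minimal user-space sketch of that retry pattern, not the actual driver code. The names request_with_backoff, transfer_fn and fake_transfer, the retry count, and the 35 ms starting value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one USB control transfer attempt.
 * Returns 0 on success or a negative errno value. */
typedef int (*transfer_fn)(unsigned int timeout_ms, void *ctx);

/*
 * Retry a transfer up to max_tries times. After every timeout the
 * per-attempt timeout is multiplied by `factor` (2 or 3 in the change
 * discussed above). Returns 0 on success, otherwise the last negative
 * errno that was seen.
 */
int request_with_backoff(transfer_fn xfer, void *ctx,
                         unsigned int timeout_ms,
                         unsigned int factor,
                         unsigned int max_tries)
{
    int status = -EINVAL;
    unsigned int i;

    assert(xfer != NULL && max_tries > 0);

    for (i = 0; i < max_tries; i++) {
        status = xfer(timeout_ms, ctx);
        if (status >= 0)
            return 0;
        if (status == -ETIMEDOUT)
            timeout_ms *= factor;   /* give the device more time */
    }
    return status;
}

/* Fake device for experiments (an assumption, not real hardware):
 * it succeeds only once the timeout reaches the value in *ctx. */
int fake_transfer(unsigned int timeout_ms, void *ctx)
{
    unsigned int needed_ms = *(unsigned int *)ctx;

    return timeout_ms >= needed_ms ? 0 : -ETIMEDOUT;
}
```

With a fake device that needs 300 ms and a 35 ms starting timeout, a factor of 3 succeeds on the third attempt (35, 105, 315), while a factor of 2 needs five attempts (35, 70, 140, 280, 560). A larger factor only helps if the device does eventually answer.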

chrisV

20-08-2007 22:33:28

[quote]
Hmm, while scanning/associating works fine, I just ssh'ed into the box with the rt73usb stick, worked for a while editing with vim, when suddenly the connection hung.

On the console, I get the following messages (without the [x]):
[1] phy0 -> rt73usb_init_bbp: Error - BBP register access failed, aborting.
[2] phy0 -> rt73usb_enable_radio: Error - Register initalization failed.
[3] phy0 -> rt73usb_bbp_read: Error - PHY_CSR3 register busy. Read failed.
[4] phy0 -> rt2x00usb_vendor_request: Error - Vendor request 0x06 failed for offset 0x308c with error -110.
[5] phy0 -> rt2x00usb_vendor_request: Error - Vendor request 0x07 failed for offset 0x308c with error -110.
[/quote]

Line [4] is the same as the error I reported in my first message under the message thread title "rt2500usb DHCP/communication Issues" (Mon Aug 13, 2007 7:01 pm), but I got a slightly different version of line [5]. It also occurred in similar circumstances, that is, where I mounted an NFS filesystem on the laptop which has the USB stick and started copying files out of it. Going in the reverse direction works OK.

I cannot test any fixes at the moment because the latest version of the driver has different issues which prevent me using it effectively.

Chris

chrisV

20-08-2007 23:03:15


[quote]
I cannot test any fixes at the moment because the latest version of the driver has different issues which prevent me using it effectively.
[/quote]

I went back to 13th August CVS, which works reasonably well for me, and applied the change to rt2x00usb_vendor_request() in that version and it didn't seem to make much difference (although the offset number of the error report changed slightly, probably entirely coincidentally). Here is typical output with this change, when I try to mount an NFS filesystem on the laptop which has the stick from another computer:

laptop kernel: phy0 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x30c0 with error -110.
laptop kernel: phy0 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x30c4 with error -110.
laptop kernel: phy0 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x308c with error -110.
laptop last message repeated 4 times
laptop kernel: phy0 -> rt73usb_bbp_read: Error - PHY_CSR3 register busy. Read failed.

Chris

mcornils

20-08-2007 23:27:09

Hello ChrisV, IvD,

I just tried the latest suggestion by IvD, increasing the timeout value. The connection still started failing.

There are some further hints, though:
1) I forgot to mention that there are also some lines mentioning writes to PHY_CSR4 which fail.
2) It's the second time in a row that the connection dropped after starting vim over the ssh connection and pressing PgDown. It's not necessarily the first PgDown: I can scroll with the arrow keys and press PgDown/PgUp a few times without trouble, but then all of a sudden it hangs immediately after PgDown is pressed. This is just weird, since it's encrypted traffic and pressing a certain key should not generate any specific pattern (I hope).

[b]Update:[/b] It's also not related to sudden traffic bursts in a single direction - PgUp works fine, and scp'ing of files works in both directions...

Also attached are the dmesg messages before the problems occur (since the "duplicate address detected" warning seems suspicious):
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 1x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 1x mode
[drm] Setting GART location based on new memory map
[drm] Loading R200 Microcode
[drm] writeback test succeeded in 1 usecs
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: overridden by ACPI.
hci_usb_isoc_rx_submit: hci0 isoc rx submit failed urb d7191414 err -28
hci_usb_isoc_rx_submit: hci0 isoc rx submit failed urb d7191414 err -28
Bluetooth: L2CAP ver 2.8
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:15:0c:f7:ec:f1
wlan0: RX authentication from 00:15:0c:f7:ec:f1 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:15:0c:f7:ec:f1
wlan0: RX AssocResp from 00:15:0c:f7:ec:f1 (capab=0x411 status=0 aid=1)
wlan0: associated
wlan0: CTS protection enabled (BSSID=00:15:0c:f7:ec:f1)
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
wlan0: duplicate address detected!

chrisV

21-08-2007 09:44:02


[quote]
[b]Update:[/b] It's also not related to sudden traffic bursts in a single direction - PgUp works fine, and scp'ing of files works in both directions...
[/quote]

Well, for me it is definitely related to sudden traffic bursts in a single direction in certain circumstances, namely cases where a server is running on the laptop which has the stick (tested with nfsd and sshd).

With [b]NFS[/b], if I mount an NFS filesystem on the main computer on the network from the laptop, I can do whatever I want with the filesystem - I can copy files onto it or from it with the laptop with no ill effects.

If I mount an NFS filesystem on my laptop from my main computer on the network, I can copy files from the main computer onto the laptop with no ill effects. However, if I stimulate significant traffic from (rather than to) the laptop by copying a file from the laptop onto my main computer, it locks up with vendor request errors.

Similarly, I can do what I want with [b]SSH[/b] when logged into the main computer from the laptop (I do this very regularly). However, if I am logged into the laptop from the main computer and stimulate significant traffic from the laptop, say by viewing a large text file with 'less', then I get a lock-up with the vendor request errors. This is the equivalent of what you are doing with vi.

The pattern looks clear. But then when I reproduce your test with scp, copying a file from the laptop to the main computer by using scp on the main computer (thus logging into the laptop's sshd server), it does not lock up. Bizarre.

Chris

mcornils

22-08-2007 15:04:54

A further detail: vim has two key bindings for "scrolling down", PgDown and Ctrl-F.

Both lead to the timeout problems. So, it really seems to be a specific traffic pattern that causes it.

As mentioned before, scrolling in the other direction works fine.

Any ideas what I could do to narrow this down?

AdamBaker

23-08-2007 20:13:03

git was updated yesterday, plugged into a USB2 port. I haven't changed the timeout to *3.

Activity was a simple HTTP GET:

[code]
[ 9201.736580] wmaster0: STA 00:14:bf:23:80:62 Average rate: 540 (3240/6)
[ 9261.720554] wmaster0: STA 00:14:bf:23:80:62 Average rate: 540 (8100/15)
[ 9321.750623] wmaster0: STA 00:14:bf:23:80:62 Average rate: 540 (1080/2)
[ 9362.872854] phy1 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x30c0 with error -110.
[ 9363.916785] phy1 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x308c with error -110.
[ 9364.548438] phy1 -> rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x308c with error -110.
[ 9365.180807] phy1 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x308c with error -110.
[ 9383.483650] wmaster0: STA 00:14:bf:23:80:62 Average rate: 540 (17280/32)
[ 9443.499592] wmaster0: STA 00:14:bf:23:80:62 Average rate: 540 (3780/7)
[/code]

The low numbers of packets being seen by rate control seem to be consistent with the very low throughput I'm seeing.

I do notice that /proc/sys/debug/ieee80211/phy1/statistics/multicast_received_frame_count is surprisingly high and grows much faster than the rx packet count reported by ifconfig.

mcornils

22-09-2007 20:28:43

By the way, after checking out latest git (well, 20 hours ago) including the "Increase register timeout" patch, I still have the same problem. It's still reproducible.

Any ideas?

IvD

22-09-2007 21:45:43

Could you try editing the file rt2x00usb.h and changing the define REGISTER_TIMEOUT from 35 to 500?

mcornils

23-09-2007 00:02:17

Unfortunately, the problem still occurs :-(

-Malte

IvD

23-09-2007 10:10:23

Does the legacy driver actually work for you?

mcornils

23-09-2007 20:21:02

Hmm, I have to admit I did not try those since they seem to require a non-standard configuration method for WPA connections.

I hope I have some time to try it anyway soon, but if not, it could take up to three weeks. Sorry!

-Malte

chrisV

23-09-2007 22:32:49

[quote]Does the legacy driver actually work for you?[/quote]
With rt2x00 CVS as existing immediately prior to your commit of 22nd September, and with a stock 2.6.22.6 kernel, together with patches of rt2x00mac_tx_rts_cts() in rt2x00mac.c, IEEE80211_TXCTL_LONG_RETRY_LIMIT in rt2x00_compat.h and IEEE80211_HW_WEP_INCLUDE_IV in rt73usb.c, and changing REGISTER_TIMEOUT to 500 in rt2x00usb.h, I still get the same error with my rt73 stick in the circumstances described earlier in this thread. Apart from that the rt2x00 works fine (no problems with association and, before it hangs on vendor_request_errors, reasonable throughput). This is with it plugged into a USB2 port.

The rt73 stick works correctly with the legacy driver.

chrisV

25-09-2007 00:09:50

I have also now tried with your (IvD's) git kernel tree, as last committed by you just over an hour ago, and I get the same problem, although it takes longer to establish itself. When it does, the reported error is now writing to the PHY_CSR4 register, rather than reading from the PHY_CSR3 register, and the offset is different, and the error number varies (mainly -110, but also -71 and -19). Here is a typical log extract:

Sep 25 00:54:26 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x3090 with error -110.
Sep 25 00:54:27 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x3090 with error -110.
Sep 25 00:54:27 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x3090 with error -71.
Sep 25 00:54:27 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x3090 with error -19.
Sep 25 00:54:27 laptop last message repeated 3 times
Sep 25 00:54:27 laptop kernel: phy4 -> rt73usb_rf_write: Error - PHY_CSR4 register busy. Write failed.
Sep 25 00:54:27 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x3090 with error -19.
Sep 25 00:54:27 laptop last message repeated 4 times
Sep 25 00:54:27 laptop kernel: phy4 -> rt73usb_rf_write: Error - PHY_CSR4 register busy. Write failed.
Sep 25 00:54:27 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x3090 with error -19.
Sep 25 00:54:27 laptop last message repeated 4 times
Sep 25 00:54:27 laptop kernel: phy4 -> rt73usb_rf_write: Error - PHY_CSR4 register busy. Write failed.
Sep 25 00:54:27 laptop kernel: phy4 -> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x3090 with error -19.
Sep 25 00:54:27 laptop last message repeated 4 times
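
The error numbers in the log above are negative Linux errno values. As a reading aid, here is a small lookup sketch for the three codes that appear in this thread; the numeric values are the generic Linux ones, and the names err_entry and lookup_status are made up for this example. The sequence -110, then -71, then -19 would be consistent with the device timing out and then dropping off the bus, though that reading is an assumption and not something the log proves.

```c
#include <assert.h>
#include <stddef.h>

/* One row per errno value seen in the log (generic Linux numbering). */
struct err_entry {
    int code;
    const char *name;
    const char *meaning;
};

static const struct err_entry err_table[] = {
    {  19, "ENODEV",    "No such device" },
    {  71, "EPROTO",    "Protocol error" },
    { 110, "ETIMEDOUT", "Connection timed out" },
};

/*
 * Map a negative status, as printed in the driver messages, to its
 * table entry. Returns NULL for codes that are not in the table.
 */
const struct err_entry *lookup_status(int status)
{
    size_t i;
    int code = -status;

    assert(status < 0);   /* the log always shows negative values */

    for (i = 0; i < sizeof(err_table) / sizeof(err_table[0]); i++) {
        if (err_table[i].code == code)
            return &err_table[i];
    }
    return NULL;
}
```

Calling lookup_status(-110) yields the ETIMEDOUT entry, which matches the -ETIMEDOUT check discussed earlier in the thread.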

However, if I don't provoke these errors by logging into my laptop with ssh, in general use I now get quite an unstable connection with disassociation after a minute or so of use, so there is also something else amiss with the latest git version.

Chris

chrisV

25-09-2007 11:20:16


[quote]
However, if I don't provoke these errors by logging into my laptop with ssh, in general use I now get quite an unstable connection with disassociation after a minute or so of use, so there is also something else amiss with the latest git version.
[/quote]

On examining this further, I have had your kernel running all morning on my laptop and the disassociation is not very problematic after all. The association is dropped at 30-minute intervals and then it reassociates itself again without any dramas - probably this is something done by the AP rather than rt2x00. Apart from the vendor_request_error lock-ups when I try to ssh into my laptop, it is working quite well.

I am going to go back to a stable kernel + legacy driver again now but if there is any further information you need about the vendor_request_error lock-ups, let me know.

Chris

chrisV

09-10-2007 22:53:28

I have tested today's git kernel and I no longer get these errors. I have to apply Adam Baker's patch to get 802.11g speeds, but apart from that it seems to work well with my rt73 stick.

Chris

mcornils

18-10-2007 18:53:32

Hello,

I have also tested latest git to see whether the issue still crops up - it does not (at least for now). Also, the driver feels much snappier. I'll be sure to let you know of further problems, but for now, I'd recommend marking this as no longer reproducible/probably fixed in your issues list. Woohoo!