Kernel panics on two rt61pci cards (ad-hoc)

Live forum: http://rt2x00.serialmonkey.com/viewtopic.php?t=5247

dxdx

09-04-2009 23:19:09

I have two different rt2561-based cards connected in ad-hoc mode, one in the laptop (edenspring) and one in the server (carl0s). Both are using the rt61pci driver.

[code396yrapy]Linux edenspring 2.6.29-ARCH #1 SMP PREEMPT Sun Mar 29 18:08:33 UTC 2009 i686 AMD Turion(tm) 64 Mobile Technology MT-30 AuthenticAMD GNU/Linux
Linux carl0s 2.6.29-wl-dx #1 Mon Apr 6 02:12:05 ART 2009 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux[/code396yrapy]

I compiled the latter from the rt2x00 branch and followed the steps in this thread to set up an access point. I got hostapd working and everything, but couldn't send anything (DHCP, ping, etc.) from one host to the other. If anyone has a clue about that one, it would help too.

Anyway, they are now in ad-hoc mode, and that seemed to be OK, with and without encryption (just WEP). But yesterday, after checking the packet loss rates everywhere in my house, I returned to my room, and when I walked next to the server the capslock/numlock LEDs started flashing. X was frozen and syslog didn't catch the whole thing, but I hope this helps:

[code396yrapy]
Apr 8 19:32:16 edenspring kernel: ------------[ cut here ]------------
Apr 8 19:32:16 edenspring kernel: WARNING: at net/mac80211/rc80211_minstrel.c:69 minstrel_tx_status+0x8d/0x110 [mac80211]()
Apr 8 19:32:16 edenspring kernel: Hardware name: 2100 Series
Apr 8 19:32:16 edenspring kernel: Modules linked in: radeon drm ipv6 sco bridge stp llc bnep l2cap bluetooth cpufreq_ondemand ext2 arc4 ecb rt61pci crc_itu_t snd_seq_oss rt2x00pci rt2x00lib snd_seq_midi_event snd_seq snd_seq_device rfkill joydev snd_pcm_oss snd_mixer_oss snd_atiixp_modem snd_atiixp snd_ac97_codec sdhci_pci sdhci mac80211 pcmcia psmouse ac97_bus cfg80211 mmc_core snd_pcm snd_timer pcspkr ohci1394 serio_raw snd soundcore snd_page_alloc k8temp eeprom_93cx6 led_class yenta_socket rsrc_nonstatic pcmcia_core ieee1394 8139cp 8139too mii sg i2c_piix4 ohci_hcd ehci_hcd usbcore shpchp pci_hotplug ati_agp agpgart evdev thermal fan button battery ac msi_laptop powernow_k8 freq_table processor rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 sr_mod cdrom sd_mod ata_generic pata_atiixp pata_acpi libata scsi_mod radeonfb fb_ddc i2c_algo_bit i2c_core
Apr 8 19:32:16 edenspring kernel: Pid: 0, comm: swapper Tainted: G A 2.6.29-ARCH #1
Apr 8 19:32:16 edenspring kernel: Call Trace:
Apr 8 19:32:16 edenspring kernel: [<c012f9a7>] warn_slowpath+0x87/0xe0
Apr 8 19:32:16 edenspring kernel: [<c01264f7>] enqueue_task_fair+0x137/0x1a0
Apr 8 19:32:16 edenspring kernel: [<ddadb43d>] minstrel_tx_status+0x8d/0x110 [mac80211]
Apr 8 19:32:16 edenspring kernel: [<ddabeeab>] ieee80211_tx_status+0xfb/0x510 [mac80211]
Apr 8 19:32:16 edenspring kernel: [<ddabf3df>] ieee80211_tasklet_handler+0x11f/0x130 [mac80211]

Apr 8 19:34:09 edenspring kernel: Linux version 2.6.29-ARCH (root@T-POWA-LX) (gcc version 4.3.3 (GCC) ) #1 SMP PREEMPT Sun Mar 29 18:08:33 UTC 2009
Apr 8 19:34:09 edenspring kernel: KERNEL supported cpus:
Apr 8 19:34:09 edenspring kernel: Intel GenuineIntel
...
[/code396yrapy]
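For anyone reading the trace above: each entry has the form `[<address>] symbol+0xOFFSET/0xSIZE [module]`, where the offset is how far into the function the address lies and the size is the function's length. Below is a minimal Python sketch for pulling those fields out of pasted log lines. The helper name `parse_trace_line` and the regular expression are illustrative assumptions, not part of any kernel tool.

```python
import re

# Matches entries such as:
#   [<ddadb43d>] minstrel_tx_status+0x8d/0x110 [mac80211]
#   [<c022531f>] ? worker_thread+0xaf/0xbb
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_trace_line(line):
    """Return a dict describing one call-trace entry, or None if the line has none."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None when no [module] suffix is present
    }


if __name__ == "__main__":
    sample = ("Apr 8 19:32:16 edenspring kernel: "
              "[<ddadb43d>] minstrel_tx_status+0x8d/0x110 [mac80211]")
    print(parse_trace_line(sample))
```

Lines without a bracketed address, such as `Call Trace:` or the `WARNING:` line, simply return `None`.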


Today, the same thing happened on the server. I didn't notice it when it happened, but I was able to take a photo of the screen. It only shows the last 25 lines, but that's better than nothing, and syslog didn't catch anything, of course.
The quality sucks, so I'll just transcribe the function names. If I missed anything important, I'll upload the pic and let you decipher it.
[code396yrapy]
[<hex>] ? dev_queue_xmit +hex/hex
[<hex>] ? ip_finish_output +hex/hex
[<hex>] ? ip_output +hex/hex
[<hex>] ? ip_forward_finish +hex/hex
[<hex>] ? ip_forward +hex/hex
[<hex>] ? ip_rcv_finish +hex/hex
[<hex>] ? ip_rcv +hex/hex
[<hex>] ? netif_receive_skb +hex/hex
[<hex>] ? napi_gro_receive +hex/hex
[<hex>] ? process_backlog +hex/hex
[<hex>] ? net_rx_action +hex/hex
[<hex>] ? __do_softirq +hex/hex
[<hex>] ? do_softirq +hex/hex
[<hex>] ? irq_exit +hex/hex
[<hex>] ? do_IRQ +hex/hex
[<hex>] ? common_interrupt +hex/hex
[<hex>] ? piix4_probe +hex/hex
Code: ...a long list of hex numbers that you may or may not need...
EIP: [<hex>] rt2x00queue_create_tx_descriptor+0x182/0x2ae SS/ESP <hex>
[/code396yrapy]
Then kernel panic, not syncing, fatal exception in interrupt.


Any ideas? Is this a bug, a conflict with another module, or what? (piix4 sounds like a module to me.)


[b396yrapy]edit[/b396yrapy] The same thing happened today: the server froze. This hurts my uptime ;)
[b396yrapy]edit2[/b396yrapy] I'm trying to enable debugfs on the server.

IvD

10-04-2009 18:52:44

The first bug is located in the mac80211 stack. I'll see if there are patches for that issue which have not yet been merged into wireless-testing or rt2x00.git. The bug looks obvious enough for more people to have experienced the problem, so I bet there is a patch somewhere which I can merge. ;)

dxdx

11-04-2009 17:20:18

I've got new ones. This time no panics, just a nice oops.

Everything that follows is on carl0s:
Linux carl0s 2.6.29-wl-dx #1 Mon Apr 6 02:12:05 ART 2009 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux
I still haven't installed the debugfs-enabled kernel.


[code3s9oa9t1][root@carl0s ~]# iwconfig wlan0 essid lol mode ad-hoc
[root@carl0s ~]# carl0s [76143.761770] Oops: 0002 [#1]
carl0s [76143.761832] last sysfs file: /sys/kernel/uevent_seqnum
carl0s [76143.762046] Process phy0 (pid: 520, ti=cf99a000 task=cf8a9650 task.ti=cf99a000)
carl0s [76143.762046] Stack:
carl0s [76143.762046] cf99bfbc cf99bfbc cf952e80 c0225270 00000000 cf99bfe0 c02276d6 c022769d
carl0s [76143.762046] 00000000 00000000 c02032df cf82bda8 00000000 00000000 00000000 00000007
carl0s [76143.762046] Call Trace:
carl0s [76143.762046] [<c022531f>] ? worker_thread+0xaf/0xbb
carl0s [76143.762046] [<c02279e5>] ? autoremove_wake_function+0x0/0x33
carl0s [76143.762046] [<c0225270>] ? worker_thread+0x0/0xbb
carl0s [76143.762046] [<c02276d6>] ? kthread+0x39/0x5e
carl0s [76143.762046] [<c022769d>] ? kthread+0x0/0x5e
carl0s [76143.762046] [<c02032df>] ? kernel_thread_helper+0x7/0x10
carl0s [76143.762046] Code: 50 68 b4 bc 4e c0 68 66 60 5e c0 e8 e1 d2 2b 00 e8 b1 d1 2b 00 83 c4 0c e9 83 00 00 00 8d 59 fc 8b 7b 0c 89 5e 10 8b 11 8b 41 04 <89> 42 04 89 10 89 49 04 89 09 fb 8b 41 fc 83 e0 fc 39 c6 74 04
carl0s [76143.762046] EIP: [<c0224e80>] run_workqueue+0x44/0xcf SS:ESP 0068:cf99bf9c
[/code3s9oa9t1]

dmesg

[code3s9oa9t1][76143.761518] BUG: unable to handle kernel NULL pointer dereference at 00000004
[76143.761660] IP: [<c0224e80>] run_workqueue+0x44/0xcf
[76143.761753] *pde = 00000000
[76143.761770] Oops: 0002 [#1]
[76143.761832] last sysfs file: /sys/kernel/uevent_seqnum
[76143.761902] Modules linked in: pcspkr
[76143.761971]
[76143.762033] Pid: 520, comm: phy0 Not tainted (2.6.29-wl-dx #1) Deskpro EN Series SFF
[76143.762046] EIP: 0060:[<c0224e80>] EFLAGS: 00010083 CPU: 0
[76143.762046] EIP is at run_workqueue+0x44/0xcf
[76143.762046] EAX: 00000000 EBX: cf83f4a8 ECX: cf83f4ac EDX: 00000000
[76143.762046] ESI: cf952e80 EDI: 00000000 EBP: cf99bfa8 ESP: cf99bf9c
[76143.762046] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[76143.762046] Process phy0 (pid: 520, ti=cf99a000 task=cf8a9650 task.ti=cf99a000)
[76143.762046] Stack:
[76143.762046] cf952e80 cf99bfb0 cf952e88 cf99bfd0 c022531f 00000000 cf8a9650 c02279e5
[76143.762046] cf99bfbc cf99bfbc cf952e80 c0225270 00000000 cf99bfe0 c02276d6 c022769d
[76143.762046] 00000000 00000000 c02032df cf82bda8 00000000 00000000 00000000 00000007
[76143.762046] Call Trace:
[76143.762046] [<c022531f>] ? worker_thread+0xaf/0xbb
[76143.762046] [<c02279e5>] ? autoremove_wake_function+0x0/0x33
[76143.762046] [<c0225270>] ? worker_thread+0x0/0xbb
[76143.762046] [<c02276d6>] ? kthread+0x39/0x5e
[76143.762046] [<c022769d>] ? kthread+0x0/0x5e
[76143.762046] [<c02032df>] ? kernel_thread_helper+0x7/0x10
[76143.762046] Code: 50 68 b4 bc 4e c0 68 66 60 5e c0 e8 e1 d2 2b 00 e8 b1 d1 2b 00 83 c4 0c e9 83 00 00 00 8d 59 fc 8b 7b 0c 89 5e 10 8b 11 8b 41 04 <89> 42 04 89 10 89 49 04 89 09 fb 8b 41 fc 83 e0 fc 39 c6 74 04
[76143.762046] EIP: [<c0224e80>] run_workqueue+0x44/0xcf SS:ESP 0068:cf99bf9c
[76143.762046] ---[ end trace a09399dac1b66dc0 ]---
[/code3s9oa9t1]
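A note on reading the dmesg output above: the number after `Oops:` is the x86 page-fault error code, printed in hex. Its low bits say whether the page was present, whether the access was a read or a write, and whether it came from user or kernel mode. Here is a minimal Python sketch that decodes those three bits. The function name `decode_oops_code` is hypothetical, and higher bits of the error code are ignored here.

```python
def decode_oops_code(code_str):
    """Decode the low three bits of an x86 page-fault error code given as hex text."""
    code = int(code_str, 16)
    return {
        # bit 0: set = protection violation, clear = page not present
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: set = write access, clear = read access
        "access": "write" if code & 0x2 else "read",
        # bit 2: set = fault raised in user mode, clear = kernel mode
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    print(decode_oops_code("0002"))
```

For `0002` this gives a kernel-mode write to a page that is not present, which agrees with the "unable to handle kernel NULL pointer dereference at 00000004" line.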

alexv

11-05-2009 18:01:38

Hi there... Not sure if I have exactly the same problem, but I'm getting bugged by something similar. I'm experiencing lockups (cursor freezes, keyboard LEDs too) when I try using a kernel newer than 2.6.28, the backports package in Ubuntu, or the drivers in the compat-wireless package. I've found that they all have one thing in common: the driver they use is version 2.3.0. The unfortunate thing is that when it happens the screen just freezes, and after the hard reboot there is nothing at all in the logs that can tell me why it happened :( I'm almost 100% sure that the bug is in the rt61pci driver, because on the laptop (Atheros card) I don't have such problems. If there is a way for me to help resolve this frustrating bug, please tell me.

BTW, I forgot to mention: I too have an ad-hoc network with WEP encryption (WPA/WPA2 doesn't work for some reason) between two RT2561 cards (and the Atheros one on the laptop). It freezes when it receives traffic from either one of them.

dxdx

12-05-2009 05:03:44

[quotek7kh1222]Hi there... Not sure if I have exactly the same problem, but I'm getting bugged by something similar. I'm experiencing lockups (cursor freezes, keyboard LEDs too) when I try using a kernel newer than 2.6.28, the backports package in Ubuntu, or the drivers in the compat-wireless package. I've found that they all have one thing in common: the driver they use is version 2.3.0. [/quotek7kh1222]

You mean that there was an older version that worked fine with ad-hoc?

And yes, those are the same symptoms I have. It seems to be always reproducible (for me), so you could switch to a tty (it would be better if it is bigger than 80x25, but I'm not sure if that would print the error), connect to the ad-hoc network and let it die. Then, uhm, take a photo?

alexv

12-05-2009 06:01:24

Yeah, version 2.2.1, which is found in the 2.6.28 kernel, works fine (I'm using it right now). I've tried the tty method, but nothing useful came out. It just freezes without any warning or output.