Linux – QLE2562 HBA/qla2xxx problems on CentOS 5.3

centos linux

I have several Linux servers (SunFire X4270) running CentOS 5.3 (kernel-2.6.18-128.1.16.el5) with QLogic FC-8 QLE2562 HBAs. I'm experiencing a lot of problems with these new servers. One of them logs the following messages every second:

qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server
qla2xxx 0000:2f:00.0: Passthru CT failed
qla2xxx 0000:2f:00.1: Passthru CT request failed to login management server
qla2xxx 0000:2f:00.1: Passthru CT failed

Also, several servers end in a kernel panic with the trace shown below.
I have tried several CentOS 5.3 kernel versions, 2.6.18-128.el5 and 2.6.18-128.1.16.el5 (the latest), and I have also tried the latest driver from QLogic with the embedded 4.06 QLE2562 firmware, without success. The strange thing is that I have one other server with the same hardware/software configuration that runs well (stable…). Sun support (available with these servers) has not been able to solve the problem yet…
Any ideas?

qla2xxx_eh_abort(8): aborting sp ffff81037d86ebc0 from RISC. pid=952 sp->state=7 q->q_flag=2
qla2xxx 0000:2f:00.1: Mailbox command timeout occurred. Issuing ISP abort.
NMI Watchdog detected LOCKUP on CPU 13
CPU 13
Modules linked in: autofs4 sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev qla2xxx(U) qla2xxx_conf(U) igb i2c_i801 intermodule(U) i2c_core sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2982, comm: scsi_eh_8 Tainted: G      2.6.18-128.el5 #1
RIP: 0010:[<ffffffff8000c6f2>]  [<ffffffff8000c6f2>] __delay+0x8/0x10
RSP: 0018:ffff81067dc7db88  EFLAGS: 00000097
RAX: 00000000ecd06b41 RBX: 000000000018c42b RCX: 00000000ecd05808
RDX: 0000000000000324 RSI: 0000000000000046 RDI: 0000000000003689
RBP: ffffc20000034000 R08: 0000000000000002 R09: ffff81067dc7db54
R10: 0000000000000001 R11: ffffffff80213fbd R12: ffff81037e84c4f8
R13: 0000000000000246 R14: 0000000000000001 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff81067fc46140(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bb424 CR3: 000000067d035000 CR4: 00000000000006e0
Process scsi_eh_8 (pid: 2982, threadinfo ffff81067dc7c000, task ffff81010c6ec040)
Stack:  ffffffff8827f743 ffff81037e84c4f8 ffff81067dc7dc90 ffff81060000dc20
 ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90 0000000000000100
 ffffffff88285488 ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90
Call Trace:
 [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
 [<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
 [<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
 [<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
 [<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
 [<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
 [<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
 [<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 29 c8 48 39 f8 72 f5 c3 41 54 83 3d ad d8 3c 00 00 49 89 f4
Kernel panic - not syncing: nmi watchdog
 BUG: warning at kernel/panic.c:137/panic() (Tainted: G     )

Call Trace:
 <NMI>  [<ffffffff8008efff>] panic+0x1da/0x1eb
 [<ffffffff8006ba21>] _show_stack+0xdb/0xea
 [<ffffffff8006bb14>] show_registers+0xe4/0x100
 [<ffffffff8006537d>] die_nmi+0x66/0xa3
 [<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff800656e1>] default_do_nmi+0x81/0x225
 [<ffffffff8006594e>] do_nmi+0x43/0x61
 [<ffffffff80064fa7>] nmi+0x7f/0x88
 [<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
 [<ffffffff8000c6f2>] __delay+0x8/0x10
 <<EOE>>  [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
 [<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
 [<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
 [<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
 [<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
 [<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
 [<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
 [<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Tainted: G     )

Call Trace:
 <NMI>  [<ffffffff801fa015>] i8042_panic_blink+0x112/0x2a5
 [<ffffffff8008efa5>] panic+0x180/0x1eb
 [<ffffffff8006ba21>] _show_stack+0xdb/0xea
 [<ffffffff8006bb14>] show_registers+0xe4/0x100
 [<ffffffff8006537d>] die_nmi+0x66/0xa3
 [<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff800656e1>] default_do_nmi+0x81/0x225
 [<ffffffff8006594e>] do_nmi+0x43/0x61
 [<ffffffff80064fa7>] nmi+0x7f/0x88
 [<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
 [<ffffffff8000c6f2>] __delay+0x8/0x10
 <<EOE>>  [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
 [<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
 [<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
 [<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
 [<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
 [<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
 [<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
 [<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

BUG: warning at drivers/input/serio/i8042.c:849/i8042_panic_blink() (Tainted: G     )

Call Trace:
 <NMI>  [<ffffffff801fa0fe>] i8042_panic_blink+0x1fb/0x2a5
 [<ffffffff8008efa5>] panic+0x180/0x1eb
 [<ffffffff8006ba21>] _show_stack+0xdb/0xea
 [<ffffffff8006bb14>] show_registers+0xe4/0x100
 [<ffffffff8006537d>] die_nmi+0x66/0xa3
 [<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff800656e1>] default_do_nmi+0x81/0x225
 [<ffffffff8006594e>] do_nmi+0x43/0x61
 [<ffffffff80064fa7>] nmi+0x7f/0x88
 [<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
 [<ffffffff8000c6f2>] __delay+0x8/0x10
 <<EOE>>  [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
 [<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
 [<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
 [<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
 [<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
 [<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
 [<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
 [<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

BUG: warning at drivers/input/serio/i8042.c:851/i8042_panic_blink() (Tainted: G     )

Call Trace:
 <NMI>  [<ffffffff801fa17b>] i8042_panic_blink+0x278/0x2a5
 [<ffffffff8008efa5>] panic+0x180/0x1eb
 [<ffffffff8006ba21>] _show_stack+0xdb/0xea
 [<ffffffff8006bb14>] show_registers+0xe4/0x100
 [<ffffffff8006537d>] die_nmi+0x66/0xa3
 [<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff800656e1>] default_do_nmi+0x81/0x225
 [<ffffffff8006594e>] do_nmi+0x43/0x61
 [<ffffffff80064fa7>] nmi+0x7f/0x88
 [<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
 [<ffffffff8000c6f2>] __delay+0x8/0x10
 <<EOE>>  [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
 [<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
 [<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
 [<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
 [<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
 [<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
 [<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
 [<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
 [<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Best Answer

For "qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server": if it happens on only one server, you may have a hardware problem with the card. Did you try putting this card in another server?
For the server that runs well, I would try the same test: move its card from serverA to serverB and see whether serverB becomes stable and whether serverA stays stable.
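
To make the swap test easier to read, you could count the qla2xxx messages per PCI function on each server before and after moving a card. The following is only a minimal sketch in Python, assuming the kernel log lines look like the ones quoted in the question. The names `count_qla_errors` and `interpret_swap` are hypothetical helpers written for this illustration, not part of any QLogic or CentOS tool.

```python
import re
from collections import Counter

# Matches driver lines such as:
#   qla2xxx 0000:2f:00.0: Passthru CT failed
# (format assumed from the log excerpt in the question)
QLA_LINE = re.compile(
    r"qla2xxx (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.+)"
)


def count_qla_errors(lines):
    """Count qla2xxx driver messages per PCI function (e.g. 0000:2f:00.0)."""
    counts = Counter()
    for line in lines:
        match = QLA_LINE.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts


def interpret_swap(original_host_still_failing, new_host_now_failing):
    """Interpret the result of moving a suspect card to another server.

    Returns "card" if the errors followed the card, "host" if they stayed
    with the original server (slot, cabling, or fabric path), and
    "inconclusive" otherwise.
    """
    if new_host_now_failing and not original_host_still_failing:
        return "card"
    if original_host_still_failing and not new_host_now_failing:
        return "host"
    return "inconclusive"


if __name__ == "__main__":
    sample = [
        "qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server",
        "qla2xxx 0000:2f:00.0: Passthru CT failed",
        "qla2xxx 0000:2f:00.1: Passthru CT failed",
        "NMI Watchdog detected LOCKUP on CPU 13",
    ]
    print(dict(count_qla_errors(sample)))
    print(interpret_swap(original_host_still_failing=False,
                         new_host_now_failing=True))
```

In practice you would feed it the lines of /var/log/messages from each server, once before and once after the swap. If the errors move with the card, suspect the card. If they stay with the server, look at the slot, the cabling or the switch port instead.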
