Channel: Active questions tagged linux-kernel - Stack Overflow

'Bad page state' kernel error appearing randomly on azure vm when SAS is running


For the past few months, we have been seeing Bad page state errors in /var/log/messages.

Here is the exact kernel log output, including the call trace:

Sep 27 15:14:11 az-prod-sas1 kernel: BUG: Bad page state in process sas  pfn:1a49ff
Sep 27 15:14:11 az-prod-sas1 kernel: page:ffffd9a146927fc0 count:0 mapcount:1 mapping:          (null) index:0x7f48e7fff
Sep 27 15:14:11 az-prod-sas1 kernel: page flags: 0x2fffff00080018(uptodate|dirty|swapbacked)
Sep 27 15:14:11 az-prod-sas1 kernel: page dumped because: nonzero mapcount
Sep 27 15:14:11 az-prod-sas1 kernel: Modules linked in: binfmt_misc iptable_security bridge stp llc nf_conntrack_netlink nfnetlink ext4 mbcache jbd2 nfsv3 nfs_acl nfs lockd grace drbg fscache ansi_cprng cmac arc4 md4 nls_utf8 cifs ccm dns_resolver overlay(T) ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sunrpc dm_mirror dm_region_hash dm_log dm_mod joydev sb_edac iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg hv_utils ptp hv_balloon pps_core pcspkr i2c_piix4 ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic hv_netvsc hv_storvsc scsi_transport_fc hyperv_keyboard scsi_tgt hid_hyperv crct10dif_pclmul crct10dif_common crc32c_intel ata_generic
Sep 27 15:14:11 az-prod-sas1 kernel: pata_acpi floppy hyperv_fb ata_piix serio_raw libata hv_vmbus
Sep 27 15:14:11 az-prod-sas1 kernel: CPU: 2 PID: 117797 Comm: sas Tainted: G               ------------ T 3.10.0-957.12.2.el7.x86_64 #1
Sep 27 15:14:11 az-pro-sas1 kernel: BUG: Bad page state in process sas  pfn:1a49ff
Sep 27 15:14:11 az-pro-sas1 kernel: page:ffffd9a146927fc0 count:0 mapcount:1 mapping:          (null) index:0x7f48e7fff
Sep 27 15:14:11 az-pro-sas1 kernel: page flags: 0x2fffff00080018(uptodate|dirty|swapbacked)
Sep 27 15:14:11 az-pro-sas1 kernel: page dumped because: nonzero mapcount
Sep 27 15:14:11 az-pro-sas1 kernel: Modules linked in: binfmt_misc iptable_security bridge stp llc nf_conntrack_netlink nfnetlink ext4 mbcache jbd2 nfsv3 nfs_acl nfs lockd grace drbg fscache ansi_cprng cmac arc4 md4 nls_utf8 cifs ccm dns_resolver overlay(T) ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sunrpc dm_mirror dm_region_hash dm_log dm_mod joydev sb_edac iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg hv_utils ptp hv_balloon pps_core pcspkr i2c_piix4 ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic hv_netvsc hv_storvsc scsi_transport_fc hyperv_keyboard scsi_tgt hid_hyperv crct10dif_pclmul crct10dif_common crc32c_intel ata_generic
Sep 27 15:14:11 az-pro-sas1 kernel: pata_acpi floppy hyperv_fb ata_piix serio_raw libata hv_vmbus
Sep 27 15:14:11 az-pro-sas1 kernel: CPU: 2 PID: 117797 Comm: sas Tainted: G               ------------ T 3.10.0-957.12.2.el7.x86_64 #1
Sep 27 15:14:11 az-pro-sas1 kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
Sep 27 15:14:11 az-pro-sas1 kernel: Call Trace:
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a963041>] dump_stack+0x19/0x1b
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a95dcf3>] bad_page.part.76+0xdc/0xf9
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3bf100>] free_pages_prepare+0x170/0x190
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3bfb74>] free_hot_cold_page+0x74/0x160
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3c4a13>] __put_single_page+0x23/0x30
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3c4a65>] put_page+0x45/0x60
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a42ca37>] __split_huge_page+0x357/0x880
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a42cfd6>] split_huge_page_to_list+0x76/0xf0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a42de30>] __split_huge_page_pmd+0x1d0/0x5c0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e781d>] unmap_page_range+0xbdd/0xc30
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e78f1>] unmap_single_vma+0x81/0xf0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e8d2d>] zap_page_range+0x11d/0x190
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e3c1d>] SyS_madvise+0x49d/0xac0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a975ddb>] system_call_fastpath+0x22/0x27
Sep 27 15:14:11 az-pro-sas1 kernel: Disabling lock debugging due to kernel taint
Sep 27 15:14:12 az-pro-sas1 sh: abrt-dump-oops: Found oopses: 1disa
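
To check how often the error occurs and which processes hit it, the `BUG: Bad page state` header lines can be tallied from the log. The following is only a rough sketch: the function names `parse_bad_page` and `tally` are made up for illustration, and it assumes one syslog entry per line in the format shown in the trace.

```python
import re
from collections import Counter

# Matches the header line of a "Bad page state" report, e.g.
# "... kernel: BUG: Bad page state in process sas  pfn:1a49ff"
BAD_PAGE_RE = re.compile(
    r"kernel: BUG: Bad page state in process (?P<comm>\S+)\s+pfn:(?P<pfn>[0-9a-fA-F]+)"
)


def parse_bad_page(line):
    """Return (process name, page frame number) or None if the line does not match."""
    match = BAD_PAGE_RE.search(line)
    if match is None:
        return None
    return match.group("comm"), int(match.group("pfn"), 16)


def tally(lines):
    """Count "Bad page state" reports per process name."""
    counts = Counter()
    for line in lines:
        hit = parse_bad_page(line)
        if hit is not None:
            counts[hit[0]] += 1
    return counts
```

For example, `tally(open("/var/log/messages", errors="replace"))` returns a `Counter` keyed by process name. If only `sas` ever shows up, that supports the observation that the error is tied to SAS being active.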

We managed to reproduce the problem on other VMs. We tried upgrading and downgrading the kernel, and we also tried disabling transparent huge pages, but without luck.
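
Since the call trace goes through `__split_huge_page`, it may be worth confirming that transparent huge pages are really off on the running system. A minimal sketch, assuming the standard sysfs location `/sys/kernel/mm/transparent_hugepage` (the helper names `parse_thp_setting` and `read_thp_state` are hypothetical):

```python
import re
from pathlib import Path

# Assumed location of the THP controls on mainline-style kernels.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def parse_thp_setting(text):
    """Return the active option; the kernel marks it with [brackets],
    e.g. "always madvise [never]" -> "never"."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_thp_state(base=THP_DIR):
    """Read the active 'enabled' and 'defrag' settings; None if unreadable."""
    state = {}
    for name in ("enabled", "defrag"):
        try:
            state[name] = parse_thp_setting((base / name).read_text())
        except OSError:
            state[name] = None
    return state
```

Calling `read_thp_state()` on the affected VM should report `never` for `enabled` if the setting took effect. A value of `always` or `madvise` would suggest the change was not applied or did not survive a reboot.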

It's a CentOS 7 VM running on Azure with the following versions:

  • CentOS Linux release 7.6.1810 (Core)
  • Linux 3.10.0-957.12.2.el7.x86_64

The error appears randomly, but always while SAS is running. When it occurs, the SAS process hangs forever, and after some time the VM's CPU usage spikes and the VM becomes unresponsive.

Any help would be greatly appreciated!

