linux-mm.kvack.org archive mirror
* Fwd: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) (qbittorrent with tx-nocache-copy)
@ 2023-10-24  8:15 Bagas Sanjaya
  2023-10-24  8:53 ` Bagas Sanjaya
  0 siblings, 1 reply; 4+ messages in thread
From: Bagas Sanjaya @ 2023-10-24 8:15 UTC
  To: Linux Kernel Mailing List, Linux Networking,
	Linux Memory Management List
  Cc: David S. Miller, Eric Dumazet, Benjamin Poirier, Tom Herbert,
	Jakub Kicinski, Paolo Abeni, Andrew Morton, CM76

Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> I believe this is also an issue with the Broadcom bnx2 drivers since it only seems to happen when I enable "tx-nocache-copy" in ethtool.
> 
> The issue started when I was running Mainline/stable kernel v6.5.x on another machine. After googling a bit, I landed on an article from Red Hat that pointed to the possibility of failing hardware. I was renting the server, so I didn't bother to file a bug report and assumed it was the server that was going bad. But then it happened again on my other server as soon as I switched the BitTorrent client to the same one I was using on that other server. I turned "tx-nocache-copy" off and ran mainline kernel v6.5 (on Ubuntu 23.04) for a day or two without issue. After that I switched back to Ubuntu's kernel (v6.2) and the server ran for a couple more days without issue. Two days ago I turned "tx-nocache-copy" on again out of curiosity (kernel v6.2), and the server didn't run into any issue with this setting on. This morning I upgraded to Ubuntu 23.10, which runs their version of kernel v6.5. The kernel panicked and the server rebooted a couple of hours later.
> 
> 
> The issue seems to be triggered by a certain configuration of applications. I've run Mainline/stable kernel 6.5.x since its release (and before that v6.4.x) with the rtorrent BitTorrent client and "tx-nocache-copy" turned on, and the kernel didn't run into any issue for weeks until I switched to another BitTorrent client (qbittorrent) last week. It doesn't seem to matter when it happens: the kernel can Oops when the client is downloading a single small torrent as well as when it's downloading multiple torrents at the same time.
> 
> 
> I tried to use the crash utility to get the backtrace but it doesn't seem to work correctly. I get "crash: invalid structure member offset: module_core_size FILE: kernel.c  LINE: 3781  FUNCTION: module_init()" when I try to load the kernel dump.
> 
> The kernel panic happens with the 6.5.x Mainline/stable kernel as well as the 6.5 kernel that comes with Ubuntu 23.10.
> 
> The bittorrent clients run as systemd services with normal user privileges and "ProtectKernelModules=yes" "NoNewPrivileges=yes" set in the systemd service. 
> 
> I attached the full dmesg, and I can send the kdump-generated kernel dump file if needed.
> 
> 
> ------------------------
> [12090.273551] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [12090.273577] BUG: unable to handle page fault for address: ffff9441c9734458
> [12090.273590] #PF: supervisor instruction fetch in kernel mode
> [12090.273602] #PF: error_code(0x0011) - permissions violation
> [12090.273614] PGD 157401067 P4D 157401067 PUD 23ffff067 PMD 108a81063 PTE 8000000109734063
> [12090.273632] Oops: 0011 [#1] PREEMPT SMP PTI
> [12090.273643] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 6.5.0-9-generic #9-Ubuntu
> [12090.273658] Hardware name: Dell Inc. PowerEdge R210 II/03X6X0, BIOS 2.10.0 05/24/2018
> [12090.273674] RIP: 0010:0xffff9441c9734458
> [12090.273694] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 58 44 73 c9 41 94 ff ff 00 00 00 00 00 00
> [12090.273723] RSP: 0018:ffffb3c380138980 EFLAGS: 00010282
> [12090.273734] RAX: ffff9441c9734458 RBX: ffff9441c9734400 RCX: 0000000000000000
> [12090.273746] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9441c9734400
> [12090.273758] RBP: ffffb3c380138990 R08: 0000000000000000 R09: 0000000000000000
> [12090.273771] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9441c9734400
> [12090.273783] R13: 00000000000005dc R14: ffff9441c49dda00 R15: ffffffff9e55ec40
> [12090.273795] FS:  0000000000000000(0000) GS:ffff9442f7c40000(0000) knlGS:0000000000000000
> [12090.273811] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [12090.273823] CR2: ffff9441c9734458 CR3: 0000000155a3a006 CR4: 00000000001706e0
> [12090.273837] Call Trace:
> [12090.273845]  <IRQ>
> [12090.273851]  ? show_regs+0x6d/0x80
> [12090.273864]  ? __die+0x24/0x80
> [12090.273873]  ? page_fault_oops+0x99/0x1b0
> [12090.273884]  ? kernelmode_fixup_or_oops+0xb2/0x140
> [12090.273896]  ? __bad_area_nosemaphore+0x1a5/0x2c0
> [12090.273908]  ? bad_area_nosemaphore+0x16/0x30
> [12090.273918]  ? do_kern_addr_fault+0x7b/0xa0
> [12090.273927]  ? exc_page_fault+0x1a4/0x1b0
> [12090.273939]  ? asm_exc_page_fault+0x27/0x30
> [12090.273952]  ? skb_release_head_state+0x27/0xb0
> [12090.273964]  consume_skb+0x33/0xf0
> [12090.273973]  tcp_mtu_probe+0x565/0x5d0
> [12090.273984]  tcp_write_xmit+0x579/0xab0
> [12090.273994]  __tcp_push_pending_frames+0x37/0x110
> [12090.274005]  tcp_rcv_established+0x264/0x730
> [12090.274015]  ? security_sock_rcv_skb+0x39/0x60
> [12090.274027]  tcp_v4_do_rcv+0x169/0x2a0
> [12090.274037]  tcp_v4_rcv+0xd92/0xe00
> [12090.274046]  ? raw_v4_input+0xaa/0x240
> [12090.274056]  ip_protocol_deliver_rcu+0x3c/0x210
> [12090.274068]  ip_local_deliver_finish+0x77/0xa0
> [12090.274078]  ip_local_deliver+0x6e/0x120
> [12090.274089]  ? __pfx_ip_local_deliver_finish+0x10/0x10
> [12090.274369]  ip_sublist_rcv_finish+0x6f/0x80
> [12090.274638]  ip_sublist_rcv+0x171/0x220
> [12090.274931]  ? __pfx_ip_rcv_finish+0x10/0x10
> [12090.275201]  ip_list_rcv+0x102/0x140
> [12090.275459]  __netif_receive_skb_list_core+0x22d/0x250
> [12090.275714]  netif_receive_skb_list_internal+0x1a3/0x2d0
> [12090.275967]  napi_complete_done+0x74/0x1c0
> [12090.276218]  bnx2_poll_msix+0xa1/0xe0 [bnx2]
> [12090.276468]  __napi_poll+0x33/0x1f0
> [12090.276708]  net_rx_action+0x181/0x2e0
> [12090.276943]  __do_softirq+0xd9/0x346
> [12090.277172]  ? handle_irq_event+0x52/0x80
> [12090.277393]  ? handle_edge_irq+0xda/0x250
> [12090.277604]  __irq_exit_rcu+0x75/0xa0
> [12090.277812]  irq_exit_rcu+0xe/0x20
> [12090.278015]  common_interrupt+0xa4/0xb0
> [12090.278217]  </IRQ>
> [12090.278411]  <TASK>
> [12090.278602]  asm_common_interrupt+0x27/0x40
> [12090.278798] RIP: 0010:cpuidle_enter_state+0xda/0x730
> [12090.278992] Code: 11 04 ff e8 a8 f5 ff ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 26 bb 02 ff 80 7d d0 00 0f 85 61 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 f7 01 00 00 4d 63 ee 49 83 fd 0a 0f 83 17 05 00 00
> [12090.279402] RSP: 0018:ffffb3c3800cbe18 EFLAGS: 00000246
> [12090.279612] RAX: 0000000000000000 RBX: ffff9442f7c7ec00 RCX: 0000000000000000
> [12090.279827] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
> [12090.280042] RBP: ffffb3c3800cbe68 R08: 0000000000000000 R09: 0000000000000000
> [12090.280259] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d0d24a0
> [12090.280478] R13: 0000000000000003 R14: 0000000000000003 R15: 00000afefc75867b
> [12090.280698]  ? cpuidle_enter_state+0xca/0x730
> [12090.280918]  ? finish_task_switch.isra.0+0x89/0x2b0
> [12090.281142]  cpuidle_enter+0x2e/0x50
> [12090.281363]  call_cpuidle+0x23/0x60
> [12090.281583]  cpuidle_idle_call+0x11d/0x190
> [12090.281804]  do_idle+0x82/0xf0
> [12090.282022]  cpu_startup_entry+0x1d/0x20
> [12090.282240]  start_secondary+0x129/0x160
> [12090.282460]  secondary_startup_64_no_verify+0x17e/0x18b
> [12090.282685]  </TASK>
> [12090.282902] Modules linked in: tcp_diag inet_diag ip6table_filter ip6_tables xt_LOG nf_log_syslog xt_recent xt_limit xt_tcpudp xt_conntrack iptable_filter xt_CT xt_set iptable_raw bpfilter ip_set_hash_ip ip_set_hash_net ip_set_hash_ipport ip_set_list_set ip_set_bitmap_port ip_set_hash_netiface ip_set nfnetlink binfmt_misc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate ipmi_ssif mgag200 drm_shmem_helper cfg80211 input_leds drm_kms_helper dcdbas at24 i2c_i801 lpc_ich i2c_smbus ie31200_edac acpi_ipmi i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler sch_fq tcp_bbr nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul ahci mpt3sas libahci raid_class bnx2 scsi_transport_sas wmi
> [12090.285082] CR2: ffff9441c9734458
> ----
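
For readers decoding the fault by hand, the sketch below unpacks the two numbers the oops prints: the page-fault error code (0x0011) and the PTE (8000000109734063). The bit positions are the architectural x86-64 ones; the function and table names are made up for illustration and are not kernel APIs.

```python
# Sketch only: decode the x86-64 page-fault error code and a 4 KiB PTE.
# Names below (PF_BITS, decode_pf_error, decode_pte) are hypothetical helpers.

# (bit, meaning when set, meaning when clear or None if nothing to report)
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write", "read"),
    (2, "user mode", "supervisor mode"),
    (3, "reserved bit set", None),
    (4, "instruction fetch", None),
]

# Subset of PTE flag bits that matter for this report.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}


def decode_pf_error(code):
    """Return a list of human-readable facts encoded in the error code."""
    facts = []
    for bit, if_set, if_clear in PF_BITS:
        if (code >> bit) & 1:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


def decode_pte(pte):
    """Return (flag names, physical frame address) for a 4 KiB PTE."""
    flags = [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]
    frame = pte & 0x000FFFFFFFFFF000  # bits 12..51 hold the frame address
    return flags, frame


print(decode_pf_error(0x0011))
flags, frame = decode_pte(0x8000000109734063)
print(flags, hex(frame))
```

Read together, the two results say the CPU attempted an instruction fetch in supervisor mode from a page that is present but marked no-execute, which matches the "NX-protected page" message at the top of the oops.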

Later, the reporter (Cc'ed) narrowed down the culprit range:

> Probably has nothing to do with the Broadcom bnx2 driver. The server crashed with "tx-nocache-copy" set to off. 
> 
> I added the dmesg as attachment, the backtrace and kmem of the RIP address are below.
> 
> I ran qbittorrent on a different server with the same hardware config back in June this year, the server was running Mainline/Stable kernel version 6.3.x then 6.4.0 and the server never rebooted once.

See Bugzilla for the full thread and attached dmesg logs.

Anyway, I'm adding this regression to regzbot:

#regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=218033
#regzbot title: kernel panic when downloading torrent with qbittorrent with tx-nocache-copy on

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218033

-- 
An old man doll... just what I always wanted! - Clara



* Re: Fwd: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) (qbittorrent with tx-nocache-copy)
  2023-10-24  8:15 Fwd: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) (qbittorrent with tx-nocache-copy) Bagas Sanjaya
@ 2023-10-24  8:53 ` Bagas Sanjaya
  2023-10-24  9:25   ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Bagas Sanjaya @ 2023-10-24 8:53 UTC
  To: Linux Kernel Mailing List, Linux Networking,
	Linux Memory Management List, Linux Regressions
  Cc: David S. Miller, Eric Dumazet, Benjamin Poirier, Tom Herbert,
	Jakub Kicinski, Paolo Abeni, Andrew Morton, CM76

Hi CM76,

On 24/10/2023 15:15, Bagas Sanjaya wrote:
> Hi,
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
>> [...]
> 

Please see [1] for how to decode stack trace symbols.
Also, the most important step toward getting this regression fixed
is to find the culprit commit by bisecting (for reference, see
Documentation/admin-guide/bug-bisect.rst).

[1]: https://lore.kernel.org/all/CANn89iL9Twf+Rzm9v_dwsH_iG4YkW3fAc2Hnx2jypN_Qf9oojw@mail.gmail.com/
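
As a starting point before running a real symbolizer, the sketch below splits call-trace lines of the form symbol+0xoffset/0xsize into their parts and flags the entries prefixed with "?", which the kernel prints for addresses found on the stack that the unwinder could not confirm. The regular expression and the Frame tuple are assumptions made for this example; the kernel tree ships scripts/decode_stacktrace.sh for actual decoding to file and line.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for one call-trace line from dmesg, for example
# "[12090.273964]  consume_skb+0x33/0xf0" or "... bnx2_poll_msix+0xa1/0xe0 [bnx2]".
FRAME_RE = re.compile(
    r"\]\s+(?P<guess>\?\s+)?(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)(?:\s+\[(?P<module>\w+)\])?\s*$"
)


class Frame(NamedTuple):
    symbol: str            # function name
    offset: int            # offset of the return address inside the function
    size: int              # total size of the function in bytes
    reliable: bool         # False when the line was prefixed with "?"
    module: Optional[str]  # module name in brackets, if any


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one dmesg line; return None if it is not a call-trace frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
        reliable=m["guess"] is None,
        module=m["module"],
    )


sample = [
    "> [12090.273952]  ? skb_release_head_state+0x27/0xb0",
    "> [12090.273964]  consume_skb+0x33/0xf0",
    "> [12090.273973]  tcp_mtu_probe+0x565/0x5d0",
    "> [12090.276218]  bnx2_poll_msix+0xa1/0xe0 [bnx2]",
]
for line in sample:
    print(parse_frame(line))
```

Filtering on the reliable field leaves the frames that actually form the call chain, which are the ones worth resolving to source lines first.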

Thanks.

-- 
An old man doll... just what I always wanted! - Clara




* Re: Fwd: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) (qbittorrent with tx-nocache-copy)
  2023-10-24  8:53 ` Bagas Sanjaya
@ 2023-10-24  9:25   ` Eric Dumazet
  2023-10-24 12:32     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2023-10-24 9:25 UTC
  To: Bagas Sanjaya
  Cc: Linux Kernel Mailing List, Linux Networking,
	Linux Memory Management List, Linux Regressions, David S. Miller,
	Benjamin Poirier, Tom Herbert, Jakub Kicinski, Paolo Abeni,
	Andrew Morton, CM76

On Tue, Oct 24, 2023 at 10:53 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> Hi CM76,
>
> On 24/10/2023 15:15, Bagas Sanjaya wrote:
> > Hi,
> >
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> >> [...]
> >
>
> Please see [1] for how to decode stack trace symbols.
> And also, the most important thing to get this regression fixed
> is to find culprit commit by bisecting (for reference see
> Documentation/admin-guide/bug-bisect.rst).
>
> [1]: https://lore.kernel.org/all/CANn89iL9Twf+Rzm9v_dwsH_iG4YkW3fAc2Hnx2jypN_Qf9oojw@mail.gmail.com/
>
> Thanks.

This was already fixed two weeks ago.

commit 71c299c711d1f44f0bf04f1fea66baad565240f1
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Oct 10 10:36:51 2023 -0700

    net: tcp: fix crashes trying to free half-baked MTU probes

    tcp_stream_alloc_skb() initializes the skb to use tcp_tsorted_anchor
    which is a union with the destructor. We need to clean that
    TCP-iness up before freeing.

    Fixes: 736013292e3c ("tcp: let tcp_mtu_probe() build headless packets")
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20231010173651.3990234-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
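
The commit message is terse, so here is a rough model of the aliasing it describes, assuming (as the message says) that tcp_tsorted_anchor shares storage with the destructor pointer. The ctypes classes, the two-word layout, and the 0x58 offset (taken as RIP minus RBX from the oops, with RBX presumed to be the skb pointer) are illustrative assumptions, not the real struct sk_buff definition.

```python
import ctypes


class ListHead(ctypes.Structure):
    # Stand-in for struct list_head: two pointers stored as raw 64-bit values.
    _fields_ = [("next", ctypes.c_uint64), ("prev", ctypes.c_uint64)]


class DstAndDestructor(ctypes.Structure):
    # Assumed layout of the non-TCP view of the same 16 bytes.
    _fields_ = [("skb_refdst", ctypes.c_uint64), ("destructor", ctypes.c_uint64)]


class SkbUnion(ctypes.Union):
    # Both views overlap, so writing one rewrites the other.
    _fields_ = [("plain", DstAndDestructor), ("tcp_tsorted_anchor", ListHead)]


SKB_ADDR = 0xFFFF9441C9734400  # RBX/RDI in the oops (presumed skb pointer)
ANCHOR_OFFSET = 0x58           # assumption: faulting RIP minus RBX


def init_list_head(u, anchor_addr):
    """Model list-head initialization: an empty list points at itself."""
    u.tcp_tsorted_anchor.next = anchor_addr
    u.tcp_tsorted_anchor.prev = anchor_addr


u = SkbUnion()
init_list_head(u, SKB_ADDR + ANCHOR_OFFSET)

# Reading the same memory through the other view yields a "destructor"
# that is really the address of the list anchor, i.e. data, not code.
stale_destructor = u.plain.destructor
print(hex(stale_destructor))

# The eight bytes printed in the oops Code: line after the faulting qword.
from_dump = int.from_bytes(bytes.fromhex("58 44 73 c9 41 94 ff ff"), "little")
print(hex(from_dump), from_dump == stale_destructor)
```

If that model is right, calling the leftover "destructor" while freeing the skb jumps into the skb's own data, which would explain why RIP equals RBX + 0x58 and why the same self-pointer appears in little-endian order eight bytes after the faulting address in the Code: line.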



* Re: Fwd: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) (qbittorrent with tx-nocache-copy)
  2023-10-24  9:25   ` Eric Dumazet
@ 2023-10-24 12:32     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 4+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-10-24 12:32 UTC
  To: Eric Dumazet, Bagas Sanjaya
  Cc: Linux Kernel Mailing List, Linux Networking,
	Linux Memory Management List, Linux Regressions, David S. Miller,
	Benjamin Poirier, Tom Herbert, Jakub Kicinski, Paolo Abeni,
	Andrew Morton, CM76

On 24.10.23 11:25, Eric Dumazet wrote:
> On Tue, Oct 24, 2023 at 10:53 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>>
>> On 24/10/2023 15:15, Bagas Sanjaya wrote:
>>>
>>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>> I believe this is also an issue with the Broadcom bnx2 drivers since it only seem to happen when I enable "tx-nocache-copy" in ethtool.
>>[...]
>> Thanks.
>
> This was already fixed two weeks ago.
> 
> commit 71c299c711d1f44f0bf04f1fea66baad565240f1

Eric, thx for letting us know!

#regzbot fix: 71c299c711d1f44f0

Bagas, in a case like this, maybe hold off on forwarding the report until
the reporter has confirmed that the bug still happens with a really fresh kernel.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.



