Date: Mon, 22 Jun 2020 09:25:27 -0700
From: Ben Widawsky
To: David Hildenbrand
Cc: linux-mm, Andrew Morton, "Scargall, Steve", Dan Williams, Vishal Verma
Subject: Re: [PATCH] mm: Fix false softlockup during pfn range removal
Message-ID: <20200622162527.bo765xhid563u6vp@intel.com>
References: <20200619231213.1160351-1-ben.widawsky@intel.com>

On 20-06-22 09:16:20, David Hildenbrand wrote:
> On 20.06.20 01:12, Ben Widawsky wrote:
> > When working with very large nodes, poisoning the struct pages (for
> > which there will be very many) can take a very long time. If the system
> > is using voluntary preemptions, the software watchdog will not be able
> > to detect forward progress. This patch addresses this issue by offering
> > to give up time like __remove_pages() does. This behavior was
> > introduced in v5.6 with:
> > commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
>
> We try to stay <= 72 chars in the commit message (except for kernel splats).
>

Thanks. checkpatch does complain fwiw, I just somehow missed it.

> >
> > Alternately, init_page_poison could do this cond_resched(), but it seems
> > to me that the caller of init_page_poison() is what actually knows
> > whether or not it should relax its own priority.
> >
> > Based on Dan's notes, I think this is perfectly safe:
> > commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
>
> We shouldn't be holding any spin locks, so it's safe. (we could think
> about doing this outside of the memory hotplug lock in the case of
> devmem, however, it's only a debugging feature so we most probably don't
> care).
>
> >
> > Aside from fixing the lockup, it is also a friendlier thing to do on
> > lower core systems that might wipe out large chunks of hotplug memory
> > (probably not a very common case).
>
> It really only is an issue for devmem. Ordinary hotplugged system memory
> is not affected (onlined/offlined in memory block granularity).

Could you explain this a bit? I was fixing the issue found on PMEM systems, but
it seems like regular memory hotplug was potentially a victim. I think one of
the reasons PMEM might be more likely is that working with any data structures
stored in the PMEM itself is slower (just a guess).

>
> >
> > Fixes this kind of splat:
> > [ 352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
> > [ 352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
> > [ 352.150123] dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
> > [ 352.260774] irq event stamp: 138450
> > [ 352.264692] hardirqs last enabled at (138449): [] trace_hardirqs_on_thunk+0x1a/0x1c
> > [ 352.275134] hardirqs last disabled at (138450): [] trace_hardirqs_off_thunk+0x1a/0x1c
> > [ 352.285662] softirqs last enabled at (138448): [] __do_softirq+0x347/0x456
> > [ 352.295233] softirqs last disabled at (138443): [] irq_exit+0x7d/0xb0
> > [ 352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
> > [ 352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
> > [ 352.324308] RIP: 0010:memset_erms+0x9/0x10
> > [ 352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> > [ 352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
> > [ 352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
> > [ 352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
> > [ 352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
> > [ 352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
> > [ 352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
> > [ 352.398487] FS: 00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
> > [ 352.407562] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
> > [ 352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 352.438115] Call Trace:
> > [ 352.440865] remove_pfn_range_from_zone+0x3a/0x380
> > [ 352.446244] ? cpumask_next+0x17/0x20
> > [ 352.450361] memunmap_pages+0x17f/0x280
> > [ 352.454670] release_nodes+0x22a/0x260
> > [ 352.458888] __device_release_driver+0x172/0x220
> > [ 352.464070] device_driver_detach+0x3e/0xa0
> > [ 352.468753] unbind_store+0x113/0x130
> > [ 352.472868] kernfs_fop_write+0xdc/0x1c0
> > [ 352.477273] vfs_write+0xde/0x1d0
> > [ 352.482218] ksys_write+0x58/0xd0
> > [ 352.487207] do_syscall_64+0x5a/0x120
> > [ 352.492529] entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > [ 352.499446] RIP: 0033:0x7f724f40b5e7
> > [ 352.504673] Code: Bad RIP value.
> > [ 352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
> > [ 352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
> > [ 352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
> > [ 352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
> > [ 352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
> > [ 370.353742] Built 2 zonelists, mobility grouping on. Total pages: 49050381
> > [ 370.373317] Policy zone: Normal
> > [ 374.948164] Built 3 zonelists, mobility grouping on. Total pages: 49312525
> > [ 375.017496] Policy zone: Normal
>
> I'd have stripped this to a reasonable minimum.
>
> >
> > Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> > Reported-by: "Scargall, Steve"
> > Reported-by: Ben Widawsky
> > Cc: Dan Williams
> > Cc: David Hildenbrand
> > Cc: Vishal Verma
> > Signed-off-by: Ben Widawsky
> > ---
> >  mm/memory_hotplug.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 9b34e03e730a..da374cd3d45b 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
> >  				unsigned long start_pfn,
> >  				unsigned long nr_pages)
> >  {
> > +	const unsigned long end_pfn = start_pfn + nr_pages;
> >  	struct pglist_data *pgdat = zone->zone_pgdat;
> > -	unsigned long flags;
> > +	unsigned long pfn, cur_nr_pages, flags;
> >
> >  	/* Poison struct pages because they are now uninitialized again. */
> > -	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
> > +	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> > +		cond_resched();
> > +
> > +		/* Select all remaining pages up to the next section boundary */
> > +		cur_nr_pages =
> > +			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);
>
> Nit: I'd have used the same indentation as the code you copied this from.
>
> > +		page_init_poison(pfn_to_page(pfn),
> > +				 sizeof(struct page) * cur_nr_pages);
> > +	}
> >
> >  #ifdef CONFIG_ZONE_DEVICE
> >  	/*
> >
>
> Thanks!
>
> Acked-by: David Hildenbrand
>
> -- 
> Thanks,
>
> David / dhildenb
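
For anyone following the thread who wants to check the chunking arithmetic in
the quoted hunk, here is a minimal userspace sketch. It is not kernel code and
not part of the patch: walk_pfn_range() and min_ul() are hypothetical names,
PAGES_PER_SECTION is hard-coded to 32768 (the x86-64 value with 4 KiB pages,
assumed here for illustration), and the cond_resched() and page_init_poison()
calls are reduced to comments.

```c
#include <assert.h>

/* Assumption: 128 MiB sections of 4 KiB pages, as on x86-64. */
#define PAGES_PER_SECTION 32768UL

/* Round a pfn up to the next section boundary (power-of-two section size). */
#define SECTION_ALIGN_UP(pfn) \
	(((pfn) + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1))

/* Hypothetical helper standing in for the kernel's min() macro. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Walk [start_pfn, start_pfn + nr_pages) one section-bounded chunk at a
 * time, the way the patched loop does. Returns the number of chunks, which
 * is also how often the kernel loop would call cond_resched(). The total
 * number of pages covered is stored in *covered.
 */
static unsigned long walk_pfn_range(unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned long *covered)
{
	const unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn, cur_nr_pages, chunks = 0;

	*covered = 0;
	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
		/* kernel: cond_resched(); */

		/* Select all remaining pages up to the next section boundary */
		cur_nr_pages = min_ul(end_pfn - pfn,
				      SECTION_ALIGN_UP(pfn + 1) - pfn);

		/* pfn + 1 guarantees a non-empty chunk, so the loop advances. */
		assert(cur_nr_pages > 0);

		/* kernel: page_init_poison(pfn_to_page(pfn),
		 *                          sizeof(struct page) * cur_nr_pages); */
		*covered += cur_nr_pages;
		chunks++;
	}
	return chunks;
}
```

With start_pfn = 100 and nr_pages = 70000 this walks three chunks (32668, 32768
and 4564 pages), so under these assumptions the loop would offer to reschedule
three times, where the old single page_init_poison() call never did.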