From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Apr 2022 18:33:43 +0800
From: Baoquan He
To: Omar Sandoval
Cc: linux-mm@kvack.org, kexec@lists.infradead.org, Andrew Morton,
 Uladzislau Rezki, Christoph Hellwig, x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v3] mm/vmalloc: fix spinning drain_vmap_work after
 reading from /proc/vmcore
Message-ID:
References: <53f25ddc953211d50bed06427d695f51f5ea37c7.1649867251.git.osandov@fb.com>
In-Reply-To: <53f25ddc953211d50bed06427d695f51f5ea37c7.1649867251.git.osandov@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On 04/13/22 at 09:28am, Omar Sandoval wrote:
> From: Omar Sandoval
> 
> Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
> vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
> lazy_max_pages() + 1, ensuring that any future vunmap() calls
> immediately purge the vmap areas instead of doing it lazily.
> 
> Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
> context") moved the purging from the vunmap() caller to a worker thread.
> Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
> (possibly forever). For example, consider the following scenario:
> 
> 1. A thread reads from /proc/vmcore. This eventually calls
>    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
>    vmap_lazy_nr to lazy_max_pages() + 1.
> 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
>    pages (one page plus the guard page) to the purge list and to
>    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
>    drain_vmap_work is scheduled.
> 3. The thread returns from the kernel and is scheduled out.
> 4. The worker thread is scheduled in and calls drain_vmap_area_work().
>    It frees the 2 pages on the purge list. vmap_lazy_nr is now
>    lazy_max_pages() + 1.
> 5. This is still over the threshold, so it tries to purge areas again,
>    but doesn't find anything.
> 6. Repeat 5.
> 
> If the system is running with only one CPU (which is typical for kdump)
> and preemption is disabled, then this will never make forward progress:
> there aren't any more pages to purge, so it hangs. If there is more than
> one CPU or preemption is enabled, then the worker thread will spin
> forever in the background. (Note that if there were already pages to be
> purged at the time that set_iounmap_nonlazy() was called, this bug is
> avoided.)
> 
> This can be reproduced with anything that reads from /proc/vmcore
> multiple times, e.g., vmcore-dmesg /proc/vmcore.
> 
> It turns out that improvements to vmap() over the years have obsoleted
> the need for this "optimization". I benchmarked
> `dd if=/proc/vmcore of=/dev/null` with 4k and 1M read sizes on a system
> with a 32GB vmcore. The test was run on 5.17, on 5.18-rc1 with a fix
> that avoided the hang, and on 5.18-rc1 with set_iounmap_nonlazy()
> removed entirely:
> 
>      |   5.17 | 5.18+fix | 5.18+removal
>   4k | 40.86s |   40.09s |       26.73s
>   1M | 24.47s |   23.98s |       21.84s
> 
> The removal was the fastest (by a wide margin with 4k reads). This patch
> removes set_iounmap_nonlazy().
> 
> Fixes: 690467c81b1a ("mm/vmalloc: Move draining areas out of caller context")
> Reviewed-by: Christoph Hellwig
> Reviewed-by: Uladzislau Rezki (Sony)
> Acked-by: Chris Down
> Signed-off-by: Omar Sandoval

This is a great fix with a detailed explanation, thanks.

Acked-by: Baoquan He

> ---
> Changes from v2 -> v3:
> 
> - Add Fixes and Reviewed-by tags (no code changes)
> 
> Changes from v1 -> v2:
> 
> - Remove set_iounmap_nonlazy() entirely instead of fixing it.
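To make the spin in steps 4-6 above concrete, here is a minimal
user-space model of the vmap_lazy_nr accounting. This is an
illustrative sketch only: the names mirror mm/vmalloc.c, but the
threshold constant is made up and the kernel work item is reduced to a
plain function.

	#include <stdio.h>

	static long lazy_max_pages = 32768;	/* stand-in for lazy_max_pages() */
	static long vmap_lazy_nr;		/* pages pending lazy purge */
	static long purge_list_pages;		/* pages actually on the purge list */

	/* models drain_vmap_area_work(): keep purging while over the threshold */
	static void drain_vmap_area_work(void)
	{
		int rounds = 0;

		while (vmap_lazy_nr > lazy_max_pages) {
			/* models __purge_vmap_area_lazy(): free whatever is queued */
			vmap_lazy_nr -= purge_list_pages;
			purge_list_pages = 0;

			/* capped so the sketch terminates; the real worker has
			 * no such cap and would spin here forever */
			if (++rounds > 3) {
				printf("spinning: vmap_lazy_nr = lazy_max_pages + %ld, "
				       "purge list empty\n",
				       vmap_lazy_nr - lazy_max_pages);
				break;
			}
		}
	}

	int main(void)
	{
		/* step 1: set_iounmap_nonlazy() */
		vmap_lazy_nr = lazy_max_pages + 1;

		/* step 2: iounmap() queues the page plus its guard page */
		purge_list_pages = 2;
		vmap_lazy_nr += purge_list_pages;	/* lazy_max_pages + 3 */

		/* steps 4-6: the worker drains the list but stays over the
		 * threshold, since nothing else is ever queued */
		drain_vmap_area_work();
		return 0;
	}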
>
>  arch/x86/include/asm/io.h       |  2 --
>  arch/x86/kernel/crash_dump_64.c |  1 -
>  mm/vmalloc.c                    | 11 -----------
>  3 files changed, 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index f6d91ecb8026..e9736af126b2 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -210,8 +210,6 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
>  extern void iounmap(volatile void __iomem *addr);
>  #define iounmap iounmap
>
> -extern void set_iounmap_nonlazy(void);
> -
>  #ifdef __KERNEL__
>
>  void memcpy_fromio(void *, const volatile void __iomem *, size_t);
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index a7f617a3981d..97529552dd24 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -37,7 +37,6 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
>  	} else
>  		memcpy(buf, vaddr + offset, csize);
>
> -	set_iounmap_nonlazy();
>  	iounmap((void __iomem *)vaddr);
>  	return csize;
>  }
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..0b17498a34f1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
>  /* for per-CPU blocks */
>  static void purge_fragmented_blocks_allcpus(void);
>
> -#ifdef CONFIG_X86_64
> -/*
> - * called before a call to iounmap() if the caller wants vm_area_struct's
> - * immediately freed.
> - */
> -void set_iounmap_nonlazy(void)
> -{
> -	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
> -}
> -#endif /* CONFIG_X86_64 */
> -
>  /*
>   * Purges all lazily-freed vmap areas.
>   */
> --
> 2.35.2
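For completeness, the reproduction does not need vmcore-dmesg: per the
commit message, anything that reads /proc/vmcore multiple times from
the kdump kernel triggers it on affected kernels. A hypothetical
minimal reproducer (untested sketch) could look like:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];

		/* two full passes over the dump, like running
		 * `dd if=/proc/vmcore of=/dev/null` twice */
		for (int pass = 0; pass < 2; pass++) {
			int fd = open("/proc/vmcore", O_RDONLY);

			if (fd < 0) {
				perror("open /proc/vmcore");
				return 1;
			}
			while (read(fd, buf, sizeof(buf)) > 0)
				;	/* discard the data */
			close(fd);
		}
		return 0;
	}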