From mboxrd@z Thu Jan 1 00:00:00 1970
From: Uladzislau Rezki
Date: Tue, 5 Apr 2022 22:28:54 +0200
To: Omar Sandoval
Cc: linux-mm@kvack.org, kexec@lists.infradead.org, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Cliff Wickman, x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
References: <75014514645de97f2d9e087aa3df0880ea311b77.1649187356.git.osandov@fb.com>
In-Reply-To: <75014514645de97f2d9e087aa3df0880ea311b77.1649187356.git.osandov@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval
>
> Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
> vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
> lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
> purge the vmap areas instead of doing it lazily.
>
> Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
> context") moved the purging from the vunmap() caller to a worker thread.
> Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
> (possibly forever). For example, consider the following scenario:
>
> 1. A thread reads from /proc/vmcore. This eventually calls
>    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
>    vmap_lazy_nr to lazy_max_pages() + 1.
> 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
>    pages (one page plus the guard page) to the purge list and to
>    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so
>    drain_vmap_work is scheduled.
> 3. The thread returns from the kernel and is scheduled out.
> 4. The worker thread is scheduled in and calls drain_vmap_area_work().
>    It frees the 2 pages on the purge list. vmap_lazy_nr is now
>    lazy_max_pages() + 1.
> 5. This is still over the threshold, so it tries to purge areas again,
>    but doesn't find anything.
> 6. Repeat 5.
>
> If the system is running with only one CPU (which is typical for kdump)
> and preemption is disabled, then this will never make forward progress:
> there aren't any more pages to purge, so it hangs. If there is more than
> one CPU or preemption is enabled, then the worker thread will spin
> forever in the background. (Note that if there were already pages to be
> purged at the time that set_iounmap_nonlazy() was called, this bug is
> avoided.)
>
> This can be reproduced with anything that reads from /proc/vmcore
> multiple times. E.g., vmcore-dmesg /proc/vmcore.
>
> A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But I
> think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> areas itself.
>
> Signed-off-by: Omar Sandoval
> ---
>  arch/x86/include/asm/io.h       |  2 +-
>  arch/x86/kernel/crash_dump_64.c |  2 +-
>  mm/vmalloc.c                    | 21 ++++++++++-----------
>  3 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index f6d91ecb8026..da466352f27c 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -210,7 +210,7 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
>  extern void iounmap(volatile void __iomem *addr);
>  #define iounmap iounmap
>
> -extern void set_iounmap_nonlazy(void);
> +void iounmap_purge_vmap_area(void);
>
>  #ifdef __KERNEL__
>
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index a7f617a3981d..075dd36c502d 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -37,8 +37,8 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
>  	} else
>  		memcpy(buf, vaddr + offset, csize);
>
> -	set_iounmap_nonlazy();
>  	iounmap((void __iomem *)vaddr);
> +	iounmap_purge_vmap_area();
>
>  	return csize;
>  }
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..48084d742688 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
>  /* for per-CPU blocks */
>  static void purge_fragmented_blocks_allcpus(void);
>
> -#ifdef CONFIG_X86_64
> -/*
> - * called before a call to iounmap() if the caller wants vm_area_struct's
> - * immediately freed.
> - */
> -void set_iounmap_nonlazy(void)
> -{
> -	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
> -}
> -#endif /* CONFIG_X86_64 */
> -
>  /*
>   * Purges all lazily-freed vmap areas.
>   */
> @@ -1753,6 +1742,16 @@ static void purge_vmap_area_lazy(void)
>  	mutex_unlock(&vmap_purge_lock);
>  }
>
> +#ifdef CONFIG_X86_64
> +/* Called after iounmap() to immediately free vm_area_struct's. */
> +void iounmap_purge_vmap_area(void)
> +{
> +	mutex_lock(&vmap_purge_lock);
> +	__purge_vmap_area_lazy(ULONG_MAX, 0);
> +	mutex_unlock(&vmap_purge_lock);
> +}
> +#endif /* CONFIG_X86_64 */
> +
>  static void drain_vmap_area_work(struct work_struct *work)
>  {
>  	unsigned long nr_lazy;
> --
> 2.35.1
>

Indeed, setting vmap_lazy_nr to lazy_max_pages() + 1 in order to force
the drain sounds like a hack.

Reviewed-by: Uladzislau Rezki (Sony)

--
Uladzislau Rezki