From mboxrd@z Thu Jan 1 00:00:00 1970
From: Uladzislau Rezki <urezki@gmail.com>
Date: Thu, 7 Apr 2022 10:11:01 +0200
To: Omar Sandoval
Cc: linux-mm@kvack.org, kexec@lists.infradead.org, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Baoquan He, x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
References: <52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com>
In-Reply-To: <52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
> From: Omar Sandoval
>
> Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
> vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
> lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
> purge the vmap areas instead of doing it lazily.
>
> Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
> context") moved the purging from the vunmap() caller to a worker thread.
> Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
> (possibly forever). For example, consider the following scenario:
>
> 1. Thread reads from /proc/vmcore. This eventually calls
>    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
>    vmap_lazy_nr to lazy_max_pages() + 1.
> 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
>    pages (one page plus the guard page) to the purge list and
>    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
>    drain_vmap_work is scheduled.
> 3. Thread returns from the kernel and is scheduled out.
> 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
>    frees the 2 pages on the purge list. vmap_lazy_nr is now
>    lazy_max_pages() + 1.
> 5. This is still over the threshold, so it tries to purge areas again,
>    but doesn't find anything.
> 6. Repeat 5.
>
> If the system is running with only one CPU (which is typical for kdump)
> and preemption is disabled, then this will never make forward progress:
> there aren't any more pages to purge, so it hangs.
> If there is more than
> one CPU or preemption is enabled, then the worker thread will spin
> forever in the background. (Note that if there were already pages to be
> purged at the time that set_iounmap_nonlazy() was called, this bug is
> avoided.)
>
> This can be reproduced with anything that reads from /proc/vmcore
> multiple times. E.g., vmcore-dmesg /proc/vmcore.
>
> It turns out that improvements to vmap() over the years have obsoleted
> the need for this "optimization". I benchmarked
> `dd if=/proc/vmcore of=/dev/null` with 4k and 1M read sizes on a system
> with a 32GB vmcore. The test was run on 5.17, 5.18-rc1 with a fix that
> avoided the hang, and 5.18-rc1 with set_iounmap_nonlazy() removed
> entirely:
>
>       |5.17  |5.18+fix|5.18+removal
>     4k|40.86s|  40.09s|      26.73s
>     1M|24.47s|  23.98s|      21.84s
>
> The removal was the fastest (by a wide margin with 4k reads). This patch
> removes set_iounmap_nonlazy().
>
> Signed-off-by: Omar Sandoval
> ---
> Changes from v1:
>
> - Remove set_iounmap_nonlazy() entirely instead of fixing it.
>
>  arch/x86/include/asm/io.h       |  2 --
>  arch/x86/kernel/crash_dump_64.c |  1 -
>  mm/vmalloc.c                    | 11 -----------
>  3 files changed, 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index f6d91ecb8026..e9736af126b2 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -210,8 +210,6 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
>  extern void iounmap(volatile void __iomem *addr);
>  #define iounmap iounmap
>
> -extern void set_iounmap_nonlazy(void);
> -
>  #ifdef __KERNEL__
>
>  void memcpy_fromio(void *, const volatile void __iomem *, size_t);
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index a7f617a3981d..97529552dd24 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -37,7 +37,6 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
>  	} else
>  		memcpy(buf, vaddr + offset, csize);
>
> -	set_iounmap_nonlazy();
>  	iounmap((void __iomem *)vaddr);
>  	return csize;
>  }
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..0b17498a34f1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
>  /* for per-CPU blocks */
>  static void purge_fragmented_blocks_allcpus(void);
>
> -#ifdef CONFIG_X86_64
> -/*
> - * called before a call to iounmap() if the caller wants vm_area_struct's
> - * immediately freed.
> - */
> -void set_iounmap_nonlazy(void)
> -{
> -	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
> -}
> -#endif /* CONFIG_X86_64 */
> -
>  /*
>   * Purges all lazily-freed vmap areas.
>   */
> --
> 2.35.1
>

Much better way of fixing it :)

Reviewed-by: Uladzislau Rezki (Sony)

--
Uladzislau Rezki