From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B22FEB64DC for ; Tue, 27 Jun 2023 12:40:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6C9DE8D0002; Tue, 27 Jun 2023 08:40:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 69F5D8D0001; Tue, 27 Jun 2023 08:40:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 567928D0002; Tue, 27 Jun 2023 08:40:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 46FE58D0001 for ; Tue, 27 Jun 2023 08:40:37 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 0A16B160161 for ; Tue, 27 Jun 2023 12:40:37 +0000 (UTC) X-FDA: 80948486514.24.4E400BB Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf01.hostedemail.com (Postfix) with ESMTP id 83A574001A for ; Tue, 27 Jun 2023 12:40:33 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=UxRGLpzK; dmarc=pass (policy=quarantine) header.from=suse.com; spf=pass (imf01.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1687869633; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=sRDXYyjYq2aZJPbYdPZ7TDkJO6W/Spxbg8G9agS1s/c=; b=kgiMYXYLyQEWAjNuNxFcB5jiY6PIDy12sJelL7VElhVZjp9fQw8KSWaEKEMic7iMTT5qwr OYfY95iPRI/pO9iBE8wDgZwOlwyUBqgmQ3nWJcnHUn4SlYhf/IzJAmGKbl1Eq1t0eWLvWH bFLwgaIedCiE+ukQt3Me0WbAb7QDvO4= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=UxRGLpzK; dmarc=pass (policy=quarantine) header.from=suse.com; spf=pass (imf01.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1687869633; a=rsa-sha256; cv=none; b=rn84pkR2h+GJUO9CUjhqQZ33CJYEGyUtflvnCDxbueJbySGLLew5yW9rOha0zlT0h0zf9w KP7t03NXWW7Hr95JTayMA0Nyl7xFVbxIymMa86gGxMZxTMjTaTvOUhVY8u0TORvIt0I/HA Snn5lE+I519PQhkBrlQ3yE1iL/OIEKA= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E8CC41F8C3; Tue, 27 Jun 2023 12:40:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1687869631; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=sRDXYyjYq2aZJPbYdPZ7TDkJO6W/Spxbg8G9agS1s/c=; b=UxRGLpzKE+7i9P2YS7a9iyAiQYrS7RxZ8ThjGgg+8F7CR+XrQJO6gGd+pXrheC38q3AhVj W1EEDt0HeSRqxyoo0H0l639ou3rnRj6gVQ1crcvW1aK/dL18vQqohAWOfDzWwyS6Xa/M7s KdZ4kCYZHqnOmBDGYb+yjgsfXkTb0Ts= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id CCADE13462; Tue, 27 Jun 2023 12:40:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id ynuuL7/YmmSJTwAAMHmgww (envelope-from ); Tue, 27 Jun 2023 12:40:31 +0000 Date: Tue, 27 Jun 2023 14:40:31 +0200 From: Michal Hocko To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org, Andrew Morton , "Michael S. Tsirkin" , John Hubbard , Oscar Salvador , Jason Wang , Xuan Zhuo Subject: Re: [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals Message-ID: References: <20230627112220.229240-1-david@redhat.com> <20230627112220.229240-4-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230627112220.229240-4-david@redhat.com> X-Rspamd-Queue-Id: 83A574001A X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: krh1wcxt88i7ba7r457nfwzy7tz38prt X-HE-Tag: 1687869633-825085 X-HE-Meta: U2FsdGVkX1/sZRpMpF2YBNuMTqABCu63sXYzQ5A104ReMMx4HJtq3O2P1RSAoQcBrC98JIjyHQIWR+yoeyfWDhplIENYk0HkMgOpxLC0Uv6JxA5ntHHezolp6htL3FvlscKTZ2OFXHWn8dwSYFQnqK2gEF171OCJyl0fN38Ark1Z+uuoHaHw0HR/u/vlFGJluIsfifXHgqi0TZIz23vGs7gBGwFWfundAvsUxoEBe+q3FSI1JPMDiG0tDeZYPrNJvU9r5FJInvR4s9X8HZShRKH1Djy8VS7qlXykgSjN3OWV3zhWqV43RPKINg/S1dtdR0XL7CjqSNMoVwvqK+CNtWQN7BrMEuquwEp7Xq9HOyvV2HHnm0qfP/ZjV+iUuheDyFVOjlPeNZhIcU0dniO/nS5RWQGS+GPGHpvADP/p0UV2PIEAJ/l4529/32GC1OuSZbe5Q5e307Da/wMfS6Tg0w0os6F6fi1Kz6S5BHxURubJoyPzLIkWSjveSAlTQsf5rHAAIBgn1HKPcSXTsb39ZeMv9owQWXBwxTedNAfvKB6JY24TKrJVDlMSW0AtEX5nsxoHefPV77dLRNW2rac9/QikmdScmrnSL/UOoinm7hBZeNL4Spd4M/Ncudx0c5fVce/Mp2J83juSb8uSmakrNDhgvAGOMJ4LUfeS31GZuVj4CVQEvELpgIRuEliSgn/8BStBtIQ1I5H/eiatC5zkNdjSi6Kc7xAfKoHiqe5CbBsdRWalt42R6m0golSYREGKnR+lTEU1HTgcGRtjhZ72bxp9piRayu6cwL2FYW5Z3GetgOWE+itx8OF5/t1RXS5dH+V3CV9UBP8Aoq6a+qHCdFSxVsJmoQDSFy/gY12crSE9t3oLPawGZVZX7lmpJY97BNFc2E5rYYKBFc40zh0kcWe6OEeR3gPtDruJpt1kEW+wk7HaA10/eDiIjZ7uRydJ8Ul8kMzl+nu/SeS54wv LSo96687 WRU/xLk0ASMLER8dXPWIuMFOIJa14thFDSPzQCYyL5XqFsMP+g0I76htw/rypULbTQXiNYbHERwJAVKwCZFg/0CKRgA5JARkwZrQvo7rRnKjC9Zw83VxwCXG+mm4NSDIbUgaNUrfJ5hyYCG4CAxqSmFyfMwl4Dvks6JlJ4ib+FU3FWyf/XxKacyxHD72oJItsy2E1yaDnnk1VG9qPpjrZfSj4r6NZbjaG5L6bOa/lJ43HTDAHfcmhAjg3yEnS751hRIr0uvzFU+rlId0D10aSPRW/Tj9GoatsL4JyUQZYrUP/u0SCmJt97pz7iw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 27-06-23 13:22:18, David Hildenbrand wrote: > John Hubbard writes [1]: > > Some device drivers add memory to the system via memory hotplug. > When the driver is unloaded, that memory is hot-unplugged. > > However, memory hot unplug can fail. And these days, it fails a > little too easily, with respect to the above case. Specifically, if > a signal is pending on the process, hot unplug fails. > > [...] > > So in this case, other things (unmovable pages, un-splittable huge > pages) can also cause the above problem. However, those are > demonstrably less common than simply having a pending signal. I've > got bug reports from users who can trivially reproduce this by > killing their process with a "kill -9", for example. This looks like a bug of the said driver no? If the tear down process is killed it could very well happen right before offlining so you end up in the very same state. Or what am I missing? > Especially with ZONE_MOVABLE, offlining is supposed to work in most > cases when offlining actually hotplugged (not boot) memory, and only fail > in rare corner cases (e.g., some driver holds a reference to a page in > ZONE_MOVABLE, turning it unmovable). > > In these corner cases we really don't want to be stuck forever in > offline_and_remove_memory(). But in the general cases, we really want to > do our best to make memory offlining succeed -- in a reasonable > timeframe. > > Reliably failing in the described case when there is a fatal signal pending > is sub-optimal. The pending signal check is mostly only relevant when user > space explicitly triggers offlining of memory using sysfs device attributes > ("state" or "online" attribute), but not when coming via > offline_and_remove_memory(). > > So let's use a timer instead and ignore fatal signals, because they are > not really expressive for offline_and_remove_memory() users. Let's default > to 30 seconds if no timeout was specified, and limit the timeout to 120 > seconds. I really hate having timeouts back. They just proven to be hard to get right and it is essentially a policy implemented in the kernel. They simply do not belong to the kernel space IMHO. > This change is also valuable for virtio-mem in BBM (Big Block Mode) with > "bbm_safe_unplug=off", to avoid endless loops when stuck forever in > offline_and_remove_memory(). > > While at it, drop the "extern" from offline_and_remove_memory() to make > it fit into a single line. > > [1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard@nvidia.com > > Signed-off-by: David Hildenbrand > --- > drivers/virtio/virtio_mem.c | 2 +- > include/linux/memory_hotplug.h | 2 +- > mm/memory_hotplug.c | 50 ++++++++++++++++++++++++++++++++-- > 3 files changed, 50 insertions(+), 4 deletions(-) > > diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c > index cb8bc6f6aa90..f8792223f1db 100644 > --- a/drivers/virtio/virtio_mem.c > +++ b/drivers/virtio/virtio_mem.c > @@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm, > "offlining and removing memory: 0x%llx - 0x%llx\n", addr, > addr + size - 1); > > - rc = offline_and_remove_memory(addr, size); > + rc = offline_and_remove_memory(addr, size, 0); > if (!rc) { > atomic64_sub(size, &vm->offline_size); > /* > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index 9fcbf5706595..d5f9e8b5a4a4 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -307,7 +307,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages, > struct zone *zone, struct memory_group *group); > extern int remove_memory(u64 start, u64 size); > extern void __remove_memory(u64 start, u64 size); > -extern int offline_and_remove_memory(u64 start, u64 size); > +int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms); > > #else > static inline void try_offline_node(int nid) {} > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 0d2151df4ee1..ca635121644a 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -152,6 +152,22 @@ void put_online_mems(void) > > bool movable_node_enabled = false; > > +/* > + * Protected by the device hotplug lock: offline_and_remove_memory() > + * will activate a timer such that offlining cannot be stuck forever. > + * > + * With an active timer, fatal signals will be ignored, because they can be > + * counter-productive when dying user space triggers device unplug/driver > + * unloading that ends up offlining+removing device memory. > + */ > +static bool mhp_offlining_timer_active; > +static atomic_t mhp_offlining_timer_expired; > + > +static void mhp_offline_timer_fn(struct timer_list *unused) > +{ > + atomic_set(&mhp_offlining_timer_expired, 1); > +} > + > #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE > int mhp_default_online_type = MMOP_OFFLINE; > #else > @@ -1879,7 +1895,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, > do { > pfn = start_pfn; > do { > - if (fatal_signal_pending(current)) { > + /* > + * If a timer is active, we're coming via > + * offline_and_remove_memory() and want to ignore even > + * fatal signals. > + */ > + if (mhp_offlining_timer_active) { > + if (atomic_read(&mhp_offlining_timer_expired)) { > + ret = -ETIMEDOUT; > + reason = "timeout"; > + goto failed_removal_isolated; > + } > + } else if (fatal_signal_pending(current)) { > ret = -EINTR; > reason = "signal backoff"; > goto failed_removal_isolated; > @@ -2232,11 +2259,17 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg) > * memory is still in use. Primarily useful for memory devices that logically > * unplugged all memory (so it's no longer in use) and want to offline + remove > * that memory. > + * > + * offline_and_remove_memory() will not fail on fatal signals. Instead, it will > + * fail once the timeout has been reached and offlining was not completed. If > + * no timeout was specified, it will timeout after 30 seconds. The timeout is > + * limited to 120 seconds. > */ > -int offline_and_remove_memory(u64 start, u64 size) > +int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms) > { > const unsigned long mb_count = size / memory_block_size_bytes(); > uint8_t *online_types, *tmp; > + struct timer_list timer; > int rc; > > if (!IS_ALIGNED(start, memory_block_size_bytes()) || > @@ -2261,9 +2294,22 @@ int offline_and_remove_memory(u64 start, u64 size) > > lock_device_hotplug(); > > + if (!timeout_ms) > + timeout_ms = 30 * MSEC_PER_SEC; > + timeout_ms = min_t(unsigned int, timeout_ms, 120 * MSEC_PER_SEC); > + > + timer_setup_on_stack(&timer, mhp_offline_timer_fn, 0); > + mod_timer(&timer, jiffies + msecs_to_jiffies(timeout_ms)); > + mhp_offlining_timer_active = true; > + > tmp = online_types; > rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block); > > + timer_delete_sync(&timer); > + atomic_set(&mhp_offlining_timer_expired, 0); > + mhp_offlining_timer_active = false; > + destroy_timer_on_stack(&timer); > + > /* > * In case we succeeded to offline all memory, remove it. > * This cannot fail as it cannot get onlined in the meantime. > -- > 2.40.1 -- Michal Hocko SUSE Labs