Date: Tue, 27 Jun 2023 16:17:40 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 virtualization@lists.linux-foundation.org, Andrew Morton,
 "Michael S. Tsirkin", John Hubbard, Oscar Salvador, Jason Wang, Xuan Zhuo
Subject: Re: [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
References: <20230627112220.229240-1-david@redhat.com> <20230627112220.229240-4-david@redhat.com> <74cbbdd3-5a05-25b1-3f81-2fd47e089ac3@redhat.com>
In-Reply-To: <74cbbdd3-5a05-25b1-3f81-2fd47e089ac3@redhat.com>

On Tue 27-06-23 15:14:11, David Hildenbrand wrote:
> On 27.06.23 14:40, Michal Hocko wrote:
> > On Tue 27-06-23 13:22:18, David Hildenbrand wrote:
> > > John Hubbard writes [1]:
> > >
> > >         Some device drivers add memory to the system via memory hotplug.
> > >         When the driver is unloaded, that memory is hot-unplugged.
> > >
> > >         However, memory hot unplug can fail. And these days, it fails a
> > >         little too easily, with respect to the above case. Specifically, if
> > >         a signal is pending on the process, hot unplug fails.
> > >
> > >         [...]
> > >
> > >         So in this case, other things (unmovable pages, un-splittable huge
> > >         pages) can also cause the above problem. However, those are
> > >         demonstrably less common than simply having a pending signal. I've
> > >         got bug reports from users who can trivially reproduce this by
> > >         killing their process with a "kill -9", for example.
> >
> > This looks like a bug of the said driver no? If the tear down process is
> > killed it could very well happen right before offlining so you end up in
> > the very same state. Or what am I missing?
>
> IIUC (John can correct me if I am wrong):
>
> 1) The process holds the device node open
> 2) The process gets killed or quits
> 3) As the process gets torn down, it closes the device node
> 4) Closing the device node results in the driver removing the device and
>    calling offline_and_remove_memory()
>
> So it's not a "tear down process" that triggers that offlining_removal
> somehow explicitly, it's just a side-product of it letting go of the device
> node as the process gets torn down.

Isn't that just fragile?
The operation might fail for other reasons. Why cannot there be a hold on
the resource to control the tear down explicitly?

> > > Especially with ZONE_MOVABLE, offlining is supposed to work in most
> > > cases when offlining actually hotplugged (not boot) memory, and only fail
> > > in rare corner cases (e.g., some driver holds a reference to a page in
> > > ZONE_MOVABLE, turning it unmovable).
> > >
> > > In these corner cases we really don't want to be stuck forever in
> > > offline_and_remove_memory(). But in the general cases, we really want to
> > > do our best to make memory offlining succeed -- in a reasonable
> > > timeframe.
> > >
> > > Reliably failing in the described case when there is a fatal signal pending
> > > is sub-optimal. The pending signal check is mostly only relevant when user
> > > space explicitly triggers offlining of memory using sysfs device attributes
> > > ("state" or "online" attribute), but not when coming via
> > > offline_and_remove_memory().
> > >
> > > So let's use a timer instead and ignore fatal signals, because they are
> > > not really expressive for offline_and_remove_memory() users. Let's default
> > > to 30 seconds if no timeout was specified, and limit the timeout to 120
> > > seconds.
> >
> > I really hate having timeouts back. They just proven to be hard to get
> > right and it is essentially a policy implemented in the kernel. They
> > simply do not belong to the kernel space IMHO.
>
> As much as I agree with you in terms of offlining triggered from user space
> (e.g., write "state" or "online" attribute) where user-space is actually in
> charge and can do something reasonable (timeout, retry, whatever), in these
> the offline_and_remove_memory() case it's the driver that wants a
> best-effort memory offlining+removal.
>
> If it times out, virtio-mem will simply try another block or retry later.
> Right now, it could get stuck forever in offline_and_remove_memory(), which
> is obviously "not great".
> Fortunately, for virtio-mem it's configurable and we use the
> alloc_contig_range()-method for now as default.

It seems that offline_and_remove_memory() is using the wrong operation then,
if it wants opportunistic offlining with some sort of policy. A timeout
might be just one policy to use, but a failure mode or a retry count might be
a better fit for some users. So rather than (ab)using offline_pages(), would
it make more sense to extract the basic offlining steps and allow drivers like
virtio-mem to reuse them and define their own policy?

> If it would time out for John's driver, we most certainly don't want to be
> stuck in offline_and_remove_memory(), blocking device/driver unloading (and
> even a reboot IIRC) possibly forever.

Now I am confused. John's driver wants to tear down the device now? There is
no way to release that memory otherwise, AFAIU from the initial problem
description.
-- 
Michal Hocko
SUSE Labs
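
The split suggested above (core code performs one offlining pass, the caller
decides whether to keep going) can be illustrated with a small userspace
sketch in C. Every name in it (fake_block, offline_one_pass,
offline_policy, offline_with_policy, retry_limit, fail_fast) is hypothetical
and made up for illustration. None of them is a kernel interface, and this
is not code proposed anywhere in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory block; not a kernel structure. */
struct fake_block {
	int busy_passes;	/* passes that will still fail before success */
};

/*
 * Hypothetical stand-in for one pass of the basic offlining steps.
 * Returns 0 on success, -EAGAIN while the block is still busy.
 */
static int offline_one_pass(struct fake_block *blk)
{
	if (blk->busy_passes > 0) {
		blk->busy_passes--;
		return -EAGAIN;
	}
	return 0;
}

/* Policy supplied by the caller (the driver), not by the core loop. */
struct offline_policy {
	/* Return true to try another pass after a transient failure. */
	bool (*keep_trying)(unsigned int failed_passes, void *data);
	void *data;
};

/* Core loop: knows how to retry, but not when to give up. */
static int offline_with_policy(struct fake_block *blk,
			       const struct offline_policy *pol)
{
	unsigned int failed = 0;
	int ret;

	assert(blk && pol && pol->keep_trying);

	for (;;) {
		ret = offline_one_pass(blk);
		if (ret != -EAGAIN)
			return ret;
		failed++;
		if (!pol->keep_trying(failed, pol->data))
			return -EBUSY;
	}
}

/* Policy A: give up once the number of failed passes reaches *data. */
static bool retry_limit(unsigned int failed_passes, void *data)
{
	return failed_passes < *(const unsigned int *)data;
}

/* Policy B: fail on the first transient failure. */
static bool fail_fast(unsigned int failed_passes, void *data)
{
	(void)failed_passes;
	(void)data;
	return false;
}
```

A caller that wants a bounded number of attempts would pass retry_limit with
its own limit, while a caller that prefers to move on to another block
immediately would pass fail_fast. A timeout would be a third keep_trying
callback that compares elapsed time against a deadline, which keeps that
choice with the caller rather than in the shared loop.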