From: David Hildenbrand
Organization: Red Hat
To: John Hubbard, Oscar Salvador
Cc: Andrew Morton, LKML, linux-mm@kvack.org
Subject: Re: [PATCH] mm/memory_hotplug.c: don't fail hot unplug quite so eagerly
Date: Wed, 21 Jun 2023 10:11:43 +0200
Message-ID: <83689f25-ca50-7ece-45f0-a936e704df7d@redhat.com>
In-Reply-To: <80e01fa9-28c0-37e8-57f8-5bb4ce9a9db7@nvidia.com>
References: <20230620011719.155379-1-jhubbard@nvidia.com> <80e01fa9-28c0-37e8-57f8-5bb4ce9a9db7@nvidia.com>

On 20.06.23 23:54, John Hubbard wrote:
> On 6/20/23 00:12, David Hildenbrand wrote:
>> On 20.06.23 03:17, John Hubbard wrote:
>>> mm/memory_hotplug.c: don't fail hot unplug quite so eagerly
>>>
>>> Some device drivers add memory to the system via memory hotplug. When
>>> the driver is unloaded, that memory is hot-unplugged.
>>
>> Which interfaces are they using to add/remove memory?
>
> It's coming in from the kernel driver, like this:
>
> offline_and_remove_memory()
>   walk_memory_blocks()
>     try_offline_memory_block()
>       device_offline()
>         memory_subsys_offline()
>           offline_pages()
>
> ...and the above is getting invoked as part of killing a user space
> process that was helping (for performance reasons) holding the device
> nodes open. That triggers a final close of the file descriptors and
> leads to tearing down the driver. The teardown succeeds even though
> the memory was not offlined, and now everything is, to use a technical
> term, "stuck". :)
>

Ah, I see, thanks! I thought it would just be offlining from user space.

> More below...
>
>>
>>>
>>> However, memory hot unplug can fail. And these days, it fails a little
>>> too easily, with respect to the above case. Specifically, if a signal is
>>> pending on the process, hot unplug fails. This leads directly to: the
>>> user must reboot the machine in order to unload the driver, and
>>> therefore the device is unusable until the machine is rebooted.
>>
>> Why can't they retry in user space when offlining fails with -EINTR, or re-trigger driver unloading?
>
> If someone uses "kill -9" to kill that process, then we get here,
> because user space cannot trap that signal.

Understood, thanks!

>
>
> ...
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1879,12 +1879,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>>       do {
>>>           pfn = start_pfn;
>>>           do {
>>> -            if (signal_pending(current)) {
>>> -                ret = -EINTR;
>>> -                reason = "signal backoff";
>>> -                goto failed_removal_isolated;
>>> -            }
>>> -
>>>               cond_resched();
>>>               ret = scan_movable_pages(pfn, end_pfn, &pfn);
>>
>> No, we can't remove that.
>> It's documented behavior that exists precisely for that reason:
>>
>> https://docs.kernel.org/admin-guide/mm/memory-hotplug.html#id21
>>
>> "
>> When offlining is triggered from user space, the offlining context can be terminated by sending a fatal signal. A timeout based offlining can easily be implemented via:
>>
>> % timeout $TIMEOUT offline_block | failure_handling
>> "
>>
>> Otherwise, there is no way to stop an userspace-triggered offline operation that loops forever in the kernel.
>
> OK yes, I see.
>
>>
>> I guess switching to fatal_signal_pending() might help to some degree, it should keep the timeout trick working.
>>
>> But it wouldn't help in your case because where root kills arbitrary processes. I'm not sure if that is something we should be paying attention to.
>>
>
> Right. I think it would be more accurate perhaps, but it wouldn't help
> this particular complaint.
>
> Perhaps it is reasonable to claim that, "well, kill -9 *means* that you
> end up here!" :) And the above patch clearly is not the way to go, but...
>
> ...what about discerning between "user initiated offline_pages" and
> "offline pages as part of a driver shutdown/unload"?

Makes sense to me. There are two ways to trigger it directly from user space:

1) drivers/base/core.c:online_store()
2) drivers/base/memory.c:state_store()

We cannot easily hook into 2) to indicate "we're offlining directly from user space". So we might have to do it the other way around.

Something along the following lines should do the trick (expect whitespace damage):

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 53ee7654f009..acd4b739505a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -152,6 +152,13 @@ void put_online_mems(void)

 bool movable_node_enabled = false;

+/*
+ * Protected by the device hotplug lock. Indicates whether device offlining
+ * is triggered from try_offline_memory_block() such that we don't fail memory
+ * offlining if a signal is pending.
+ */
+static bool mhp_in_try_offline_memory_block;
+
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
 int mhp_default_online_type = MMOP_OFFLINE;
 #else
@@ -1860,7 +1867,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
     do {
         pfn = start_pfn;
         do {
-            if (signal_pending(current)) {
+            if (!mhp_in_try_offline_memory_block &&
+                signal_pending(current)) {
                 ret = -EINTR;
                 reason = "signal backoff";
                 goto failed_removal_isolated;
@@ -2177,7 +2185,9 @@ static int try_offline_memory_block(struct memory_block *mem, void *arg)
     if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
         online_type = MMOP_ONLINE_MOVABLE;

+    mhp_in_try_offline_memory_block = true;
     rc = device_offline(&mem->dev);
+    mhp_in_try_offline_memory_block = false;
     /*
      * Default is MMOP_OFFLINE - change it only if offlining succeeded,
      * so try_reonline_memory_block() can do the right thing.

There is still arch/powerpc/platforms/pseries/hotplug-memory.c, which calls device_offline() and would fail on signals (not sure if that is relevant; as with virtio-mem, it shouldn't matter much). I guess dlpar_remove_lmb() can now simply call offline_and_remove_memory().

[I might craft a patch later]

-- 
Cheers,

David / dhildenb
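P.S. In case it helps to see the flag scoping from the diff above in isolation, here is a rough user-space model of it. This is only a sketch and not kernel code: every name ending in _model, plus fake_signal_pending, is made up for illustration, and the loop merely stands in for the retry loop in offline_pages(). It also assumes a single thread, whereas the real flag would rely on the device hotplug lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for signal_pending(current). */
static bool fake_signal_pending;

/* Plays the role of mhp_in_try_offline_memory_block from the diff. */
static bool in_try_offline_model;

/*
 * Model of the retry loop in offline_pages(): back off with -EINTR when a
 * signal is pending, unless the caller marked this as a driver-side offline.
 */
static int offline_pages_model(int passes_needed)
{
    for (int pass = 0; pass < passes_needed; pass++) {
        if (!in_try_offline_model && fake_signal_pending)
            return -EINTR;
        /* ... one pass of migrating pages would happen here ... */
    }
    return 0;
}

/* Model of the user-space path (e.g. state_store()): flag left untouched. */
static int user_offline_model(void)
{
    return offline_pages_model(3);
}

/* Model of try_offline_memory_block(): flag set only around the call. */
static int driver_offline_model(void)
{
    int rc;

    in_try_offline_model = true;
    rc = offline_pages_model(3);
    in_try_offline_model = false;
    return rc;
}
```

With fake_signal_pending set, user_offline_model() backs off with -EINTR (so the timeout trick keeps working), while driver_offline_model() runs to completion and leaves the flag cleared afterwards.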