From: Dev Jain
Date: Sun, 11 Aug 2024 11:36:26 +0530
Subject: Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
To: David Hildenbrand, akpm@linux-foundation.org, shuah@kernel.org,
 willy@infradead.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
 cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 osalvador@suse.de, baolin.wang@linux.alibaba.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 ioworker0@gmail.com, gshan@redhat.com, mark.rutland@arm.com,
 kirill.shutemov@linux.intel.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, broonie@kernel.org,
 mgorman@techsingularity.net, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, ying.huang@intel.com
References: <20240809103129.365029-1-dev.jain@arm.com>
 <20240809103129.365029-2-dev.jain@arm.com>
 <761ba58e-9d6f-4a14-a513-dcc098c2aa94@redhat.com>
 <5a4ae1d3-d753-4261-97a8-926e44d4217a@arm.com>
 <367b0403-7477-4857-9e7c-5a749c723432@redhat.com>
In-Reply-To: <367b0403-7477-4857-9e7c-5a749c723432@redhat.com>

On 8/11/24 00:22, David Hildenbrand wrote:
> On 10.08.24 20:42, Dev Jain wrote:
>>
>> On 8/9/24 19:17, David Hildenbrand wrote:
>>> On 09.08.24 12:31, Dev Jain wrote:
>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>> folio refcount is wrong, make this check during the unmapping phase,
>>>> upon the failure of which, the original state of the PTEs will be
>>>> restored and the folio lock will be dropped via
>>>> migrate_folio_undo_src(), any racing thread will make progress and
>>>> migration will be retried.
>>>>
>>>> Signed-off-by: Dev Jain
>>>> ---
>>>>   mm/migrate.c | 9 +++++++++
>>>>   1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e7296c0fb5d5..477acf996951 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>       }
>>>>         if (!folio_mapped(src)) {
>>>> +        /*
>>>> +         * Someone may have changed the refcount and maybe sleeping
>>>> +         * on the folio lock. In case of refcount mismatch, bail out,
>>>> +         * let the system make progress and retry.
>>>> +         */
>>>> +        struct address_space *mapping = folio_mapping(src);
>>>> +
>>>> +        if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>> +            goto out;
>>>
>>> This really seems to be the latest point where we can "easily" back
>>> off and unlock the source folio -- in this function :)
>>>
>>> I wonder if we should be smarter in the migrate_pages_batch() loop
>>> when we start the actual migrations via migrate_folio_move(): if we
>>> detect that a folio has unexpected references *and* it has waiters
>>> (PG_waiters), back off then and retry the folio later. If it only has
>>> unexpected references, just keep retrying: no waiters -> nobody is
>>> waiting for the lock to make progress.
>>
>>
>> The patch currently retries migration irrespective of the reason of
>> refcount change.
>>
>> If you are suggesting that, break the retrying according to two
>> conditions:
>
> That's not what I am suggesting ...
>
>>
>>
>>> This really seems to be the latest point where we can "easily" back
>>> off and unlock the source folio -- in this function :)
>>> For example, when migrate_folio_move() fails with -EAGAIN, check if
>>> there are waiters (PG_waiter?) and undo+unlock to try again later.
>>
>>
>> Currently, on -EAGAIN, migrate_folio_move() returns without undoing src
>> and dst; even if we were to fall
>
> ...
>
> I am wondering if we should detect here if there are waiters and undo
> src+dst.

After undoing src+dst, which restores the PTEs, how would the PTEs be
turned back into migration entries? That is done by
migrate_folio_unmap(), and the _unmap() and _move() loops are separate.
Or am I missing something?

>
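To make the question concrete, here is a rough userspace sketch of the
two-loop shape I have in mind. It is only a toy model, not the code in
mm/migrate.c: every name in it (toy_folio, toy_unmap(), toy_move(),
toy_undo(), toy_migrate_batch(), NR_PASS) is made up for illustration,
and it assumes the racing thread drops its reference only once the
folio has been undone and unlocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio's migration state; not a kernel type. */
enum toy_state { TOY_MAPPED, TOY_UNMAPPED, TOY_MOVED };

struct toy_folio {
	enum toy_state state;
	int extra_refs;	/* unexpected references held by a racing thread */
};

#define NR_PASS 3	/* number of move passes; an arbitrary toy value */

/* Phase 1 analogue: mappings are replaced by migration entries. */
static void toy_unmap(struct toy_folio *f)
{
	f->state = TOY_UNMAPPED;
}

/* Phase 2 analogue: fails (think -EAGAIN) while extra references exist. */
static bool toy_move(struct toy_folio *f)
{
	if (f->extra_refs > 0)
		return false;
	f->state = TOY_MOVED;
	return true;
}

/*
 * Undo analogue: mappings are restored and the lock is dropped.
 * Assumption of this toy: the racing thread then makes progress and
 * drops its reference.
 */
static void toy_undo(struct toy_folio *f)
{
	f->state = TOY_MAPPED;
	f->extra_refs = 0;
}

/* Returns how many folios ended up moved. */
static int toy_migrate_batch(struct toy_folio *folios, int n,
			     bool undo_on_eagain)
{
	int moved = 0;

	assert(n >= 0);

	/* Loop 1: unmap every folio in the batch, exactly once. */
	for (int i = 0; i < n; i++)
		toy_unmap(&folios[i]);

	/* Loop 2: move passes; these only ever look at unmapped folios. */
	for (int pass = 0; pass < NR_PASS; pass++) {
		for (int i = 0; i < n; i++) {
			struct toy_folio *f = &folios[i];

			if (f->state != TOY_UNMAPPED)
				continue;
			if (toy_move(f))
				moved++;
			else if (undo_on_eagain)
				toy_undo(f);	/* nothing unmaps it again */
		}
	}
	return moved;
}
```

With three folios where only the middle one has an extra reference,
calling toy_migrate_batch() with undo_on_eagain set moves two folios and
leaves the middle one mapped for the rest of the batch, even though its
extra reference is gone after the undo: the move loop never goes back to
the unmap step. With undo_on_eagain clear, the middle folio simply stays
unmapped across all passes. The toy leaves out the cleanup of folios
that are still unmapped when the passes run out.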