Subject: Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
From: Dev Jain <dev.jain@arm.com>
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, shuah@kernel.org, david@redhat.com,
 willy@infradead.org, ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
 apopple@nvidia.com, osalvador@suse.de, baolin.wang@linux.alibaba.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 ioworker0@gmail.com, gshan@redhat.com, mark.rutland@arm.com,
 kirill.shutemov@linux.intel.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, broonie@kernel.org,
 mgorman@techsingularity.net, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
Date: Tue, 20 Aug 2024 12:46:22 +0530
In-Reply-To: <87a5h9hud7.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240809103129.365029-1-dev.jain@arm.com>
 <20240809103129.365029-2-dev.jain@arm.com>
 <87frrauwwv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <15dbe4ac-a036-4029-ba08-e12a236f448a@arm.com>
 <87bk1yuuzu.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <95b72817-5444-4ced-998a-1cb90f42bf49@arm.com>
 <8734naurhm.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <9d84e4e8-ac54-4eb1-a113-3f32aea864c9@arm.com>
 <391d4f4f-e642-4c11-a36b-190874963f8a@arm.com>
 <87a5h9hud7.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 8/19/24 12:28, Huang, Ying wrote:
> Dev Jain writes:
>
>> On 8/13/24 12:52, Dev Jain wrote:
>>> On 8/13/24 10:30, Dev Jain wrote:
>>>> On 8/12/24 17:38, Dev Jain wrote:
>>>>> On 8/12/24 13:01, Huang, Ying wrote:
>>>>>> Dev Jain writes:
>>>>>>
>>>>>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>>>>>> Dev Jain writes:
>>>>>>>>
>>>>>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>>>>>> Hi, Dev,
>>>>>>>>>>
>>>>>>>>>> Dev Jain writes:
>>>>>>>>>>
>>>>>>>>>>> As already being done in __migrate_folio(), where we back off
>>>>>>>>>>> if the folio refcount is wrong, make this check during the
>>>>>>>>>>> unmapping phase too. Upon its failure, the original state of
>>>>>>>>>>> the PTEs will be restored and the folio lock will be dropped
>>>>>>>>>>> via migrate_folio_undo_src(); any racing thread will make
>>>>>>>>>>> progress and migration will be retried.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Dev Jain
>>>>>>>>>>> ---
>>>>>>>>>>>  mm/migrate.c | 9 +++++++++
>>>>>>>>>>>  1 file changed, 9 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>>>>>> --- a/mm/migrate.c
>>>>>>>>>>> +++ b/mm/migrate.c
>>>>>>>>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>>>>>  	}
>>>>>>>>>>>
>>>>>>>>>>>  	if (!folio_mapped(src)) {
>>>>>>>>>>> +		/*
>>>>>>>>>>> +		 * Someone may have changed the refcount and maybe sleeping
>>>>>>>>>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>>>>>>>>>> +		 * let the system make progress and retry.
>>>>>>>>>>> +		 */
>>>>>>>>>>> +		struct address_space *mapping = folio_mapping(src);
>>>>>>>>>>> +
>>>>>>>>>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>>>>>>>>> +			goto out;
>>>>>>>>>>>  		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>>>>>>>>>  		return MIGRATEPAGE_UNMAP;
>>>>>>>>>>>  	}
>>>>>>>>>> Do you have some test results for this? For example, after
>>>>>>>>>> applying the patch, the migration success rate increased XX%, etc.
>>>>>>>>> I'll get back to you on this.
>>>>>>>>>
>>>>>>>>>> My understanding of this issue is that the migration success
>>>>>>>>>> rate can increase if we undo all changes before retrying. This
>>>>>>>>>> is the current behavior for sync migration, but not for async
>>>>>>>>>> migration. If so, can we use migrate_pages_sync() for async
>>>>>>>>>> migration too to increase the success rate? Of course, we would
>>>>>>>>>> need to change the function name and comments.
>>>>>>>>> As per my understanding, this is not the current behaviour for
>>>>>>>>> sync migration.
>>>>>>>>> After successful unmapping, we fail in migrate_folio_move()
>>>>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop
>>>>>>>>> around migrate_folio_move() futile), and we do not push the
>>>>>>>>> failed folio onto the ret_folios list; therefore, in _sync(),
>>>>>>>>> _batch() is never tried again.
>>>>>>>> In migrate_pages_sync(), migrate_pages_batch(, MIGRATE_ASYNC)
>>>>>>>> will be called first; if that fails, the folio will be restored
>>>>>>>> to its original state (unlocked). Then
>>>>>>>> migrate_pages_batch(, _SYNC*) is called again. So, we unlock
>>>>>>>> once. If necessary, we can unlock more times via another level
>>>>>>>> of loop.
>>>>>>> Yes, that's my point. We need to undo src+dst and retry.
>>>>>> For sync migration, we undo src+dst and retry now, but only once.
>>>>>> You have shown that more retrying increases the success rate.
>>>>>>
>>>>>>> We will have to decide where we want this retrying to be: do we
>>>>>>> want to change the return value, end up in the while loop wrapped
>>>>>>> around _sync(), and retry there by adding another level of loop,
>>>>>>> or do we want to make use of the existing retry loops, one of
>>>>>>> which is wrapped around _unmap()? The latter is my approach. The
>>>>>>> utility I see in the former approach is that, in the case of a
>>>>>>> large number of page migrations (which should usually be the
>>>>>>> case), we give the folio more time to be retried. The latter does
>>>>>>> not give much time and discards the folio if it does not succeed
>>>>>>> within 7 tries.
>>>>>> Because it's a race, I guess that most folios will be migrated
>>>>>> successfully in the first pass.
>>>>>>
>>>>>> My concern with your method is that it deals with just one case
>>>>>> specially, while retrying after undoing all appears more general.
>>>>>
>>>>> Makes sense.
>>>>> Also, please ignore my "change the return value" thing; I got
>>>>> confused between unmap_folios, ret_folios, etc. Now I think I have
>>>>> understood what the lists are doing :)
>>>>>
>>>>>> If it's really important to retry after undoing all, we can either
>>>>>> convert the two retrying loops of migrate_pages_batch() into one
>>>>>> loop, or remove the retry loop in migrate_pages_batch() and retry
>>>>>> in its caller instead.
>>>>> And if I implemented this correctly, the following makes the test
>>>>> pass always:
>>>>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe
>>>>
>>>> Okay, I messed up the implementation, leading to a false positive.
>>>> Let me try again :)
>>>
>>> Hopefully this should do the job:
>>> https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u
>>>
>>> But the result is worse than with the proposed patch; I rarely hit a
>>> three-digit number of successes of move_pages(). But on a base kernel
>>> without any changes, when I apply David's suggestion to change the
>>> test, if I choose 7 as the number of retries
>>> (= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I can reach even four
>>> digits. I am puzzled.
>>> We could also try merging the for loops of unmap and move...
>>
>> If people are okay with this change, I guess I can send it as a v2?
>> I concur with your assessment that my initial approach solves only a
>> specific case; the above approach does give me a slight improvement
>> on arm64 and should be an improvement in general, since it makes
>> sense to defer retrying the failed folio as much as we can.
> We need to deal with something else before a formal v2:
>
> - stats need to be fixed; please check the result processing for the
>   first loop of migrate_pages_sync().

Sorry, can you point out where exactly they need to be fixed? The change
I made is inside the while (!list_empty(from)) block, and there is no
stat computation being done there already.

>
> - Do we need something similar for async migration?
>
> - Can we add another level of explicit loop for the second loop of
>   migrate_pages_sync()? That is to improve code readability. Or, add a
>   function to do that?
>
> - Is it good to remove the retry loop in migrate_pages_batch() and do
>   the retry in the caller?

I am personally in favour of leaving the retry loop, and async migration,
as they are. Since the async version is basically minimal-effort
migration, it does not make sense to "optimize" it, given the code churn
it would create, including the change we would then have to make in
"if (mode == MIGRATE_ASYNC) => migrate_pages_batch(ASYNC)" inside
migrate_pages().

Sorry, what do you mean by "another level of explicit loop"?

>
> --
> Best Regards,
> Huang, Ying