Message-ID: <95b72817-5444-4ced-998a-1cb90f42bf49@arm.com>
Date: Mon, 12 Aug 2024 12:22:01 +0530
Subject: Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
From: Dev Jain
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, shuah@kernel.org, david@redhat.com,
 willy@infradead.org, ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
 apopple@nvidia.com, osalvador@suse.de, baolin.wang@linux.alibaba.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 ioworker0@gmail.com, gshan@redhat.com, mark.rutland@arm.com,
 kirill.shutemov@linux.intel.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, broonie@kernel.org,
 mgorman@techsingularity.net, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
References: <20240809103129.365029-1-dev.jain@arm.com>
 <20240809103129.365029-2-dev.jain@arm.com>
 <87frrauwwv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <15dbe4ac-a036-4029-ba08-e12a236f448a@arm.com>
 <87bk1yuuzu.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87bk1yuuzu.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 8/12/24 11:45, Huang, Ying wrote:
> Dev Jain writes:
>
>> On
>> 8/12/24 11:04, Huang, Ying wrote:
>>> Hi, Dev,
>>>
>>> Dev Jain writes:
>>>
>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>>> the failure of which, the original state of the PTEs will be restored and
>>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>>> thread will make progress and migration will be retried.
>>>>
>>>> Signed-off-by: Dev Jain
>>>> ---
>>>>  mm/migrate.c | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e7296c0fb5d5..477acf996951 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>  	}
>>>>  	if (!folio_mapped(src)) {
>>>> +		/*
>>>> +		 * Someone may have changed the refcount and maybe sleeping
>>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>>> +		 * let the system make progress and retry.
>>>> +		 */
>>>> +		struct address_space *mapping = folio_mapping(src);
>>>> +
>>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>> +			goto out;
>>>>  		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>>  		return MIGRATEPAGE_UNMAP;
>>>>  	}
>>> Do you have some test results for this? For example, after applying the
>>> patch, the migration success rate increased XX%, etc.
>> I'll get back to you on this.
>>
>>> My understanding for this issue is that the migration success rate can
>>> increase if we undo all changes before retrying. This is the current
>>> behavior for sync migration, but not for async migration. If so, we can
>>> use migrate_pages_sync() for async migration too to increase success
>>> rate? Of course, we need to change the function name and comments.
>>
>> As per my understanding, this is not the current behaviour for sync
>> migration. After successful unmapping, we fail in migrate_folio_move()
>> with -EAGAIN, we do not call undo src+dst (rendering the loop around
>> migrate_folio_move() futile), we do not push the failed folio onto the
>> ret_folios list, therefore, in _sync(), _batch() is never tried again.
> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
> called first, if failed, the folio will be restored to the original
> state (unlocked). Then migrate_pages_batch(,_SYNC*) is called again.
> So, we unlock once. If it's necessary, we can unlock more times via
> another level of loop.

Yes, that's my point. We need to undo src+dst and retry. What we have to
decide is where the retry should happen. Either we change the return value,
end up in the while loop wrapped around _sync(), and retry there by adding
another level of loop; or we make use of the existing retry loops, one of
which is wrapped around _unmap(). The latter is my approach.

The advantage I see in the former approach is that, when a large number of
pages is being migrated (which should usually be the case), the folio gets
more time before it is retried. The latter does not give it much time, and
gives up on the folio if it has not succeeded within 7 attempts.

>
> --
> Best Regards,
> Huang, Ying
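
To make the second option concrete, here is a minimal user-space sketch of
"check the refcount in the unmap step, undo, and let the existing retry loop
try again". It is a toy model only, not kernel code: struct toy_folio and the
toy_* helpers are hypothetical names invented for this illustration, and the
retry budget of 7 is simply the figure mentioned above. The racing thread is
modelled as an extra reference that goes away after a fixed number of failed
attempts, which is an assumption made to keep the example deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy retry budget; 7 is the figure from the mail, not read from the kernel. */
#define TOY_MAX_RETRY 7

/* Hypothetical stand-in for a folio; this is not the kernel's struct folio. */
struct toy_folio {
	int expected_refs;   /* references the migration path expects to see */
	int transient_refs;  /* extra references held by a racing thread */
	int racer_attempts;  /* failed attempts until the racer drops its ref */
	bool unmapped;       /* set once the unmap step has committed */
};

static int toy_ref_count(const struct toy_folio *f)
{
	return f->expected_refs + f->transient_refs;
}

/*
 * One unmap attempt. On a refcount mismatch nothing is committed and
 * -EAGAIN is returned; the "undo" is modelled as giving the racing
 * thread one step of progress towards dropping its reference.
 */
static int toy_unmap(struct toy_folio *f)
{
	if (toy_ref_count(f) != f->expected_refs) {
		if (f->racer_attempts > 0 && --f->racer_attempts == 0)
			f->transient_refs = 0;
		return -EAGAIN;
	}
	f->unmapped = true;
	return 0;
}

/*
 * Bounded retry loop around the unmap step. Returns the number of
 * attempts used on success, or -EAGAIN once the budget is exhausted.
 */
static int toy_unmap_with_retry(struct toy_folio *f, int max_tries)
{
	assert(max_tries > 0);
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (toy_unmap(f) == 0)
			return attempt;
	}
	return -EAGAIN;
}
```

In this model, a racer that lets go within the budget lets the unmap succeed
on a later attempt, while one that holds its reference for longer than
TOY_MAX_RETRY attempts causes the folio to be given up on. That is the "not
much time" limitation of retrying around _unmap(); an outer loop around
_sync() would instead come back to the folio after the rest of the batch.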