From: Barry Song <21cnbao@gmail.com>
Date: Tue, 5 Mar 2024 23:28:29 +1300
Subject: Re: [RFC PATCH] mm: hold PTL from the first PTE while reclaiming a large folio
To: "Huang, Ying"
Cc: David Hildenbrand, Ryan Roberts, akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org, yuzhao@google.com, hanchuanhua@oppo.com, linux-kernel@vger.kernel.org, willy@infradead.org, xiang@kernel.org, mhocko@suse.com, shy828301@gmail.com, wangkefeng.wang@huawei.com, Barry Song, Hugh Dickins

On Tue, Mar 5, 2024 at 10:21 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 10:12 PM Huang, Ying wrote:
> >
> > Barry Song <21cnbao@gmail.com> writes:
> >
> > > On Tue, Mar 5, 2024 at 8:55 PM Huang, Ying wrote:
> > >>
> > >> Barry Song <21cnbao@gmail.com> writes:
> > >>
> > >> > On Tue, Mar 5, 2024 at 10:15 AM David Hildenbrand wrote:
> > >> >> > But we did "resolve" those bugs by entirely untouching all PTEs if we
> > >> >> > found some PTEs were skipped in try_to_unmap_one [1].
> > >> >> >
> > >> >> > While we find we only get the PTL from 2nd, 3rd but not
> > >> >> > 1st PTE, we entirely give up on try_to_unmap_one, and leave
> > >> >> > all PTEs untouched.
> > >> >> >
> > >> >> > /* we are not starting from head */
> > >> >> > if (!IS_ALIGNED((unsigned long)pvmw.pte, CONT_PTES * sizeof(*pvmw.pte))) {
> > >> >> >         ret = false;
> > >> >> >         atomic64_inc(&perf_stat.mapped_walk_start_from_non_head);
> > >> >> >         set_pte_at(mm, address, pvmw.pte, pteval);
> > >> >> >         page_vma_mapped_walk_done(&pvmw);
> > >> >> >         break;
> > >> >> > }
> > >> >> > This will ensure all PTEs still have a unified state such as CONT-PTE
> > >> >> > after try_to_unmap fails.
> > >> >> > I feel this could have some false positive because when racing
> > >> >> > with unmap, 1st PTE might really become pte_none. So explicitly
> > >> >> > holding PTL from 1st PTE seems a better way.
> > >> >>
> > >> >> Can we estimate the "cost" of holding the PTL?
> > >> >>
> > >> >
> > >> > This is just moving PTL acquisition one or two PTE earlier in those corner
> > >> > cases. In normal cases, it doesn't affect when PTL is held.
> > >>
> > >> The mTHP may be mapped at the end of page table.  In that case, the PTL
> > >> will be held longer.  Or am I missing something?
> > >
> > > no. this patch doesn't change when we release PTL but change when we
> > > get PTL.
> > >
> > > when the original code iterates nr_pages PTEs in a large folio, it will skip
> > > invalid PTEs, when it meets a valid one, it will acquire PTL. so if it gets
> > > intermediate PTE values some other threads are modifying, it might
> > > skip PTE0, or sometimes PTE0 and PTE1 according to my test. but
> > > arriving at PTE2, likely other threads have written a new value, so we
> > > will begin to hold PTL and iterate till the end of the large folio.
> >
> > Is there any guarantee that the mTHP will always be mapped at the
> > beginning of the page table (PTE0)?  IIUC, mTHP can be mapped at PTE496.
> > If so, with your patch, PTL will be held from PTE0 instead of PTE496 in
> > some cases.
>
> I agree. but in another discussion[1], the plan is if we find a large folio has
> been deferred split, we split it before try_to_unmap and pageout. otherwise,
> we may result in lots of redundant I/O, because PTE0-495 will still be
> pageout()-ed.
>
> [1] https://lore.kernel.org/linux-mm/a4a9054f-2040-4f70-8d10-a5af4972e5aa@arm.com/

I thought about this again, and it seems we can cope with it even without
the above plan by:

+       if (folio_test_large(folio) && list_empty(&folio->_deferred_list))
+               flags |= TTU_SYNC;

If a folio has been deferred split, it makes no sense to apply the
optimization this patch provides for these corner cases. We only apply it
while we know the folio is still entirely mapped. This should make the case
you describe (holding the PTL from PTE0 instead of PTE496) quite rare,
though it does not rule it out entirely.

>
> >
> > --
> > Best Regards,
> > Huang, Ying
> >
> > > The proposal is that we directly get PTL from PTE0, thus we don't get
> > > intermediate values for the head of nr_pages PTEs. this will ensure
> > > a large folio is either completely unmapped or completely mapped.
> > > but not partially mapped and partially unmapped.
> > >
> > >>

Thanks
Barry
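
For anyone following the thread, the head-alignment check quoted above can be modelled in userspace roughly as below. This is only a minimal sketch, not kernel code: model_pte_t, MODEL_CONT_PTES, model_table and the model_* helpers are hypothetical names invented for illustration, the block size of 16 entries is an assumption (arm64 with 4K pages), and a non-zero entry simply stands in for a present PTE. It shows a lockless scan that skips empty entries and then refuses to proceed unless the first present entry is the head of its block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not kernel definitions. */
typedef uint64_t model_pte_t;
#define MODEL_CONT_PTES 16 /* assumed block size (arm64 with 4K pages) */
#define MODEL_BLOCK_BYTES (MODEL_CONT_PTES * sizeof(model_pte_t))

/* One block of entries, aligned the way a CONT-PTE block would be. */
_Alignas(MODEL_BLOCK_BYTES) model_pte_t model_table[MODEL_CONT_PTES];

/* Mirrors the quoted IS_ALIGNED() test: is this entry the head of its block? */
bool model_is_block_head(const model_pte_t *pte)
{
    return ((uintptr_t)pte % MODEL_BLOCK_BYTES) == 0;
}

/* Lockless scan: return the first non-zero ("present") entry, or NULL. */
const model_pte_t *model_first_present(const model_pte_t *ptes, size_t n)
{
    assert(ptes != NULL);
    for (size_t i = 0; i < n; i++)
        if (ptes[i] != 0)
            return &ptes[i];
    return NULL;
}

/* Proceed only if the first present entry found is the block head. */
bool model_unmap_may_proceed(const model_pte_t *ptes, size_t n)
{
    const model_pte_t *first = model_first_present(ptes, n);
    return first != NULL && model_is_block_head(first);
}
```

The model cannot tell whether entry 0 was skipped because another thread was rewriting it or because it really was unmapped, which is the false positive mentioned in the quoted text and the reason taking the PTL from the first PTE looks like the better approach.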