From: Barry Song <21cnbao@gmail.com>
Date: Tue, 5 Mar 2024 22:08:34 +1300
Subject: Re: [RFC PATCH] mm: hold PTL from the first PTE while reclaiming a large folio
To: Ryan Roberts
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, david@redhat.com, chrisl@kernel.org, yuzhao@google.com, hanchuanhua@oppo.com, linux-kernel@vger.kernel.org, willy@infradead.org, ying.huang@intel.com, xiang@kernel.org, mhocko@suse.com, shy828301@gmail.com, wangkefeng.wang@huawei.com, Barry Song, Hugh Dickins
References: <20240304103757.235352-1-21cnbao@gmail.com> <706b7129-85f6-4470-9fd9-f955a8e6bd7c@arm.com>

On Tue, Mar 5, 2024 at 9:54 PM Ryan Roberts wrote:
>
> On 04/03/2024 21:57, Barry Song wrote:
> > On Tue, Mar 5, 2024 at 1:21 AM Ryan Roberts wrote:
> >>
> >> Hi Barry,
> >>
> >> On 04/03/2024 10:37, Barry Song wrote:
> >>> From: Barry Song
> >>>
> >>> page_vma_mapped_walk() within try_to_unmap_one() races with other
> >>> PTEs modification such as break-before-make, while iterating PTEs
> >>> of a large folio, it will only begin to acquire PTL after it gets
> >>> a valid(present) PTE. break-before-make intermediately sets PTEs
> >>> to pte_none. Thus, a large folio's PTEs might be partially skipped
> >>> in try_to_unmap_one().
> >>
> >> I just want to check my understanding here - I think the problem occurs for
> >> PTE-mapped, PMD-sized folios as well as smaller-than-PMD-size large folios? Now
> >> that I've had a look at the code and have a better understanding, I think that
> >> must be the case? And therefore this problem exists independently of my work to
> >> support swap-out of mTHP? (From your previous report I was under the impression
> >> that it only affected mTHP).
> >
> > I think this affects all large folios with PTEs entries more than 1. but hugeTLB
> > is handled as a whole in try_to_unmap_one and its rmap is removed all
> > together, i feel hugeTLB doesn't have this problem.
> >
> >>
> >> Its just that the problem is becoming more pronounced because with mTHP,
> >> PTE-mapped large folios are much more common?
> >
> > right. as now large folios become a more common case, and it is my case
> > running in millions of phones.
> >
> > BTW, I feel we can somehow learn from hugeTLB, for example, we can reclaim
> > all PTEs all together rather than iterating PTEs one by one. This will improve
> > performance. for example, a batched
> > set_ptes_to_swap_entries()
> > {
> > }
> > then we only need to loop once for a large folio, right now we are looping
> > nr_pages times.
>
> You still need a pte-pte loop somewhere. In hugetlb's case it's in the arch
> implementation. HugeTLB ptes are all a fixed size for a given VMA, which makes
> things a bit easier too, whereas in the regular mm, they are now a variable size.
>
> David and I introduced folio_pte_batch() to help gather batches of ptes, and it
> uses the contpte bit to avoid iterating over intermediate ptes. And I'm adding
> swap_pte_batch() which does a similar thing for swap entry batching in v4 of my
> swap-out series.
>
> For your set_ptes_to_swap_entries() example, I'm not sure what it would do other
> than loop over the PTEs setting an incremented swap entry to each one? How is
> that more performant?

Right now, while (page_vma_mapped_walk(&pvmw)) loops nr_pages times, once
per PTE, and for each PTE we do lots of checks within the loop.

By implementing set_ptes_to_swap_entries(), we can iterate once for
page_vma_mapped_walk(): after folio_pte_batch() has confirmed the large
folio is completely mapped, we set nr_pages swap entries all together.

we are replacing

for(i=0;i

> Thanks,
> Ryan
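
To make the comparison above concrete, here is a rough user-space sketch of
the two shapes being discussed. It is only a toy model: toy_pte_t, the
toy_* helpers and the toy_checks counter are all made up for illustration
and are not kernel APIs, and toy_pte_batch() only imitates the idea of
folio_pte_batch() (count a run of present PTEs with consecutive pfns), not
its real signature or behaviour. Both variants still loop over the PTEs to
write the swap entries, which is Ryan's point; what changes is that the
per-iteration checks run once for the folio instead of nr_pages times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these types or helpers exist in the kernel. */
#define TOY_PRESENT 0x1ull   /* bit 0: entry maps a page */
#define TOY_SWAP    0x2ull   /* bit 1: entry is a swap entry */
#define TOY_SHIFT   12

typedef uint64_t toy_pte_t;

static toy_pte_t toy_mk_present(uint64_t pfn) { return (pfn << TOY_SHIFT) | TOY_PRESENT; }
static toy_pte_t toy_mk_swap(uint64_t off)    { return (off << TOY_SHIFT) | TOY_SWAP; }
static bool toy_present(toy_pte_t pte)        { return (pte & TOY_PRESENT) != 0; }
static uint64_t toy_pfn(toy_pte_t pte)        { return pte >> TOY_SHIFT; }

/* Counts how many times the "lots of checks" step runs. */
static unsigned long toy_checks;

static bool toy_expensive_checks(void)
{
	toy_checks++;
	return true;
}

/* Per-PTE shape: every present entry pays for a full round of checks. */
static size_t toy_unmap_per_pte(toy_pte_t *ptes, size_t nr, uint64_t swap_off)
{
	size_t done = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!toy_present(ptes[i]))
			continue;               /* a non-present entry is skipped */
		if (!toy_expensive_checks())
			break;
		ptes[i] = toy_mk_swap(swap_off + i);
		done++;
	}
	return done;
}

/* Length of the run of present entries with consecutive pfns from ptes[0]. */
static size_t toy_pte_batch(const toy_pte_t *ptes, size_t nr)
{
	size_t n = 0;
	uint64_t first;

	if (nr == 0 || !toy_present(ptes[0]))
		return 0;
	first = toy_pfn(ptes[0]);
	while (n < nr && toy_present(ptes[n]) && toy_pfn(ptes[n]) == first + n)
		n++;
	return n;
}

/*
 * Batched shape: confirm the whole range is mapped, run the checks once,
 * then write nr consecutive swap entries in a tight loop. Returns 0 when
 * the range is only partially mapped so a caller can fall back to the
 * per-PTE path.
 */
static size_t toy_unmap_batched(toy_pte_t *ptes, size_t nr, uint64_t swap_off)
{
	assert(ptes != NULL);

	if (toy_pte_batch(ptes, nr) != nr)
		return 0;
	if (!toy_expensive_checks())
		return 0;
	for (size_t i = 0; i < nr; i++)
		ptes[i] = toy_mk_swap(swap_off + i);
	return nr;
}
```

For a fully mapped 16-entry range, the per-PTE variant bumps toy_checks 16
times and the batched variant bumps it once, while both leave the same swap
entries behind. A range with a hole in it makes toy_unmap_batched() return
0, which is where a real implementation would have to fall back.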