From: Yu Zhao
Date: Thu, 13 Jul 2023 20:08:38 -0600
Subject: Re: [RFC PATCH] madvise: make madvise_cold_or_pageout_pte_range() support large folio
To: Yin Fengwei, Minchan Kim
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
In-Reply-To: <20230713150558.200545-1-fengwei.yin@intel.com>

On Thu, Jul 13, 2023 at 9:06 AM Yin Fengwei wrote:
>
> Current madvise_cold_or_pageout_pte_range() has two problems for
> large folio support:
>   - Using folio_mapcount() with large folio prevent large folio from
>     picking up.
>   - If large folio is in the range requested, shouldn't split it
>     in madvise_cold_or_pageout_pte_range().
>
> Fix them by:
>   - Use folio_estimated_sharers() with large folio
>   - If large folio is in the range requested, don't split it. Leave
>     to page reclaim phase.
>
> For large folio cross boundaries of requested range, skip it if it's
> page cache. Try to split it if it's anonymous folio. If splitting
> fails, skip it.

For now, we may not want to change the existing semantics (heuristic).
IOW, we may want to stick to the "only owner" condition:

  - if (folio_mapcount(folio) != 1)
  + if (folio_entire_mapcount(folio) ||
  +     (any_page_within_range_has_mapcount > 1))

+Minchan Kim

Also, there is an existing bug here: the later commit 07e8c82b5eff8
("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
is incorrect for sure; the original commit 9c276cc65a58f ("mm:
introduce MADV_COLD") seems incorrect too.

+Vishal Moola (Oracle)

The "any_page_within_range_has_mapcount" test above seems to be the
only correct way to meet the condition claimed by the comment below,
before or after the folio conversion, assuming that a THP page here
means a compound page without PMD mappings (PMD-split). Otherwise the
test is always false (if it's also PMD-mapped somewhere else).

  /*
   * Creating a THP page is expensive so split it only if we
   * are sure it's worth. Split it if we are only owner.
   */

> The main reason to call folio_referenced() is to clear the yong of
> conresponding PTEs. So in page reclaim phase, there is good chance
> the folio can be reclaimed.
>
> Signed-off-by: Yin Fengwei
> ---
> This patch is based on mlock large folio support rfc2 as it depends
> on the folio_in_range() added by that patchset
>
> Also folio_op_size() can be unitfied with get_folio_mlock_step().
>
> Testing done:
>   - kselftest: No new regression introduced.
>
>  mm/madvise.c | 133 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 84 insertions(+), 49 deletions(-)

Also, the refactor looks fine to me, but it'd be better as a separate
patch.
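
To make the "only owner" condition suggested above a bit more concrete,
here is a minimal userspace sketch of the check. It is not kernel code:
struct model_folio, MODEL_MAX_PAGES and model_folio_shared_in_range()
are hypothetical names made up for illustration, and the per-page
mapcount array only stands in for whatever the kernel would consult for
each subpage in the requested range. The assumption is that a folio
counts as shared if it has any PMD (entire) mapping, or if any subpage
within the range is mapped more than once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 512

/* Hypothetical userspace stand-in for a large folio (not struct folio). */
struct model_folio {
	size_t nr_pages;                    /* number of subpages in the folio */
	int entire_mapcount;                /* number of PMD (whole-folio) mappings */
	int page_mapcount[MODEL_MAX_PAGES]; /* per-subpage PTE mapcounts */
};

/*
 * Return true if the caller cannot be the only owner of the folio:
 * either the folio is PMD-mapped somewhere, or some subpage in
 * [start, end) is mapped more than once.
 */
static bool model_folio_shared_in_range(const struct model_folio *f,
					size_t start, size_t end)
{
	assert(f->nr_pages <= MODEL_MAX_PAGES);
	assert(start <= end && end <= f->nr_pages);

	if (f->entire_mapcount)
		return true;

	for (size_t i = start; i < end; i++) {
		if (f->page_mapcount[i] > 1)
			return true;
	}
	return false;
}
```

With this model, a PMD-split folio whose subpages in the range each
have a mapcount of 1 is treated as exclusively owned, while a single
extra mapping of any subpage in the range, or any PMD mapping, makes
the caller skip it.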