From: Yu Zhao
Date: Fri, 14 Jul 2023 09:41:54 -0600
Subject: Re: [RFC PATCH] madvise: make madvise_cold_or_pageout_pte_range() support large folio
To: "Yin, Fengwei"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
References: <20230713150558.200545-1-fengwei.yin@intel.com> <7e09d69e-50a7-04c5-0a40-065e2d6fd658@intel.com>
In-Reply-To: <7e09d69e-50a7-04c5-0a40-065e2d6fd658@intel.com>

On Thu, Jul 13, 2023 at 11:57 PM Yin, Fengwei wrote:
>
>
> >> -               if (pageout_anon_only_filter && !folio_test_anon(folio))
> >> +               /* Do not interfere with other mappings of this folio */
> >> +               if (folio_mapcount(folio) != 1)
> >>                         continue;
> >>
> >> -               VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
> >> -
> >> -               if (pte_young(ptent)) {
> >> -                       ptent = ptep_get_and_clear_full(mm, addr, pte,
> >> -                                                       tlb->fullmm);
> >> -                       ptent = pte_mkold(ptent);
> >> -                       set_pte_at(mm, addr, pte, ptent);
> >> -                       tlb_remove_tlb_entry(tlb, pte, addr);
> >> -               }
> >> -
> >> -               /*
> >> -                * We are deactivating a folio for accelerating reclaiming.
> >> -                * VM couldn't reclaim the folio unless we clear PG_young.
> >> -                * As a side effect, it makes confuse idle-page tracking
> >> -                * because they will miss recent referenced history.
> >> -                */
> >> -               folio_clear_referenced(folio);
> >> -               folio_test_clear_young(folio);
> >> -               if (folio_test_active(folio))
> >> -                       folio_set_workingset(folio);
> >> +pageout_cold_folio:
> >>                 if (pageout) {
> >>                         if (folio_isolate_lru(folio)) {
> >>                                 if (folio_test_unevictable(folio))
> >> @@ -529,8 +542,30 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >>                 arch_leave_lazy_mmu_mode();
> >>                 pte_unmap_unlock(start_pte, ptl);
> >>         }
> >> -       if (pageout)
> >> -               reclaim_pages(&folio_list);
> >> +
> >> +       if (pageout) {
> >> +               LIST_HEAD(reclaim_list);
> >> +
> >> +               while (!list_empty(&folio_list)) {
> >> +                       int refs;
> >> +                       unsigned long flags;
> >> +                       struct mem_cgroup *memcg = folio_memcg(folio);
> >> +
> >> +                       folio = lru_to_folio(&folio_list);
> >> +                       list_del(&folio->lru);
> >> +
> >> +                       refs = folio_referenced(folio, 0, memcg, &flags);
> >> +
> >> +                       if ((flags & VM_LOCKED) || (refs == -1)) {
> >> +                               folio_putback_lru(folio);
> >> +                               continue;
> >> +                       }
> >> +
> >> +                       folio_test_clear_referenced(folio);
> >> +                       list_add(&folio->lru, &reclaim_list);
> >> +               }
> >> +               reclaim_pages(&reclaim_list);
> >> +       }
> >
> > i overlooked the chunk above -- it's unnecessary: after we split the
> > large folio (and splice the base folios onto the same LRU list), we
> > continue at the position of the first base folio because of:
> >
> >   pte--;
> >   addr -= PAGE_SIZE;
> >   continue;
> >
> > And then we do pte_mkold(), which takes care of the A-bit.
> This patch moves the A-bit clear out of the folio isolation loop. So
> even the folio is split and loop restarts from the first base folio,
> the A-bit is not cleared. A-bit is only cleared in reclaim loop.
>
> There is one option for A-bit clearing:
>  - clear A-bit of base 4K page in isolation loop and leave large folio
>    A-bit clearing to reclaim loop.
>
> This patch didn't use it because don't want to introduce A-bit clearing
> in two places. But I am open about clearing base 4K page A-bit cleared in
> isolation loop. Thanks.

Sorry, but why are we trying to do multiple things in one patch that I
assumed is supposed to simply fix madvise() for large anon folios? And
none of those things seems to have a clear rationale behind it.

The only patch that makes sense at the moment (or the first patch of a
series) is what I said before:

-               if (folio_mapcount(folio) != 1)
+               if (folio_estimated_sharers(folio) != 1)

And probably clarify (preferably in the comments above) that this is an
estimate, because we think it's a better tradeoff if we do so (less
code/overhead from checking the mapcounts of the rest of the folios
within the range).
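
To make the tradeoff concrete, here is a rough userspace sketch, not
kernel code. Everything prefixed toy_ is hypothetical and exists only
for illustration. It assumes that folio_mapcount() on a PTE-mapped
large folio adds up the mappings of all its base pages, and that
folio_estimated_sharers() looks only at the mapcount of the first base
page, which is my understanding of the kernel helpers at the time of
this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a large folio: NOT the kernel's struct folio. */
struct toy_folio {
	size_t nr_pages;          /* number of base pages in the folio */
	const int *page_mapcount; /* per-base-page mapcount, nr_pages entries */
};

/*
 * Assumed behaviour of folio_mapcount() for a PTE-mapped large folio:
 * the sum of the mapcounts of all base pages. A folio mapped by a
 * single process therefore reports nr_pages, not 1.
 */
static int toy_total_mapcount(const struct toy_folio *folio)
{
	int total = 0;

	for (size_t i = 0; i < folio->nr_pages; i++)
		total += folio->page_mapcount[i];
	return total;
}

/*
 * Assumed behaviour of folio_estimated_sharers(): sample only the
 * first base page. O(1), but blind to sharing of the other pages.
 */
static int toy_estimated_sharers(const struct toy_folio *folio)
{
	return folio->page_mapcount[0];
}

/* Exact check, O(nr_pages): every base page is mapped exactly once. */
static bool toy_exclusively_mapped(const struct toy_folio *folio)
{
	for (size_t i = 0; i < folio->nr_pages; i++)
		if (folio->page_mapcount[i] != 1)
			return false;
	return true;
}

/* The suggested madvise check: skip the folio if the estimate is not 1. */
static bool toy_should_skip(const struct toy_folio *folio)
{
	return toy_estimated_sharers(folio) != 1;
}
```

Under these assumptions, an exclusively mapped 4-page folio has a total
mapcount of 4, so a "!= 1" test on the total would always skip it,
while the estimate lets it through. The cost is that a folio whose
first page is exclusive but whose later page is shared is not skipped
either; that is the imprecision the comment should call out.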