From mboxrd@z Thu Jan 1 00:00:00 1970
From: Barry Song
Date: Tue, 20 Aug 2024 09:55:49 +1200
Subject: Re: [PATCH v4 4/6] mm: Introduce a pageflag for partially mapped folios
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev,
	yuzhao@google.com, david@redhat.com, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com
References: <20240819023145.2415299-1-usamaarif642@gmail.com>
	<20240819023145.2415299-5-usamaarif642@gmail.com>
	<9a58e794-2156-4a9f-a383-1cdfc07eee5e@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Tue, Aug 20, 2024 at 9:34 AM Barry Song wrote:
>
> On Tue, Aug 20, 2024 at 8:16 AM Usama Arif wrote:
> >
> >
> >
> > On 19/08/2024 20:00, Barry Song wrote:
> > > On Tue, Aug 20, 2024 at 2:17 AM Usama Arif wrote:
> > >>
> > >>
> > >>
> > >> On 19/08/2024 09:29, Barry Song wrote:
> > >>> Hi Usama,
> > >>>
> > >>> I feel it is much better now! thanks!
> > >>>
> > >>> On Mon, Aug 19, 2024 at 2:31 PM Usama Arif wrote:
> > >>>>
> > >>>> Currently folio->_deferred_list is used to keep track of
> > >>>> partially_mapped folios that are going to be split under memory
> > >>>> pressure. In the next patch, all THPs that are faulted in and collapsed
> > >>>> by khugepaged are also going to be tracked using _deferred_list.
> > >>>>
> > >>>> This patch introduces a pageflag to be able to distinguish between
> > >>>> partially mapped folios and others in the deferred_list at split time in
> > >>>> deferred_split_scan. It's needed as __folio_remove_rmap decrements
> > >>>> _mapcount, _large_mapcount and _entire_mapcount, hence it won't be
> > >>>> possible to distinguish between partially mapped folios and others in
> > >>>> deferred_split_scan.
> > >>>>
> > >>>> Even though it introduces an extra flag to track if the folio is
> > >>>> partially mapped, there is no functional change intended with this
> > >>>> patch and the flag is not useful in this patch itself, it will
> > >>>> become useful in the next patch when _deferred_list has non-partially-
> > >>>> mapped folios.
> > >>>>
> > >>>> Signed-off-by: Usama Arif
> > >>>> ---
> > >>>>  include/linux/huge_mm.h    |  4 ++--
> > >>>>  include/linux/page-flags.h | 11 +++++++++++
> > >>>>  mm/huge_memory.c           | 23 ++++++++++++++++-------
> > >>>>  mm/internal.h              |  4 +++-
> > >>>>  mm/memcontrol.c            |  3 ++-
> > >>>>  mm/migrate.c               |  3 ++-
> > >>>>  mm/page_alloc.c            |  5 +++--
> > >>>>  mm/rmap.c                  |  5 +++--
> > >>>>  mm/vmscan.c                |  3 ++-
> > >>>>  9 files changed, 44 insertions(+), 17 deletions(-)
> > >>>>
> > >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > >>>> index 4c32058cacfe..969f11f360d2 100644
> > >>>> --- a/include/linux/huge_mm.h
> > >>>> +++ b/include/linux/huge_mm.h
> > >>>> @@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page)
> > >>>>  {
> > >>>>         return split_huge_page_to_list_to_order(page, NULL, 0);
> > >>>>  }
> > >>>> -void deferred_split_folio(struct folio *folio);
> > >>>> +void deferred_split_folio(struct folio *folio, bool partially_mapped);
> > >>>>
> > >>>>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > >>>>                        unsigned long address, bool freeze, struct folio *folio);
> > >>>> @@ -495,7 +495,7 @@ static inline int split_huge_page(struct page *page)
> > >>>>  {
> > >>>>         return 0;
> > >>>>  }
> > >>>> -static inline void deferred_split_folio(struct folio *folio) {}
> > >>>> +static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
> > >>>>  #define split_huge_pmd(__vma, __pmd, __address) \
> > >>>>         do { } while (0)
> > >>>>
> > >>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > >>>> index a0a29bd092f8..c3bb0e0da581 100644
> > >>>> --- a/include/linux/page-flags.h
> > >>>> +++ b/include/linux/page-flags.h
> > >>>> @@ -182,6 +182,7 @@ enum pageflags {
> > >>>>         /* At least one page in this folio has the hwpoison flag set */
> > >>>>         PG_has_hwpoisoned = PG_active,
> > >>>>         PG_large_rmappable = PG_workingset, /* anon or file-backed */
> > >>>> +       PG_partially_mapped = PG_reclaim, /* was identified to be partially mapped */
> > >>>>  };
> > >>>>
> > >>>>  #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
> > >>>> @@ -861,8 +862,18 @@ static inline void ClearPageCompound(struct page *page)
> > >>>>         ClearPageHead(page);
> > >>>>  }
> > >>>>  FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
> > >>>> +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> > >>>> +/*
> > >>>> + * PG_partially_mapped is protected by deferred_split split_queue_lock,
> > >>>> + * so its safe to use non-atomic set/clear.
> > >>>> + */
> > >>>> +__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> > >>>> +__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> > >>>>  #else
> > >>>>  FOLIO_FLAG_FALSE(large_rmappable)
> > >>>> +FOLIO_TEST_FLAG_FALSE(partially_mapped)
> > >>>> +__FOLIO_SET_FLAG_NOOP(partially_mapped)
> > >>>> +__FOLIO_CLEAR_FLAG_NOOP(partially_mapped)
> > >>>>  #endif
> > >>>>
> > >>>>  #define PG_head_mask ((1UL << PG_head))
> > >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > >>>> index 2d77b5d2291e..70ee49dfeaad 100644
> > >>>> --- a/mm/huge_memory.c
> > >>>> +++ b/mm/huge_memory.c
> > >>>> @@ -3398,6 +3398,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >>>>                  * page_deferred_list.
> > >>>>                  */
> > >>>>                 list_del_init(&folio->_deferred_list);
> > >>>> +               __folio_clear_partially_mapped(folio);
> > >>>>         }
> > >>>>         spin_unlock(&ds_queue->split_queue_lock);
> > >>>>         if (mapping) {
> > >>>> @@ -3454,11 +3455,13 @@ void __folio_undo_large_rmappable(struct folio *folio)
> > >>>>         if (!list_empty(&folio->_deferred_list)) {
> > >>>>                 ds_queue->split_queue_len--;
> > >>>>                 list_del_init(&folio->_deferred_list);
> > >>>> +               __folio_clear_partially_mapped(folio);
> > >>>
> > >>> is it possible to make things clearer by
> > >>>
> > >>>  if (folio_clear_partially_mapped)
> > >>>     __folio_clear_partially_mapped(folio);
> > >>>
> > >>> While writing without conditions isn't necessarily wrong, adding a condition
> > >>> will improve the readability of the code and enhance the clarity of my mTHP
> > >>> counters series. also help decrease smp cache sync if we can avoid
> > >>> unnecessary writing?
> > >>>
> > >>
> > >> Do you mean if(folio_test_partially_mapped(folio))?
> > >>
> > >> I don't like this idea. I think it makes the readability worse? If I was looking at if (test) -> clear for the first time, I would become confused why its being tested if its going to be clear at the end anyways?
> > >
> > > In the pmd-order case, the majority of folios are not partially mapped.
> > > Unconditional writes will trigger cache synchronization across all
> > > CPUs (related to the MESI protocol), making them more costly. By
> > > using conditional writes, such as "if(test) write," we can avoid
> > > most unnecessary writes, which is much more efficient. Additionally,
> > > we only need to manage nr_split_deferred when the condition
> > > is met. We are carefully evaluating all scenarios to determine
> > > if modifications to the partially_mapped flag are necessary.
> > >
> > >
> > Hmm okay, as you said its needed for nr_split_deferred anyways. Something like below is ok to fold in?
> >
> > commit 4ae9e2067346effd902b342296987b97dee29018 (HEAD)
> > Author: Usama Arif
> > Date:   Mon Aug 19 21:07:16 2024 +0100
> >
> >     mm: Introduce a pageflag for partially mapped folios fix
> >
> >     Test partially_mapped flag before clearing it. This should
> >     avoid unnecessary writes and will be needed in the nr_split_deferred
> >     series.
> >
> >     Signed-off-by: Usama Arif
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 5d67d3b3c1b2..ccde60aaaa0f 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3479,7 +3479,8 @@ void __folio_undo_large_rmappable(struct folio *folio)
> >         if (!list_empty(&folio->_deferred_list)) {
> >                 ds_queue->split_queue_len--;
> >                 list_del_init(&folio->_deferred_list);
> > -               __folio_clear_partially_mapped(folio);
> > +               if (folio_test_partially_mapped(folio))
> > +                       __folio_clear_partially_mapped(folio);
> >         }
> >         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> >  }
> > @@ -3610,7 +3611,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> >                 } else {
> >                         /* We lost race with folio_put() */
> >                         list_del_init(&folio->_deferred_list);
> > -                       __folio_clear_partially_mapped(folio);
> > +                       if (folio_test_partially_mapped(folio))
> > +                               __folio_clear_partially_mapped(folio);
> >                         ds_queue->split_queue_len--;
> >                 }
> >                 if (!--sc->nr_to_scan)
> >
>
> Do we also need if (folio_test_partially_mapped(folio)) in
> split_huge_page_to_list_to_order()?
>
> I recall that in Yu Zhao's TAO, there's a chance of splitting (shattering)
> non-partially-mapped folios. To be future-proof, we might want to handle
> both cases equally.

I also recall a real case which can split an entirely_mapped folio:

mm: huge_memory: enable debugfs to split huge pages to any order
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc4d182316bd5309b4066fd9ef21529ea397a7d4

>
> By the way, we might not need to clear the flag for a new folio. This differs
> from the init_list, which is necessary. If a new folio has the partially_mapped
> flag, it indicates that we failed to clear it when freeing the folio to
> the buddy system, which is a bug we need to fix in the free path.
>
> Thanks
> Barry
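
---

A minimal sketch of the test-before-clear pattern discussed above, assuming the
flag helpers introduced by this patch; it is not the upstream code, and the
wrapper function name below is hypothetical, used only to keep the example
self-contained:

/*
 * Sketch only: folio_test_partially_mapped() is a plain read, while
 * __folio_clear_partially_mapped() dirties the folio's flags word, so
 * skipping the clear when the bit is not set avoids pulling that cache
 * line into the modified state for the common case of folios that were
 * never partially mapped. The non-atomic clear is safe only because the
 * caller holds split_queue_lock.
 */
static void folio_unqueue_deferred_split_locked(struct deferred_split *ds_queue,
						struct folio *folio)
{
	lockdep_assert_held(&ds_queue->split_queue_lock);

	if (!list_empty(&folio->_deferred_list)) {
		ds_queue->split_queue_len--;
		list_del_init(&folio->_deferred_list);
		if (folio_test_partially_mapped(folio))
			__folio_clear_partially_mapped(folio);
	}
}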