From: Barry Song
Date: Mon, 19 Aug 2024 20:30:53 +1200
Subject: Re: [PATCH v4 4/6] mm: Introduce a pageflag for partially mapped folios
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev,
    yuzhao@google.com, david@redhat.com, ryan.roberts@arm.com,
    rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
    ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kernel-team@meta.com
References: <20240819023145.2415299-1-usamaarif642@gmail.com> <20240819023145.2415299-5-usamaarif642@gmail.com>

On Mon, Aug 19, 2024 at 8:29 PM Barry Song wrote:
>
> Hi Usama,
>
> I feel it is much better now! thanks!
>
> On Mon, Aug 19, 2024 at 2:31 PM Usama Arif wrote:
> >
> > Currently folio->_deferred_list is used to keep track of
> > partially_mapped folios that are going to be split under memory
> > pressure. In the next patch, all THPs that are faulted in and collapsed
> > by khugepaged are also going to be tracked using _deferred_list.
> >
> > This patch introduces a pageflag to be able to distinguish between
> > partially mapped folios and others in the deferred_list at split time in
> > deferred_split_scan. Its needed as __folio_remove_rmap decrements
> > _mapcount, _large_mapcount and _entire_mapcount, hence it won't be
> > possible to distinguish between partially mapped folios and others in
> > deferred_split_scan.
> >
> > Eventhough it introduces an extra flag to track if the folio is
> > partially mapped, there is no functional change intended with this
> > patch and the flag is not useful in this patch itself, it will
> > become useful in the next patch when _deferred_list has non partially
> > mapped folios.
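
Just to check that I'm reading the intent correctly: I assume the next
patch will make deferred_split_scan() skip folios that were never marked
partially mapped, roughly along the lines below. This is only my guess at
the shape (the loop mirrors how I remember the current scan path), not
code from this series:

	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
		/* guess: only folios marked partially mapped get split here */
		if (!folio_test_partially_mapped(folio))
			goto next;
		if (!folio_trylock(folio))
			goto next;
		/* split_folio() removes the folio from the list on success */
		if (!split_folio(folio))
			split++;
		folio_unlock(folio);
next:
		folio_put(folio);
	}

If that is the plan, the flag placement in this patch makes sense to me.
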
> >
> > Signed-off-by: Usama Arif
> > ---
> >  include/linux/huge_mm.h    |  4 ++--
> >  include/linux/page-flags.h | 11 +++++++++++
> >  mm/huge_memory.c           | 23 ++++++++++++++++-------
> >  mm/internal.h              |  4 +++-
> >  mm/memcontrol.c            |  3 ++-
> >  mm/migrate.c               |  3 ++-
> >  mm/page_alloc.c            |  5 +++--
> >  mm/rmap.c                  |  5 +++--
> >  mm/vmscan.c                |  3 ++-
> >  9 files changed, 44 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 4c32058cacfe..969f11f360d2 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page)
> >  {
> >         return split_huge_page_to_list_to_order(page, NULL, 0);
> >  }
> > -void deferred_split_folio(struct folio *folio);
> > +void deferred_split_folio(struct folio *folio, bool partially_mapped);
> >
> >  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >                 unsigned long address, bool freeze, struct folio *folio);
> > @@ -495,7 +495,7 @@ static inline int split_huge_page(struct page *page)
> >  {
> >         return 0;
> >  }
> > -static inline void deferred_split_folio(struct folio *folio) {}
> > +static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
> >  #define split_huge_pmd(__vma, __pmd, __address) \
> >         do { } while (0)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index a0a29bd092f8..c3bb0e0da581 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -182,6 +182,7 @@ enum pageflags {
> >         /* At least one page in this folio has the hwpoison flag set */
> >         PG_has_hwpoisoned = PG_active,
> >         PG_large_rmappable = PG_workingset, /* anon or file-backed */
> > +       PG_partially_mapped = PG_reclaim, /* was identified to be partially mapped */
> >  };
> >
> >  #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1)
> > @@ -861,8 +862,18 @@ static inline void ClearPageCompound(struct page *page)
> >         ClearPageHead(page);
> >  }
> >  FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
> > +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> > +/*
> > + * PG_partially_mapped is protected by deferred_split split_queue_lock,
> > + * so its safe to use non-atomic set/clear.
> > + */
> > +__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> > +__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> >  #else
> >  FOLIO_FLAG_FALSE(large_rmappable)
> > +FOLIO_TEST_FLAG_FALSE(partially_mapped)
> > +__FOLIO_SET_FLAG_NOOP(partially_mapped)
> > +__FOLIO_CLEAR_FLAG_NOOP(partially_mapped)
> >  #endif
> >
> >  #define PG_head_mask  ((1UL << PG_head))
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 2d77b5d2291e..70ee49dfeaad 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3398,6 +3398,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >                  * page_deferred_list.
> >                  */
> >                 list_del_init(&folio->_deferred_list);
> > +               __folio_clear_partially_mapped(folio);
> >         }
> >         spin_unlock(&ds_queue->split_queue_lock);
> >         if (mapping) {
> > @@ -3454,11 +3455,13 @@ void __folio_undo_large_rmappable(struct folio *folio)
> >         if (!list_empty(&folio->_deferred_list)) {
> >                 ds_queue->split_queue_len--;
> >                 list_del_init(&folio->_deferred_list);
> > +               __folio_clear_partially_mapped(folio);
>
> is it possible to make things clearer by
>
> if (folio_clear_partially_mapped)
>         __folio_clear_partially_mapped(folio);

sorry for typo,

if (folio_test_partially_mapped)
        __folio_clear_partially_mapped(folio);

>
> While writing without conditions isn't necessarily wrong, adding a condition
> will improve the readability of the code and enhance the clarity of my mTHP
> counters series. also help decrease smp cache sync if we can avoid
> unnecessary writing?
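
To spell out the shape I have in mind for __folio_undo_large_rmappable(),
untested and only illustrating the conditional clear suggested above:

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	if (!list_empty(&folio->_deferred_list)) {
		ds_queue->split_queue_len--;
		list_del_init(&folio->_deferred_list);
		/* only dirty the flags word if the flag was actually set */
		if (folio_test_partially_mapped(folio))
			__folio_clear_partially_mapped(folio);
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

The same test-before-clear pattern would apply to the other unconditional
clears added in this patch.
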
> >         }
> >         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> >  }
> >
> > -void deferred_split_folio(struct folio *folio)
> > +/* partially_mapped=false won't clear PG_partially_mapped folio flag */
> > +void deferred_split_folio(struct folio *folio, bool partially_mapped)
> >  {
> >         struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >  #ifdef CONFIG_MEMCG
> > @@ -3486,14 +3489,19 @@ void deferred_split_folio(struct folio *folio)
> >         if (folio_test_swapcache(folio))
> >                 return;
> >
> > -       if (!list_empty(&folio->_deferred_list))
> > -               return;
> > -
> >         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > +       if (partially_mapped) {
> > +               if (!folio_test_partially_mapped(folio)) {
> > +                       __folio_set_partially_mapped(folio);
> > +                       if (folio_test_pmd_mappable(folio))
> > +                               count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > +                       count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
> > +               }
> > +       } else {
> > +               /* partially mapped folios cannot become non-partially mapped */
> > +               VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
> > +       }
> >         if (list_empty(&folio->_deferred_list)) {
> > -               if (folio_test_pmd_mappable(folio))
> > -                       count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > -               count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
> >                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> >                 ds_queue->split_queue_len++;
> >  #ifdef CONFIG_MEMCG
> > @@ -3542,6 +3550,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> >                 } else {
> >                         /* We lost race with folio_put() */
> >                         list_del_init(&folio->_deferred_list);
> > +                       __folio_clear_partially_mapped(folio);
>
> as above? Do we also need if(test) for split_huge_page_to_list_to_order()?
>
> >                         ds_queue->split_queue_len--;
> >                 }
> >                 if (!--sc->nr_to_scan)
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 52f7fc4e8ac3..27cbb5365841 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -662,8 +662,10 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
> >         atomic_set(&folio->_entire_mapcount, -1);
> >         atomic_set(&folio->_nr_pages_mapped, 0);
> >         atomic_set(&folio->_pincount, 0);
> > -       if (order > 1)
> > +       if (order > 1) {
> >                 INIT_LIST_HEAD(&folio->_deferred_list);
> > +               __folio_clear_partially_mapped(folio);
>
> if partially_mapped is true for a new folio, does it mean we already have
> a bug somewhere?
>
> How is it possible for a new folio to be partially mapped?
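
In other words - assuming my reading is right and a freshly prepared folio
can never carry the flag - I'd expect either nothing here, or at most a
debug check instead of a clear, e.g. (untested):

	if (order > 1) {
		INIT_LIST_HEAD(&folio->_deferred_list);
		/* a new folio should never be marked partially mapped */
		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
	}
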
>
> > +       }
> >  }
> >
> >  static inline void prep_compound_tail(struct page *head, int tail_idx)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index e1ffd2950393..0fd95daecf9a 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4669,7 +4669,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
> >         VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> >         VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
> >                         !folio_test_hugetlb(folio) &&
> > -                       !list_empty(&folio->_deferred_list), folio);
> > +                       !list_empty(&folio->_deferred_list) &&
> > +                       folio_test_partially_mapped(folio), folio);
> >
> >         /*
> >          * Nobody should be changing or seriously looking at
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2d2e65d69427..ef4a732f22b1 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1735,7 +1735,8 @@ static int migrate_pages_batch(struct list_head *from,
> >                          * use _deferred_list.
> >                          */
> >                         if (nr_pages > 2 &&
> > -                           !list_empty(&folio->_deferred_list)) {
> > +                           !list_empty(&folio->_deferred_list) &&
> > +                           folio_test_partially_mapped(folio)) {
> >                                 if (!try_split_folio(folio, split_folios, mode)) {
> >                                         nr_failed++;
> >                                         stats->nr_thp_failed += is_thp;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 408ef3d25cf5..a145c550dd2a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -957,8 +957,9 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
> >                 break;
> >         case 2:
> >                 /* the second tail page: deferred_list overlaps ->mapping */
> > -               if (unlikely(!list_empty(&folio->_deferred_list))) {
> > -                       bad_page(page, "on deferred list");
> > +               if (unlikely(!list_empty(&folio->_deferred_list) &&
> > +                   folio_test_partially_mapped(folio))) {
> > +                       bad_page(page, "partially mapped folio on deferred list");
> >                         goto out;
> >                 }
> >                 break;
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index a6b9cd0b2b18..4c330635aa4e 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1578,8 +1578,9 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >          * Check partially_mapped first to ensure it is a large folio.
> >          */
> >         if (partially_mapped && folio_test_anon(folio) &&
> > -           list_empty(&folio->_deferred_list))
> > -               deferred_split_folio(folio);
> > +           !folio_test_partially_mapped(folio))
> > +               deferred_split_folio(folio, true);
> > +
> >         __folio_mod_stat(folio, -nr, -nr_pmdmapped);
> >
> >         /*
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 25e43bb3b574..25f4e8403f41 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1233,7 +1233,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                                          * Split partially mapped folios right away.
> >                                          * We can free the unmapped pages without IO.
> >                                          */
> > -                                       if (data_race(!list_empty(&folio->_deferred_list)) &&
> > +                                       if (data_race(!list_empty(&folio->_deferred_list) &&
> > +                                           folio_test_partially_mapped(folio)) &&
> >                                             split_folio_to_list(folio, folio_list))
> >                                                 goto activate_locked;
> >                                 }
> > --
> > 2.43.5
> >
>
> Thanks
> Barry