From: Barry Song <21cnbao@gmail.com>
Date: Wed, 14 Aug 2024 23:26:31 +1200
Subject: Re: [PATCH v3 4/6] mm: Introduce a pageflag for partially mapped folios
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev,
	yuzhao@google.com, david@redhat.com, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kernel-team@meta.com
References: <20240813120328.1275952-1-usamaarif642@gmail.com>
	<20240813120328.1275952-5-usamaarif642@gmail.com>
	<925804e4-0b33-45eb-905d-e00f67192828@gmail.com>

On Wed, Aug 14, 2024 at 11:20 PM Barry Song wrote:
>
> On Wed, Aug 14, 2024 at 11:11 PM Usama Arif wrote:
> >
> >
> > On 14/08/2024 11:44, Barry Song wrote:
> > > On Wed, Aug 14, 2024 at 12:03 AM Usama Arif wrote:
> > >>
> > >> Currently folio->_deferred_list is used to keep track of
> > >> partially_mapped folios that are going to be split under memory
> > >> pressure. In the next patch, all THPs that are faulted in and collapsed
> > >> by khugepaged are also going to be tracked using _deferred_list.
> > >>
> > >> This patch introduces a pageflag to be able to distinguish between
> > >> partially mapped folios and others in the deferred_list at split time in
> > >> deferred_split_scan. It's needed as __folio_remove_rmap decrements
> > >> _mapcount, _large_mapcount and _entire_mapcount, hence it won't be
> > >> possible to distinguish between partially mapped folios and others in
> > >> deferred_split_scan.
> > >>
> > >> Even though it introduces an extra flag to track if the folio is
> > >> partially mapped, there is no functional change intended with this
> > >> patch and the flag is not useful in this patch itself; it will
> > >> become useful in the next patch when _deferred_list has non-partially
> > >> mapped folios.
> > >>
> > >> Signed-off-by: Usama Arif
> > >> ---
> > >>  include/linux/huge_mm.h    |  4 ++--
> > >>  include/linux/page-flags.h |  3 +++
> > >>  mm/huge_memory.c           | 21 +++++++++++++--------
> > >>  mm/hugetlb.c               |  1 +
> > >>  mm/internal.h              |  4 +++-
> > >>  mm/memcontrol.c            |  3 ++-
> > >>  mm/migrate.c               |  3 ++-
> > >>  mm/page_alloc.c            |  5 +++--
> > >>  mm/rmap.c                  |  3 ++-
> > >>  mm/vmscan.c                |  3 ++-
> > >>  10 files changed, 33 insertions(+), 17 deletions(-)
> > >>
> > >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > >> index 4c32058cacfe..969f11f360d2 100644
> > >> --- a/include/linux/huge_mm.h
> > >> +++ b/include/linux/huge_mm.h
> > >> @@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page)
> > >>  {
> > >>         return split_huge_page_to_list_to_order(page, NULL, 0);
> > >>  }
> > >> -void deferred_split_folio(struct folio *folio);
> > >> +void deferred_split_folio(struct folio *folio, bool partially_mapped);
> > >>
> > >>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > >>                 unsigned long address, bool freeze, struct folio *folio);
> > >> @@ -495,7 +495,7 @@ static inline int split_huge_page(struct page *page)
> > >>  {
> > >>         return 0;
> > >>  }
> > >> -static inline void deferred_split_folio(struct folio *folio) {}
> > >> +static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
> > >>  #define split_huge_pmd(__vma, __pmd, __address) \
> > >>         do { } while (0)
> > >>
> > >> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > >> index a0a29bd092f8..cecc1bad7910 100644
> > >> --- a/include/linux/page-flags.h
> > >> +++ b/include/linux/page-flags.h
> > >> @@ -182,6 +182,7 @@ enum pageflags {
> > >>         /* At least one page in this folio has the hwpoison flag set */
> > >>         PG_has_hwpoisoned = PG_active,
> > >>         PG_large_rmappable = PG_workingset, /* anon or file-backed */
> > >> +       PG_partially_mapped, /* was identified to be partially mapped */
> > >>  };
> > >>
> > >>  #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
> > >> @@ -861,8 +862,10 @@ static inline void ClearPageCompound(struct page *page)
> > >>         ClearPageHead(page);
> > >>  }
> > >>  FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
> > >> +FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
> > >>  #else
> > >>  FOLIO_FLAG_FALSE(large_rmappable)
> > >> +FOLIO_FLAG_FALSE(partially_mapped)
> > >>  #endif
> > >>
> > >>  #define PG_head_mask   ((1UL << PG_head))
> > >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > >> index 6df0e9f4f56c..c024ab0f745c 100644
> > >> --- a/mm/huge_memory.c
> > >> +++ b/mm/huge_memory.c
> > >> @@ -3397,6 +3397,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >>                  * page_deferred_list.
> > >>                  */
> > >>                 list_del_init(&folio->_deferred_list);
> > >> +               folio_clear_partially_mapped(folio);
> > >>         }
> > >>         spin_unlock(&ds_queue->split_queue_lock);
> > >>         if (mapping) {
> > >> @@ -3453,11 +3454,12 @@ void __folio_undo_large_rmappable(struct folio *folio)
> > >>         if (!list_empty(&folio->_deferred_list)) {
> > >>                 ds_queue->split_queue_len--;
> > >>                 list_del_init(&folio->_deferred_list);
> > >> +               folio_clear_partially_mapped(folio);
> > >>         }
> > >>         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> > >>  }
> > >>
> > >> -void deferred_split_folio(struct folio *folio)
> > >> +void deferred_split_folio(struct folio *folio, bool partially_mapped)
> > >>  {
> > >>         struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> > >>  #ifdef CONFIG_MEMCG
> > >> @@ -3485,14 +3487,17 @@ void deferred_split_folio(struct folio *folio)
> > >>         if (folio_test_swapcache(folio))
> > >>                 return;
> > >>
> > >> -       if (!list_empty(&folio->_deferred_list))
> > >> -               return;
> > >> -
> > >>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > >> +       if (partially_mapped)
> > >> +               folio_set_partially_mapped(folio);
> > >> +       else
> > >> +               folio_clear_partially_mapped(folio);
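
(For context, my reading of how the callers end up once the whole series is
applied -- just a rough sketch, since the fault and collapse callers only
land in the next patch:)

        /* mm/rmap.c, __folio_remove_rmap(): folio just became partially mapped */
        deferred_split_folio(folio, true);

        /* page-fault and khugepaged-collapse paths (next patch):
         * tracked on the deferred list, but not partially mapped */
        deferred_split_folio(folio, false);
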
> > >
> > > Hi Usama,
> > >
> > > Do we need this? When can a partially_mapped folio on deferred_list become
> > > non-partially-mapped and need a clear? I understand transferring from
> > > entirely_mapped to partially_mapped is a one-way process? partially_mapped
> > > folios can't go back to entirely_mapped?
> > >
> >
> > Hi Barry,
> >
> > deferred_split_folio is called in 3 places after this series: at fault,
> > at collapse, and on partial mapping. Partial mapping can only happen after
> > fault/collapse, and we have FOLIO_FLAG_FALSE(partially_mapped), i.e. the
> > flag is initialized to false, so technically it's not needed. A
> > partially_mapped folio on the deferred list won't become non-partially
> > mapped.
> >
> > I just did it as a precaution in case someone ever changes the kernel and
> > calls deferred_split_folio with partially_mapped set to false after it had
> > been true. The function arguments of deferred_split_folio make it seem that
> > passing partially_mapped=false as an argument would clear it, which is why
> > I cleared it as well. I could change the patch to something like below if
> > it makes things better? i.e. add a comment at the top of the function:
> >
>
> To me, it seems a BUG to call with false after a folio has been
> partially_mapped. So I'd rather
> VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
>
> The below should also fix the MTHP_STAT_SPLIT_DEFERRED
> counter this patch is breaking?
>
> @@ -3515,16 +3522,18 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
>                 return;
>
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -       if (partially_mapped)
> -               folio_set_partially_mapped(folio);
> -       else
> -               folio_clear_partially_mapped(folio);
> -       if (list_empty(&folio->_deferred_list)) {
> -               if (partially_mapped) {
> +       if (partially_mapped) {
> +               if (!folio_test_set_partially_mapped(folio)) {
> +                       mod_mthp_stat(folio_order(folio),
> +                                     MTHP_STAT_NR_SPLIT_DEFERRED, 1);
>                         if (folio_test_pmd_mappable(folio))
>                                 count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>                         count_mthp_stat(folio_order(folio),
>                                         MTHP_STAT_SPLIT_DEFERRED);
>                 }
> +       }
> +       VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);

sorry, I mean:
VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio) && !partially_mapped, folio);

> +
> +       if (list_empty(&folio->_deferred_list)) {
>                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>                 ds_queue->split_queue_len++;
>  #ifdef CONFIG_MEMCG
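
(One assumption in the sketch above: folio_test_set_partially_mapped()
doesn't exist yet with this patch, so we'd also need to generate the
test-and-set / test-and-clear variants for the new flag -- roughly the below
in page-flags.h, if I'm reading the FOLIO_FLAG macro family right, plus the
matching _FALSE stubs on the #else side:)

 FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
+FOLIO_TEST_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
+FOLIO_TEST_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
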
> >
> >
> > -void deferred_split_folio(struct folio *folio)
> > +/* partially_mapped=false won't clear PG_partially_mapped folio flag */
> > +void deferred_split_folio(struct folio *folio, bool partially_mapped)
> >  {
> >         struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >  #ifdef CONFIG_MEMCG
> > @@ -3485,14 +3488,15 @@ void deferred_split_folio(struct folio *folio)
> >         if (folio_test_swapcache(folio))
> >                 return;
> >
> > -       if (!list_empty(&folio->_deferred_list))
> > -               return;
> > -
> >         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > +       if (partially_mapped)
> > +               folio_set_partially_mapped(folio);
> >         if (list_empty(&folio->_deferred_list)) {
> > -               if (folio_test_pmd_mappable(folio))
> > -                       count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > -               count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
> > +               if (partially_mapped) {
> > +                       if (folio_test_pmd_mappable(folio))
> > +                               count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > +                       count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
> > +               }
> >                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> >                 ds_queue->split_queue_len++;
> >  #ifdef CONFIG_MEMCG
> >
> >
> > > I am trying to rebase my NR_SPLIT_DEFERRED counter on top of your
> > > work, but this "clear" makes that job quite tricky, as I am not sure
> > > if this patch is going to clear the partially_mapped flag for folios
> > > on deferred_list.
> > >
> > > Otherwise, I can simply do the below whenever a folio is leaving the
> > > deferred_list:
> > >
> > >         ds_queue->split_queue_len--;
> > >         list_del_init(&folio->_deferred_list);
> > >         if (folio_test_clear_partially_mapped(folio))
> > >                 mod_mthp_stat(folio_order(folio),
> > >                               MTHP_STAT_NR_SPLIT_DEFERRED, -1);
> > >
>
> Thanks
> Barry