From: Lance Yang
To: 21cnbao@gmail.com
Cc: akpm@linux-foundation.org, david@redhat.com, ioworker0@gmail.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com,
    minchan@kernel.org, peterx@redhat.com, ryan.roberts@arm.com,
    shy828301@gmail.com, songmuchun@bytedance.com,
    wangkefeng.wang@huawei.com, zokeefe@google.com, fengwei.yin@intel.com
Subject: Re: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
Date: Mon, 26 Feb 2024 16:37:14 +0800
Message-Id: <20240226083714.26187-1-ioworker0@gmail.com>

Hey Barry,

Thanks for taking time to review!

On Mon, Feb 26, 2024 at 12:00 PM Barry Song <21cnbao@gmail.com> wrote:
[...]
> On Mon, Feb 26, 2024 at 1:33 AM Lance Yang wrote:
[...]
> We did something similar on MADV_PAGEOUT[1]
> [1] https://lore.kernel.org/linux-mm/20240118111036.72641-7-21cnbao@gmail.com/

Thanks for providing the link above.

[...]
> > +                        * Avoid unnecessary folio splitting if the large
> > +                        * folio is entirely within the given range.
> > +                        */
> > +                       folio_test_clear_dirty(folio);
> > +                       folio_unlock(folio);
> > +                       for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
> > +                               ptent = ptep_get(pte);
> > +                               if (pte_young(ptent) || pte_dirty(ptent)) {
> > +                                       ptent = ptep_get_and_clear_full(
> > +                                               mm, addr, pte, tlb->fullmm);
> > +                                       ptent = pte_mkold(ptent);
> > +                                       ptent = pte_mkclean(ptent);
> > +                                       set_pte_at(mm, addr, pte, ptent);
> > +                                       tlb_remove_tlb_entry(tlb, pte, addr);
> > +                               }
>
> The code works under the assumption the large folio is entirely mapped
> in all PTEs in the range. This is not always true.
>
> This won't work in some cases as some PTEs might be mapping to the
> large folios. some others might have been unmapped or mapped
> to different folios.
>
> so in MADV_PAGEOUT, we have a function to check the folio is
> really entirely mapped:
>
> +static inline bool pte_range_cont_mapped(unsigned long start_pfn,
> +               pte_t *start_pte, unsigned long start_addr, int nr)
> +{
> +       int i;
> +       pte_t pte_val;
> +
> +       for (i = 0; i < nr; i++) {
> +               pte_val = ptep_get(start_pte + i);
> +
> +               if (pte_none(pte_val))
> +                       return false;
> +
> +               if (pte_pfn(pte_val) != (start_pfn + i))
> +                       return false;
> +       }
> +
> +       return true;
> +}

Thanks for providing the information; it's very helpful to me!
I made some changes based on it. Would you mind taking another look, please?

Here is a diff against this patch:

diff --git a/mm/madvise.c b/mm/madvise.c
index bcbf56595a2e..255d2f329be4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -616,6 +616,18 @@ static long madvise_pageout(struct vm_area_struct *vma,
 	return 0;
 }
 
+static inline bool pte_range_cont_mapped(pte_t *pte, unsigned long nr)
+{
+	pte_t pte_val;
+	unsigned long pfn = pte_pfn(ptep_get(pte));
+	for (int i = 0; i < nr; i++) {
+		pte_val = ptep_get(pte + i);
+		if (pte_none(pte_val) || pte_pfn(pte_val) != (pfn + i))
+			return false;
+	}
+	return true;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -676,20 +688,25 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		 */
 		if (folio_test_large(folio)) {
 			int err;
-			unsigned long next_addr, align;
+			unsigned long nr, next_addr, align;
 
 			if (folio_estimated_sharers(folio) != 1 ||
 			    !folio_trylock(folio))
 				goto skip_large_folio;
 
-			align = folio_nr_pages(folio) * PAGE_SIZE;
+			nr = folio_nr_pages(folio);
+			align = nr * PAGE_SIZE;
 			next_addr = ALIGN_DOWN(addr + align, align);
 
 			/*
-			 * If we mark only the subpages as lazyfree,
-			 * split the large folio.
+			 * If we mark only the subpages as lazyfree, or
+			 * if there is a cow folio associated with this folio,
+			 * or if this folio is not really entirely mapped,
+			 * then split the large folio.
 			 */
-			if (next_addr > end || next_addr - addr != align)
+			if (next_addr > end || next_addr - addr != align ||
+			    folio_total_mapcount(folio) != nr ||
+			    !pte_range_cont_mapped(pte, nr))
 				goto split_large_folio;
 
 			/*
---

Thanks again for your time!

Best,
Lance