From: Mark Hemment <markhemm@googlemail.com>
To: Pavankumar Kondeti
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Suren Baghdasaryan, Minchan Kim, Charan Teja Kalla, Prakash Gupta, Divyanand Rangu
Date: Thu, 1 Dec 2022 13:46:36 +0000
Subject: Re: [PATCH] mm/madvise: fix madvise_pageout for private file mappings
In-Reply-To: <1667971116-12900-1-git-send-email-quic_pkondeti@quicinc.com>

On Wed, 9 Nov 2022 at 05:19, Pavankumar Kondeti wrote:
>
> When MADV_PAGEOUT is called on a private file mapping VMA region,
> we bail out early if the process is neither owner nor write capable
> of the file. However, this VMA may have both private/shared clean
> pages and private dirty pages. The opportunity of paging out the
> private dirty pages (Anon pages) is missed. Fix this by caching
> the file access check and use it later along with PageAnon() during
> page walk.
>
> We observe ~10% improvement in zram usage, thus leaving more available
> memory on a 4GB RAM system running Android.
>
> Signed-off-by: Pavankumar Kondeti

Only scan-reviewed the patch; the logic looks good (as does the
reasoning), but a couple of minor comments:

> ---
>  mm/madvise.c | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c7105ec..b6b88e2 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -40,6 +40,7 @@
>  struct madvise_walk_private {
>  	struct mmu_gather *tlb;
>  	bool pageout;
> +	bool can_pageout_file;
>  };
>
>  /*
> @@ -328,6 +329,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  	struct madvise_walk_private *private = walk->private;
>  	struct mmu_gather *tlb = private->tlb;
>  	bool pageout = private->pageout;
> +	bool pageout_anon_only = pageout && !private->can_pageout_file;
>  	struct mm_struct *mm = tlb->mm;
>  	struct vm_area_struct *vma = walk->vma;
>  	pte_t *orig_pte, *pte, ptent;
> @@ -364,6 +366,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		if (page_mapcount(page) != 1)
>  			goto huge_unlock;
>
> +		if (pageout_anon_only && !PageAnon(page))
> +			goto huge_unlock;
> +
>  		if (next - addr != HPAGE_PMD_SIZE) {
>  			int err;
>
> @@ -432,6 +437,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		if (PageTransCompound(page)) {
>  			if (page_mapcount(page) != 1)
>  				break;
> +			if (pageout_anon_only && !PageAnon(page))
> +				break;
>  			get_page(page);
>  			if (!trylock_page(page)) {
>  				put_page(page);
> @@ -459,6 +466,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		if (!PageLRU(page) || page_mapcount(page) != 1)
>  			continue;
>
> +		if (pageout_anon_only && !PageAnon(page))
> +			continue;
> +

The added PageAnon() tests probably do not have a measurable
performance impact, but they are not ideal when walking a large
anonymous mapping (as '->can_pageout_file' is zero for anon mappings).
Could the code be re-structured so that PageAnon() is only tested when
filtering is needed?  Say:

	if (pageout_anon_only_filter && !PageAnon(page)) {
		continue;
	}

where 'pageout_anon_only_filter' is only set for a private named
mapping when the caller does not have write permission on the backing
object.  It would not be set for anon mappings.

>  	VM_BUG_ON_PAGE(PageTransCompound(page), page);
>
>  	if (pte_young(ptent)) {
> @@ -541,11 +551,13 @@ static long madvise_cold(struct vm_area_struct *vma,
>
>  static void madvise_pageout_page_range(struct mmu_gather *tlb,
>  			struct vm_area_struct *vma,
> -			unsigned long addr, unsigned long end)
> +			unsigned long addr, unsigned long end,
> +			bool can_pageout_file)
>  {
>  	struct madvise_walk_private walk_private = {
>  		.pageout = true,
>  		.tlb = tlb,
> +		.can_pageout_file = can_pageout_file,
>  	};
>
>  	tlb_start_vma(tlb, vma);
> @@ -553,10 +565,8 @@ static void madvise_pageout_page_range(struct mmu_gather *tlb,
>  	tlb_end_vma(tlb, vma);
>  }
>
> -static inline bool can_do_pageout(struct vm_area_struct *vma)
> +static inline bool can_do_file_pageout(struct vm_area_struct *vma)
>  {
> -	if (vma_is_anonymous(vma))
> -		return true;
>  	if (!vma->vm_file)
>  		return false;
>  	/*
> @@ -576,17 +586,23 @@ static long madvise_pageout(struct vm_area_struct *vma,
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_gather tlb;
> +	bool can_pageout_file;
>
>  	*prev = vma;
>  	if (!can_madv_lru_vma(vma))
>  		return -EINVAL;
>
> -	if (!can_do_pageout(vma))
> -		return 0;

The removal of this test means a process which cannot get write
permission on a shared named mapping will still perform a 'walk'.  As
such a mapping cannot have anon pages, this walk will be a no-op.  Not
sure why a well-behaved program would do a MADV_PAGEOUT on such a
mapping, but if one does, this could be considered a (minor
performance) regression.  As madvise_pageout() can easily filter this
case, it might be worth adding a check.
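For illustration, both points together could take roughly the
following shape in madvise_pageout().  This is only an untested
sketch; 'pageout_anon_only_filter' is the name suggested above, not
anything in the patch as posted:

	/* Untested sketch, not part of the patch. */
	bool can_pageout_file = can_do_file_pageout(vma);
	bool pageout_anon_only_filter;

	/*
	 * A shared named mapping we have no file pageout permission
	 * for cannot hold anon pages either, so the page walk would
	 * be a no-op; filter it out here.
	 */
	if (!vma_is_anonymous(vma) && (vma->vm_flags & VM_SHARED) &&
	    !can_pageout_file)
		return 0;

	/*
	 * The anon-only filter is needed only for a private named
	 * mapping without write permission on the backing object.
	 * It is never set for anon mappings, so their walk avoids
	 * the per-page PageAnon() test.
	 */
	pageout_anon_only_filter = !vma_is_anonymous(vma) &&
				   !can_pageout_file;

'pageout_anon_only_filter' would then be passed down to the walk (via
madvise_walk_private) in place of the raw '->can_pageout_file' flag.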
> +	/*
> +	 * If the VMA belongs to a private file mapping, there can be private
> +	 * dirty pages which can be paged out if even this process is neither
> +	 * owner nor write capable of the file. Cache the file access check
> +	 * here and use it later during page walk.
> +	 */
> +	can_pageout_file = can_do_file_pageout(vma);
>
>  	lru_add_drain();
>  	tlb_gather_mmu(&tlb, mm);
> -	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
> +	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr, can_pageout_file);
>  	tlb_finish_mmu(&tlb);
>
>  	return 0;
> --
> 2.7.4
>

Cheers,
Mark