From: Pavan Kondeti <quic_pkondeti@quicinc.com>
Date: Thu, 1 Dec 2022 19:47:26 +0530
To: Mark Hemment
Cc: Pavankumar Kondeti, Andrew Morton, linux-mm@kvack.org,
	Suren Baghdasaryan, Minchan Kim, Charan Teja Kalla,
	Prakash Gupta, Divyanand Rangu
Subject: Re: [PATCH] mm/madvise: fix madvise_pageout for private file mappings
Message-ID: <20221201141726.GA18487@hu-pkondeti-hyd.qualcomm.com>
References: <1667971116-12900-1-git-send-email-quic_pkondeti@quicinc.com>
Hi Mark,

On Thu, Dec 01, 2022 at 01:46:36PM +0000, Mark Hemment wrote:
> On Wed, 9 Nov 2022 at 05:19, Pavankumar Kondeti wrote:
> >
> > When MADV_PAGEOUT is called on a private file mapping VMA region,
> > we bail out early if the process is neither the owner of nor write
> > capable on the file. However, this VMA may have both private/shared
> > clean pages and private dirty pages. The opportunity to page out the
> > private dirty pages (anon pages) is missed. Fix this by caching
> > the file access check and using it later along with PageAnon()
> > during the page walk.
> >
> > We observe ~10% improvement in zram usage, thus leaving more
> > available memory on a 4GB RAM system running Android.
> >
> > Signed-off-by: Pavankumar Kondeti
>
> Only a scanned review of the patch; the logic looks good (as does the
> reasoning) but a couple of minor comments;
>

Thanks for the review and the nice suggestions on how the patch can be
improved.
> > ---
> >  mm/madvise.c | 30 +++++++++++++++++++++++-------
> >  1 file changed, 23 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index c7105ec..b6b88e2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -40,6 +40,7 @@
> >  struct madvise_walk_private {
> >  	struct mmu_gather *tlb;
> >  	bool pageout;
> > +	bool can_pageout_file;
> >  };
> >
> >  /*
> > @@ -328,6 +329,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  	struct madvise_walk_private *private = walk->private;
> >  	struct mmu_gather *tlb = private->tlb;
> >  	bool pageout = private->pageout;
> > +	bool pageout_anon_only = pageout && !private->can_pageout_file;
> >  	struct mm_struct *mm = tlb->mm;
> >  	struct vm_area_struct *vma = walk->vma;
> >  	pte_t *orig_pte, *pte, ptent;
> > @@ -364,6 +366,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		if (page_mapcount(page) != 1)
> >  			goto huge_unlock;
> >
> > +		if (pageout_anon_only && !PageAnon(page))
> > +			goto huge_unlock;
> > +
> >  		if (next - addr != HPAGE_PMD_SIZE) {
> >  			int err;
> >
> > @@ -432,6 +437,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		if (PageTransCompound(page)) {
> >  			if (page_mapcount(page) != 1)
> >  				break;
> > +			if (pageout_anon_only && !PageAnon(page))
> > +				break;
> >  			get_page(page);
> >  			if (!trylock_page(page)) {
> >  				put_page(page);
> > @@ -459,6 +466,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		if (!PageLRU(page) || page_mapcount(page) != 1)
> >  			continue;
> >
> > +		if (pageout_anon_only && !PageAnon(page))
> > +			continue;
> > +
>
> The added PageAnon()s probably do not have a measurable performance
> impact, but are not ideal when walking a large anonymous mapping (as
> '->can_pageout_file' is zero for anon mappings).
> Could the code be re-structured so that PageAnon() is only tested when
> filtering is needed?
> Say;
>   if (pageout_anon_only_filter && !PageAnon(page)) {
>       continue;
>   }
> where 'pageout_anon_only_filter' is only set for a private named
> mapping when we do not have write perms on the backing object. It
> would not be set for anon mappings.
>

Understood. As you suggested, the PageAnon() check can be eliminated
for an anon mapping. I will make the necessary changes.

> >
> >  	VM_BUG_ON_PAGE(PageTransCompound(page), page);
> >
> >  	if (pte_young(ptent)) {
> > @@ -541,11 +551,13 @@ static long madvise_cold(struct vm_area_struct *vma,
> >
> >  static void madvise_pageout_page_range(struct mmu_gather *tlb,
> >  			struct vm_area_struct *vma,
> > -			unsigned long addr, unsigned long end)
> > +			unsigned long addr, unsigned long end,
> > +			bool can_pageout_file)
> >  {
> >  	struct madvise_walk_private walk_private = {
> >  		.pageout = true,
> >  		.tlb = tlb,
> > +		.can_pageout_file = can_pageout_file,
> >  	};
> >
> >  	tlb_start_vma(tlb, vma);
> > @@ -553,10 +565,8 @@ static void madvise_pageout_page_range(struct mmu_gather *tlb,
> >  	tlb_end_vma(tlb, vma);
> >  }
> >
> > -static inline bool can_do_pageout(struct vm_area_struct *vma)
> > +static inline bool can_do_file_pageout(struct vm_area_struct *vma)
> >  {
> > -	if (vma_is_anonymous(vma))
> > -		return true;
> >  	if (!vma->vm_file)
> >  		return false;
> >  	/*
> > @@ -576,17 +586,23 @@ static long madvise_pageout(struct vm_area_struct *vma,
> >  {
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	struct mmu_gather tlb;
> > +	bool can_pageout_file;
> >
> >  	*prev = vma;
> >  	if (!can_madv_lru_vma(vma))
> >  		return -EINVAL;
> >
> > -	if (!can_do_pageout(vma))
> > -		return 0;
>
> The removal of this test results in a process, which cannot get write
> perms for a shared named mapping, performing a 'walk'. As such a
> mapping cannot have anon pages, this walk will be a no-op. Not sure
> why a well-behaved program would do a MADV_PAGEOUT on such a mapping,
> but if one does, this could be considered a (minor performance)
> regression.
> As madvise_pageout() can easily filter this case, it might be worth
> adding a check.
>

Got it. We can take care of this edge case by rejecting shared
mappings, i.e. !!(vma->vm_flags & VM_MAYSHARE) == 1, where the process
has no write permission.

> >
> > +	/*
> > +	 * If the VMA belongs to a private file mapping, there can be private
> > +	 * dirty pages which can be paged out even if this process is neither
> > +	 * owner nor write capable of the file. Cache the file access check
> > +	 * here and use it later during page walk.
> > +	 */
> > +	can_pageout_file = can_do_file_pageout(vma);
> >
> >  	lru_add_drain();
> >  	tlb_gather_mmu(&tlb, mm);
> > -	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
> > +	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr, can_pageout_file);
> >  	tlb_finish_mmu(&tlb);
> >
> >  	return 0;
> > --
> > 2.7.4
> >
>

Thanks,
Pavan