From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5838C6FD1C for ; Mon, 13 Mar 2023 14:10:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C9016B0072; Mon, 13 Mar 2023 10:10:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 07A046B0074; Mon, 13 Mar 2023 10:10:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E83826B0075; Mon, 13 Mar 2023 10:10:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D88526B0072 for ; Mon, 13 Mar 2023 10:10:29 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B19DE120C45 for ; Mon, 13 Mar 2023 14:10:29 +0000 (UTC) X-FDA: 80564060178.05.A9F6E48 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf04.hostedemail.com (Postfix) with ESMTP id AFF9540016 for ; Mon, 13 Mar 2023 14:10:24 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=dmbg7Yzs; dmarc=pass (policy=none) header.from=ibm.com; spf=pass (imf04.hostedemail.com: domain of imbrenda@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=imbrenda@linux.ibm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1678716625; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jV6w5o+g9Zg6pNY2GerBin6AWWVs5FUi64qFIY/KAYM=; b=x/8DGWcGu9yhuC6qON8tM4QMY7aqJw3tDiRtLNFwHTVBkjvDWP5Tx8KWfJ550k2cB/d2is IfB/auCzlh8fJarkTFQmIXedV4X75puk8J0CZ+csTuewE+3XtivGMFbdnGkLt8A4fAm6z9 sa/yIGKEc7XEe76t31ol5sTtXku+EF8= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=dmbg7Yzs; dmarc=pass (policy=none) header.from=ibm.com; spf=pass (imf04.hostedemail.com: domain of imbrenda@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=imbrenda@linux.ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1678716625; a=rsa-sha256; cv=none; b=CG5NWaMSZHowLlJ/c/mBoIsb7sgWHc/qxQGJ1HykvxdoHmAHtdAg2V7anp2MNTPJbuMDQ3 TGu6bB9AbJd3WdZ9ybKTtPHCRElYzkSb+jIowacnRIPJV3zr6881yz8duQYx6cNmIJDEx7 7UuYACk953MExjUsMsW02OqqpOe4jG0= Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32DDfRtM003549; Mon, 13 Mar 2023 14:10:23 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=date : from : to : cc : subject : message-id : in-reply-to : references : content-type : content-transfer-encoding : mime-version; s=pp1; bh=jV6w5o+g9Zg6pNY2GerBin6AWWVs5FUi64qFIY/KAYM=; b=dmbg7YzsMXn/1MsgVDEQGxJzdPxyeF0p7ycC828pWiJVuLQVHgFbmhubx/fQ/Az5ui1I dsWtysLvp1qILwC5flB9jTNjSS1Fj+7zV7KZ0EYjHA6Cyn5OEP1YvMh0ri9tERdf/DSb 4voWnIGwbZ7UXhiVnt9j6hViVnrsitrfk2Py1n4yZek3duTpn0z2n7ReDhPxhwY6r701 rNIiVipCUBNKC8TI8MXqcojGtbkG2WrtcaRlLdryA13L8nFVMPCvwJiNxUTgWDHnA/eU WzNrQu0VbzSLTG4juwsv8oGG+Q8gm1p0FU1CaVNE3JKDvca1EY6EwNLwEGtHFiX8bHVB DA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3pa3dkv0u9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Mar 2023 14:10:22 +0000 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 32DCQt0m026782; Mon, 13 Mar 2023 14:10:22 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3pa3dkv0t0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Mar 2023 14:10:21 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 32D23MwT028663; Mon, 13 Mar 2023 14:10:19 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma04ams.nl.ibm.com (PPS) with ESMTPS id 3p8h96kfdk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Mar 2023 14:10:19 +0000 Received: from smtpav07.fra02v.mail.ibm.com (smtpav07.fra02v.mail.ibm.com [10.20.54.106]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 32DEAGWL35127802 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 Mar 2023 14:10:17 GMT Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E402120043; Mon, 13 Mar 2023 14:10:16 +0000 (GMT) Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A6EF020040; Mon, 13 Mar 2023 14:10:16 +0000 (GMT) Received: from p-imbrenda (unknown [9.152.224.56]) by smtpav07.fra02v.mail.ibm.com (Postfix) with ESMTP; Mon, 13 Mar 2023 14:10:16 +0000 (GMT) Date: Mon, 13 Mar 2023 15:10:14 +0100 From: Claudio Imbrenda To: David Hildenbrand Cc: yang.yang29@zte.com.cn, akpm@linux-foundation.org, jiang.xuexin@zte.com.cn, linux-kernel@vger.kernel.org, linux-mm@kvack.org, ran.xiaokai@zte.com.cn, xu.xin.sc@gmail.com, xu.xin16@zte.com.cn, Hugh Dickins Subject: Re: [PATCH v6 0/6] ksm: support tracking KSM-placed zero-pages Message-ID: <20230313151014.5e17fc19@p-imbrenda> In-Reply-To: <9d7a8be3-ee9e-3492-841b-a0af9952ef36@redhat.com> References: <202302100915227721315@zte.com.cn> <9d7a8be3-ee9e-3492-841b-a0af9952ef36@redhat.com> Organization: IBM X-Mailer: Claws Mail 4.1.1 (GTK 3.24.35; x86_64-redhat-linux-gnu) Content-Type: text/plain; charset=US-ASCII X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 3M3Wmu_ynbX26bS7ncbujnl5i23aFBXh X-Proofpoint-GUID: rttlbGPjc26P2nkCxkdZdc1Tl124ODlP Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-13_07,2023-03-13_02,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 priorityscore=1501 malwarescore=0 impostorscore=0 phishscore=0 bulkscore=0 mlxlogscore=401 suspectscore=0 clxscore=1015 mlxscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2303130112 X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: AFF9540016 X-Stat-Signature: ayf89ihkwwh1x9w6whz97zxfqkmt53cx X-HE-Tag: 1678716624-118729 X-HE-Meta: U2FsdGVkX194UUWh7riEFp7808HrGIndW5OFcGfb0iTaJ5VnOozYhXvrqna8VXJqbFBE1DDeZUpTsTHj2aoK3FzFNr6x7H2YavovOkz/x25KhSR+1SkgFdUk2QTf+bgsuYS5yHvNnA4BSnPhWGxUJ6nKJhB9ouu4cbscHvagLntm922iphbJnzzR0qXu8U9zZaXABDayr9igNZMEqUA2jbjEQXbYGBu99ohx8WThCC6it3wSl6zdtf9fvE3waJTCOr8WrOTDsHeyUQAOAuS11NC5r/n6jLkEwatXSlwZqaGMnB+myhlVsO9q5bBsiBbn1tM/2NIh1gpOMx5lEFn5013idmN32qqYajVQ5x/Xq2RXb+jmFMqdamBz92S743FCb4pbPWoiB8zNG+ABLsKA4sKOcek6HWLe0dYgsRQiVBxHnORQYo/MjYWwSxVdtEjr2jJHZMv/lTEISG6xIksPExO5em+P1pZN/8x/M4lRoft/iLzbVGdygJS2GtWuguqHW4KYat1saP8ilTA6OH6fgH7hBJ8wE8357d6kyoX5g5iV5Wf2p21LkiKz07oL3DBRIfuqvlKSHlUHx4hN35a3HHB1h/XMlv5gETXCZq2tpKvlNI/lRE227SXTdM4+iDdGKzWmfifXLJclq80ScQ5Ebd3r1WrE7bmHD2s1jo6AJ9aVtyYDRhdRwJBKjjfV3cjN98ulxGnzueQASEHkDcl8P+v4Msxfst9fZr9/2cyKVz0B58pFuWftQsmYBkYkswuhlMN8q7hMTX01hGf1caP9P7xWc4iFpMYd/kJ5R37naKToMSp5T1yB4F02AZN2qEOBYlqK07tsGa/At2ZwJvFLW8tkxwIHvRSJupvFgXqN/0xZP3qLNALkNa3Jb2Qc09wN99nLsh4RZO9fwxI/TuW3Yz+JuJPAjNdeGOcNew2EDBLzvGe+78+Hxy8mlZ5yzfQOZsra9SuW6x+cTETl2Y+ KJv1t+NX FDiAGiCSP/um/aUunAuFsXoyoTsxNKgfs7Rc6+WcKDebq35NUY0x9QiQZdrCdxMzFd0FiaYjXwh4jwQ4auLOONK05YpnC95rhfIiCuU6hP1j00olSboO/QBd9O3Nf0IRAj4E6xFEq7rSqEZqiouCjji2TI/ItPxJvFd+dQAiR+3O0/O68HR/pH4JQisEJsK3swVPMpMxhNLw65smEuUeBXM5N3wgvq90N8bNp1/F+dxcoyeakBJM0jZjaSszl2unoVDFIAbpk4aN40hTS8xgPv/Zkm07A7GgYB4FSEAYGoCwLE8WZJTwRYpBmtGt85ENem9lsFW4aIm+zhRECBTwLcUfzRrUNTV2HxacZTpIz6AbMc/aI4OcHFuugvI2qvCtuuUJO7+vtsgxfNXQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 13 Mar 2023 14:03:33 +0100 David Hildenbrand wrote: > On 10.02.23 02:15, yang.yang29@zte.com.cn wrote: > > From: xu xin > > > > Hi, > > sorry for the late follow-up. Still wrapping my head around this and > possible alternatives. I hope we'll get some comments from others as > well about the basic approach. > > > The core idea of this patch set is to enable users to perceive the number of any > > pages merged by KSM, regardless of whether use_zero_page switch has been turned > > on, so that users can know how much free memory increase is really due to their > > madvise(MERGEABLE) actions. But the problem is, when enabling use_zero_pages, > > all empty pages will be merged with kernel zero pages instead of with each > > other as use_zero_pages is disabled, and then these zero-pages are no longer > > monitored by KSM. > > > > The motivations for me to do this contains three points: > > > > 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not* > > unshare the shared zeropage as placed by KSM (which is against the > > MADV_UNMERGEABLE documentation at least); see the link: > > https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/ > > > > 2) We cannot know how many pages are zero pages placed by KSM when > > enabling use_zero_pages, which hides the critical information about > > how much actual memory are really saved by KSM. Knowing how many > > ksm-placed zero pages are helpful for user to use the policy of madvise > > (MERGEABLE) better because they can see the actual profit brought by KSM. > > > > 3) The zero pages placed-by KSM are different from those initial empty page > > (filled with zeros) which are never touched by applications. The former > > is active-merged by KSM while the later have never consume actual memory. > > > > I agree with all of the above, but it's still unclear to me if there is > a real downside to a simpler approach: > > (1) Tracking the shared zeropages. That would be fairly easy: whenever > we map/unmap a shared zeropage, we simply update the counter. > > (2) Unmerging all shared zeropages inside the VMAs during > MADV_UNMERGEABLE. > > (3) Documenting that MADV_UNMERGEABLE will also unmerge the shared > zeropage when toggle xy is flipped. > > It's certainly simpler and doesn't rely on the rmap item. See below. I would surely prefer a simpler approach > > > use_zero_pages is useful, not only because of cache colouring as described > > in doc, but also because use_zero_pages can accelerate merging empty pages > > when there are plenty of empty pages (full of zeros) as the time of > > page-by-page comparisons (unstable_tree_search_insert) is saved. So we hope to > > implement the support for ksm zero page tracking without affecting the feature > > of use_zero_pages. > > > > Zero pages may be the most common merged pages in actual environment(not only VM but > > also including other application like containers). Enabling use_zero_pages in the > > environment with plenty of empty pages(full of zeros) will be very useful. Users and > > app developer can also benefit from knowing the proportion of zero pages in all > > merged pages to optimize applications. > > > > I agree with that point, especially after I read in a paper that KSM > applied to some applications mainly deduplicates pages filled with 0s. > So it seems like a reasonable thing to optimize for. > > > With the patch series, we can both unshare zero-pages(KSM-placed) accurately > > and count ksm zero pages with enabling use_zero_pages. > > The problem with this approach I see is that it fundamentally relies on > the rmap/stable-tree to detect whether a zeropage was placed or not. > > I was wondering, why we even need an rmap item *at all* anymore. Why > can't we place the shared zeropage an call it a day (remove the rmap > item)? Once we placed a shared zeropage, the next KSM scan should better > just ignore it, it's already deduplicated. > > So if most pages we deduplicate are shared zeropages, it would be quite > interesting to reduce the memory overhead and avoid rmap items, instead > of building new functionality on top of it? > > > > If we'd really want to identify whether a zeropage was deduplciated by > KSM, we could try storing that information inside the PTE instead of this is interesting, but needs caution, for the reason you mention below > inside the RMAP. Then, we could directly adjust the counter when zapping > the shared zeropage or during MADV_DONTNEED/when unmerging. > > Eventually, we could simply say that > * !pte_dirty(): zeropage placed during fault > * pte_dirty(): zeropage placed by KSM > > Then it would also be easy to adjust counters and unmerge. We'd limit > this handling to known-working architectures initially (spec64 still has > the issue that pte_mkdirty() will set a pte writable ... and my patch to > fix that was not merged yet). We'd have to double-check all > pte_mkdirty/pte_mkclean() callsites. this will be... interesting >