From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20230310182851.2579138-1-shr@devkernel.io> <20230328160914.5b6b66e4a5ad39e41fd63710@linux-foundation.org> <37dcd52a-2e32-c01d-b805-45d862721fbc@redhat.com>
User-agent: mu4e 1.6.11; emacs 28.2.50
From: Stefan Roesch <shr@devkernel.io>
To: David Hildenbrand
Cc: Johannes Weiner, Andrew Morton, kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Hugh Dickins
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support
Date: Thu, 06 Apr 2023 09:59:32 -0700
MIME-Version: 1.0
Content-Type: text/plain

Stefan Roesch writes:

> David Hildenbrand writes:
>
>>>>> Obviously we could spend months analysing which exact allocations are
>>>>> identical, and then more months or years reworking the architecture to
>>>>> deduplicate them by hand and in userspace. But this isn't practical,
>>>>> and KSM is specifically for cases where this isn't practical.
>>>>>
>>>>> Based on your request in the previous thread, we investigated whether
>>>>> the boost was coming from the unintended side effects of KSM splitting
>>>>> THPs. This wasn't the case.
>>>>>
>>>>> If you have other theories on how the results could be bogus, we'd be
>>>>> happy to investigate those as well. But you have to let us know what
>>>>> you're looking for.
>>>>>
>>>>
>>>> Maybe I'm bad at making such requests, but
>>>>
>>>> "Stefan, can you do me a favor and investigate which pages we end up
>>>> deduplicating -- especially if it's mostly only the zeropage and if it's
>>>> still that significant when disabling THP?"
>>>>
>>>> "In any case, it would be nice to get a feeling for how much variety
>>>> there is in these 20% of deduplicated pages."
>>>>
>>>> is pretty clear to me. And it shouldn't take months.
>>>>
>>
>> Just to clarify: the details I requested are not meant to decide whether to
>> reject the patch set (I understand that it can be beneficial to have); I
>> primarily want to understand if we're really dealing with a workload where KSM
>> is able to deduplicate pages that are non-trivial, to maybe figure out if there
>> are other workloads that could similarly benefit -- or if we could optimize KSM
>> for these specific cases or avoid the memory deduplication altogether.
>>
>> In contrast to e.g.:
>>
>> 1) THP resulted in many zeropages we end up deduplicating again. The THP
>> placement was unfortunate.
>>
>> 2) Unoptimized memory allocators that leave many identical pages mapped
>> after freeing up memory (e.g., zeroed pages, pages all filled with
>> poison values) instead of, e.g., using MADV_DONTNEED to free up that
>> memory.
>>
>
> I repeated an experiment with and without KSM. In terms of THP there is
> no big difference between the two. On a 64GB main-memory machine I see
> between 100 and 400MB in AnonHugePages.
>
>>> /sys/kernel/mm/ksm/pages_shared is over 10000 when we run this on an
>>> Instagram workload. The workload consists of 36 processes plus a few
>>> sidecar processes.
>>
>> Thanks! To which value is /sys/kernel/mm/ksm/max_page_sharing set in that
>> environment?
>>
>
> It's set to the standard value of 256.
>
> In the meantime I have run experiments with different settings for
> pages_to_scan. With the default value of 100, we only get a relatively
> small benefit from KSM. If I increase the value to, for instance, 2000
> or 3000, the savings are substantial. (The workload is memory bound,
> not CPU bound.)
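For anyone who wants to reproduce this sweep: the knobs above are plain
sysfs files under /sys/kernel/mm/ksm (see
Documentation/admin-guide/mm/ksm.rst), so the whole experiment can be
scripted. Below is a minimal sketch of the userspace side; it is an
illustration only, not part of the patch set. The helper names are mine,
error handling is abbreviated, and root privileges are assumed for the
writes.

#include <stdio.h>

static void ksm_write(const char *knob, const char *val)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", knob);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return;
	}
	fputs(val, f);
	fclose(f);
}

static long ksm_read(const char *knob)
{
	char path[128];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", knob);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	ksm_write("pages_to_scan", "3000");	/* scan rate used in this experiment */
	ksm_write("run", "1");			/* start/continue the ksmd scanner */

	/* ... let the workload run, then sample the counters ... */
	printf("pages_shared:   %ld\n", ksm_read("pages_shared"));
	printf("pages_sharing:  %ld\n", ksm_read("pages_sharing"));
	printf("general_profit: %ld\n", ksm_read("general_profit"));
	return 0;
}

The same pattern covers max_page_sharing and use_zero_pages in the runs
below.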
>
> Here are some stats for setting pages_to_scan to 3000:
>
> full_scans: 560
> general_profit: 20620539008
> max_page_sharing: 256
> merge_across_nodes: 1
> pages_shared: 125446
> pages_sharing: 5259506
> pages_to_scan: 3000
> pages_unshared: 1897537
> pages_volatile: 12389223
> run: 1
> sleep_millisecs: 20
> stable_node_chains: 176
> stable_node_chains_prune_millisecs: 2000
> stable_node_dups: 2604
> use_zero_pages: 0
> zero_pages_sharing: 0
>
>
>> What would be interesting is pages_shared after max_page_sharing was set to a
>> very high number, such that pages_shared does not include duplicates. Then
>> pages_shared actually expresses how many different pages we deduplicate. No
>> need to run without THP in that case.
>>
>
> That's on my list for the next set of experiments.
>

In the new experiment I increased the max_page_sharing value to 16384.
This reduced the number of stable_node_dups considerably (it's around 3%
of the previous value). However, pages_sharing is still very high for
this workload. (A back-of-the-envelope conversion of these counters
into bytes is sketched at the end of this mail.)

full_scans: 138
general_profit: 24442268608
max_page_sharing: 16384
merge_across_nodes: 1
pages_shared: 144590
pages_sharing: 6230983
pages_to_scan: 3000
pages_unshared: 2120307
pages_volatile: 14590780
run: 1
sleep_millisecs: 20
stable_node_chains: 23
stable_node_chains_prune_millisecs: 2000
stable_node_dups: 78
use_zero_pages: 0
zero_pages_sharing: 0

>> Similarly, enabling "use_zero_pages" could highlight if your workload ends up
>> deduplicating a lot of zeropages. But maxing out max_page_sharing would be
>> sufficient to understand what's happening.
>>
>
> I already ran experiments with use_zero_pages, but they didn't make a
> difference. I'll repeat the experiment with a higher pages_to_scan
> value.
>
>>> Each of these individual processes has around 500MB in KSM pages.
>>
>> That's really a lot, thanks.
>>
>>> Also to give some idea for individual VMAs:
>>> 7ef5d5600000-7ef5e5600000 rw-p 00000000 00:00 0 (Size: 262144 KB, KSM:
>>> 73160 KB)
>>
>> I'll have a look at the patches today.
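
As referenced above, here is a back-of-the-envelope conversion of the
second run's counters into bytes. This is a sketch only: it assumes
4 KiB pages (the x86-64 default) and takes the numbers verbatim from the
stats; the gap between the two figures should be roughly KSM's own
metadata overhead (rmap_items), which general_profit subtracts out.

#include <stdio.h>

int main(void)
{
	const long long page_size      = 4096;		/* assumed PAGE_SIZE */
	const long long pages_sharing  = 6230983;	/* from the stats above */
	const long long general_profit = 24442268608;	/* from the stats above */

	/* gross amount of deduplicated memory: ~25.5 GB */
	long long gross = pages_sharing * page_size;

	printf("gross dedup:     %lld bytes (%.1f GB)\n", gross, gross / 1e9);
	printf("general_profit:  %lld bytes (%.1f GB)\n",
	       general_profit, general_profit / 1e9);
	printf("metadata cost:  ~%lld bytes\n", gross - general_profit);
	return 0;
}

With these inputs the gross saving works out to about 25.5 GB against a
general_profit of about 24.4 GB, i.e. roughly 1 GB of bookkeeping
overhead for the deduplication achieved.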