From: David Hildenbrand
Organization: Red Hat
To: yang.yang29@zte.com.cn, akpm@linux-foundation.org
Cc: imbrenda@linux.ibm.com, jiang.xuexin@zte.com.cn, linux-kernel@vger.kernel.org, linux-mm@kvack.org, ran.xiaokai@zte.com.cn, xu.xin.sc@gmail.com, xu.xin16@zte.com.cn, Hugh Dickins
Date: Mon, 13 Mar 2023 14:03:33 +0100
Subject: Re: [PATCH v6 0/6] ksm: support tracking KSM-placed zero-pages
Message-ID: <9d7a8be3-ee9e-3492-841b-a0af9952ef36@redhat.com>
In-Reply-To: <202302100915227721315@zte.com.cn>
References: <202302100915227721315@zte.com.cn>

On 10.02.23 02:15, yang.yang29@zte.com.cn wrote:
> From: xu xin
>

Hi,

sorry for the late follow-up. Still wrapping my head around this and
possible alternatives.
I hope we'll get some comments from others as well about the basic
approach.

> The core idea of this patch set is to enable users to perceive the number of any
> pages merged by KSM, regardless of whether use_zero_page switch has been turned
> on, so that users can know how much free memory increase is really due to their
> madvise(MERGEABLE) actions. But the problem is, when enabling use_zero_pages,
> all empty pages will be merged with kernel zero pages instead of with each
> other as use_zero_pages is disabled, and then these zero-pages are no longer
> monitored by KSM.
>
> The motivations for me to do this contains three points:
>
> 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
>    unshare the shared zeropage as placed by KSM (which is against the
>    MADV_UNMERGEABLE documentation at least); see the link:
>    https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
>
> 2) We cannot know how many pages are zero pages placed by KSM when
>    enabling use_zero_pages, which hides the critical information about
>    how much actual memory are really saved by KSM. Knowing how many
>    ksm-placed zero pages are helpful for user to use the policy of madvise
>    (MERGEABLE) better because they can see the actual profit brought by KSM.
>
> 3) The zero pages placed-by KSM are different from those initial empty page
>    (filled with zeros) which are never touched by applications. The former
>    is active-merged by KSM while the later have never consume actual memory.
>

I agree with all of the above, but it's still unclear to me if there is
a real downside to a simpler approach:

(1) Tracking the shared zeropages. That would be fairly easy: whenever
    we map/unmap a shared zeropage, we simply update the counter.

(2) Unmerging all shared zeropages inside the VMAs during
    MADV_UNMERGEABLE.

(3) Documenting that MADV_UNMERGEABLE will also unmerge the shared
    zeropage when toggle xy is flipped.

It's certainly simpler and doesn't rely on the rmap item.
See below.

> use_zero_pages is useful, not only because of cache colouring as described
> in doc, but also because use_zero_pages can accelerate merging empty pages
> when there are plenty of empty pages (full of zeros) as the time of
> page-by-page comparisons (unstable_tree_search_insert) is saved. So we hope to
> implement the support for ksm zero page tracking without affecting the feature
> of use_zero_pages.
>
> Zero pages may be the most common merged pages in actual environment(not only VM but
> also including other application like containers). Enabling use_zero_pages in the
> environment with plenty of empty pages(full of zeros) will be very useful. Users and
> app developer can also benefit from knowing the proportion of zero pages in all
> merged pages to optimize applications.
>

I agree with that point, especially after I read in a paper that KSM
applied to some applications mainly deduplicates pages filled with 0s.
So it seems like a reasonable thing to optimize for.

> With the patch series, we can both unshare zero-pages(KSM-placed) accurately
> and count ksm zero pages with enabling use_zero_pages.

The problem I see with this approach is that it fundamentally relies on
the rmap/stable-tree to detect whether a zeropage was placed or not.

I was wondering why we even need an rmap item *at all* anymore. Why
can't we place the shared zeropage and call it a day (remove the rmap
item)? Once we have placed a shared zeropage, the next KSM scan should
simply ignore it; it's already deduplicated.

So if most pages we deduplicate are shared zeropages, it would be quite
interesting to reduce the memory overhead and avoid rmap items, instead
of building new functionality on top of them?

If we really wanted to identify whether a zeropage was deduplicated by
KSM, we could try storing that information inside the PTE instead of
inside the RMAP. Then, we could directly adjust the counter when zapping
the shared zeropage or during MADV_DONTNEED/when unmerging.
Eventually, we could simply say that

* !pte_dirty(): zeropage placed during fault
* pte_dirty(): zeropage placed by KSM

Then it would also be easy to adjust counters and unmerge. We'd limit
this handling to known-working architectures initially (sparc64 still
has the issue that pte_mkdirty() will set a pte writable ... and my
patch to fix that was not merged yet). We'd have to double-check all
pte_mkdirty()/pte_mkclean() callsites.

-- 
Thanks,

David / dhildenb