From: David Hildenbrand
Organization: Red Hat
Date: Wed, 15 Mar 2023 22:45:31 +0100
To: Johannes Weiner
Cc: Stefan Roesch, kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, akpm@linux-foundation.org, Mike Kravetz
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support

On 15.03.23 22:19, Johannes Weiner wrote:
> On Wed, Mar 15, 2023 at 05:05:47PM -0400, Johannes Weiner wrote:
>> On Wed, Mar 15, 2023 at 09:03:57PM +0100, David Hildenbrand wrote:
>>> On 10.03.23 19:28, Stefan Roesch wrote:
>>>> So far KSM can only be enabled by calling madvise for memory regions. To
>>>> be able to use KSM for more workloads, KSM needs to have the ability to be
>>>> enabled / disabled at the process / cgroup level.
>>>>
>>>> Use case 1:
>>>> The madvise call is not available in the programming language. Examples of
>>>> this are programs with forked workloads using a garbage collected language
>>>> without pointers. In such a language madvise cannot be made available.
>>>>
>>>> In addition the addresses of objects get moved around as they are garbage
>>>> collected. KSM sharing needs to be enabled "from the outside" for these types
>>>> of workloads.
>>>>
>>>> Use case 2:
>>>> The same interpreter can also be used for workloads where KSM brings no
>>>> benefit or even has overhead. We'd like to be able to enable KSM on a
>>>> workload-by-workload basis.
>>>>
>>>> Use case 3:
>>>> With the madvise call sharing opportunities are only enabled for the current
>>>> process: it is a workload-local decision. A considerable number of sharing
>>>> opportunities may exist across multiple workloads or jobs. Only a higher level
>>>> entity like a job scheduler or container can know for certain if it's running
>>>> one or more instances of a job. That job scheduler however doesn't have
>>>> the necessary internal workload knowledge to make targeted madvise calls.
>>>>
>>>> Security concerns:
>>>> In previous discussions security concerns have been brought up. The problem is
>>>> that an individual workload does not have the knowledge about what else is
>>>> running on a machine. Therefore it has to be very conservative in what memory
>>>> areas can be shared or not. However, if the system is dedicated to running
>>>> multiple jobs within the same security domain, it's the job scheduler that has
>>>> the knowledge that sharing can be safely enabled and is even desirable.
>>>>
>>>> Performance:
>>>> Experiments with using UKSM have shown a capacity increase of around 20%.
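
The cover letter starts from the existing interface, where a process opts individual memory ranges into KSM with madvise(). The following is a minimal C sketch of that per-range opt-in, assuming Linux built with CONFIG_KSM; the helper name ksm_opt_in() is hypothetical. Marking a range only makes it eligible: ksmd scans it only while /sys/kernel/mm/ksm/run is set to 1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MADV_MERGEABLE in glibc's <sys/mman.h> */
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_MERGEABLE
#define MADV_MERGEABLE 12 /* Linux UAPI value (asm-generic/mman-common.h) */
#endif

/*
 * Hypothetical helper: opt the pages covering [addr, addr + len) into KSM.
 * madvise() needs a page-aligned start, so the start is rounded down and the
 * end rounded up to page boundaries. Returns 0 on success or -errno, e.g.
 * -EINVAL when the kernel has no KSM support.
 */
static int ksm_opt_in(void *addr, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);
    uintptr_t end = ((uintptr_t)addr + len + page - 1) & ~(page - 1);

    if (madvise((void *)start, (size_t)(end - start), MADV_MERGEABLE) != 0)
        return -errno;
    return 0;
}
```

Each caller has to know which of its ranges are worth scanning. That per-range knowledge is exactly what use cases 1 and 3 say is unavailable to a garbage-collected runtime or a job scheduler.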
>>>
>>> Stefan, can you do me a favor and investigate which pages we end up
>>> deduplicating -- especially if it's mostly only the zeropage and if it's
>>> still that significant when disabling THP?
>>>
>>>
>>> I'm currently investigating, with some engineers, enabling KSM on some
>>> selected processes (enabling it blindly on all VMAs of that process
>>> via madvise()).
>>>
>>> One thing we noticed is that such (~50 times) 20MiB processes end up saving
>>> ~2MiB of memory per process. That made me suspicious, because it's the THP
>>> size.
>>>
>>> What I think happens is that we have a 2 MiB area (stack?) and only touch a
>>> single page. We get a whole 2 MiB THP populated. Most of that THP is zeroes.
>>>
>>> KSM somehow ends up splitting that THP and deduplicates all resulting
>>> zeropages. Thus, we "save" 2 MiB. Actually, it's more like we no longer
>>> "waste" 2 MiB. I think the processes with KSM have fewer (no) THPs than the
>>> processes with THP enabled, but I only took a look at a sample of the
>>> process' smaps so far.
>>
>> THP and KSM is indeed an interesting problem. Better TLB hits with
>> THPs, but reduced chance of deduplicating memory - which may or may
>> not result in more IO that outweighs any THP benefits.
>>
>> That said, the service in the experiment referenced above has swap
>> turned on and is under significant memory pressure. Unused splitpages
>> would get swapped out. The difference from KSM was from deduplicating
>> pages that were in active use, not internal THP fragmentation.
>
> Brainfart, my apologies. It could have been the ksm-induced splits
> themselves that allowed the unused subpages to get swapped out in the
> first place.

Yes, it's not easy to spot that this is happening. I just wrote a simple
reproducer to confirm: modifying a single subpage in a bunch of THP ranges
will populate a THP whereby most of the THP is zeroes.
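
The reproducer itself is not part of this mail. The code below is a minimal C sketch of the scenario described above, not the actual reproducer, and it rests on several assumptions:

- THPs are 2 MiB and PMD-sized, as on x86-64 with 4 KiB base pages.
- THP is enabled in "always" or "madvise" mode.
- The helper names mmap_aligned() and touch_single_subpage() are hypothetical.

Each range is mapped 2 MiB-aligned so that a huge page can back it, and then only one subpage is written.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MADV_HUGEPAGE / MADV_MERGEABLE in glibc's <sys/mman.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14 /* Linux UAPI values (asm-generic/mman-common.h) */
#endif
#ifndef MADV_MERGEABLE
#define MADV_MERGEABLE 12
#endif

/* Assumption: 2 MiB PMD-sized THPs (x86-64 with 4 KiB base pages). */
#define THP_SIZE ((size_t)2 << 20)

/* Map 'len' bytes of anonymous memory starting at an 'align'-aligned address
 * by over-allocating and unmapping the unaligned head and tail. 'align' must
 * be a power of two and a multiple of the base page size. */
static char *mmap_aligned(size_t len, size_t align)
{
    assert(align && (align & (align - 1)) == 0);
    size_t total = len + align;
    char *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1);
    size_t head = start - (uintptr_t)raw;
    size_t tail = total - head - len;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)start + len, tail);
    return (char *)start;
}

/* Map one THP-sized, THP-aligned range and dirty a single subpage of it.
 * If THP applies to the range, that one write fault may populate the whole
 * 2 MiB, leaving every other subpage zero-filled. */
static char *touch_single_subpage(size_t subpage_idx, int mergeable)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *area = mmap_aligned(THP_SIZE, THP_SIZE);
    if (!area)
        return NULL;

    /* Best effort: these fail harmlessly if THP or KSM is unavailable. */
    (void)madvise(area, THP_SIZE, MADV_HUGEPAGE);
    if (mergeable)
        (void)madvise(area, THP_SIZE, MADV_MERGEABLE);

    area[(subpage_idx * page) % THP_SIZE] = 1;
    return area;
}
```

To observe the effect, one could call touch_single_subpage() for a few dozen ranges with mergeable set. Then compare the AnonHugePages value in /proc/self/smaps_rollup before and after ksmd has had time to scan (ksmd only runs while /sys/kernel/mm/ksm/run is 1).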

As long as you keep accessing the single subpage via the PMD, I assume the
chances of getting it swapped out are lower, because the folio will be
referenced/dirty.

KSM will come around and split the THP filled mostly with zeroes and
deduplicate the resulting zero pages.

[that's where a zeropage-only KSM could be very valuable eventually I think]

>
> But no, I double checked that workload just now. On a weekly average,
> it has about 50 anon THPs and 12 million regular anon. THP is not a
> factor in the reduction results.

You mean with KSM enabled or with KSM disabled for the process?

Not sure if your observation reliably implies that the scenario described
couldn't have happened, but it's late in Germany already :)

In any case, it would be nice to get a feeling for how much variety there is
in these 20% of deduplicated pages. For example, whether it's 99% the same
page or just a wild collection.

Maybe "cat /sys/kernel/mm/ksm/pages_shared" would already be informative. But
I seem to be getting "126" in my simple example where only zeropages should
get deduplicated, so I have to take another look at the stats tomorrow ...

-- 
Thanks,

David / dhildenb
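
The mail suggests reading /sys/kernel/mm/ksm/pages_shared. The kernel's KSM admin guide documents two related counters. pages_shared is how many shared pages are being used. pages_sharing is how many more sites are sharing them, that is, how much is saved.

Comparing the two gives a rough sense of variety. A pages_sharing/pages_shared ratio far above 1 suggests that a few distinct pages account for most of the savings. Both counters are system-wide, not per process.

Below is a minimal C sketch that reads and compares them. The helper names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs counter such as "126\n". Returns 0 and stores the value on
 * success, or -1 if the text is not a non-negative decimal number. */
static int parse_counter(const char *text, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno || end == text || v < 0)
        return -1;
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Read /sys/kernel/mm/ksm/<name>. Returns the value, or -1 if the file is
 * missing (e.g. no KSM support) or its content cannot be parsed. */
static long read_ksm_counter(const char *name)
{
    char path[256], buf[64];
    long v;

    snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", name);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (!ok || parse_counter(buf, &v) != 0)
        return -1;
    return v;
}

/* Average number of pages saved per KSM page: pages_sharing counts the
 * additional sites mapping each shared page. Returns 0.0 if nothing is shared. */
static double sharing_ratio(long pages_shared, long pages_sharing)
{
    return pages_shared > 0 ? (double)pages_sharing / (double)pages_shared : 0.0;
}
```

A typical use would be sharing_ratio(read_ksm_counter("pages_shared"), read_ksm_counter("pages_sharing")) after ksmd has completed a few full scans.

The /sys/kernel/mm/ksm/use_zero_pages knob is also worth checking here. When it is set to 1, empty pages are merged with the kernel zero page instead of with each other. This changes how zero-filled pages show up in these statistics.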