From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <70340c6e-cf3e-746a-4893-7978e11d3817@redhat.com>
Date: Wed, 15 Mar 2023 22:47:12 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support
From: David Hildenbrand
To: Johannes Weiner
Cc: Stefan Roesch, kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com,
 mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
 akpm@linux-foundation.org, Mike Kravetz
References: <20230310182851.2579138-1-shr@devkernel.io>
 <273a2f82-928f-5ad1-0988-1a886d169e83@redhat.com>
 <20230315210545.GA116016@cmpxchg.org>
 <20230315211927.GB116016@cmpxchg.org>
Organization: Red Hat
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 15.03.23 22:45, David Hildenbrand wrote:
> On 15.03.23 22:19, Johannes Weiner wrote:
>> On Wed, Mar 15, 2023 at 05:05:47PM -0400, Johannes Weiner wrote:
>>> On Wed, Mar 15, 2023 at 09:03:57PM +0100, David Hildenbrand wrote:
>>>> On 10.03.23 19:28, Stefan Roesch wrote:
>>>>> So far KSM can only be enabled by calling madvise for memory regions.
>>>>> To be able to use KSM for more workloads, KSM needs to have the ability to be
>>>>> enabled / disabled at the process / cgroup level.
>>>>>
>>>>> Use case 1:
>>>>> The madvise call is not available in the programming language. An example of
>>>>> this is programs with forked workloads using a garbage collected language without
>>>>> pointers. In such a language madvise cannot be made available.
>>>>>
>>>>> In addition the addresses of objects get moved around as they are garbage
>>>>> collected. KSM sharing needs to be enabled "from the outside" for these types of
>>>>> workloads.
>>>>>
>>>>> Use case 2:
>>>>> The same interpreter can also be used for workloads where KSM brings no
>>>>> benefit or even has overhead. We'd like to be able to enable KSM on a workload
>>>>> by workload basis.
>>>>>
>>>>> Use case 3:
>>>>> With the madvise call sharing opportunities are only enabled for the current
>>>>> process: it is a workload-local decision. A considerable number of sharing
>>>>> opportunities may exist across multiple workloads or jobs. Only a higher level
>>>>> entity like a job scheduler or container can know for certain if it's running
>>>>> one or more instances of a job. That job scheduler however doesn't have
>>>>> the necessary internal workload knowledge to make targeted madvise calls.
>>>>>
>>>>> Security concerns:
>>>>> In previous discussions security concerns have been brought up. The problem is
>>>>> that an individual workload does not have the knowledge about what else is
>>>>> running on a machine. Therefore it has to be very conservative in what memory
>>>>> areas can be shared or not. However, if the system is dedicated to running
>>>>> multiple jobs within the same security domain, it's the job scheduler that has
>>>>> the knowledge that sharing can be safely enabled and is even desirable.
>>>>>
>>>>> Performance:
>>>>> Experiments with using UKSM have shown a capacity increase of around 20%.
>>>>
>>>> Stefan, can you do me a favor and investigate which pages we end up
>>>> deduplicating -- especially if it's mostly only the zeropage and if it's
>>>> still that significant when disabling THP?
>>>>
>>>>
>>>> I'm currently investigating with some engineers on playing with enabling KSM
>>>> on some selected processes (enabling it blindly on all VMAs of that process
>>>> via madvise() ).
>>>>
>>>> One thing we noticed is that such (~50 times) 20MiB processes end up saving
>>>> ~2MiB of memory per process. That made me suspicious, because it's the THP
>>>> size.
>>>>
>>>> What I think happens is that we have a 2 MiB area (stack?) and only touch a
>>>> single page. We get a whole 2 MiB THP populated. Most of that THP is zeroes.
>>>>
>>>> KSM somehow ends up splitting that THP and deduplicates all resulting
>>>> zeropages. Thus, we "save" 2 MiB. Actually, it's more like we no longer
>>>> "waste" 2 MiB. I think the processes with KSM have less (none) THP than the
>>>> processes with THP enabled, but I only took a look at a sample of the
>>>> process' smaps so far.
>>>
>>> THP and KSM is indeed an interesting problem. Better TLB hits with
>>> THPs, but reduced chance of deduplicating memory - which may or may
>>> not result in more IO that outweighs any THP benefits.
>>>
>>> That said, the service in the experiment referenced above has swap
>>> turned on and is under significant memory pressure. Unused splitpages
>>> would get swapped out. The difference from KSM was from deduplicating
>>> pages that were in active use, not internal THP fragmentation.
>>
>> Brainfart, my apologies. It could have been the ksm-induced splits
>> themselves that allowed the unused subpages to get swapped out in the
>> first place.
>
> Yes, it's not easy to spot that this is implemented. I just wrote a
> simple reproducer to confirm: modifying a single subpage in a bunch of
> THP ranges will populate a THP whereby most of the THP is zeroes.
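
For illustration, a reproducer along those lines could look roughly like
the sketch below. It is not the program mentioned above. The function name
touch_one_subpage_per_thp(), the THP_SIZE constant (2 MiB, the PMD size on
x86-64) and the decision to call madvise(MADV_HUGEPAGE) and
madvise(MADV_MERGEABLE) on the ranges are assumptions made for the example.
Both madvise() calls are best effort, so the sketch also runs on kernels
without THP or KSM; it just won't show the effect there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: PMD-sized THPs are 2 MiB (true on x86-64). */
#define THP_SIZE (2UL << 20)

/*
 * Map `nr` THP-sized, THP-aligned anonymous ranges, ask for THP and KSM
 * on them, and write a single byte into the first subpage of each range.
 * With THP active, every range should end up backed by one huge page
 * that is zero everywhere except in its first subpage.
 *
 * Returns the aligned start of the ranges, or NULL if mmap() failed.
 */
char *touch_one_subpage_per_thp(size_t nr)
{
	assert(nr > 0);

	/* One extra range so we can align the start to THP_SIZE. */
	size_t len = (nr + 1) * THP_SIZE;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	char *area = (char *)(((uintptr_t)raw + THP_SIZE - 1) &
			      ~(uintptr_t)(THP_SIZE - 1));

	/* Best effort: either call may fail if the feature is not built in. */
	(void)madvise(area, nr * THP_SIZE, MADV_HUGEPAGE);
	(void)madvise(area, nr * THP_SIZE, MADV_MERGEABLE);

	/* Touch exactly one subpage per range; the rest stays zero. */
	for (size_t i = 0; i < nr; i++)
		area[i * THP_SIZE] = 1;

	return area;
}
```

To observe the behaviour described here, one would call something like
touch_one_subpage_per_thp(50), keep the process alive, make sure ksmd is
running (/sys/kernel/mm/ksm/run set to 1), and then compare AnonHugePages
in /proc/<pid>/smaps before and after KSM has scanned the ranges.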
>
> As long as you keep accessing the single subpage via the PMD I assume
> chances of getting it swapped out are lower, because the folio will be
> referenced/dirty.
>
> KSM will come around and split the THP filled mostly with zeroes and
> deduplicate the resulting zero pages.
>
> [that's where a zeropage-only KSM could be very valuable eventually I think]
>
>>
>> But no, I double checked that workload just now. On a weekly average,
>> it has about 50 anon THPs and 12 million regular anon. THP is not a
>> factor in the reduction results.
>
> You mean with KSM enabled or with KSM disabled for the process? Not sure
> if your observation reliably implies that the scenario described
> couldn't have happened, but it's late in Germany already :)
>
> In any case, it would be nice to get a feeling for how much variety there
> is in these 20% of deduplicated pages. For example, if it's 99% the same
> page or just a wild collection.
>
> Maybe "cat /sys/kernel/mm/ksm/pages_shared" would be expressive already.
> But I seem to be getting "126" in my simple example where only zeropages
> should get deduplicated, so I have to take another look at the stats
> tomorrow ...
>

On second thought, I guess it's because of "max_page_sharing". So one
has to set that really high to make pages_shared more expressive.

-- 
Thanks,

David / dhildenb