Date: Wed, 15 Mar 2023 17:05:45 -0400
From: Johannes Weiner
To: David Hildenbrand
Cc: Stefan Roesch, kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com,
 mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
 akpm@linux-foundation.org, Mike Kravetz
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support
Message-ID: <20230315210545.GA116016@cmpxchg.org>
References: <20230310182851.2579138-1-shr@devkernel.io>
 <273a2f82-928f-5ad1-0988-1a886d169e83@redhat.com>
In-Reply-To: <273a2f82-928f-5ad1-0988-1a886d169e83@redhat.com>

On Wed, Mar 15, 2023 at 09:03:57PM +0100, David Hildenbrand wrote:
> On 10.03.23 19:28, Stefan Roesch wrote:
> > So far KSM can only be enabled by calling madvise for memory regions. To
> > be able to use KSM for more workloads, KSM needs to have the ability to be
> > enabled / disabled at the process / cgroup level.
> >
> > Use case 1:
> > The madvise call is not available in the programming language. An example of
> > this is programs with forked workloads using a garbage collected language
> > without pointers. In such a language madvise cannot be made available.
> >
> > In addition the addresses of objects get moved around as they are garbage
> > collected. KSM sharing needs to be enabled "from the outside" for these
> > types of workloads.
> >
> > Use case 2:
> > The same interpreter can also be used for workloads where KSM brings no
> > benefit or even has overhead. We'd like to be able to enable KSM on a
> > workload by workload basis.
> >
> > Use case 3:
> > With the madvise call sharing opportunities are only enabled for the current
> > process: it is a workload-local decision. A considerable number of sharing
> > opportunities may exist across multiple workloads or jobs. Only a higher level
> > entity like a job scheduler or container can know for certain if it's running
> > one or more instances of a job. That job scheduler however doesn't have
> > the necessary internal workload knowledge to make targeted madvise calls.
> >
> > Security concerns:
> > In previous discussions security concerns have been brought up. The problem is
> > that an individual workload does not have the knowledge about what else is
> > running on a machine. Therefore it has to be very conservative in what memory
> > areas can be shared or not. However, if the system is dedicated to running
> > multiple jobs within the same security domain, it's the job scheduler that has
> > the knowledge that sharing can be safely enabled and is even desirable.
> >
> > Performance:
> > Experiments with using UKSM have shown a capacity increase of around 20%.
>
> Stefan, can you do me a favor and investigate which pages we end up
> deduplicating -- especially if it's mostly only the zeropage and if it's
> still that significant when disabling THP?
>
>
> I'm currently investigating with some engineers on playing with enabling KSM
> on some selected processes (enabling it blindly on all VMAs of that process
> via madvise()).
>
> One thing we noticed is that such (~50 times) 20MiB processes end up saving
> ~2MiB of memory per process. That made me suspicious, because it's the THP
> size.
>
> What I think happens is that we have a 2 MiB area (stack?) and only touch a
> single page. We get a whole 2 MiB THP populated. Most of that THP is zeroes.
>
> KSM somehow ends up splitting that THP and deduplicates all resulting
> zeropages. Thus, we "save" 2 MiB. Actually, it's more like we no longer
> "waste" 2 MiB. I think the processes with KSM have less (none) THP than the
> processes with THP enabled, but I only took a look at a sample of the
> process' smaps so far.

THP and KSM is indeed an interesting problem. Better TLB hits with
THPs, but reduced chance of deduplicating memory - which may or may
not result in more IO that outweighs any THP benefits.

That said, the service in the experiment referenced above has swap
turned on and is under significant memory pressure. Unused splitpages
would get swapped out. The difference from KSM was from deduplicating
pages that were in active use, not internal THP fragmentation.
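
For reference, the arithmetic behind the "~2 MiB per process" observation
quoted above can be sketched roughly as follows. This is only an illustration,
not anything from the patch series: it assumes 4 KiB base pages and a 2 MiB THP
(the usual x86-64 sizes), the function names are hypothetical, and the smaps
excerpt is made-up sample text in the /proc/<pid>/smaps "Field: N kB" layout.

```python
# Illustrative sketch only. Page sizes are assumptions (x86-64 defaults),
# and the helper names below are hypothetical, not kernel or library APIs.
BASE_PAGE_KIB = 4     # assumed base page size
THP_KIB = 2048        # assumed THP size (2 MiB)


def zero_subpages_kib(touched_pages: int) -> int:
    """KiB of never-written (all-zero) subpages inside one THP after
    `touched_pages` base pages of it have been written."""
    subpages = THP_KIB // BASE_PAGE_KIB
    if not 0 <= touched_pages <= subpages:
        raise ValueError("touched_pages must be between 0 and %d" % subpages)
    return (subpages - touched_pages) * BASE_PAGE_KIB


def sum_smaps_field(smaps_text: str, field: str) -> int:
    """Sum one 'Field:   N kB' entry over every mapping in smaps-style text."""
    total = 0
    for line in smaps_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == field + ":":
            total += int(parts[1])
    return total


# Made-up two-mapping excerpt: one THP-backed region, one ordinary page.
SAMPLE_SMAPS = """\
7f0000000000-7f0000200000 rw-p 00000000 00:00 0
Rss:                2048 kB
AnonHugePages:      2048 kB
7f0000200000-7f0000201000 rw-p 00000000 00:00 0
Rss:                   4 kB
AnonHugePages:         0 kB
"""

if __name__ == "__main__":
    # One touched page leaves 511 zero subpages in the THP.
    print(zero_subpages_kib(1), "KiB of zero subpages per THP")
    print(sum_smaps_field(SAMPLE_SMAPS, "AnonHugePages"), "KiB in THPs")
```

With a single touched page, 511 of the 512 subpages are zero-filled, which is
where a saving of almost exactly one THP per process would come from if the
THP is split and the zero pages are merged. Comparing the summed AnonHugePages
value with and without KSM is one way to check whether THPs are being split.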