Message-ID: <273a2f82-928f-5ad1-0988-1a886d169e83@redhat.com>
Date: Wed, 15 Mar 2023 21:03:57 +0100
To: Stefan Roesch, kernel-team@fb.com
Cc: linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com,
 linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
 akpm@linux-foundation.org, hannes@cmpxchg.org, Mike Kravetz, Rik van Riel
References: <20230310182851.2579138-1-shr@devkernel.io>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support
In-Reply-To: <20230310182851.2579138-1-shr@devkernel.io>

On 10.03.23 19:28, Stefan Roesch wrote:
> So far KSM can only be enabled by calling madvise for memory regions. To
> be able to use KSM for more workloads, KSM needs to have the ability to be
> enabled / disabled at the process / cgroup level.
>
> Use case 1:
> The madvise call is not available in the programming language. An example for
> this are programs with forked workloads using a garbage collected language without
> pointers. In such a language madvise cannot be made available.
>
> In addition the addresses of objects get moved around as they are garbage
> collected. KSM sharing needs to be enabled "from the outside" for these types of
> workloads.
>
> Use case 2:
> The same interpreter can also be used for workloads where KSM brings no
> benefit or even has overhead. We'd like to be able to enable KSM on a workload
> by workload basis.
>
> Use case 3:
> With the madvise call sharing opportunities are only enabled for the current
> process: it is a workload-local decision. A considerable number of sharing
> opportunities may exist across multiple workloads or jobs. Only a higher level
> entity like a job scheduler or container can know for certain if it's running
> one or more instances of a job. That job scheduler however doesn't have
> the necessary internal workload knowledge to make targeted madvise calls.
>
> Security concerns:
> In previous discussions security concerns have been brought up. The problem is
> that an individual workload does not have the knowledge about what else is
> running on a machine. Therefore it has to be very conservative in what memory
> areas can be shared or not. However, if the system is dedicated to running
> multiple jobs within the same security domain, it's the job scheduler that has
> the knowledge that sharing can be safely enabled and is even desirable.
>
> Performance:
> Experiments with using UKSM have shown a capacity increase of around 20%.

Stefan, can you do me a favor and investigate which pages we end up
deduplicating -- especially whether it's mostly just the zeropage, and
whether the savings are still that significant with THP disabled?

I'm currently working with some engineers on experimenting with enabling
KSM on some selected processes (enabling it blindly on all VMAs of that
process via madvise()).

One thing we noticed is that such 20 MiB processes (~50 of them) end up
saving ~2 MiB of memory per process. That made me suspicious, because
that's the THP size.

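One way to check whether THP is involved might be to compare the
AnonHugePages and Rss totals from /proc/<pid>/smaps for a process with
and without KSM. The following is only a rough sketch: the helper names
sum_smaps_fields() and read_smaps() are made up for illustration, and it
assumes the usual "Field:   N kB" layout of smaps.

```python
import re

# Matches smaps lines such as "AnonHugePages:      2048 kB".
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps_fields(smaps_text, fields=("Rss", "AnonHugePages")):
    """Sum the given per-VMA fields (in kB) over a whole smaps dump.

    Hypothetical helper, not part of any kernel tooling.
    """
    totals = {name: 0 for name in fields}
    for line in smaps_text.splitlines():
        match = _FIELD.match(line.strip())
        if match and match.group(1) in totals:
            totals[match.group(1)] += int(match.group(2))
    return totals


def read_smaps(pid="self"):
    """Return the raw smaps text for a PID (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()
```

Calling sum_smaps_fields(read_smaps(pid)) for a KSM-enabled process and
for an otherwise identical one without KSM would show whether the
AnonHugePages total differs by roughly the amount of memory "saved".
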
What I think happens is that we have a 2 MiB area (stack?) and only touch
a single page. We get a whole 2 MiB THP populated. Most of that THP is
zeroes.

KSM somehow ends up splitting that THP and deduplicates all resulting
zeropages. Thus, we "save" 2 MiB. Actually, it's more like we no longer
"waste" 2 MiB. I think the processes with KSM have fewer (no) THPs than
the processes with THP enabled, but I only took a look at a sample of the
processes' smaps so far.

I recall that there was a proposal to split underutilized THPs and free up
the zeropages (IIRC Rik was involved).

I also recall that Mike reported memory waste due to THP.

-- 
Thanks,

David / dhildenb