Date: Wed, 8 Mar 2023 12:30:06 -0500
From: Johannes Weiner
To: David Hildenbrand
Cc: Stefan Roesch, kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH v3 0/3] mm: process/cgroup ksm support
Message-ID: <20230308173006.GA476158@cmpxchg.org>
References: <20230224044000.3084046-1-shr@devkernel.io>

Hey David,

On Wed, Mar 08, 2023 at 06:01:14PM +0100, David Hildenbrand wrote:
> For some reason gmail thought it would be a good idea to move this into the
> SPAM folder, so I only saw the recent replies just now.
>
> I'm going to have a look at this soonish.

Thanks! More eyes are always helpful.

> One point that popped up in the past and that I raised on the last RFC: we
> should think about letting processes *opt out/disable* KSM on their own.
> Either completely, or for selected VMAs.
>
> Reasoning is, that if you have an application that really doesn't want some
> memory regions to be applicable to KSM (memory de-duplication attacks?
> Knowing that KSM on some regions will be counter-productive)
>
> For example, remembering if MADV_UNMERGEABLE was called and not only
> clearing the VMA flag. So even if KSM would be force-enabled by some tooling
> after the process started, such regions would not get considered for KSM.
>
> It would be a bit like how we handle THP.

I'm not sure the THP comparison is apt. THP is truly a local
optimization that depends on the workload's access patterns. The
environment isn't a true factor. It makes some sense that if there is
a global policy to generally use THP the workload be able to opt out
based on known sparse access patterns. At least until THP allocation
strategy inside the kernel becomes smarter!

Merging opportunities and security questions are trickier. The
application might know which data is sensitive, but it doesn't know
whether its environment is safe or subject to memory attacks, so it
cannot make that decision purely from inside.

There is a conceivable usecase where multiple instances of the same
job are running inside a safe shared security domain and using the
same sensitive data.

There is a conceivable usecase where the system and the workload
collaborate to merge insensitive data across security domains.

I'm honestly not sure which usecase is more likely. My gut feeling is
the first one, simply because of broader concerns of multiple security
domains sharing kernel instances or physical hardware.

> On 24.02.23 05:39, Stefan Roesch wrote:
> > So far KSM can only be enabled by calling madvise for memory regions.
> > To be able to use KSM for more workloads, KSM needs to have the ability to be
> > enabled / disabled at the process / cgroup level.
> >
> > Use case 1:
> > The madvise call is not available in the programming language. An example for
> > this are programs with forked workloads using a garbage collected language without
> > pointers. In such a language madvise cannot be made available.
> >
> > In addition the addresses of objects get moved around as they are garbage
> > collected. KSM sharing needs to be enabled "from the outside" for these types of
> > workloads.
> >
> > Use case 2:
> > The same interpreter can also be used for workloads where KSM brings no
> > benefit or even has overhead. We'd like to be able to enable KSM on a workload
> > by workload basis.
> >
> > Use case 3:
> > With the madvise call sharing opportunities are only enabled for the current
> > process: it is a workload-local decision. A considerable number of sharing
> > opportunities may exist across multiple workloads or jobs. Only a higher level
> > entity like a job scheduler or container can know for certain if it's running
> > one or more instances of a job. That job scheduler however doesn't have
> > the necessary internal workload knowledge to make targeted madvise calls.
> >
> > Security concerns:
> > In previous discussions security concerns have been brought up. The problem is
> > that an individual workload does not have the knowledge about what else is
> > running on a machine. Therefore it has to be very conservative in what memory
> > areas can be shared or not. However, if the system is dedicated to running
> > multiple jobs within the same security domain, it's the job scheduler that has
> > the knowledge that sharing can be safely enabled and is even desirable.
>
> Note that there are some papers about why limiting memory deduplication
> attacks to single security domains is not sufficient. Especially, the remote
> deduplication attacks fall into that category IIRC.

I think it would be good to elaborate on that and include any caveats
in the documentation.

Ultimately, the bar isn't whether there are attack vectors on a subset
of possible usecases, but whether there are usecases where this can be
used safely, which is obviously true.
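
For reference, the per-region interface this thread starts from ("KSM can
only be enabled by calling madvise for memory regions") can be sketched in C
roughly as follows. This is a minimal illustration, not code from the patch
series: ksm_set_region() and round_up_to_page() are hypothetical helper names
introduced here, while madvise(), MADV_MERGEABLE and MADV_UNMERGEABLE are the
existing Linux interface. It assumes a Linux system; on a kernel built without
CONFIG_KSM the call is expected to fail with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Round len up to a multiple of page; page must be a power of two. */
size_t round_up_to_page(size_t len, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (len + page - 1) & ~(page - 1);
}

/*
 * Hypothetical helper: mark a page-aligned region as a KSM candidate
 * (mergeable != 0) or withdraw it again (mergeable == 0).
 *
 * Returns 0 on success or a negative errno value on failure, for
 * example -EINVAL when the kernel has no KSM support.
 *
 * Marking a region only makes it eligible; pages are merged only while
 * the KSM daemon is running (/sys/kernel/mm/ksm/run set to 1).
 */
int ksm_set_region(void *addr, size_t len, int mergeable)
{
    int advice = mergeable ? MADV_MERGEABLE : MADV_UNMERGEABLE;

    if (madvise(addr, len, advice) != 0)
        return -errno;
    return 0;
}
```

A caller would typically mmap() an anonymous private region, round its length
with round_up_to_page(), and call ksm_set_region(addr, len, 1). The point
raised in the cover letter is that this call has to come from inside the
process and needs raw addresses, which is exactly what a job scheduler or a
managed-language runtime does not have.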