Message-ID: <0235770b-eb37-88e0-9350-a2d9c0cf9c32@redhat.com>
Date: Mon, 3 Apr 2023 19:02:36 +0200
From: David Hildenbrand
Organization: Red Hat
To: Stefan Roesch
Cc: kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, akpm@linux-foundation.org, hannes@cmpxchg.org, Bagas Sanjaya
Subject: Re: [PATCH v4 1/3] mm: add new api to enable ksm per process
On 03.04.23 17:50, Stefan Roesch wrote:
>> I guess the interpreter could enable it (like a memory allocator could enable it
>> for the whole heap). But I get that it's much easier to enable this per-process,
>> and eventually only when a lot of the same processes are running in that
>> particular environment.
>>
>
> We don't want it to get enabled for all workloads of that interpreter,
> instead we want to be able to select for which workloads we enable KSM.
>

Right.

>
>>> 1. New options for prctl system command
>>> This patch series adds two new options to the prctl system call.
>>> The first one allows to enable KSM at the process level and the second
>>> one to query the setting.
>>> The setting will be inherited by child processes.
>>> With the above setting, KSM can be enabled for the seed process of a
>>> cgroup and all processes in the cgroup will inherit the setting.
>>> 2. Changes to KSM processing
>>> When KSM is enabled at the process level, the KSM code will iterate
>>> over all the VMA's and enable KSM for the eligible VMA's.
>>> When forking a process that has KSM enabled, the setting will be
>>> inherited by the new child process.
>>> In addition when KSM is disabled for a process, KSM will be disabled
>>> for the VMA's where KSM has been enabled.
>>
>> Do we want to make MADV_MERGEABLE/MADV_UNMERGEABLE fail while the new prctl is
>> enabled for a process?
>
> I decided to allow enabling KSM with prctl even when MADV_MERGEABLE,
> this allows more flexibility. MADV_MERGEABLE will be a nop.

But IIUC, MADV_UNMERGEABLE will end up calling unmerge_ksm_pages() and
clearing VM_MERGEABLE. The next KSM scan will then merge the pages in
there again, because the process-wide setting is still enabled.

Not sure if that flexibility is worth having.

[...]
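The re-merge concern can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: the MODEL_* flags, the structs, and the model_* functions are invented for illustration, and the model assumes the KSM scanner treats a VMA as a merge candidate when either the per-VMA VM_MERGEABLE flag or the per-process MMF_VM_MERGE_ANY flag is set.

```c
#include <stdbool.h>

/* Hypothetical model flags; these are NOT the kernel's bit values. */
#define MODEL_VM_MERGEABLE     (1u << 0) /* per-VMA flag, like VM_MERGEABLE */
#define MODEL_MMF_VM_MERGE_ANY (1u << 1) /* per-mm flag, like MMF_VM_MERGE_ANY */

struct mm_model {
	unsigned int flags;   /* per-process (mm) flags */
};

struct vma_model {
	struct mm_model *mm;  /* owning process */
	unsigned int flags;   /* per-VMA flags */
	int merged_pages;     /* pages currently shared via KSM */
};

/* Assumed scanner rule: a VMA is a candidate if it opted in via madvise()
 * or if the whole process opted in via the new prctl. */
static bool model_scan_eligible(const struct vma_model *vma)
{
	return (vma->flags & MODEL_VM_MERGEABLE) ||
	       (vma->mm->flags & MODEL_MMF_VM_MERGE_ANY);
}

/* Models MADV_UNMERGEABLE: unshare the pages and clear the VMA flag. */
static void model_madv_unmergeable(struct vma_model *vma)
{
	vma->merged_pages = 0;              /* stands in for unmerge_ksm_pages() */
	vma->flags &= ~MODEL_VM_MERGEABLE;
}

/* Models one KSM scan pass over a VMA containing dup_pages identical pages. */
static void model_ksm_scan(struct vma_model *vma, int dup_pages)
{
	if (model_scan_eligible(vma))
		vma->merged_pages = dup_pages;
}
```

In the model, a VMA in a process with MODEL_MMF_VM_MERGE_ANY set is merged again by the very next model_ksm_scan() after model_madv_unmergeable(); once the per-process flag is cleared, the VMA stays unmerged.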
>>> @@ -2661,6 +2662,32 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>>>  	case PR_SET_VMA:
>>>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>>>  		break;
>>> +#ifdef CONFIG_KSM
>>> +	case PR_SET_MEMORY_MERGE:
>>> +		if (!capable(CAP_SYS_RESOURCE))
>>> +			return -EPERM;
>>> +
>>> +		if (arg2) {
>>> +			if (mmap_write_lock_killable(me->mm))
>>> +				return -EINTR;
>>> +
>>> +			if (!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags))
>>> +				error = __ksm_enter(me->mm, MMF_VM_MERGE_ANY);
>>
>> Hm, I think this might be problematic if we already called __ksm_enter() via
>> madvise(). Maybe we should really consider making MMF_VM_MERGE_ANY set
>> MMF_VM_MERGEABLE instead. Like:
>>
>> error = 0;
>> if (!test_bit(MMF_VM_MERGEABLE, &me->mm->flags))
>> 	error = __ksm_enter(me->mm);
>> if (!error)
>> 	set_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
>>
>
> If we make that change, we would no longer be able to distinguish
> if MMF_VM_MERGEABLE or MMF_VM_MERGE_ANY have been set.

Why exactly would you need that? For cleanup? See below.

>
>>> +			mmap_write_unlock(me->mm);
>>> +		} else {
>>> +			__ksm_exit(me->mm, MMF_VM_MERGE_ANY);
>>
>> Hm, I'd prefer if we really only call __ksm_exit() when we really exit the
>> process. Is there a strong requirement to optimize disabling of KSM or would it
>> be sufficient to clear the MMF_VM_MERGE_ANY flag here?
>>
> Then we still have the mm_slot allocated until the process gets
> terminated.

Which is the same as using MADV_UNMERGEABLE, no?

>
>> Also, I wonder what happens if we have another VMA in that process that has it
>> enabled ..
>>
>> Last but not least, wouldn't we want to do the same thing as MADV_UNMERGEABLE
>> and actually unmerge the KSM pages?
>>
> Do you want to call unmerge for all VMA's?

The question is what clearing MMF_VM_MERGE_ANY is supposed to do. If it's
supposed to disable KSM (like MADV_UNMERGEABLE would), then I guess you
should go over all VMA's and unmerge.
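The enable/disable semantics being discussed can be sketched as a small user-space model. This is a hypothetical illustration, not kernel code: the MODEL_* flags, `struct mm_model`, and the `model_*` functions are invented, `model_ksm_enter()` only stands in for `__ksm_enter()`, and the model assumes every VMA is a merge candidate. Enabling registers the mm only if madvise() has not already done so and sets VM_MERGEABLE on all VMAs; disabling unmerges and clears VM_MERGEABLE on all VMAs while keeping the mm registered, as MADV_UNMERGEABLE would.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model flags, not the kernel's bit values.
 * The MMF_* values live in mm->flags, VM_* values in vma->flags. */
#define MODEL_MMF_VM_MERGEABLE (1u << 0) /* mm registered with KSM (has an mm_slot) */
#define MODEL_MMF_VM_MERGE_ANY (1u << 1) /* process-wide opt-in via prctl */
#define MODEL_VM_MERGEABLE     (1u << 0) /* per-VMA opt-in */

#define MODEL_MAX_VMAS 4

struct vma_model {
	unsigned int flags;
	int merged_pages;     /* pages currently shared via KSM */
};

struct mm_model {
	unsigned int flags;
	int ksm_enter_calls;  /* how often the mm was registered with KSM */
	size_t nr_vmas;
	struct vma_model vmas[MODEL_MAX_VMAS];
};

/* Stand-in for __ksm_enter(): registers the mm with KSM. */
static int model_ksm_enter(struct mm_model *mm)
{
	mm->ksm_enter_calls++;
	mm->flags |= MODEL_MMF_VM_MERGEABLE;
	return 0;
}

/* Enable: register only if not registered yet (e.g. via madvise()),
 * then mark the mm and every VMA as mergeable. */
static int model_enable_merge_any(struct mm_model *mm)
{
	int error = 0;

	if (!(mm->flags & MODEL_MMF_VM_MERGEABLE))
		error = model_ksm_enter(mm);
	if (error)
		return error;

	mm->flags |= MODEL_MMF_VM_MERGE_ANY;
	for (size_t i = 0; i < mm->nr_vmas; i++)
		mm->vmas[i].flags |= MODEL_VM_MERGEABLE;
	return 0;
}

/* Disable: clear the per-mm flag, then unmerge and clear VM_MERGEABLE on
 * every VMA. The mm stays registered, just as after MADV_UNMERGEABLE. */
static void model_disable_merge_any(struct mm_model *mm)
{
	mm->flags &= ~MODEL_MMF_VM_MERGE_ANY;
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		mm->vmas[i].merged_pages = 0;
		mm->vmas[i].flags &= ~MODEL_VM_MERGEABLE;
	}
}
```

Because this disable path clears VM_MERGEABLE on every VMA, it also drops opt-ins that were made earlier via madvise(); whether that is acceptable is part of the design question in this thread.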
Also, it depends on how you want to handle VM_MERGEABLE with MMF_VM_MERGE_ANY.
If MMF_VM_MERGE_ANY would not set VM_MERGEABLE, then you'd only unmerge where
VM_MERGEABLE is not set. Otherwise, you'd unshare everywhere VM_MERGEABLE is
set (and clear VM_MERGEABLE) while at it.

Unsharing when clearing MMF_VM_MERGE_ANY might be the right thing to do IMHO.


I guess the main questions regarding implementation are:

1) Do we want setting MMF_VM_MERGE_ANY to set VM_MERGEABLE on all candidate
   VMA's (i.e., go over all VMA's and set VM_MERGEABLE)? Then, clearing
   MMF_VM_MERGE_ANY would simply unmerge and clear VM_MERGEABLE on all VMA's.

2) Do we want to make MMF_VM_MERGE_ANY imply MMF_VM_MERGEABLE? You could
   still disable KSM (__ksm_exit()) when clearing MMF_VM_MERGE_ANY, after
   going over all VMA's (where you might want to unshare anyway).

I guess the code will end up simpler if you make MMF_VM_MERGE_ANY simply
piggy-back on MMF_VM_MERGEABLE + VM_MERGEABLE. I might be wrong, of course.

-- 
Thanks,

David / dhildenb