From: David Hildenbrand
Organization: Red Hat
Date: Wed, 13 Apr 2022 18:30:47 +0200
Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK
Message-ID: <3b9effd9-4aba-e7ca-b3ca-6a474fd6469f@redhat.com>
In-Reply-To: <6f44ddf9-6755-4120-be8b-7a62f0abc0e0@www.fastmail.com>
References: <20220310140911.50924-1-chao.p.peng@linux.intel.com>
 <20220310140911.50924-5-chao.p.peng@linux.intel.com>
 <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com>
 <7ab689e7-e04d-5693-f899-d2d785b09892@redhat.com>
 <20220412143636.GG64706@ziepe.ca>
 <6f44ddf9-6755-4120-be8b-7a62f0abc0e0@www.fastmail.com>
To: Andy Lutomirski, Jason Gunthorpe
Cc: Sean Christopherson, Chao Peng, kvm list,
 Linux Kernel Mailing List, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
 Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
 Mike Rapoport, Steven Price, "Maciej S . Szmigiero", Vlastimil Babka,
 Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
 Dave Hansen, Andi Kleen, "Eric W. Biederman"

>
> So this is another situation where the actual backend (TDX, SEV,
> pKVM, pure software) makes a difference -- depending on exactly what
> backend we're using, the memory may not be unmoveable. It might even
> be swappable (in the potentially distant future).

Right. And on a system without swap we don't particularly care about
mlock, but we might (in most cases) care about fragmentation with
unmovable memory.

>
> Anyway, here's a concrete proposal, with a bit of handwaving:

Thanks for investing some brainpower.

>
> We add new cgroup limits:
>
> memory.unmoveable
> memory.locked
>
> These can be set to an actual number or they can be set to the
> special value ROOT_CAP.
> If they're set to ROOT_CAP, then anyone in the cgroup with
> capable(CAP_SYS_RESOURCE) (i.e. the global capability) can allocate
> movable or locked memory with this (and potentially other) new APIs.
> If it's 0, then they can't. If it's another value, then the memory
> can be allocated, charged to the cgroup, up to the limit, with no
> particular capability needed. The default at boot is ROOT_CAP. Anyone
> who wants to configure it differently is free to do so. This avoids
> introducing a DoS, makes it easy to run tests without configuring
> cgroup, and lets serious users set up their cgroups.

I wonder what the implications are for existing user space.

Assume we want to move page pinning (rdma, vfio, io_uring, ...) to the
new model. How can we be sure

a) We don't break existing user space
b) We don't open the doors unnoticed for the admin to go crazy on
   unmovable memory.

Any ideas?

>
> Nothing is charge per mm.
>
> To make this fully sensible, we need to know what the backend is for
> the private memory before allocating any so that we can charge it
> accordingly.

Right, the support for migration and/or swap defines how to account.

-- 
Thanks,

David / dhildenb