Message-ID: <7ab689e7-e04d-5693-f899-d2d785b09892@redhat.com>
Date: Fri, 8 Apr 2022 20:54:02 +0200
From: David Hildenbrand
Organization: Red Hat
To: Sean Christopherson, Andy Lutomirski
Cc: Chao Peng, kvm list, Linux Kernel Mailing List, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
 Hugh Dickins, Jeff Layton, "J. Bruce Fields", Andrew Morton,
 Mike Rapoport, Steven Price, "Maciej S. Szmigiero", Vlastimil Babka,
 Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
 Dave Hansen, Andi Kleen
Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK
References: <20220310140911.50924-1-chao.p.peng@linux.intel.com>
 <20220310140911.50924-5-chao.p.peng@linux.intel.com>
 <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com>

On 08.04.22 19:56, Sean Christopherson wrote:
> On Thu, Apr 07, 2022, Andy Lutomirski wrote:
>>
>> On Thu, Apr 7, 2022, at 9:05 AM, Sean Christopherson wrote:
>>> On Thu, Mar 10, 2022, Chao Peng wrote:
>>>> Since page migration / swapping is not supported yet, MFD_INACCESSIBLE
>>>> memory behaves like longterm pinned pages and thus should be accounted to
>>>> mm->pinned_vm and be restricted by RLIMIT_MEMLOCK.
>>>>
>>>> Signed-off-by: Chao Peng
>>>> ---
>>>>  mm/shmem.c | 25 ++++++++++++++++++++++++-
>>>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 7b43e274c9a2..ae46fb96494b 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -915,14 +915,17 @@ static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t end)
>>>>  static void notify_invalidate_page(struct inode *inode, struct folio *folio,
>>>>  				   pgoff_t start, pgoff_t end)
>>>>  {
>>>> -#ifdef CONFIG_MEMFILE_NOTIFIER
>>>>  	struct shmem_inode_info *info = SHMEM_I(inode);
>>>>
>>>> +#ifdef CONFIG_MEMFILE_NOTIFIER
>>>>  	start = max(start, folio->index);
>>>>  	end = min(end, folio->index + folio_nr_pages(folio));
>>>>
>>>>  	memfile_notifier_invalidate(&info->memfile_notifiers, start, end);
>>>>  #endif
>>>> +
>>>> +	if (info->xflags & SHM_F_INACCESSIBLE)
>>>> +		atomic64_sub(end - start, &current->mm->pinned_vm);
>>>
>>> As Vishal's to-be-posted selftest discovered, this is broken as current->mm
>>> may be NULL.  Or it may be a completely different mm, e.g. AFAICT there's
>>> nothing that prevents a different process from punching hole in the shmem
>>> backing.
>>>
>>
>> How about just not charging the mm in the first place? There's precedent:
>> ramfs and hugetlbfs (at least sometimes -- I've lost track of the current
>> status).
>>
>> In any case, for an administrator to try to assemble the various rlimits into
>> a coherent policy is, and always has been, quite messy. ISTM cgroup limits,
>> which can actually add across processes usefully, are much better.
>>
>> So, aside from the fact that these fds aren't in a filesystem and are thus
>> available by default, I'm not convinced that this accounting is useful or
>> necessary.
>>
>> Maybe we could just have some switch required to enable creation of private
>> memory in the first place, and anyone who flips that switch without
>> configuring cgroups is subject to DoS.
>
> I personally have no objection to that, and I'm 99% certain Google doesn't rely
> on RLIMIT_MEMLOCK.
>

It's unacceptable for distributions to have random unprivileged users be
able to allocate an unlimited amount of unmovable memory. And any kind
of these "switches" won't help a thing, because the distribution will
have to enable them either way.

I raised in the past that accounting might be challenging, so it's no
surprise that something popped up now. RLIMIT_MEMLOCK was the obvious
candidate, but as we already discovered with secretmem in the past, it's
not that good of a fit (unmovable is worse than mlocked). But it gets
the job done for now at least.

So I'm open to alternatives for limiting the amount of unmovable memory
we might allocate for user space, and then we could convert secretmem as
well. Random switches are not an option.

-- 
Thanks,

David / dhildenb