From: Franklin “Snaipe” Mathieu <snaipe@arista.com>
Date: Tue, 15 Aug 2023 09:46:22 +0200
Subject: Re: [PATCH] shmem: add support for user extended attributes
To: Hugh Dickins
Cc: ovt@google.com, corbet@lwn.net, akpm@linux-foundation.org, brauner@kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Tue, Aug 15, 2023 at 5:52 AM Hugh Dickins wrote:
>
> Thanks for the encouragement. At the time that I wrote that (20 July)
> I did not expect to get around to it for months.
> But there happens to have been various VFS-involving works going on in
> mm/shmem.c recently, targeting v6.6: which caused me to rearrange
> priorities, and join the party with tmpfs user xattrs, see
>
> https://lore.kernel.org/linux-fsdevel/e92a4d33-f97-7c84-95ad-4fed8e84608c@google.com/
>
> Which Christian Brauner quickly put into his vfs.git (vfs.tmpfs branch):
> so unless something goes horribly wrong, you can expect them in v6.6.

That's great to hear, thanks!

> There's a lot that you wrote above which I have no understanding of
> whatsoever (why would user xattrs stop rmdir failing?? it's okay, don't
> try to educate me, I don't need to know, I'm just glad if they help you).
>
> Though your mention of "unprivileged" does make me shiver a little:
> Christian will understand the implications when I do not, but I wonder
> if my effort to limit the memory usage of user xattrs to "inode space"
> can be be undermined by unprivileged mounts with unlimited (or default,
> that's bad enough) nr_inodes.
>
> If so, that won't endanger the tmpfs user xattrs implementation, since
> the problem would already go beyond those: can an unprivileged mount of
> tmpfs allow its user to gobble up much more memory than is good for the
> rest of the system?

I don't actually know; I'm no expert in that area. That said, these tmpfs mounts are themselves attached to an unprivileged mount namespace, so my assumption would be that, under an OOM condition, the OOM killer keeps killing processes in that mount namespace until nothing else references it and all of its tmpfs mounts can be reclaimed. Then again, that is only my assumption and not necessarily reality.
Curious, I decided to experiment: I booted a kernel in a VM with 1GiB of memory and ran the following commands:

    $ unshare -Umfr bash
    # mount -t tmpfs tmp /mnt -o size=1g
    # dd if=/dev/urandom of=/mnt/oversize bs=1M count=1000

After about a second, the OOM killer woke up and killed bash, then dd, causing the mount namespace (and with it the tmpfs) to be collected. So far, so good.

I then suspected that bash and dd were simply the only reasonable candidates for the OOM killer, since there were no other processes in that VM besides them and init, so I modified the experiment slightly:

    $ dd if=/dev/zero of=/dev/null bs=10M count=10000000000 &
    $ unshare -Umfr bash
    # mount -t tmpfs tmp /mnt -o size=1g
    # dd if=/dev/urandom of=/mnt/oversize bs=1M count=1000

The intent was for the first dd to have a larger footprint than the second because of its larger block size, yet it should not be killed if the tmpfs usage were accounted to the processes in the mount namespace. What happened instead is that both the outer dd and the outer shell were killed, causing init to exit and taking the VM down with it.

So it's likely that there is more work to do in that area; I'd certainly expect the OOM killer to take the overall memory footprint of mount namespaces into account when selecting which processes to kill.

It's also possible that my experiment was flawed and not representative of a real-life scenario, as I have definitely dealt with misbehaving containers before that got killed when they wrote too much to tmpfs. Then again, my experiment also didn't take memory cgroups into account.

--
Snaipe