From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <9b8d38f0-fd22-3f98-d070-16baf976ecb5@google.com>
 <20230814082339.2006418-1-snaipe@arista.com>
 <986c412c-669a-43fe-d72a-9e81bca8211@google.com>
 <20230815-sensibel-weltumsegelung-6593f2195293@brauner>
 <924ed61c-5681-aa8b-d943-7f73694d159@google.com>
In-Reply-To: <924ed61c-5681-aa8b-d943-7f73694d159@google.com>
From: Oleksandr Tymoshenko
Date: Tue, 22 Aug 2023 10:50:38 -0700
Subject: Re: [PATCH] shmem: add support for user extended attributes
To: Hugh Dickins
Cc: Christian Brauner, Franklin “Snaipe” Mathieu, corbet@lwn.net,
 akpm@linux-foundation.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: multipart/alternative; boundary="000000000000a23bf0060386a261"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

--000000000000a23bf0060386a261
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks for working on this.

On Mon, Aug 21, 2023 at 10:52 AM Hugh Dickins wrote:
> On Tue, 15 Aug 2023, Christian Brauner wrote:
> > On Tue, Aug 15, 2023 at 09:46:22AM +0200, Franklin “Snaipe” Mathieu wrote:
> > >
> > > So, it's likely that there's some more work to do in that area; I'd
> > > certainly expect the OOM killer to take the overall memory footprint
> > > of mount namespaces into account when selecting which processes to
> > > kill. It's also possible my experiment was flawed and not
> > > representative of a real-life scenario, as I clearly have interacted
> > > with misbehaving containers before, which got killed when they wrote
> > > too much to tmpfs. But then again, my experiment also didn't take
> > > memory cgroups into account.
> >
> > So mount namespaces are orthogonal to that and they would be the wrong
> > layer to handle this.
> >
> > Note that an unprivileged user (regular or via containers) on the system
> > can just exhaust all memory in various ways. Ultimately the container or
> > user would likely be taken down by in-kernel OOM or systemd-oomd or
> > similar tools under memory pressure.
> >
> > Of course, all that means is that untrusted workloads need to have
> > cgroup memory limits. That also limits tmpfs instances and prevents
> > unprivileged user from using all memory.
> >
> > If you don't set a memory limit then yes, the container might be able to
> > exhaust all memory but that's a bug in the container runtime. Also, at
> > some point the OOM killer or related userspace tools will select the
> > container init process for termination at which point all the namespaces
> > and mounts go away. That's probably what you experience as misbehaving
> > containers. The real bug there is probably that they're allowed to run
> > without memory limits in the first place.
>
> Thanks, this was a good reminder that I very much needed to look back at
> the memory cgroup limiting of xattrs on tmpfs - I'd had the patch in the
> original series, then was alarmed to find shmem_alloc_inode() using
> GFP_KERNEL, so there seemed no point in accounting the xattrs if the
> inodes were not being accounted: so dropped it temporarily. I had
> forgotten that SLAB_ACCOUNT on the kmem_cache ensures that accounting.
>
> "tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs" just sent to fix it:
>
> https://lore.kernel.org/linux-fsdevel/f6953e5a-4183-8314-38f2-40be60998615@google.com/
>
> Thanks,
> Hugh

--000000000000a23bf0060386a261--