From: Amir Goldstein
Date: Fri, 22 Apr 2022 12:02:22 +0300
Subject: Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
References: <20220418213713.273050-1-krisman@collabora.com>
 <20220418204204.0405eda0c506fd29e857e1e4@linux-foundation.org>
 <87h76pay87.fsf@collabora.com>
To: Khazhy Kumykov
Cc: Gabriel Krisman Bertazi, Andrew Morton, Hugh Dickins, Al Viro,
 kernel@collabora.com, Linux MM, linux-fsdevel, Theodore Tso

On Fri, Apr 22, 2022 at 2:19 AM Khazhy Kumykov wrote:
>
> On Wed, Apr 20, 2022 at 10:34 PM Amir Goldstein wrote:
> >
> > On Tue, Apr 19, 2022 at 6:29 PM Gabriel Krisman Bertazi wrote:
> > >
> > > Andrew Morton writes:
> > >
> > > Hi Andrew,
> > >
> > > > On Mon, 18 Apr 2022 17:37:10 -0400 Gabriel Krisman Bertazi wrote:
> > > >
> > > >> When provisioning containerized applications, multiple very small tmpfs
> > > >
> > > > "files"?
> > >
> > > Actually, filesystems. In cloud environments, we have several small
> > > tmpfs associated with containerized tasks.
> > >
> > > >> are used, for which one cannot always predict the proper file system
> > > >> size ahead of time. We want to be able to reliably monitor filesystems
> > > >> for ENOSPC errors, without depending on the application being executed
> > > >> reporting the ENOSPC after a failure.
> > > >
> > > > Well that sucks. We need a kernel-side workaround for applications
> > > > that fail to check and report storage errors?
> > > >
> > > > We could do this for every syscall in the kernel. What's special about
> > > > tmpfs in this regard?
> > > >
> > > > Please provide additional justification and usage examples for such an
> > > > extraordinary thing.
> > >
> > > For a cloud provider deploying containerized applications, they might
> > > not control the application, so patching userspace wouldn't be a
> > > solution. More importantly - and why this is shmem specific -
> > > they want to differentiate between a user getting ENOSPC due to
> > > insufficiently provisioned fs size, vs. due to running out of memory in
> > > a container, both of which return ENOSPC to the process.
> > >
> >
> > Isn't there already a per memcg OOM handler that could be used by
> > orchestrator to detect the latter?
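
For illustration only (this sketch is not part of the original thread):
the kind of per-memcg detection being alluded to here could be done from
userspace by reading the cgroup v2 memory.events counters of the
container's cgroup. The cgroup path below is a made-up example.

/*
 * Illustrative sketch only (not from the patch series): read the
 * cgroup v2 memory.events file of a container's cgroup and report
 * whether the memory controller killed anything.
 */
#include <stdio.h>
#include <string.h>

static long read_oom_kill(const char *cgroup_dir)
{
	char path[256], key[64];
	long val, oom_kill = -1;
	FILE *f;

	snprintf(path, sizeof(path), "%s/memory.events", cgroup_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	/* memory.events is "<key> <value>" per line, e.g. "oom_kill 2" */
	while (fscanf(f, "%63s %ld", key, &val) == 2) {
		if (!strcmp(key, "oom_kill"))
			oom_kill = val;
	}
	fclose(f);
	return oom_kill;
}

int main(void)
{
	/* hypothetical container cgroup */
	long n = read_oom_kill("/sys/fs/cgroup/mytask");

	if (n > 0)
		printf("container was OOM-killed %ld time(s)\n", n);
	else if (n == 0)
		printf("no memcg OOM kills recorded\n");
	return 0;
}

An orchestrator could sample this after a task fails and treat a raised
oom_kill count as "the container ran out of memory" rather than "the
tmpfs ran out of space".
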
> >
> > > A system administrator can then use this feature to monitor a fleet of
> > > containerized applications in a uniform way, detect provisioning issues
> > > caused by different reasons and address the deployment.
> > >
> > > I originally submitted this as a new fanotify event, but given the
> > > specificity of shmem, Amir suggested the interface I'm implementing
> > > here. We've raised this discussion originally here:
> > >
> > > https://lore.kernel.org/linux-mm/CACGdZYLLCqzS4VLUHvzYG=rX3SEJaG7Vbs8_Wb_iUVSvXsqkxA@mail.gmail.com/
> > >
> >
> > To put things in context, the points I was trying to make in this
> > discussion are:
> >
> > 1. Why isn't monitoring with statfs() a sufficient solution? and more
> > specifically, the shared disk space provisioning problem does not sound
> > very tmpfs specific to me.
> > It is a well known issue for thin provisioned storage in environments
> > with shared resources as the ones that you describe
>
> I think this solves a different problem: to my understanding statfs
> polling is useful for determining if a long lived, slowly growing FS
> is approaching its limits - the tmpfs here are generally short lived,
> and may be intentionally running close to limits (e.g. if they "know"
> exactly how much they need, and don't expect to write any more than
> that). In this case, the limits are there to guard against runaway
> (and assist with scheduling), so "monitor and increase limits
> periodically" isn't appropriate.
>
> It's meant just to make it easier to distinguish between "tmpfs write
> failed due to OOM" and "tmpfs write failed because you exceeded tmpfs'
> max size" (what makes tmpfs "special" is that tmpfs, for good reason,
> returns ENOSPC for both of these situations to the user). For a small

Maybe it's for a good reason, but it clearly is not the desired behavior
in your use case.
Perhaps what is needed here is a way for the user to opt in to a different
OOM behavior from shmem using a mount option?
Would that be enough to cover your use case?

> task a user could easily go from 0% to full, or OOM, rather quickly,
> so statfs polling would likely miss the event. The orchestrator can,
> when the task fails, easily (and reliably) look at this statistic to
> determine if a user exceeded the tmpfs limit.
>
> (I do see the parallel here to thin provisioned storage - "exceeded
> your individual budget" vs. "underlying overcommitted system ran out
> of bytes")

Right, and in this case, the application gets a different error in case
of "underlying space overcommitted", usually EIO. That's why I think
that opting in to the same behavior could make sense for tmpfs.
We can even consider shutdown behavior for shmem in that case, but that
is up to whoever may be interested in that kind of behavior.

Thanks,
Amir.
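
For illustration only (not part of the original exchange), the statfs()
polling discussed above amounts to something like the sketch below, with
a made-up mount point. As Khazhy points out, a short-lived task can go
from empty to ENOSPC between two samples, which is the main argument
against relying on polling alone.

/*
 * Illustrative sketch only: sample a tmpfs mount with statfs() and
 * report how close it is to its size= limit.
 */
#include <stdio.h>
#include <sys/statfs.h>

int main(void)
{
	const char *mnt = "/mnt/task-tmpfs";	/* hypothetical tmpfs mount */
	struct statfs st;

	if (statfs(mnt, &st) != 0) {
		perror("statfs");
		return 1;
	}

	unsigned long long total = (unsigned long long)st.f_blocks * st.f_bsize;
	unsigned long long avail = (unsigned long long)st.f_bavail * st.f_bsize;

	printf("%s: %llu of %llu bytes available\n", mnt, avail, total);
	if (st.f_blocks && st.f_bavail == 0)
		printf("%s is at its size= limit; new writes will see ENOSPC\n", mnt);
	return 0;
}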