Date: Sat, 5 Jun 2021 21:37:37 +0000
From: Dennis Zhou
To: Roman Gushchin
Cc: Jan Kara, Tejun Heo, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro,
 Dave Chinner, cgroups@vger.kernel.org
Subject: Re: [PATCH v7 0/6] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
References: <20210604013159.3126180-1-guro@fb.com>
In-Reply-To: <20210604013159.3126180-1-guro@fb.com>

Hello,

On Thu, Jun 03, 2021 at 06:31:53PM -0700, Roman Gushchin wrote:
> When an inode is getting dirty for the first time it's associated
> with a wb structure (see __inode_attach_wb()). It can later be
> switched to another wb (if e.g. some other cgroup is writing a lot of
> data to the same inode), but otherwise stays attached to the original
> wb until being reclaimed.
>
> The problem is that the wb structure holds a reference to the original
> memory and blkcg cgroups. So if an inode has been dirty once and later
> is actively used in read-only mode, it has a good chance to pin down
> the original memory and blkcg cgroups forever.
> This is often the case with
> services bringing data for other services, e.g. updating some rpm
> packages.
>
> In real life it becomes a problem due to the large size of the memcg
> structure, which can easily be 1000x larger than an inode. Also a
> really large number of dying cgroups can raise different scalability
> issues, e.g. making the memory reclaim costly and less effective.
>
> To solve the problem inodes should be eventually detached from the
> corresponding writeback structure. It's inefficient to do it after
> every writeback completion. Instead it can be done whenever the
> original memory cgroup is offlined and writeback structure is getting
> killed. Scanning over a (potentially long) list of inodes and detaching
> them from the writeback structure can take quite some time. To avoid
> scanning all inodes, attached inodes are kept on a new list (b_attached).
> To make it less noticeable to a user, the scanning and switching is performed
> from a work context.
>
> Big thanks to Jan Kara, Dennis Zhou and Hillf Danton for their ideas and
> contribution to this patchset.
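For readers following along, the pinning problem and the fix described above can be modelled in a few lines of user-space Python. This is only a toy sketch, not kernel code: the class and function names (Cgroup, Writeback, Inode, attach, offline, switch_attached_inodes) are made up for illustration, and the reference counting is heavily simplified. Only the `attached` list (standing in for b_attached) and the idea of switching inodes away from a dying wb come from the cover letter.

```python
class Cgroup:
    """Toy stand-in for a memory cgroup; 'freed' once refs drops to zero."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # reference held by the hierarchy while online
        self.freed = False

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class Writeback:
    """Toy stand-in for a wb structure: pins its cgroup, tracks attached inodes."""

    def __init__(self, cgroup=None):
        self.cgroup = cgroup     # None models the root wb of the bdi
        self.attached = []       # models the b_attached list
        if cgroup is not None:
            cgroup.get()

    def release(self):
        if self.cgroup is not None:
            self.cgroup.put()
            self.cgroup = None


class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.wb = None


def attach(inode, wb):
    """Associate an inode with a wb, as happens when it is first dirtied."""
    inode.wb = wb
    wb.attached.append(inode)


def offline(cgroup):
    """Drop the hierarchy's reference; the cgroup is now 'dying'."""
    cgroup.put()


def switch_attached_inodes(wb, root_wb):
    """Move every attached inode to root_wb, then drop the wb's cgroup reference."""
    while wb.attached:
        attach(wb.attached.pop(), root_wb)
    wb.release()
```

In this model, calling offline() alone leaves the cgroup pinned by the wb for as long as an inode stays attached; only after switch_attached_inodes() walks the short attached list does the last reference go away. Walking that list instead of all inodes is the point of b_attached.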
>
> v7:
>   - shared locking for multiple inode switching
>   - introduced inode_prepare_wbs_switch() helper
>   - extended the pre-switch inode check for I_WILL_FREE
>   - added comments here and there
>
> v6:
>   - extended and reused wbs switching functionality to switch inodes
>     on cgwb cleanup
>   - fixed offline_list handling
>   - switched to the unbound_wq
>   - other minor fixes
>
> v5:
>   - switch inodes to bdi->wb instead of zeroing inode->i_wb
>   - split the single patch into two
>   - only cgwbs maintain lists of attached inodes
>   - added cond_resched()
>   - fixed !CONFIG_CGROUP_WRITEBACK handling
>   - extended list of prohibited inodes flag
>   - other small fixes
>
>
> Roman Gushchin (6):
>   writeback, cgroup: do not switch inodes with I_WILL_FREE flag
>   writeback, cgroup: switch to rcu_work API in inode_switch_wbs()
>   writeback, cgroup: keep list of inodes attached to bdi_writeback
>   writeback, cgroup: split out the functional part of
>     inode_switch_wbs_work_fn()
>   writeback, cgroup: support switching multiple inodes at once
>   writeback, cgroup: release dying cgwbs by switching attached inodes
>
>  fs/fs-writeback.c                | 302 +++++++++++++++++++++----------
>  include/linux/backing-dev-defs.h |  20 +-
>  include/linux/writeback.h        |   1 +
>  mm/backing-dev.c                 |  69 ++++++-
>  4 files changed, 293 insertions(+), 99 deletions(-)
>
> --
> 2.31.1
>

I too am a bit late to the party. Feel free to add mine as well to the
series.

Acked-by: Dennis Zhou

I left my one comment on the last patch regarding a possible future
extension.

Thanks,
Dennis