From: Yafang Shao
Date: Wed, 18 Dec 2019 10:33:48 +0800
Subject: Re: [PATCH 4/4] memcg, inode: protect page cache from freeing inode
To: Dave Chinner
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Al Viro, Linux MM, linux-fsdevel@vger.kernel.org, Roman Gushchin,
    Chris Down, Dave Chinner
In-Reply-To: <20191218022122.GT19213@dread.disaster.area>
References: <1576582159-5198-1-git-send-email-laoar.shao@gmail.com>
    <1576582159-5198-5-git-send-email-laoar.shao@gmail.com>
    <20191218022122.GT19213@dread.disaster.area>

On Wed, Dec 18, 2019 at 10:21 AM Dave Chinner wrote:
>
> On Tue, Dec 17, 2019 at 06:29:19AM -0500, Yafang Shao wrote:
> > On my server there're some running MEMCGs protected by memory.{min, low},
> > but I found the usage of these MEMCGs abruptly became very small, which
> > were far less than the protect limit. It confused me and finally I
> > found that was because of inode stealing.
> > Once an inode is freed, all its belonging page caches will be dropped as
> > well, no matter how many page caches it has. So if we intend to protect the
> > page caches in a memcg, we must protect their host (the inode) first.
> > Otherwise the memcg protection can be easily bypassed with freeing inode,
> > especially if there're big files in this memcg.
> > The inherent mismatch between memcg and inode is a trouble. One inode can
> > be shared by different MEMCGs, but it is a very rare case. If an inode is
> > shared, its belonging page caches may be charged to different MEMCGs.
> > Currently there's no perfect solution to fix this kind of issue, but the
> > inode majority-writer ownership switching can help it more or less.
> >
> > Cc: Roman Gushchin
> > Cc: Chris Down
> > Cc: Dave Chinner
> > Signed-off-by: Yafang Shao
> > ---
> >  fs/inode.c                 |  9 +++++++++
> >  include/linux/memcontrol.h | 15 +++++++++++++++
> >  mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/vmscan.c                |  4 ++++
> >  4 files changed, 74 insertions(+)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index fef457a..b022447 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -734,6 +734,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
> >         if (!spin_trylock(&inode->i_lock))
> >                 return LRU_SKIP;
> >
> > +
> > +       /* Page protection only works in reclaimer */
> > +       if (inode->i_data.nrpages && current->reclaim_state) {
> > +               if (mem_cgroup_inode_protected(inode)) {
> > +                       spin_unlock(&inode->i_lock);
> > +                       return LRU_ROTATE;
>
> Urk, so after having plumbed the memcg all the way down to the
> list_lru walk code so that we only walk inodes in that memcg, we now
> have to do a lookup from the inode back to the owner memcg to
> determine if we should reclaim it? IOWs, I think the layering here
> is all wrong - if memcg info is needed in the shrinker, it should
> come from the shrink_control->memcg pointer, not be looked up from
> the object being isolated...
>

I agree that the layering here is not good.
I tried passing the shrink_control->memcg pointer down as an argument,
among other approaches, but found that it would require changing a lot
of code.
I didn't want to change too much code, so I implemented it this way,
although it looks a little strange.

> i.e. this code should read something like this:
>
>         if (memcg && inode->i_data.nrpages &&
>             !memcg_can_reclaim_inode(memcg, inode)) {
>                 spin_unlock(&inode->i_lock);
>                 return LRU_ROTATE;
>         }
>
> This code does not need comments because it is obvious what it does,
> and it provides a generic hook into inode reclaim for the memcg code
> to decide whether the shrinker should reclaim the inode or not.
>
> This is how the memcg code should interact with other shrinkers, too
> (e.g. the dentry cache isolation function), so you need to look at
> how to make the memcg visible to the lru walker isolation
> functions....
>

Thanks for your suggestion.
I will rework it in this direction.

Thanks
Yafang
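
For anyone following the thread, the shape of the hook suggested in the
review can be sketched in plain userspace C. This is only an illustration
under stated assumptions: the mock_memcg and mock_inode types, their
fields, and the "usage at or below protect" rule are invented for the
example, the enum is a reduced stand-in for the kernel's lru_status, and
memcg_can_reclaim_inode() is just the name proposed in the review, not a
function this thread shows to exist. Locking on inode->i_lock is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced userspace stand-in for the kernel's enum lru_status. */
enum lru_status { LRU_REMOVED, LRU_ROTATE, LRU_SKIP };

/* Hypothetical stand-in for a memory cgroup; fields are illustrative. */
struct mock_memcg {
        unsigned long usage;    /* pages currently charged to the group */
        unsigned long protect;  /* pages protected, like memory.min/low */
};

/* Hypothetical stand-in for an inode. */
struct mock_inode {
        unsigned long nrpages;  /* models inode->i_data.nrpages */
};

/*
 * Policy hook named after the one proposed in the review.
 * Assumed rule: refuse reclaim while the group is at or below its
 * protected size. A real policy would be decided by the memcg code.
 */
static bool memcg_can_reclaim_inode(const struct mock_memcg *memcg,
                                    const struct mock_inode *inode)
{
        (void)inode;    /* not needed by this simple rule */
        return memcg->usage > memcg->protect;
}

/*
 * Isolation step with the memcg supplied by the caller (as it would be
 * from shrink_control->memcg) instead of being looked up from the inode.
 * A NULL memcg models reclaim with no cgroup involved.
 */
static enum lru_status isolate_inode(const struct mock_memcg *memcg,
                                     const struct mock_inode *inode)
{
        assert(inode != NULL);

        if (memcg && inode->nrpages &&
            !memcg_can_reclaim_inode(memcg, inode))
                return LRU_ROTATE;      /* keep inode and its page cache */

        return LRU_REMOVED;             /* inode may be reclaimed */
}
```

The point of the sketch is the direction of the data flow: the walker
hands the memcg to the isolation function, so the same pattern could be
reused by other isolation callbacks without an object-to-memcg lookup.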