Subject: Re: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function
From: Yafang Shao
To: Dave Chinner
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
 Al Viro, Linux MM, linux-fsdevel@vger.kernel.org, Dave Chinner
Date: Tue, 7 Jan 2020 21:22:09 +0800
In-Reply-To: <20200106213103.GJ23195@dread.disaster.area>

On Tue, Jan 7, 2020 at 5:31 AM Dave Chinner wrote:
>
> On Mon, Jan 06, 2020 at 10:41:22PM +0800, Yafang Shao wrote:
> > On Mon, Jan 6, 2020 at 8:17 AM Dave Chinner wrote:
> > > We can clean up the code a lot by getting rid of the unnecessary
> > > indenting by doing this:
> > >
> > >	/* iterate the global lru first */
> > >	isolated = list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > >				     nr_to_walk);
> > >	if (!list_lru_memcg_aware(lru))
> > >		return isolated;
> > >
> > >	nlru = &lru->node[nid];
> > >	for_each_mem_cgroup(memcg) {
> > >		/* already scanned the root memcg above */
> > >		if (is_root_memcg(memcg))
> > >			continue;
> > >		if (*nr_to_walk <= 0)
> > >			break;
> > >		spin_lock(&nlru->lock);
> > >		isolated += __list_lru_walk_one(nlru, memcg,
> > >						isolate, cb_arg,
> > >						nr_to_walk);
> > >		spin_unlock(&nlru->lock);
> > >	}
> > >	return isolated;
> >
> > That's easier to understand.
> > I will change it like this in the next version.
> > Thanks!
> >
> > > And so to architecture... This all makes me wonder why we still
> > > special case the memcg LRU lists here.
> >
> > Can't agree more.
> > The first time I read this code, I wondered why not assign a
> > non-negative number to kmemcg_id of the root_mem_cgroup and then
> > use memcg_lrus as well.
>
> Yeah, I've been wondering why the ID is -1 instead of 0 when we
> have a global variable that stores the root memcg that we can
> compare pointers directly against via is_root_memcg(). All it seems
> to do is make things more complex by having to special case the
> root memcg....
>
> From that perspective, I do like your change to use the memcg
> iterator functions rather than a for loop over the range of
> indexes, but I'd much prefer to see this method used consistently
> everywhere rather than the way we've duplicated lots of code by
> tacking memcgs onto the side of the non-memcg code paths.
>

Agreed, that would be better. I will think about it.
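To make sure we are thinking of the same thing, here is a rough,
untested sketch of what the lookup could become once the root memcg
has a real kmemcg_id of 0 (the field and helper shapes here are
illustrative only, and the RCU dereference of memcg_lrus is omitted
for brevity):

	/*
	 * Sketch only: assumes root_mem_cgroup->kmemcg_id == 0 and
	 * that slot 0 of memcg_lrus is always allocated, so the root
	 * memcg goes through the same array as every other memcg.
	 */
	static inline struct list_lru_one *
	list_lru_from_memcg(struct list_lru_node *nlru,
			    struct mem_cgroup *memcg)
	{
		/* list is not memcg aware: single list per node */
		if (!nlru->memcg_lrus)
			return &nlru->lru;

		return nlru->memcg_lrus->lru[memcg->kmemcg_id];
	}

With that, the memcg_idx < 0 special case in
list_lru_from_memcg_idx() could presumably go away as well.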
> > > Ever since we went to per-node memcg lru lists (~2015, iirc),
> > > there's been this special hidden hack for the root memcg to use
> > > the global list, and one that I have to read lots of code to
> > > remind myself it exists every time I have to dig into this code
> > > again.
> > >
> > > I mean, if the list is not memcg aware, then it only needs a
> > > single list per node - the root memcg list. That could be easily
> > > stored in the memcg_lrus array for the node rather than a
> > > separate global list, and then the node iteration code all
> > > starts to look like this:
> > >
> > >	nlru = &lru->node[nid];
> > >	for_each_mem_cgroup(memcg) {
> > >		spin_lock(&nlru->lock);
> > >		isolated += __list_lru_walk_one(nlru, memcg,
> > >						isolate, cb_arg,
> > >						nr_to_walk);
> > >		spin_unlock(&nlru->lock);
> > >		if (*nr_to_walk <= 0)
> > >			break;
> > >
> > >		/* non-memcg aware lists only have a root memcg list */
> > >		if (!list_lru_memcg_aware(lru))
> > >			break;
> > >	}
> > >
> > > Hence for the walks in the !list_lru_memcg_aware(lru) case, the
> > > list_lru_from_memcg() call in __list_lru_walk_one() always
> > > returns just the root list. This makes everything use the same
> > > iteration interface, and when you configure out memcg we simply
> > > code the iterator to run once and list_lru_from_memcg() always
> > > returns the one list...
> >
> > Agree with you that it is a better architecture, and I also want
> > to change it like this. That would be clearer and easier to
> > maintain. But it requires lots of code changes, should we address
> > it in another separate patchset?
>
> Yes. I think this is a separate piece of work as it spans much more
> than just the list-lru infrastructure.
>

Got it.

> > > And, FWIW, I noticed that the list_lru memcg code assumes we
> > > only ever put objects from memcg associated slab pages in the
> > > list_lru. It calls memcg_from_slab_page() which makes no attempt
> > > to verify the page is actually a slab page. That's a landmine
> > > just waiting to go boom - list_lru can store any type of object
> > > the user wants, not just slab objects. e.g. the binder code
> > > stores pages in the list_lru, not slab objects, and so the only
> > > reason it doesn't go boom is that the lru list is not configured
> > > to be memcg aware....
> >
> > Another new issue.
> > I will try to diagnose what is hidden in this landmine, and after
> > I understand it clearly I will submit a new patchset.
>
> The problem is that memcg_from_slab_page() uses page->slab_cache
> directly to retrieve the owner memcg without first checking the
> PageSlab(page) flag. If it's not a slab page, we need to get the
> memcg from page->mem_cgroup, not page->slab_cache->memcg_params.memcg.
>
> See page_cgroup_ino() for how to safely get the owner memcg from a
> random page of unknown type...
>

Understood, many thanks for your explanation.

Thanks
Yafang
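P.S. Regarding the memcg_from_slab_page() landmine: after reading
page_cgroup_ino(), I guess the safe pattern for a random page of
unknown type would be roughly the following (the helper name is made
up, and this is untested):

	/*
	 * Sketch modelled on page_cgroup_ino() in mm/memcontrol.c:
	 * only trust page->slab_cache when the page really is a slab
	 * page; otherwise the owner memcg lives in page->mem_cgroup.
	 */
	static struct mem_cgroup *memcg_from_obj_page(struct page *page)
	{
		if (PageSlab(page) && !PageTail(page))
			return memcg_from_slab_page(page);

		return READ_ONCE(page->mem_cgroup);
	}

A list_lru user like binder that stores raw pages would then take the
page->mem_cgroup branch instead of dereferencing a garbage
page->slab_cache pointer.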