From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nhat Pham <nphamcs@gmail.com>
Date: Mon, 4 Dec 2023 09:48:52 -0800
Subject: Re: [PATCH v8 1/6] list_lru: allows explicit memcg and NUMA node selection
To: Chengming Zhou
Cc: Johannes Weiner, Matthew Wilcox, akpm@linux-foundation.org, cerasuolodomenico@gmail.com, yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, shuah@kernel.org
References: <20231130194023.4102148-1-nphamcs@gmail.com> <20231130194023.4102148-2-nphamcs@gmail.com> <20231130203522.GC543908@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Mon, Dec 4, 2023 at 12:30 AM Chengming Zhou wrote:
>
> On 2023/12/1 04:35, Johannes Weiner wrote:
> > On Thu, Nov 30, 2023 at 12:07:41PM -0800, Nhat Pham wrote:
> >> On Thu, Nov 30, 2023 at 11:57 AM Matthew Wilcox wrote:
> >>>
> >>> On Thu, Nov 30, 2023 at 11:40:18AM -0800, Nhat Pham wrote:
> >>>> This patch changes the list_lru interface so that the caller must
> >>>> explicitly specify the numa node and memcg when adding and removing
> >>>> objects. The old list_lru_add() and list_lru_del() are renamed to
> >>>> list_lru_add_obj() and list_lru_del_obj(), respectively.
> >>>
> >>> Wouldn't it be better to add list_lru_add_memcg() and
> >>> list_lru_del_memcg() and have:
> >>>
> >>> +bool list_lru_del(struct list_lru *lru, struct list_head *item)
> >>> +{
> >>> +	int nid = page_to_nid(virt_to_page(item));
> >>> +	struct mem_cgroup *memcg = list_lru_memcg_aware(lru) ?
> >>> +		mem_cgroup_from_slab_obj(item) : NULL;
> >>> +
> >>> +	return list_lru_del_memcg(lru, item, nid, memcg);
> >>> +}
> >>>
> >>> Seems like _most_ callers will want the original versions and only
> >>> a few will want the explicit memcg/nid versions. No?
> >>
> >> I actually did something along that line in earlier iterations of this
> >> patch series (albeit with poorer naming - __list_lru_add() instead of
> >> list_lru_add_memcg()). The consensus after some back and forth was
> >> that the original list_lru_add() was not a very good design (the
> >> better one was this new version that allows for explicit numa/memcg
> >> selection). So I agreed to fix it everywhere as a prep patch.
> >>
> >> I don't have strong opinions here, to be completely honest, but I do
> >> think this new API makes more sense (at the cost of quite a bit of
> >> elbow grease to fix every call site, and extra reviewing).
> >
> > Maybe I can shed some light since I was pushing for doing it this way.
> >
> > The quiet assumption that 'struct list_head *item' is (embedded in) a
> > slab object that is also charged to a cgroup is a bit much, given that
> > nothing in the name or documentation of the function points to that.
> >
> > It bit us in the THP shrinker, where that list head is embedded in a
> > tail page (virt_to_page(page) is fun to debug).
> > And it caused some confusion in this case as well, where the zswap
> > entry is a slab object but not charged (the entry descriptor is not
> > attractive for cgroup accounting, only the backing memory it points
> > to).
>
> Hi,
>
> I have a question; maybe I missed something, since I haven't read all
> the earlier versions.
>
> IIUC, the problem here is that "zswap_entry" has a different memcg and
> node than the "page", so I wonder if we can just charge "zswap_entry"
> to the same memcg as the "page".
>
> Like we can do this when allocating the "zswap_entry":
>
> old_memcg = set_active_memcg(memcg)
> kmem_cache_alloc_lru(zswap_entry_cache, lru, gfp)
> set_active_memcg(old_memcg)
>
> The good points are:
>
> 1. "zswap_entry" is charged to the memcg of the "page", which is more
>    sensible?
>
> 2. We can reuse the kmem_cache_alloc_lru() interface, which makes the
>    code simpler since we don't need to manage list_lru_memcg ourselves.
>
> 3. Maybe the new list_lru_add() and list_lru_del() are not needed
>    anymore, since "zswap_entry" has the same memcg and node as the
>    "page"? But I don't know if the THP shrinker still needs them.
>
> Thanks!

That idea was considered in earlier iterations/discussions of the patch
series as well. Charging things is not free - there is an overhead
associated with it, which is why we are usually selective about whether
to charge something. We were not super keen to do this for zswap_entry
just to plumb around list_lru's restriction. Might as well pay the
price of extending the list_lru interface now.

If, in the future, not charging the zswap entry causes a separate
isolation issue, we can revisit this decision and charge it. Otherwise,
IMHO we should just stick with this approach for now.

> >
> > Yes, for most users - at least right now - the current assumption is
> > accurate. The thinking was just that if we do have to differentiate
> > callers now anyway, we might as well make the interface a bit more
> > self-documenting and harder to misuse going forward, even if it's a
> > bit more churn now.