From: Yuanchu Xie
Date: Wed, 10 Jan 2024 11:24:02 -0800
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
To: Henry Huang
Cc: rientjes@google.com, akpm@linux-foundation.org, 谈鉴锋, linux-kernel@vger.kernel.org, linux-mm@kvack.org, 朱辉(茶水), yuzhao@google.com
On Fri, Dec 22, 2023 at 7:40 AM Henry Huang wrote:
> > - are pages ever shared between different memcg hierarchies? You
> > mentioned sharing between processes in A and A/B, but I'm wondering
> > if there is sharing between two different memcg hierarchies where root
> > is the only common ancestor?
>
> Yes, there is another really common case:
> if the docker graph driver is overlayfs, different docker containers that
> use the same image, or share the same lower layers, share the file cache
> of common binaries and libraries (e.g. libc.so).

Does this present a problem with setting memcg limits or OOMs? It seems
like deterministically charging shared pages would be highly desirable.
Mina Almasry previously proposed a memcg= mount option to implement
deterministic charging [1], but it wasn't a generic sharing mechanism.
Nonetheless, the problem remains, and it would be interesting to learn
whether it presents any issues for you.

[1] https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/

> > - do you anticipate a shorter scan period at some point? Proactively
> > reclaiming all memory colder than one hour is a long time :) Are you
> > concerned at all about the cost of doing your current idle bit
> > harvesting approach becoming too expensive if you significantly reduce
> > the scan period?
>
> We don't want the owner of the application to feel a significant
> performance downgrade when using swap. There is a high risk in reclaiming
> pages whose idle age is less than 1 hour. We have internal tests and
> data analysis to support this.
>
> We disabled global swappiness and memcg swappiness.
> Only proactive reclaim can swap anon pages.
>
> What's more, we see that MGLRU has a more efficient way to scan PTE
> accessed bits. We preferred to use the MGLRU scan to help us scan and
> select idle pages.

I'm working on a kernel driver/per-memcg interface to perform aging with
MGLRU, including configuration for the MGLRU page scanning optimizations.
I suspect that ad-hoc scanning of the PTE accessed bits for pages charged
to a foreign memcg has some performance implications, and the more general
solution is to charge shared pages in a predetermined way, which makes
scanning on behalf of the foreign memcg a bit cleaner. Ad-hoc scanning is
possible nonetheless, but a bit hacky. Let me know if you have any ideas.

> > - is proactive reclaim being driven by writing to memory.reclaim, by
> > enforcing a smaller memory.high, or something else?
>
> Because all page info and idle ages are stored in userspace, the kernel
> can't get this information directly. We have a private patch that
> includes a new reclaim interface to reclaim pages with specific pfns.
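Henry describes keeping each page's idle age in userspace, scanning at a
fixed period, and proactively reclaiming only pages idle for at least one
hour through a private pfn-based interface. A minimal sketch of that
bookkeeping, assuming each scan yields a per-page "still idle" flag (for
example from the idle page bitmap); struct page_age, update_idle_ages() and
select_cold_pfns() are hypothetical names, and the private reclaim interface
is not modelled because its API was not shared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page record kept by a userspace agent. */
struct page_age {
    uint64_t pfn;
    uint32_t idle_seconds; /* how long the page has stayed idle so far */
};

/*
 * Call once per scan period. still_idle[i] is true if the page's idle bit
 * was still set at this scan, i.e. nobody accessed it since it was last
 * marked idle. Any access restarts the page's idle age at zero.
 */
static void update_idle_ages(struct page_age *pages, const bool *still_idle,
                             size_t n, uint32_t scan_period_s)
{
    for (size_t i = 0; i < n; i++) {
        if (still_idle[i])
            pages[i].idle_seconds += scan_period_s;
        else
            pages[i].idle_seconds = 0;
    }
}

/*
 * Copy the PFNs of pages idle for at least min_idle_s seconds into out
 * (capacity out_cap) and return how many were written.
 */
static size_t select_cold_pfns(const struct page_age *pages, size_t n,
                               uint32_t min_idle_s, uint64_t *out,
                               size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (pages[i].idle_seconds >= min_idle_s)
            out[count++] = pages[i].pfn;
    }
    return count;
}
```

With min_idle_s set to 3600, only pages that went untouched for a full hour
of consecutive scans become candidates; a single access in between resets
the age, which is what keeps recently used pages away from swap.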
Thanks for sharing! It's been enlightening to learn about different prod environments.
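For context on the interface named in the subject line: according to the
kernel's idle page tracking documentation, /sys/kernel/mm/page_idle/bitmap
is an array of native-endian 64-bit words in which the page at PFN i maps to
bit i % 64 of word i / 64, and the file can only be read or written in
8-byte units. Writing a 1 bit marks a page idle (the written value is OR-ed
into the bitmap); a later read still shows the page idle only if it was not
accessed in between. A minimal sketch of testing and marking one page under
those rules; the helper names are hypothetical, and opening the file (which
requires root) and translating virtual addresses to PFNs via
/proc/pid/pagemap are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Byte offset of the 64-bit word that holds the idle bit for pfn. */
static off_t idle_word_offset(uint64_t pfn)
{
    return (off_t)(pfn / 64) * (off_t)sizeof(uint64_t);
}

/*
 * Read the word containing pfn's bit from an open bitmap fd.
 * Returns 1 if the page is idle, 0 if it was accessed since it was
 * last marked idle, and -1 if the read failed.
 */
static int page_idle_test(int fd, uint64_t pfn)
{
    uint64_t word;

    if (pread(fd, &word, sizeof(word), idle_word_offset(pfn)) !=
        (ssize_t)sizeof(word))
        return -1;
    return (int)((word >> (pfn % 64)) & 1);
}

/*
 * Mark pfn idle by writing a word with only its bit set. The kernel ORs
 * the written value into the bitmap, so other pages sharing the word are
 * not affected. Returns 0 on success, -1 on failure.
 */
static int page_idle_mark(int fd, uint64_t pfn)
{
    uint64_t word = UINT64_C(1) << (pfn % 64);

    if (pwrite(fd, &word, sizeof(word), idle_word_offset(pfn)) !=
        (ssize_t)sizeof(word))
        return -1;
    return 0;
}
```

A harvesting pass would call page_idle_mark() on every tracked PFN, wait one
scan period, then call page_idle_test() on each; a real scanner would batch
whole words rather than issue one syscall per page.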