From: Yosry Ahmed
Date: Thu, 12 Sep 2024 11:50:26 -0700
Subject: Re: [PATCH V10] cgroup/rstat: Avoid flushing if there is an ongoing root flush
To: Nhat Pham
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, shakeel.butt@linux.dev, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mfleming@cloudflare.com, joshua.hahnjy@gmail.com
References: <172547884995.206112.808619042206173396.stgit@firesoul> <84e09f0c-10d7-4648-b243-32f18974e417@kernel.org>

On Thu, Sep 12, 2024 at 11:25 AM Nhat Pham wrote:
>
> On Thu, Sep 12, 2024 at 10:28 AM Yosry Ahmed wrote:
> >
> > >
> > > I'm not, but Joshua from my team is working on it :)
> >
> > Great, thanks for letting me know!
>
> FWIW, I think the zswap_shrinker_count() path is fairly trivial to
> take care of :) We only need the stats itself, and you don't even
> need any tree traversal tbh - technically it is most accurate to track
> zswap memory usage of the memcg itself - one atomic counter per
> zswap_lruvec_struct should suffice.

Do you mean per-lruvec or per-memcg?

>
> obj_cgroup_may_zswap() could be more troublesome - we need the entire
> subtree data to make the decision, at each level :) How about this:
>
> 1. Add a per-memcg counter to track zswap memory usage.
>
> 2. At obj_cgroup_may_zswap() time, the logic is unchanged - we
> traverse the tree from current memcg to root memcg, grabbing the
> memcg's counter and check for usage.
>
> 3. At obj_cgroup_charge_zswap() time, we have to perform another
> upward traversal again, to increment the counters. Would this be too
> expensive?
>
> We still need the whole obj_cgroup charging spiel, for memory usage
> purposes, but this should allow us to remove the MEMCG_ZSWAP_B.
> Similarly, another set of counters can be introduced to remove
> MEMCG_ZSWAPPED...
>
> Yosry, Joshua, how do you feel about this design? Step 3 is the part
> where I'm least certain about, but it's the only way I can think of
> that would avoid any flushing action. You have to pay the price of
> stat updates at *some* point :)

In (2), obj_cgroup_may_zswap(), the upward traversal should get cheaper
because we avoid the stats flush and just read an atomic counter
instead.

In (3), obj_cgroup_charge_zswap(), we will do an upward traversal with
atomic updates. In a lot of cases this can be cheaper than the flush we
avoid, but we'd need to measure it with different hierarchies to be
sure.

Keep in mind that if consume_obj_stock() is not successful and we fall
back to obj_cgroup_charge_pages(), we already do an upward traversal.
So it may be just fine to do the upward traversal here as well.

So I think the plan sounds good. We just need some perf testing to make
sure (3) does not introduce regressions.
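To make (2) and (3) concrete, here is a rough userspace sketch of the shape being discussed. It is not kernel code. The toy_* names, the struct layout, and the TOY_ZSWAP_NO_LIMIT sentinel are all made up for illustration, and C11 atomics stand in for the kernel's atomic types. Each counter is assumed to hold the usage of the whole subtree, which is why the charge path has to walk up to the root.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no zswap limit at this level". */
#define TOY_ZSWAP_NO_LIMIT (-1L)

/* Hypothetical stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the root */
	atomic_long zswap_bytes;	/* zswap usage of this subtree */
	long zswap_max;			/* limit in bytes, or TOY_ZSWAP_NO_LIMIT */
};

static void toy_memcg_init(struct toy_memcg *m, struct toy_memcg *parent,
			   long zswap_max)
{
	m->parent = parent;
	m->zswap_max = zswap_max;
	atomic_init(&m->zswap_bytes, 0);
}

/* Step 2: walk up to the root, reading one atomic counter per level. */
static bool toy_may_zswap(struct toy_memcg *memcg)
{
	for (struct toy_memcg *m = memcg; m; m = m->parent) {
		if (m->zswap_max == TOY_ZSWAP_NO_LIMIT)
			continue;
		if (m->zswap_max == 0)
			return false;
		/* No stats flush here, only a plain atomic read. */
		if (atomic_load_explicit(&m->zswap_bytes,
					 memory_order_relaxed) >= m->zswap_max)
			return false;
	}
	return true;
}

/* Step 3: walk up to the root, bumping the counter at every level. */
static void toy_charge_zswap(struct toy_memcg *memcg, long size)
{
	assert(size >= 0);
	for (struct toy_memcg *m = memcg; m; m = m->parent)
		atomic_fetch_add_explicit(&m->zswap_bytes, size,
					  memory_order_relaxed);
}

/* Mirror of the charge path, for when an entry leaves zswap. */
static void toy_uncharge_zswap(struct toy_memcg *memcg, long size)
{
	assert(size >= 0);
	for (struct toy_memcg *m = memcg; m; m = m->parent)
		atomic_fetch_sub_explicit(&m->zswap_bytes, size,
					  memory_order_relaxed);
}
```

The cost in toy_charge_zswap() is one atomic read-modify-write per level of the hierarchy, so deep hierarchies with many CPUs charging under the same ancestor are the cases that perf testing would need to cover.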