From: Suren Baghdasaryan
Date: Mon, 21 Feb 2022 11:09:07 -0800
Subject: Re: [PATCH 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
To: Michal Hocko
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, peterz@infradead.org, guro@fb.com, shakeelb@google.com, minchan@kernel.org, timmurray@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20220219174940.2570901-1-surenb@google.com>
Content-Type: text/plain; charset="UTF-8"
On Mon, Feb 21, 2022 at 12:55 AM Michal Hocko wrote:
>
> On Sat 19-02-22 09:49:40, Suren Baghdasaryan wrote:
> > When page allocation in the direct reclaim path fails, the system will
> > make one attempt to shrink per-cpu page lists and free pages from
> > high alloc reserves. Draining per-cpu pages into the buddy allocator can
> > be a very slow operation because it's done using workqueues and the
> > task in direct reclaim waits for all of them to finish before
> > proceeding. Currently this time is not accounted as psi memory stall.
> >
> > While testing mobile devices under extreme memory pressure, when
> > allocations were failing during direct reclaim, we noticed that psi
> > events which would be expected in such conditions were not triggered.
> > After profiling these cases it was determined that the reason for the
> > missing psi events was that a big chunk of the time spent in direct
> > reclaim is not accounted as memory stall, therefore psi would not
> > reach the levels at which an event is generated. Further investigation
> > revealed that the bulk of that unaccounted time was spent inside the
> > drain_all_pages call.
>
> It would be cool to have some numbers here.

A typical case I was able to record when the drain_all_pages path gets
activated:

  __alloc_pages_slowpath took 44.644.613ns
      __perform_reclaim          751.668ns  (1.7%)
      drain_all_pages took    43.887.167ns (98.3%)

PSI in this case records the time spent in __perform_reclaim but ignores
drain_all_pages, IOW it misses 98.3% of the time spent in
__alloc_pages_slowpath. Sure, normally it's not often that this path is
activated, but when it is, we miss reporting most of the stall.

> > Annotate drain_all_pages and unreserve_highatomic_pageblock during
> > page allocation failure in the direct reclaim path so that delays
> > caused by these calls are accounted as memory stall.
>
> If the draining is too slow and dependent on the current CPU/WQ
> contention then we should address that. The original intention was that
> having a dedicated WQ with WQ_MEM_RECLAIM would help to isolate the
> operation from the rest of WQ activity. Maybe we need to fine-tune
> mm_percpu_wq. If that doesn't help then we should revise the WQ model
> and use something else. Memory reclaim shouldn't really get stuck behind
> other unrelated work.

Agree. However, even after improving this, I think we should record the
time spent in drain_all_pages as a psi memstall, so this patch I believe
is still relevant.

Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs
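
For reference, the annotation discussed above boils down to wrapping the
drain step in the existing psi memstall API. Below is a rough sketch of the
idea, paraphrasing the retry path of __alloc_pages_direct_reclaim() in
mm/page_alloc.c; the surrounding function body and helper calls come from a
reading of the upstream kernel of that era, not from this patch, and the
actual diff may place the enter/leave calls differently.

static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
			     unsigned int alloc_flags,
			     const struct alloc_context *ac,
			     unsigned long *did_some_progress)
{
	struct page *page = NULL;
	unsigned long pflags;
	bool drained = false;

	/* psi already accounts the reclaim work inside __perform_reclaim() */
	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
	if (unlikely(!(*did_some_progress)))
		return NULL;

retry:
	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

	/*
	 * If the allocation failed after reclaim, pages may still be pinned
	 * on per-cpu lists or held in the highatomic reserves. Release them
	 * and retry once.
	 */
	if (!page && !drained) {
		/*
		 * Sketch of the proposed annotation: charge the time spent
		 * unreserving highatomic pageblocks and draining per-cpu
		 * lists (which waits on mm_percpu_wq workers) as a memory
		 * stall, the same way the reclaim time already is.
		 */
		psi_memstall_enter(&pflags);
		unreserve_highatomic_pageblock(ac, false);
		drain_all_pages(NULL);
		psi_memstall_leave(&pflags);

		drained = true;
		goto retry;
	}

	return page;
}

psi_memstall_enter()/psi_memstall_leave() are the same helpers used around
the reclaim itself, so with such a change the drain time feeds into the same
pressure accounting that generates the psi events mentioned above.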