From: Doug Anderson
Date: Tue, 18 Apr 2023 13:25:25 -0700
Subject: Re: [RFC PATCH] mm, compaction: kcompactd work shouldn't count towards memory PSI
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Yu Zhao, Ying,
    "Peter Zijlstra (Intel)", linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20230418195335.GA268630@cmpxchg.org>

Hi,

On Tue, Apr 18, 2023 at 12:53 PM Johannes Weiner wrote:
>
> On Tue, Apr 18, 2023 at 09:58:54AM -0700, Douglas Anderson wrote:
> > When the main kcompactd thread is doing compaction then it's always
> > proactive compaction. This is a little confusing because kcompactd has
> > two phases and one of them is called the "proactive" phase.
> >
> > Specifically:
> > * Phase 1 (the "non-proactive" phase): we've been told by someone else
> >   that it would be a good idea to try to compact memory.
> > * Phase 2 (the "proactive" phase): we analyze memory fragmentation
> >   ourselves and compact if it looks fragmented.
> >
> > From the context of kcompactd, the above naming makes sense. However,
> > from the context of the kernel as a whole both phases are "proactive"
> > because in both cases we're trying to compact memory ahead of time and
> > we're not actually blocking (stalling) any task that is trying to use
> > memory.
> >
> > Specifically, if any task is actually blocked needing memory to be
> > compacted then it will be in direct reclaim. That won't block waiting
> > on the kcompactd task but will instead call try_to_compact_pages()
> > directly. The caller of that direct compaction,
> > __alloc_pages_direct_compact(), already marks itself as counting
> > towards PSI.
> >
> > Sanity checking by looking at this from another perspective, we can
> > look at all the people who explicitly ask kcompactd to do compaction by
> > calling wakeup_kcompactd(). That leads us to 3 places in vmscan.c.
> > Those are all requests from kswapd, which is also a "proactive"
> > mechanism in the kernel (tasks aren't blocked waiting for it).
>
> There is a reason behind annotating kswapd/kcompactd like this, it's
> in the longish comment in psi.c:
>
> * The time in which a task can execute on a CPU is our baseline for
> * productivity. Pressure expresses the amount of time in which this
> * potential cannot be realized due to resource contention.
> *
> * This concept of productivity has two components: the workload and
> * the CPU. To measure the impact of pressure on both, we define two
> * contention states for a resource: SOME and FULL.
> *
> * In the SOME state of a given resource, one or more tasks are
> * delayed on that resource. This affects the workload's ability to
> * perform work, but the CPU may still be executing other tasks.
> *
> * In the FULL state of a given resource, all non-idle tasks are
> * delayed on that resource such that nobody is advancing and the CPU
> * goes idle. This leaves both workload and CPU unproductive.
> *
> *	SOME = nr_delayed_tasks != 0
> *	FULL = nr_delayed_tasks != 0 && nr_productive_tasks == 0
> *
> * What it means for a task to be productive is defined differently
> * for each resource. For IO, productive means a running task. For
> * memory, productive means a running task that isn't a reclaimer. For
> * CPU, productive means an oncpu task.

Ah, thanks for the pointer!

> So when you have a CPU that's running reclaim/compaction work, that
> CPU isn't available to execute the workload.
>
> Say you only have one CPU shared between an allocating thread and
> kswapd. Even if the allocating thread never has to do reclaim on its
> own, if it has to wait for the CPU behind kswapd 50% of the time, that
> workload is positively under memory pressure.

I guess I'm so much in the mindset of having 2-8 CPUs that I didn't
think as much about the single-CPU case. What you say makes a lot of
sense for the single-CPU case, or for highly parallel workloads that
take up all available CPUs, but what about when you've got extra CPUs
sitting idle? In that case we're really not taking any CPU cycles away
from anyone by having one of those CPUs do compaction in the
background.

I'm sure this has been discussed before somewhere, but I'd also wonder
if there's ever a reason why we should prioritize kswapd/kcompactd over
user programs. AKA: why don't kswapd/kcompactd run with a big
"sched_nice" value? If kswapd/kcompactd were low enough priority, then
even in the single-CPU case (or the multiple-CPU case with a highly
parallel workload) kswapd/kcompactd would never take significant time
away from a real program.

> I don't think the distinction between proactive and reactive is all
> that meaningful.
> It's generally assumed that all the work done by
> these background threads is work that later doesn't have to be done by
> an allocating thread. It might matter from a latency perspective, but
> otherwise the work is fungible as it relates to memory pressure.

If all compaction should count towards PSI then it feels like we have a
different bug. The proactive_compact_node() function should also be
marked as counting towards PSI, shouldn't it?

-Doug