From: Barry Song <21cnbao@gmail.com>
Date: Mon, 13 Jan 2025 23:05:13 +1300
Subject: Re: [PATCH V3] mm: compaction: skip memory compaction when there are not enough migratable pages
To: Ge Yang
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, david@redhat.com, baolin.wang@linux.alibaba.com, hannes@cmpxchg.org, liuzixing@hygon.cn
References: <1736335854-548-1-git-send-email-yangge1116@126.com>

On Mon, Jan 13, 2025 at 10:04 PM Ge Yang wrote:
>
> On 2025/1/13 16:47, Barry Song wrote:
> > On Thu, Jan 9, 2025 at 12:31 AM wrote:
> >>
> >> From: yangge
> >>
> >> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> >> of memory.
> >> I have configured 16GB of CMA memory on each NUMA node,
> >> and starting a 32GB virtual machine with device passthrough is
> >> extremely slow, taking almost an hour.
> >>
> >> During the start-up of the virtual machine, it will call
> >> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> >> Long-term GUP cannot allocate memory from the CMA area, so a maximum
> >> of 16GB of non-CMA memory on a NUMA node can be used as virtual
> >> machine memory. There is 16GB of free CMA memory on a NUMA node, which
> >> is sufficient to pass the order-0 watermark check, causing the
> >> __compaction_suitable() function to consistently return true.
> >> However, if there aren't enough migratable pages available, performing
> >> memory compaction is pointless. Besides checking whether
> >> the order-0 watermark is met, __compaction_suitable() also needs
> >> to determine whether there are sufficient migratable pages available
> >> for memory compaction.
> >>
> >> For costly allocations, because __compaction_suitable() always
> >> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> >> place, resulting in excessively long virtual machine startup times.
> >> Call trace:
> >> __alloc_pages_slowpath
> >>     if (compact_result == COMPACT_SKIPPED ||
> >>         compact_result == COMPACT_DEFERRED)
> >>         goto nopage; // should exit __alloc_pages_slowpath() from here
> >>
> >> When the 16GB of non-CMA memory on a single node is exhausted, we will
> >> fall back to allocating memory on other nodes. In order to quickly
> >> fall back to remote nodes, we should skip memory compaction when
> >> migratable pages are insufficient. After this fix, it only takes a
> >> few tens of seconds to start a 32GB virtual machine with device
> >> passthrough functionality.
> >>
> >> Signed-off-by: yangge
> >> ---
> >>
> >> V3:
> >> - fix build error
> >>
> >> V2:
> >> - consider unevictable folios
> >>
> >>  mm/compaction.c | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 07bd227..a9f1261 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
> >>                                   int highest_zoneidx,
> >>                                   unsigned long wmark_target)
> >>  {
> >> +       pg_data_t __maybe_unused *pgdat = zone->zone_pgdat;
> >> +       unsigned long sum, nr_pinned;
> >>         unsigned long watermark;
> >> +
> >> +       sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> >> +               node_page_state(pgdat, NR_INACTIVE_ANON) +
> >> +               node_page_state(pgdat, NR_ACTIVE_FILE) +
> >> +               node_page_state(pgdat, NR_ACTIVE_ANON) +
> >> +               node_page_state(pgdat, NR_UNEVICTABLE);
> >> +
> >> +       nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> >> +               node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> >> +
> >
> > Does the sum of all LRU pages equal non-CMA memory?
> > I'm quite confused for two reasons:
> > 1. CMA pages can be LRU pages.
> > 2. Free pages might not belong to any LRUs.
> NO.
>
> If all the pages in the LRU are pinned, it seems unnecessary to perform
> memory compaction, as the migration of pinned pages is unlikely to succeed.
> Besides checking whether the order-0 watermark is met,
> __compaction_suitable() also needs to determine whether there are
> sufficient migratable pages available for memory compaction.

Ok, but I am not convinced that this is a correct patch. If all your
CMA pages are used by userspace (in other words, they are in LRUs), the
sum could become quite large, and `nr_pinned` might include non-CMA pages.
In that case, `sum - nr_pinned` would also be quite large. The
"return false" logic wouldn't work as intended.
I suspect the issue only appears to have gone away because your CMA is
not being used at all.

> >
> >
> >> +       /*
> >> +        * Gup-pinned pages are non-migratable. After subtracting these pages,
> >> +        * we need to check if the remaining pages are sufficient for memory
> >> +        * compaction.
> >> +        */
> >> +       if ((sum - nr_pinned) < (1 << order))
> >> +               return false;
> >> +
> >>         /*
> >>          * Watermarks for order-0 must be met for compaction to be able to
> >>          * isolate free pages for migration targets. This means that the
> >> --
> >> 2.7.4
> >>
> >>
> >

Thanks
Barry
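
The objection above can be sketched with plain arithmetic. This is a
userspace illustration only, not kernel code: `struct node_counts`,
`patch_check_skips()` and `migratable_noncma()` are hypothetical names, the
page counts assume 4 KiB pages (16GB = 4194304 pages), and all pinned pages
are assumed to sit on non-CMA LRU pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node page counts; these are not kernel structures. */
struct node_counts {
	unsigned long lru_cma;       /* LRU pages located in CMA pageblocks */
	unsigned long lru_noncma;    /* LRU pages located outside CMA */
	unsigned long pinned_noncma; /* pinned pages, assumed to be non-CMA LRU pages */
};

/*
 * Models the test added by the patch: node-wide LRU total minus pinned
 * pages, compared against the request size. Returns true when the patch
 * would make __compaction_suitable() return false (skip compaction).
 */
static bool patch_check_skips(const struct node_counts *n, unsigned int order)
{
	unsigned long sum = n->lru_cma + n->lru_noncma;

	assert(n->pinned_noncma <= n->lru_noncma);
	return (sum - n->pinned_noncma) < (1UL << order);
}

/* Unpinned LRU pages outside CMA, under the same assumptions. */
static unsigned long migratable_noncma(const struct node_counts *n)
{
	assert(n->pinned_noncma <= n->lru_noncma);
	return n->lru_noncma - n->pinned_noncma;
}
```

With `{0, 4194304, 4194304}` (CMA unused, all non-CMA LRU pages pinned) the
check skips compaction for an order-9 request. With
`{4194304, 4194304, 4194304}` (CMA filled by userspace pages) the unpinned
non-CMA count is still zero, yet `sum - nr_pinned` is 4194304 and the check
no longer skips.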