From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Shi <shy828301@gmail.com>
Date: Mon, 15 Apr 2024 14:16:15 -0700
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list
To: David Hildenbrand
Cc: Zi Yan, linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Ryan Roberts, Barry Song <21cnbao@gmail.com>, linux-kernel@vger.kernel.org
References: <20240411153232.169560-1-zi.yan@sent.com> <2C698A64-268C-4E43-9EDE-6238B656A391@nvidia.com> <60049ec1-df14-4c3f-b3dd-5d771c2ceac4@redhat.com>
On Mon, Apr 15, 2024 at 12:19 PM David Hildenbrand wrote:
>
> >>
> >> We could have
> >> * THP_DEFERRED_SPLIT_PAGE
> >> * THP_UNDO_DEFERRED_SPLIT_PAGE
> >> * THP_PERFORM_DEFERRED_SPLIT_PAGE
> >>
> >> Maybe that would catch more cases (not sure if all, though). Then, you
> >> could tell how many are still on that list: THP_DEFERRED_SPLIT_PAGE -
> >> THP_UNDO_DEFERRED_SPLIT_PAGE - THP_PERFORM_DEFERRED_SPLIT_PAGE.
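For concreteness, the bookkeeping proposed above amounts to the following (a toy sketch in Python, not kernel code; note that only THP_DEFERRED_SPLIT_PAGE exists today, as thp_deferred_split_page in /proc/vmstat, while the "undo" and "perform" counters are hypothetical):

```python
# Toy model of the proposed three-counter accounting. Only
# "thp_deferred_split_page" is a real /proc/vmstat counter today;
# the "undo" and "perform" names below are hypothetical.

def deferred_split_backlog(vmstat):
    """Estimate THPs still sitting on the deferred split queue:
    pages ever queued, minus pages removed without splitting,
    minus pages actually split (e.g. under memory pressure)."""
    return (vmstat["thp_deferred_split_page"]
            - vmstat.get("thp_undo_deferred_split_page", 0)
            - vmstat.get("thp_perform_deferred_split_page", 0))

sample = {
    "thp_deferred_split_page": 120,         # THPs ever queued
    "thp_undo_deferred_split_page": 30,     # dequeued without a split
    "thp_perform_deferred_split_page": 50,  # actually split
}
print(deferred_split_backlog(sample))  # 40 still on the queue
```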
> >>
> >> That could give one a clearer picture how deferred split interacts with
> >> actual splitting (possibly under memory pressure), the whole reason why
> >> deferred splitting was added after all.
> >
> > I'm not quite sure whether there is a solid use case or not. If we
> > have one, we could consider this. But a simpler counter may be
> > preferable.
>
> Yes.
>
> >
> >>
> >>> It may be useful. However the counter is typically used to estimate
> >>> how many THP are partially unmapped during a period of time.
> >>
> >> I'd say it's a bit of an abuse of that counter; well, or interpreting
> >> something into the counter that that counter never reliably represented.
> >
> > It was way more reliable than now.
>
> Correct me if I am wrong: now that we only adjust the counter for
> PMD-sized THP, it is as (un)reliable as it always was.

Yes. The problem introduced by mTHP was somewhat worked around by that
commit.

>
> Or was there another unintended change by some of my cleanups or
> previous patches?

No, at least I haven't seen one so far.

>
> >
> >>
> >> I can easily write a program that keeps sending your counter to infinity
> >> simply by triggering that behavior in a loop, so it's all a bit shaky.
> >
> > I don't doubt that. But let's get back to reality. The counter used to
> > stay reasonable and reliable with most real-life workloads before
> > mTHP. There may be over-counting, for example, when unmapping a
> > PTE-mapped THP which was not on a deferred split queue before. But
> > such a case is not common for real-life workloads, because in most
> > cases the huge PMD has to be split by a partial unmap, and the partial
> > unmap will add the THP to the deferred split queue.
> >
> > But now a common workload, for example just process exit, may
> > well send the counter to infinity.
>
> Agreed, that's stupid.
>
> >
> >>
> >> Something like Ryan's script makes more sense, where you get a clearer
> >> picture of what's mapped where and how. Because that information can be
> >> much more valuable than just knowing if it's mapped fully or partially
> >> (again, relevant for handling with memory waste).
> >
> > Ryan's script is very helpful. But the counter has existed and been
> > used for years, and it is a quick indicator and much easier to monitor
> > in a large-scale fleet.
> >
> > If we think the reliability of the counter is not worth fixing, why
> > don't we just remove it? No counter is better than a broken counter.
>
> Again, is only counting the PMD-sized THPs "fixing" the old use cases?

Yes.

> Then it should just stick around. And we can even optimize it for some
> more cases as proposed in this patch. But there is no easy way to "get
> it completely right" I'm afraid.

I don't mean we should revert that "fix"; my point is that we should not
rely on it, and that we should make the rmap removal code behave more
reliably regardless of whether we only count PMD-sized THPs or not.

> --
> Cheers,
>
> David / dhildenb
>