From: Hugh Dickins <hughd@google.com>
Date: Sun, 16 Apr 2023 12:25:37 -0700 (PDT)
To: Zi Yan
Cc: "Matthew Wilcox (Oracle)", Yang Shi, Yu Zhao, linux-mm@kvack.org,
 "Kirill A . Shutemov", Ryan Roberts, Michal Koutný, Roman Gushchin,
 Zach O'Keefe, Andrew Morton, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v3 5/7] mm: thp: split huge page to any lower order pages.
In-Reply-To: <20230403201839.4097845-6-zi.yan@sent.com>
Message-ID: <26723f25-609a-fe9c-a41a-e692634d892@google.com>
References: <20230403201839.4097845-1-zi.yan@sent.com> <20230403201839.4097845-6-zi.yan@sent.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
On Mon, 3 Apr 2023, Zi Yan wrote:

> From: Zi Yan
>
> To split a THP to any lower order pages, we need to reform THPs on
> subpages at given order and add page refcount based on the new page
> order. Also we need to reinitialize page_deferred_list after removing
> the page from the split_queue, otherwise a subsequent split will see
> list corruption when checking the page_deferred_list again.
>
> It has many uses, like minimizing the number of pages after
> truncating a huge pagecache page. For anonymous THPs, we can only split
> them to order-0 like before until we add support for any size anonymous
> THPs.
>
> Signed-off-by: Zi Yan
> ---
...
> @@ -2754,14 +2798,18 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  		if (folio_test_swapbacked(folio)) {
>  			__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS,
>  						-nr);
> -		} else {
> +		} else if (!new_order) {
> +			/*
> +			 * Decrease THP stats only if split to normal
> +			 * pages
> +			 */
>  			__lruvec_stat_mod_folio(folio, NR_FILE_THPS,
>  						-nr);
>  			filemap_nr_thps_dec(mapping);
>  		}
>  	}

This part is wrong. The problem I've had is /proc/sys/vm/stat_refresh
warning of negative nr_shmem_hugepages (which then gets shown as 0 in
vmstat or meminfo, even though there actually are shmem hugepages).
At first I thought that the fix needed (which I'm running with) is:

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2797,17 +2797,16 @@ int split_huge_page_to_list_to_order(str
 			int nr = folio_nr_pages(folio);
 			xas_split(&xas, folio, folio_order(folio));
-			if (folio_test_swapbacked(folio)) {
-				__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS,
-							-nr);
-			} else if (!new_order) {
-				/*
-				 * Decrease THP stats only if split to normal
-				 * pages
-				 */
-				__lruvec_stat_mod_folio(folio, NR_FILE_THPS,
-							-nr);
-				filemap_nr_thps_dec(mapping);
+			if (folio_test_pmd_mappable(folio) &&
+			    new_order < HPAGE_PMD_ORDER) {
+				if (folio_test_swapbacked(folio)) {
+					__lruvec_stat_mod_folio(folio,
+							NR_SHMEM_THPS, -nr);
+				} else {
+					__lruvec_stat_mod_folio(folio,
+							NR_FILE_THPS, -nr);
+					filemap_nr_thps_dec(mapping);
+				}
 			}
 		}

because elsewhere the maintenance of NR_SHMEM_THPS or NR_FILE_THPS is
rightly careful to be dependent on folio_test_pmd_mappable() (and, so
far as I know, we shall not be seeing folios of order higher than
HPAGE_PMD_ORDER yet in mm/huge_memory.c - those would need more thought).

But it may be more complicated than that, given that patch 7/7 appears
(I haven't tried) to allow splitting to other orders on a file opened
for reading - that might be a bug.

The complication here is that we now have four kinds of large folio in
mm/huge_memory.c, and the rules are a bit different for each.

Anonymous THPs: okay, I think I've seen you exclude those with -EINVAL
at a higher level (and they wouldn't be getting into this
"if (mapping) {" block anyway).

Shmem (swapbacked) THPs: we are only allocating shmem in 0-order or
HPAGE_PMD_ORDER at present. I can imagine that in a few months or a
year-or-so's time, we shall want to follow Matthew's folio readahead,
and generalize to other orders in shmem; but right now I'd really
prefer not to have truncation or debugfs introducing the surprise of
other orders there.
Maybe there's little that needs to be fixed, only the THP_SWPOUT and
THP_SWPOUT_FALLBACK statistics have come to mind so far (would need to
be limited to folio_test_pmd_mappable()); though I've no idea how well
intermediate orders will work with or against THP swapout.

CONFIG_READ_ONLY_THP_FOR_FS=y file THPs: those need special care, and
their filemap_nr_thps_dec(mapping) above may not be good enough. So
long as it's working as intended, it does exclude the possibility of
truncation splitting here; but if you allow splitting via debugfs to
reach them, then the accounting needs to be changed - for them, any
order higher than 0 has to be counted in nr_thps - so splitting one
HPAGE_PMD_ORDER THP into multiple large folios will need to add to
that count, not decrement it. Otherwise, a filesystem unprepared for
large folios or compound pages is in danger of meeting them by
surprise. Better just disable that possibility, along with shmem.

mapping_large_folio_support() file THPs: this category is the one
you're really trying to address with this series, they can already
come in various orders, and it's fair for truncation to make a
different choice of orders - but is what it's doing worth doing?
I'll say more on 6/7.

Hugh