From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69AB1C35274 for ; Fri, 15 Dec 2023 12:44:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D272E8D012D; Fri, 15 Dec 2023 07:44:36 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CAEC18D0121; Fri, 15 Dec 2023 07:44:36 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B5B618D012D; Fri, 15 Dec 2023 07:44:36 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id A041D8D0121 for ; Fri, 15 Dec 2023 07:44:36 -0500 (EST) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 4BCBC40CE3 for ; Fri, 15 Dec 2023 12:44:36 +0000 (UTC) X-FDA: 81569021352.23.E56CECF Received: from out0-201.mail.aliyun.com (out0-201.mail.aliyun.com [140.205.0.201]) by imf18.hostedemail.com (Postfix) with ESMTP id 96C331C0013 for ; Fri, 15 Dec 2023 12:44:30 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=antgroup.com; spf=pass (imf18.hostedemail.com: domain of henry.hj@antgroup.com designates 140.205.0.201 as permitted sender) smtp.mailfrom=henry.hj@antgroup.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1702644274; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=A20vANv0qziX5EvAFM+tE9KI46VJPd2Riq2IaQKrnLI=; b=zfaB5jqlHnKWjDIn5qwq7HI7ITE+3Y512S0FFRyDCrc+UbQpjMmJgeKQjoMOCW4be7RC0x KirfIGBpFIE3h+6aLsRQoLZ0dDi1Vhk/GCncgmfz+3+gM5JejBDGHxKinDTsNphYM1NLqo ao75vt33+SB6RInGbW0/E7XJIh4ZGMs= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=antgroup.com; spf=pass (imf18.hostedemail.com: domain of henry.hj@antgroup.com designates 140.205.0.201 as permitted sender) smtp.mailfrom=henry.hj@antgroup.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1702644274; a=rsa-sha256; cv=none; b=NTXc36cueKtPdlyj0EuGOc7cQpn+TnjDJydsQZVRX/49hqgymByUdaWLCaUBcTU46aY+PO 3wu90C6AZcpE7iZ4itL/WxHI/IHu51dzONN9WKT+75yk1p+rwZpQgqGr0u/7rt5zP0J6xl GNBNK3A0f8A7wqKlmaUsH9oAqrxTZlU= X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R641e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018047198;MF=henry.hj@antgroup.com;NM=1;PH=DS;RN=7;SR=0;TI=SMTPD_---.Vl7PTuA_1702644264; Received: from localhost(mailfrom:henry.hj@antgroup.com fp:SMTPD_---.Vl7PTuA_1702644264) by smtp.aliyun-inc.com; Fri, 15 Dec 2023 20:44:25 +0800 From: "Henry Huang" To: yuzhao@google.com Cc: , "Henry Huang" , "=?UTF-8?B?6LCI6Ym06ZSL?=" , , , "=?UTF-8?B?5pyx6L6JKOiMtuawtCk=?=" Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap Date: Fri, 15 Dec 2023 20:44:17 +0800 Message-ID: <20231215124423.88878-1-henry.hj@antgroup.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 96C331C0013 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: kuorkoy38k38rbd8iwwsoa3sbcms9bkd X-HE-Tag: 1702644270-708270 X-HE-Meta: U2FsdGVkX1+/qSZublZaO+nUPmTKGS27pVlGGiqcpID0jLksg1wzNRh6e2Obkb3npMC8dKwb6lxC5j1Qk130lhBTaGKPpQpsSGbc0TgHR2ZA1zpamZ97xZwVwmYRfNDpinhRHMIAa4jrPjJOW5VYLscF7GRgzzA0zNCpx6j/dYqEZs+lnLj98dStTgbUHb0IVaYAYo8WRXdbH6Zy0BGsj1DZ/RheWnhnKI65L9wo8YWEXtTUHAi7h4m9idi7cqCijB4IaaxEmESJFGBy4CGRCCtYTroPKThNDOeXBFA1ANb+EpVoEFDKYJCK3hpc5LNng2JiCqZnlwNhkQeDo0q7NZLTsdMvDTTtUu3ftPoPNvgtdloBLawqoVF6ZusHE4o65wPCjpf/m98OJzhMQ9WUSosgXEu5pEx0zVuKdS1E/U7iVSTTNR0B3T6Tcb8h2uBJfvFhPr+L1wu7aIss2wA/fTKKa5YV0e3hMZDj/0/aNzsuMS8SaWWq3BA8ADCOgYLPK8SZqSqKgeTCcTsT0PfdBDSzjHybldRJImhHLHB4vcUTOBDEToKGRTMM4IFNbRc2hjI2ZDU3FOvf7GgElISqPG/SbtixO0UUeVAfe3d85JF0WC8eTEipFZk9IEzDzJ0Zv9/0MieNE+HqN7g4J4efEVC8nX/+mDS8iu8dG1LRPL3ll8F/tB+mSIND8kKQN5exYF0CAMZ+jXNCLl8SR1YMxGXKDZ8OBFmfL/j8IZxXfq9b7PNpi2Jz2U+tjVr18qbYmxrYMEfs0lZoGFCnPQ2emisS4d2WWLWriZ29uC0IhHuFcIof+Gatu/1tYilrrmnS/YFU4NLp1RZlfqB0qIao9sHsUegiyXP/DioJo4KXfUqfe5ene0xRTsr4xGyzS4Fi5mnckZ34PX2SaLykevepw1/qO2OS5ODeVi8XcNWnreK/T3++pJJ5gL2MIgfH1rQ88/SNPFEFg9dIo9e7rSp 557XAgDV dn3WdEDjWoIDtFfv5ZUO1Z0iwI9xdd0fZWOivWSHPUJ561MwAHE5v31IGSS5E8gVRVkSA8j9QJRK46Rx8+oFoQf3nZZR3VA+Au/DwXQz12zfHzCLk4MnR5AD6eGKr5QBTZaObAt3iXxMuTtTKRQlkW+y/AIEdLpc9AgLde+3OwNg67KRDCTPLNnuEhg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Dec 15, 2023 at 15:23 PM Yu Zhao wrote: >Regarding the change itself, it'd cause a slight regression to other >use cases (details below). > > > @@ -3355,6 +3359,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, > > unsigned long pfn; > > struct folio *folio; > > pte_t ptent = ptep_get(pte + i); > > + bool is_pte_young; > > > > total++; > > walk->mm_stats[MM_LEAF_TOTAL]++; > > @@ -3363,16 +3368,20 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, > > if (pfn == -1) > > continue; > > > > - if (!pte_young(ptent)) { > > - walk->mm_stats[MM_LEAF_OLD]++; > > Most overhead from page table scanning normally comes from > get_pfn_folio() because it almost always causes a cache miss. This is > like a pointer dereference, whereas scanning PTEs is like streaming an > array (bad vs good cache performance). > > pte_young() is here to avoid an unnecessary cache miss from > get_pfn_folio(). Also see the first comment in get_pfn_folio(). It > should be easy to verify the regression -- FlameGraph from the > memcached benchmark in the original commit message should do it. > > Would a tracepoint here work for you? > > > > > + is_pte_young = !!pte_young(ptent); > > + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, is_pte_young); > > + if (!folio) { > > + if (!is_pte_young) > > + walk->mm_stats[MM_LEAF_OLD]++; > > continue; > > } > > > > - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); > > - if (!folio) > > + if (!folio_test_clear_young(folio) && !is_pte_young) { > > + walk->mm_stats[MM_LEAF_OLD]++; > > continue; > > + } > > > > - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) > > + if (is_pte_young && !ptep_test_and_clear_young(args->vma, addr, pte + i)) > > VM_WARN_ON_ONCE(true); > > > > young++; Thanks for replying. For avoiding below: 1. confict between page_idle/bitmap and mglru scan 2. performance downgrade in mglru page-table scan if call get_pfn_folio for each pte. We have a new idea: 1. Include a new api under /sys/kernel/mm/page_idle, support mark idle flag only, without rmap walking or clearing pte young. 2. Use mglru proactive scan to clear page idle flag. workflows: t1 t2 mark pages idle mglru scan and check pages idle It's easy for us to know that whether a page is accessed during t1~t2. Some code changes like these: We clear idle flags in get_pfn_folio, and in walk_pte_range we still follow original design. static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, struct pglist_data *pgdat, bool can_swap, bool clear_idle) { struct folio *folio; /* try to avoid unnecessary memory loads */ if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) return NULL; folio = pfn_folio(pfn); + + if (clear_idle && folio_test_idle(folio)) + folio_clear_idle(folio); + if (folio_nid(folio) != pgdat->node_id) return NULL; if (folio_memcg_rcu(folio) != memcg) return NULL; /* file VMAs can contain anon pages from COW */ if (!folio_is_file_lru(folio) && !can_swap) return NULL; return folio; } static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, struct mm_walk *args) { ... for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) { unsigned long pfn; struct folio *folio; pte_t ptent = ptep_get(pte + i); total++; walk->mm_stats[MM_LEAF_TOTAL]++; pfn = get_pte_pfn(ptent, args->vma, addr); if (pfn == -1) continue; if (!pte_young(ptent)) { walk->mm_stats[MM_LEAF_OLD]++; continue; } + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, true); - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; ... } Is it a good idea or not ? -- 2.43.0