From: "Huang, Ying" <ying.huang@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli, Andrew Morton, "Kirill A. Shutemov", Nadav Amit, Hugh Dickins, David Hildenbrand, Vlastimil Babka
Subject: Re: [PATCH RFC 0/4] mm: Remember young bit for migration entries
References: <20220729014041.21292-1-peterx@redhat.com>
Date: Mon, 01 Aug 2022 11:20:32 +0800
In-Reply-To: <20220729014041.21292-1-peterx@redhat.com> (Peter Xu's message of "Thu, 28 Jul 2022 21:40:37 -0400")
Message-ID: <87y1w8k5tr.fsf@yhuang6-desk2.ccr.corp.intel.com>
Peter Xu writes:

> [Marking as RFC; only x86 is supported for now; plan to add a few more
> archs when there's a formal version]
>
> Problem
> =======
>
> When migrating a page, right now we always mark the migrated page as
> old. The reason could be that we don't really know whether the page is
> hot or cold, so we default to the negative, assuming that's safer.
>
> However, that can lead to at least two problems:
>
> (1) We lose the real hot/cold information that we could have preserved.
>     That information shouldn't change even if the backing page is
>     changed after the migration.
>
> (2) There is always extra overhead on the immediate next access to any
>     migrated page, because the hardware MMU needs cycles to set the
>     young bit again (as long as the MMU supports it).
>
> Many of the recent upstream works showed that (2) is not trivial and is
> actually very measurable.
> In my test case, reading a 1G chunk of memory - jumping in page-size
> intervals - could take 99ms just because of the extra setting of the
> young bit on a generic x86_64 system, compared to 4ms if the young bit
> is already set.

LKP has observed this before too, as in the following report and
discussion:

https://lore.kernel.org/all/87bn35zcko.fsf@yhuang-dev.intel.com/t/

Best Regards,
Huang, Ying

> This issue was originally reported by Andrea Arcangeli.
>
> Solution
> ========
>
> To solve this problem, this patchset tries to remember the young bit in
> the migration entries and carry it over when recovering the ptes.
>
> We have the chance to do so because on many systems the swap offset is
> not really fully used. Migration entries use the swp offset to store
> the PFN only, and the PFN is normally smaller than the swp offset can
> hold. It means we do have some free bits in the swp offset that we can
> use to store things like the young bit, and that's how this series
> approaches the problem.
>
> One tricky thing here is that even though we're embedding the
> information into the swap entry, which seems to be a very generic data
> structure, the number of free bits is still arch dependent. Not only
> because the size of swp_entry_t differs, but also due to the different
> layouts of swap ptes on different archs.
>
> Here, this series requires each arch to define an extra macro called
> __ARCH_SWP_OFFSET_BITS that represents the size of the swp offset.
> With this information, the swap logic knows whether there are extra
> bits to use, and if so it will remember the young bit when possible.
> By default, it keeps the old behavior of leaving all migrated pages
> cold.
>
> Tests
> =====
>
> With the patchset applied, the immediate read access test [1] of the
> above 1G chunk after migration shrinks from 99ms to 4ms. The test is
> done by moving 1G of pages from node 0->1->0, then reading it in
> page-size jumps.
>
> Currently __ARCH_SWP_OFFSET_BITS is only defined on x86 for this
> series, and it is only tested on x86_64 with an Intel(R) Xeon(R) CPU
> E5-2630 v4 @ 2.20GHz.
>
> Patch Layout
> ============
>
> Patch 1: Add swp_offset_pfn() and apply it to all pfn swap entries; we
>          should also stop treating swp_offset() as the PFN, because it
>          can contain more information starting from the next patch.
> Patch 2: The core patch to remember the young bit in swap offsets.
> Patch 3: A cleanup for the x86 32-bit pgtable.h.
> Patch 4: Define __ARCH_SWP_OFFSET_BITS on x86, enabling the young bit
>          for migration.
>
> Please review, thanks.
>
> [1] https://github.com/xzpeter/clibs/blob/master/misc/swap-young.c
>
> Peter Xu (4):
>   mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
>   mm: Remember young bit for page migrations
>   mm/x86: Use SWP_TYPE_BITS in 3-level swap macros
>   mm/x86: Define __ARCH_SWP_OFFSET_BITS
>
>  arch/arm64/mm/hugetlbpage.c           |  2 +-
>  arch/x86/include/asm/pgtable-2level.h |  6 ++
>  arch/x86/include/asm/pgtable-3level.h | 15 +++--
>  arch/x86/include/asm/pgtable_64.h     |  5 ++
>  include/linux/swapops.h               | 85 +++++++++++++++++++++++++--
>  mm/hmm.c                              |  2 +-
>  mm/huge_memory.c                      | 10 +++-
>  mm/memory-failure.c                   |  2 +-
>  mm/migrate.c                          |  4 +-
>  mm/migrate_device.c                   |  2 +
>  mm/page_vma_mapped.c                  |  6 +-
>  mm/rmap.c                             |  3 +-
>  12 files changed, 122 insertions(+), 20 deletions(-)