Date: Mon, 1 Aug 2022 18:35:18 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov", Nadav Amit, Hugh Dickins, Vlastimil Babka
Subject: Re: [PATCH RFC 0/4] mm: Remember young bit for migration entries
References: <20220729014041.21292-1-peterx@redhat.com>

On Mon, Aug 01, 2022 at 10:21:32AM +0200, David Hildenbrand wrote:
> On 29.07.22 03:40, Peter Xu wrote:
> > [Marking as RFC; only x86 is supported for now, plan to add a few more
> > archs when there's a formal version]
> >
> > Problem
> > =======
> >
> > When migrate a page, right now we always mark the migrated page as old.
> > The reason could be that we don't really know whether the page is hot or
> > cold, so we could have taken it a default negative assuming that's safer.
> >
> > However that could lead to at least two problems:
> >
> >   (1) We lost the real hot/cold information while we could have persisted.
> >       That information shouldn't change even if the backing page is changed
> >       after the migration,
> >
> >   (2) There can be always extra overhead on the immediate next access to
> >       any migrated page, because hardware MMU needs cycles to set the young
> >       bit again (as long as the MMU supports).
> >
> > Many of the recent upstream works showed that (2) is not something trivial
> > and actually very measurable.  In my test case, reading 1G chunk of memory
> > - jumping in page size intervals - could take 99ms just because of the
> > extra setting on the young bit on a generic x86_64 system, comparing to 4ms
> > if young set.
> >
> > This issue is originally reported by Andrea Arcangeli.
> >
> > Solution
> > ========
> >
> > To solve this problem, this patchset tries to remember the young bit in the
> > migration entries and carry it over when recovering the ptes.
> >
> > We have the chance to do so because in many systems the swap offset is not
> > really fully used.  Migration entries use swp offset to store PFN only,
> > while the PFN is normally not as large as swp offset and normally smaller.
> > It means we do have some free bits in swp offset that we can use to store
> > things like young, and that's how this series tried to approach this
> > problem.
> >
> > One tricky thing here is even though we're embedding the information into
> > swap entry which seems to be a very generic data structure, the number of
> > bits that are free is still arch dependent.  Not only because the size of
> > swp_entry_t differs, but also due to the different layouts of swap ptes on
> > different archs.
> >
> > Here, this series requires specific arch to define an extra macro called
> > __ARCH_SWP_OFFSET_BITS represents the size of swp offset.  With this
> > information, the swap logic can know whether there's extra bits to use,
> > then it'll remember the young bits when possible.  By default, it'll keep
> > the old behavior of keeping all migrated pages cold.
> >
>
>
> I played with a similar idea when working on pte_swp_exclusive() but
> gave up, because it ended up looking too hacky. Looking at patch #2, I
> get the same feeling again. Kind of hacky.

Could you explain which part you find "hacky"?

I used the swap entry to avoid per-arch operations.  I couldn't figure out
a common way to determine the swp offset length, so unfortunately this RFC
still needs one macro per arch.  Ying's suggestion looks like a good fit
for removing that last arch-specific dependency.

>
>
> If we mostly only care about x86_64, and it's a performance improvement
> after all, why not simply do it like
> pte_swp_mkexclusive/pte_swp_exclusive/ ... and reuse a spare PTE bit?

Page migration works on most archs, and I want this to work for every arch
that can easily benefit from it.

Besides, I actually have a question about the anon exclusive bit in the
swap pte: since we have that anyway, why do we need a specific migration
entry type for anon exclusive pages?  Couldn't they simply be read
migration entries with the anon exclusive bit set?

Thanks,

-- 
Peter Xu
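
As a rough illustration of the scheme described in the quoted cover letter,
the sketch below packs a young flag into an offset bit above the PFN and
falls back to the old behavior (always cold) when no spare bit exists.  The
names and bit widths used here (SWP_OFFSET_BITS, PFN_BITS, SWP_MIG_YOUNG and
the helper functions) are hypothetical and chosen only for illustration;
they are not the macros or helpers from the series, and the real widths are
arch dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical widths for illustration only; not taken from any real arch. */
#define SWP_OFFSET_BITS 50u	/* width of the swap offset field */
#define PFN_BITS        40u	/* bits actually needed to hold a PFN */

/* The young flag lives in the first offset bit above the PFN. */
#define SWP_MIG_YOUNG   (UINT64_C(1) << PFN_BITS)
#define SWP_PFN_MASK    (SWP_MIG_YOUNG - 1)

/* True when the offset field has room beyond the PFN. */
static bool migration_entry_can_store_young(void)
{
	return SWP_OFFSET_BITS > PFN_BITS;
}

/* Build the offset of a migration entry, remembering young if possible. */
static uint64_t make_migration_offset(uint64_t pfn, bool young)
{
	assert((pfn & ~SWP_PFN_MASK) == 0);	/* PFN must fit in PFN_BITS */
	if (young && migration_entry_can_store_young())
		return pfn | SWP_MIG_YOUNG;
	return pfn;				/* old behavior: page stays cold */
}

/* Recover the PFN, ignoring any flag bits stored above it. */
static uint64_t migration_offset_to_pfn(uint64_t offset)
{
	return offset & SWP_PFN_MASK;
}

/* Report whether the entry carried the young flag. */
static bool migration_offset_young(uint64_t offset)
{
	return migration_entry_can_store_young() && (offset & SWP_MIG_YOUNG);
}
```

When the pte is restored after migration, the young flag read back from the
offset would decide whether the new pte is marked young, which is the
"carry it over" step from the cover letter.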