Date: Tue, 2 Aug 2022 18:15:20 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov", Nadav Amit,
 Hugh Dickins, Vlastimil Babka
Subject: Re: [PATCH RFC 0/4] mm: Remember young bit for migration entries
References: <20220729014041.21292-1-peterx@redhat.com>
 <49434bea-3862-1052-2993-8ccad985708b@redhat.com>
 <24ffea6e-ca66-2b94-c682-48a42a655fd1@redhat.com>
 <4f876ff0-c6d2-2ebb-5917-dc1ff98fa8b0@redhat.com>
In-Reply-To: <4f876ff0-c6d2-2ebb-5917-dc1ff98fa8b0@redhat.com>

On Tue, Aug 02, 2022 at 10:59:42PM +0200, David Hildenbrand wrote:
> On 02.08.22 22:35, Peter Xu wrote:
> > On Tue, Aug 02, 2022 at 10:23:49PM +0200, David Hildenbrand wrote:
> >>> I don't think we only care about x86_64?  Should other archs have the same
> >>> issue as long as there's the hardware young bit?
> >>>
> >>> Even without it, it'll affect page reclaim logic too, and that's also not
> >>> x86 only.
> >>
> >> Okay, reading the cover letter and looking at the code my understanding
> >> was that x86-64 is the real focus.
> >>
> >>>>
> >>>>>
> >>>>> Besides I actually have a question on the anon exclusive bit in the swap
> >>>>> pte: since we have that anyway, why we need a specific migration type for
> >>>>> anon exclusive pages?  Can it be simply read migration entries with anon
> >>>>> exclusive bit set?
> >>>>
> >>>> Not before all arch support pte_swp_mkexclusive/pte_swp_exclusive/.
> >>>>
> >>>> As pte_swp_mkexclusive/pte_swp_exclusive/ only applies to actual swap
> >>>> PTEs, you could even reuse that bit for migration entries and get at
> >>>> alteast the most relevant 64bit architectures supported easily.
> >>>
> >>> Yes, but I think having two mechanisms for the single problem can confuse
> >>> people.
> >>>
> >>
> >> It would be one bit with two different meanings depending on the swp type.
> >>
> >>> IIUC the swap bit is already defined in major archs anyway, and since anon
> >>> exclusive bit is best-effort (or am I wrong?..), I won't worry too much on
> >>
> >> It kind-of is best effort, but the goal is to have all archs support it.
> >>
> >> ... just like the young bit here?
> >
> > Exactly, so I'm also wondering whether we can move the swp pte anon
> > exclusive bit into swp entry.  It just sounds weird to have them defined in
> > two ways.
>
> I'd argue it's just the swp vs. nonswp difference that are in fact two
> different concepts (device+offset vs. type+pte). And some dirty details
> how swp entries are actually used.
>
> With swp entries you have to be very careful, for example, take a look
> at radix_to_swp_entry() and swp_to_radix_entry(). That made me refrain
> from touching anything inside actual swp entries and instead store it in
> the pte.

I don't really see any risk - it neither touches the swp entry nor does
anything complicated around it (shift by 1 and set bit 0 to 1).  Please feel
free to share your concern in case I overlooked something, though.

>
> >
> >>
> >>> archs outside x86/arm/ppc/s390 on having anon exclusive bit lost during
> >>> migrations, because afaict the whole swap type of ANON_EXCLUSIVE_READ is
> >>> only servicing that very minority.. which seems to be a pity to waste the
> >>
> >> I have a big item on my todo list to support all, but I have different
> >> priorities right now.
> >>
> >> If there is no free bit, simply steal one from the offset ... which is
> >> the same thing your approach would do, just in a different way, no?
> >>
> >>> swp type on all archs even if the archs defined swp pte bits just for anon
> >>> exclusive.
> >>
> >> Why do we care? We walk about one type not one bit.
> >
> > The swap type address space is still limited, I'd say we should save when
> > possible.  I believe people caring about swapping care about the limit of
> > swap devices too.  If the offset can keep it, I think it's better than the
>
> Ehm, last time I did the math I came to the conclusion that nobody
> cares. Let me redo the math:
>
> MAX_SWAPFILES = 1<<5 - 1 - 1 - 4 - 3 - 1 = 22
>
> Which is the worst case right now with all kinds of oddity compiled in
> (sorry CONFIG_DEVICE_PRIVATE).
>
> So far nobody complaint.
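
For reference, the arithmetic quoted above can be written out as a small
userspace C model.  This is only a sketch: the macro names follow
include/linux/swap.h, but which "- N" term belongs to which reserved swap
type is an assumption (the thread only gives the bare numbers), and
max_swapfiles_if_freed() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Userspace model of the quoted math, not kernel code.  The mapping of
 * each subtracted term to a reserved swap type is an assumption.
 */
#define MAX_SWAPFILES_SHIFT	5	/* 1 << 5 = 32 swap types in total */
#define SWP_PTE_MARKER_NUM	1	/* assumed: pte marker entries */
#define SWP_SWAPIN_ERROR_NUM	1	/* assumed: swapin error entries */
#define SWP_DEVICE_NUM		4	/* assumed: CONFIG_DEVICE_PRIVATE types */
#define SWP_MIGRATION_NUM	3	/* assumed: read, write, read-exclusive */
#define SWP_HWPOISON_NUM	1	/* assumed: hwpoison entries */

/*
 * The parentheses matter: in C, "1<<5 - 1" would parse as "1 << (5 - 1)",
 * so the shift has to be wrapped before the subtractions.
 */
#define MAX_SWAPFILES \
	((1 << MAX_SWAPFILES_SHIFT) - SWP_PTE_MARKER_NUM - \
	 SWP_SWAPIN_ERROR_NUM - SWP_DEVICE_NUM - \
	 SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)

/* Hypothetical helper: swap devices available if some reserved types were freed. */
static inline int max_swapfiles_if_freed(int freed_types)
{
	return MAX_SWAPFILES + freed_types;
}
```

With these assumed values MAX_SWAPFILES evaluates to 22, and handing one
reserved type back, i.e. max_swapfiles_if_freed(1), gives 23.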

Yeah.  To me, using one bit of it is fine, especially if that's the best
thing to do.  What confuses me here is that we have two ways to represent
"whether this page is anon exclusive" in a swap pte: either through the swap
type or through the bit.  The tricky part is that whether the swap pte bit is
meaningful itself depends on the swp type, while the swap type can itself be
"anon exclusive read migration".

>
> > swap type.  De-dup either the type or the swap pte bit would be nicer, imho.
> >
>
> If you manage bits in the pte manually, you might be able to get a
> better packing density, if bits are scattered around. Just take a look
> at the x86_64 location of _PAGE_SWP_EXCLUSIVE.
>
> What I'm rooting for is something like
>
> #define pte_nonswp_mkyoung pte_swp_mkexclusive
>
> Eventually with some VM_BUG_ONs to make sure people call it on the right
> swp ptes.
>
> If we ever want to get rid of SWP_MIGRATION_READ_EXCLUSIVE (so people
> can have 23 swap devices), and eventually have separate bits for both.
> For now it's not necessary.

Okay, but that's probably the last thing I'd like to try - I don't want to
further complicate the anon exclusive bit in swap pte..  I think cleaning
that up somehow would be nice, but as you mentioned, if no one complains I
have no strong opinion either.

Thanks,

-- 
Peter Xu