From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5D326C4332F
	for <linux-mm@archiver.kernel.org>; Thu, 15 Dec 2022 21:39:01 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id B08148E0003; Thu, 15 Dec 2022 16:39:00 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id A91A08E0002; Thu, 15 Dec 2022 16:39:00 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 90B188E0003; Thu, 15 Dec 2022 16:39:00 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16])
	by kanga.kvack.org (Postfix) with ESMTP id 7A8DE8E0002
	for <linux-mm@kvack.org>; Thu, 15 Dec 2022 16:39:00 -0500 (EST)
Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay08.hostedemail.com (Postfix) with ESMTP id 44C36140766
	for <linux-mm@kvack.org>; Thu, 15 Dec 2022 21:39:00 +0000 (UTC)
X-FDA: 80245856040.22.65EDB73
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by imf25.hostedemail.com (Postfix) with ESMTP id 6633DA0018
	for <linux-mm@kvack.org>; Thu, 15 Dec 2022 21:38:58 +0000 (UTC)
Authentication-Results: imf25.hostedemail.com;
	dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=X3wawY+W;
	spf=pass (imf25.hostedemail.com: domain of npache@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=npache@redhat.com;
	dmarc=pass (policy=none) header.from=redhat.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1671140338;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=9NacCVxVJ3GJc/ioljESemL+AiTK9QAc3Zi0dYSUUF0=;
	b=IQb3RjVkdq6GxQSAhuIVVSGmtjfffGvRNCcvqsBMJCBTV0mCyVpQIN9l0uPOBCZQAHcwLT
	mWuFA8nwjh3JZXxM7w69qVZNi3UWM7ziKIjqxH65aI4COBXFXzLxFu5bT9MT70YnQ0xokg
	CBiF2RsLFn0+WQn/FqlO+DqJyEiblUI=
ARC-Authentication-Results: i=1;
	imf25.hostedemail.com;
	dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=X3wawY+W;
	spf=pass (imf25.hostedemail.com: domain of npache@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=npache@redhat.com;
	dmarc=pass (policy=none) header.from=redhat.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1671140338; a=rsa-sha256;
	cv=none;
	b=tdT+uYBBcNmz5hMGpbwirdGGuGNLQ2DloACSLw3jGfra0jeW8eFJpTNqvm+clbqk87X2CW
	aTPp4qCxhUhW5XjKdXuKadTiJFTL/fSDJDK5uGspAldBjf1v+PlqoEtMANciWLLyPYBHBM
	jg+t4ojYaTxX4eSNIPLqOfulNliblcc=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1671140337;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=9NacCVxVJ3GJc/ioljESemL+AiTK9QAc3Zi0dYSUUF0=;
	b=X3wawY+WAHrQBo7MwId47BY3uBvDxT8GS2tVHilxCJmqFKFJe1za2nRJ0pl01yi4uUyCrj
	Lohxcsz53es+UA4sI3CIGn3bPSuCVbkDlaKYKpuvTG6sRwTNRTPUoU6HdjpCzqEzSUYpmJ
	cI2MZG0nZiQAdtYkpS3AR0CnxXQFAcI=
Received: from mail-yb1-f199.google.com (mail-yb1-f199.google.com
 [209.85.219.199]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-491-y1FL-qIrPDOtR7DvJl8wDg-1; Thu, 15 Dec 2022 16:38:55 -0500
X-MC-Unique: y1FL-qIrPDOtR7DvJl8wDg-1
Received: by mail-yb1-f199.google.com with SMTP id v195-20020a252fcc000000b007125383fe0dso251057ybv.23
        for <linux-mm@kvack.org>; Thu, 15 Dec 2022 13:38:55 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=9NacCVxVJ3GJc/ioljESemL+AiTK9QAc3Zi0dYSUUF0=;
        b=uXbvU9pnE7aiMS8x05ZkRw3UcvuRN+oo6Aluy24rU1/s7bDPhpGECWG+liees/Rc8L
         Kma6O+PEOFOjFrGxQ+h1a1ieJ5kKZiqVx3r24eHmI92XXiovqye2snVZZzpuJeBDruhG
         NzdWP4vU7XyZojdYymcpzihpVPzqtJ7AnTTV7PaUaODMEMjdXS4y84oIS8cAc+X3AXq1
         xJL8mu/V3FM17uKgyJUoLBv/zhWygzW0AWY+xEHwjJQONGFhZO0GHN17nJ9XS68A7pCl
         X42tt7ZUTLSXmPw6eOuMFFIfdaBG4ZVJ47PyPexXVWyOXN/tYCuiE5t33TFxkxJvH/J7
         D0Pw==
X-Gm-Message-State: ANoB5plofunBptVT/ow5PD6N7S5JrAI148TRsYDWgKy+uFvmPMkZbcr6
	UMwFpoW6vKa0CB1Z2tfYQaxBmSOi3uPLJE5uEfszfN6Ep+5yPt1hxmvMajnqI53Yw8maedrWSgZ
	ulJ6+C8xh13qcwmLiPa//5D6kJvA=
X-Received: by 2002:a0d:f287:0:b0:3c5:af7:5646 with SMTP id b129-20020a0df287000000b003c50af75646mr2091760ywf.320.1671140335065;
        Thu, 15 Dec 2022 13:38:55 -0800 (PST)
X-Google-Smtp-Source: AA0mqf65wJjV8B6XKPgEspTDQNHLYOYYw21NzWvJ6gEgmFI94zhjEAJTBrboFIDQ1LE+J1gXFbxgmBP0BbxnMKK+oeM=
X-Received: by 2002:a0d:f287:0:b0:3c5:af7:5646 with SMTP id
 b129-20020a0df287000000b003c50af75646mr2091750ywf.320.1671140334741; Thu, 15
 Dec 2022 13:38:54 -0800 (PST)
MIME-Version: 1.0
References: <20221213234505.173468-1-npache@redhat.com> <Y5oCD0gFV+Cq1JqJ@casper.infradead.org>
 <CAA1CXcA2dGeG2tzc+-OZ77eMVpnSN2SKkdtz9LqpLPywhJMOwA@mail.gmail.com>
In-Reply-To: <CAA1CXcA2dGeG2tzc+-OZ77eMVpnSN2SKkdtz9LqpLPywhJMOwA@mail.gmail.com>
From: Nico Pache <npache@redhat.com>
Date: Thu, 15 Dec 2022 14:38:28 -0700
Message-ID: <CAA1CXcCboCaVNgLv56Pc9ju95Yc9cK4XHCQObGPA_fbVZGxtTg@mail.gmail.com>
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order
To: Matthew Wilcox <willy@infradead.org>, Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, muchun.song@linux.dev, 
	mike.kravetz@oracle.com, akpm@linux-foundation.org, 
	gerald.schaefer@linux.ibm.com, Waiman Long <llong@redhat.com>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
X-Rspam-User: 
X-Rspamd-Server: rspam04
X-Rspamd-Queue-Id: 6633DA0018
X-Stat-Signature: ca1uczxr85q9s5qkzukj1y145y4sc51g
X-HE-Tag: 1671140338-440913
X-HE-Meta: U2FsdGVkX1+3nVW2dQueOwnUDS2cXU6lhLfEqiTUm6McKJ6eOLZ3zT+dvjFqLixaLqGaqzGlQgpxxmaDouHN9Bm64XMKx0Tb5n/f/0rW2GPOXZkCTDOQ/93EfaSyNHggnd61olm8sOjdlgs9nCxajog7WqjMhTR2uqf8VT3aChzdelqKYVWICC1lyQht0ggCA27zJIDDwUgbHQB+6Tm2AH6Rn8F3qHNTRadJyhSKxJ1Fuz940t9GCuNUBFyyJUueKHK/QaEHQYMXGiaSF/9W0CwXIPQsk6LAisdOfUuL1nbaE/QcDMSpActq/0pvgPsAlS+hfSMxkjH2EpRquxP/aFNHVptnDxvALadoorEiQqeOe+zzMvzh/gZnxWMGZCkiTr3v5bjTjmxeuW31H7rKVY2sPVev4zbtrkU6c5CZl2BINfvny2VV6uvlzDFdYXbfEFVeIVeH6VJxKcSBZFi1S6EFlyIft614GPm7QyZruCHuEPacjWYlwiPiMhEXFssgfvX7NxO5r8RCLoWvn89Pje2qAfUBZTuSvhZ7/0nUE044iVXB0VqygqLY0zuATbAOX/omOhWfZcNR5QLmy7oXrtlP65tcNcJGRzQaWG02IcJNWsQY/SGjCyYBP+9mZGiE7MBRhxNuXwtaYaNpo7fqB7G7K/z/KmV1ayhq3ueuRfDOOpxW6q0e3h6loEDLojklUbhe1azaDjqdW5nwxgEmZ7F53C0CXRu3gpuQ7GKQKAB7OwyyJDrfxtRBFSK3BOZpPl0dQvRe7vmJnVtmutqLdhWBKhHaqCeRfzTsDCPxULgQSwb6Z+KcKmmxOgnauCgkmD6AEDpgqi9IyD6OuQIpAprrOgFJyl9gGTwkRbfht8duwN3tl4XNr9R36CE12acseF0JAhHklxrxW7C2xEt4RjP05NhAc9N23jc4dwYUpB6L+AWL9bOTL8ppC0ortoKlHzoUKWxA9idWIS7u1vj
 XUlnhWqm
 ZKT78pXkxWPu9/+mONuFANiyILw0cZzJ0UvphfhgdDatllnG71D22IADFtEllbg+qKdFqBv1LOa1o63I=
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed, Dec 14, 2022 at 7:48 PM Nico Pache <npache@redhat.com> wrote:
>
> On Wed, Dec 14, 2022 at 10:04 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Dec 13, 2022 at 04:45:05PM -0700, Nico Pache wrote:
> > > Since commit 1378a5ee451a ("mm: store compound_nr as well as
> > > compound_order") the page[1].compound_nr must be explicitly set to 0 if
> > > calling set_compound_order(page, 0).
> > >
> > > This can lead to bugs if the caller of set_compound_order(page, 0) forgets
> > > to explicitly set compound_nr=0. An example of this is commit ba9c1201beaa
> > > ("mm/hugetlb: clear compound_nr before freeing gigantic pages")
> > >
> > > Collapse these calls into the set_compound_order by utilizing branchless
> > > bitmaths [1].
> > >
> > > [1] https://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
> > >
> > > V2: slight changes to commit log and remove extra '//' in the comments
> >
> > We don't usually use // comments anywhere in the kernel other than
> > the SPDX header.
>
> Whoops!
>
> > >  static inline void set_compound_order(struct page *page, unsigned int order)
> > >  {
> > > +     unsigned long shift = (1U << order);
> >
> > Shift is a funny name for this variable.  order is the shift.  this is 'nr'.
>
> Good point! Waiman found an even better/cleaner solution that would
> avoid needing an extra variable.
>     page[1].compound_nr = (1U << order) & ~1U;
>
> > >       page[1].compound_order = order;
> > >  #ifdef CONFIG_64BIT
> > > -     page[1].compound_nr = 1U << order;
> > > +     // Branchless conditional:
> > > +     // order  > 0 --> compound_nr = shift
> > > +     // order == 0 --> compound_nr = 0
> > > +     page[1].compound_nr = shift ^ (-order  ^ shift) & shift;
> >
> > Can the compiler see through this?  Before, the compiler sees:
> >
> >         page[1].compound_order = 0;
> >         page[1].compound_nr = 1U << 0;
> > ...
> >         page[1].compound_nr = 0;
> >
> > and it can eliminate the first store.
>
> This may be the case at the moment, but with:
> https://lore.kernel.org/linux-mm/20221213212053.106058-1-sidhartha.kumar@oracle.com/
> we will have a branch instead. Sidhartha tested it and found no
> regression; the concern is that if THPs get implemented using this
> callpath then we may end up seeing a slowdown.
>
> After doing my analysis below I dont think this is the case for the
> destroy case(at least on x86).
> In the destroy case for both the branch and branchless approach we see
> the compiler optimizing away the bitmath and the branch and setting
> the variable to zero.
> In the prep case we see the introduction of a test and cmovne
> instruction, implying a branch.
>
> > Now the compiler sees:
> >         unsigned long shift = (1U << 0);
> >         page[1].compound_order = order;
> >         page[1].compound_nr = shift ^ (0  ^ shift) & shift;
> >
> > Does it do the maths at compile-time, knowing that order is 0 at this
> > callsite and deducing that it can just store a 0?
> >
> > I think it might, since shift is constant-1,
> >
> >         page[1].compound_nr = 1 ^ (0 ^ 1) & 1;
> > ->      page[1].compound_nr = 1 ^ 1 & 1;
> > ->      page[1].compound_nr = 0 & 1;
> > ->      page[1].compound_nr = 0;
> >
> > But you should run it through the compiler and check the assembly
> > output for __destroy_compound_gigantic_page().
>
> Yep it does look like it gets optimized away for the destroy case:
>
> Bitmath Case (destroy)
> ---------------------------------
> Dump of assembler code for function __update_and_free_page:
> ...
> mov    %rsi,%rbp //move 2nd arg (page) to rbp
> ...
> movb   $0x0,0x51(%rbp) //page[1].compound_order = 0
> movl   $0x0,0x5c(%rbp)  //page[1].compound_nr = 0
> ...
>
> Math for movl : 0x5c (92) - 64 (sizeof page[0]) = 28
> pahole page: unsigned int compound_nr;        /*    28     4 */
>
> Bitmath Case (prep)
> ---------------------------------
> In the case of prep_compound_gigantic_page the bitmath is being computed
>    0xffffffff8134f17d <+13>:    mov    %rdi,%r12
>    0xffffffff8134f180 <+16>:    push   %rbp
>    0xffffffff8134f181 <+17>:    mov    $0x1,%ebp
>    0xffffffff8134f186 <+22>:    shl    %cl,%ebp
>    0xffffffff8134f188 <+24>:    neg    %ecx
>    0xffffffff8134f18a <+26>:    push   %rbx
>    0xffffffff8134f18b <+27>:    and    %ebp,%ecx
>    0xffffffff8134f18d <+29>:    mov    %sil,0x51(%rdi)
>    0xffffffff8134f191 <+33>:    mov    %ecx,0x5c(%rdi) //set page[1].compound_nr
>
> Now to break down the approach with the branch:
>
> Branch Case (destroy)
> ---------------------------------
>   No branch utilized to determine the following instructions.
>    0xffffffff813507bc <+236>:    movb   $0x0,0x51(%rbp)
>    0xffffffff813507c0 <+240>:    movl   $0x0,0x5c(%rbp)
>
> Branch  Case (prep)
> ---------------------------------
> The branch is being computed with the introduction of a cmovne instruction.
>    0xffffffff8134f15d <+13>:    mov    %rdi,%r12
>    0xffffffff8134f160 <+16>:    push   %rbp
>    0xffffffff8134f161 <+17>:    mov    $0x1,%ebp
>    0xffffffff8134f166 <+22>:    shl    %cl,%ebp
>    0xffffffff8134f168 <+24>:    test   %esi,%esi             //test
>    0xffffffff8134f16a <+26>:    push   %rbx
>    0xffffffff8134f16b <+27>:    cmovne %ebp,%ecx     //branch evaluation
>    0xffffffff8134f16e <+30>:    mov    %sil,0x51(%rdi)
>    0xffffffff8134f172 <+34>:    mov    %ecx,0x5c(%rdi)
>
To expand a little more on the analysis:
I computed the latency/throughput between <+24> and <+27> using
intel's manual (APPENDIX D):

The bitmath solutions shows a total latency of 2.5 with a Throughput of 0.5.
The branch solution show a total latency of 4 and throughput of 1.5.

Given this is not a tight loop, and the next instruction is requiring
the data computed, better (lower) latency is the more ideal situation.

Just wanted to add that little piece :)
 -- Nico

> So it looks like in the destruction of compound pages we'll see no
> gain or loss between the bitmath or branch approach.
> However, in the prep case we may see some performance loss once/if THP
> utilizes this path due to the branch and the loss of CPU
> parallelization that can be achieved utilizing the bitmath approach.
>
> Cheers,
> -- Nico