From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 23 Aug 2024 12:38:05 -0700 (PDT)
From: "Christoph Lameter (Ampere)"
To: Will Deacon
cc: Catalin Marinas, Peter Zijlstra, Ingo Molnar, Waiman Long,
    Boqun Feng, Linus Torvalds, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-arch@vger.kernel.org
Subject: Re: [PATCH v2] Avoid memory barrier in read_seqcount() through load acquire
In-Reply-To: <20240823103205.GA31866@willie-the-truck>
Message-ID: <8dcd8772-2c0c-20af-86c4-18f32c07d1e9@gentwo.org>
References: <20240819-seq_optimize-v2-1-9d0da82b022f@gentwo.org> <20240823103205.GA31866@willie-the-truck>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Fri, 23 Aug 2024, Will Deacon wrote:

> > +#ifdef CONFIG_ARCH_HAS_ACQUIRE_RELEASE
> > +#define raw_read_seqcount_begin(s)					\
> > +({									\
> > +	unsigned _seq;							\
> > +									\
> > +	while ((_seq = seqprop_sequence_acquire(s)) & 1)		\
> > +		cpu_relax();						\
>
> It would also be interesting to see whether smp_cond_load_acquire()
> performs any better that this loop in the !RT case.

The hack to do this follows. Kernel boots but no change in cycles. Also
builds a kernel just fine.

Another benchmark may be better. All my synthetic tests do is run the
function calls in a loop in parallel on multiple cpus.

The main effect here may be the reduction of power since the busyloop is
no longer required. I would favor a solution like this. But the patch is
not clean given the need to get rid of the const attribute with a cast.


Index: linux/include/linux/seqlock.h
===================================================================
--- linux.orig/include/linux/seqlock.h
+++ linux/include/linux/seqlock.h
@@ -325,9 +325,9 @@ SEQCOUNT_LOCKNAME(mutex,        struct m
 #define raw_read_seqcount_begin(s)					\
 ({									\
 	unsigned _seq;							\
+	seqcount_t *e = seqprop_ptr((struct seqcount_spinlock *)s);	\
 									\
-	while ((_seq = seqprop_sequence_acquire(s)) & 1)		\
-		cpu_relax();						\
+	_seq = smp_cond_load_acquire(&e->sequence, ((e->sequence & 1) == 0));	\
 									\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
 	_seq;								\
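

To make the difference between the two reader variants easier to see
outside the kernel tree, here is a rough userspace sketch using C11
atomics. Everything prefixed demo_ is hypothetical and exists only for
illustration; none of it is kernel API. The macro imitates the
smp_cond_load_acquire() convention of writing the condition in terms of
VAL, the value just loaded from the pointer, whereas the hack above
re-reads e->sequence inside the condition. The sketch only spins, so it
says nothing about the power behaviour of an arch-specific wait, and it
needs GCC or Clang for the statement expression.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's seqcount_t. */
struct demo_seqcount {
	atomic_uint sequence;
};

/*
 * Imitation of the smp_cond_load_acquire() calling convention: cond_expr
 * is evaluated against VAL, the value most recently loaded from *ptr.
 * This version just re-issues an acquire load until the condition holds.
 */
#define demo_cond_load_acquire(ptr, cond_expr)				\
({									\
	unsigned int VAL;						\
	for (;;) {							\
		VAL = atomic_load_explicit((ptr), memory_order_acquire);	\
		if (cond_expr)						\
			break;						\
	}								\
	VAL;								\
})

/* Variant 1: open-coded loop, shaped like the v2 patch under review. */
static unsigned int demo_read_begin_loop(struct demo_seqcount *s)
{
	unsigned int seq;

	while ((seq = atomic_load_explicit(&s->sequence,
					   memory_order_acquire)) & 1)
		;	/* the kernel calls cpu_relax() here */
	return seq;
}

/* Variant 2: same wait expressed through the cond-load helper and VAL. */
static unsigned int demo_read_begin_cond(struct demo_seqcount *s)
{
	return demo_cond_load_acquire(&s->sequence, (VAL & 1) == 0);
}

/* Simplified single-writer side: odd while an update is in progress. */
static void demo_write_begin(struct demo_seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void demo_write_end(struct demo_seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}
```

Both readers return the same even sequence value once no write is in
progress; the only difference in this sketch is how the wait is spelled.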