Date: Tue, 21 Mar 2023 21:05:26 +1100
From: Dave Chinner <david@fromorbit.com>
To: Lorenzo Stoakes
Cc: Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Andrew Morton, Baoquan He,
 Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa
Subject: Re: [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for
 vmap_area_lock and vmap_block->lock
References: <6c7f1ac0aeb55faaa46a09108d3999e4595870d9.1679209395.git.lstoakes@gmail.com>
 <8cd31bcd-dad4-44e3-920f-299a656aea98@lucifer.local>
In-Reply-To: <8cd31bcd-dad4-44e3-920f-299a656aea98@lucifer.local>

On Tue, Mar 21, 2023 at 07:45:56AM +0000, Lorenzo Stoakes wrote:
> On Tue, Mar 21, 2023 at 06:23:39AM +0100, Uladzislau Rezki wrote:
> > On Tue, Mar 21, 2023 at 12:09:12PM +1100, Dave Chinner wrote:
> > > On Sun, Mar 19, 2023 at 07:09:31AM +0000, Lorenzo Stoakes wrote:
> > > > vmalloc() is, by design, not permitted to be used in atomic context and
> > > > already contains components which may sleep, so avoiding spin locks is not
> > > > a problem from the perspective of atomic context.
> > > >
> > > > The global vmap_area_lock is held when the red/black tree rooted in
> > > > vmap_area_root is accessed and thus is rather long-held and under
> > > > potentially high contention. It is likely to be under contention for reads
> > > > rather than writes, so replace it with a rwsem.
> > > >
> > > > Each individual vmap_block->lock is likely to be held for less time but
> > > > under low contention, so a mutex is not an outrageous choice here.
> > > >
> > > > A subset of test_vmalloc.sh performance results:-
> > > >
> > > >   fix_size_alloc_test             0.40%
> > > >   full_fit_alloc_test             2.08%
> > > >   long_busy_list_alloc_test       0.34%
> > > >   random_size_alloc_test         -0.25%
> > > >   random_size_align_alloc_test    0.06%
> > > >   ...
> > > >   all tests cycles                0.2%
> > > >
> > > > This represents a tiny reduction in performance that sits barely above
> > > > noise.
> > >
> > > I'm travelling right now, but give me a few days and I'll test this
> > > against the XFS workloads that hammer the global vmalloc spin lock
> > > really, really badly. XFS can use vm_map_ram and vmalloc really
> > > heavily for metadata buffers and hit the global spin lock from every
> > > CPU in the system at the same time (i.e. highly concurrent
> > > workloads). vmalloc is also heavily used in the hottest path
> > > through the journal where we process and calculate delta changes to
> > > several million items every second, again spread across every CPU in
> > > the system at the same time.
> > >
> > > We really need the global spinlock to go away completely, but in the
> > > mean time a shared read lock should help a little bit....
> >
> Hugely appreciated Dave, however I must disappoint on the rwsem as I have
> now reworked my patch set to use the original locks in order to satisfy
> Willy's desire to make vmalloc atomic in future, and Uladzislau's desire
> to not have a ~6% performance hit -
> https://lore.kernel.org/all/cover.1679354384.git.lstoakes@gmail.com/

Yeah, I'd already read that. What I want to do, though, is to
determine whether the problem is shared access contention or
exclusive access contention. If it's exclusive access contention,
then an rwsem will do nothing to alleviate the problem, and that's
kinda critical to know before any fix for the contention problems
is worked out...
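
To be concrete about the distinction, here's a rough, untested sketch
in mm/vmalloc.c terms (the __vmap_area_*() helpers are stand-ins for
whatever the real rb-tree walk and insert code is called, and
vmap_area_link() is just a made-up name for the linking step): a pure
lookup can take the converted lock shared, but anything that changes
the tree still has to take it exclusive, so the alloc and free paths
keep serialising against each other and against lookups:

        #include <linux/rwsem.h>

        static DECLARE_RWSEM(vmap_area_lock);

        /* Lookup side: read-only rb-tree walk, can run shared. */
        struct vmap_area *find_vmap_area(unsigned long addr)
        {
                struct vmap_area *va;

                down_read(&vmap_area_lock);
                va = __vmap_area_find(addr);    /* stand-in for the tree walk */
                up_read(&vmap_area_lock);

                return va;
        }

        /* Alloc/free side: modifies the rb-tree, still fully exclusive. */
        static void vmap_area_link(struct vmap_area *va)
        {
                down_write(&vmap_area_lock);
                __vmap_area_insert(va);         /* stand-in for the tree insert */
                up_write(&vmap_area_lock);
        }

Which of those two sides we are actually spinning on is what the XFS
workloads should tell us.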

> > I am working on it. I submitted a proposal on how to eliminate it:
> >
> > Hello, LSF.
> >
> > Title: Introduce a per-cpu-vmap-cache to eliminate a vmap lock contention
> >
> > Description:
> > Currently the vmap code does not scale to the number of CPU cores in a
> > system because the global vmap space is protected by a single spinlock.
> > Such an approach has a clear bottleneck if many CPUs simultaneously
> > access one resource.
> >
> > In this talk I would like to describe the drawback, show some data
> > related to contention and the places where it occurs in the code.
> > Apart from that I would like to share ideas on how to eliminate it,
> > presenting a few approaches and comparing them.

If you want data about contention problems with vmalloc, the XFS
workloads I mentioned above will provide plenty of it.

> > Requirements:
> > * It should be a per-cpu approach;

Hmmmm. My 2c worth on this:

That is not a requirement. That's a -solution-. The requirement is
that independent concurrent vmalloc/vfree operations do not severely
contend with each other.

Yes, the solution will probably involve sharding the resource space
across multiple independent structures (as we do in filesystems with
block groups, allocation groups, etc) but that does not necessarily
need the structures to be per-cpu. e.g. per-node vmalloc arenas might
be sufficient and allow more expensive but more efficient indexing
structures to be used because we don't have to care about the
explosion of memory that fine-grained per-cpu indexing generally
entails.

This may also fit into the existing per-node structure of the memory
reclaim infrastructure to manage things like compaction, balancing,
etc. of the vmalloc space assigned to the given node.

Hence I think saying "per-cpu is a requirement" kinda prevents
exploration of other novel solutions that may have advantages other
than "just solves the concurrency problem"...

> > * Search of freed ptrs should not interfere with other freeing (as much as we can);
> > * Offload allocated areas (busy ones) per-cpu;
> > * Cache ready-sized objects or merge them into one big per-cpu space (split on demand);
> > * Lazily-freed areas either drained per-cpu individually or by one CPU for all;
> > * Prefetch a fixed size in front and allocate per-cpu

I'd call these desired traits and/or potential optimisations, not
hard requirements.

> > Goals:
> > * Implement a per-cpu way of allocation to eliminate contention.

The goal should be to "allow contention-free vmalloc operations", not
that we implement a specific solution.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com