Date: Mon, 29 Nov 2021 13:23:19 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Michal Hocko
Cc: Hao Lee, Linux MM, Johannes Weiner, vdavydov.dev@gmail.com,
	Shakeel Butt, cgroups@vger.kernel.org, LKML
Subject: Re: [PATCH] mm: reduce spinlock contention in release_pages()

On Mon, Nov 29, 2021 at 09:39:16AM +0100, Michal Hocko wrote:
> On Fri 26-11-21 16:26:23, Hao Lee wrote:
> [...]
> > I will try Matthew's idea to use a semaphore or mutex to limit the
> > number of BE jobs that are in the exiting path. This sounds like a
> > feasible approach for our scenario...
>
> I am not really sure this is something that would be acceptable.
> Your problem is resource partitioning. Papering over that with a lock
> is not the right way to go. Besides that, you will likely hit a hard
> question of how many tasks to allow to run concurrently. Whatever the
> value, some workload is very likely going to suffer. We cannot expect
> the admin to choose the right value, because there is no clear answer
> for that. Not to mention other potential problems, e.g. even more
> priority inversions.

I don't see how we get priority inversions. These tasks are exiting;
at the point they take the semaphore, they should not be holding any
locks. They're holding a resource (memory) that needs to be released,
but a task wanting to acquire memory must already be prepared to sleep.

I see this as a thundering herd problem: we have dozens, maybe
hundreds, of tasks all trying to free their memory at once. If we
force the herd through a narrow gap, they arrive at the spinlock in
an orderly manner.
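
Concretely, the kind of gate I have in mind looks something like the
sketch below. It is untested and purely illustrative; exit_free_sem,
gated_release_pages() and the limit of four concurrent tasks are names
and numbers I made up, not anything in the tree:

	#include <linux/mm.h>
	#include <linux/semaphore.h>

	/*
	 * Hypothetical gate: let at most four exiting tasks into the
	 * page-freeing path at once.  Four is arbitrary, which is
	 * exactly the "how many?" question raised above.
	 */
	static struct semaphore exit_free_sem =
		__SEMAPHORE_INITIALIZER(exit_free_sem, 4);

	static void gated_release_pages(struct page **pages, int nr)
	{
		/* Exiting tasks hold no locks here, so sleeping is safe. */
		down(&exit_free_sem);
		release_pages(pages, nr);	/* contends on the LRU lock */
		up(&exit_free_sem);
	}

The point is not the exact mechanism; any gate that bounds the number
of tasks simultaneously hammering the lock turns the herd into a queue.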