From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 3 Jul 2023 18:10:32 +0200
From: Gerald Schaefer
To: Hugh Dickins
Cc: Jason Gunthorpe, Andrew Morton, Vasily Gorbik, Mike Kravetz,
	Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi,
	Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao,
	Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price,
	SeongJae Park, Lorenzo Stoakes, Huang Ying, Naoya Horiguchi,
	Christophe Leroy, Zack Rusin, Axel Rasmussen, Anshuman Khandual,
	Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig,
	Song Liu, Thomas Hellstrom, Russell King, "David S. Miller",
	Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	Jann Horn, Vishal Moola, Vlastimil Babka,
	linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page
Message-ID: <20230703181032.5803c333@thinkpad-T15>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
	<20230628211624.531cdc58@thinkpad-T15>
	<20230629175645.7654d0a8@thinkpad-T15>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 29 Jun 2023 23:00:07 -0700 (PDT)
Hugh Dickins wrote:

> On Thu, 29 Jun 2023, Gerald Schaefer wrote:
> > On Thu, 29 Jun 2023 12:22:24 -0300
> > Jason Gunthorpe wrote:
> > > On Wed, Jun 28, 2023 at 10:08:08PM -0700, Hugh Dickins wrote:
> > > > On Wed, 28 Jun 2023, Gerald Schaefer wrote:
> > > > >
> > > > > As discussed in the other thread, we would rather go with less complexity,
> > > > > possibly switching to an approach w/o the list and fragment re-use in the
> > > > > future. For now, as a first step in that direction, we can try with not
> > > > > adding fragments back only for pte_free_defer(). Here is an adjusted
> > > > > version of your patch, copying most of your pte_free_defer() logic and
> > > > > also description, tested with LTP and all three of your patch series applied:
> > > >
> > > > Thanks, Gerald: I don't mind abandoning my 13/12 SLAB_TYPESAFE_BY_RCU
> > > > patch (posted with fewer Cc's to the s390 list last week), and switching
> > > > to your simpler who-cares-if-we-sometimes-don't-make-maximal-use-of-page
> > > > patch.
> > > >
> > > > But I didn't get deep enough into it today to confirm it - and disappointed
> > > > that you've found it necessary to play with pt_frag_refcount in addition to
> > > > _refcount and HH bits. No real problem with that, but my instinct says it
> > > > should be simpler.
> >
> > Yes, I also found it a bit awkward, but it seemed "good and simple enough",
> > to have something to go forward with, while my instinct was in line with yours.
> >
> > >
> > > Is there any reason it should be any different at all from what PPC is
> > > doing?
> > >
> > > I still think the right thing to do here is make the PPC code common
> > > (with Hugh's proposed RCU modification) and just use it in both
> > > arches....
> >
> > With the current approach, we would not add back fragments _only_ for
> > the new pte_free_defer() path, while keeping our cleverness for the other
> > paths. Not having a good overview of the negative impact wrt potential
> > memory waste, I would rather take small steps, if possible.
> >
> > If we later switch to never adding back fragments, of course we should
> > try to be in line with PPC implementation.
>
> I find myself half-agreeing with everyone.
>
> I agree with Gerald that s390 should keep close to what it is already
> doing (except for adding pte_free_defer()): that changing its strategy
> and implementation to be much more like powerpc, is a job for some other
> occasion (and would depend on gathering data about how well each does).
>
> But I agree with Jason that the powerpc solution we ended up with cut
> out a lot of unnecessary complication: it shifts the RCU delay from
> when pte_free_defer() is called, to when the shared page comes to be
> freed; which may be a lot later, and might not be welcome in a common
> path, but is quite okay for the uncommon pte_free_defer().

Ok, I guess I must admit that I completely ignored the latest progress in
the powerpc thread, and therefore was not up-to-date. Still had the older
approach in mind, where you also checked for pt_frag_refcount to avoid
double call_rcu().

The new approach sounds very reasonable, and I also like your latest s390
patch from a first glance. Need to get more up-to-date with PageActive and
maybe also powerpc approach, and give this some proper review tomorrow.

>
> And I agree with Alexander that pte_free_lower() and pte_free_upper()
> are better names than pte_free_now0() and pte_free_now1(): I was going
> to make that change, except all those functions disappear if we follow
> Jason's advice and switch the call_rcu() to when freeing the page.
>
> (Lower and upper seem unambiguous to me: Gerald, does your confusion
> come just from the way they are shown the wrong way round in the PP AA
> diagram? I corrected that in my patch, but you reverted it in yours.)

Ah yes, that could well be, and unfortunately I did not notice that you
fixed that in the comment. I only saw that you "fixed" the bit numbering
from 01234567 to 76543210, which I think is wrong on big-endian s390, and
therefore I simply removed that complete hunk. But thanks a lot for
pointing to that!
We will certainly want to fix that comment in a later patch, to reduce some or maybe all of the (at least my) upper/lower confusion.
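
For illustration only, the bit-numbering point can be sketched in plain
userspace C. Relabeling a byte's bits from 01234567 to 76543210 switches
between two conventions: MSB-0 numbering, as used in IBM architecture
documentation, where bit 0 is the leftmost, most significant bit, and LSB-0
numbering, where bit 0 is the least significant bit. The helper names below
(msb0_mask, lsb0_mask, flip_bit_number) are hypothetical, and nothing here
is taken from the patch or from kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; this is not kernel code. */

/* MSB-0 numbering: bit 0 is the leftmost, most significant bit of the byte. */
static inline uint8_t msb0_mask(unsigned int bit)
{
	assert(bit < 8);
	return (uint8_t)(0x80u >> bit);
}

/* LSB-0 numbering: bit 0 is the rightmost, least significant bit of the byte. */
static inline uint8_t lsb0_mask(unsigned int bit)
{
	assert(bit < 8);
	return (uint8_t)(1u << bit);
}

/* Translate a bit number from one convention to the other (self-inverse). */
static inline unsigned int flip_bit_number(unsigned int bit)
{
	assert(bit < 8);
	return 7u - bit;
}
```

With these definitions, msb0_mask(0) and lsb0_mask(7) select the same
physical bit, so a diagram labeled 01234567 and one labeled 76543210 can
describe the same byte; the digits only mean something once the convention
is stated next to them.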