Date: Thu, 29 Jun 2023 17:43:55 +0200
From: Gerald Schaefer
To: Alexander Gordeev
Cc: Hugh Dickins, Andrew Morton, Vasily Gorbik, Mike Kravetz,
    Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi,
    Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao,
    Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price,
    SeongJae Park, Lorenzo Stoakes, Huang Ying, Naoya Horiguchi,
    Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen,
    Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
    Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
    "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V",
    Heiko Carstens, Christian Borntraeger, Claudio Imbrenda, Jann Horn,
    Vishal Moola, Vlastimil Babka,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page
Message-ID: <20230629174355.222ebed0@thinkpad-T15>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> <20230628211624.531cdc58@thinkpad-T15>

On Thu, 29 Jun 2023 15:59:07 +0200
Alexander Gordeev wrote:

> On Wed, Jun 28, 2023 at 09:16:24PM +0200, Gerald Schaefer wrote:
> > On Tue, 20 Jun 2023 00:51:19 -0700 (PDT)
> > Hugh Dickins wrote:
>
> Hi Gerald, Hugh!
>
> ...
> > @@ -407,6 +445,88 @@ void __tlb_remove_table(void *_table)
> >  	__free_page(page);
> >  }
> >
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +static void pte_free_now0(struct rcu_head *head);
> > +static void pte_free_now1(struct rcu_head *head);
>
> What about pte_free_lower() / pte_free_upper()?

I actually like the 0/1 better, I always get confused what exactly we
mean with "lower / upper" in our code and comments. Is it the first
or second half? With 0/1 it is immediately clear to me.

>
> ...
> > +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
> > +{
> > +	unsigned int bit, mask;
> > +	struct page *page;
> > +
> > +	page = virt_to_page(pgtable);
> > +	if (mm_alloc_pgste(mm)) {
> > +		/*
> > +		 * TODO: Do we need gmap_unlink(mm, pgtable, addr), like in
> > +		 * page_table_free_rcu()?
> > +		 * If yes -> need addr parameter here, like in pte_free_tlb().
> > +		 */
> > +		call_rcu(&page->rcu_head, pte_free_pgste);
> > +		return;
> > +	}
> > +	bit = ((unsigned long)pgtable & ~PAGE_MASK) / (PTRS_PER_PTE * sizeof(pte_t));
> > +
> > +	spin_lock_bh(&mm->context.lock);
> > +	mask = atomic_xor_bits(&page->_refcount, 0x15U << (bit + 24));
>
> This makes the bit logic increasingly complicated to me.

I think it is well in line with existing code in page_table_free[_rcu].
Only instead of doing xor with 0x11U, it does xor with 0x15U to also
switch on the H bit while at it.

>
> What if instead we set the rule "one bit at a time only"?
> That means an atomic group bit flip is only allowed between
> pairs of bits, namely:
>
> bit flip      initiated from
> -----------   ----------------------------------------
> P <- A        page_table_free(), page_table_free_rcu()
> H <- A        pte_free_defer()
> P <- H        pte_free_half()
>
> In the current model P bit could be on together with H
> bit simultaneously. That actually brings in equation
> nothing.

P bit has to be set at the latest when __tlb_remove_table() gets
called, because there it is checked / cleared.
It might be possible to not set it in pte_free_defer() already, but
only later in pte_free_half() RCU callback, before calling
__tlb_remove_table(). But that would not be in line any more with
existing code, where it is already set before scheduling the RCU
callback.

Therefore, I would rather stick to the current approach, unless
you see some bug in it.

>
> Besides, this check in page_table_alloc() (while still
> correct) makes one (well, me) wonder "what about HH bits?":
>
> 	mask = (mask | (mask >> 4)) & 0x03U;
> 	if (mask != 0x03U) {
> 		...
> 	}

Without adding fragments back to the list, it is not necessary
to check any H bits in page_table_alloc(), or so I hope. Actually,
I like that aspect most, i.e. we have as little impact on current
code as possible.

And H bits are only relevant for preventing double use of rcu_head,
which is what they were designed for, and only the new code has
to care about them.
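
For illustration only, here is a minimal userspace sketch of the bit
flips discussed above. It is not part of the patch. It assumes that,
after shifting _refcount right by 24, bits 0-1 are the A (allocated)
bits, bits 2-3 the H bits and bits 4-5 the P (pending) bits, one per
page table half, as the 0x11U / 0x15U masks suggest. The helper names
(xor_bits, flip_free, flip_defer, page_is_full) are made up for this
sketch, and xor_bits() is a plain, non-atomic stand-in for
atomic_xor_bits().

```c
#include <stdint.h>

/*
 * Assumed layout of the upper byte of page->_refcount (after ">> 24"):
 *   bits 0-1: A (allocated),       one bit per 2K half
 *   bits 2-3: H (rcu_head in use), one bit per 2K half
 *   bits 4-5: P (pending removal), one bit per 2K half
 */

/* Non-atomic stand-in for atomic_xor_bits(): flip bits, return new value. */
static inline uint32_t xor_bits(uint32_t *v, uint32_t bits)
{
	*v ^= bits;
	return *v;
}

/* page_table_free[_rcu]() style flip: toggles A and P of one half. */
static inline uint32_t flip_free(uint32_t *refcount, unsigned int half)
{
	return xor_bits(refcount, 0x11U << (half + 24)) >> 24;
}

/* pte_free_defer() style flip: toggles A, H and P of one half. */
static inline uint32_t flip_defer(uint32_t *refcount, unsigned int half)
{
	return xor_bits(refcount, 0x15U << (half + 24)) >> 24;
}

/* page_table_alloc() style check: a half is busy if its A or P bit is set. */
static inline int page_is_full(uint32_t refcount)
{
	uint32_t mask = refcount >> 24;

	mask = (mask | (mask >> 4)) & 0x03U;	/* H bits are ignored here */
	return mask == 0x03U;
}
```

Starting with both halves allocated (upper byte 0x03), flip_defer() on
half 0 yields 0x16: A is off, H and P are on for half 0, and A is still
on for half 1. page_is_full() still reports the page as full, because
the pending P bit keeps half 0 from being handed out again, without
having to look at any H bit.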