Date: Thu, 8 Jun 2023 10:58:32 -0300
From: Jason Gunthorpe
To: Hugh Dickins
Cc: Gerald Schaefer, Vasily Gorbik, Andrew Morton, Mike Kravetz,
    Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi,
    Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao,
    Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price,
    SeongJae Park, Naoya Horiguchi, Christophe Leroy, Zack Rusin,
    Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin,
    Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom,
    Russell King, "David S. Miller", Michael Ellerman,
    "Aneesh Kumar K.V", Heiko Carstens, Christian Borntraeger,
    Claudio Imbrenda, Alexander Gordeev, Jann Horn,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async()
References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>
    <6dd63b39-e71f-2e8b-7e0-83e02f3bcb39@google.com>
    <175ebec8-761-c3f-2d98-6c3bd87161c8@google.com>
    <20230606214037.09c6b280@thinkpad-T15>

On Wed, Jun 07, 2023 at 08:35:05PM -0700, Hugh Dickins wrote:

> My current thinking (but may be proved wrong) is along the lines of:
> why does something on its way to being freed need to be on any list
> than the rcu_head list? I expect the current answer is, that the
> other half is allocated, so the page won't be freed; but I hope that
> we can put it back on that list once we're through with the rcu_head.

I was having the same thought. It is pretty tricky, but if this was
made into some core helper then PPC and S390 could both use it and PPC
would get a nice upgrade to have the S390 frag re-use instead of
leaking frags.

Broadly we have three states:

 all frags free
 at least one frag free
 all frags used

'all frags free' should be returned to the allocator
'at least one frag free' should have the struct page on the mm_struct's list
'all frags used' should be on no list.

So if we go from
  all frags used -> at least one frag free
Then we put it on the RCU, then the RCU puts it on the mm_struct list

If we go from
  at least one frag free -> all frags free
Then we take it off the mm_struct list, put it on the RCU, and RCU frees it.

Your trick to put the list_head for the mm_struct list into the frag
memory looks like the right direction. So 'at least one frag free' has
a single already RCU free'd frag hold the list head pointer. Thus we
never use the LRU and the rcu_head is always available.

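To make the three states and the two transitions above concrete, here is a
rough userspace sketch. It is not kernel code: every name in it
(frag_classify, frag_free_action, the enums) is made up for illustration,
and the "some free -> some free" case is my assumption, since the text
above only describes the two transitions that change list membership.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these exist in the kernel. */
enum frag_state {
	FRAGS_ALL_FREE,		/* page goes back to the allocator */
	FRAGS_SOME_FREE,	/* struct page is on the mm_struct's list */
	FRAGS_ALL_USED,		/* page is on no list */
};

enum frag_action {
	ACT_RCU_THEN_LIST_ADD,		/* all used -> at least one free */
	ACT_LIST_DEL_THEN_RCU_FREE,	/* at least one free -> all free */
	ACT_NO_LIST_CHANGE,		/* assumption: page stays on the list */
};

/* free_mask has one bit per frag; a set bit means that frag is free. */
enum frag_state frag_classify(unsigned int free_mask, unsigned int nfrags)
{
	unsigned int all;

	assert(nfrags >= 2 && nfrags <= 4);
	all = (1u << nfrags) - 1;
	assert((free_mask & ~all) == 0);

	if (free_mask == all)
		return FRAGS_ALL_FREE;
	if (free_mask == 0)
		return FRAGS_ALL_USED;
	return FRAGS_SOME_FREE;
}

/* What has to happen to the page when frag number 'frag' is freed. */
enum frag_action frag_free_action(unsigned int free_mask, unsigned int frag,
				  unsigned int nfrags)
{
	unsigned int bit = 1u << frag;
	enum frag_state old_state = frag_classify(free_mask, nfrags);
	enum frag_state new_state;

	assert(frag < nfrags);
	assert(!(free_mask & bit));	/* freeing a free frag is a bug */

	new_state = frag_classify(free_mask | bit, nfrags);
	if (new_state == FRAGS_ALL_FREE)
		return ACT_LIST_DEL_THEN_RCU_FREE;
	if (old_state == FRAGS_ALL_USED)
		return ACT_RCU_THEN_LIST_ADD;
	return ACT_NO_LIST_CHANGE;
}
```

With two frags per page only the first two actions can occur; the third
only shows up with three or four frags.
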
The struct page itself can contain the actual free frag bitmask.

I think if we split up the memory used for pt_frag_refcount we can get
enough bits to keep track of everything. With only 2-4 frags we should
be OK.

So we track this data in the struct page:
  - Current RCU free TODO bitmask - if non-zero then an RCU is already
    triggered
  - Next RCU TODO bitmask - if an RCU is already triggered then we
    accumulate more free'd frags here
  - Current Free Bits - only updated by the RCU callback

?

We'd also need to store the mm_struct pointer in the struct page for
the RCU to be able to add/remove from the mm_struct list.

I'm not sure how much of the work can be done with atomics and how
much would need to rely on a spinlock inside the mm_struct.

It feels feasible and not so bad. :)

Figure it out and test it on S390, then make power use the same common
code, and we get full RCU page table freeing using a reliable rcu_head
on both of these previously troublesome architectures :) Yay

Jason
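
The three bitmasks proposed in the mail above could be modelled roughly
as below. This is a single-threaded userspace sketch under my own
assumptions: struct frag_track and both functions are hypothetical names,
the fields are plain bytes rather than bits carved out of
pt_frag_refcount, and there is no locking or atomics at all, which is
exactly the part left open above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page bookkeeping; one bit per frag, at most 4 frags. */
struct frag_track {
	uint8_t rcu_cur;	/* current RCU free TODO; non-zero: RCU in flight */
	uint8_t rcu_next;	/* frags freed while an RCU is already in flight */
	uint8_t free_bits;	/* free frags; only the RCU callback writes this */
};

/*
 * Called when a frag is freed. Returns true if the caller has to queue
 * an RCU callback, false if one is already pending and the frag was
 * merely accumulated for the next grace period.
 */
bool frag_track_free(struct frag_track *t, unsigned int frag)
{
	uint8_t bit;

	assert(frag < 4);
	bit = (uint8_t)(1u << frag);
	/* The frag must not already be free or on its way to being freed. */
	assert(!((t->rcu_cur | t->rcu_next | t->free_bits) & bit));

	if (t->rcu_cur) {
		t->rcu_next |= bit;
		return false;
	}
	t->rcu_cur = bit;
	return true;
}

/*
 * Stand-in for the RCU callback. Publishes the frags whose grace period
 * has ended and promotes the accumulated ones. Returns true if another
 * RCU callback has to be queued for the promoted frags.
 */
bool frag_track_rcu_done(struct frag_track *t)
{
	t->free_bits |= t->rcu_cur;
	t->rcu_cur = t->rcu_next;
	t->rcu_next = 0;
	return t->rcu_cur != 0;
}
```

In this model the list add/remove decision would be made inside
frag_track_rcu_done(), by comparing free_bits before and after the update.
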