Date: Thu, 10 Aug 2023 16:57:11 -0400
From: Peter Xu
To: David Hildenbrand
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, Andrew Morton, Jonathan Corbet, Mike Kravetz, Hugh Dickins, "Matthew Wilcox (Oracle)", Yin Fengwei, Yang Shi, Zi Yan
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
References: <20230809083256.699513-1-david@redhat.com> <155bd03e-b75c-4d2d-a89d-a12271ada71b@arm.com>

On Thu, Aug 10, 2023 at 07:47:35PM +0200, David Hildenbrand wrote:
> On 10.08.23 19:15, Peter Xu wrote:
> > On Thu, Aug 10, 2023 at 11:48:27AM +0100, Ryan Roberts wrote:
> > > > For PTE-mapped THP, it might be a bit bigger noise, although I doubt it is
> > > > really significant (judging from my experience on managing
> > > > PageAnonExclusive using set_bit/test_bit/clear_bit when (un)mapping anon pages).
> > > >
> > > > As folio_add_file_rmap_range() indicates, for PTE-mapped THPs we should be
> > > > batching where possible (and Ryan is working on some more rmap batching).
> > >
> > > Yes, I've just posted [1] which batches the rmap removal. That would allow you
> > > to convert the per-page atomic_dec() into a (usually) single per-large-folio
> > > atomic_sub().
> > >
> > > [1] https://lore.kernel.org/linux-mm/20230810103332.3062143-1-ryan.roberts@arm.com/
> >
> > Right, that'll definitely make more sense, thanks for the link; I'd be very
> > happy to read more later (finally I got some free time recently..). But
> > then does it mean David's patch can be attached at the end instead of
> > proposed separately and early?
>
> Not in my opinion. Batching rmap makes sense even without this change, and
> this change makes sense even without batching.
>
> >
> > I was asking mostly because I read it as a standalone patch first, and
> > honestly I don't know the effect. It's based on not only the added atomic
> > ops itself, but also the field changes.
> >
> > For example, this patch moves Hugh's _nr_pages_mapped into the 2nd tail
> > page, I think it means for any rmap change of any small page of a huge one
> > we'll need to start touching one more 64B cacheline on x86. I really have
> > no idea what does it mean for especially a large SMP: see 292648ac5cf1 on
> > why I had an impression of that. But I've no enough experience or clue to
> > prove it a problem either, maybe would be interesting to measure the time
> > needed for some pte-mapped loops? E.g., something like faulting in a thp,
>
> Okay, so your speculation right now is:
>
> 1) The change in cacheline might be problematic.
>
> 2) The additional atomic operation might be problematic.
>
> > then measure the split (by e.g. mprotect() at offset 1M on a 4K?) time it
> > takes before/after this patch.
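
The measurement suggested above could look roughly like the userspace sketch below. It is only an illustration under stated assumptions: 2M PMD-sized THPs and 4K base pages (x86-64), THP enabled for madvise() regions, and a helper name (time_split_ns) made up for this example. It faults in one THP-sized range, then times a 4K mprotect() at offset 1M, which cannot be applied to the whole PMD and so forces the mapping to be split into PTEs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define THP_SIZE (2UL << 20)  /* assumption: 2M PMD-sized THP */
#define PAGE_SZ  4096UL       /* assumption: 4K base pages */

static double now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/*
 * Hypothetical helper: returns the time in ns taken by one 4K mprotect()
 * in the middle of a faulted-in THP-sized range, or -1.0 on failure.
 */
static double time_split_ns(void)
{
	/* Over-allocate so that a 2M-aligned window always fits. */
	size_t len = 2 * THP_SIZE;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *buf;
	double t0, t1;
	int ret;

	if (raw == MAP_FAILED)
		return -1.0;

	buf = (char *)(((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1));
	assert(((uintptr_t)buf & (THP_SIZE - 1)) == 0);

	/* Best effort: if THP is unavailable, only a VMA split gets timed. */
	(void)madvise(buf, THP_SIZE, MADV_HUGEPAGE);
	memset(buf, 1, THP_SIZE);	/* fault the whole range in */

	t0 = now_ns();
	ret = mprotect(buf + (1UL << 20), PAGE_SZ, PROT_READ);
	t1 = now_ns();

	munmap(raw, len);
	return ret == 0 ? t1 - t0 : -1.0;
}
```

Calling time_split_ns() in a loop from a small main() and averaging, on kernels with and without the patch, would give the before/after comparison. Whether a THP was really installed can be checked through AnonHugePages in /proc/self/smaps.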
>
> I can certainly try getting some numbers on that. If you're aware of other
> micro-benchmarks that would likely notice slower pte-mapping of THPs, please
> let me know.

Thanks.

> >
> > When looking at this, I actually found one thing that is slightly
> > confusing, not directly relevant to your patch, but regarding the reuse of
> > tail page 1 on offset 24 bytes. Current it's Hugh's _nr_pages_mapped,
> > and you're proposing to replace it with the total mapcount:
> >
> >         atomic_t                   _nr_pages_mapped;     /*    88     4 */
> >
> > Now my question is.. isn't byte 24 of tail page 1 used for keeping a
> > poisoned mapping? See prep_compound_tail() where it has:
> >
> >         p->mapping = TAIL_MAPPING;
> >
> > While here mapping is, afaict, also using offset 24 of the tail page 1:
> >
> >         struct address_space *     mapping;              /*    24     8 */
> >
> > I hope I did a wrong math somewhere, though.
> >
>
> I think your math is correct.
>
> prep_compound_head() is called after prep_compound_tail(), so
> prep_compound_head() wins.
>
> In __split_huge_page_tail() there is a VM_BUG_ON_PAGE() that explains the
> situation:
>
> /* ->mapping in first and second tail page is replaced by other uses */
> VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
>                page_tail);
>
> Thanks for raising that, I had to look into that myself.

It's confusing enough that I tried to document these fields a bit myself,
then figured I should just post a patch for it, which I did:

https://lore.kernel.org/r/20230810204944.53471-1-peterx@redhat.com

It'll conflict with yours, but I marked it RFC so it's nothing urgent; maybe
it will already be helpful for you when introducing new fields like
total_mapcounts.

AFAICS, if that patch is all correct (which I'm not yet sure about..), you
can actually fit your new total mapcount field into tail page 1 and so avoid
the extra cacheline access. You can have a look: the trick is that the
refcount of tail page 1 still seems to be free on 32 bits (if that was your
worry before).
Then it would be very nice to keep Hugh's counter entirely in tail 1.

-- 
Peter Xu
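
The offset math discussed in this mail (folio offset 88 landing on byte 24 of tail page 1) can be checked with a small sketch. The struct below is a hypothetical mock of a 64-byte struct page on 64-bit, with fixed-width fields standing in for the real members; it is not the kernel's definition, and folio_byte_offset() is a name invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-byte mock of struct page (64-bit layout assumed). */
struct mock_page {
	uint64_t flags;		/* offset  0 */
	uint64_t lru[2];	/* offset  8: stands in for struct list_head */
	uint64_t mapping;	/* offset 24: stands in for the pointer */
	uint64_t index;		/* offset 32 */
	uint64_t private_data;	/* offset 40 */
	int32_t  mapcount;	/* offset 48 */
	int32_t  refcount;	/* offset 52 */
	uint64_t memcg_data;	/* offset 56 */
};

_Static_assert(sizeof(struct mock_page) == 64, "mock page must be 64 bytes");

/* Byte offset, from the start of the folio, of a field in page number idx. */
static size_t folio_byte_offset(size_t idx, size_t off_in_page)
{
	return idx * sizeof(struct mock_page) + off_in_page;
}

/* Offset of ->mapping in tail page 1: 1 * 64 + 24 = 88. */
static size_t tail1_mapping_offset(void)
{
	return folio_byte_offset(1, offsetof(struct mock_page, mapping));
}
```

Under these assumptions tail1_mapping_offset() evaluates to 88, the same offset pahole reports for _nr_pages_mapped above, which is why the TAIL_MAPPING poison written by prep_compound_tail() is later overwritten in that page.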