Date: Thu, 8 Oct 2020 22:50:16 -0700 (PDT)
From: Hugh Dickins
To: Mike Kravetz
cc: Hugh Dickins, Qian Cai, js1304@gmail.com, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@lge.com,
    Vlastimil Babka, Christoph Hellwig, Roman Gushchin, Naoya Horiguchi,
    Michal Hocko, Joonsoo Kim
Subject: Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target
    allocation callback
References: <1592892828-1934-1-git-send-email-iamjoonsoo.kim@lge.com>
    <1592892828-1934-8-git-send-email-iamjoonsoo.kim@lge.com>
    <20200708012044.GC992@lca.pw>

On Thu, 8 Oct 2020, Mike Kravetz wrote:
> On 10/7/20 8:21 PM, Hugh Dickins wrote:
> >
> > Mike, j'accuse... your 5.7 commit c0d0381ade79 ("hugetlbfs:
> > use i_mmap_rwsem for more pmd sharing synchronization"), in which
> > unmap_and_move_huge_page() now passes the TTU_RMAP_LOCKED flag to
> > try_to_unmap(), because it's already holding mapping->i_mmap_rwsem:
> > but that is not the right lock to secure an anon_vma lookup.
>
> Thanks Hugh! Your analysis is correct and the code in that commit is
> not correct. I was so focused on the file mapping case, I overlooked
> (actually introduced) this issue for anon mappings.
>
> Let me verify that this indeed is the root cause. However, since
> move_pages12 migrated anon hugetlb pages it certainly does look to be
> the case.
>
> > I intended to send a patch, passing TTU_RMAP_LOCKED only in the
> > !PageAnon case (and, see vma_adjust(), anon_vma lock conveniently
> > nests inside i_mmap_rwsem); but then wondered if i_mmap_rwsem was
> > needed in that case or not, so looked deeper into c0d0381ade79.
> >
> > Hmm, not even you liked it! But the worst of it looks simply
> > unnecessary to me, and I hope can be deleted - but better by you
> > than by me (in particular, you were trying to kill 1) and 2) birds
> > with one stone, and I've always given up on understanding hugetlb's
> > reservations: I suspect that side of it is irrelevant here,
> > but I wouldn't pretend to be sure).
> >
> > How could you ever find a PageAnon page in a vma_shareable() area?
> >
> > It is all rather confusing (vma_shareable() depending on VM_MAYSHARE,
> > whereas is_cow_mapping() on VM_SHARED and VM_MAYWRITE: they have to
> > be studied together with do_mmap()'s
> > 	vm_flags |= VM_SHARED | VM_MAYSHARE;
> > 	if (!(file->f_mode & FMODE_WRITE))
> > 		vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
> >
> > (And let me add to the confusion by admitting that, prior to 3.15's
> > cda540ace6a1 "mm: get_user_pages(write,force) refuse to COW in
> > shared areas", maybe it was possible to find a PageAnon there.)
> >
> > But my belief (best confirmed by you running your tests with a
> > suitably placed BUG_ON or WARN_ON) is that you'll never find a
> > PageAnon in a vma_shareable() area, so will never need try_to_unmap()
> > to unshare a pagetable in the PageAnon case, so won't need i_mmap_rwsem
> > for PageAnon there, and _get_hugetlb_page_mapping() (your function that
> > deduces an address_space from an anon_vma) can just be deleted.
>
> Yes, it is confusing. Let me look into this. I would be really happy
> to delete that ugly function.
>
> > (And in passing, may I ask what hugetlb_page_mapping_lock_write()'s
> > hpage->_mapcount inc and dec are for?
> > You comment it as a hack,
> > but don't explain what needs that hack, and I don't see it.)
>
> We are trying to lock the mapping (mapping->i_mmap_rwsem). We know
> mapping is valid, because we obtained it from page_mapping() and it
> will remain valid because we have the page locked. Page needs to be
> unlocked to unmap. However, we have to drop page lock in order to
> acquire i_mmap_rwsem. Once we drop page lock, mapping could become
> invalid. So, the code artificially incs mapcount so that mapping
> will remain valid when unmapping page.

No, unless you can point me to some other hugetlbfs-does-it-differently
(and I didn't see it there in that commit), raising _mapcount does not
provide any such protection; but does add the possibility of a
"BUG: Bad page cache" and leak from unaccount_page_cache_page().

Earlier in the day I was trying to work out what to recommend instead,
but had to turn aside to something else: I'll try again tomorrow.

It's a problem I've faced before in tmpfs, keeping a hold on the
mapping while page lock is dropped. Quite awkward: igrab() looks as
if it's the right thing to use, but turns out to give no protection
against umount. Last time around, I ended up with a stop_eviction
count in the shmem inode, which shmem_evict_inode() waits on if
necessary. Something like that could be done for hugetlbfs too,
but I'd prefer to do it without adding extra, if there is a way.

>
> As mentioned above, I hope all this can be removed.

If you continue to nest page lock inside i_mmap_rwsem for hugetlbfs,
then I think that part of hugetlb_page_mapping_lock_write() has to
remain. I'd much prefer that hugetlbfs did not reverse the usual
nesting, but accept that you had reasons for doing it that way.

Hugh
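
For anyone following the vma_shareable() versus is_cow_mapping() argument
quoted above, here is a rough userspace sketch of the flag reasoning only.
It is not kernel code: the model_* helpers and shareable_never_cow() are
hypothetical names invented for illustration, the flag values are assumed
to match include/linux/mm.h, and the real vma_shareable() also checks
alignment, which is ignored here.

```c
#include <stdbool.h>

/* Assumed to match include/linux/mm.h; treat the values as illustrative. */
#define VM_SHARED	0x00000008UL
#define VM_MAYWRITE	0x00000020UL
#define VM_MAYSHARE	0x00000080UL

/*
 * Hypothetical helper: models only the flag handling quoted from
 * do_mmap().  Assumes every mapping starts out with VM_MAYWRITE set.
 */
unsigned long model_mmap_flags(bool map_shared, bool file_writable)
{
	unsigned long vm_flags = VM_MAYWRITE;

	if (map_shared) {
		vm_flags |= VM_SHARED | VM_MAYSHARE;
		if (!file_writable)
			vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
	}
	return vm_flags;
}

/* Same flag test as is_cow_mapping(): may write, but not shared. */
bool model_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Only the VM_MAYSHARE part of vma_shareable(); alignment is ignored. */
bool model_flags_shareable(unsigned long vm_flags)
{
	return (vm_flags & VM_MAYSHARE) != 0;
}

/* Returns true if no modelled mapping is both shareable and COW. */
bool shareable_never_cow(void)
{
	for (int shared = 0; shared <= 1; shared++) {
		for (int writable = 0; writable <= 1; writable++) {
			unsigned long f = model_mmap_flags(shared, writable);

			if (model_flags_shareable(f) && model_is_cow_mapping(f))
				return false;
		}
	}
	return true;
}
```

Calling shareable_never_cow() from a small main() walks all four
combinations of MAP_SHARED/MAP_PRIVATE and writable/read-only file. Under
the stated assumptions, the only COW mappings in this model are the private
ones, which never carry VM_MAYSHARE; that is consistent with the belief
above that a PageAnon should not turn up in a vma_shareable() area, though
it proves nothing about the pre-3.15 get_user_pages() case.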