From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 038C5C433F5 for ; Fri, 17 Dec 2021 22:19:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6B5576B0071; Fri, 17 Dec 2021 17:18:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 63DAB6B0073; Fri, 17 Dec 2021 17:18:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B6C96B0074; Fri, 17 Dec 2021 17:18:50 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0235.hostedemail.com [216.40.44.235]) by kanga.kvack.org (Postfix) with ESMTP id 3C4086B0071 for ; Fri, 17 Dec 2021 17:18:50 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E21358249980 for ; Fri, 17 Dec 2021 22:18:39 +0000 (UTC) X-FDA: 78928701558.19.5F4C7AB Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by imf25.hostedemail.com (Postfix) with ESMTP id 3B41DA0024 for ; Fri, 17 Dec 2021 22:18:33 +0000 (UTC) Received: by mail-ed1-f48.google.com with SMTP id z29so13096311edl.7 for ; Fri, 17 Dec 2021 14:18:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=5dEiolkPpvvhLwx2+Ioc9tJdHpFervZFA65QfTIT8vY=; b=UNpQI26wi57RyV1YxwSTRqGoF8pfqkrpBA7VJxkw1ZxwY2/GrWansyaJMvj0VZ+zMR 214ZaRKvpcMDVsjVSqCzfpGWodtPb3EbTzuTia7DZiYsm/Guy70EmsRf0kdKbipswVN4 hCq+6iDJno8XBTK4ljnI1AIK0LnvRqZevuKic= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=5dEiolkPpvvhLwx2+Ioc9tJdHpFervZFA65QfTIT8vY=; b=2p2B8qFjzxl6LDZ1k/m+Gpmwdf/ntR0J65r3A66ZM+8NLHehvAk/ztiJlR5H6Q06dC CyCvcs6OIhS5NbUO02jmbUnI3puRn7PeXiT2YFxxgcOW9jyU3/UxiuFlm/KZ29JT/M7m 8434sJ0QpyeRO4wYJBNjICXkkfprSmG/Q/jQyXQ2JqKIEqbV+5TvvVFL893o903MEevB g78yNvxoICng+9VWgDfyfgePZFKgfNtRLDv17TaQ6oFCHVOqmU2xZhNHgKBwRCSvlJ6Z pZymAnimef5qL+LdeoNBE22/as0+UKNZwcuJ8VTSrkn1/F4O6zKClFI916dgOhyuQ8kY KMsg== X-Gm-Message-State: AOAM5322wDzHPK8hJDSR7u/uHO1NNCmxTZ5wtmWs2a2bfh8CBG55Xdx5 X2OxPANTewTDmr7dUCVOFqlsILM+L5/tkc+j6wM= X-Google-Smtp-Source: ABdhPJx9vqd4Df8xtewu+3Ya63uT1FDjMWofph6iz8jFPFMw3VlqyX/qwWD4IOLErxxIFtkyyTG3ew== X-Received: by 2002:a17:906:93e8:: with SMTP id yl8mr3975890ejb.207.1639779517607; Fri, 17 Dec 2021 14:18:37 -0800 (PST) Received: from mail-wr1-f42.google.com (mail-wr1-f42.google.com. [209.85.221.42]) by smtp.gmail.com with ESMTPSA id f29sm3230769ejj.209.2021.12.17.14.18.36 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 17 Dec 2021 14:18:37 -0800 (PST) Received: by mail-wr1-f42.google.com with SMTP id t26so6621086wrb.4 for ; Fri, 17 Dec 2021 14:18:36 -0800 (PST) X-Received: by 2002:adf:f54e:: with SMTP id j14mr4057992wrp.442.1639779505799; Fri, 17 Dec 2021 14:18:25 -0800 (PST) MIME-Version: 1.0 References: <20211217113049.23850-1-david@redhat.com> <20211217113049.23850-7-david@redhat.com> <9c3ba92e-9e36-75a9-9572-a08694048c1d@redhat.com> <02cf4dcf-74e8-9cbd-ffbf-8888f18a9e8a@redhat.com> In-Reply-To: <02cf4dcf-74e8-9cbd-ffbf-8888f18a9e8a@redhat.com> From: Linus Torvalds Date: Fri, 17 Dec 2021 14:18:09 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb) To: David Hildenbrand Cc: Linux Kernel Mailing List , Andrew Morton , Hugh Dickins , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Linux-MM , "open list:KERNEL SELFTEST FRAMEWORK" , "open list:DOCUMENTATION" Content-Type: text/plain; charset="UTF-8" Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=google header.b=UNpQI26w; dmarc=none; spf=pass (imf25.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.208.48 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 3B41DA0024 X-Stat-Signature: imana3js39htchwum1nkkrx7wdecj7zo X-HE-Tag: 1639779513-481726 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Dec 17, 2021 at 1:47 PM David Hildenbrand wrote: > > For now I have not heard a compelling argument why the mapcount is > dubious, I repeat: > > * mapcount can only increase due to fork() > * mapcount can decrease due to unmap / zap And to answer the "why is this dubious", let' sjust look at your actual code that I reacted to: + vmf->page = vm_normal_page(vmf->vma, vmf->address, vmf->orig_pte); + if (vmf->page && PageAnon(vmf->page) && !PageKsm(vmf->page) && + page_mapcount(vmf->page) > 1) { Note how you don't just check page_mapcount(). Why not? Because mapcount is completely immaterial if it's not a PageAnon page, so you test for that. So even when you do the mapcount read as one atomic thing, it's one atomic thing that depends on _other_ things, and all these checks are not atomic. But a PageAnon() page can actually become a swap-backed page, and as far as I can tell, your code doesn't have any locking to protect against that. So now you need not only the mmap_sem (to protect against fork), you also need the page lock (to protect against rmap changing the type of page). I don't see you taking the page lock anywhere. Maybe the page table lock ends up serializing sufficiently with the rmap code that it ends up working In the do_wp_page() path, we currently do those kinds of racy checks too, but then we do a trylock_page, and re-do them. And at any time there is any question about things, we fall back to copying - because a copy is always safe. Well, it's always safe if we have the rule that "once we've pinned things, we don't cause them to be COW again". But that "it's safe if" was exactly my (b) case. That's why I much prefer the model I'm trying to push - it's conceptually quite simple. I can literally explain mine at a conceptual level with that "break pre-existing COW, make sure no future COW" model. In contrast, I look at your page_mapcount() code, and I go "there is no conceptual rules here, and the actual implementation details look dodgy". I personally like having clear conceptual rules - as opposed to random implementation details. Linus