Date: Tue, 17 Jan 2023 16:54:38 -0500
From: Peter Xu
To: James Houghton
Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
    Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra,
    Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)",
    Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 35/46] hugetlb: add MADV_COLLAPSE for hugetlb
References: <20230105101844.1893104-1-jthoughton@google.com>
    <20230105101844.1893104-36-jthoughton@google.com>

On Tue, Jan 17, 2023 at 01:38:24PM -0800, James Houghton wrote:
> > > +		if (curr < end) {
> > > +			/* Don't hold the VMA lock for too long. */
> > > +			hugetlb_vma_unlock_write(vma);
> > > +			cond_resched();
> > > +			hugetlb_vma_lock_write(vma);
> >
> > The intention is good here, but IIUC this will cause the vma lock to be
> > taken after the i_mmap_rwsem, which can cause circular deadlocks.  To do
> > this properly we'll also need to release the i_mmap_rwsem.
>
> Sorry if you spent a long time debugging this! I sent a reply a week
> ago about this too.

Oops, yes, I somehow missed that one.  No worries - it's reported by
lockdep. :)

> >
> > However it may make the resched() logic overcomplicated.  Meanwhile, for
> > 2M huge pages I think this will be called for each 2M range, which can be
> > too fine-grained, so it looks like the "curr < end" check is a bit too
> > aggressive.
> >
> > The other thing I noticed is that the long period of mmu notifier
> > invalidation between start -> end will (in a real-life VM context) cause
> > vcpu threads to spin.
> >
> > I _think_ it's because is_page_fault_stale() (during a vmexit following
> > a kvm page fault) always reports true during the long procedure of
> > MADV_COLLAPSE if it is called on a large range, so even if we release
> > both locks here it may not help tremendously for the VM migration use
> > case, because of the long-standing mmu notifier invalidation procedure.
>
> Oh... indeed. Thanks for pointing that out.
>
> >
> > To summarize.. I think a simpler starting version of hugetlb
> > MADV_COLLAPSE can drop this "if" block, and let the userapp decide the
> > step size of COLLAPSE?
>
> I'll drop this resched logic. Thanks Peter.

Sounds good, thanks.

-- 
Peter Xu
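
For illustration only (not part of the patch series): a minimal userspace sketch of what "let the userapp decide the step size" could look like. It assumes the kernel performs each MADV_COLLAPSE call in one go, so the length passed per call roughly bounds how long a single mmu notifier invalidation lasts. collapse_in_steps(), advise_fn and the dry-run stub are hypothetical names, and whether MADV_COLLAPSE applies to hugetlb mappings depends on this series, which is an assumption here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from the Linux uapi headers (6.1+) */
#endif

/* Same signature as madvise(2), so a stub can be swapped in for dry runs. */
typedef int (*advise_fn)(void *addr, size_t len, int advice);

/*
 * Collapse [start, start + len) in chunks of at most `step` bytes.
 * Returns the number of bytes successfully advised; stops at the first
 * failing call and leaves errno as that call set it.
 */
static size_t collapse_in_steps(void *start, size_t len, size_t step,
				advise_fn advise)
{
	char *curr = start;
	char *end = curr + len;

	if (step == 0) {
		errno = EINVAL;
		return 0;
	}

	while (curr < end) {
		size_t left = (size_t)(end - curr);
		size_t chunk = left < step ? left : step;

		/* One call per chunk: the caller picks the granularity. */
		if (advise(curr, chunk, MADV_COLLAPSE) != 0)
			break;
		curr += chunk;
	}
	return (size_t)(curr - (char *)start);
}

/* Hypothetical dry-run stub: counts calls instead of touching memory. */
static unsigned long dry_run_calls;

static int dry_run_advise(void *addr, size_t len, int advice)
{
	(void)addr;
	(void)len;
	(void)advice;
	dry_run_calls++;
	return 0;
}
```

A real caller would pass madvise itself as the last argument, e.g. collapse_in_steps(addr, len, 1UL << 30, madvise), and tune the step against how long vcpu threads can tolerate spinning. The dry-run stub only shows how many calls a given step size produces.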