Date: Thu, 17 Dec 2020 16:50:48 -0400
From: Jason Gunthorpe
To: Pavel Tatashin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, vbabka@suse.cz, mhocko@suse.com,
	david@redhat.com, osalvador@suse.de, dan.j.williams@intel.com,
	sashal@kernel.org, tyhicks@linux.microsoft.com,
	iamjoonsoo.kim@lge.com, mike.kravetz@oracle.com,
	rostedt@goodmis.org, mingo@redhat.com, peterz@infradead.org,
	mgorman@suse.de, willy@infradead.org, rientjes@google.com,
	jhubbard@nvidia.com, linux-doc@vger.kernel.org,
	ira.weiny@intel.com, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
Message-ID: <20201217205048.GL5487@ziepe.ca>
References: <20201217185243.3288048-1-pasha.tatashin@soleen.com>
	<20201217185243.3288048-9-pasha.tatashin@soleen.com>
In-Reply-To: <20201217185243.3288048-9-pasha.tatashin@soleen.com>

On Thu, Dec 17, 2020 at 01:52:41PM -0500, Pavel Tatashin wrote:

> +/*
> + * Verify that there are no unpinnable (movable) pages, if so return true.
> + * Otherwise an unpinnable pages is found return false, and unpin all pages.
> + */
> +static bool check_and_unpin_pages(unsigned long nr_pages, struct page **pages,
> +				   unsigned int gup_flags)
> +{
> +	unsigned long i, step;
> +
> +	for (i = 0; i < nr_pages; i += step) {
> +		struct page *head = compound_head(pages[i]);
> +
> +		step = compound_nr(head) - (pages[i] - head);

You can't assume that all of a compound page is in the pages array;
that assumption only works inside the page walkers, where the page was
found in a PMD or something.
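Nothing requires the caller's addresses to cover every tail page, or
to cover them contiguously; GUP fills pages[] from whatever user
addresses it was given. The only safe thing is to look at each entry
on its own. Roughly (an untested sketch -- is_pinnable_page() here is
a placeholder for whatever movability test this patch settles on):

	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		/*
		 * No stepping: adjacent entries may belong to different
		 * compound pages, or to the same one in any order.
		 */
		if (!is_pinnable_page(compound_head(pages[i])))
			return false;
	}
	return true;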
> +	if (gup_flags & FOLL_PIN) {
> +		unpin_user_pages(pages, nr_pages);

So we throw everything away? Why? That isn't how the old algorithm
worked.

> @@ -1654,22 +1664,55 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>  				  struct vm_area_struct **vmas,
>  				  unsigned int gup_flags)
>  {
> -	unsigned long flags = 0;
> +	int migrate_retry = 0;
> +	int isolate_retry = 0;
> +	unsigned int flags;
>  	long rc;
>  
> -	if (gup_flags & FOLL_LONGTERM)
> -		flags = memalloc_pin_save();
> +	if (!(gup_flags & FOLL_LONGTERM))
> +		return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> +					       NULL, gup_flags);
>  
> -	rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas, NULL,
> -				     gup_flags);
> +	/*
> +	 * Without FOLL_WRITE fault handler may return zero page, which can
> +	 * be in a movable zone, and also will fail to isolate during migration,
> +	 * thus the longterm pin will fail.
> +	 */
> +	gup_flags &= FOLL_WRITE;

Is &= what you mean here? |= right?
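As written this clears every flag except FOLL_WRITE, and FOLL_WRITE
itself only survives if the caller already passed it. A quick
illustration with the FOLL_* bits from include/linux/mm.h:

	unsigned int a = FOLL_PIN | FOLL_LONGTERM;
	unsigned int b = FOLL_PIN | FOLL_LONGTERM;

	a &= FOLL_WRITE;	/* a == 0: FOLL_PIN and FOLL_LONGTERM are gone */
	b |= FOLL_WRITE;	/* b == FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE */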
Seems like we've ended up in a weird place if FOLL_LONGTERM always
includes FOLL_WRITE. Putting the zero page in ZONE_MOVABLE seems like
a bad idea, no?

> +	/*
> +	 * Migration may fail, we retry before giving up. Also, because after
> +	 * migration pages[] becomes outdated, we unpin and repin all pages
> +	 * in the range, so pages array is repopulated with new values.
> +	 * Also, because of this we cannot retry migration failures in a loop
> +	 * without pinning/unpinnig pages.
> +	 */

The old algorithm made continuous forward progress and only went back
to the first migration point.

> +	for (; ; ) {

while (true)?

> +		rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> +					     NULL, gup_flags);
> +		/* Return if error or if all pages are pinnable */
> +		if (rc <= 0 || check_and_unpin_pages(rc, pages, gup_flags))
> +			break;

So we sweep the pages list twice now?

> +		/* Some pages are not pinnable, migrate them */
> +		rc = migrate_movable_pages(rc, pages);
> +
> +		/*
> +		 * If there is an error, and we tried maximum number of times
> +		 * bail out. Notice: we return an error code, and all pages are
> +		 * unpinned
> +		 */
> +		if (rc < 0 && migrate_retry++ >= PINNABLE_MIGRATE_MAX) {
> +			break;
> +		} else if (rc > 0 && isolate_retry++ >= PINNABLE_ISOLATE_MAX) {
> +			rc = -EBUSY;

I don't like this at all. It shouldn't be so flakey.

Can you do migration without the LRU?

Jason