From: David Hildenbrand
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
Date: Fri, 18 Dec 2020 14:04:48 +0100
Message-Id: <1671AFC0-3D06-4C4E-934D-CB6DC0AFE4A1@redhat.com>
To: Pavel Tatashin
Cc: Michal Hocko, LKML, linux-mm, Andrew Morton, Vlastimil Babka, David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com, Steven Rostedt, Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard, Linux Doc Mailing List, Ira Weiny, linux-kselftest@vger.kernel.org
Content-Type:
text/plain; charset=utf-8

> On 18.12.2020 at 13:43, Pavel Tatashin wrote:
>
> On Fri, Dec 18, 2020 at 5:46 AM Michal Hocko wrote:
>>
>> On Thu 17-12-20 13:52:41, Pavel Tatashin wrote:
>> [...]
>>> +#define PINNABLE_MIGRATE_MAX 10
>>> +#define PINNABLE_ISOLATE_MAX 100
>>
>> Why would we need to limit the isolation retries? Those should always be
>> temporary failures unless I am missing something.
>
> Actually, during development, I was retrying isolate errors
> infinitely, but during testing found a hang where, when FOLL_TOUCH
> without FOLL_WRITE is passed (fault in kernel without write flag), the
> zero page is faulted. The isolation of the zero page was failing every
> time, therefore the process was hanging.
>
> Since then, I fixed this problem by adding FOLL_WRITE unconditionally
> to FOLL_LONGTERM, but I was worried about other possible bugs that
> would cause hangs, so decided to limit isolation errors. If you think
> it is not necessary, I can unlimit isolate retries.
>
>> I am not sure about the
>> PINNABLE_MIGRATE_MAX either. Why do we want to limit that? migrate_pages
>> already implements its retry logic; why do you want to count retries on
>> top of that? I do agree that the existing logic is suboptimal because
>
> True, but again, just recently, I worked on a race bug where pages can
> end up in per-cpu list after lru_add_drain_all() but before isolation,
> so I think retry is necessary.
>
>> the migration failure might be ephemeral or permanent but that should be
>> IMHO addressed at migrate_pages (resp. unmap_and_move) and simply report
>> failures that are permanent - e.g. any potential pre-existing long term
>> pin - if that is possible at all. If not what would cause permanent
>> migration failure? OOM?
>
> Yes, OOM is the main cause of migration failures. There are also a few
> cases described in the movable zone comment, where during boot some
> pages can be allocated by memblock in the movable zone due to lack of
> memory resources (even if those resources were added later).
> Hardware page poisoning is another rare example.
>

How is concurrent migration handled, e.g. memory offlining, compaction,
or alloc_contig_range() running while we are trying to pin?

>> --
>> Michal Hocko
>> SUSE Labs
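
For reference, the retry limits discussed in this thread can be sketched in
plain userspace C. This is only an illustration under stated assumptions, not
the code from the patch: PINNABLE_MIGRATE_MAX and PINNABLE_ISOLATE_MAX come
from the quoted hunk, while enum pass_result, pass_fn,
pin_with_retry_limits(), fake_pass() and try_pin() are hypothetical names, and
the -1 return value is a placeholder for whatever error the real code reports.

```c
#include <assert.h>
#include <stddef.h>

/* Limits taken from the quoted patch hunk. */
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100

/* Hypothetical outcome of one isolate-and-migrate pass (not a kernel type). */
enum pass_result { PASS_OK, PASS_ISOLATE_FAILED, PASS_MIGRATE_FAILED };

/* Hypothetical callback standing in for one pass over the pages to pin. */
typedef enum pass_result (*pass_fn)(void *ctx);

/*
 * Retry until a pass succeeds or one of the two failure budgets is used up.
 * Returns 0 on success and -1 (placeholder error) once a limit is reached.
 */
int pin_with_retry_limits(pass_fn pass, void *ctx)
{
    int isolate_failures = 0;
    int migrate_failures = 0;

    assert(pass != NULL);

    for (;;) {
        switch (pass(ctx)) {
        case PASS_OK:
            return 0;
        case PASS_ISOLATE_FAILED:
            /* Isolation failures are counted separately from migration. */
            if (++isolate_failures >= PINNABLE_ISOLATE_MAX)
                return -1;
            break;
        case PASS_MIGRATE_FAILED:
            if (++migrate_failures >= PINNABLE_MIGRATE_MAX)
                return -1;
            break;
        }
    }
}

/* Test double: fails with `kind` for `failures` passes, then succeeds. */
struct fake_pages {
    enum pass_result kind;
    int failures;
    int calls;
};

enum pass_result fake_pass(void *ctx)
{
    struct fake_pages *f = ctx;

    f->calls++;
    if (f->failures > 0) {
        f->failures--;
        return f->kind;
    }
    return PASS_OK;
}

/* Runs the retry loop against the test double and reports the pass count. */
int try_pin(enum pass_result kind, int failures, int *calls)
{
    struct fake_pages f = { kind, failures, 0 };
    int ret = pin_with_retry_limits(fake_pass, &f);

    *calls = f.calls;
    return ret;
}
```

With these assumed semantics, a transient failure (say, three failed
isolations) still ends in success, while a failure that never clears, such as
the zero page case described above, makes the loop give up after the
corresponding limit instead of hanging.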