linux-mm.kvack.org archive mirror
* Write throughput impaired by touching dirty_ratio
@ 2015-06-19 15:16 Mark Hills
  2015-06-24  8:27 ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Hills @ 2015-06-19 15:16 UTC (permalink / raw)
  To: linux-mm

I noticed that any change to vm.dirty_ratio causes write throughput to 
plummet -- to around 5Mbyte/sec.

  <system bootup, kernel 4.0.5>

  # dd if=/dev/zero of=/path/to/file bs=1M

  # sysctl vm.dirty_ratio
  vm.dirty_ratio = 20
  <all ok; writes at ~150Mbyte/sec>

  # sysctl vm.dirty_ratio=20
  <all continues to be ok>

  # sysctl vm.dirty_ratio=21
  <writes drop to ~5Mbyte/sec>

  # sysctl vm.dirty_ratio=20
  <writes continue to be slow at ~5Mbyte/sec>

The test shows that return to the previous value does not restore the old 
behaviour. I return the system to usable state with a reboot.

Reads continue to be fast and are not affected.

A quick look at the code suggests that writeback_set_ratelimit behaves 
differently on startup, and that some of the calculations (eg. 
global_dirty_limit) are badly behaved once the system has booted.

The system is an HP xw6600, running i686 kernel. This happens whether 
internal SATA HDD, SSD or external USB drive is used. I first saw this on 
kernel 4.0.4, and 4.0.5 is also affected.

It would surprise me if I'm the only person setting dirty_ratio.

Have others seen this behaviour? Thanks

-- 
Mark


CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION="-mh"
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
CONFIG_DEFAULT_HOSTNAME="darkstar"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_PREEMPT_RCU=y
CONFIG_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_RCU_KTHREAD_PRIO=0
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_NAMESPACES=y
CONFIG_SCHED_AUTOGROUP=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_PCI_QUIRKS=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_PROFILING=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_JUMP_LABEL=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_STACKPROTECTOR_NONE=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_MODULES_USE_ELF_REL=y
CONFIG_CLONE_BACKWARDS=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBDAF=y
CONFIG_BLK_DEV_BSG=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_EFI_PARTITION=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUE_RWLOCK=y
CONFIG_QUEUE_RWLOCK=y
CONFIG_FREEZER=y
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_NO_BOOTMEM=y
CONFIG_MCORE2=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_PROCESSOR_SELECT=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_DMI=y
CONFIG_NR_CPUS=8
CONFIG_SCHED_MC=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_X86_UP_APIC_MSI=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_VM86=y
CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX32=y
CONFIG_HIGHMEM64G=y
CONFIG_VMSPLIT_3G=y
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
CONFIG_X86_PAE=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_NEED_NODE_MEMMAP_SIZE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_SECCOMP=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_AUTOSLEEP=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
CONFIG_PM_SLEEP_DEBUG=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_FAN=m
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
CONFIG_ACPI_THERMAL=m
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_INTEL_IDLE=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
CONFIG_PCIE_PME=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LABEL=y
CONFIG_ISA_DMA_API=y
CONFIG_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_HAVE_AOUT=y
CONFIG_COREDUMP=y
CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_PMC_ATOM=y
CONFIG_NET=y
CONFIG_PACKET=m
CONFIG_UNIX=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_IPV6=m
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_STP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_HAVE_NET_DSA=y
CONFIG_LLC=m
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PNP=y
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_HAVE_IDE=y
CONFIG_SCSI_MOD=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_ATA=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y
CONFIG_ATA_PIIX=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_DEBUG_BLOCK_STACK_TRACING=y
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_MQ=m
CONFIG_DM_CACHE_CLEANER=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_UEVENT=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_TUN=m
CONFIG_VHOST_NET=m
CONFIG_VHOST_RING=m
CONFIG_VHOST=m
CONFIG_ETHERNET=y
CONFIG_NET_VENDOR_ADAPTEC=y
CONFIG_NET_VENDOR_AGERE=y
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_BROADCOM=y
CONFIG_TIGON3=m
CONFIG_NET_VENDOR_INTEL=y
CONFIG_NET_VENDOR_QUALCOMM=y
CONFIG_NET_VENDOR_ROCKER=y
CONFIG_NET_VENDOR_SAMSUNG=y
CONFIG_PHYLIB=m
CONFIG_USB_NET_DRIVERS=y
CONFIG_INPUT=y
CONFIG_INPUT_SPARSEKMAP=m
CONFIG_INPUT_MOUSEDEV=m
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_EVDEV=m
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=m
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=m
CONFIG_SERIO_LIBPS2=y
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y
CONFIG_HW_RANDOM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_GEODE=m
CONFIG_NVRAM=m
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_HPET_MMAP_DEFAULT=y
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_SMBUS=m
CONFIG_I2C_ALGOBIT=m
CONFIG_I2C_ALGOPCA=m
CONFIG_PPS=m
CONFIG_PTP_1588_CLOCK=m
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_POWER_SUPPLY=y
CONFIG_HWMON=y
CONFIG_SENSORS_I5K_AMB=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_SMSC47B397=m
CONFIG_THERMAL=y
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_USER_SPACE=y
CONFIG_X86_PKG_TEMP_THERMAL=m
CONFIG_SSB_POSSIBLE=y
CONFIG_BCMA_POSSIBLE=y
CONFIG_MFD_CORE=m
CONFIG_LPC_SCH=m
CONFIG_AGP=m
CONFIG_AGP_INTEL=m
CONFIG_INTEL_GTT=m
CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
CONFIG_DRM_TTM=m
CONFIG_DRM_RADEON=m
CONFIG_FB=m
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_CMDLINE=y
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
CONFIG_FB_MODE_HELPERS=y
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_BACKLIGHT_CLASS_DEVICE=m
CONFIG_HDMI=y
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=m
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_DYNAMIC_MINORS=y
CONFIG_SND_MAX_CARDS=32
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_DMA_SGBUF=y
CONFIG_SND_RAWMIDI_SEQ=m
CONFIG_SND_DRIVERS=y
CONFIG_SND_DUMMY=m
CONFIG_SND_ALOOP=m
CONFIG_SND_VIRMIDI=m
CONFIG_SND_PCI=y
CONFIG_SND_ECHO3G=m
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_USB_CAIAQ=m
CONFIG_SND_USB_CAIAQ_INPUT=y
CONFIG_HID=y
CONFIG_HIDRAW=y
CONFIG_HID_GENERIC=y
CONFIG_USB_HID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_DEFAULT_PERSIST=y
CONFIG_USB_OTG=y
CONFIG_USB_OTG_FSM=m
CONFIG_USB_MON=m
CONFIG_USB_XHCI_HCD=m
CONFIG_USB_XHCI_PCI=m
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_PCI=y
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_STORAGE=m
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_PHY=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_I5400=m
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_SYSTOHC=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
CONFIG_RTC_DRV_CMOS=m
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_HP_WMI=m
CONFIG_ACPI_WMI=m
CONFIG_CLKSRC_I8253=y
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_IOMMU_IOVA=y
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
CONFIG_RAS=y
CONFIG_DMIID=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
CONFIG_DCACHE_WORD_ACCESS=y
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_JBD=m
CONFIG_JBD2=y
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=m
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_BTRFS_ASSERT=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_QUOTACTL=y
CONFIG_FUSE_FS=m
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_NTFS_FS=m
CONFIG_NTFS_RW=y
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_HFSPLUS_FS_POSIX_ACL=y
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3=m
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_GRACE_PERIOD=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_CIFS=m
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_UTF8=m
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_FRAME_WARN=1024
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_FS=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
CONFIG_HAVE_ARCH_KMEMCHECK=y
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_STACKTRACE=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACING_SUPPORT=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_DEBUG_RODATA=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_OPTIMIZE_INLINING=y
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=m
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=m
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_SHA1=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_DES=m
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_DEVICE_ASSIGNMENT=y
CONFIG_RAID6_PQ=m
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_CRC16=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
CONFIG_CRC32_SLICEBY8=y
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_INTERVAL_TREE=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_FONT_SUPPORT=m
CONFIG_FONTS=y
CONFIG_FONT_8x16=y
CONFIG_FONT_AUTOSELECT=y
CONFIG_ARCH_HAS_SG_CHAIN=y

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Write throughput impaired by touching dirty_ratio
  2015-06-19 15:16 Write throughput impaired by touching dirty_ratio Mark Hills
@ 2015-06-24  8:27 ` Vlastimil Babka
  2015-06-24  9:16   ` Michal Hocko
  2015-06-24 22:26   ` Mark Hills
  0 siblings, 2 replies; 9+ messages in thread
From: Vlastimil Babka @ 2015-06-24  8:27 UTC (permalink / raw)
  To: Mark Hills; +Cc: linux-mm, Michal Hocko, Mel Gorman, Johannes Weiner, LKML

[add some CC's]

On 06/19/2015 05:16 PM, Mark Hills wrote:
> I noticed that any change to vm.dirty_ratio causes write throughput to 
> plummet -- to around 5Mbyte/sec.
> 
>   <system bootup, kernel 4.0.5>
> 
>   # dd if=/dev/zero of=/path/to/file bs=1M
> 
>   # sysctl vm.dirty_ratio
>   vm.dirty_ratio = 20
>   <all ok; writes at ~150Mbyte/sec>
> 
>   # sysctl vm.dirty_ratio=20
>   <all continues to be ok>
> 
>   # sysctl vm.dirty_ratio=21
>   <writes drop to ~5Mbyte/sec>
> 
>   # sysctl vm.dirty_ratio=20
>   <writes continue to be slow at ~5Mbyte/sec>
> 
> The test shows that return to the previous value does not restore the old 
> behaviour. I return the system to usable state with a reboot.
> 
> Reads continue to be fast and are not affected.
> 
> A quick look at the code suggests that writeback_set_ratelimit behaves 
> differently on startup, and that some of the calculations (eg. 
> global_dirty_limit) are badly behaved once the system has booted.

Hmm, so the only thing that dirty_ratio_handler() changes besides the
vm_dirty_ratio itself is ratelimit_pages, through writeback_set_ratelimit(). So
I assume the problem is with ratelimit_pages. There's num_online_cpus() used in
the calculation, which I think would differ between the initial system state
(where we are called by page_writeback_init()) and later when all CPUs are
onlined. But I don't see the CPU onlining code updating the limit (unlike memory
hotplug, which does), so that's suspicious.

Another suspicious thing is that global_dirty_limits() looks at the current
process's flags. It seems odd to me that the process calling the sysctl would
determine a value global to the system.

If you are brave enough (and have the kernel configured properly and with
debuginfo), you can verify how the value of the ratelimit_pages variable
changes on the live system, using the crash tool. Just start it, and if
everything works, you can inspect the live system. It's a bit complicated,
since there are two static variables called "ratelimit_pages" in the kernel,
so we can't print them easily (or I don't know how). First we have to get the
variable address:

crash> sym ratelimit_pages
ffffffff81e67200 (d) ratelimit_pages
ffffffff81ef4638 (d) ratelimit_pages

One will be absurdly high (probably less so on your 32-bit kernel), so it's not the one we want:

crash> rd -d ffffffff81ef4638 1
ffffffff81ef4638:    4294967328768

The second will have a smaller value:
(my system after boot with dirty ratio = 20)
crash> rd -d ffffffff81e67200 1
ffffffff81e67200:             1577

(after changing to 21)
crash> rd -d ffffffff81e67200 1
ffffffff81e67200:             1570

(after changing back to 20)
crash> rd -d ffffffff81e67200 1
ffffffff81e67200:             1496

So yes, it does differ, but not drastically. A difference between 1 and 8
online CPUs would look different, I think. So my theory above is questionable.
But you might try what it looks like on your system...

> 
> The system is an HP xw6600, running i686 kernel. This happens whether 
> internal SATA HDD, SSD or external USB drive is used. I first saw this on 
> kernel 4.0.4, and 4.0.5 is also affected.

So what was the last version where you did change the dirty ratio and it worked
fine?

> 
> It would surprise me if I'm the only person setting dirty_ratio.
> 
> Have others seen this behaviour? Thanks
> 



* Re: Write throughput impaired by touching dirty_ratio
  2015-06-24  8:27 ` Vlastimil Babka
@ 2015-06-24  9:16   ` Michal Hocko
  2015-06-24 22:26   ` Mark Hills
  1 sibling, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2015-06-24  9:16 UTC (permalink / raw)
  To: Mark Hills; +Cc: Vlastimil Babka, linux-mm, Mel Gorman, Johannes Weiner, LKML

On Wed 24-06-15 10:27:36, Vlastimil Babka wrote:
> [add some CC's]
> 
> On 06/19/2015 05:16 PM, Mark Hills wrote:
[...]
> > The system is an HP xw6600, running i686 kernel. This happens whether 

How many CPUs does the machine have?

> > internal SATA HDD, SSD or external USB drive is used. I first saw this on 
> > kernel 4.0.4, and 4.0.5 is also affected.

OK, so this is a 32-bit kernel, which might be the most important part. What
is the value of /proc/sys/vm/highmem_is_dirtyable? Also, how do your lowmem
vs. highmem numbers look when you are setting the ratio (cat /proc/zoneinfo)?

It seems Vlastimil is right and a bogus ratelimit_pages is calculated
and your writers are throttled every few pages.
-- 
Michal Hocko
SUSE Labs



* Re: Write throughput impaired by touching dirty_ratio
  2015-06-24  8:27 ` Vlastimil Babka
  2015-06-24  9:16   ` Michal Hocko
@ 2015-06-24 22:26   ` Mark Hills
  2015-06-25  9:20     ` Michal Hocko
  2015-06-25  9:30     ` Vlastimil Babka
  1 sibling, 2 replies; 9+ messages in thread
From: Mark Hills @ 2015-06-24 22:26 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Michal Hocko, Mel Gorman, Johannes Weiner, LKML

On Wed, 24 Jun 2015, Vlastimil Babka wrote:

> [add some CC's]
> 
> On 06/19/2015 05:16 PM, Mark Hills wrote:
> > I noticed that any change to vm.dirty_ratio causes write throughput to 
> > plummet -- to around 5Mbyte/sec.
> > 
> >   <system bootup, kernel 4.0.5>
> > 
> >   # dd if=/dev/zero of=/path/to/file bs=1M
> > 
> >   # sysctl vm.dirty_ratio
> >   vm.dirty_ratio = 20
> >   <all ok; writes at ~150Mbyte/sec>
> > 
> >   # sysctl vm.dirty_ratio=20
> >   <all continues to be ok>
> > 
> >   # sysctl vm.dirty_ratio=21
> >   <writes drop to ~5Mbyte/sec>
> > 
> >   # sysctl vm.dirty_ratio=20
> >   <writes continue to be slow at ~5Mbyte/sec>
> > 
> > The test shows that return to the previous value does not restore the old 
> > behaviour. I return the system to usable state with a reboot.
> > 
> > Reads continue to be fast and are not affected.
> > 
> > A quick look at the code suggests that writeback_set_ratelimit behaves 
> > differently on startup, and that some of the calculations (eg. 
> > global_dirty_limit) are badly behaved once the system has booted.
> 
> Hmm, so the only thing that dirty_ratio_handler() changes besides the
> vm_dirty_ratio itself is ratelimit_pages, through writeback_set_ratelimit(). So
> I assume the problem is with ratelimit_pages. There's num_online_cpus() used in
> the calculation, which I think would differ between the initial system state
> (where we are called by page_writeback_init()) and later when all CPUs are
> onlined. But I don't see the CPU onlining code updating the limit (unlike memory
> hotplug, which does), so that's suspicious.
> 
> Another suspicious thing is that global_dirty_limits() looks at the current
> process's flags. It seems odd to me that the process calling the sysctl would
> determine a value global to the system.

Yes, I also spotted this. The fragment of code is:

	tsk = current;
	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
		background += background / 4;
		dirty += dirty / 4;
	}

It seems to imply the code was not always used from the /proc interface. 
It's relevant in a moment...

> If you are brave enough (and have the kernel configured properly and with
> debuginfo),

I'm brave... :) I hadn't seen this tool before, thanks for introducing me 
to it, I will use it more now, I'm sure.

> you can verify how the value of the ratelimit_pages variable changes on the 
> live system, using the crash tool. Just start it, and if everything works, 
> you can inspect the live system. It's a bit complicated, since there are 
> two static variables called "ratelimit_pages" in the kernel, so we can't 
> print them easily (or I don't know how). First we have to get the 
> variable address:
> 
> crash> sym ratelimit_pages
> ffffffff81e67200 (d) ratelimit_pages
> ffffffff81ef4638 (d) ratelimit_pages
> 
> One will be absurdly high (probably less so on your 32-bit kernel), so it's not the one we want:
> 
> crash> rd -d ffffffff81ef4638 1
> ffffffff81ef4638:    4294967328768
> 
> The second will have a smaller value:
> (my system after boot with dirty ratio = 20)
> crash> rd -d ffffffff81e67200 1
> ffffffff81e67200:             1577
> 
> (after changing to 21)
> crash> rd -d ffffffff81e67200 1
> ffffffff81e67200:             1570
> 
> (after changing back to 20)
> crash> rd -d ffffffff81e67200 1
> ffffffff81e67200:             1496

In my case there's only one such symbol (perhaps because this kernel 
config is quite slimmed down?)

  crash> sym ratelimit_pages
  c148b618 (d) ratelimit_pages

  (bootup with dirty_ratio 20)
  crash> rd -d ratelimit_pages
  c148b618:            78 

  (after changing to 21)
  crash> rd -d ratelimit_pages
  c148b618:            16 

  (after changing back to 20)
  crash> rd -d ratelimit_pages
  c148b618:            16 

Compared to your system, even the bootup value seems pretty low.

So I am new to this code, but I took a look. Seems like we're basically 
hitting the lower bound of 16.

  void writeback_set_ratelimit(void)
  {
	unsigned long background_thresh;
	unsigned long dirty_thresh;
	global_dirty_limits(&background_thresh, &dirty_thresh);
	global_dirty_limit = dirty_thresh;
	ratelimit_pages = dirty_thresh / (num_online_cpus() * 32);
	if (ratelimit_pages < 16)
		ratelimit_pages = 16;
  }

From this code, we don't have dirty_thresh preserved, but we do have 
global_dirty_limit:

  crash> rd -d global_dirty_limit
  c1545080:             0 

And if that is zero then:

  ratelimit_pages = 0 / (num_online_cpus() * 32)
                  = 0

So it seems like this is the path to follow.

The function global_dirty_limits() produces the value for dirty_thresh 
and, aside from a potential multiply by 1.25 (the 'task dependent' case 
mentioned before), the value is derived as:

  if (vm_dirty_bytes)
	dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
  else
	dirty = (vm_dirty_ratio * available_memory) / 100;

I checked the vm_dirty_bytes codepath and that works:

  (vm.dirty_bytes = 1048576000, 1000Mb)
  crash> rd -d ratelimit_pages
  c148b618:           1000 

Therefore it's the 'else' case, and this points to available_memory being 
zero, or near it (in my case < 5). This value is the direct result of 
global_dirtyable_memory(), which I've annotated with some values:

  static unsigned long global_dirtyable_memory(void)
  {
	unsigned long x;

	x = global_page_state(NR_FREE_PAGES);      //   2648091
	x -= min(x, dirty_balance_reserve);        //  - 175522

	x += global_page_state(NR_INACTIVE_FILE);  //  + 156369
	x += global_page_state(NR_ACTIVE_FILE);    //  +   3475  = 2632413

	if (!vm_highmem_is_dirtyable)
		x -= highmem_dirtyable_memory(x);

	return x + 1;	/* Ensure that we never return 0 */
  }

If I'm correct here, the global count includes the highmem, and it implies 
that highmem_dirtyable_memory() is returning a value only slightly less 
than or equal to the sum of the others.

To test, I flipped the vm_highmem_is_dirtyable (which had no effect until 
I forced it to re-evaluate ratelimit_pages):

  $ echo 1 > /proc/sys/vm/highmem_is_dirtyable
  $ echo 21 > /proc/sys/vm/dirty_ratio
  $ echo 20 > /proc/sys/vm/dirty_ratio 

  crash> rd -d ratelimit_pages
  c148b618:          2186 

The value is now healthy, more so than even the value we started 
with on bootup.

My questions and observations are:

* What does highmem_is_dirtyable actually mean, and should it really 
  default to 1?

  Is it actually a misnomer? Since it's only used in 
  global_dirtyable_memory(), it doesn't actually prevent dirtying of 
  highmem; it just attempts to place a limit that corresponds to the 
  amount of non-highmem. I have limited understanding at the moment, but 
  actually preventing dirtying would be something different.

* The codepath for setting highmem_is_dirtyable from /proc is broken;
  it also needs to make a call to writeback_set_ratelimit()

* Even with highmem_is_dirtyable=1, there's still a sizeable difference 
  between the value on bootup (78) and the evaluation once booted (2186). 
  This goes the wrong direction and is far too big a difference to be 
  solely num_online_cpus() switching from 1 to 8.

The machine is 32-bit with 12GiB of RAM.

For info, I posted a typical zoneinfo, below.

> So yes, it does differ but not drastically. A difference between 1 and 8 
> online CPUs would look different, I think. So my theory above is 
> questionable. But you might try what it looks like on your system...
> 
> > 
> > The system is an HP xw6600, running i686 kernel. This happens whether 
> > internal SATA HDD, SSD or external USB drive is used. I first saw this on 
> > kernel 4.0.4, and 4.0.5 is also affected.
> 
> So what was the last version where you did change the dirty ratio and it worked
> fine?

Sorry, I don't know when it broke. I don't immediately have access to an 
old kernel to test, but I could do that if necessary.
 
> > It would surprise me if I'm the only person who was setting dirty_ratio.
> > 
> > Have others seen this behaviour? Thanks
> > 
> 

Thanks, I hope you find this useful.

-- 
Mark


Node 0, zone      DMA
  pages free     1566
        min      196
        low      245
        high     294
        scanned  0
        spanned  4095
        present  3989
        managed  3970
    nr_free_pages 1566
    nr_alloc_batch 49
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 163
    nr_active_file 1129
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    0
    nr_file_pages 1292
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 842
    nr_slab_unreclaimable 162
    nr_page_table_pages 17
    nr_kernel_stack 4
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   661
    nr_written   661
    nr_pages_scanned 0
    workingset_refault 0
    workingset_activate 0
    workingset_nodereclaim 0
    nr_anon_transparent_hugepages 0
    nr_free_cma  0
        protection: (0, 377, 12165, 12165)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 2
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 3
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 4
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 5
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 6
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 7
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
  all_unreclaimable: 0
  start_pfn:         1
  inactive_ratio:    1
Node 0, zone   Normal
  pages free     37336
        min      4789
        low      5986
        high     7183
        scanned  0
        spanned  123902
        present  123902
        managed  96773
    nr_free_pages 37336
    nr_alloc_batch 331
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 4016
    nr_active_file 26672
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    1
    nr_file_pages 30684
    nr_dirty     4
    nr_writeback 0
    nr_slab_reclaimable 19865
    nr_slab_unreclaimable 4673
    nr_page_table_pages 1027
    nr_kernel_stack 281
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   14354
    nr_written   21672
    nr_pages_scanned 0
    workingset_refault 0
    workingset_activate 0
    workingset_nodereclaim 0
    nr_anon_transparent_hugepages 0
    nr_free_cma  0
        protection: (0, 0, 94302, 94302)
  pagesets
    cpu: 0
              count: 78
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 1
              count: 140
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 2
              count: 116
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 3
              count: 100
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 4
              count: 70
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 5
              count: 82
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 6
              count: 144
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 7
              count: 59
              high:  186
              batch: 31
  vm stats threshold: 24
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    1
Node 0, zone  HighMem
  pages free     2536526
        min      128
        low      37501
        high     74874
        scanned  0
        spanned  3214338
        present  3017668
        managed  3017668
    nr_free_pages 2536526
    nr_alloc_batch 10793
    nr_inactive_anon 2118
    nr_active_anon 118021
    nr_inactive_file 80138
    nr_active_file 273523
    nr_unevictable 3475
    nr_mlock     3475
    nr_anon_pages 119672
    nr_mapped    48158
    nr_file_pages 357567
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     2766
    nr_dirtied   1882996
    nr_written   1695681
    nr_pages_scanned 0
    workingset_refault 0
    workingset_activate 0
    workingset_nodereclaim 0
    nr_anon_transparent_hugepages 151
    nr_free_cma  0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 171
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 1
              count: 80
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 2
              count: 91
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 3
              count: 173
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 4
              count: 114
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 5
              count: 159
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 6
              count: 130
              high:  186
              batch: 31
  vm stats threshold: 64
    cpu: 7
              count: 62
              high:  186
              batch: 31
  vm stats threshold: 64
  all_unreclaimable: 0
  start_pfn:         127998
  inactive_ratio:    10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Write throughput impaired by touching dirty_ratio
  2015-06-24 22:26   ` Mark Hills
@ 2015-06-25  9:20     ` Michal Hocko
  2015-06-25 12:56       ` Michal Hocko
  2015-06-25 21:45       ` Mark Hills
  2015-06-25  9:30     ` Vlastimil Babka
  1 sibling, 2 replies; 9+ messages in thread
From: Michal Hocko @ 2015-06-25  9:20 UTC (permalink / raw)
  To: Mark Hills; +Cc: Vlastimil Babka, linux-mm, Mel Gorman, Johannes Weiner, LKML

On Wed 24-06-15 23:26:49, Mark Hills wrote:
> On Wed, 24 Jun 2015, Vlastimil Babka wrote:
[...]
> > Another suspicious thing is that global_dirty_limits() looks at current
> > process's flag. It seems odd to me that the process calling the sysctl would
> > determine a value global to the system.
> 
> Yes, I also spotted this. The fragment of code is:
> 
>   	tsk = current;
> 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
> 		background += background / 4;
> 		dirty += dirty / 4;
> 	}

Yes, this might be confusing for the proc path, but it shouldn't be hit
there because PF_LESS_THROTTLE is currently only used from the nfs code
(to tell the throttling code not to throttle it because it is freeing
memory) and you usually do not set proc values from the RT context. So
this shouldn't matter.

[...]
>   crash> sym ratelimit_pages
>   c148b618 (d) ratelimit_pages
> 
>   (bootup with dirty_ratio 20)
>   crash> rd -d ratelimit_pages
>   c148b618:            78 
> 
>   (after changing to 21)
>   crash> rd -d ratelimit_pages
>   c148b618:            16 
> 
>   (after changing back to 20)
>   crash> rd -d ratelimit_pages
>   c148b618:            16 
> 
> Compared to your system, even the bootup value seems pretty low.
> 
> So I am new to this code, but I took a look. Seems like we're basically 
> hitting the lower bound of 16.

Yes, this is really low, and as suspected your writers are throttled every
few pages.

> 
>   void writeback_set_ratelimit(void)
>   {
> 	unsigned long background_thresh;
> 	unsigned long dirty_thresh;
> 	global_dirty_limits(&background_thresh, &dirty_thresh);
> 	global_dirty_limit = dirty_thresh;
> 	ratelimit_pages = dirty_thresh / (num_online_cpus() * 32);
> 	if (ratelimit_pages < 16)
> 		ratelimit_pages = 16;
>   }
> 
> From this code, we don't have dirty_thresh preserved, but we do have 
> global_dirty_limit:
> 
>   crash> rd -d global_dirty_limit
>   c1545080:             0 

This is really bad.

> And if that is zero then:
> 
>   ratelimit_pages = 0 / (num_online_cpus() * 32)
>                   = 0
> 
> So it seems like this is the path to follow.
> 
> The function global_dirty_limits() produces the value for dirty_thresh 
> and, aside from a potential +25% boost (the 'task dependent' adjustment 
> mentioned before), the value is derived as:
> 
>   if (vm_dirty_bytes)
> 	dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
>   else
> 	dirty = (vm_dirty_ratio * available_memory) / 100;
> 
> I checked the vm_dirty_bytes codepath and that works:
> 
>   (vm.dirty_bytes = 1048576000, 1000Mb)
>   crash> rd -d ratelimit_pages
>   c148b618:           1000 
> 
> Therefore it's the 'else' case, and this points to available_memory being 
> zero, or near it (in my case < 5).

OK, so it looks like you basically do not have any dirtyable memory,
which smells like a highmem issue.

> This value is the direct result of 
> global_dirtyable_memory(), which I've annotated with some values:
> 
>   static unsigned long global_dirtyable_memory(void)
>   {
> 	unsigned long x;
> 
> 	x = global_page_state(NR_FREE_PAGES);      //   2648091
> 	x -= min(x, dirty_balance_reserve);        //  - 175522
> 
> 	x += global_page_state(NR_INACTIVE_FILE);  //  + 156369
> 	x += global_page_state(NR_ACTIVE_FILE);    //  +   3475  = 2632413
> 
> 	if (!vm_highmem_is_dirtyable)
> 		x -= highmem_dirtyable_memory(x);
> 
> 	return x + 1;	/* Ensure that we never return 0 */
>   }
> 
> If I'm correct here, global includes the highmem stuff, and it implies 
> that highmem_dirtyable_memory() is returning a value only slightly less 
> than or equal to the sum of the others.

Exactly!

> To test, I flipped the vm_highmem_is_dirtyable (which had no effect until 
> I forced it to re-evaluate ratelimit_pages):
> 
>   $ echo 1 > /proc/sys/vm/highmem_is_dirtyable
>   $ echo 21 > /proc/sys/vm/dirty_ratio
>   $ echo 20 > /proc/sys/vm/dirty_ratio 
> 
>   crash> rd -d ratelimit_pages
>   c148b618:          2186 
> 
> The value is now healthy, more so than even the value we started 
> with on bootup.

From your /proc/zoneinfo:
> Node 0, zone  HighMem
>   pages free     2536526
>         min      128
>         low      37501
>         high     74874
>         scanned  0
>         spanned  3214338
>         present  3017668
>         managed  3017668

You have 11G of highmem, which is a lot wrt. the lowmem

> Node 0, zone   Normal
>   pages free     37336
>         min      4789
>         low      5986
>         high     7183
>         scanned  0
>         spanned  123902
>         present  123902
>         managed  96773

which is only 378M! So something had to eat a portion of the lowmem.
I think it is a bad idea to use a 32b kernel with that amount of memory
in general. The lowmem pressure is made even worse by the fact that
something is eating an already precious amount of lowmem. What is the
reason to stick with a 32b kernel anyway?

> My questions and observations are:
> 
> * What does highmem_is_dirtyable actually mean, and should it really 
>   default to 1?

It says whether highmem should be considered dirtyable. It is not by
default. See more for motivation in 195cf453d2c3 ("mm/page-writeback:
highmem_is_dirtyable option").

>   Is it actually a misnomer? Since it's only used in 
>   global_dirtyable_memory(), it doesn't actually prevent dirtying of 
>   highmem; it just attempts to place a limit that corresponds to the 
>   amount of non-highmem. I have limited understanding at the moment, but 
>   actually preventing dirtying would be something different.
> 
> * The codepath for setting highmem_is_dirtyable from /proc is broken;
>   it also needs to make a call to writeback_set_ratelimit()

That should probably be fixed.

> * Even with highmem_is_dirtyable=1, there's still a sizeable difference 
>   between the value on bootup (78) and the evaluation once booted (2186). 
>   This goes the wrong direction and is far too big a difference to be 
>   solely num_online_cpus() switching from 1 to 8.

I am not sure where the 78 came from, because the default value is 32 and
it is not set anywhere other than in writeback_set_ratelimit. At least it
looks like that from a quick code inspection. I am not an expert in
that area.

> The machine is 32-bit with 12GiB of RAM.

I think you should really consider a 64b kernel for such a machine. You
would suffer from lowmem pressure otherwise and I do not see a good
reason for that. If you depend on 32b userspace then it should run just
fine on top of a 64b kernel.

[...]
-- 
Michal Hocko
SUSE Labs


* Re: Write throughput impaired by touching dirty_ratio
  2015-06-24 22:26   ` Mark Hills
  2015-06-25  9:20     ` Michal Hocko
@ 2015-06-25  9:30     ` Vlastimil Babka
  1 sibling, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2015-06-25  9:30 UTC (permalink / raw)
  To: Mark Hills; +Cc: linux-mm, Michal Hocko, Mel Gorman, Johannes Weiner, LKML

On 06/25/2015 12:26 AM, Mark Hills wrote:
> On Wed, 24 Jun 2015, Vlastimil Babka wrote:
>
>> [add some CC's]
>>
>> On 06/19/2015 05:16 PM, Mark Hills wrote:
>>
>> Hmm, so the only thing that dirty_ratio_handler() changes except the
>> vm_dirty_ratio itself, is ratelimit_pages through writeback_set_ratelimit(). So
>> I assume the problem is with ratelimit_pages. There's num_online_cpus() used in
>> the calculation, which I think would differ between the initial system state
>> (where we are called by page_writeback_init()) and later when all CPU's are
>> onlined. But I don't see CPU onlining code updating the limit (unlike memory
>> hotplug which does that), so that's suspicious.
>>
>> Another suspicious thing is that global_dirty_limits() looks at current
>> process's flag. It seems odd to me that the process calling the sysctl would
>> determine a value global to the system.
>
> Yes, I also spotted this. The fragment of code is:
>
>    	tsk = current;
> 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
> 		background += background / 4;
> 		dirty += dirty / 4;
> 	}
>
> It seems to imply the code was not always used from the /proc interface.
> It's relevant in a moment...
>
>> If you are brave enough (and have kernel configured properly and with
>> debuginfo),
>
> I'm brave... :) I hadn't seen this tool before, thanks for introducing me
> to it, I will use it more now, I'm sure.

OK, I admit I didn't expect so much to come out of my suggestion. Good job :)

>> you can verify how value of ratelimit_pages variable changes on the live
>> system, using the crash tool. Just start it, and if everything works,
>> you can inspect the live system. It's a bit complicated since there are
>> two static variables called "ratelimit_pages" in the kernel so we can't
>> print them easily (or I don't know how). First we have to get the
>> variable address:
>>
>> crash> sym ratelimit_pages
>> ffffffff81e67200 (d) ratelimit_pages
>> ffffffff81ef4638 (d) ratelimit_pages
>>
>> One will be absurdly high (probably less on your 32bit) so it's not the one we want:
>>
>> crash> rd -d ffffffff81ef4638 1
>> ffffffff81ef4638:    4294967328768
>>
>> The second will have a smaller value:
>> (my system after boot with dirty ratio = 20)
>> crash> rd -d ffffffff81e67200 1
>> ffffffff81e67200:             1577
>>
>> (after changing to 21)
>> crash> rd -d ffffffff81e67200 1
>> ffffffff81e67200:             1570
>>
>> (after changing back to 20)
>> crash> rd -d ffffffff81e67200 1
>> ffffffff81e67200:             1496
>
> In my case there's only one such symbol (perhaps because this kernel
> config is quite slimmed down?)
>
>    crash> sym ratelimit_pages
>    c148b618 (d) ratelimit_pages
>
>    (bootup with dirty_ratio 20)
>    crash> rd -d ratelimit_pages
>    c148b618:            78

With just one symbol you can use
crash> p ratelimit_pages

This will take the type properly into account, while rd will print a full 
32-bit/64-bit word depending on your kernel, which might be larger than 
the actual variable. But if there are more symbols of the same name, "p" 
will somehow randomly pick one of them and not even warn about it.

[snip]

>>>
>>
>
> Thanks, I hope you find this useful.

Yes, thanks, nice analysis. Since Michal already replied and has more 
experience with the reclaim code and dirty throttling, I won't try 
adding more.


* Re: Write throughput impaired by touching dirty_ratio
  2015-06-25  9:20     ` Michal Hocko
@ 2015-06-25 12:56       ` Michal Hocko
  2015-06-25 21:45       ` Mark Hills
  1 sibling, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2015-06-25 12:56 UTC (permalink / raw)
  To: Mark Hills; +Cc: Vlastimil Babka, linux-mm, Mel Gorman, Johannes Weiner, LKML

On Thu 25-06-15 11:20:56, Michal Hocko wrote:
[...]
> From your /proc/zoneinfo:
> > Node 0, zone  HighMem
> >   pages free     2536526
> >         min      128
> >         low      37501
> >         high     74874
> >         scanned  0
> >         spanned  3214338
> >         present  3017668
> >         managed  3017668
> 
> You have 11G of highmem. Which is a lot wrt. the the lowmem
> 
> > Node 0, zone   Normal
> >   pages free     37336
> >         min      4789
> >         low      5986
> >         high     7183
> >         scanned  0
> >         spanned  123902
> >         present  123902
> >         managed  96773
> 
> which is only 378M! So something had to eat portion of the lowmem.

And just to clarify. Your lowmem has only 123902 pages (+ the DMA zone,
which has 16M so it doesn't add much), which is ~480M. The lowmem can
sit only in the low 1G (actually less, because part of that is used by
the kernel for special mappings). You only have half of that, presumably
because some HW has reserved a portion of that address range. So your
lowmem zone is really tiny. Now, part of that range is used for kernel
data like struct pages, which have to describe the full memory, and this
eats quite a lot for 3 million pages. So you ended up with only 378M
really usable for all the kernel allocations which cannot live in
highmem (and there are many of those). This puts large memory pressure
on that zone even though you might have a huge amount of highmem free.
This is the primary reason why PAE kernels are not really usable for
large memory setups in general. Very specific usecases might work, but
even then I would need a very strong reason to stick with a 32b kernel
(e.g. a stupid out-of-tree driver which is 32b specific or something
similar).
-- 
Michal Hocko
SUSE Labs


* Re: Write throughput impaired by touching dirty_ratio
  2015-06-25  9:20     ` Michal Hocko
  2015-06-25 12:56       ` Michal Hocko
@ 2015-06-25 21:45       ` Mark Hills
  2015-07-01 15:40         ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Hills @ 2015-06-25 21:45 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm, Mel Gorman, Johannes Weiner, LKML

On Thu, 25 Jun 2015, Michal Hocko wrote:

> On Wed 24-06-15 23:26:49, Mark Hills wrote:
> [...]
> > To test, I flipped the vm_highmem_is_dirtyable (which had no effect until 
> > I forced it to re-evaluate ratelimit_pages):
> > 
> >   $ echo 1 > /proc/sys/vm/highmem_is_dirtyable
> >   $ echo 21 > /proc/sys/vm/dirty_ratio
> >   $ echo 20 > /proc/sys/vm/dirty_ratio 
> > 
> >   crash> rd -d ratelimit_pages
> >   c148b618:          2186 
> > 
> > The value is now healthy, more so than even the value we started 
> > with on bootup.
> 
> From your /proc/zoneinfo:
> > Node 0, zone  HighMem
> >   pages free     2536526
> >         min      128
> >         low      37501
> >         high     74874
> >         scanned  0
> >         spanned  3214338
> >         present  3017668
> >         managed  3017668
> 
> You have 11G of highmem, which is a lot wrt. the lowmem
> 
> > Node 0, zone   Normal
> >   pages free     37336
> >         min      4789
> >         low      5986
> >         high     7183
> >         scanned  0
> >         spanned  123902
> >         present  123902
> >         managed  96773
> 
> which is only 378M! So something had to eat a portion of the lowmem.
> I think it is a bad idea to use 32b kernel with that amount of memory in
> general. The lowmem pressure is even worse by the fact that something is
> eating already precious amount of lowmem.

Yup, that's the "vmalloc=512M" kernel parameter.

That was a requirement for my NVidia GPU to work, but now I have an AMD 
card so I have been able to remove that. It now gives me ~730M, and 
provided some relief to ratelimit_pages, which is now at 63 (when 
dirty_ratio is set to 20 after boot).

> What is the reason to stick with 32b kernel anyway?

Because it's ideal for finding edge cases and bugs in kernels :-)

The real reason is more practical. I never had a problem with the 32-bit 
one, and as my OS is quite home-grown and evolved over 10+ years, I 
haven't wanted to start again or reinstall.

This is the first time I've been aware of any problem or notable 
performance impact -- the PAE kernel has worked very well for me.

The only reason I have so much RAM is that RAM is cheap, and it's a great 
disk cache. I'd be more likely to remove some of the RAM than reinstall!

Perhaps someone could kindly explain why I don't have the same problem if 
I have, say, 1.5G of RAM? Is it because the page table for 12G is large 
and sits in the lowmem?

> > My questions and observations are:
> > 
> > * What does highmem_is_dirtyable actually mean, and should it really 
> >   default to 1?
> 
> It says whether highmem should be considered dirtyable. It is not by
> default. See more for motivation in 195cf453d2c3 ("mm/page-writeback:
> highmem_is_dirtyable option").

Thank you, this explanation is useful.

I know very little about the constraints on highmem and lowmem, though I 
can make an educated guess (and reading http://linux-mm.org/HighMemory)

I do have some questions though, perhaps if someone would be happy to 
explain.

What is the "excessive scanning" mentioned in that patch, and why is it 
any more than I would expect a 64-bit kernel to be doing? i.e. what is the 
practical downside of me doing:

  $ echo 1073741824 > /proc/sys/vm/dirty_bytes

Also, is VMSPLIT_2G likely to be appropriate here if the kernel is 
managing larger amounts of total RAM? I enabled it and it increases the 
lowmem. Is this a simple tradeoff I am making now between user and kernel 
space?

I'm not trying to sit in the dark ages, but the bad I/O throttling is the 
only real problem I have suffered by staying 32-bit, and a small tweak has 
restored sanity. So it's reasonable to question the logic that is in use.

For example, if we're saying that ratelimit_pages truly depends on free 
lowmem, then surely it needs to be periodically re-evaluated as the 
system is put to use? Setting 'dirty_ratio' implies that it's a ratio of a 
fixed, unchanging value.

Many thanks

-- 
Mark


* Re: Write throughput impaired by touching dirty_ratio
  2015-06-25 21:45       ` Mark Hills
@ 2015-07-01 15:40         ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2015-07-01 15:40 UTC (permalink / raw)
  To: Mark Hills; +Cc: Vlastimil Babka, linux-mm, Mel Gorman, Johannes Weiner, LKML

On Thu 25-06-15 22:45:57, Mark Hills wrote:
> On Thu, 25 Jun 2015, Michal Hocko wrote:
> 
> > On Wed 24-06-15 23:26:49, Mark Hills wrote:
> > [...]
> > > To test, I flipped the vm_highmem_is_dirtyable (which had no effect until 
> > > I forced it to re-evaluate ratelimit_pages):
> > > 
> > >   $ echo 1 > /proc/sys/vm/highmem_is_dirtyable
> > >   $ echo 21 > /proc/sys/vm/dirty_ratio
> > >   $ echo 20 > /proc/sys/vm/dirty_ratio 
> > > 
> > >   crash> rd -d ratelimit_pages
> > >   c148b618:          2186 
> > > 
> > > The value is now healthy, more so than even the value we started 
> > > with on bootup.
> > 
> > From your /proc/zoneinfo:
> > > Node 0, zone  HighMem
> > >   pages free     2536526
> > >         min      128
> > >         low      37501
> > >         high     74874
> > >         scanned  0
> > >         spanned  3214338
> > >         present  3017668
> > >         managed  3017668
> > 
> > You have 11G of highmem, which is a lot wrt. the lowmem
> > 
> > > Node 0, zone   Normal
> > >   pages free     37336
> > >         min      4789
> > >         low      5986
> > >         high     7183
> > >         scanned  0
> > >         spanned  123902
> > >         present  123902
> > >         managed  96773
> > 
> > which is only 378M! So something had to eat a portion of the lowmem.
> > I think it is a bad idea to use 32b kernel with that amount of memory in
> > general. The lowmem pressure is even worse by the fact that something is
> > eating already precious amount of lowmem.
> 
> Yup, that's the "vmalloc=512M" kernel parameter.

I see.

> That was a requirement for my NVidia GPU to work, but now I have an AMD 
> card so I have been able to remove that. It now gives me ~730M, and 
> provided some relieve to ratelimit_pages; now at 63 (when dirty_ratio is 
> set to 20 after boot)
> 
> > What is the reason to stick with 32b kernel anyway?
> 
> Because it's ideal for finding edge cases and bugs in kernels :-)

OK, then good luck ;)

> The real reason is more practical. I never had a problem with the 32-bit 
> one, and as my OS is quite home-grown and evolved over 10+ years, I 
> haven't wanted to start again or reinstall.

I can understand that. I was using a PAE kernel for ages as well, even
though I was aware of all the problems. It wasn't such a big deal
because I didn't have much more than 4G on my machines. But it simply
stopped being practical and I have moved on.

> This is the first time I've been aware of any problem or notable 
> performance impact -- the PAE kernel has worked very well for me.
> 
> The only reason I have so much RAM is that RAM is cheap, and it's a great 
> disk cache. I'd be more likely to remove some of the RAM than reinstall!

Well, you do not have to reinstall the whole system. You should be able
to install a 64b kernel only.
 
> Perhaps someone could kindly explain why don't I have the same problem if 
> I have, say 1.5G of RAM? Is it because the page table for 12G is large and 
> sits in the lowmem?

I've tried to explain some of the issues in the other email. Some of the
problems (e.g. the performance cost of mapping each highmem page when
the kernel wants to access it) do not depend on the amount of memory,
but some of them do (e.g. struct pages, which scale with the amount of
memory).

> > > My questions and observations are:
> > > 
> > > * What does highmem_is_dirtyable actually mean, and should it really 
> > >   default to 1?
> > 
> > It says whether highmem should be considered dirtyable. It is not by
> > default. See more for motivation in 195cf453d2c3 ("mm/page-writeback:
> > highmem_is_dirtyable option").
> 
> Thank you, this explanation is useful.
> 
> I know very little about the constraints on highmem and lowmem, though I 
> can make an educated guess (and reading http://linux-mm.org/HighMemory)
> 
> I do have some questions though, perhaps if someone would be happy to 
> explain.
> 
> What is the "excessive scanning" mentioned in that patch, and why it is 
> any more than I would expect a 64-bit kernel to be doing?

This is a good question! It wasn't obvious to me either, so I took my
pickaxe and a shovel and dug into the history.
The highmem was removed from the dirty throttling code back in
2005 by Andrea and Rik (https://lkml.org/lkml/2004/12/20/111) because
some mappings couldn't use highmem (e.g. dd of=block_device) and so
they didn't get throttled properly, which put huge memory pressure on
lowmem and could even trigger the OOM killer. The code still
considered highmem dirtyable for highmem-capable mappings, but that
was later removed by Linus because it caused other problems
(http://marc.info/?l=git-commits-head&m=117013324728709).

> ie. what is the practical downside of me doing:
> 
>   $ echo 1073741824 > /proc/sys/vm/dirty_bytes

You could end up having the full lowmem dirty for lowmem-only mappings.

> Also, is VMSPLIT_2G likely to be appropriate here if the kernel is 
> managing larger amounts of total RAM? I enabled it and it increases the 
> lowmem. Is this a simple tradeoff I am making now between user and kernel 
> space?

Your userspace will get only 2G of address space. If this is sufficient
for you then it will help with your lowmem pressure.
-- 
Michal Hocko
SUSE Labs


end of thread, other threads:[~2015-07-01 15:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-19 15:16 Write throughput impaired by touching dirty_ratio Mark Hills
2015-06-24  8:27 ` Vlastimil Babka
2015-06-24  9:16   ` Michal Hocko
2015-06-24 22:26   ` Mark Hills
2015-06-25  9:20     ` Michal Hocko
2015-06-25 12:56       ` Michal Hocko
2015-06-25 21:45       ` Mark Hills
2015-07-01 15:40         ` Michal Hocko
2015-06-25  9:30     ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox