List of changes in kernel 6.6.18

ALSA: hda/conexant: Add quirk for SWS JS201D

Author: bo liu <bo.liu@senarytech.com>
Date:   Mon Feb 5 09:38:02 2024 +0800

    ALSA: hda/conexant: Add quirk for SWS JS201D
    
    commit 4639c5021029d49fd2f97fa8d74731f167f98919 upstream.
    
    The SWS JS201D needs a different pinconfig from the Windows driver.
    Add a quirk to use a specific pinconfig for the SWS JS201D.
    
    Signed-off-by: bo liu <bo.liu@senarytech.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240205013802.51907-1-bo.liu@senarytech.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/cs35l56: select intended config FW_CS_DSP

Author: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Date:   Fri Feb 9 09:20:44 2024 +0100

    ALSA: hda/cs35l56: select intended config FW_CS_DSP
    
    [ Upstream commit e5aa6d51a2ef8c7ef7e3fe76bebe530fb68e7f08 ]
    
    Commit 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic
    CS35L56 amplifier") adds configs SND_HDA_SCODEC_CS35L56_{I2C,SPI},
    which selects the non-existing config CS_DSP. Note the renaming in
    commit d7cfdf17cb9d ("firmware: cs_dsp: Rename KConfig symbol CS_DSP ->
    FW_CS_DSP"), though.
    
    Select the intended config FW_CS_DSP.
    
    This broken select command probably was not noticed as the configs also
    select SND_HDA_CS_DSP_CONTROLS and this then selects FW_CS_DSP. So, the
    select FW_CS_DSP could actually be dropped, but we will keep this
    redundancy in place as the author originally also intended to have this
    redundancy of selects in place.
    
    Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
    Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
    Reviewed-by: Simon Trimmer <simont@opensource.cirrus.com>
    Link: https://lore.kernel.org/r/20240209082044.3981-1-lukas.bulwahn@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/cs8409: Suppress vmaster control for Dolphin models

Author: Vitaly Rodionov <vitalyr@opensource.cirrus.com>
Date:   Mon Jan 22 18:47:10 2024 +0000

    ALSA: hda/cs8409: Suppress vmaster control for Dolphin models
    
    commit a2ed0a44d637ef9deca595054c206da7d6cbdcbc upstream.
    
    A customer has reported an issue with a specific desktop platform
    where two CS42L42 codecs are connected to a CS8409 HDA bridge.
    If "Master Volume Control" is created, then on Ubuntu OS with UCM
    the left/right balance slider in the UI audio settings has no effect.
    This patch fixes the issue for the target platform.
    
    Fixes: 20e507724113 ("ALSA: hda/cs8409: Add support for dolphin")
    Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240122184710.5802-1-vitalyr@opensource.cirrus.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek - Add speaker pin verbtable for Dell dual speaker platform

Author: Kailang Yang <kailang@realtek.com>
Date:   Wed Jan 24 14:21:47 2024 +0800

    ALSA: hda/realtek - Add speaker pin verbtable for Dell dual speaker platform
    
    commit fcfc9f711d1e2fc7876ac12b1b16c509404b9625 upstream.
    
    On the platform with SSID 0x0c0d, the speaker is not muted when
    headphones are plugged in. This patch adds a quirk to fill the speaker
    pin verbtable and disables speaker passthrough.
    
    Signed-off-by: Kailang Yang <kailang@realtek.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/38b82976a875451d833d514cee34ff6a@realtek.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: add IDs for Dell dual spk platform

Author: Shuming Fan <shumingf@realtek.com>
Date:   Mon Feb 5 15:22:52 2024 +0800

    ALSA: hda/realtek: add IDs for Dell dual spk platform
    
    commit fddab35fd064414c677e9488c4fb3a1f67725d37 upstream.
    
    This patch adds another two IDs for the Dell dual speaker platform.
    
    Signed-off-by: Shuming Fan <shumingf@realtek.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240205072252.3791500-1-shumingf@realtek.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Apply headset jack quirk for non-bass alc287 thinkpads

Author: José Relvas <josemonsantorelvas@gmail.com>
Date:   Wed Jan 31 11:34:09 2024 +0000

    ALSA: hda/realtek: Apply headset jack quirk for non-bass alc287 thinkpads
    
    commit 2468e8922d2f6da81a6192b73023eff67e3fefdd upstream.
    
    There currently exist two thinkpad headset jack fixups:
    ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK
    ALC285_FIXUP_THINKPAD_HEADSET_JACK
    
    The latter is applied to alc285 and alc287 thinkpads which contain
    bass speakers.
    However, the former was only being applied to alc285 thinkpads,
    leaving non-bass alc287 thinkpads with no headset button controls.
    This patch fixes that by adding ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK
    to the alc287 chains, allowing the detection of headset buttons.
    
    Signed-off-by: José Relvas <josemonsantorelvas@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240131113407.34698-3-josemonsantorelvas@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL

Author: Edson Juliano Drosdeck <edson.drosdeck@gmail.com>
Date:   Thu Feb 1 09:21:14 2024 -0300

    ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
    
    commit c7de2d9bb68a5fc71c25ff96705a80a76c8436eb upstream.
    
    Vaio VJFE-ADL is equipped with ALC269VC, and it needs
    ALC298_FIXUP_SPK_VOLUME quirk to make its headset mic work.
    
    Signed-off-by: Edson Juliano Drosdeck <edson.drosdeck@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240201122114.30080-1-edson.drosdeck@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Enable Mute LED on HP Laptop 14-fq0xxx

Author: Luka Guzenko <l.guzenko@web.de>
Date:   Sun Jan 28 16:57:04 2024 +0100

    ALSA: hda/realtek: Enable Mute LED on HP Laptop 14-fq0xxx
    
    commit f0d78972f27dc1d1d51fbace2713ad3cdc60a877 upstream.
    
    This HP Laptop uses ALC236 codec with COEF 0x07 controlling the
    mute LED. Enable existing quirk for this device.
    
    Signed-off-by: Luka Guzenko <l.guzenko@web.de>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240128155704.2333812-1-l.guzenko@web.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: fix mute/micmute LED For HP mt645

Author: Eniac Zhang <eniac-xw.zhang@hp.com>
Date:   Thu Feb 15 15:49:22 2024 +0000

    ALSA: hda/realtek: fix mute/micmute LED For HP mt645
    
    commit 32f03f4002c5df837fb920eb23fcd2f4af9b0b23 upstream.
    
    The HP mt645 G7 Thin Client uses an ALC236 codec and needs the
    ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make the mute and
    micmute LEDs work.
    
    There are two variants of the USB-C PD chip on this device. Each uses
    a different BIOS and board ID, hence the two entries.
    
    Signed-off-by: Eniac Zhang <eniac-xw.zhang@hp.com>
    Signed-off-by: Alexandru Gagniuc <alexandru.gagniuc@hp.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240215154922.778394-1-alexandru.gagniuc@hp.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power

Author: Andy Chi <andy.chi@canonical.com>
Date:   Mon Jan 22 15:48:24 2024 +0800

    ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power
    
    commit 1513664f340289cf10402753110f3cff12a738aa upstream.
    
    The HP ZBook Power uses an ALC236 codec, which uses 0x02 to
    control the mute LED and 0x01 to control the micmute LED.
    Therefore, add a quirk to make it work.
    
    Signed-off-by: Andy Chi <andy.chi@canonical.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240122074826.1020964-1-andy.chi@canonical.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Fix the external mic not being recognised for Acer Swift 1 SF114-32

Author: David Senoner <seda18@rolmail.net>
Date:   Fri Jan 26 16:56:26 2024 +0100

    ALSA: hda/realtek: Fix the external mic not being recognised for Acer Swift 1 SF114-32
    
    commit efb56d84dd9c3de3c99fc396abb57c6d330038b5 upstream.
    
    If you connect an external headset/microphone to the 3.5mm jack on the
    Acer Swift 1 SF114-32 it does not recognize the microphone. This fixes
    that and gives the user the ability to choose between internal and
    headset mic.
    
    Signed-off-by: David Senoner <seda18@rolmail.net>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240126155626.2304465-1-seda18@rolmail.net
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64/signal: Don't assume that TIF_SVE means we saved SVE state

Author: Mark Brown <broonie@kernel.org>
Date:   Tue Jan 30 15:43:53 2024 +0000

    arm64/signal: Don't assume that TIF_SVE means we saved SVE state
    
    commit 61da7c8e2a602f66be578cbbcebe8638c10e0f48 upstream.
    
    When we are in a syscall we will only save the FPSIMD subset even though
    the task still has access to the full register set, and on context switch
    we will only remove TIF_SVE when loading the register state. This means
    that the signal handling code should not assume that TIF_SVE means that
    the register state is stored in SVE format, it should instead check the
    format that was recorded during save.
    
    Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240130-arm64-sve-signal-regs-v2-1-9fc6f9502782@kernel.org
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata

Author: Easwar Hariharan <eahariha@linux.microsoft.com>
Date:   Wed Feb 14 17:55:18 2024 +0000

    arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata
    
    commit fb091ff394792c018527b3211bbdfae93ea4ac02 upstream.
    
    Add the MIDR value of Microsoft Azure Cobalt 100, which is a Microsoft
    implemented CPU based on r0p0 of the ARM Neoverse N2 CPU, and therefore
    suffers from all the same errata.
    
    CC: stable@vger.kernel.org # 5.15+
    Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Acked-by: Mark Rutland <mark.rutland@arm.com>
    Acked-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
    Link: https://lore.kernel.org/r/20240214175522.2457857-1-eahariha@linux.microsoft.com
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: amd: yc: Add DMI quirk for Lenovo Ideapad Pro 5 16ARP8

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Mon Feb 5 15:48:53 2024 -0600

    ASoC: amd: yc: Add DMI quirk for Lenovo Ideapad Pro 5 16ARP8
    
    commit 610010737f74482a61896596a0116876ecf9e65c upstream.
    
    The laptop requires a quirk ID to enable its internal microphone. Add
    it to the DMI quirk table.
    
    Reported-by: Stanislav Petrov <stanislav.i.petrov@gmail.com>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216925
    Cc: stable@vger.kernel.org
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Link: https://lore.kernel.org/r/20240205214853.2689-1-mario.limonciello@amd.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: amd: yc: Add DMI quirk for MSI Bravo 15 C7VF

Author: Techno Mooney <techno.mooney@gmail.com>
Date:   Mon Jan 29 15:11:47 2024 +0700

    ASoC: amd: yc: Add DMI quirk for MSI Bravo 15 C7VF
    
    commit c6dce23ec993f7da7790a9eadb36864ceb60e942 upstream.
    
    The laptop requires a quirk ID to enable its internal microphone. Add
    it to the DMI quirk table.
    
    Reported-by: Techno Mooney <techno.mooney@gmail.com>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218402
    Cc: stable@vger.kernel.org
    Signed-off-by: Techno Mooney <techno.mooney@gmail.com>
    Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Link: https://msgid.link/r/20240129081148.1044891-1-bagasdotme@gmail.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: codecs: wcd938x: handle deferred probe

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Wed Jan 17 16:12:06 2024 +0100

    ASoC: codecs: wcd938x: handle deferred probe
    
    commit 086df711d9b886194481b4fbe525eb43e9ae7403 upstream.
    
    WCD938x sound codec driver ignores return status of getting regulators
    and returns EINVAL instead of EPROBE_DEFER.  If the regulator provider
    probes after the codec, the system is left without probed audio:
    
      wcd938x_codec audio-codec: wcd938x_probe: Fail to obtain platform data
      wcd938x_codec: probe of audio-codec failed with error -22
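A sketch of the error-propagation pattern the fix relies on, assuming hypothetical `get_supplies()` and `probe_*()` helpers. `EPROBE_DEFER` is a kernel-internal errno (517), so it is redefined here to keep the example buildable in user space; in the driver the real regulator API supplies the code.

```c
#include <errno.h>

/* Kernel-internal errno (include/linux/errno.h), mirrored for user space. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for obtaining the codec's regulators. */
static int get_supplies(int provider_ready)
{
	return provider_ready ? 0 : -EPROBE_DEFER;
}

/* Old pattern: the real status is discarded and replaced by -EINVAL. */
static int probe_buggy(int provider_ready)
{
	if (get_supplies(provider_ready))
		return -EINVAL;
	return 0;
}

/* Fixed pattern: propagate the status so the driver core retries later. */
static int probe_fixed(int provider_ready)
{
	int ret = get_supplies(provider_ready);

	if (ret)
		return ret;
	return 0;
}
```

Returning `-EINVAL` makes the failure permanent, while `-EPROBE_DEFER` tells the driver core to try the probe again once more providers have registered.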
    
    Fixes: 16572522aece ("ASoC: codecs: wcd938x-sdw: add SoundWire driver")
    Cc:  <stable@vger.kernel.org>
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Link: https://msgid.link/r/20240117151208.1219755-1-krzysztof.kozlowski@linaro.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()

Author: Alexey Khoroshilov <khoroshilov@ispras.ru>
Date:   Sun Feb 11 12:58:34 2024 +0300

    ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
    
    [ Upstream commit 6ef5d5b92f7117b324efaac72b3db27ae8bb3082 ]
    
    There is a path in rt5645_jack_detect_work(), where rt5645->jd_mutex
    is left locked forever. That may lead to deadlock
    when rt5645_jack_detect_work() is called for the second time.
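A minimal sketch of the single-unlock-path pattern that avoids this kind of bug, using POSIX threads in place of the kernel mutex API. The `codec_ready` condition and `struct jd_state` are hypothetical; the driver's actual early-exit path is not reproduced here.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical driver state; only the mutex mirrors rt5645->jd_mutex. */
struct jd_state {
	pthread_mutex_t jd_mutex;
	int reports;
};

/* Every exit after taking the lock goes through one unlock label. */
static void jack_detect_work(struct jd_state *s, bool codec_ready)
{
	pthread_mutex_lock(&s->jd_mutex);

	if (!codec_ready)
		goto out_unlock; /* returning here directly would leak the lock */

	s->reports++;

out_unlock:
	pthread_mutex_unlock(&s->jd_mutex);
}
```

With the lock released on every path, a second invocation of the work function can acquire it again instead of blocking forever.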
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Fixes: cdba4301adda ("ASoC: rt5650: add mutex to avoid the jack detection failure")
    Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
    Link: https://lore.kernel.org/r/1707645514-21196-1-git-send-email-khoroshilov@ispras.ru
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: SOF: ipc3-topology: Fix pipeline tear down logic

Author: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Date:   Thu Feb 8 15:34:32 2024 +0200

    ASoC: SOF: ipc3-topology: Fix pipeline tear down logic
    
    [ Upstream commit d7332c4a4f1a7d16f054c6357fb65c597b6a86a7 ]
    
    With the change in the widget free logic to power down the cores only
    when the scheduler widgets are freed, we need to ensure that the
    scheduler widget is freed only after all the widgets associated with the
    scheduler are freed. This is to ensure that the secondary core that the
    scheduler is scheduled to run on stays powered on for as long as any
    widget that needs it is in use. While this works well for dynamic pipelines,
    in the case of static pipelines the current logic does not take this into
    account and frees all widgets in the order they occur in the
    widget_list. So, modify this to ensure that the scheduler widgets are freed
    only after all other types of widgets in the widget_list are freed.
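The ordering rule can be sketched as a two-pass walk over the widget list, assuming a hypothetical `struct widget` that only records whether it is a scheduler widget. This illustrates the intended order, not the SOF driver's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical widget model; only the scheduler/non-scheduler split matters. */
struct widget {
	const char *name;
	bool is_scheduler;
};

/*
 * Compute a teardown order in two passes: all non-scheduler widgets first,
 * then the scheduler widgets, so a scheduler (and the core it keeps powered)
 * outlives the widgets that depend on it. Writes the order into out[] and
 * returns the number of entries written.
 */
static size_t teardown_order(const struct widget *list, size_t n,
			     const struct widget **out)
{
	size_t k = 0;

	for (int pass = 0; pass < 2; pass++) {
		bool want_scheduler = (pass == 1);

		for (size_t i = 0; i < n; i++)
			if (list[i].is_scheduler == want_scheduler)
				out[k++] = &list[i];
	}
	return k;
}
```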
    
    Link: https://github.com/thesofproject/linux/issues/4807
    Fixes: 31ed8da1c8e5 ("ASoC: SOF: sof-audio: Modify logic for enabling/disabling topology cores")
    Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
    Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Link: https://lore.kernel.org/r/20240208133432.1688-1-peter.ujfalusi@linux.intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: SOF: IPC3: fix message bounds on ipc ops

Author: Curtis Malainey <cujomalainey@chromium.org>
Date:   Tue Feb 13 14:38:34 2024 +0200

    ASoC: SOF: IPC3: fix message bounds on ipc ops
    
    commit fcbe4873089c84da641df75cda9cac2e9addbb4b upstream.
    
    commit 74ad8ed65121 ("ASoC: SOF: ipc3: Implement rx_msg IPC ops")
    introduced a new allocation before the upper bounds check in
    do_rx_work. As a result, a DSP spewing garbage can cause bad
    allocations.
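A sketch of the corrected ordering, validating the DSP-supplied size before allocating. `MAX_RX_MSG_SIZE` and `alloc_rx_msg()` are hypothetical; the driver checks against its own maximum payload size.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit standing in for the maximum IPC payload size. */
#define MAX_RX_MSG_SIZE 384u

/* Returns a buffer for the incoming message, or NULL if the size is bogus. */
static void *alloc_rx_msg(uint32_t hdr_size)
{
	/* Bounds-check the untrusted size before it reaches the allocator. */
	if (hdr_size == 0 || hdr_size > MAX_RX_MSG_SIZE)
		return NULL;
	return malloc(hdr_size);
}
```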
    
    Fixes: 74ad8ed65121 ("ASoC: SOF: ipc3: Implement rx_msg IPC ops")
    Reported-by: Tim Van Patten <timvp@google.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Curtis Malainey <cujomalainey@chromium.org>
    Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
    Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Link: https://msgid.link/r/20240213123834.4827-1-peter.ujfalusi@linux.intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: tas2781: add module parameter to tascodec_init()

Author: Gergo Koteles <soyer@irl.hu>
Date:   Sun Feb 4 21:01:17 2024 +0100

    ASoC: tas2781: add module parameter to tascodec_init()
    
    commit 34a1066981a967eab619938e7b35a9be6b4c34e1 upstream.
    
    The tascodec_init() of the snd-soc-tas2781-comlib module is called from
    snd-soc-tas2781-i2c and snd-hda-scodec-tas2781-i2c modules. It calls
    request_firmware_nowait() with parameter THIS_MODULE and a cont/callback
    from the latter modules.
    
    The latter modules can be removed while their callbacks are running,
    resulting in a general protection failure.
    
    Add module parameter to tascodec_init() so request_firmware_nowait() can
    be called with the module of the callback.
    
    Fixes: ef3bcde75d06 ("ASoC: tas2781: Add tas2781 driver")
    CC: stable@vger.kernel.org
    Signed-off-by: Gergo Koteles <soyer@irl.hu>
    Link: https://lore.kernel.org/r/118dad922cef50525e5aab09badef2fa0eb796e5.1707076603.git.soyer@irl.hu
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

binder: signal epoll threads of self-work

Author: Carlos Llamas <cmllamas@google.com>
Date:   Wed Jan 31 21:53:46 2024 +0000

    binder: signal epoll threads of self-work
    
    commit 97830f3c3088638ff90b20dfba2eb4d487bf14d7 upstream.
    
    In (e)poll mode, threads often depend on I/O events to determine when
    data is ready for consumption. Within binder, a thread may initiate a
    command via BINDER_WRITE_READ without a read buffer and then make use
    of epoll_wait() or similar to consume any responses afterwards.
    
    It is then crucial that epoll threads are signaled via wakeup when they
    queue their own work. Otherwise, they risk waiting indefinitely for an
    event leaving their work unhandled. What is worse, subsequent commands
    won't trigger a wakeup either as the thread has pending work.
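A simplified model of the missing wakeup, assuming a hypothetical `struct thread_model`; the `fixed` flag toggles between the old and the new behaviour. The exact condition used by the binder patch may be narrower than shown.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant state of a binder thread. */
struct thread_model {
	bool looper_poll; /* thread waits via (e)poll, not a blocking read */
	int todo;         /* pending work items */
	int wakeups;      /* wakeups delivered to its wait queue */
};

/* Queue work on the calling thread's own todo list. */
static void enqueue_self_work(struct thread_model *t, bool fixed)
{
	t->todo++;
	/*
	 * A poll-mode thread only notices new work through a wakeup;
	 * without one, epoll_wait() may block indefinitely.
	 */
	if (fixed && t->looper_poll)
		t->wakeups++;
}
```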
    
    Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
    Cc: Arve Hjønnevåg <arve@android.com>
    Cc: Martijn Coenen <maco@android.com>
    Cc: Alice Ryhl <aliceryhl@google.com>
    Cc: Steven Moreland <smoreland@google.com>
    Cc: stable@vger.kernel.org # v4.19+
    Signed-off-by: Carlos Llamas <cmllamas@google.com>
    Link: https://lore.kernel.org/r/20240131215347.1808751-1-cmllamas@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

blk-wbt: Fix detection of dirty-throttled tasks

Author: Jan Kara <jack@suse.cz>
Date:   Tue Jan 23 18:58:26 2024 +0100

    blk-wbt: Fix detection of dirty-throttled tasks
    
    commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5 upstream.
    
    The detection of dirty-throttled tasks in blk-wbt has been subtly broken
    since its beginning in 2016. Namely if we are doing cgroup writeback and
    the throttled task is not in the root cgroup, balance_dirty_pages() will
    set dirty_sleep for the non-root bdi_writeback structure. However
    blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback
    structure. Thus detection of recently throttled tasks is not working in
    this case (we noticed this when we switched to cgroup v2 and suddenly
    writeback was slow).
    
    Since blk-wbt has no easy way to get to proper bdi_writeback and
    furthermore its intention has always been to work on the whole device
    rather than on individual cgroups, just move the dirty_sleep timestamp
    from bdi_writeback to backing_dev_info. That fixes the checking for
    recently throttled task and saves memory for everybody as a bonus.
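A sketch of why the per-cgroup timestamp was missed, using hypothetical, heavily reduced structures: a throttle recorded on a child cgroup's writeback structure is invisible to a check that only reads the root one, whereas a device-wide timestamp is seen regardless of cgroup. Field names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical reduced device structure (after the fix: one stamp per device). */
struct backing_dev_info_model {
	unsigned long dirty_sleep;
};

/* Hypothetical reduced per-cgroup writeback structure (before the fix). */
struct bdi_writeback_model {
	struct backing_dev_info_model *bdi;
	unsigned long dirty_sleep;
};

/* A task in this wb's cgroup was throttled at time `now`. */
static void note_throttle(struct bdi_writeback_model *wb, unsigned long now)
{
	wb->dirty_sleep = now;      /* old: only this cgroup's wb sees it */
	wb->bdi->dirty_sleep = now; /* new: visible device-wide */
}

/* Was a task throttled within the last `window` ticks? */
static bool recently_throttled(unsigned long stamp, unsigned long now,
			       unsigned long window)
{
	return stamp != 0 && now - stamp < window;
}
```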
    
    CC: stable@vger.kernel.org
    Fixes: b57d74aff9ab ("writeback: track if we're sleeping on progress in balance_dirty_pages()")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz
    [axboe: fixup indentation errors]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: fix partial zone append completion handling in req_bio_endio()

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Wed Jan 10 18:29:42 2024 +0900

    block: fix partial zone append completion handling in req_bio_endio()
    
    [ Upstream commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988 ]
    
    Partial completions of zone append requests are not allowed, but if a zone
    append completion indicates a number of completed bytes different from
    the original BIO size, only the BIO status is set to error. This leads
    to bio_advance() not setting the BIO size to 0, and thus to bio_endio()
    not being called at the end of req_bio_endio().
    
    Make sure a partially completed zone append is failed and completed
    immediately by forcing the completed number of bytes (nbytes) to be
    equal to the BIO size, thus ensuring that bio_endio() is called.
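A minimal model of the fix, assuming a hypothetical `struct bio_model` where subtracting `nbytes` stands in for `bio_advance()` and setting `ended` stands in for `bio_endio()`. Forcing `nbytes` to the full BIO size on a partial zone append guarantees the size drops to zero.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical minimal BIO: only remaining size and status matter here. */
struct bio_model {
	unsigned int size; /* bytes still outstanding */
	int status;        /* 0 on success, negative errno on error */
	bool ended;        /* set where bio_endio() would be called */
};

static void complete_bio(struct bio_model *bio, unsigned int nbytes,
			 bool zone_append)
{
	if (zone_append && nbytes != bio->size) {
		bio->status = -EIO; /* a partial zone append is an error */
		nbytes = bio->size; /* the fix: consume the whole BIO */
	}
	bio->size -= nbytes;        /* stands in for bio_advance() */
	if (bio->size == 0)
		bio->ended = true;  /* stands in for bio_endio() */
}
```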
    
    Fixes: 297db731847e ("block: fix req_bio_endio append error handling")
    Cc: stable@kernel.vger.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/20240110092942.442334-1-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY

Author: Magnus Karlsson <magnus.karlsson@intel.com>
Date:   Wed Feb 7 09:47:36 2024 +0100

    bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY
    
    [ Upstream commit 9b0ed890ac2ae233efd8b27d11aee28a19437bb8 ]
    
    Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the
    bonding driver does not support XDP and AF_XDP in zero-copy mode even
    if the real NIC drivers do.
    
    Note that the driver used to report everything as supported before a
    device was bonded. Instead of just masking out the zero-copy support
    from this, have the driver report that no XDP feature is supported
    until a real device is bonded. This seems to be more truthful as it is
    the real drivers that decide what XDP features are supported.
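A sketch of the reporting rule described above, assuming the bond exposes the intersection of its slaves' XDP features: nothing is advertised while no device is bonded, and zero-copy AF_XDP is always masked out. The `XDP_ACT_*` bit values are hypothetical and do not match the uapi `NETDEV_XDP_ACT_*` constants.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define XDP_ACT_BASIC        (1u << 0)
#define XDP_ACT_REDIRECT     (1u << 1)
#define XDP_ACT_XSK_ZEROCOPY (1u << 4)

/* Features common to all bonded devices, never including zero-copy. */
static uint32_t bond_xdp_features(const uint32_t *slave_features, int n)
{
	uint32_t features;

	if (n == 0)
		return 0; /* nothing bonded yet: claim no XDP support */

	features = ~0u;
	for (int i = 0; i < n; i++)
		features &= slave_features[i];

	return features & ~XDP_ACT_XSK_ZEROCOPY;
}
```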
    
    Fixes: cb9e6e584d58 ("bonding: add xdp_features support")
    Reported-by: Prashant Batra <prbatra.mail@gmail.com>
    Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-rLg@mail.gmail.com/T/
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20240207084737.20890-1-magnus.karlsson@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: add and use helper to check if block group is used

Author: Filipe Manana <fdmanana@suse.com>
Date:   Thu Jan 25 09:53:06 2024 +0000

    btrfs: add and use helper to check if block group is used
    
    commit 1693d5442c458ae8d5b0d58463b873cd879569ed upstream.
    
    Add a helper function to determine if a block group is being used and make
    use of it at btrfs_delete_unused_bgs(). This helper will also be used in
    future code changes.
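A sketch of such a helper, assuming the block group tracks used, reserved and pinned bytes separately; the struct below is a hypothetical subset, and the real helper's exact field checks should be confirmed against fs/btrfs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a block group's accounting fields. */
struct block_group_model {
	uint64_t used;     /* bytes in allocated extents */
	uint64_t reserved; /* bytes reserved for pending allocations */
	uint64_t pinned;   /* bytes pinned until the transaction commits */
};

/* Treat the block group as used if any of its counters is non-zero. */
static bool block_group_is_used(const struct block_group_model *bg)
{
	return bg->used > 0 || bg->reserved > 0 || bg->pinned > 0;
}
```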
    
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: do not ASSERT() if the newly created subvolume already got read

Author: Qu Wenruo <wqu@suse.com>
Date:   Sat Jan 20 19:41:28 2024 +1030

    btrfs: do not ASSERT() if the newly created subvolume already got read
    
    commit e03ee2fe873eb68c1f9ba5112fee70303ebf9dfb upstream.
    
    [BUG]
    There is a syzbot crash, triggered by the ASSERT() during subvolume
    creation:
    
     assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319
     ------------[ cut here ]------------
     kernel BUG at fs/btrfs/disk-io.c:1319!
     invalid opcode: 0000 [#1] PREEMPT SMP KASAN
     RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60
      <TASK>
      btrfs_get_new_fs_root+0xd3/0xf0
      create_subvol+0xd02/0x1650
      btrfs_mksubvol+0xe95/0x12b0
      __btrfs_ioctl_snap_create+0x2f9/0x4f0
      btrfs_ioctl_snap_create+0x16b/0x200
      btrfs_ioctl+0x35f0/0x5cf0
      __x64_sys_ioctl+0x19d/0x210
      do_syscall_64+0x3f/0xe0
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
     ---[ end trace 0000000000000000 ]---
    
    [CAUSE]
    During create_subvol(), after inserting root item for the newly created
    subvolume, we would trigger btrfs_get_new_fs_root() to get the
    btrfs_root of that subvolume.
    
    The idea here is, we have preallocated an anonymous device number for
    the subvolume, thus we can assign it to the new subvolume.
    
    But there is really nothing preventing things like a backref walk from
    reading the new subvolume.
    If that happens before we call btrfs_get_new_fs_root(), the subvolume
    would be read out, with a new anonymous device number assigned already.
    
    In that case, we would trigger ASSERT(), as we really expect no one to
    read out that subvolume (which is not yet accessible from the fs).
    But things like a backref walk can still trigger a read of the
    subvolume.
    
    Thus our assumption on the ASSERT() is not correct in the first place.
    
    [FIX]
    Fix it by removing the ASSERT(), and just free the @anon_dev, reset it
    to 0, and continue.
    
    If the subvolume tree is read out by something else, it should already
    have a new anon_dev assigned, thus we only need to free the
    preallocated one.
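The fixed logic can be modelled as follows, assuming hypothetical `struct root_model` and `free_anon_bdev_model()` stand-ins: if the root already carries an anonymous device number, the preallocated one is released and reset to 0 instead of tripping an assertion.

```c
/* Hypothetical counter standing in for the anon dev allocator. */
static int freed_devs;

static void free_anon_bdev_model(unsigned int dev)
{
	(void)dev;
	freed_devs++;
}

/* Hypothetical root: 0 means no anonymous device number yet. */
struct root_model {
	unsigned int anon_dev;
};

/* Hand the preallocated number to the root, or release it if unneeded. */
static void assign_anon_dev(struct root_model *root, unsigned int *prealloc)
{
	if (root->anon_dev)
		free_anon_bdev_model(*prealloc); /* root was already read */
	else
		root->anon_dev = *prealloc;      /* ownership moves to root */
	*prealloc = 0;
}
```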
    
    Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
    Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation")
    CC: stable@vger.kernel.org # 5.15+
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: do not delete unused block group if it may be used soon

Author: Filipe Manana <fdmanana@suse.com>
Date:   Thu Jan 25 09:53:14 2024 +0000

    btrfs: do not delete unused block group if it may be used soon
    
    commit f4a9f219411f318ae60d6ff7f129082a75686c6c upstream.
    
    Before deleting a block group that is in the list of unused block groups
    (fs_info->unused_bgs), we check if the block group became used before
    deleting it, as extents from it may have been allocated after it was added
    to the list.
    
    However even if the block group was not yet used, there may be tasks that
    have only reserved space and have not yet allocated extents, and they
    might be relying on the availability of the unused block group in order
    to allocate extents. The reservation works first by increasing the
    "bytes_may_use" field of the corresponding space_info object (which may
    first require flushing delayed items, allocating a new block group, etc),
    and only later a task does the actual allocation of extents.
    
    For metadata we usually don't end up using all reserved space, as we are
    pessimistic and typically account for the worst cases (need to COW every
    single node in a path of a tree at maximum possible height, etc). For
    data we usually reserve the exact amount of space we're going to allocate
    later, except when using compression where we always reserve space based
    on the uncompressed size, as compression is only triggered when writeback
    starts so we don't know in advance how much space we'll actually need, or
    if the data is compressible.
    
    So don't delete an unused block group if the total size of its space_info
    object minus the block group's size is less than the sum of used space and
    space that may be used (space_info->bytes_may_use), as that means we have
    tasks that reserved space and may need to allocate extents from the block
    group. In this case, besides skipping the deletion, re-add the block group
    to the list of unused block groups so that it may be reconsidered later,
    in case the tasks that reserved space end up not needing to allocate
    extents from it.
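The deletion check can be written as a single comparison. For example, with 3 GiB total, a 1 GiB block group, 1 GiB used and 1.5 GiB in `bytes_may_use`, the remaining 2 GiB is less than the 2.5 GiB that may be needed, so the block group is kept. The struct is a hypothetical subset of `struct btrfs_space_info`, and `bytes_used` here lumps together all consumed space, which the kernel may account with more counters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the space_info accounting. */
struct space_info_model {
	uint64_t total_bytes;
	uint64_t bytes_used;    /* all space already consumed (simplified) */
	uint64_t bytes_may_use; /* reserved but not yet allocated */
};

/*
 * Keep the unused block group if removing it would leave less capacity
 * than what is used plus what reservations may still consume.
 * Assumes bg_length <= total_bytes (the group belongs to this space_info).
 */
static bool should_keep_unused_bg(const struct space_info_model *si,
				  uint64_t bg_length)
{
	return si->total_bytes - bg_length <
	       si->bytes_used + si->bytes_may_use;
}
```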
    
    Allowing the deletion of the block group while we have reserved space can
    result in tasks failing to allocate metadata extents (-ENOSPC) while under
    a transaction handle, resulting in a transaction abort, or failure during
    writeback for the case of data extents.
    
    CC: stable@vger.kernel.org # 6.0+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: don't drop extent_map for free space inode on write error

Author: Josef Bacik <josef@toxicpanda.com>
Date:   Wed Jan 31 14:27:25 2024 -0500

    btrfs: don't drop extent_map for free space inode on write error
    
    commit 5571e41ec6e56e35f34ae9f5b3a335ef510e0ade upstream.
    
    While running the CI for an unrelated change I hit the following panic
    with generic/648 on btrfs_holes_spacecache.
    
    assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385
    ------------[ cut here ]------------
    kernel BUG at fs/btrfs/extent_io.c:1385!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G        W          6.8.0-rc2+ #1
    RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0
    Call Trace:
     <TASK>
     extent_write_cache_pages+0x2ac/0x8f0
     extent_writepages+0x87/0x110
     do_writepages+0xd5/0x1f0
     filemap_fdatawrite_wbc+0x63/0x90
     __filemap_fdatawrite_range+0x5c/0x80
     btrfs_fdatawrite_range+0x1f/0x50
     btrfs_write_out_cache+0x507/0x560
     btrfs_write_dirty_block_groups+0x32a/0x420
     commit_cowonly_roots+0x21b/0x290
     btrfs_commit_transaction+0x813/0x1360
     btrfs_sync_file+0x51a/0x640
     __x64_sys_fdatasync+0x52/0x90
     do_syscall_64+0x9c/0x190
     entry_SYSCALL_64_after_hwframe+0x6e/0x76
    
    This happens because we fail to write out the free space cache in one
    instance, come back around and attempt to write it again.  However on
    the second pass through we go to call btrfs_get_extent() on the inode to
    get the extent mapping.  Because this is a new block group, and with the
    free space inode we always search the commit root to avoid deadlocking
    with the tree, we find nothing and return an EXTENT_MAP_HOLE for the
    requested range.
    
    This happens because the first time we try to write the space cache out
    we hit an error, and on an error we drop the extent mapping.  This is
    expected for regular files, but the free space cache inode is special.  We
    always expect the extent map to be correct.  Thus the second time
    through we end up with a bogus extent map.
    
    Since we're deprecating this feature, the most straightforward way to
    fix this is to simply skip dropping the extent map range for this failed
    range.
    
    I shortened the test by using error injection to stress the area to make
    it easier to reproduce.  With this patch in place we no longer panic
    with my error injection test.
    
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: don't reserve space for checksums when writing to nocow files

Author: Filipe Manana <fdmanana@suse.com>
Date:   Wed Jan 31 17:18:04 2024 +0000

    btrfs: don't reserve space for checksums when writing to nocow files
    
    commit feefe1f49d26bad9d8997096e3a200280fa7b1c5 upstream.
    
    Currently when doing a write to a file we always reserve metadata space
    for inserting data checksums. However we don't need to do it if we have
    a nodatacow file (-o nodatacow mount option or chattr +C) or if checksums
    are disabled (-o nodatasum mount option), as in that case we are only
    adding unnecessary pressure to metadata reservations.
    
    For example on x86_64, with the default node size of 16K, a 4K buffered
    write into a nodatacow file is reserving 655360 bytes of metadata space,
    as it's accounting for checksums. After this change, which stops reserving
    space for checksums if we have a nodatacow file or checksums are disabled,
    we only need to reserve 393216 bytes of metadata.
    
    CC: stable@vger.kernel.org # 6.1+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
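
The figures quoted in the commit message above can be checked with a little arithmetic: 655360 bytes is 40 nodes of 16K and 393216 bytes is 24 nodes, so skipping the checksum reservation saves 16 nodes (256 KiB), or 40% of the original reservation. The sketch below encodes that decision; the helper names are hypothetical, and it assumes files that keep checksums still reserve the original amount.

```c
#include <stdbool.h>
#include <stdint.h>

/* Figures from the commit message: 4K buffered write, x86_64, 16K nodes */
#define NODESIZE          16384u
#define RESV_WITH_CSUM    655360u /* includes checksum items */
#define RESV_WITHOUT_CSUM 393216u /* nodatacow or nodatasum after the fix */

/* Hypothetical helper: checksums only exist for datacow, datasum inodes */
static bool needs_csum_reservation(bool nodatacow, bool nodatasum)
{
	return !(nodatacow || nodatasum);
}

static uint32_t metadata_reservation(bool nodatacow, bool nodatasum)
{
	return needs_csum_reservation(nodatacow, nodatasum) ?
	       RESV_WITH_CSUM : RESV_WITHOUT_CSUM;
}

/* Percentage of the original reservation saved by skipping checksums */
static uint32_t saved_percent(void)
{
	return (RESV_WITH_CSUM - RESV_WITHOUT_CSUM) * 100u / RESV_WITH_CSUM;
}
```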

btrfs: forbid creating subvol qgroups

Author: Boris Burkov <boris@bur.io>
Date:   Wed Jan 10 17:51:26 2024 -0800

    btrfs: forbid creating subvol qgroups
    
    commit 0c309d66dacddf8ce939b891d9ead4a8e21ad6f0 upstream.
    
    Creating a qgroup 0/subvolid leads to various races and it isn't
    helpful, because you can't specify a subvol id when creating a subvol,
    so you can't be sure it will be the right one. Any requirements on the
    automatic subvol can be satisfied by using a higher level qgroup and the
    inheritance parameters of subvol creation.
    
    Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs")
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: forbid deleting live subvol qgroup

Author: Boris Burkov <boris@bur.io>
Date:   Wed Jan 10 17:30:00 2024 -0800

    btrfs: forbid deleting live subvol qgroup
    
    commit a8df35619948bd8363d330c20a90c9a7fbff28c0 upstream.
    
    If a subvolume still exists, forbid deleting its qgroup 0/subvolid.
    Deleting it generally leads to incorrect behavior in squotas and
    has no legitimate purpose.
    
    Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs")
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
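
The two qgroup rules above (no creating 0/subvolid, no deleting it while the subvolume lives) can be sketched as policy checks on a packed qgroup id, where the level sits in the top 16 bits and the id in the low 48 bits. The function names and the chosen error codes (-EINVAL, -EBUSY) are assumptions for illustration, not a claim about what the kernel returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* A qgroup id "level/id": level in the top 16 bits, id in the low 48 */
#define QGROUP_LEVEL_SHIFT 48

static uint16_t qgroup_level(uint64_t qgroupid)
{
	return (uint16_t)(qgroupid >> QGROUP_LEVEL_SHIFT);
}

static uint64_t make_qgroupid(uint16_t level, uint64_t id)
{
	return ((uint64_t)level << QGROUP_LEVEL_SHIFT) | id;
}

/* Hypothetical check: level-0 qgroups come with the subvolume itself */
static int check_qgroup_create(uint64_t qgroupid)
{
	return qgroup_level(qgroupid) == 0 ? -EINVAL : 0;
}

/* Hypothetical check: a live subvolume keeps its 0/subvolid qgroup */
static int check_qgroup_remove(uint64_t qgroupid, bool subvol_exists)
{
	if (qgroup_level(qgroupid) == 0 && subvol_exists)
		return -EBUSY;
	return 0;
}
```

Higher-level qgroups (for example 1/100) remain free to create, which is the route the first commit message recommends for grouping subvolumes.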

btrfs: reject encoded write if inode has nodatasum flag set

Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Feb 2 12:09:22 2024 +0000

    btrfs: reject encoded write if inode has nodatasum flag set
    
    commit 1bd96c92c6a0a4d43815eb685c15aa4b78879dc9 upstream.
    
    Currently we allow an encoded write against inodes that have the NODATASUM
    flag set, either because they are NOCOW files or they were created while
    the filesystem was mounted with "-o nodatasum". This results in having
    compressed extents without corresponding checksums, which is a filesystem
    inconsistency reported by 'btrfs check'.
    
    For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers
    this and 'btrfs check' errors out with:
    
       [1/7] checking root items
       [2/7] checking extents
       [3/7] checking free space tree
       [4/7] checking fs roots
       root 256 inode 257 errors 1040, bad file extent, some csum missing
       root 256 inode 258 errors 1040, bad file extent, some csum missing
       ERROR: errors found in fs roots
       (...)
    
    So reject encoded writes if the target inode has NODATASUM set.
    
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: send: return EOPNOTSUPP on unknown flags

Author: David Sterba <dsterba@suse.com>
Date:   Wed Jan 10 17:48:44 2024 +0100

    btrfs: send: return EOPNOTSUPP on unknown flags
    
    commit f884a9f9e59206a2d41f265e7e403f080d10b493 upstream.
    
    When checking some ioctl flags, such as BTRFS_SCRUB_SUPPORTED_FLAGS,
    BTRFS_SUBVOL_CREATE_ARGS_MASK or fallocate modes, we return EOPNOTSUPP
    for unknown ones. EINVAL is meant for supported but invalid values or
    combinations of options. Fix the send flags check so it's consistent
    with the rest.
    
    CC: stable@vger.kernel.org # 4.14+
    Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5rryOLzp3EKq8RTbjMHMHeaJubfpsVLF6H4qJnKCUR1w@mail.gmail.com/
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
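
The convention described above separates "flag not understood" from "flag understood but misused". A minimal sketch, assuming hypothetical flag bits and a made-up conflicting pair (these are not the real BTRFS_SEND_FLAG_* values or rules):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits, not the real send ioctl values */
#define SEND_FLAG_A    (1u << 0)
#define SEND_FLAG_B    (1u << 1)
#define SEND_FLAG_MASK (SEND_FLAG_A | SEND_FLAG_B)

static int check_send_flags(uint32_t flags)
{
	/* Bits outside the supported mask: the request is not supported */
	if (flags & ~SEND_FLAG_MASK)
		return -EOPNOTSUPP;

	/* Known bits in a combination we reject (hypothetical rule) */
	if ((flags & SEND_FLAG_A) && (flags & SEND_FLAG_B))
		return -EINVAL;

	return 0;
}
```

Userspace can then probe for newer flags: EOPNOTSUPP means "this kernel lacks the feature", while EINVAL points at a bug in the caller's arguments.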

can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)

Author: Oleksij Rempel <o.rempel@pengutronix.de>
Date:   Fri Oct 20 15:38:14 2023 +0200

    can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
    
    commit efe7cf828039aedb297c1f9920b638fffee6aabc upstream.
    
    Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...)
    modifies jsk->filters while receiving packets.
    
    The following trace was seen on an affected system:
     ==================================================================
     BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
     Read of size 4 at addr ffff888012144014 by task j1939/350
    
     CPU: 0 PID: 350 Comm: j1939 Tainted: G        W  OE      6.5.0-rc5 #1
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
     Call Trace:
      print_report+0xd3/0x620
      ? kasan_complete_mode_report_info+0x7d/0x200
      ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
      kasan_report+0xc2/0x100
      ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
      __asan_load4+0x84/0xb0
      j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
      j1939_sk_recv+0x20b/0x320 [can_j1939]
      ? __kasan_check_write+0x18/0x20
      ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939]
      ? j1939_simple_recv+0x69/0x280 [can_j1939]
      ? j1939_ac_recv+0x5e/0x310 [can_j1939]
      j1939_can_recv+0x43f/0x580 [can_j1939]
      ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
      ? raw_rcv+0x42/0x3c0 [can_raw]
      ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
      can_rcv_filter+0x11f/0x350 [can]
      can_receive+0x12f/0x190 [can]
      ? __pfx_can_rcv+0x10/0x10 [can]
      can_rcv+0xdd/0x130 [can]
      ? __pfx_can_rcv+0x10/0x10 [can]
      __netif_receive_skb_one_core+0x13d/0x150
      ? __pfx___netif_receive_skb_one_core+0x10/0x10
      ? __kasan_check_write+0x18/0x20
      ? _raw_spin_lock_irq+0x8c/0xe0
      __netif_receive_skb+0x23/0xb0
      process_backlog+0x107/0x260
      __napi_poll+0x69/0x310
      net_rx_action+0x2a1/0x580
      ? __pfx_net_rx_action+0x10/0x10
      ? __pfx__raw_spin_lock+0x10/0x10
      ? handle_irq_event+0x7d/0xa0
      __do_softirq+0xf3/0x3f8
      do_softirq+0x53/0x80
      </IRQ>
      <TASK>
      __local_bh_enable_ip+0x6e/0x70
      netif_rx+0x16b/0x180
      can_send+0x32b/0x520 [can]
      ? __pfx_can_send+0x10/0x10 [can]
      ? __check_object_size+0x299/0x410
      raw_sendmsg+0x572/0x6d0 [can_raw]
      ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
      ? apparmor_socket_sendmsg+0x2f/0x40
      ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
      sock_sendmsg+0xef/0x100
      sock_write_iter+0x162/0x220
      ? __pfx_sock_write_iter+0x10/0x10
      ? __rtnl_unlock+0x47/0x80
      ? security_file_permission+0x54/0x320
      vfs_write+0x6ba/0x750
      ? __pfx_vfs_write+0x10/0x10
      ? __fget_light+0x1ca/0x1f0
      ? __rcu_read_unlock+0x5b/0x280
      ksys_write+0x143/0x170
      ? __pfx_ksys_write+0x10/0x10
      ? __kasan_check_read+0x15/0x20
      ? fpregs_assert_state_consistent+0x62/0x70
      __x64_sys_write+0x47/0x60
      do_syscall_64+0x60/0x90
      ? do_syscall_64+0x6d/0x90
      ? irqentry_exit+0x3f/0x50
      ? exc_page_fault+0x79/0xf0
      entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    
     Allocated by task 348:
      kasan_save_stack+0x2a/0x50
      kasan_set_track+0x29/0x40
      kasan_save_alloc_info+0x1f/0x30
      __kasan_kmalloc+0xb5/0xc0
      __kmalloc_node_track_caller+0x67/0x160
      j1939_sk_setsockopt+0x284/0x450 [can_j1939]
      __sys_setsockopt+0x15c/0x2f0
      __x64_sys_setsockopt+0x6b/0x80
      do_syscall_64+0x60/0x90
      entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    
     Freed by task 349:
      kasan_save_stack+0x2a/0x50
      kasan_set_track+0x29/0x40
      kasan_save_free_info+0x2f/0x50
      __kasan_slab_free+0x12e/0x1c0
      __kmem_cache_free+0x1b9/0x380
      kfree+0x7a/0x120
      j1939_sk_setsockopt+0x3b2/0x450 [can_j1939]
      __sys_setsockopt+0x15c/0x2f0
      __x64_sys_setsockopt+0x6b/0x80
      do_syscall_64+0x60/0x90
      entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    
    Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol")
    Reported-by: Sili Luo <rootlab@huawei.com>
    Suggested-by: Sili Luo <rootlab@huawei.com>
    Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Cc: stable@vger.kernel.org
    Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.de
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
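
The fix above serializes filter replacement against filter matching. The userspace sketch below shows the same pattern with a pthread mutex standing in for the socket lock: the setter swaps the array under the lock and frees the old one only afterwards, and the receive path reads the array only while holding the same lock. All names are hypothetical, and treating "no filters" as "accept everything" is an assumption of this sketch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct filter { uint32_t pgn; };

/* Hypothetical stand-in for a socket that owns a filter array */
struct sock_sketch {
	pthread_mutex_t lock; /* plays the role of the socket lock */
	struct filter *filters;
	int nfilters;
};

/* setsockopt() path: publish the new array under the lock */
static int set_filters(struct sock_sketch *sk, const struct filter *f, int n)
{
	struct filter *copy = NULL, *old;

	if (n > 0) {
		copy = malloc(n * sizeof(*copy));
		if (!copy)
			return -1;
		memcpy(copy, f, n * sizeof(*copy));
	}
	pthread_mutex_lock(&sk->lock);
	old = sk->filters;
	sk->filters = copy;
	sk->nfilters = n;
	pthread_mutex_unlock(&sk->lock);
	free(old); /* no reader can still be iterating over it */
	return 0;
}

/* receive path: only touch the array while holding the lock */
static bool match_filter(struct sock_sketch *sk, uint32_t pgn)
{
	bool match;
	int i;

	pthread_mutex_lock(&sk->lock);
	match = sk->nfilters == 0; /* assumption: no filters accepts all */
	for (i = 0; i < sk->nfilters && !match; i++)
		match = sk->filters[i].pgn == pgn;
	pthread_mutex_unlock(&sk->lock);
	return match;
}
```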

can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

Author: Ziqi Zhao <astrajoan@yahoo.com>
Date:   Fri Jul 21 09:22:26 2023 -0700

    can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
    
    commit 6cdedc18ba7b9dacc36466e27e3267d201948c8d upstream.
    
    The following 3 locks would race against each other, causing the
    deadlock situation in the Syzbot bug report:
    
    - j1939_socks_lock
    - active_session_list_lock
    - sk_session_queue_lock
    
    A reasonable fix is to change j1939_socks_lock to an rwlock, since in
    the rare situations where a write lock is required for the linked list
    that j1939_socks_lock is protecting, the code does not attempt to
    acquire any more locks. This would break the circular lock dependency,
    where, for example, the current thread already locks j1939_socks_lock
    and attempts to acquire sk_session_queue_lock, and at the same time,
    another thread attempts to acquire j1939_socks_lock while holding
    sk_session_queue_lock.
    
    NOTE: This patch alone does not fix the unregister_netdevice bug
    reported by Syzbot; instead, it solves a deadlock situation to prepare
    for one or more further patches to actually fix the Syzbot bug, which
    appears to be a reference counting problem within the j1939 codebase.
    
    Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com>
    Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
    Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com
    [mkl: remove unrelated newline change]
    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

can: netlink: Fix TDCO calculation using the old data bittiming

Author: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Date:   Mon Nov 6 19:01:58 2023 +0100

    can: netlink: Fix TDCO calculation using the old data bittiming
    
    commit 2aa0a5e65eae27dbd96faca92c84ecbf6f492d42 upstream.
    
    The TDCO calculation was done using the currently applied data bittiming,
    instead of the newly computed data bittiming, which means that the TDCO
    had an invalid value unless the same data bittiming was set twice.
    
    Fixes: d99755f71a80 ("can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)")
    Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
    Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
    Link: https://lore.kernel.org/all/40579c18-63c0-43a4-8d4c-f3a6c1c0b417@munic.io
    Cc: stable@vger.kernel.org
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
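
The ordering bug above is easy to see in a reduced model. The sketch assumes TDCO is derived from the data sample point in clock periods, (sync + prop_seg + phase_seg1) * brp, and leaves out any clamping or range checks a real driver would apply; the struct and function names are hypothetical. Computing from the applied settings returns the stale value on the first call and only becomes correct when the same bittiming is set a second time, exactly as the message describes.

```c
#include <stdint.h>

#define CAN_SYNC_SEG 1

/* Hypothetical, reduced view of a data bittiming */
struct dbt_sketch {
	uint32_t prop_seg;
	uint32_t phase_seg1;
	uint32_t brp;
};

/* Assumed basis for TDCO: sample point expressed in clock periods */
static uint32_t tdco_from(const struct dbt_sketch *dbt)
{
	return (CAN_SYNC_SEG + dbt->prop_seg + dbt->phase_seg1) * dbt->brp;
}

/* Buggy order: TDCO from the currently applied bittiming */
static uint32_t changelink_buggy(struct dbt_sketch *applied,
				 const struct dbt_sketch *requested)
{
	uint32_t tdco = tdco_from(applied); /* stale values */

	*applied = *requested;
	return tdco;
}

/* Fixed order: TDCO from the newly computed bittiming */
static uint32_t changelink_fixed(struct dbt_sketch *applied,
				 const struct dbt_sketch *requested)
{
	*applied = *requested;
	return tdco_from(applied);
}
```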

ceph: prevent use-after-free in encode_cap_msg()

Author: Rishabh Dave <ridave@redhat.com>
Date:   Thu Feb 1 17:07:16 2024 +0530

    ceph: prevent use-after-free in encode_cap_msg()
    
    commit cda4672da1c26835dcbd7aec2bfed954eda9b5ef upstream.
    
    In fs/ceph/caps.c, in encode_cap_msg(), a "use after free" error was
    caught by KASAN at this line - 'ceph_buffer_get(arg->xattr_buf);'. This
    implies that the buffer was freed before its refcount could be
    incremented here.
    
    In the same file, in handle_cap_grant(), the refcount is decremented by
    this line - 'ceph_buffer_put(ci->i_xattrs.blob);'. It appears that a race
    occurred and the resource was freed by the latter line before the former
    line could increment it.
    
    encode_cap_msg() is called by __send_cap() and __send_cap() is called by
    ceph_check_caps() after calling __prep_cap(). __prep_cap() is where
    arg->xattr_buf is assigned to ci->i_xattrs.blob. This is the spot where
    the refcount must be increased to prevent "use after free" error.
    
    Cc: stable@vger.kernel.org
    Link: https://tracker.ceph.com/issues/59259
    Signed-off-by: Rishabh Dave <ridave@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
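
The fix above moves the reference acquisition to the moment the pointer is copied. A single-threaded sketch of that rule, with a hypothetical refcounted buffer standing in for struct ceph_buffer (the kernel uses atomic kref counting, which this sketch does not model):

```c
#include <stdlib.h>

/* Hypothetical refcounted buffer */
struct buf { int refs; };

static struct buf *buf_get(struct buf *b)
{
	if (b)
		b->refs++;
	return b;
}

static void buf_put(struct buf *b)
{
	if (b && --b->refs == 0)
		free(b);
}

struct cap_arg { struct buf *xattr_buf; };

/* Take the reference where the pointer is copied (the prep step),
 * so a concurrent put on the original cannot free it under us. */
static void prep_cap(struct cap_arg *arg, struct buf *blob)
{
	arg->xattr_buf = buf_get(blob);
}

/* Drop the reference once the message has been encoded */
static void release_cap_arg(struct cap_arg *arg)
{
	buf_put(arg->xattr_buf);
	arg->xattr_buf = NULL;
}
```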

cifs: fix underflow in parse_server_interfaces()

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Thu Feb 8 13:18:46 2024 +0300

    cifs: fix underflow in parse_server_interfaces()
    
    [ Upstream commit cffe487026be13eaf37ea28b783d9638ab147204 ]
    
    In this loop, we step through the buffer and after each item we check
    if bytes_left is greater than the minimum size we need.  However,
    the problem is that "bytes_left" is type ssize_t while sizeof() is type
    size_t.  That means that because of type promotion, the comparison is
    done as an unsigned and if we have negative bytes left the loop
    continues instead of ending.
    
    Fixes: fe856be475f7 ("CIFS: parse and store info on iface queries")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
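
The type-promotion trap described above: when a signed ssize_t is compared with an unsigned size_t of the same rank, the signed value is converted to unsigned, so a negative byte count turns into a huge number and the size check passes. The record type below is hypothetical; casting sizeof() to ssize_t is one way to keep the comparison signed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

struct iface_item { char data[24]; }; /* hypothetical 24-byte record */

/* Buggy: bytes_left is converted to size_t, so -4 becomes huge */
static bool has_room_buggy(ssize_t bytes_left)
{
	return bytes_left >= sizeof(struct iface_item);
}

/* Fixed: keep the comparison signed */
static bool has_room_fixed(ssize_t bytes_left)
{
	return bytes_left >= (ssize_t)sizeof(struct iface_item);
}
```

With bytes_left == -4 the buggy check returns true and the loop would keep parsing past the end of the buffer; the fixed check stops it.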

connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared"

Author: Keqi Wang <wangkeqi_chris@163.com>
Date:   Fri Feb 9 17:16:59 2024 +0800

    connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared"
    
    commit 8929f95b2b587791a7dcd04cc91520194a76d3a6 upstream.
    
    This reverts commit c46bfba1337d ("connector: Fix proc_event_num_listeners
    count not cleared").
    
    It is not accurate to reset proc_event_num_listeners based on
    cn_netlink_send_mult() returning -ESRCH.
    
    In the case of stress-ng netlink-proc, -ESRCH will always be returned
    because netlink_broadcast_filtered() returns -ESRCH, which may cause
    stress-ng netlink-proc performance degradation.
    
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202401112259.b23a1567-oliver.sang@intel.com
    Fixes: c46bfba1337d ("connector: Fix proc_event_num_listeners count not cleared")
    Signed-off-by: Keqi Wang <wangkeqi_chris@163.com>
    Link: https://lore.kernel.org/r/20240209091659.68723-1-wangkeqi_chris@163.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: algif_hash - Remove bogus SGL free on zero-length error path

Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 1 13:49:09 2024 +0800

    crypto: algif_hash - Remove bogus SGL free on zero-length error path
    
    commit 24c890dd712f6345e382256cae8c97abb0406b70 upstream.
    
    When a zero-length message is hashed by algif_hash, and an error
    is triggered, it tries to free an SG list that was never allocated
    in the first place.  Fix this by not freeing the SG list on the
    zero-length error path.
    
    Reported-by: Shigeru Yoshida <syoshida@redhat.com>
    Reported-by: xingwei lee <xrivendell7@gmail.com>
    Fixes: b6d972f68983 ("crypto: af_alg/hash: Fix recvmsg() after sendmsg(MSG_MORE)")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Reported-by: syzbot+3266db0c26d1fbbe3abb@syzkaller.appspotmail.com
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
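
The shape of the bug above is an error path that releases a resource the zero-length path never acquired. The sketch models only that shape; the request struct, the free counter and the simulated failure are all hypothetical, and the real algif_hash code is structured differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request state; sgl stands in for the SG list */
struct hash_req { void *sgl; };

static int frees; /* counts free_sgl() calls, for illustration */

static void free_sgl(struct hash_req *req)
{
	free(req->sgl);
	req->sgl = NULL;
	frees++;
}

/* fail simulates an error raised while hashing */
static int hash_msg(struct hash_req *req, size_t len, bool fail)
{
	int err = 0;

	if (len == 0) {
		/* No SG list was allocated, so the error path frees nothing */
		return fail ? -EIO : 0;
	}

	req->sgl = malloc(len);
	if (!req->sgl)
		return -ENOMEM;
	if (fail)
		err = -EIO;

	free_sgl(req); /* allocated above, so releasing it is correct */
	return err;
}
```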

crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked

Author: Kim Phillips <kim.phillips@amd.com>
Date:   Thu Jan 25 17:12:53 2024 -0600

    crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
    
    commit ccb88e9549e7cfd8bcd511c538f437e20026e983 upstream.
    
    The SEV platform device can be shut down with a null psp_master,
    e.g., using DEBUG_TEST_DRIVER_REMOVE.  Found using KASAN:
    
    [  137.148210] ccp 0000:23:00.1: enabling device (0000 -> 0002)
    [  137.162647] ccp 0000:23:00.1: no command queues available
    [  137.170598] ccp 0000:23:00.1: sev enabled
    [  137.174645] ccp 0000:23:00.1: psp enabled
    [  137.178890] general protection fault, probably for non-canonical address 0xdffffc000000001e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
    [  137.182693] KASAN: null-ptr-deref in range [0x00000000000000f0-0x00000000000000f7]
    [  137.182693] CPU: 93 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1+ #311
    [  137.182693] RIP: 0010:__sev_platform_shutdown_locked+0x51/0x180
    [  137.182693] Code: 08 80 3c 08 00 0f 85 0e 01 00 00 48 8b 1d 67 b6 01 08 48 b8 00 00 00 00 00 fc ff df 48 8d bb f0 00 00 00 48 89 f9 48 c1 e9 03 <80> 3c 01 00 0f 85 fe 00 00 00 48 8b 9b f0 00 00 00 48 85 db 74 2c
    [  137.182693] RSP: 0018:ffffc900000cf9b0 EFLAGS: 00010216
    [  137.182693] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000001e
    [  137.182693] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00000000000000f0
    [  137.182693] RBP: ffffc900000cf9c8 R08: 0000000000000000 R09: fffffbfff58f5a66
    [  137.182693] R10: ffffc900000cf9c8 R11: ffffffffac7ad32f R12: ffff8881e5052c28
    [  137.182693] R13: ffff8881e5052c28 R14: ffff8881758e43e8 R15: ffffffffac64abf8
    [  137.182693] FS:  0000000000000000(0000) GS:ffff889de7000000(0000) knlGS:0000000000000000
    [  137.182693] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  137.182693] CR2: 0000000000000000 CR3: 0000001cf7c7e000 CR4: 0000000000350ef0
    [  137.182693] Call Trace:
    [  137.182693]  <TASK>
    [  137.182693]  ? show_regs+0x6c/0x80
    [  137.182693]  ? __die_body+0x24/0x70
    [  137.182693]  ? die_addr+0x4b/0x80
    [  137.182693]  ? exc_general_protection+0x126/0x230
    [  137.182693]  ? asm_exc_general_protection+0x2b/0x30
    [  137.182693]  ? __sev_platform_shutdown_locked+0x51/0x180
    [  137.182693]  sev_firmware_shutdown.isra.0+0x1e/0x80
    [  137.182693]  sev_dev_destroy+0x49/0x100
    [  137.182693]  psp_dev_destroy+0x47/0xb0
    [  137.182693]  sp_destroy+0xbb/0x240
    [  137.182693]  sp_pci_remove+0x45/0x60
    [  137.182693]  pci_device_remove+0xaa/0x1d0
    [  137.182693]  device_remove+0xc7/0x170
    [  137.182693]  really_probe+0x374/0xbe0
    [  137.182693]  ? srso_return_thunk+0x5/0x5f
    [  137.182693]  __driver_probe_device+0x199/0x460
    [  137.182693]  driver_probe_device+0x4e/0xd0
    [  137.182693]  __driver_attach+0x191/0x3d0
    [  137.182693]  ? __pfx___driver_attach+0x10/0x10
    [  137.182693]  bus_for_each_dev+0x100/0x190
    [  137.182693]  ? __pfx_bus_for_each_dev+0x10/0x10
    [  137.182693]  ? __kasan_check_read+0x15/0x20
    [  137.182693]  ? srso_return_thunk+0x5/0x5f
    [  137.182693]  ? _raw_spin_unlock+0x27/0x50
    [  137.182693]  driver_attach+0x41/0x60
    [  137.182693]  bus_add_driver+0x2a8/0x580
    [  137.182693]  driver_register+0x141/0x480
    [  137.182693]  __pci_register_driver+0x1d6/0x2a0
    [  137.182693]  ? srso_return_thunk+0x5/0x5f
    [  137.182693]  ? esrt_sysfs_init+0x1cd/0x5d0
    [  137.182693]  ? __pfx_sp_mod_init+0x10/0x10
    [  137.182693]  sp_pci_init+0x22/0x30
    [  137.182693]  sp_mod_init+0x14/0x30
    [  137.182693]  ? __pfx_sp_mod_init+0x10/0x10
    [  137.182693]  do_one_initcall+0xd1/0x470
    [  137.182693]  ? __pfx_do_one_initcall+0x10/0x10
    [  137.182693]  ? parameq+0x80/0xf0
    [  137.182693]  ? srso_return_thunk+0x5/0x5f
    [  137.182693]  ? __kmalloc+0x3b0/0x4e0
    [  137.182693]  ? kernel_init_freeable+0x92d/0x1050
    [  137.182693]  ? kasan_populate_vmalloc_pte+0x171/0x190
    [  137.182693]  ? srso_return_thunk+0x5/0x5f
    [  137.182693]  kernel_init_freeable+0xa64/0x1050
    [  137.182693]  ? __pfx_kernel_init+0x10/0x10
    [  137.182693]  kernel_init+0x24/0x160
    [  137.182693]  ? __switch_to_asm+0x3e/0x70
    [  137.182693]  ret_from_fork+0x40/0x80
    [  137.182693]  ? __pfx_kernel_init+0x10/0x10
    [  137.182693]  ret_from_fork_asm+0x1b/0x30
    [  137.182693]  </TASK>
    [  137.182693] Modules linked in:
    [  137.538483] ---[ end trace 0000000000000000 ]---
    
    Fixes: 1b05ece0c931 ("crypto: ccp - During shutdown, check SEV data pointer before using")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Kim Phillips <kim.phillips@amd.com>
    Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
    Acked-by: John Allen <john.allen@amd.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
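
In the trace above RBX is zero and RDI is 0xf0, which fits reading a field at offset 0xf0 through a NULL device pointer. The general fix pattern is to check the device pointer itself before its SEV data. A sketch with hypothetical stand-in types (not the real struct psp_device):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PSP device and its SEV state */
struct sev_state { int state; };
struct psp_dev { struct sev_state *sev_data; };

static struct psp_dev *psp_master; /* may be NULL, e.g. on early removal */

/* Returns 1 if shutdown work was done, 0 if there was nothing to do */
static int sev_shutdown_sketch(void)
{
	struct psp_dev *psp = psp_master;

	/* Check the device pointer before dereferencing its fields */
	if (!psp || !psp->sev_data)
		return 0;

	psp->sev_data->state = 0; /* placeholder for the real shutdown */
	return 1;
}
```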

devlink: Fix command annotation documentation

Author: Parav Pandit <parav@nvidia.com>
Date:   Tue Feb 6 18:17:17 2024 +0200

    devlink: Fix command annotation documentation
    
    [ Upstream commit 4ab18af47a2c2a80ac11674122935700caf80cc6 ]
    
    The command example string is not rendered as a command.
    Fix the command annotation.
    
    Fixes: a8ce7b26a51e ("devlink: Expose port function commands to control migratable")
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240206161717.466653-1-parav@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dm-crypt, dm-verity: disable tasklets

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Wed Jan 31 21:57:27 2024 +0100

    dm-crypt, dm-verity: disable tasklets
    
    commit 0a9bab391e336489169b95cb0d4553d921302189 upstream.
    
    Tasklets have an inherent problem with memory corruption. The function
    tasklet_action_common calls tasklet_trylock, then it calls the tasklet
    callback and then it calls tasklet_unlock. If the tasklet callback frees
    the structure that contains the tasklet or if it calls some code that may
    free it, tasklet_unlock will write into free memory.
    
    The commits 8e14f610159d and d9a02e016aaf try to fix it for dm-crypt, but
    it is not a sufficient fix and the data corruption can still happen [1].
    There is no fix for dm-verity and dm-verity will write into free memory
    with every tasklet-processed bio.
    
    Atomic workqueues will be implemented in kernel 6.9 [2]. They will
    have a better interface and will not suffer from the memory
    corruption problem.
    
    But we need something that stops the memory corruption now and that can be
    backported to the stable kernels. So, I'm proposing this commit that
    disables tasklets in both dm-crypt and dm-verity. This commit doesn't
    remove the tasklet support, because the tasklet code will be reused when
    atomic workqueues are implemented.
    
    [1] https://lore.kernel.org/all/d390d7ee-f142-44d3-822a-87949e14608b@suse.de/T/
    [2] https://lore.kernel.org/lkml/20240130091300.2968534-1-tj@kernel.org/
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Cc: stable@vger.kernel.org
    Fixes: 39d42fa96ba1b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
    Fixes: 5721d4e5a9cdb ("dm verity: Add optional "try_verify_in_tasklet" feature")
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm: limit the number of targets and parameter size area

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Tue Jan 9 15:57:56 2024 +0100

    dm: limit the number of targets and parameter size area
    
    commit bd504bcfec41a503b32054da5472904b404341a4 upstream.
    
    The kvmalloc function fails with a warning if the size is larger than
    INT_MAX. The warning was triggered by a syscall testing robot.
    
    In order to avoid the warning, this commit limits the number of targets to
    1048576 and the size of the parameter area to 1073741824.
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
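
The two limits quoted above are powers of two: 1048576 is 2^20 targets and 1073741824 is 2^30 bytes (1 GiB), both below INT_MAX (2147483647). A sketch of the resulting validation, with hypothetical macro and function names:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits from the commit message; the names are hypothetical */
#define MAX_TARGETS    1048576u    /* 2^20 */
#define MAX_PARAM_AREA 1073741824u /* 2^30 bytes = 1 GiB */

/* Reject table loads before any large kvmalloc() is attempted */
static bool dm_table_size_ok(uint32_t target_count, uint32_t data_size)
{
	return target_count <= MAX_TARGETS && data_size <= MAX_PARAM_AREA;
}
```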

docs: kernel_feat.py: fix build error for missing files

Author: Vegard Nossum <vegard.nossum@oracle.com>
Date:   Mon Feb 5 18:51:26 2024 +0100

    docs: kernel_feat.py: fix build error for missing files
    
    commit c23de7ceae59e4ca5894c3ecf4f785c50c0fa428 upstream.
    
    If the directory passed to the '.. kernel-feat::' directive does not
    exist or the get_feat.pl script does not find any files to extract
    features from, Sphinx will report the following error:
    
        Sphinx parallel build error:
        UnboundLocalError: local variable 'fname' referenced before assignment
        make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2
    
    This is due to how I changed the script in c48a7c44a1d0 ("docs:
    kernel_feat.py: fix potential command injection"). Before that, the
    filename passed along to self.nestedParse() in this case was weirdly
    just the whole get_feat.pl invocation.
    
    We can fix it by doing what kernel_abi.py does -- just pass
    self.arguments[0] as 'fname'.
    
    Fixes: c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection")
    Cc: Justin Forbes <jforbes@fedoraproject.org>
    Cc: Salvatore Bonaccorso <carnil@debian.org>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
    Link: https://lore.kernel.org/r/20240205175133.774271-2-vegard.nossum@oracle.com
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Documentation/arch/ia64/features.rst: fix kernel-feat directive

Author: Vegard Nossum <vegard.nossum@oracle.com>
Date:   Mon Feb 5 11:39:59 2024 +0100

    Documentation/arch/ia64/features.rst: fix kernel-feat directive
    
    My mainline commit c48a7c44a1d0 ("docs: kernel_feat.py: fix potential
    command injection") contains a bug which can manifest like this when
    building the documentation:
    
        Sphinx parallel build error:
        UnboundLocalError: local variable 'fname' referenced before assignment
        make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2
    
    However, this only appears when there exists a '.. kernel-feat::'
    directive that points to a non-existent file, which isn't the case in
    mainline.
    
    When this commit was backported to stable 6.6, it didn't change
    Documentation/arch/ia64/features.rst since ia64 was removed in 6.7 in
    commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"). This
    led to the build failure seen above -- but only in stable kernels.
    
    This patch fixes the backport and should only be applied to kernels where
    Documentation/arch/ia64/features.rst exists and commit c48a7c44a1d0 has
    also been applied.
    
    A second patch will follow to fix kernel_feat.py in mainline so that it
    doesn't error out when the '.. kernel-feat::' directive points to a
    nonexistent file.
    
    Link: https://lore.kernel.org/all/ZbkfGst991YHqJHK@fedora64.linuxtx.org/
    Fixes: e961f8c6966a ("docs: kernel_feat.py: fix potential command injection") # stable 6.6.15
    Reported-by: Justin Forbes <jforbes@fedoraproject.org>
    Reported-by: Salvatore Bonaccorso <carnil@debian.org>
    Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

driver core: Fix device_link_flag_is_sync_state_only() [+ + +]

Author: Saravana Kannan <saravanak@google.com>
Date:   Fri Feb 2 01:56:33 2024 -0800

    driver core: Fix device_link_flag_is_sync_state_only()
    
    commit 7fddac12c38237252431d5b8af7b6d5771b6d125 upstream.
    
    device_link_flag_is_sync_state_only() correctly returns true on the flags
    of an existing device link that only implements sync_state() functionality.
    However, it incorrectly and confusingly returns false if it's called with
    DL_FLAG_SYNC_STATE_ONLY.
    
    This bug doesn't manifest in any of the existing calls to this function,
    but fix this confusing behavior to avoid future bugs.
    
    Fixes: 67cad5c67019 ("driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links")
    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Tested-by: Xu Yang <xu.yang_2@nxp.com>
    Link: https://lore.kernel.org/r/20240202095636.868578-2-saravanak@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

driver core: fw_devlink: Improve detection of overlapping cycles [+ + +]

Author: Saravana Kannan <saravanak@google.com>
Date:   Fri Feb 2 01:56:34 2024 -0800

    driver core: fw_devlink: Improve detection of overlapping cycles
    
    [ Upstream commit 6442d79d880cf7a2fff18779265d657fef0cce4c ]
    
    fw_devlink can detect most overlapping/intersecting cycles. However, it was
    missing a few corner cases because of incorrect optimization logic that
    tried to avoid repeating cycle detection for devices that are already
    marked as part of a cycle.
    
    Here's an example provided by Xu Yang (edited for clarity):
    
                        usb
                      +-----+
       tcpc           |     |
      +-----+         |  +--|
      |     |----------->|EP|
      |--+  |         |  +--|
      |EP|<-----------|     |
      |--+  |         |  B  |
      |     |         +-----+
      |  A  |            |
      +-----+            |
         ^     +-----+   |
         |     |     |   |
         +-----|  C  |<--+
               |     |
               +-----+
               usb-phy
    
    Node A (tcpc) will be populated as device 1-0050.
    Node B (usb) will be populated as device 38100000.usb.
    Node C (usb-phy) will be populated as device 381f0040.usb-phy.
    
    The description below uses the notation:
    consumer --> supplier
    child ==> parent
    
    1. Node C is populated as device C. No cycles detected because cycle
       detection is only run when a fwnode link is converted to a device link.
    
    2. Node B is populated as device B. As we convert B --> C into a device
       link, we run cycle detection and find and mark the device link/fwnode
       link cycle:
       C --> A --> B.EP ==> B --> C
    
    3. Node A is populated as device A. As we convert C --> A into a device
       link, we see it's already part of a cycle (from step 2) and don't run
       cycle detection. Thus we miss detecting the cycle:
       A --> B.EP ==> B --> A.EP ==> A
    
    Looking at it another way, A depends on B in one way:
    A --> B.EP ==> B
    
    But B depends on A in two ways and we only detect the first:
    B --> C --> A
    B --> A.EP ==> A
    
    To detect both of these, we remove the incorrect optimization attempt in
    step 3 and run cycle detection even if the fwnode link from which the
    device link is being created has already been marked as part of a cycle.
    
    Reported-by: Xu Yang <xu.yang_2@nxp.com>
    Closes: https://lore.kernel.org/lkml/DU2PR04MB8822693748725F85DC0CB86C8C792@DU2PR04MB8822.eurprd04.prod.outlook.com/
    Fixes: 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust")
    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Tested-by: Xu Yang <xu.yang_2@nxp.com>
    Link: https://lore.kernel.org/r/20240202095636.868578-3-saravanak@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display: Add align done check [+ + +]

Author: Zhikai Zhai <zhikai.zhai@amd.com>
Date:   Mon Jan 29 17:02:18 2024 +0800

    drm/amd/display: Add align done check
    
    commit 94b38b895dec8c0ef093140a141e191b60ff614c upstream.
    
    [WHY]
    We double-check the link status if training is successful,
    but miss the lane alignment status.
    
    [HOW]
    Add the lane alignment status check.
    
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
    Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
    Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Fix MST Null Ptr for RV [+ + +]

Author: Fangzhi Zuo <jerry.zuo@amd.com>
Date:   Mon Jan 22 13:43:46 2024 -0500

    drm/amd/display: Fix MST Null Ptr for RV
    
    commit e6a7df96facdcf5b1f71eb3ec26f2f9f6ad61e57 upstream.
    
    This change tries to fix the following error, specific to the RV platform:
    
    BUG: kernel NULL pointer dereference, address: 0000000000000008
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 4 PID: 917 Comm: sway Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2
    Hardware name: LENOVO 20NKS01Y00/20NKS01Y00, BIOS R12ET61W(1.31 ) 07/28/2022
    RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper]
    Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8>
    RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224
    RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280
    RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850
    R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000
    R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224
    FS:  00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0
    Call Trace:
     <TASK>
     ? __die+0x23/0x70
     ? page_fault_oops+0x171/0x4e0
     ? plist_add+0xbe/0x100
     ? exc_page_fault+0x7c/0x180
     ? asm_exc_page_fault+0x26/0x30
     ? drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026]
     ? drm_dp_atomic_find_time_slots+0x28/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026]
     compute_mst_dsc_configs_for_link+0x2ff/0xa40 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
     ? fill_plane_buffer_attributes+0x419/0x510 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
     compute_mst_dsc_configs_for_state+0x1e1/0x250 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
     amdgpu_dm_atomic_check+0xecd/0x1190 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
     drm_atomic_check_only+0x5c5/0xa40
     drm_mode_atomic_ioctl+0x76e/0xbc0
     ? _copy_to_user+0x25/0x30
     ? drm_ioctl+0x296/0x4b0
     ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
     drm_ioctl_kernel+0xcd/0x170
     drm_ioctl+0x26d/0x4b0
     ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
     amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
     __x64_sys_ioctl+0x94/0xd0
     do_syscall_64+0x60/0x90
     ? do_syscall_64+0x6c/0x90
     entry_SYSCALL_64_after_hwframe+0x72/0xdc
    RIP: 0033:0x7f4dad17f76f
    Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c>
    RSP: 002b:00007ffd9ae859f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 000055e255a55900 RCX: 00007f4dad17f76f
    RDX: 00007ffd9ae85a90 RSI: 00000000c03864bc RDI: 000000000000000b
    RBP: 00007ffd9ae85a90 R08: 0000000000000003 R09: 0000000000000003
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c03864bc
    R13: 000000000000000b R14: 000055e255a7fc60 R15: 000055e255a01eb0
     </TASK>
    Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg joydev mousedev bnep >
     typec libphy k10temp ipmi_msghandler roles i2c_scmi acpi_cpufreq mac_hid nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_mas>
    CR2: 0000000000000008
    ---[ end trace 0000000000000000 ]---
    RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper]
    Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8>
    RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224
    RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280
    RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850
    R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000
    R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224
    FS:  00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0
    
    With a second DP monitor connected, drm_atomic_state in the dm atomic
    check sequence does not include the connector state for the
    old/existing/first DP monitor. In such a case, the DSC determination
    policy would hit a NULL pointer when it tries to iterate over the
    old/existing stream, which does not have a valid connector state attached
    to it. When that happens, dm atomic check should call
    drm_atomic_get_connector_state() to get a new connector state. Existing
    dm code already does that, except on RV: RV has no official DSC support,
    so .num_dsc is not defined in the dcn10 resource caps, which prevents
    drm_atomic_get_connector_state() from being called. So, skip the DSC
    determination policy for ASICs that don't have DSC support.
    
    Cc: stable@vger.kernel.org # 6.1+
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2314
    Reviewed-by: Wayne Lin <wayne.lin@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Increase frame-larger-than for all display_mode_vba files [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Mon Feb 5 14:54:05 2024 -0700

    drm/amd/display: Increase frame-larger-than for all display_mode_vba files
    
    commit e63e35f0164c43fbc1adb481d6604f253b9f9667 upstream.
    
    After a recent change in LLVM, allmodconfig (which has CONFIG_KCSAN=y
    and CONFIG_WERROR=y enabled) has a few new instances of
    -Wframe-larger-than for the mode support and system configuration
    functions:
    
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: error: stack frame size (2144) exceeds limit (2048) in 'dml20v2_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
       3393 | void dml20v2_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            |      ^
      1 error generated.
    
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3520:6: error: stack frame size (2192) exceeds limit (2048) in 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
       3520 | void dml21_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            |      ^
      1 error generated.
    
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: error: stack frame size (2128) exceeds limit (2048) in 'dml20_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
       3286 | void dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            |      ^
      1 error generated.
    
    Without the sanitizers enabled, there are no warnings.
    
    This was the catalyst for commit 6740ec97bcdb ("drm/amd/display:
    Increase frame warning limit with KASAN or KCSAN in dml2") and that same
    change was made to dml in commit 5b750b22530f ("drm/amd/display:
    Increase frame warning limit with KASAN or KCSAN in dml") but the
    frame_warn_flag variable was not applied to all files. Do so now to
    clear up the warnings and make all these files consistent.
    
    Cc: stable@vger.kernel.org
    Closes: https://github.com/ClangBuiltLinux/linux/issue/1990
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Preserve original aspect ratio in create stream [+ + +]

Author: Tom Chung <chiahsuan.chung@amd.com>
Date:   Tue Jan 30 15:34:08 2024 +0800

    drm/amd/display: Preserve original aspect ratio in create stream
    
    commit deb110292180cd501f6fde2a0178d65fcbcabb0c upstream.
    
    [Why]
    The original picture aspect ratio in the mode struct may be
    overwritten with wrong aspect ratio data in create_stream_for_sink().
    This results in a different VIC output and causes the HDMI compliance
    test to fail.
    
    [How]
    Preserve the original picture aspect ratio data while creating the
    stream.
    
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd: Don't init MEC2 firmware when it fails to load [+ + +]

Author: David McFarland <corngood@gmail.com>
Date:   Mon Jan 29 18:18:22 2024 -0400

    drm/amd: Don't init MEC2 firmware when it fails to load
    
    commit 8ef85a0ce24a6d9322dfa2a67477e473c3619b4f upstream.
    
    The same calls are made directly above, but only when the firmware
    loads and validates successfully.
    
    Cc: stable@vger.kernel.org
    Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init")
    Signed-off-by: David McFarland <corngood@gmail.com>
    Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution [+ + +]

Author: Thong <thong.thai@amd.com>
Date:   Tue Feb 6 18:05:16 2024 -0500

    drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
    
    commit 2f542421a47e8246e9b7d2c6508fe3a6e6c63078 upstream.
    
    Update the maximum resolution reported for HEVC encoding on VCN 4
    devices to reflect its 8K encoding capability.
    
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159
    Signed-off-by: Thong <thong.thai@amd.com>
    Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: Reset IH OVERFLOW_CLEAR bit [+ + +]

Author: Friedrich Vock <friedrich.vock@gmx.de>
Date:   Tue Jan 23 12:52:03 2024 +0100

    drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
    
    commit 7330256268664ea0a7dd5b07a3fed363093477dd upstream.
    
    Allows us to detect subsequent IH ring buffer overflows as well.
    
    Cc: Joshua Ashton <joshua@froggi.es>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/msm/gem: Fix double resv lock aquire [+ + +]

Author: Rob Clark <robdclark@chromium.org>
Date:   Wed Jan 31 07:08:54 2024 -0800

    drm/msm/gem: Fix double resv lock aquire
    
    [ Upstream commit 03facb39d6c6433a78d0f79c7a146b1e6a61943e ]
    
    Since commit 79e2cf2e7a19 ("drm/gem: Take reservation lock for vmap/vunmap
    operations"), the resv lock is already held in the prime vmap path, so
    don't try to grab it again.
    
    v2: This applies to vunmap path as well
    v3: Fix fixes commit
    
    Fixes: 79e2cf2e7a19 ("drm/gem: Take reservation lock for vmap/vunmap operations")
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Acked-by: Christian König <christian.koenig@amd.com>
    Patchwork: https://patchwork.freedesktop.org/patch/576642/
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm: Wire up tlb ops [+ + +]

Author: Rob Clark <robdclark@chromium.org>
Date:   Tue Feb 13 09:23:40 2024 -0800

    drm/msm: Wire up tlb ops
    
    commit 8c7bfd8262319fd3f127a5380f593ea76f1b88a2 upstream.
    
    The brute-force iommu_flush_iotlb_all() was good enough for unmap, but
    in some cases a map operation could require removing a table pte entry
    to replace it with a block entry. This also requires tlb invalidation.
    Missing this resulted in an obscure iova fault on what should have been
    a valid buffer address.
    
    Thanks to Robin Murphy for helping me understand the cause of the fault.
    
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: stable@vger.kernel.org
    Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable")
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Patchwork: https://patchwork.freedesktop.org/patch/578117/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/prime: Support page array >= 4GB [+ + +]

Author: Philip Yang <Philip.Yang@amd.com>
Date:   Mon Aug 21 16:02:01 2023 -0400

    drm/prime: Support page array >= 4GB
    
    commit b671cd3d456315f63171a670769356a196cf7fd0 upstream.
    
    Without an unsigned long typecast, the size is passed in as zero if the
    page array size is >= 4GB (nr_pages >= 0x100000), and the converted sg
    list then loses its first and last chunks.
    
    Signed-off-by: Philip Yang <Philip.Yang@amd.com>
    Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    CC: stable@vger.kernel.org
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230821200201.24685-1-Philip.Yang@amd.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/virtio: Set segment size for virtio_gpu device [+ + +]

Author: Sebastian Ott <sebott@redhat.com>
Date:   Tue Jan 23 19:14:14 2024 +0100

    drm/virtio: Set segment size for virtio_gpu device
    
    commit 9c64e749cebd9c2d3d55261530a98bcccb83b950 upstream.
    
    Set the segment size of the virtio_gpu device to the value
    used by the drm helpers when allocating sg lists to fix the
    following complaint from DMA_API debug code:
    
    DMA-API: virtio-pci 0000:07:00.0: mapping sg segment longer than
    device claims to support [len=262144] [max=65536]
    
    Cc: stable@vger.kernel.org
    Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
    Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
    Signed-off-by: Sebastian Ott <sebott@redhat.com>
    Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/7258a4cc-da16-5c34-a042-2a23ee396d56@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Clean up dentry ops and add revalidate function [+ + +]

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:57 2024 -0500

    eventfs: Clean up dentry ops and add revalidate function
    
    commit 8dce06e98c70a7fcbb4bca7d90faf40522e65c58 upstream.
    
    In order for the dentries to stay up-to-date with the eventfs changes,
    just add a 'd_revalidate' function that checks the 'is_freed' bit.
    
    Also, clean up the dentry release to actually use d_release() rather
    than the slightly odd d_iput() function. We don't care about the inode;
    all we want to do is get rid of the refcount to the eventfs data
    added by dentry->d_fsdata.
    
    It would probably be cleaner to make eventfs its own filesystem, or at
    least set its own dentry ops when looking up eventfs files.  But as it
    is, only eventfs dentries use d_fsdata, so we don't really need to split
    these things up by use.
    
    Another thing that might be worth doing is to make all eventfs lookups
    mark their dentries as not worth caching.  We could do that with
    d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
    
    As it is, the dentries are all freeable, but they only tend to get freed
    at memory pressure rather than more proactively.  But that's a separate
    issue.
    
    Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Delete eventfs_inode when the last dentry is freed [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:26 2024 -0500

    eventfs: Delete eventfs_inode when the last dentry is freed
    
    commit 020010fbfa202aa528a52743eba4ab0da3400a4e upstream.
    
    There exists a race between holding a reference of an eventfs_inode dentry
    and the freeing of the eventfs_inode. If user space has a dentry held long
    enough, it may still be able to access the dentry's eventfs_inode after it
    has been freed.
    
    To prevent this, have the eventfs_inode freed via the last dput() (or via
    RCU if the eventfs_inode does not have a dentry).
    
    This means reintroducing the eventfs_inode del_list field as a temporary
    place to put the eventfs_inode. It needs to be marked as freed (via the
    list), but the dentry must also be invalidated immediately, as callers of
    eventfs_remove_dir() expect that to have happened when it returns. But the
    dentry invalidation must not be done under the eventfs_mutex, so it must
    happen after the eventfs_inode is marked as freed (put on a deletion list).
    
    Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Ajay Kaher <akaher@vmware.com>
    Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Do ctx->pos update for all iterations in eventfs_iterate() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:44 2024 -0500

    eventfs: Do ctx->pos update for all iterations in eventfs_iterate()
    
    commit 1e4624eb5a0ecaae0d2c4e3019bece119725bb98 upstream.
    
    The ctx->pos was only updated when an entry was added, but the "skip to
    current pos" check (c--) happened on every loop iteration, whether or not
    the entry was added. This inconsistency caused readdir to return
    incorrect results.
    
    It was due to:
    
            for (i = 0; i < ei->nr_entries; i++) {
    
                    if (c > 0) {
                            c--;
                            continue;
                    }
    
                    mutex_lock(&eventfs_mutex);
                    /* If ei->is_freed then just bail here, nothing more to do */
                    if (ei->is_freed) {
                            mutex_unlock(&eventfs_mutex);
                            goto out;
                    }
                    r = entry->callback(name, &mode, &cdata, &fops);
                    mutex_unlock(&eventfs_mutex);
    
                    [..]
                    ctx->pos++;
            }
    
    But this can cause the iterator to return a file that was already read.
    That's because of the way the callback() works. Some events may not have
    all files, and the callback can return 0 to tell eventfs to skip the file
    for this directory.
    
    For instance, we have:
    
     # ls /sys/kernel/tracing/events/ftrace/function
    format  hist  hist_debug  id  inject
    
    and
    
     # ls /sys/kernel/tracing/events/sched/sched_switch/
    enable  filter  format  hist  hist_debug  id  inject  trigger
    
    Where the function directory is missing "enable", "filter" and
    "trigger". That's because the callback() for events has:
    
    static int event_callback(const char *name, umode_t *mode, void **data,
                              const struct file_operations **fops)
    {
            struct trace_event_file *file = *data;
            struct trace_event_call *call = file->event_call;
    
    [..]
    
            /*
             * Only event directories that can be enabled should have
             * triggers or filters, with the exception of the "print"
             * event that can have a "trigger" file.
             */
            if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
                    if (call->class->reg && strcmp(name, "enable") == 0) {
                            *mode = TRACE_MODE_WRITE;
                            *fops = &ftrace_enable_fops;
                            return 1;
                    }
    
                    if (strcmp(name, "filter") == 0) {
                            *mode = TRACE_MODE_WRITE;
                            *fops = &ftrace_event_filter_fops;
                            return 1;
                    }
            }
    
            if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) ||
                strcmp(trace_event_name(call), "print") == 0) {
                    if (strcmp(name, "trigger") == 0) {
                            *mode = TRACE_MODE_WRITE;
                            *fops = &event_trigger_fops;
                            return 1;
                    }
            }
    [..]
            return 0;
    }
    
    The function event has TRACE_EVENT_FL_IGNORE_ENABLE set.
    
    This means that the entries array elements for "enable", "filter" and
    "trigger" when called on the function event will have the callback return
    0 and not 1, to tell eventfs to skip these files for it.
    
    Because the "skip to current ctx->pos" check happened for all entries, but
    the ctx->pos++ only happened for entries that exist, it would confuse the
    reading of the directory, causing:
    
     # ls /sys/kernel/tracing/events/ftrace/function/
    format  hist  hist  hist_debug  hist_debug  id  inject  inject
    
    The missing "enable", "filter" and "trigger" caused ls to show "hist",
    "hist_debug" and "inject" twice.
    
    Update the ctx->pos for every iteration to keep its update and the "skip"
    update consistent. This also means that on error, the ctx->pos needs to be
    decremented if it was incremented without adding something.
    
    Link: https://lore.kernel.org/all/20240104150500.38b15a62@gandalf.local.home/
    Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.172295263@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: 493ec81a8fb8e ("eventfs: Stop using dcache_readdir() for getdents()")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Do not allow NULL parent to eventfs_start_creating() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:33 2024 -0500

    eventfs: Do not allow NULL parent to eventfs_start_creating()
    
    commit fc4561226feaad5fcdcb55646c348d77b8ee69c5 upstream.
    
    The eventfs directory is dynamically created via the meta data supplied by
    the existing trace events. All files and directories in eventfs have a
    parent. Do not allow NULL to be passed into eventfs_start_creating() as
    the parent because that should never happen. Warn if it does.
    
    Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Do not create dentries nor inodes in iterate_shared [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:48 2024 -0500

    eventfs: Do not create dentries nor inodes in iterate_shared
    
    commit 852e46e239ee6db3cd220614cf8bce96e79227c2 upstream.
    
    The original eventfs code added a wrapper around the dcache_readdir open
    callback and created all the dentries and inodes at open, incrementing
    their ref counts. A wrapper was added around the dcache_readdir release
    function to decrement all the ref counts of those created inodes and
    dentries. But this proved to be buggy[1]: when a kprobe was created
    during a dir read, it would create a dentry between the open and the
    release, and because the release would decrement the ref counts of all
    files and directories, that would include the kprobe directory, whose ref
    count had not been incremented in open. This would cause the ref count
    to go negative and later crash the kernel.
    
    To solve this, the dentries and inodes that were created and had their ref
    count upped in open needed to be saved. That list needed to be passed from
    the open to the release, so that the release would only decrement the ref
    counts of the entries that were incremented in the open.
    
    Unfortunately, the dcache_readdir logic was already using
    file->private_data, which is the only field that can be used to pass
    information from the open to the release. So eventfs created another
    descriptor with a void pointer to save the dcache_readdir pointer, and
    wrapped all the callbacks so that it could save the list of entries that
    had their ref counts incremented in the open and pass it to the release.
    The wrapped callbacks would just put back the dcache_readdir pointer and
    call the functions it used, so that it could still use its data[2].
    
    But Linus had an issue with this "hijacking" of file->private_data
    (unfortunately this discussion was on a security list, so there is no
    public link). We finally agreed to do everything within the
    iterate_shared callback and leave dcache_readdir out of it[3]. All the
    information needed for getdents() could be created there.
    
    But this ended up being buggy too[4]. The iterate_shared callback was not
    the right place to create the dentries and inodes. Even Christian Brauner
    had issues with that[5].
    
    The next attempt went back to creating the inodes and dentries at open,
    storing the information in an array in file->private_data, and passing
    that information to the other callbacks[6].
    
    The difference between that and the original method is that it does
    not use dcache_readdir. It also does not increment the ref counts of the
    dentries and pass them along. Instead, it creates an array of a
    structure that saves each dentry's name and inode number. That
    information is used in the iterate_shared callback, and the array is
    freed in the dir release. The dentries and inodes created in the open
    are not used by the iterate_shared or release callbacks; only their
    names and inode numbers are.
    
    Linus did not like that either[7] and just wanted to remove the dentries
    being created in iterate_shared and use the hard coded inode numbers.
    
    [ All this while Linus enjoyed an unexpected vacation during the merge
      window due to lack of power. ]
    
    [1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
    [2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home/
    [3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org/
    [4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/
    [5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/
    [6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/
    [7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Do not invalidate dentry in create_file/dir_dentry()

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:30 2024 -0500

    eventfs: Do not invalidate dentry in create_file/dir_dentry()
    
    commit 71cade82f2b553a74d046c015c986f2df165696f upstream.
    
    When the directory is removed, simple_recursive_removal() is called on
    the entire eventfs subsystem, and it performs d_invalidate() on all of
    its dentries. There's no need to do cleanups when a dentry is being
    created while the directory is being deleted.
    
    As dentries are cleaned up by simple_recursive_removal(), calling
    d_invalidate() in these functions as well invalidates the dentry twice
    and crashes the kernel.
    
    Link: https://lore.kernel.org/all/20231116123016.140576-1-naresh.kamboju@linaro.org/
    Link: https://lkml.kernel.org/r/20231120235154.422970988@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 407c6726ca71 ("eventfs: Use simple_recursive_removal() to clean up dentries")
    Reported-by: Mark Rutland <mark.rutland@arm.com>
    Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix bitwise fields for "is_events"

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:39 2024 -0500

    eventfs: Fix bitwise fields for "is_events"
    
    commit fd56cd5f6d76e93356d9520cf9dabffe1e3d1aa0 upstream.
    
    A flag was needed to denote which eventfs_inode was the "events"
    directory, so a bit was taken from the "nr_entries" field, as there are
    not that many entries, and 2^30 is plenty. But the bit width of
    nr_entries was not reduced to account for the bit taken from it, which
    adds an unnecessary integer to the structure.
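    A minimal sketch of the layout issue, assuming a 32-bit unsigned int
    and the usual GCC/Clang bitfield packing. The two struct names are
    hypothetical and only model the flag fields, not the full
    eventfs_inode: with 1 + 1 + 31 bits the fields no longer fit in one
    unsigned int, while 1 + 1 + 30 bits fit exactly.

```c
#include <assert.h>

/* Before the fix (modeled): 1 + 1 + 31 = 33 bits, so nr_entries cannot
 * fit in the remaining 30 bits and starts a second unsigned int. */
struct flags_before {
	unsigned int is_freed:1;
	unsigned int is_events:1;
	unsigned int nr_entries:31;
};

/* After the fix (modeled): 1 + 1 + 30 = 32 bits, one unsigned int.
 * nr_entries can still hold values up to 2^30 - 1. */
struct flags_after {
	unsigned int is_freed:1;
	unsigned int is_events:1;
	unsigned int nr_entries:30;
};
```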
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240102151832.7ca87275@gandalf.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix events beyond NAME_MAX blocking tasks

Author: Beau Belgrave <beaub@linux.microsoft.com>
Date:   Tue Feb 6 07:09:35 2024 -0500

    eventfs: Fix events beyond NAME_MAX blocking tasks
    
    commit 5eaf7f0589c0d88178f0fbeebe0e0b7108258707 upstream.
    
    Eventfs uses simple_lookup(); however, it fails if the name of the entry
    is longer than NAME_MAX. When this error is encountered, eventfs still
    tries to create the dentry instead of skipping its creation. When the
    dentry creation is attempted in this state, d_wait_lookup() loops
    forever, waiting for the lookup to be removed.
    
    Fix eventfs to return the error in simple_lookup() back to the caller
    instead of continuing to try to create the dentry.
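    The shape of the fix, as a hedged user-space sketch: lookup_name() and
    lookup_and_create() are hypothetical, and the NAME_MAX check only
    mimics the one simple_lookup() performs. The point is that the error
    is returned before any dentry is created.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* Linux value; only needed if <limits.h> omits it */
#endif

static int dentries_created;	/* counts stand-in dentry creations */

/* Hypothetical lookup: reject over-long names like simple_lookup() does. */
static int lookup_name(const char *name)
{
	if (strlen(name) > NAME_MAX)
		return -ENAMETOOLONG;
	return 0;
}

/* Fixed flow: on lookup failure, return the error to the caller instead
 * of continuing on to create the dentry. */
static int lookup_and_create(const char *name)
{
	int ret = lookup_name(name);

	if (ret)
		return ret;
	dentries_created++;	/* stand-in for creating the dentry */
	return 0;
}
```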
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231210213534.497-1-beaub@linux.microsoft.com
    
    Fixes: 63940449555e ("eventfs: Implement eventfs lookup, read, open functions")
    Link: https://lore.kernel.org/linux-trace-kernel/20231208183601.GA46-beaub@linux.microsoft.com/
    Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix failure path in eventfs_create_events_dir()

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:15 2024 -0500

    eventfs: Fix failure path in eventfs_create_events_dir()
    
    commit 7e8ad67c9b5c11e990c320ed7e7563f2301672a7 upstream.
    
    The failure path for the ei allocation jumps to cleanup code that
    dereferences ei. Add another label that skips over the ei dereferences
    and does the rest of the cleanup.
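    The pattern is the usual kernel goto-cleanup ladder. A minimal
    user-space sketch, assuming a simplified ei with one extra allocation
    (struct ei_model and the simulate_oom parameter are hypothetical): the
    label taken when ei itself failed to allocate sits below every line
    that dereferences ei.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an eventfs_inode. */
struct ei_model {
	int *entries;
};

/* Returns 0 on success or -ENOMEM. simulate_oom forces the ei
 * allocation to fail so the early error path can be exercised. */
static int create_events_dir(struct ei_model **out, int simulate_oom)
{
	struct ei_model *ei;
	void *dentry = malloc(16);	/* stand-in for the dentry */

	if (!dentry)
		return -ENOMEM;

	ei = simulate_oom ? NULL : calloc(1, sizeof(*ei));
	if (!ei)
		goto fail_ei;		/* ei is NULL: skip code that uses it */

	ei->entries = calloc(4, sizeof(*ei->entries));
	if (!ei->entries)
		goto fail;

	free(dentry);			/* this model has no further use for it */
	*out = ei;
	return 0;

 fail:
	free(ei->entries);		/* safe: ei is valid on this path */
	free(ei);
 fail_ei:
	free(dentry);			/* cleanup that never dereferences ei */
	return -ENOMEM;
}
```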
    
    Link: https://lore.kernel.org/all/70e7bace-561c-95f-1117-706c2c220bc@inria.fr/
    Link: https://lore.kernel.org/linux-trace-kernel/20231019204132.6662fef0@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: Julia Lawall <julia.lawall@inria.fr>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix file and directory uid and gid ownership

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:37 2024 -0500

    eventfs: Fix file and directory uid and gid ownership
    
    commit 7e8358edf503e87236c8d07f69ef0ed846dd5112 upstream.
    
    It was reported that when mounting the tracefs file system with a gid
    other than root, the ownership did not carry down to the eventfs directory
    due to the dynamic nature of it.
    
    A fix was done to solve this, but it had two issues.
    
    (a) If the attr passed into update_inode_attr() was NULL, it did
        nothing. This is the case for files that have not had a chown or
        chgrp done to themselves or to any of their sibling files, as the
        attr is allocated for all children when any one of them needs it.
    
     # umount /sys/kernel/tracing
     # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt
    
     # ls -ld /mnt/events/sched
    drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/
    
     # ls -ld /mnt/events/sched/sched_switch
    drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/
    
    But when checking the files:
    
     # ls -l /mnt/events/sched/sched_switch
    total 0
    -rw-r----- 1 root root 0 Dec 21 13:12 enable
    -rw-r----- 1 root root 0 Dec 21 13:12 filter
    -r--r----- 1 root root 0 Dec 21 13:12 format
    -r--r----- 1 root root 0 Dec 21 13:12 hist
    -r--r----- 1 root root 0 Dec 21 13:12 id
    -rw-r----- 1 root root 0 Dec 21 13:12 trigger
    
    (b) When the attr did not specify the UID or GID, it defaulted to the
        parent's uid or gid. This is incorrect, as changing the parent's
        uid or gid would then implicitly change all of its children.
    
     # chgrp tracing /mnt/events/timer
    
     # ls -ld /mnt/events/timer
    drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer
    
     # ls -l /mnt/events/timer
    total 0
    -rw-r----- 1 root root    0 Dec 21 14:35 enable
    -rw-r----- 1 root root    0 Dec 21 14:35 filter
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init
    drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start
    
    At first it was thought that this could easily be fixed by making the
    default ownership that of the superblock at mount time. But this does
    not handle the case of:
    
     # chgrp tracing instances
     # mkdir instances/foo
    
    If the superblock were used, the group ownership would be whatever it
    was at mount time, when it should instead be "tracing".
    
    Instead, set a flag on the eventfs_inode that belongs to the top-level
    eventfs directory ("events").
    
    Since the "events" directory's dentry and inode are never freed, it does
    not need to use its attr field to restore its mode and ownership. Use
    this eventfs_inode's attr as the default ownership for all the files and
    directories underneath it.
    
    When the events eventfs_inode is created, it sets its ownership to its
    parent uid and gid. As the events directory is created at boot up before
    it gets mounted, this will always be uid=0 and gid=0. If it's created via
    an instance, then it will take the ownership of the instance directory.
    
    When the file system is mounted, it will update all the gids if one is
    specified. This will have a callback to update the events eventfs_inode's
    default entries.
    
    When a file or directory is created under the events directory, it will
    walk the ei->dentry parents until it finds the eventfs_inode that belongs
    to the events directory to retrieve the default uid and gid values.
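    The resulting lookup rule can be sketched in user-space C. This is an
    approximation under assumptions: struct ei_node, the SAVE_UID/SAVE_GID
    flags and effective_owner() are hypothetical names, not the eventfs
    API. An explicitly set uid or gid wins; otherwise the value comes from
    the nearest "events" ancestor rather than the direct parent.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flags: which ownership fields the user set explicitly. */
#define SAVE_UID 0x1
#define SAVE_GID 0x2

struct ei_node {
	struct ei_node *parent;
	unsigned int is_events:1;	/* set only on the "events" directory */
	unsigned int saved;		/* SAVE_UID / SAVE_GID bits */
	uid_t uid;
	gid_t gid;
};

/* Walk up until the "events" directory, whose uid/gid are the defaults. */
static const struct ei_node *find_events(const struct ei_node *ei)
{
	while (ei && !ei->is_events)
		ei = ei->parent;
	return ei;
}

/* Explicit chown/chgrp wins; otherwise use the "events" defaults. */
static void effective_owner(const struct ei_node *ei, uid_t *uid, gid_t *gid)
{
	const struct ei_node *events = find_events(ei);

	*uid = (ei->saved & SAVE_UID) ? ei->uid : (events ? events->uid : 0);
	*gid = (ei->saved & SAVE_GID) ? ei->gid : (events ? events->gid : 0);
}
```

    Under this rule, a chgrp on a directory such as timer no longer leaks
    into its children; they keep the "events" defaults unless they were
    changed themselves.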
    
    Link: https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=Gtw+AnwScyLhpFswrQ84PjhoGsg@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20231221190757.7eddbca9@gandalf.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Dongliang Cui <cuidongliang390@gmail.com>
    Cc: Hongyu Jin <hongyu.jin@unisoc.com>
    Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to parent uid and gid")
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix kerneldoc of eventfs_remove_rec()

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:20 2024 -0500

    eventfs: Fix kerneldoc of eventfs_remove_rec()
    
    commit 9037caa09ed345b35325200f0e4acf5a94ae0a65 upstream.
    
    The kerneldoc comment above eventfs_remove_rec() was missing some
    parameters. Also, rephrase the description to improve its grammar.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231030121523.0b2225a7@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202310052216.4SgqasWo-lkp@intel.com/
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix typo in eventfs_inode union comment

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:18 2024 -0500

    eventfs: Fix typo in eventfs_inode union comment
    
    commit 29e06c10702e81a7d0b75020ca514d2f2962704a upstream.
    
    It's eventfs_inode not eventfs_indoe. There's no deer involved!
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231024131024.5634c743@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix WARN_ON() in create_file_dentry()

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:17 2024 -0500

    eventfs: Fix WARN_ON() in create_file_dentry()
    
    commit a9de4eb15ad430fe45747c211e367da745a90093 upstream.
    
    As the comment right above a WARN_ON() in create_file_dentry() states:
    
      * Note, with the mutex held, the e_dentry cannot have content
      * and the ei->is_freed be true at the same time.
    
    But the WARN_ON() only has:
    
      WARN_ON_ONCE(ei->is_freed);
    
    To match the comment (and to do what it should actually do), it should
    be:
    
      dentry = *e_dentry;
      WARN_ON_ONCE(dentry && ei->is_freed);
    
    Also, in that case, set dentry to NULL (although it should never happen).
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231024123628.62b88755@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Get rid of dentry pointers without refcounts

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:58 2024 -0500

    eventfs: Get rid of dentry pointers without refcounts
    
    commit 43aa6f97c2d03a52c1ddb86768575fc84344bdbb upstream.
    
    The eventfs inode had pointers to dentries (and child dentries) without
    actually holding a refcount on said pointer.  That is fundamentally
    broken, and while eventfs tried to then maintain coherence with dentries
    going away by hooking into the '.d_iput' callback, that doesn't actually
    work since it's not ordered wrt lookups.
    
    There were two reasons why eventfs tried to keep a pointer to a dentry:
    
     - the creation of an 'events' directory would actually have a stable
       dentry pointer that it created with tracefs_start_creating().
    
       And it needed that dentry when tearing it all down again in
       eventfs_remove_events_dir().
    
       This use is actually ok, because the special top-level events
       directory dentries are actually stable, not just a temporary cache of
       the eventfs data structures.
    
     - the 'eventfs_inode' (aka ei) needs to stay around as long as there
       are dentries that refer to it.
    
       It then used these dentry pointers as a replacement for doing
       reference counting: it would try to make sure that there was only
       ever one dentry associated with an event_inode, and keep a child
       dentry array around to see which dentries might still refer to the
       parent ei.
    
    This gets rid of the invalid dentry pointer use, and renames the one
    valid case to a different name to make it clear that it's not just any
    random dentry.
    
    The magic child dentry array that is kind of a "reverse reference list"
    is simply replaced by having child dentries take a ref to the ei, as do
    the directory dentries. That makes the broken use case go away.
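    The replacement idea is plain reference counting. A minimal sketch,
    assuming a single-threaded model with a plain int (the kernel would
    use an atomic counter such as kref); ei_get()/ei_put() and struct
    dentry_model are hypothetical names. Each dentry pins the ei it refers
    to, so the ei lives until the last dentry releases it.

```c
#include <assert.h>
#include <stdlib.h>

static int ei_frees;	/* counts frees so a caller can observe them */

struct ei_ref {
	int refcount;
};

static struct ei_ref *ei_alloc(void)
{
	struct ei_ref *ei = malloc(sizeof(*ei));

	if (ei)
		ei->refcount = 1;	/* reference held by the eventfs tree */
	return ei;
}

static struct ei_ref *ei_get(struct ei_ref *ei)
{
	ei->refcount++;
	return ei;
}

/* Dropping the last reference frees the ei; no dentry back-pointers
 * are needed to decide when that happens. */
static void ei_put(struct ei_ref *ei)
{
	if (--ei->refcount == 0) {
		free(ei);
		ei_frees++;
	}
}

/* Hypothetical dentry that pins the ei it refers to. */
struct dentry_model {
	struct ei_ref *ei;
};

static void dentry_init(struct dentry_model *d, struct ei_ref *ei)
{
	d->ei = ei_get(ei);
}

static void dentry_release(struct dentry_model *d)	/* like a release hook */
{
	ei_put(d->ei);
	d->ei = NULL;
}
```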
    
    Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Have a free_ei() that just frees the eventfs_inode

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:22 2024 -0500

    eventfs: Have a free_ei() that just frees the eventfs_inode
    
    commit db3a397209b00d2e4e0a068608e5c546fc064b82 upstream.
    
    As the eventfs_inode is freed in two different locations, make a helper
    function free_ei() to make sure all the allocated fields of the
    eventfs_inode are freed.
    
    This requires renaming the existing free_ei(), which is called by the
    SRCU handler, to free_rcu_ei(), and having free_ei() just do the
    freeing, with free_rcu_ei() calling it.
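    A sketch of the helper split in user-space C, assuming a simplified
    structure with two allocated fields (struct ei_free_model and struct
    rcu_head_model are hypothetical; in the kernel the callback would be
    queued with call_srcu() rather than called directly). free_ei() is the
    single place that frees everything, and the RCU-style callback only
    recovers the object with container_of() and delegates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_head. */
struct rcu_head_model {
	void (*func)(struct rcu_head_model *head);
};

struct ei_free_model {
	char *entry_attrs;		/* separately allocated fields */
	char *name;
	struct rcu_head_model rcu;	/* embedded callback head */
};

static int frees;

/* The one place that knows how to free every allocated field. */
static void free_ei(struct ei_free_model *ei)
{
	free(ei->entry_attrs);
	free(ei->name);
	free(ei);
	frees++;
}

/* RCU-style callback: find the containing object, then reuse free_ei(). */
static void free_rcu_ei(struct rcu_head_model *head)
{
	free_ei(container_of(head, struct ei_free_model, rcu));
}
```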
    
    Link: https://lkml.kernel.org/r/20231101172649.265214087@goodmis.org
    
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Have event files and directories default to parent uid and gid

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:36 2024 -0500

    eventfs: Have event files and directories default to parent uid and gid
    
    commit 0dfc852b6fe3cbecbea67332a0dce2bebeba540d upstream.
    
    Dongliang reported:
    
      I found that in the latest version, the nodes of tracefs have been
      changed to dynamically created.
    
      This has caused me to encounter a problem where the gid I specified in
      the mounting parameters cannot apply to all files, as in the following
      situation:
    
      /data/tmp/events # mount | grep tracefs
      tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012)
    
      gid 3012 = readtracefs
    
      /data/tmp # ls -lh
      total 0
      -r--r-----   1 root readtracefs 0 1970-01-01 08:00 README
      -r--r-----   1 root readtracefs 0 1970-01-01 08:00 available_events
    
      ums9621_1h10:/data/tmp/events # ls -lh
      total 0
      drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer
      drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc
    
      It will prevent certain applications from accessing tracefs properly, I
      try to avoid this issue by making the following modifications.
    
    To fix this, have the files created default to taking the ownership of
    the parent dentry unless the ownership was previously set by the user.
    
    Link: https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang.cui@unisoc.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20231220105017.1489d790@gandalf.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Hongyu Jin <hongyu.jin@unisoc.com>
    Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode")
    Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reported-by: Dongliang Cui <cuidongliang390@gmail.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:43 2024 -0500

    eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set
    
    commit e109deadb73318cf4a3bd61287d969f705df278f upstream.
    
    If ei->is_freed is set in eventfs_iterate(), it means that the directory
    being iterated on is in the process of being freed. Exit the loop
    immediately whenever that is detected, and handle the return value of
    entry->callback() separately from ei->is_freed.
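    The loop shape can be sketched as follows; struct iter_dir, iterate()
    and demo_callback() are hypothetical, and locking is omitted. In this
    sketch the is_freed test ends the loop outright, while a callback
    result of zero or less only skips that one entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a directory being iterated. */
struct iter_dir {
	bool is_freed;
	size_t nr_entries;
	int (*callback)(struct iter_dir *dir, size_t i);
};

/* Returns how many entries were emitted. */
static size_t iterate(struct iter_dir *dir)
{
	size_t emitted = 0;

	for (size_t i = 0; i < dir->nr_entries; i++) {
		int r;

		if (dir->is_freed)	/* directory is going away: stop now */
			break;
		r = dir->callback(dir, i);
		if (r <= 0)		/* separate case: skip just this entry */
			continue;
		emitted++;
	}
	return emitted;
}

/* Demo callback: declines entry 0 and simulates removal at entry 2. */
static int demo_callback(struct iter_dir *dir, size_t i)
{
	if (i == 2)
		dir->is_freed = true;
	return i == 0 ? 0 : 1;
}
```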
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.016261289@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Have the inodes all for files and directories all be the same

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:47 2024 -0500

    eventfs: Have the inodes all for files and directories all be the same
    
    commit 53c41052ba3121761e6f62a813961164532a214f upstream.
    
    The dentries and inodes are created in readdir for the sole purpose of
    getting a consistent inode number. Linus stated that this is
    unnecessary, and that all inodes can have the same inode number: for a
    virtual file system, inode numbers are pretty meaningless.
    
    Instead use a single unique inode number for all files and one for all
    directories.
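    In code, the idea reduces to something like the following sketch. The
    constant values and eventfs_ino() are hypothetical; the point is only
    that readdir can report an inode number without creating any dentry
    or inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values: one number shared by every eventfs file and
 * another shared by every eventfs directory. Real values may differ. */
#define DEMO_FILE_INO 1000UL
#define DEMO_DIR_INO  1001UL

/* Answerable during readdir with no dentry or inode in existence. */
static unsigned long eventfs_ino(bool is_dir)
{
	return is_dir ? DEMO_DIR_INO : DEMO_FILE_INO;
}
```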
    
    Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/
    Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Hold eventfs_mutex when calling callback functions

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:25 2024 -0500

    eventfs: Hold eventfs_mutex when calling callback functions
    
    commit 44365329f8219fc379097c2c9a75ff53f123764f upstream.
    
    The callback function that is used to create inodes and dentries is not
    protected by anything, and the data that is passed to it could become
    stale. After eventfs_remove_dir() is called by the tracing system, it is
    free to remove the events that are associated with that directory.
    Unfortunately, that means the callbacks must not be called after that.
    
         CPU0                               CPU1
         ----                               ----
     eventfs_root_lookup() {
                                     eventfs_remove_dir() {
                                          mutex_lock(&event_mutex);
                                          ei->is_freed = set;
                                          mutex_unlock(&event_mutex);
                                     }
                                     kfree(event_call);
    
        for (...) {
          entry = &ei->entries[i];
          r = entry->callback() {
              call = data;          // call == event_call above
              if (call->flags ...)
    
     [ USE AFTER FREE BUG ]
    
    The safest way to protect this is to wrap the callback with:
    
     mutex_lock(&eventfs_mutex);
     if (!ei->is_freed)
         r = entry->callback();
     else
         r = -1;
     mutex_unlock(&eventfs_mutex);
    
    This will make sure that the callback will not be called after it is
    freed. But now it needs to be known that the callback is called while
    holding internal eventfs locks, and that it must not call back into the
    eventfs / tracefs system. There's no reason it should anyway, but document
    that as well.
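    A user-space analogue of the wrapped callback, using a POSIX mutex in
    place of eventfs_mutex (all names here are hypothetical; build with
    -pthread). Because the removal path sets is_freed under the same lock,
    once remove_dir() returns, any callback that was running has finished
    and no new one will start.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_eventfs_mutex = PTHREAD_MUTEX_INITIALIZER;

struct cb_dir {
	bool is_freed;
	int (*callback)(void *data);
	void *data;
};

/* Check is_freed and run the callback under one lock hold. */
static int call_callback(struct cb_dir *dir)
{
	int r;

	pthread_mutex_lock(&demo_eventfs_mutex);
	if (!dir->is_freed)
		r = dir->callback(dir->data);	/* must not take the mutex */
	else
		r = -1;
	pthread_mutex_unlock(&demo_eventfs_mutex);
	return r;
}

/* Removal path: mark freed under the same lock the callers use. */
static void remove_dir(struct cb_dir *dir)
{
	pthread_mutex_lock(&demo_eventfs_mutex);
	dir->is_freed = true;
	pthread_mutex_unlock(&demo_eventfs_mutex);
}

static int demo_cb(void *data)
{
	return *(int *)data;
}
```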
    
    Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=Y50WpWAE7_Q@mail.gmail.com/
    Link: https://lkml.kernel.org/r/20231101172649.906696613@goodmis.org
    
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Initialize the tracefs inode properly

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:53 2024 -0500

    eventfs: Initialize the tracefs inode properly
    
    commit 4fa4b010b83fb2f837b5ef79e38072a79e96e4f1 upstream.
    
    The tracefs-specific fields in the inode were not initialized before the
    inode was exposed to others through the dentry with 'd_instantiate()'.
    
    Move the field initializations up to before the d_instantiate.
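    The general rule is "initialize, then publish". A hedged user-space
    analogue with C11 atomics: the published pointer stands in for
    d_instantiate() making the inode reachable (the kernel relies on its
    own locking there, not on this code), and struct ti_model is
    hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the tracefs-specific inode fields. */
struct ti_model {
	int flags;
	void *private;
};

/* Shared slot that other threads read to find the object. */
static _Atomic(struct ti_model *) published;

static void publish_inode(struct ti_model *ti, int flags, void *private)
{
	/* Initialize every field first... */
	ti->flags = flags;
	ti->private = private;
	/* ...then make it reachable; release ordering keeps the stores
	 * above visible to any reader that sees the pointer. */
	atomic_store_explicit(&published, ti, memory_order_release);
}

static struct ti_model *lookup_inode(void)
{
	return atomic_load_explicit(&published, memory_order_acquire);
}
```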
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.478449628@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Keep all directory links at 1

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:10:02 2024 -0500

    eventfs: Keep all directory links at 1
    
    commit ca185770db914869ff9fe773bac5e0e5e4165b83 upstream.
    
    The directory link count in eventfs was somewhat bogus. It was only being
    updated when a directory child was being looked up and not on creation.
    
    One solution would be to update the link count in get_attr() by
    iterating over the ei->children list and adding 2. But that could slow
    down simple stat() calls, especially if it's done on all directories in
    eventfs.
    
    Another solution would be to add a parent pointer to the eventfs_inode
    and keep track of the number of sub directories it has on creation. But
    this adds overhead for something not really worthwhile.
    
    The solution decided upon is to keep the link count of every directory
    in eventfs at 1. This tells user space not to rely on directory link
    counts, which in this case it shouldn't.
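    A sketch contrasting the two options discussed above; both functions
    are hypothetical. Counting costs a walk over the children on every
    stat(), while the chosen approach is a constant. A directory link
    count of 1 is commonly treated as a sign that the count cannot be used
    to infer the number of subdirectories.

```c
#include <assert.h>
#include <stddef.h>

/* Rejected option (sketch): count child directories on every getattr. */
static unsigned int nlink_by_counting(const int *child_is_dir, size_t nr)
{
	unsigned int nlink = 2;	/* "." plus the entry in the parent */

	for (size_t i = 0; i < nr; i++)
		if (child_is_dir[i])
			nlink++;	/* each subdirectory's ".." */
	return nlink;
}

/* Chosen option: a constant, with no per-call cost. */
static unsigned int eventfs_dir_nlink(void)
{
	return 1;
}
```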
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
    Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Make sure that parent->d_inode is locked in creating files/dirs

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:34 2024 -0500

    eventfs: Make sure that parent->d_inode is locked in creating files/dirs
    
    commit f49f950c217bfb40f11662bab39cb388d41e4cfb upstream.
    
    Since the locking of the parent->d_inode has been moved outside the
    creation of the files and directories (as it used to be locked via a
    conditional), add a WARN_ON_ONCE() for the case where it's not locked.
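    For readers unfamiliar with WARN_ON_ONCE(), a rough user-space
    imitation (GCC/Clang only, since it uses a GNU statement expression):
    it evaluates to the condition and reports only the first time a given
    call site fires. DEMO_WARN_ON_ONCE(), struct parent_inode and its
    locked flag are hypothetical stand-ins for the kernel macro and
    inode_is_locked().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;

/* Evaluates to the condition; prints at most once per call site. */
#define DEMO_WARN_ON_ONCE(cond) ({					\
	static bool demo_warned;					\
	bool demo_ret = !!(cond);					\
	if (demo_ret && !demo_warned) {					\
		demo_warned = true;					\
		warnings_printed++;					\
		fprintf(stderr, "WARNING: %s:%d: %s\n",			\
			__FILE__, __LINE__, #cond);			\
	}								\
	demo_ret;							\
})

/* Hypothetical parent inode; locked stands in for inode_is_locked(). */
struct parent_inode {
	bool locked;
};

static void create_entry(struct parent_inode *parent)
{
	(void)DEMO_WARN_ON_ONCE(!parent->locked);	/* caller must lock */
	/* ... the file or directory would be created here ... */
}
```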
    
    Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:32 2024 -0500

    eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
    
    commit bcae32c5632fc0a0dbce46fa731cd23403117e66 upstream.
    
    Both create_file_dentry() and create_dir_dentry() take a boolean
    parameter "lookup", as on lookup the inode_lock should already be taken,
    but for dcache_dir_open_wrapper() it is not taken.
    
    There's no reason dcache_dir_open_wrapper() can't take the inode_lock
    before calling these functions. In fact, it's better if it does, as the
    lock can then be held throughout both the directory and the file
    creations.
    
    This also simplifies the code, and possibly prevents unexpected race
    conditions when the lock is released.
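A minimal single-threaded model of this locking change, assuming a plain boolean stands in for the directory's inode_lock. struct toy_dir, toy_dir_open() and the create helpers are hypothetical: they only illustrate the caller taking the lock once and holding it across both the directory and file creations, while the helpers check that it is held instead of taking it themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: `locked` stands in for the directory inode's
 * inode_lock; the counters record what was created. Single-threaded only. */
struct toy_dir {
	bool locked;
	int nr_dirs;
	int nr_files;
};

static void toy_inode_lock(struct toy_dir *d)
{
	assert(!d->locked);	/* the lock is not recursive */
	d->locked = true;
}

static void toy_inode_unlock(struct toy_dir *d)
{
	d->locked = false;
}

/* Helpers expect the caller to hold the lock; warn and refuse otherwise. */
static bool toy_lock_held(const struct toy_dir *d)
{
	if (!d->locked)
		fprintf(stderr, "warning: inode lock not held\n");
	return d->locked;
}

static bool toy_create_dir_dentry(struct toy_dir *d)
{
	if (!toy_lock_held(d))
		return false;
	d->nr_dirs++;
	return true;
}

static bool toy_create_file_dentry(struct toy_dir *d)
{
	if (!toy_lock_held(d))
		return false;
	d->nr_files++;
	return true;
}

/* Open path: take the lock once, create everything, then release it. */
static void toy_dir_open(struct toy_dir *d, int dirs, int files)
{
	toy_inode_lock(d);
	for (int i = 0; i < dirs; i++)
		toy_create_dir_dentry(d);
	for (int i = 0; i < files; i++)
		toy_create_file_dentry(d);
	toy_inode_unlock(d);
}
```

Calling a create helper without the lock is reported and has no effect, which mirrors the intent of checking rather than silently locking inside the helper.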
    
    Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Read ei->entries before ei->children in eventfs_iterate() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:45 2024 -0500

    eventfs: Read ei->entries before ei->children in eventfs_iterate()
    
    commit 704f960dbee2f1634f4b4e16f208cb16eaf41c1e upstream.
    
    In order to apply a shortcut that uses the ei->entries array to skip
    directly to the current ctx->pos, that array should be read first.
    Moving the array reading before the linked list reading will make the
    diff of the shortcut change easier to read.
    
    Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+DhOg@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.333115095@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove "is_freed" union with rcu head [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:21 2024 -0500

    eventfs: Remove "is_freed" union with rcu head
    
    commit f2f496370afcbc5227d7002da28c74b91fed12ff upstream.
    
    The eventfs_inode->is_freed field was in a union with the rcu_head, on
    the assumption that when the inode was on the srcu list the head would
    contain a pointer, which would make "is_freed" true. But that assumption
    was wrong, as the rcu head is a singly linked list where the last
    element is NULL.
    
    Instead, split the nr_entries integer so that "is_freed" is one bit and
    nr_entries is the next 31 bits. As there shouldn't be more than 10
    entries (currently there are at most 5 to 7, depending on the config),
    this should not be a problem.
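A small userspace sketch of the new layout, assuming plain unsigned int bit-fields as described (one bit for is_freed, 31 bits for nr_entries). struct ei_counts and its helper are hypothetical; the point is that is_freed becomes an explicit flag instead of a value inferred from whatever the rcu_head happens to contain, so a NULL next pointer can no longer make it read as false.

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields that now share one 32-bit word. */
struct ei_counts {
	unsigned int is_freed:1;	/* explicit flag, no longer aliased with rcu_head */
	unsigned int nr_entries:31;	/* far more than the handful of entries needed */
};

/* Set the flag without disturbing the entry count stored next to it. */
static void ei_counts_mark_freed(struct ei_counts *c)
{
	c->is_freed = 1;
}
```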
    
    Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Ajay Kaher <akaher@vmware.com>
    Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove "lookup" parameter from create_dir/file_dentry() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:40 2024 -0500

    eventfs: Remove "lookup" parameter from create_dir/file_dentry()
    
    commit b0f7e2d739b4aac131ea1662d086a07775097b05 upstream.
    
    The "lookup" parameter is a way to differentiate a call to
    create_file/dir_dentry() that is just a lookup (no need to up the dentry
    refcount) from one made via a readdir (need to up the refcount).
    
    But in reality, it just makes the code more complex. Just up the
    refcount and let the caller decide whether to dput() the result or not.
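A toy reference-counting sketch of the simplified contract, assuming a bare integer count in place of a real struct dentry. toy_dget(), toy_dput() and the two callers are hypothetical: the create helper always returns a referenced object, and each caller decides whether to keep that reference or drop it.

```c
#include <assert.h>

/* Hypothetical stand-in for struct dentry: a name and a reference count. */
struct toy_dentry {
	const char *name;
	int d_count;
};

static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->d_count++;
	return d;
}

static void toy_dput(struct toy_dentry *d)
{
	assert(d->d_count > 0);	/* dropping a reference never taken is a bug */
	d->d_count--;
}

/* No "lookup" flag: the helper always takes a reference. */
static struct toy_dentry *toy_create_dentry(struct toy_dentry *d)
{
	return toy_dget(d);
}

/* A caller that does not need to keep the result drops the reference. */
static void toy_lookup_caller(struct toy_dentry *d)
{
	toy_dput(toy_create_dentry(d));
}

/* A caller that does need it keeps the reference it was handed. */
static struct toy_dentry *toy_readdir_caller(struct toy_dentry *d)
{
	return toy_create_dentry(d);
}
```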
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240103102553.17a19cea@gandalf.local.home
    Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.517502710@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove eventfs_file and just use eventfs_inode [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:12 2024 -0500

    eventfs: Remove eventfs_file and just use eventfs_inode
    
    commit 5790b1fb3d672d9a1fe3881a7181dfdbe741568f upstream.
    
    Instead of having a descriptor for every file represented in the eventfs
    directory, only have the directory itself represented. Change the API to
    send in a list of entries that represent all the files in the directory
    (but not other directories). The entry list contains a name and a callback
    function that will be used to create the files when they are accessed.
    
    struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
                                                    const struct eventfs_entry *entries,
                                                    int size, void *data);
    
    is used for the top level eventfs directory, and returns an eventfs_inode
    that will be used by:
    
    struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
                                             const struct eventfs_entry *entries,
                                             int size, void *data);
    
    where both of the above take an array of struct eventfs_entry entries for
    every file that is in the directory.
    
    The entries are defined by:
    
    typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
                                    const struct file_operations **fops);
    
    struct eventfs_entry {
            const char                      *name;
            eventfs_callback                callback;
    };
    
    Where the name is the name of the file and the callback gets called when
    the file is being created. The callback is passed the name (in case the
    same callback is used for multiple files) and pointers to the mode, data
    and fops. The data will point to the data that was passed in to
    eventfs_create_dir() or eventfs_create_events_dir(), but may be
    overridden to point to something else, as it will be used as the
    inode->i_private of the inode that is created. The information passed
    back from the callback is used to create the dentry/inode.
    
    If the callback fills the data and the file should be created, it must
    return a positive number. On zero or negative, the file is ignored.
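A userspace sketch of how an entry table like the one above could be consumed, assuming simple stand-ins for umode_t and struct file_operations (both kernel types, redefined here only so the example compiles). The eventfs_callback typedef and struct eventfs_entry mirror the declarations quoted above; demo_callback() and demo_create_file() are hypothetical and are not the eventfs implementation.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for kernel types (assumptions, not kernel headers). */
typedef unsigned int umode_t;
struct file_operations {
	const char *label;
};

typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
				const struct file_operations **fops);

struct eventfs_entry {
	const char *name;
	eventfs_callback callback;
};

static const struct file_operations enable_fops = { "enable" };
static const struct file_operations format_fops = { "format" };

/* Hypothetical callback shared by several files: chooses mode/fops by name. */
static int demo_callback(const char *name, umode_t *mode, void **data,
			 const struct file_operations **fops)
{
	(void)data;	/* keep the directory's data as i_private */
	if (strcmp(name, "enable") == 0) {
		*mode = 0644;
		*fops = &enable_fops;
		return 1;
	}
	if (strcmp(name, "format") == 0) {
		*mode = 0444;
		*fops = &format_fops;
		return 1;
	}
	return 0;	/* zero or negative: the file is ignored */
}

/* Hypothetical create-on-access step: find the entry by name, default the
 * data to the directory's data, and let the callback decide. */
static int demo_create_file(const struct eventfs_entry *entries, int size,
			    const char *name, void *dir_data, umode_t *mode,
			    void **data, const struct file_operations **fops)
{
	for (int i = 0; i < size; i++) {
		if (strcmp(entries[i].name, name) != 0)
			continue;
		*data = dir_data;
		return entries[i].callback(name, mode, data, fops) > 0;
	}
	return 0;
}
```

Only the table itself is allocated up front; per-file state appears only when a name is actually looked up.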
    
    This logic may also be used as a prototype to convert entire pseudo file
    systems into just-in-time allocation.
    
    The "show_events_dentry" file has been updated to show the directories,
    and any files they have.
    
    With just the eventfs_file allocations:
    
     Before after deltas for meminfo (in kB):
    
       MemFree:             -14360
       MemAvailable:        -14260
       Buffers:             40
       Cached:              24
       Active:              44
       Inactive:            48
       Inactive(anon):      28
       Active(file):        44
       Inactive(file):      20
       Dirty:               -4
       AnonPages:           28
       Mapped:              4
       KReclaimable:        132
       Slab:                1604
       SReclaimable:        132
       SUnreclaim:          1472
       Committed_AS:        12
    
     Before after deltas for slabinfo:
    
       <slab>:              <objects>       [ * <size> = <total>]
    
       ext4_inode_cache     27              [* 1184 = 31968 ]
       extent_status        102             [*   40 = 4080 ]
       tracefs_inode_cache  144             [*  656 = 94464 ]
       buffer_head          39              [*  104 = 4056 ]
       shmem_inode_cache    49              [*  800 = 39200 ]
       filp                 -53             [*  256 = -13568 ]
       dentry               251             [*  192 = 48192 ]
       lsm_file_cache       277             [*   32 = 8864 ]
       vm_area_struct       -14             [*  184 = -2576 ]
       trace_event_file     1748            [*   88 = 153824 ]
       kmalloc-1k           35              [* 1024 = 35840 ]
       kmalloc-256          49              [*  256 = 12544 ]
       kmalloc-192          -28             [*  192 = -5376 ]
       kmalloc-128          -30             [*  128 = -3840 ]
       kmalloc-96           10581           [*   96 = 1015776 ]
       kmalloc-64           3056            [*   64 = 195584 ]
       kmalloc-32           1291            [*   32 = 41312 ]
       kmalloc-16           2310            [*   16 = 36960 ]
       kmalloc-8            9216            [*    8 = 73728 ]
    
     Free memory dropped by 14,360 kB
     Available memory dropped by 14,260 kB
     Total slab additions in size: 1,771,032 bytes
    
    With this change:
    
     Before after deltas for meminfo (in kB):
    
       MemFree:             -12084
       MemAvailable:        -11976
       Buffers:             32
       Cached:              32
       Active:              72
       Inactive:            168
       Inactive(anon):      176
       Active(file):        72
       Inactive(file):      -8
       Dirty:               24
       AnonPages:           196
       Mapped:              8
       KReclaimable:        148
       Slab:                836
       SReclaimable:        148
       SUnreclaim:          688
       Committed_AS:        324
    
     Before after deltas for slabinfo:
    
       <slab>:              <objects>       [ * <size> = <total>]
    
       tracefs_inode_cache  144             [* 656 = 94464 ]
       shmem_inode_cache    -23             [* 800 = -18400 ]
       filp                 -92             [* 256 = -23552 ]
       dentry               179             [* 192 = 34368 ]
       lsm_file_cache       -3              [* 32 = -96 ]
       vm_area_struct       -13             [* 184 = -2392 ]
       trace_event_file     1748            [* 88 = 153824 ]
       kmalloc-1k           -49             [* 1024 = -50176 ]
       kmalloc-256          -27             [* 256 = -6912 ]
       kmalloc-128          1864            [* 128 = 238592 ]
       kmalloc-64           4685            [* 64 = 299840 ]
       kmalloc-32           -72             [* 32 = -2304 ]
       kmalloc-16           256             [* 16 = 4096 ]
       total = 721352
    
     Free memory dropped by 12,084 kB
     Available memory dropped by 11,976 kB
     Total slab additions in size:  721,352 bytes
    
    That's over 2 MB in savings per instance for free and available memory,
    and over 1 MB in savings per instance of slab memory.
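The savings quoted above can be checked directly from the two runs. Each slab row's total is objects times object size (for example 10581 × 96 = 1,015,776 bytes for kmalloc-96 in the first run), and each run's slab figure is the sum of its rows:

```latex
\begin{aligned}
\Delta_{\text{MemFree}}      &= 14360 - 12084 = 2276\ \text{kB} \\
\Delta_{\text{MemAvailable}} &= 14260 - 11976 = 2284\ \text{kB} \\
\Delta_{\text{slab}}         &= 1771032 - 721352 = 1049680\ \text{bytes} \approx 1.0\ \text{MiB}
\end{aligned}
```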
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home
    Link: https://lore.kernel.org/linux-trace-kernel/20231004165007.43d79161@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Ajay Kaher <akaher@vmware.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:29 2024 -0500

    eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
    
    commit 88903daecacf03b1e5636e1b5f18bda5b07030fc upstream.
    
    The logic to free the eventfs_inode (ei) used to set is_freed and clear
    the "dentry" field under the eventfs_mutex. That changed when a race was
    found where the ei->dentry needed to be cleared when the last dput() was
    called on it. However, there was still logic that checked whether
    ei->dentry was non-NULL while is_freed was set, and warned if it was.
    
    Since that changed, ei->dentry is not cleared until the last dput() is
    called on it, even while ei->is_freed is set. Do not test for that
    condition anymore, and change the comments to reflect that.
    
    Link: https://lkml.kernel.org/r/20231120235154.265826243@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 020010fbfa20 ("eventfs: Delete eventfs_inode when the last dentry is freed")
    Reported-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove extra dget() in eventfs_create_events_dir() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:19 2024 -0500

    eventfs: Remove extra dget() in eventfs_create_events_dir()
    
    commit 77bc4d4921bd3497678ba8e7f4e480de35692f05 upstream.
    
    The creation of the top events directory does a dget() at the end of the
    creation in eventfs_create_events_dir() with a comment saying the final
    dput() will happen when it is removed. The problem is that a dget() is
    already done on the dentry when it was created with tracefs_start_creating()!
    The extra dget() just causes a memory leak of that dentry.
    
    Remove the extra dget() as the final dput() in the deletion of the events
    directory actually matches the one in tracefs_start_creating().
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231031124229.4f2e3fa1@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove fsnotify*() functions from lookup() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:10:01 2024 -0500

    eventfs: Remove fsnotify*() functions from lookup()
    
    commit 12d823b31fadf47c8f36ecada7abac5f903cac33 upstream.
    
    The dentries and inodes are created when referenced in the lookup code.
    There's no reason to call fsnotify_*() functions when they are created by
    a reference. It doesn't make any sense.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
    Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Fixes: a376007917776 ("eventfs: Implement functions to create files and dirs when accessed");
    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove special processing of dput() of events directory [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:27 2024 -0500

    eventfs: Remove special processing of dput() of events directory
    
    commit 62d65cac119d08d39f751b4e3e2063ed996edc05 upstream.
    
    The top level events directory is no longer special with regard to how
    it should be deleted. Remove the extra processing for it in
    eventfs_set_ei_status_free().
    
    Link: https://lkml.kernel.org/r/20231101172650.340876747@goodmis.org
    
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Remove unused d_parent pointer field [+ + +]

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:56 2024 -0500

    eventfs: Remove unused d_parent pointer field
    
    commit 408600be78cdb8c650a97ecc7ff411cb216811b5 upstream.
    
    It's never used.
    
    Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.961772428@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Restructure eventfs_inode structure to be more condensed [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:10:00 2024 -0500

    eventfs: Restructure eventfs_inode structure to be more condensed
    
    commit 264424dfdd5cbd92bc5b5ddf93944929fc877fac upstream.
    
    The eventfs_inode structure has some holes in it. Rework the structure
    to be a bit more condensed, and also remove the no longer used llist
    field.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis.org
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Save directory inodes in the eventfs_inode structure [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:50 2024 -0500

    eventfs: Save directory inodes in the eventfs_inode structure
    
    commit 834bf76add3e6168038150f162cbccf1fd492a67 upstream.
    
    The eventfs inodes and directories are allocated when referenced. But
    this leaves the issue of keeping inode numbers consistent, as the number
    is only saved in the inode structure itself. When the inode is no longer
    referenced, it can be freed. When the file that the inode was
    representing is referenced again, the inode is created once more, but
    the inode number needs to be the same as it was before.
    
    Just making the inode numbers the same for all files is fine, but that
    does not work with directories. The find command will check for loops via
    the inode number and having the same inode number for directories triggers:
    
      # find /sys/kernel/tracing
    find: File system loop detected;
    '/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as
    '/sys/kernel/debug/tracing/events/initcall'.
    [..]
    
    Linus pointed out that the eventfs_inode structure ends with a single
    32-bit int, and on 64-bit machines, there's likely a 4-byte hole due to
    alignment. We can use this hole to store the inode number for the
    eventfs_inode. All directories in eventfs are represented by an
    eventfs_inode, and that data structure can hold its inode number.
    
    That last int was also purposely placed at the end of the structure to
    avoid holes within it. Now that there's a 4-byte field to hold the
    inode number, both the inode number and the last integer can be moved up
    in the structure for better cache locality, while the llist and rcu
    fields can be moved to the end, as they are only used when the
    eventfs_inode is being deleted.
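A userspace illustration of the padding argument, using hypothetical stand-in structs rather than the real eventfs_inode. On a typical 64-bit ABI, a structure of pointer-sized members followed by a single 32-bit int is padded out to a multiple of 8 bytes, so a second 32-bit field (here the inode number) fits in that hole at no cost. Exact sizes depend on the ABI, which is why the size checks only apply when pointers are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts; the members are stand-ins for the real fields. */
struct layout_before {
	void *list_next;
	void *list_prev;
	const char *name;
	unsigned int nr_entries;	/* lone trailing int: 4-byte tail hole on LP64 */
};

struct layout_after {
	void *list_next;
	void *list_prev;
	const char *name;
	unsigned int ino;		/* inode number stored in the former hole */
	unsigned int nr_entries;
};

/* Bytes of padding after the last member of layout_before. */
static size_t layout_before_tail_padding(void)
{
	return sizeof(struct layout_before) -
	       (offsetof(struct layout_before, nr_entries) + sizeof(unsigned int));
}
```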
    
    Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YRdCA@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Fixes: 53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Save ownership and mode [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:24 2024 -0500

    eventfs: Save ownership and mode
    
    commit 28e12c09f5aa081b2d13d1340e3610070b6c624d upstream.
    
    Now that inodes and dentries are created on the fly, they are also
    reclaimed on memory pressure. Since the ownership and file mode are
    saved in the inode, any changes to the ownership and mode will be lost
    when the inode is freed.
    
    To counter this, if the user changes the permissions or ownership, save
    them, and when creating the inodes again, restore those changes.
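A minimal sketch of the save-and-restore idea, assuming the saved attributes live in a long-lived object while the inode itself may be freed and rebuilt. The flag names, struct toy_saved_attr and the helpers are hypothetical. Ownership normally covers both user and group; only the user is modeled here.

```c
#include <assert.h>

/* Hypothetical flags recording which attributes the user changed. */
#define TOY_SAVE_MODE 0x1u
#define TOY_SAVE_UID  0x2u

/* Long-lived saved state (survives inode reclaim). */
struct toy_saved_attr {
	unsigned int flags;
	unsigned int mode;
	unsigned int uid;
};

/* Short-lived inode that may be reclaimed and recreated. */
struct toy_inode {
	unsigned int i_mode;
	unsigned int i_uid;
};

/* chmod path: update the live inode and remember the new mode. */
static void toy_chmod(struct toy_saved_attr *saved, struct toy_inode *inode,
		      unsigned int mode)
{
	inode->i_mode = mode;
	saved->mode = mode;
	saved->flags |= TOY_SAVE_MODE;
}

/* chown path: same idea for the owner. */
static void toy_chown(struct toy_saved_attr *saved, struct toy_inode *inode,
		      unsigned int uid)
{
	inode->i_uid = uid;
	saved->uid = uid;
	saved->flags |= TOY_SAVE_UID;
}

/* (Re)creation path: start from defaults, then restore saved changes. */
static void toy_init_inode(const struct toy_saved_attr *saved,
			   struct toy_inode *inode, unsigned int default_mode)
{
	inode->i_mode = default_mode;
	inode->i_uid = 0;
	if (saved->flags & TOY_SAVE_MODE)
		inode->i_mode = saved->mode;
	if (saved->flags & TOY_SAVE_UID)
		inode->i_uid = saved->uid;
}
```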
    
    Link: https://lkml.kernel.org/r/20231101172649.691841445@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Shortcut eventfs_iterate() by skipping entries already read [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:46 2024 -0500

    eventfs: Shortcut eventfs_iterate() by skipping entries already read
    
    commit 1de94b52d5e8d8b32f0252f14fad1f1edc2e71f1 upstream.
    
    As the ei->entries array is fixed for the duration of the eventfs_inode,
    it can be used to skip over already read entries in eventfs_iterate().
    
    That is, if ctx->pos is greater than zero, there's no reason to loop
    over the ei->entries array for the entries below ctx->pos. Instead,
    start the lookup of the entries at the current ctx->pos.
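A simplified model of the shortcut, assuming the entries are a fixed array of names and ignoring the "." and ".." positions that a real readdir also emits. The emit callback type, toy_iterate() and the collector are hypothetical; the point is that the loop resumes at *pos rather than at 0, so entries already returned by an earlier getdents() call are not rescanned.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical emit callback: returns 0 when the caller's buffer is full. */
typedef int (*toy_emit_fn)(void *priv, const char *name, long pos);

/* Resume directly at *pos; on return, *pos is the first unemitted entry. */
static int toy_iterate(const char *const *entries, int nr_entries, long *pos,
		       toy_emit_fn emit, void *priv)
{
	int emitted = 0;

	assert(*pos >= 0);
	for (long i = *pos; i < nr_entries; i++) {
		if (!emit(priv, entries[i], i))
			break;
		*pos = i + 1;
		emitted++;
	}
	return emitted;
}

/* Collector accepting at most `cap` (<= 8) names: a small user buffer. */
struct toy_buf {
	const char *names[8];
	int n;
	int cap;
};

static int toy_collect(void *priv, const char *name, long pos)
{
	struct toy_buf *b = priv;

	(void)pos;
	if (b->n >= b->cap)
		return 0;
	b->names[b->n++] = name;
	return 1;
}
```

With a buffer that holds two names, the first call returns two entries and leaves the position at 2; the next call starts at "format" without touching the first two entries again.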
    
    Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+DhOg@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.494956957@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Stop using dcache_readdir() for getdents() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:41 2024 -0500

    eventfs: Stop using dcache_readdir() for getdents()
    
    commit 493ec81a8fb8e4ada6f223b8b73791a1280d4774 upstream.
    
    The eventfs creates dynamically allocated dentries and inodes. Using the
    dcache_readdir() logic for its own directory lookups requires hiding the
    cursor of the dcache logic and playing games to allow the dcache_readdir()
    to still have access to the cursor while the eventfs saved what it created
    and what it needs to release.
    
    Instead, just have eventfs have its own iterate_shared callback function
    that will fill in the dent entries. This simplifies the code quite a bit.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Test for ei->is_freed when accessing ei->dentry [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:23 2024 -0500

    eventfs: Test for ei->is_freed when accessing ei->dentry
    
    commit 77a06c33a22d13f3a6e31f06f6ee6bca666e6898 upstream.
    
    The eventfs_inode (ei) is protected by SRCU, but the ei->dentry is not;
    it is protected by the eventfs_mutex. Any time the eventfs_mutex has been
    released and ei->dentry needs to be accessed, the code must first check,
    under the eventfs_mutex, whether ei->is_freed is set. If it is, then
    ei->dentry is invalid and must not be used. The ei->dentry must only be
    accessed under the eventfs_mutex and after checking if ei->is_freed is
    set.
    
    When the ei is being freed, it will (under the eventfs_mutex) set
    is_freed and at the same time move the dentry to a free list to be
    cleared after the eventfs_mutex is released. This means that any access
    to ei->dentry must first check whether ei->is_freed is set, because if
    it is, the dentry is on its way to being freed.
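A userspace sketch of this rule, assuming a pthread mutex plays the role of eventfs_mutex and a bare pointer plays ei->dentry. struct toy_ei and the helpers are hypothetical. The is_freed check and the use of the dentry happen inside one critical section, and the free path sets is_freed under the same lock, handing the dentry back so it can be released after the mutex is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for eventfs_mutex. */
static pthread_mutex_t toy_eventfs_mutex = PTHREAD_MUTEX_INITIALIZER;

struct toy_ei {
	bool is_freed;
	void *dentry;	/* touch only under the mutex, and only if !is_freed */
};

/* Run use() on the dentry only if, under the mutex, the ei is not freed. */
static bool toy_use_dentry(struct toy_ei *ei, void (*use)(void *dentry))
{
	bool used = false;

	pthread_mutex_lock(&toy_eventfs_mutex);
	if (!ei->is_freed && ei->dentry) {
		use(ei->dentry);
		used = true;
	}
	pthread_mutex_unlock(&toy_eventfs_mutex);
	return used;
}

/* Free path: mark the ei freed and take its dentry in one critical section;
 * the caller releases the returned dentry after the mutex is dropped. */
static void *toy_mark_freed(struct toy_ei *ei)
{
	void *dentry;

	pthread_mutex_lock(&toy_eventfs_mutex);
	ei->is_freed = true;
	dentry = ei->dentry;
	pthread_mutex_unlock(&toy_eventfs_mutex);
	return dentry;
}

/* Example use() callback: counts how often the "dentry" was touched. */
static void toy_touch(void *dentry)
{
	(*(int *)dentry)++;
}
```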
    
    Also add comments to describe this better.
    
    Link: https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=SoEywQOe19fGP3uR15SGowkdK+_X85Cg@mail.gmail.com/
    Link: https://lore.kernel.org/all/CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6aA@mail.gmail.com/
    Link: https://lkml.kernel.org/r/20231101172649.477608228@goodmis.org
    
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Reported-by: Beau Belgrave <beaub@linux.microsoft.com>
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Tested-by: Beau Belgrave <beaub@linux.microsoft.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Use ERR_CAST() in eventfs_create_events_dir() [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Tue Feb 6 07:09:14 2024 -0500

    eventfs: Use ERR_CAST() in eventfs_create_events_dir()
    
    commit b8a555dc31e5aa18d976de0bc228006e398a2e7d upstream.
    
    When building with clang and CONFIG_RANDSTRUCT_FULL=y, there is an error
    due to a cast in eventfs_create_events_dir():
    
      fs/tracefs/event_inode.c:734:10: error: casting from randomized structure pointer type 'struct dentry *' to 'struct eventfs_inode *'
        734 |                 return (struct eventfs_inode *)dentry;
            |                        ^
      1 error generated.
    
    Use the ERR_CAST() function to resolve the error, as it was designed for
    this exact situation (casting an error pointer to another type).
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231018-ftrace-fix-clang-randstruct-v1-1-338cb214abfb@kernel.org
    
    Closes: https://github.com/ClangBuiltLinux/linux/issues/1947
    Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Use eventfs_remove_events_dir() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:13 2024 -0500

    eventfs: Use eventfs_remove_events_dir()
    
    commit 2819f23ac12ce93ff79ca7a54597df9a4a1f6331 upstream.
    
    The update to remove the eventfs_file changed the way the events
    top-level directory is handled. Instead of returning a dentry,
    eventfs_create_events_dir() now returns the eventfs_inode. With this
    change, removing the events top-level directory is not much different
    from removing any of the other directories. Because of this, the removal
    just called eventfs_remove_dir() instead of eventfs_remove_events_dir().
    
    Although eventfs_remove_dir() does the cleanup, it misses the dget() of
    the ei->dentry done in eventfs_create_events_dir(). It makes more sense
    to match eventfs_create_events_dir() with a specific function,
    eventfs_remove_events_dir(), which can then perform the dput() on the
    dentry that had the dget() when it was created.
    
    Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202310051743.y9EobbUr-lkp@intel.com/
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:31 2024 -0500

    eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
    
    commit 4763d635c907baed212664dc579dde1663bb2676 upstream.
    
    If memory reclaim happens, it can reclaim file system pages. The file
    system pages from eventfs may take the eventfs_mutex on reclaim. This
    means that allocation while holding the eventfs_mutex must not call into
    filesystem reclaim. A lockdep splat uncovered this.
    
    Link: https://lkml.kernel.org/r/20231121231112.373501894@goodmis.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode")
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Use kcalloc() instead of kzalloc() [+ + +]

Author: Erick Archer <erick.archer@gmx.com>
Date:   Tue Feb 6 07:09:49 2024 -0500

    eventfs: Use kcalloc() instead of kzalloc()
    
    commit 1057066009c4325bb1d8430c9274894d0860e7c3 upstream.
    
    As noted in the "Deprecated Interfaces, Language Features, Attributes,
    and Conventions" documentation [1], size calculations (especially
    multiplication) should not be performed in memory allocator (or similar)
    function arguments due to the risk of them overflowing. This could lead
    to values wrapping around and a smaller allocation being made than the
    caller was expecting. Using those allocations could lead to linear
    overflows of heap memory and other misbehaviors.
    
    So, use the purpose-specific kcalloc() function instead of the argument
    size * count in the kzalloc() function.
    
    [1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240115181658.4562-1-erick.archer@gmx.com
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Link: https://github.com/KSPP/linux/issues/162
    Signed-off-by: Erick Archer <erick.archer@gmx.com>
    Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
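
A userspace analogue of the kcalloc() pattern: check the count * size product for overflow before allocating, rather than passing the raw product to the allocator. This is a sketch assuming a GCC or Clang toolchain for __builtin_mul_overflow(). zalloc_array() is a hypothetical name, not a kernel or libc function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogue of kcalloc(): refuse the allocation if count * size
 * would wrap. __builtin_mul_overflow() is a GCC/Clang builtin; the kernel's
 * check_mul_overflow() helper is built on it.
 */
static void *zalloc_array(size_t count, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(count, size, &bytes))
		return NULL;		/* the product would wrap: refuse */
	return calloc(1, bytes);	/* zeroed memory, like kzalloc()/kcalloc() */
}
```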

eventfs: Use simple_recursive_removal() to clean up dentries [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:28 2024 -0500

    eventfs: Use simple_recursive_removal() to clean up dentries
    
    commit 407c6726ca71b33330d2d6345d9ea7ebc02575e9 upstream.
    
    Looking at how a dentry is removed via the tracefs system, I found that
    eventfs does not do everything that it did under tracefs. The tracefs
    removal of a dentry calls simple_recursive_removal(), which does a lot more
    than a simple d_invalidate().
    
    It should be a requirement that if an eventfs_inode has a dentry, so
    does its parent. Therefore, when removing an eventfs_inode that has a
    dentry, a call to simple_recursive_removal() on that dentry should clean
    up all the dentries underneath it.
    
    Add WARN_ON_ONCE() to check for the parent having a dentry if any children
    do.
    
    Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/
    Link: https://lkml.kernel.org/r/20231101172650.552471568@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
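
The invariant and the recursive cleanup described above can be modelled with a plain tree. This is a simplified sketch: struct node, check_dentry_invariant() and recursive_remove() are hypothetical and only mimic what WARN_ON_ONCE() and simple_recursive_removal() are used for here. warn_count stands in for the warning firing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node standing in for an eventfs_inode. */
struct node {
	bool has_dentry;
	struct node *children[4];
	size_t nr_children;
};

static int warn_count;	/* stands in for WARN_ON_ONCE() firing */

/* Invariant: if a child has a dentry, its parent must have one too. */
static bool check_dentry_invariant(const struct node *n)
{
	for (size_t i = 0; i < n->nr_children; i++) {
		const struct node *c = n->children[i];

		if (c->has_dentry && !n->has_dentry) {
			warn_count++;
			return false;
		}
		if (!check_dentry_invariant(c))
			return false;
	}
	return true;
}

/* Removing the top dentry cleans up everything underneath it. */
static void recursive_remove(struct node *n)
{
	for (size_t i = 0; i < n->nr_children; i++)
		recursive_remove(n->children[i]);
	n->has_dentry = false;
}
```

Because the invariant guarantees that no child holds a dentry under a parent without one, starting the removal at the topmost dentry is enough to reach every dentry below it.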

eventfs: Warn if an eventfs_inode is freed without is_freed being set [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:59 2024 -0500

    eventfs: Warn if an eventfs_inode is freed without is_freed being set
    
    commit 5a49f996046ba947466bc7461e4b19c4d1daf978 upstream.
    
    There should never be a case where an eventfs_inode is freed without
    is_freed being set. Add a WARN_ON_ONCE() if it ever happens; that would
    mean there was one put_ei() too many.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis.org
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
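
A minimal model of the check: free the object when the last reference is dropped, and warn if is_freed was never set. struct ei, put_ei() and free_ei() here are hypothetical userspace stand-ins. A counter replaces WARN_ON_ONCE(), which in the kernel reports only the first occurrence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct eventfs_inode. */
struct ei {
	int refcount;
	bool is_freed;	/* set when the inode is removed */
};

static int warnings;	/* counts WARN_ON_ONCE()-style reports */

static void free_ei(struct ei *ei)
{
	/* Freeing without is_freed set means one put_ei() too many. */
	if (!ei->is_freed) {
		warnings++;
		fprintf(stderr, "WARNING: ei freed without is_freed set\n");
	}
	free(ei);
}

static void put_ei(struct ei *ei)
{
	if (--ei->refcount == 0)
		free_ei(ei);
}
```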

ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() [+ + +]

Author: Baokun Li <libaokun1@huawei.com>
Date:   Thu Jan 4 22:20:36 2024 +0800

    ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
    
    commit 2331fd4a49864e1571b4f50aa3aa1536ed6220d0 upstream.
    
    After updating bb_free in mb_free_blocks, it is possible to return without
    updating bb_fragments because the block being freed is found to have
    already been freed, which leads to inconsistency between bb_free and
    bb_fragments.
    
    Since the group may be unlocked in ext4_grp_locked_error(), this can lead
    to problems such as dividing by zero when calculating the average fragment
    length. Hence, move the update of bb_free to after the block double-free
    check, which guarantees that the corresponding statistics are updated only
    after the core block bitmap is modified.
    
    Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit")
    CC:  <stable@vger.kernel.org> # 3.10
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240104142040.2835097-5-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
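
The ordering fix (validate against the bitmap first, update the statistics last) can be sketched on a toy block group. This is a simplified model, not ext4's mb_free_blocks(): struct group, count_fragments() and free_blocks() are hypothetical, and fragments are recounted from scratch for simplicity. It shows that a detected double free returns before bb_free or bb_fragments change, so the two counters stay consistent.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 16

/* Hypothetical, much-simplified block group (not ext4's real layout). */
struct group {
	bool in_use[NBLOCKS];
	int bb_free;		/* number of free blocks */
	int bb_fragments;	/* number of free extents */
};

static int count_fragments(const struct group *g)
{
	int frags = 0;

	for (int i = 0; i < NBLOCKS; i++)
		if (!g->in_use[i] && (i == 0 || g->in_use[i - 1]))
			frags++;
	return frags;
}

/* Returns 0 on success, -1 if a block in the range was already free. */
static int free_blocks(struct group *g, int first, int count)
{
	/* Double-free check first: bail out before touching any statistic. */
	for (int i = first; i < first + count; i++)
		if (!g->in_use[i])
			return -1;

	for (int i = first; i < first + count; i++)
		g->in_use[i] = false;

	/* Update the counters only after the bitmap has been modified. */
	g->bb_free += count;
	g->bb_fragments = count_fragments(g);
	return 0;
}
```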

ext4: fix double-free of blocks due to wrong extents moved_len [+ + +]

Author: Baokun Li <libaokun1@huawei.com>
Date:   Thu Jan 4 22:20:33 2024 +0800

    ext4: fix double-free of blocks due to wrong extents moved_len
    
    commit 55583e899a5357308274601364741a83e78d6ac4 upstream.
    
    In ext4_move_extents(), moved_len is only updated when all moves are
    successfully executed, and the orig_inode and donor_inode preallocations
    are only discarded when moved_len is not zero. When the loop exits with an
    error after successfully moving some extents, moved_len is not updated and
    remains at 0, so the preallocations are not discarded.
    
    If the moved extents overlap with the preallocated extents, the
    overlapped extents are freed twice in ext4_mb_release_inode_pa() and
    ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4:
    Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is
    incremented twice. Hence when trim is executed, a zero-division bug is
    triggered in mb_update_avg_fragment_size() because bb_free is not zero
    and bb_fragments is zero.
    
    Therefore, update moved_len after each extent move to avoid the issue.
    
    Reported-by: Wei Chen <harperchen1110@gmail.com>
    Reported-by: xingwei lee <xrivendell7@gmail.com>
    Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
    Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
    CC:  <stable@vger.kernel.org> # 3.18
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
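
The core of the fix is to accumulate moved_len per successful extent instead of only once at the end. A minimal sketch, with move_extents() and the move_fn callback as hypothetical stand-ins for ext4_move_extents() and its per-extent move step:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-extent move step: returns true on success. */
typedef bool (*move_fn)(int extent_len, int index);

/*
 * Add to *moved_len after every successful extent move, so a failure
 * part-way through still reports the blocks that were actually moved
 * (and the caller can then discard the preallocations).
 */
static int move_extents(const int *lens, int n, move_fn move, int *moved_len)
{
	*moved_len = 0;
	for (int i = 0; i < n; i++) {
		if (!move(lens[i], i))
			return -1;
		*moved_len += lens[i];
	}
	return 0;
}

/* Example mover that fails on the third extent. */
static bool fail_on_third(int extent_len, int index)
{
	(void)extent_len;
	return index != 2;
}
```

With lengths {4, 8, 16, 32} and a failure on the third extent, the function returns an error but reports 12 moved blocks, where the old scheme would have left the count at 0.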

firewire: core: correct documentation of fw_csr_string() kernel API [+ + +]

Author: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Date:   Thu Feb 1 20:53:18 2024 +0900

    firewire: core: correct documentation of fw_csr_string() kernel API
    
    commit 5f9ab17394f831cb7986ec50900fa37507a127f1 upstream.
    
    Contrary to its current description, the kernel API can accept all types
    of directory entries.
    
    This commit corrects the documentation.
    
    Cc: stable@vger.kernel.org
    Fixes: 3c2c58cb33b3 ("firewire: core: fw_csr_string addendum")
    Link: https://lore.kernel.org/r/20240130100409.30128-2-o-takashi@sakamocchi.jp
    Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super [+ + +]

Author: Oscar Salvador <osalvador@suse.de>
Date:   Tue Jan 30 22:04:18 2024 +0100

    fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
    
    commit 79d72c68c58784a3e1cd2378669d51bfd0cb7498 upstream.
    
    When configuring a hugetlb filesystem via the fsconfig() syscall, there is
    a possible NULL dereference in hugetlbfs_fill_super() caused by assigning
    NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize
    is invalid.
    
    For example, take the following steps:
    
         fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC);
         fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0);
         fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    
    Given that the requested "pagesize" is invalid, ctx->hstate will be replaced
    with NULL, losing its previous value, and we will print an error:
    
     ...
     ...
     case Opt_pagesize:
             ps = memparse(param->string, &rest);
             ctx->hstate = h;
             if (!ctx->hstate) {
                     pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
                     return -EINVAL;
             }
             return 0;
     ...
     ...
    
    This is a problem because later on, we will dereference ctx->hstate in
    hugetlbfs_fill_super():
    
     ...
     ...
     sb->s_blocksize = huge_page_size(ctx->hstate);
     ...
     ...
    
    This causes the Oops below.
    
    Fix this by replacing the ctx->hstate value only when the pagesize is
    known to be valid.
    
     kernel: hugetlbfs: Unsupported page size 0 MB
     kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
     kernel: #PF: supervisor read access in kernel mode
     kernel: #PF: error_code(0x0000) - not-present page
     kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0
     kernel: Oops: 0000 [#1] PREEMPT SMP PTI
     kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G            E      6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f
     kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
     kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
     kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
     kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
     kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
     kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
     kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
     kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
     kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
     kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
     kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
     kernel: Call Trace:
     kernel:  <TASK>
     kernel:  ? __die_body+0x1a/0x60
     kernel:  ? page_fault_oops+0x16f/0x4a0
     kernel:  ? search_bpf_extables+0x65/0x70
     kernel:  ? fixup_exception+0x22/0x310
     kernel:  ? exc_page_fault+0x69/0x150
     kernel:  ? asm_exc_page_fault+0x22/0x30
     kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
     kernel:  ? hugetlbfs_fill_super+0xb4/0x1a0
     kernel:  ? hugetlbfs_fill_super+0x28/0x1a0
     kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
     kernel:  vfs_get_super+0x40/0xa0
     kernel:  ? __pfx_bpf_lsm_capable+0x10/0x10
     kernel:  vfs_get_tree+0x25/0xd0
     kernel:  vfs_cmd_create+0x64/0xe0
     kernel:  __x64_sys_fsconfig+0x395/0x410
     kernel:  do_syscall_64+0x80/0x160
     kernel:  ? syscall_exit_to_user_mode+0x82/0x240
     kernel:  ? do_syscall_64+0x8d/0x160
     kernel:  ? syscall_exit_to_user_mode+0x82/0x240
     kernel:  ? do_syscall_64+0x8d/0x160
     kernel:  ? exc_page_fault+0x69/0x150
     kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
     kernel: RIP: 0033:0x7ffbc0cb87c9
     kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48
     kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af
     kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9
     kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
     kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000
     kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
     kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000
     kernel:  </TASK>
     kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E)
     kernel:  mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E)
     kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1
     kernel: CR2: 0000000000000028
     kernel: ---[ end trace 0000000000000000 ]---
     kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
     kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
     kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
     kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
     kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
     kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
     kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
     kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
     kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
     kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
    
    Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de
    Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context")
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Oscar Salvador <osalvador@suse.de>
    Acked-by: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
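
The fix pattern (look the value up into a local first, and assign the context field only on success) can be sketched in userspace. struct hstate, lookup_hstate(), struct fs_ctx and parse_pagesize() below are hypothetical simplifications of the hugetlbfs code, and the table of supported sizes is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct hstate and its size lookup. */
struct hstate {
	unsigned long page_size;
};

static struct hstate hstates[] = { { 2UL << 20 }, { 1UL << 30 } };

static struct hstate *lookup_hstate(unsigned long size)
{
	for (size_t i = 0; i < sizeof(hstates) / sizeof(hstates[0]); i++)
		if (hstates[i].page_size == size)
			return &hstates[i];
	return NULL;
}

struct fs_ctx {
	struct hstate *hstate;
};

/* Only overwrite ctx->hstate once the requested size is known to be valid. */
static int parse_pagesize(struct fs_ctx *ctx, unsigned long ps)
{
	struct hstate *h = lookup_hstate(ps);

	if (!h)
		return -EINVAL;	/* ctx->hstate keeps its previous value */
	ctx->hstate = h;
	return 0;
}
```

With this ordering, an invalid "pagesize" leaves ctx->hstate pointing at its previous, valid value, so a later dereference of it does not see NULL.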

fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand() [+ + +]

Author: Oleg Nesterov <oleg@redhat.com>
Date:   Tue Jan 23 16:33:55 2024 +0100

    fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
    
    commit 60f92acb60a989b14e4b744501a0df0f82ef30a3 upstream.
    
    Patch series "fs/proc: do_task_stat: use sig->stats_".
    
    do_task_stat() has the same problem as getrusage() had before "getrusage:
    use sig->stats_lock rather than lock_task_sighand()": a hard lockup.  If
    NR_CPUS threads call lock_task_sighand() at the same time and the process
    has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS *
    NR_THREADS) time.
    
    
    This patch (of 3):
    
    thread_group_cputime() does its own locking, so we can safely move
    thread_group_cputime_adjusted(), which does another for_each_thread loop,
    outside of the ->siglock-protected section.
    
    Not only does this remove for_each_thread() from the critical section with
    irqs disabled, it also removes another case where stats_lock is taken with
    siglock held.  We want to remove this dependency; then we can change the
    users of stats_lock to not disable irqs.
    
    Link: https://lkml.kernel.org/r/20240123153313.GA21832@redhat.com
    Link: https://lkml.kernel.org/r/20240123153355.GA21854@redhat.com
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
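
The lock-scope change can be modelled with two mutexes standing in for ->siglock and stats_lock. In this sketch (hypothetical names throughout, using POSIX threads rather than kernel spinlocks) the cputime helper takes its own lock and is called only after the outer lock has been released. As a result, the per-thread loop no longer runs inside the outer critical section, and the two locks are never nested.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

/* Hypothetical process model; the two locks mirror ->siglock and stats_lock. */
struct proc {
	pthread_mutex_t siglock;
	pthread_mutex_t stats_lock;
	unsigned long long thread_time[MAX_THREADS];
	int nr_threads;
	int exit_code;	/* example field that does need siglock */
};

/* Like thread_group_cputime(): does its own locking and loops over threads. */
static unsigned long long group_cputime(struct proc *p)
{
	unsigned long long total = 0;

	pthread_mutex_lock(&p->stats_lock);
	for (int i = 0; i < p->nr_threads; i++)
		total += p->thread_time[i];
	pthread_mutex_unlock(&p->stats_lock);
	return total;
}

static void task_stat(struct proc *p, int *exit_code, unsigned long long *cputime)
{
	pthread_mutex_lock(&p->siglock);
	*exit_code = p->exit_code;	/* only what siglock protects */
	pthread_mutex_unlock(&p->siglock);

	/* Called after siglock is dropped: no nesting, no long loop under it. */
	*cputime = group_cputime(p);
}
```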

fs: relax mount_setattr() permission checks [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Feb 6 11:22:09 2024 +0100

    fs: relax mount_setattr() permission checks
    
    commit 46f5ab762d048dad224436978315cbc2fa79c630 upstream.
    
    When we added mount_setattr(), I added additional checks compared to the
    legacy do_reconfigure_mnt() and do_change_type() helpers used by regular
    mount(2): if the mount had a parent, verify that the mount namespaces of
    the caller and of the mount match, and if not, make sure that it's an
    anonymous mount.
    
    The real rootfs falls into neither category. It is not an anonymous
    mount, because it is obviously attached to the initial mount namespace,
    but it also obviously doesn't have a parent mount. So legacy mount(2)
    allows changing mount properties on the real rootfs, but mount_setattr(2)
    blocks this. I never thought much about this, but of course someone on
    this planet changes properties on the real rootfs, as can be seen in [1].
    
    Since util-linux switched to the new mount API in 2.39, not so long ago,
    it also relies on mount_setattr(), and that surfaced this issue when
    Fedora 39 finally switched to it. Fix this.
    
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2256843
    Link: https://lore.kernel.org/r/20240206-vfs-mount-rootfs-v1-1-19b335eee133@kernel.org
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reported-by: Karel Zak <kzak@redhat.com>
    Cc: stable@vger.kernel.org # v5.12+
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
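
The difference between the two rules, as the message describes them, can be written as predicates. This is a simplified model of the text, not the kernel's actual check. struct mnt_info, old_may_setattr() and new_may_setattr() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a mount, for illustrating the permission rule. */
struct mnt_info {
	bool has_parent;
	bool in_caller_ns;	/* attached to the caller's mount namespace */
	bool anonymous;		/* attached to an anonymous mount namespace */
};

/* Old rule as described: parent => same namespace, no parent => anonymous. */
static bool old_may_setattr(const struct mnt_info *m)
{
	return m->has_parent ? m->in_caller_ns : m->anonymous;
}

/* Relaxed rule: caller's namespace or anonymous, whether or not there is a parent. */
static bool new_may_setattr(const struct mnt_info *m)
{
	return m->in_caller_ns || m->anonymous;
}
```

For the real rootfs (no parent, attached to the caller's initial namespace, not anonymous) the old predicate returns false and the relaxed one returns true, matching the behaviour change described above.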

ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default [+ + +]

Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date:   Wed Jan 10 09:13:06 2024 +0900

    ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
    
    commit a8b9cf62ade1bf17261a979fc97e40c2d7842353 upstream.
    
    Commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
    and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
    are multiple ftrace_ops at the same function. However, since x86 only
    supports jumping to direct_call from ftrace_regs_caller, when we set
    the function tracer on the same target function on x86, ftrace-direct
    does not work, as shown below (this actually works on arm64).
    
    First, insmod ftrace-direct.ko to put a direct_call on
    'wake_up_process()'.
    
     # insmod kernel/samples/ftrace/ftrace-direct.ko
     # less trace
    ...
              <idle>-0       [006] ..s1.   564.686958: my_direct_func: waking up rcu_preempt-17
              <idle>-0       [007] ..s1.   564.687836: my_direct_func: waking up kcompactd0-63
              <idle>-0       [006] ..s1.   564.690926: my_direct_func: waking up rcu_preempt-17
              <idle>-0       [006] ..s1.   564.696872: my_direct_func: waking up rcu_preempt-17
              <idle>-0       [007] ..s1.   565.191982: my_direct_func: waking up kcompactd0-63
    
    Set up a function filter on 'wake_up_process' too, and enable it.
    
     # cd /sys/kernel/tracing/
     # echo wake_up_process > set_ftrace_filter
     # echo function > current_tracer
     # less trace
    ...
              <idle>-0       [006] ..s3.   686.180972: wake_up_process <-call_timer_fn
              <idle>-0       [006] ..s3.   686.186919: wake_up_process <-call_timer_fn
              <idle>-0       [002] ..s3.   686.264049: wake_up_process <-call_timer_fn
              <idle>-0       [002] d.h6.   686.515216: wake_up_process <-kick_pool
              <idle>-0       [002] d.h6.   686.691386: wake_up_process <-kick_pool
    
    Then, only the function tracer output is shown on x86.
    But if you enable a 'kprobe on ftrace' event (which uses the SAVE_REGS
    flag) on the same function, the direct_call output is shown again.
    
     # echo 'p wake_up_process' >> dynamic_events
     # echo 1 > events/kprobes/p_wake_up_process_0/enable
     # echo > trace
     # less trace
    ...
              <idle>-0       [006] ..s2.  2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20)
              <idle>-0       [006] ..s3.  2710.345923: wake_up_process <-call_timer_fn
              <idle>-0       [006] ..s1.  2710.345928: my_direct_func: waking up rcu_preempt-17
              <idle>-0       [006] ..s2.  2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20)
              <idle>-0       [006] ..s3.  2710.349934: wake_up_process <-call_timer_fn
              <idle>-0       [006] ..s1.  2710.349937: my_direct_func: waking up rcu_preempt-17
    
    To fix this issue, use the SAVE_REGS flag by default in the ftrace_ops
    flags used for direct_call when multiple ftrace_ops are attached.
    
    Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390270842.stgit@devnote2
    
    Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS")
    Cc: stable@vger.kernel.org
    Cc: Florent Revest <revest@chromium.org>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reviewed-by: Mark Rutland <mark.rutland@arm.com>
    Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

HID: bpf: actually free hdev memory after attaching a HID-BPF program [+ + +]

Author: Benjamin Tissoires <bentiss@kernel.org>
Date:   Wed Jan 24 12:26:58 2024 +0100

    HID: bpf: actually free hdev memory after attaching a HID-BPF program
    
    commit 89be8aa5b0ecb3b729c7bcff64bb2af7921fec63 upstream.
    
    It turns out that I got my reference counts wrong: each successful
    bus_find_device() actually calls get_device(), so we need to manually
    call put_device().
    
    Ensure each bus_find_device() gets a matching put_device() when releasing
    the bpf programs and fix all the error paths.
    
    Cc: <stable@vger.kernel.org>
    Fixes: f5c27da4e3c8 ("HID: initial BPF implementation")
    Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-2-052520b1e5e6@kernel.org
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
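
A userspace sketch of the reference-count rule. Every successful lookup takes a reference, the error path drops it immediately, and the success path hands it to an object that drops it on release. struct dev, find_dev(), put_dev(), struct link, attach_prog() and release_prog() are hypothetical stand-ins for bus_find_device(), get_device()/put_device() and the HID-BPF attach and release paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted device, standing in for struct device. */
struct dev {
	int refs;
};

/* Like bus_find_device(): a successful lookup returns with a reference held. */
static struct dev *find_dev(struct dev *candidate)
{
	if (candidate)
		candidate->refs++;	/* the implicit get_device() */
	return candidate;
}

static void put_dev(struct dev *d)
{
	d->refs--;			/* the matching put_device() */
}

/* Hypothetical attachment record that owns the device reference. */
struct link {
	struct dev *dev;
};

static int attach_prog(struct dev *candidate, bool fail, struct link *out)
{
	struct dev *d = find_dev(candidate);

	if (!d)
		return -ENODEV;
	if (fail) {
		put_dev(d);		/* error path: drop the lookup reference */
		return -EINVAL;
	}
	out->dev = d;			/* success: the link keeps the reference */
	return 0;
}

static void release_prog(struct link *l)
{
	put_dev(l->dev);		/* dropped when the program is released */
	l->dev = NULL;
}
```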

HID: bpf: remove double fdget() [+ + +]

Author: Benjamin Tissoires <bentiss@kernel.org>
Date:   Wed Jan 24 12:26:57 2024 +0100

    HID: bpf: remove double fdget()
    
    commit 7cdd2108903a4e369eb37579830afc12a6877ec2 upstream.
    
    When the kfunc hid_bpf_attach_prog() is called, we called fdget() twice:
    once for fetching the type of the bpf program, and once for actually
    attaching the program to the device.
    
    The problem is that between those two calls, there is no guarantee that
    prog_fd still refers to the same program.
    
    Solve this by calling bpf_prog_get() earlier, and using the result to
    fetch the program type.
    
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/bpf/CAO-hwJJ8vh8JD3-P43L-_CLNmPx0hWj44aom0O838vfP4=_1CA@mail.gmail.com/T/#t
    Cc: <stable@vger.kernel.org>
    Fixes: f5c27da4e3c8 ("HID: initial BPF implementation")
    Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-1-052520b1e5e6@kernel.org
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
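
The fix follows a general rule: translate a file descriptor into an object once, then use that object for every later step. This is a hypothetical userspace model. The fd table, struct prog, prog_get() and attach_by_fd() are illustrative and are not the bpf API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical program object and fd table; not the real bpf API. */
struct prog {
	int type;
	int refs;
};

#define NR_FDS 4
static struct prog *fd_table[NR_FDS];

/* Resolve an fd to a program and take a reference (like bpf_prog_get()). */
static struct prog *prog_get(int fd)
{
	if (fd < 0 || fd >= NR_FDS || !fd_table[fd])
		return NULL;
	fd_table[fd]->refs++;
	return fd_table[fd];
}

/*
 * Resolve the fd exactly once, then use that same object both to check the
 * type and to attach. Looking fd_table[fd] up again later could return a
 * different program if the fd was closed and reused in between.
 */
static int attach_by_fd(int fd, int wanted_type, struct prog **attached)
{
	struct prog *p = prog_get(fd);

	if (!p)
		return -EBADF;
	if (p->type != wanted_type) {
		p->refs--;		/* drop the reference on error */
		return -EINVAL;
	}
	*attached = p;			/* attach the object we already checked */
	return 0;
}
```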

HID: i2c-hid-of: fix NULL-deref on failed power up [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Fri Jan 26 18:09:01 2024 +0100

    HID: i2c-hid-of: fix NULL-deref on failed power up
    
    commit 00aab7dcb2267f2aef59447602f34501efe1a07f upstream.
    
    A while back, the I2C HID implementation was split into ACPI and OF
    parts, but the new OF driver never initialises the client pointer, which
    is dereferenced on power-up failures.
    
    Fixes: b33752c30023 ("HID: i2c-hid: Reorganize so ACPI and OF are separate modules")
    Cc: stable@vger.kernel.org      # 5.12
    Cc: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Jiri Kosina <jkosina@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

HID: wacom: Do not register input devices until after hid_hw_start [+ + +]

Author: Jason Gerecke <killertofu@gmail.com>
Date:   Mon Jan 29 14:35:45 2024 -0800

    HID: wacom: Do not register input devices until after hid_hw_start
    
    commit c1d6708bf0d3dd976460d435373cf5abf21ce258 upstream.
    
    If an input device is opened before hid_hw_start is called, events may
    not be received from the hardware. In the case of USB-backed devices,
    for example, the hid_hw_start function is responsible for filling in
    the URB which is submitted when the input device is opened. If a device
    is opened prematurely, polling will never start because the device will
    not have been in the correct state to send the URB.
    
    Because the wacom driver registers its input devices before calling
    hid_hw_start, there is a window of time where a device can be opened
    and end up in an inoperable state. Some ARM-based Chromebooks in particular
    reliably trigger this bug.
    
    This commit splits the wacom_register_inputs function into two pieces:
    one which is responsible for setting up the allocated inputs (and runs
    prior to hid_hw_start so that devices are ready for any input events
    they may end up receiving), and another which only registers the devices
    (and runs after hid_hw_start to ensure devices can be immediately opened
    without issue). Note that the functions to initialize the LEDs and remotes
    are also moved after hid_hw_start to maintain their own dependency chains.
    
    Fixes: 7704ac937345 ("HID: wacom: implement generic HID handling for pen generic devices")
    Cc: stable@vger.kernel.org # v3.18+
    Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
    Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

HID: wacom: generic: Avoid reporting a serial of '0' to userspace [+ + +]

Author: Tatsunosuke Tobita <tatsunosuke.tobita@wacom.com>
Date:   Thu Feb 1 13:40:55 2024 +0900

    HID: wacom: generic: Avoid reporting a serial of '0' to userspace
    
    commit ab41a31dd5e2681803642b6d08590b61867840ec upstream.
    
    The xf86-input-wacom driver does not treat '0' as a valid serial
    number and will drop any input report which contains an
    MSC_SERIAL = 0 event. The kernel driver already takes care to
    avoid sending any MSC_SERIAL event if the value of serial[0] == 0
    (which is the case for devices that don't actually report a
    serial number), but this is not quite sufficient.
    Only the lower 32 bits of the serial get reported to userspace,
    so if this portion of the serial is zero then there can still
    be problems.
    
    This commit allows the driver to report either the lower 32 bits
    if they are non-zero or the upper 32 bits otherwise.
    
    Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
    Signed-off-by: Tatsunosuke Tobita <tatsunosuke.tobita@wacom.com>
    Fixes: f85c9dc678a5 ("HID: wacom: generic: Support tool ID and additional tool types")
    CC: stable@vger.kernel.org # v4.10
    Signed-off-by: Jiri Kosina <jkosina@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
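
The selection rule is small enough to show directly. A sketch, assuming the serial is held as a 64-bit value. serial_to_report() is a hypothetical helper, not the wacom driver's function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Choose the 32-bit value reported as MSC_SERIAL: the lower half if it is
 * non-zero, otherwise the upper half. (A serial that is zero as a whole is
 * already skipped elsewhere, per the commit message.)
 */
static uint32_t serial_to_report(uint64_t serial)
{
	uint32_t lo = (uint32_t)serial;
	uint32_t hi = (uint32_t)(serial >> 32);

	return lo ? lo : hi;
}
```

For example, a serial of 0xABCD00000000 has a zero lower half, so 0xABCD would be reported instead of 0.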

hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove [+ + +]

Author: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Date:   Tue Jan 30 23:35:51 2024 -0800

    hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
    
    commit e0526ec5360a48ad3ab2e26e802b0532302a7e11 upstream.
    
    In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the
    VMBus channel"), napi_disable() was called for all channels, including
    all subchannels, without checking whether they were enabled.
    
    This caused hv_netvsc to hang in napi_disable() when netvsc_probe() had
    finished running but nvdev->subchan_work had not started yet:
    netvsc_subchan_work() -> rndis_set_subchannel() had not created the
    sub-channels, and because of that netvsc_sc_open() was not running.
    netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which
    netvsc_subchan_work did not run.
    
    netif_napi_add() sets the NAPI_STATE_SCHED bit to ensure NAPI cannot be
    scheduled. Then netvsc_sc_open() -> napi_enable() clears the
    NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the
    opposite.
    
    Now during netvsc_device_remove(), when napi_disable() is called for those
    subchannels, it gets stuck in an infinite msleep() loop.
    
    This fix addresses the problem by ensuring that napi_disable() is not
    called for NAPI structs that were never enabled. netif_napi_del() is
    still necessary for these non-enabled NAPI structs for cleanup purposes.
    
    Call trace:
    [  654.559417] task:modprobe        state:D stack:    0 pid: 2321 ppid:  1091 flags:0x00004002
    [  654.568030] Call Trace:
    [  654.571221]  <TASK>
    [  654.573790]  __schedule+0x2d6/0x960
    [  654.577733]  schedule+0x69/0xf0
    [  654.581214]  schedule_timeout+0x87/0x140
    [  654.585463]  ? __bpf_trace_tick_stop+0x20/0x20
    [  654.590291]  msleep+0x2d/0x40
    [  654.593625]  napi_disable+0x2b/0x80
    [  654.597437]  netvsc_device_remove+0x8a/0x1f0 [hv_netvsc]
    [  654.603935]  rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc]
    [  654.611101]  ? do_wait_intr+0xb0/0xb0
    [  654.615753]  netvsc_remove+0x7c/0x120 [hv_netvsc]
    [  654.621675]  vmbus_remove+0x27/0x40 [hv_vmbus]
    
    Cc: stable@vger.kernel.org
    Fixes: ac5047671758 ("hv_netvsc: Disable NAPI before closing the VMBus channel")
    Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
    Reviewed-by: Dexuan Cui <decui@microsoft.com>
    Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/1706686551-28510-1-git-send-email-schakrabarti@linux.microsoft.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
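
A sketch of the teardown rule: disable only channels whose NAPI was enabled, but delete every NAPI that was added. struct chan_napi, napi_disable_sketch() and teardown() are hypothetical. The assert() marks the case that, per the message, would hang in the real napi_disable().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NAPI-like state for one channel. */
struct chan_napi {
	bool added;	/* netif_napi_add() done */
	bool enabled;	/* napi_enable() done (subchannel actually opened) */
};

static int disable_calls;

static void napi_disable_sketch(struct chan_napi *n)
{
	/* The real napi_disable() would wait forever on a never-enabled NAPI. */
	assert(n->enabled);
	n->enabled = false;
	disable_calls++;
}

/* Teardown: disable only what was enabled, but always delete what was added. */
static void teardown(struct chan_napi *chans, int n)
{
	for (int i = 0; i < n; i++) {
		if (chans[i].enabled)
			napi_disable_sketch(&chans[i]);
		chans[i].added = false;	/* netif_napi_del() is still needed */
	}
}
```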

hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed [+ + +]

Author: Shradha Gupta <shradhagupta@linux.microsoft.com>
Date:   Thu Feb 1 20:40:38 2024 -0800

    hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
    
    commit 9cae43da9867412f8bd09aee5c8a8dc5e8dc3dc2 upstream.
    
    If the hv_netvsc driver is unloaded and reloaded, the NET_DEVICE_REGISTER
    handler cannot register the VF successfully, because the register event
    is received before netvsc_probe() has finished. This is because we
    call register_netdevice_notifier() very early (even before
    vmbus_driver_register()).
    To fix this, we try to register each such matching VF (if it is visible
    as a netdevice) at the end of netvsc_probe().
    
    Cc: stable@vger.kernel.org
    Fixes: 85520856466e ("hv_netvsc: Fix race of register_netdevice_notifier and VF register")
    Suggested-by: Dexuan Cui <decui@microsoft.com>
    Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
    Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
    Reviewed-by: Dexuan Cui <decui@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
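
The ordering problem in the VF registration entry above, and its fix, can be modelled in a few lines of Python. This is a hypothetical sketch: `NetvscModel`, `on_netdev_register()` and matching a VF by MAC address are assumptions made for illustration, not the hv_netvsc API.

```python
class NetvscModel:
    """Toy model of the notifier/probe race; all names are hypothetical."""

    def __init__(self, mac):
        self.mac = mac
        self.probed = False
        self.vf = None

    def on_netdev_register(self, name, mac):
        # NET_DEVICE_REGISTER handler: a VF can only be bound once probe finished.
        if not self.probed:
            return False  # the event arrives too early and is effectively lost
        if self.vf is None and mac == self.mac:
            self.vf = name
        return self.vf == name

    def probe(self, visible_netdevs):
        self.probed = True
        # The fix: at the end of probe, retry every netdevice that is already visible.
        for name, mac in visible_netdevs:
            self.on_netdev_register(name, mac)
```

An early `on_netdev_register()` call returns False and binds nothing; the later `probe()` pass then picks up the matching VF.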

i2c: i801: Fix block process call transactions [+ + +]

Author: Jean Delvare <jdelvare@suse.de>
Date:   Wed Feb 14 15:59:39 2024 +0100

    i2c: i801: Fix block process call transactions
    
    [ Upstream commit c1c9d0f6f7f1dbf29db996bd8e166242843a5f21 ]
    
    According to the Intel datasheets, software must reset the block
    buffer index twice for block process call transactions: once before
    writing the outgoing data to the buffer, and once again before
    reading the incoming data from the buffer.
    
    The driver is currently missing the second reset, causing the wrong
    portion of the block buffer to be read.
    
    Signed-off-by: Jean Delvare <jdelvare@suse.de>
    Reported-by: Piotr Zakowski <piotr.zakowski@intel.com>
    Closes: https://lore.kernel.org/linux-i2c/20240213120553.7b0ab120@endymion.delvare/
    Fixes: 315cd67c9453 ("i2c: i801: Add Block Write-Block Read Process Call support")
    Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
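
The effect of the missing second reset in the i801 entry above can be shown with a toy Python model. It assumes a 32-byte buffer whose index auto-increments on every access and a device that places its reply at offset 0. `BlockBuffer` and `block_process_call()` are hypothetical; the real driver programs i801 registers instead.

```python
class BlockBuffer:
    """Toy block buffer with an auto-incrementing index (assumed behaviour)."""

    SIZE = 32

    def __init__(self):
        self.data = [0] * self.SIZE
        self.index = 0

    def reset_index(self):
        self.index = 0

    def write(self, byte):
        self.data[self.index] = byte
        self.index = (self.index + 1) % self.SIZE

    def read(self):
        byte = self.data[self.index]
        self.index = (self.index + 1) % self.SIZE
        return byte


def block_process_call(buf, outgoing, response, second_reset=True):
    buf.reset_index()                     # first reset: before writing outgoing data
    for b in outgoing:
        buf.write(b)
    buf.data[: len(response)] = response  # modelled: device reply lands at offset 0
    if second_reset:
        buf.reset_index()                 # second reset: before reading incoming data
    return [buf.read() for _ in range(len(response))]
```

Without the second reset, the reads start where the writes stopped, so the model returns bytes from the wrong part of the buffer.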

i2c: pasemi: split driver into two separate modules [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Feb 12 12:19:04 2024 +0100

    i2c: pasemi: split driver into two separate modules
    
    [ Upstream commit f44bff19268517ee98e80e944cad0f04f1db72e3 ]
    
    On powerpc, it is possible to compile test both the new apple (arm) and
    old pasemi (powerpc) drivers for the i2c hardware at the same time,
    which leads to a warning about linking the same object file twice:
    
    scripts/Makefile.build:244: drivers/i2c/busses/Makefile: i2c-pasemi-core.o is added to multiple modules: i2c-apple i2c-pasemi
    
    Rework the driver to have an explicit helper module, letting Kbuild
    take care of whether this should be built-in or a loadable driver.
    
    Fixes: 9bc5f4f660ff ("i2c: pasemi: Split pci driver to its own file")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Sven Peter <sven@svenpeter.dev>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i2c: qcom-geni: Correct I2C TRE sequence [+ + +]

Author: Viken Dadhaniya <quic_vdadhani@quicinc.com>
Date:   Mon Feb 12 18:22:39 2024 +0530

    i2c: qcom-geni: Correct I2C TRE sequence
    
    [ Upstream commit 83ef106fa732aea8558253641cd98e8a895604d7 ]
    
    For an I2C read operation in GSI mode, we get a timeout due to a
    malformed TRE, i.e. an incorrect TRE sequence, in the gpi
    (drivers/dma/qcom/gpi.c) driver.
    
    In the I2C driver, geni_i2c_gpi(I2C_WRITE) generates the GO TRE and
    geni_i2c_gpi(I2C_READ) generates the DMA TRE. Hence, to generate the GO
    TRE before the DMA TRE, we should move geni_i2c_gpi(I2C_WRITE) before
    geni_i2c_gpi(I2C_READ) inside the I2C GSI mode transfer function,
    i.e. geni_i2c_gpi_xfer().
    
    TRE stands for Transfer Ring Element, which is basically an element with
    a size of 4 words. It contains all the information such as the slave
    address, clk divider, DMA address value, data size, etc.
    
    There are mainly 3 TREs (CONFIG, GO and DMA TRE):
    - CONFIG TRE : consists of internal register configuration which is
                   required before the start of the transfer.
    - DMA TRE :    contains the DDR/memory address, also called the DMA
                   descriptor.
    - GO TRE :     contains the transfer direction, slave ID, delay flags and
                   length of the transfer.
    
    The I2C driver calls the GPI driver API to configure each TRE depending on
    the protocol.
    
    For a read operation, the TRE sequence is as below, which is not aligned
    with the hardware programming guide.
    
    - CONFIG tre
    - DMA tre
    - GO tre
    
    As per Qualcomm's internal Hardware Programming Guide, we should configure
    the TREs in the sequence below for any RX-only transfer.
    
    - CONFIG tre
    - GO tre
    - DMA tre
    
    Fixes: d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
    Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
    Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
    Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> # qrb5165-rb5
    Co-developed-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com>
    Signed-off-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com>
    Signed-off-by: Viken Dadhaniya <quic_vdadhani@quicinc.com>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
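
How the order of the two geni_i2c_gpi() calls in the qcom-geni entry above determines the TRE order can be sketched in Python. This simplified model assumes, as the changelog describes, one CONFIG TRE first and then one TRE per call; `build_rx_tres()` and the string labels are hypothetical.

```python
CONFIG, GO, DMA = "CONFIG", "GO", "DMA"

# TRE contributed by each geni_i2c_gpi() call, as described in the changelog.
TRE_FOR_CALL = {"I2C_WRITE": GO, "I2C_READ": DMA}

# Order the hardware programming guide requires for an RX-only transfer.
RX_GUIDE_ORDER = [CONFIG, GO, DMA]


def build_rx_tres(call_order):
    """Simplified model: one CONFIG TRE, then one TRE per geni_i2c_gpi() call."""
    return [CONFIG] + [TRE_FOR_CALL[call] for call in call_order]


old_seq = build_rx_tres(["I2C_READ", "I2C_WRITE"])  # call order before the fix
new_seq = build_rx_tres(["I2C_WRITE", "I2C_READ"])  # call order after the fix
print(old_seq == RX_GUIDE_ORDER, new_seq == RX_GUIDE_ORDER)  # False True
```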

i40e: Do not allow untrusted VF to remove administratively set MAC [+ + +]

Author: Ivan Vecera <ivecera@redhat.com>
Date:   Thu Feb 8 10:03:33 2024 -0800

    i40e: Do not allow untrusted VF to remove administratively set MAC
    
    [ Upstream commit 73d9629e1c8c1982f13688c4d1019c3994647ccc ]
    
    Currently, when the PF administratively sets the VF's MAC address and the
    VF is put down (the VF tries to delete all MACs), the MAC is removed
    from the MAC filters and the primary VF MAC is zeroed.
    
    Do not allow an untrusted VF to remove the primary MAC when it was set
    administratively by the PF.
    
    Reproducer:
    1) Create VF
    2) Set VF interface up
    3) Administratively set the VF's MAC
    4) Put VF interface down
    
    [root@host ~]# echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
    [root@host ~]# ip link set enp2s0f0v0 up
    [root@host ~]# ip link set enp2s0f0 vf 0 mac fe:6c:b5:da:c7:7d
    [root@host ~]# ip link show enp2s0f0
    23: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 3c:ec:ef:b7:dd:04 brd ff:ff:ff:ff:ff:ff
        vf 0     link/ether fe:6c:b5:da:c7:7d brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    [root@host ~]# ip link set enp2s0f0v0 down
    [root@host ~]# ip link show enp2s0f0
    23: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 3c:ec:ef:b7:dd:04 brd ff:ff:ff:ff:ff:ff
        vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    
    Fixes: 700bbf6c1f9e ("i40e: allow VF to remove any MAC filter")
    Fixes: ceb29474bbbc ("i40e: Add support for VF to specify its primary MAC address")
    Signed-off-by: Ivan Vecera <ivecera@redhat.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Link: https://lore.kernel.org/r/20240208180335.1844996-1-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
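
The rule from the i40e entry above can be sketched as a small Python policy check. `VfModel`, `vf_delete_mac()` and the zeroing outcome are hypothetical simplifications, not the i40e data structures. The MAC address is taken from the reproducer.

```python
from dataclasses import dataclass

ZERO_MAC = "00:00:00:00:00:00"


@dataclass
class VfModel:
    mac: str
    trusted: bool = False
    pf_set_mac: bool = False  # True if the PF administrator assigned `mac`


def vf_delete_mac(vf: VfModel, mac: str) -> bool:
    """Return True if a VF request to delete `mac` is honoured (toy policy)."""
    if mac == vf.mac and vf.pf_set_mac and not vf.trusted:
        return False  # an untrusted VF may not drop the admin-set primary MAC
    if mac == vf.mac:
        vf.mac = ZERO_MAC  # modelled outcome of removing the primary MAC
    return True
```

With `pf_set_mac=True` and `trusted=False`, the delete request issued when the interface goes down is refused, and the primary MAC survives.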

i40e: Fix waiting for queues of all VSIs to be disabled [+ + +]

Author: Ivan Vecera <ivecera@redhat.com>
Date:   Wed Nov 8 17:01:03 2023 +0100

    i40e: Fix waiting for queues of all VSIs to be disabled
    
    [ Upstream commit c73729b64bb692186da080602cd13612783f52ac ]
    
    The function i40e_pf_wait_queues_disabled() iterates over all of the PF's
    VSIs up to 'pf->hw.func_caps.num_vsis', but this is incorrect because
    the real number of VSIs can be up to 'pf->num_alloc_vsi', which
    can be higher. Fix this loop.
    
    Fixes: 69129dc39fac ("i40e: Modify Tx disable wait flow in case of DCB reconfiguration")
    Signed-off-by: Ivan Vecera <ivecera@redhat.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
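
The off-by-bound problem in the i40e entry above can be shown with a short Python sketch. The slot list and the two counts are hypothetical values chosen so that the allocated array is larger than func_caps.num_vsis.

```python
def vsis_to_wait_for(vsi_slots, loop_bound):
    """Indices of allocated VSIs visited by a loop over range(loop_bound)."""
    return [i for i in range(loop_bound) if vsi_slots[i] is not None]


# Hypothetical PF: func_caps.num_vsis == 2, but num_alloc_vsi == 4 slots,
# and slot 3 holds a VSI (slot 2 is unused).
vsi_slots = ["vsi0", "vsi1", None, "vsi3"]
num_vsis, num_alloc_vsi = 2, len(vsi_slots)

print(vsis_to_wait_for(vsi_slots, num_vsis))       # [0, 1]  (old bound misses vsi3)
print(vsis_to_wait_for(vsi_slots, num_alloc_vsi))  # [0, 1, 3]
```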

iio: accel: bma400: Fix a compilation problem [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Wed Jan 31 16:52:46 2024 -0600

    iio: accel: bma400: Fix a compilation problem
    
    commit 4cb81840d8f29b66d9d05c6d7f360c9560f7e2f4 upstream.
    
    The kernel fails to link when built with `CONFIG_BMA400` but without
    `CONFIG_REGMAP_I2C`.
    ```
    ld: drivers/iio/accel/bma400_i2c.o: in function `bma400_i2c_probe':
    bma400_i2c.c:(.text+0x23): undefined reference to `__devm_regmap_init_i2c'
    ```
    
    Link: https://download.01.org/0day-ci/archive/20240131/202401311634.FE5CBVwe-lkp@intel.com/config
    Fixes: 465c811f1f20 ("iio: accel: Add driver for the BMA400")
    Fixes: 9bea10642396 ("iio: accel: bma400: add support for bma400 spi")
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Link: https://lore.kernel.org/r/20240131225246.14169-1-mario.limonciello@amd.com
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: adc: ad4130: only set GPIO_CTRL if pin is unused [+ + +]

Author: Cosmin Tanislav <demonsingur@gmail.com>
Date:   Wed Feb 7 15:20:06 2024 +0200

    iio: adc: ad4130: only set GPIO_CTRL if pin is unused
    
    [ Upstream commit 78367c32bebfe833cd30c855755d863a4ff3fdee ]
    
    Currently, GPIO_CTRL bits are set even if the pins are used for
    measurements.
    
    GPIO_CTRL bits should only be set if the pin is not used for
    other functionality.
    
    Fix this by only setting the GPIO_CTRL bits if the pin has no
    other function.
    
    Fixes: 62094060cf3a ("iio: adc: ad4130: add AD4130 driver")
    Signed-off-by: Cosmin Tanislav <demonsingur@gmail.com>
    Reviewed-by: Nuno Sa <nuno.sa@analog.com>
    Link: https://lore.kernel.org/r/20240207132007.253768-2-demonsingur@gmail.com
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
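
The rule in the ad4130 entry above amounts to building a bit mask from the unused pins only. The Python sketch below assumes, for illustration only, that bit i of the mask corresponds to pin i. The real AD4130 GPIO_CTRL register layout is not reproduced here.

```python
def gpio_ctrl_bits(pin_has_function):
    """Return a mask with bit i set only if pin i has no other function.

    Assumed layout for illustration: bit i of the mask corresponds to pin i.
    """
    mask = 0
    for pin, has_function in enumerate(pin_has_function):
        if not has_function:
            mask |= 1 << pin
    return mask


# Hypothetical board: pins 1 and 3 are used as measurement inputs.
pins = [False, True, False, True]
print(bin(gpio_ctrl_bits(pins)))  # 0b101
```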

iio: adc: ad4130: zero-initialize clock init data [+ + +]

Author: Cosmin Tanislav <demonsingur@gmail.com>
Date:   Wed Feb 7 15:20:05 2024 +0200

    iio: adc: ad4130: zero-initialize clock init data
    
    [ Upstream commit a22b0a2be69a36511cb5b37d948b651ddf7debf3 ]
    
    The clk_init_data struct does not have all its members
    initialized, causing issues when trying to expose the internal
    clock on the CLK pin.
    
    Fix this by zero-initializing the clk_init_data struct.
    
    Fixes: 62094060cf3a ("iio: adc: ad4130: add AD4130 driver")
    Signed-off-by: Cosmin Tanislav <demonsingur@gmail.com>
    Reviewed-by: Nuno Sa <nuno.sa@analog.com>
    Link: https://lore.kernel.org/r/20240207132007.253768-1-demonsingur@gmail.com
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iio: adc: ad_sigma_delta: ensure proper DMA alignment [+ + +]

Author: Nuno Sa <nuno.sa@analog.com>
Date:   Wed Jan 17 13:41:03 2024 +0100

    iio: adc: ad_sigma_delta: ensure proper DMA alignment
    
    commit 59598510be1d49e1cff7fd7593293bb8e1b2398b upstream.
    
    Aligning the buffer to the L1 cache is not sufficient on some platforms,
    as they might have larger cacheline sizes for caches after L1, and thus
    we can't guarantee DMA safety.
    
    That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same
    for the sigma_delta ADCs.
    
    [1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
    
    Fixes: 0fb6ee8d0b5e ("iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack")
    Signed-off-by: Nuno Sa <nuno.sa@analog.com>
    Link: https://lore.kernel.org/r/20240117-dev_sigma_delta_no_irq_flags-v1-1-db39261592cf@analog.com
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
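
A little arithmetic shows why L1 alignment alone is not enough, as the DMA alignment entry above argues. The Python sketch assumes a 64-byte L1 line and a 128-byte line in an outer cache, with the 128-byte value standing in for IIO_DMA_MINALIGN. These sizes, and the 40 bytes of preceding driver state, are illustrative and not taken from a particular platform.

```python
def align_up(offset, align):
    """Round offset up to a multiple of align (align must be a power of two)."""
    assert align > 0 and align & (align - 1) == 0
    return (offset + align - 1) & ~(align - 1)


def shares_line(a_start, a_end, b_start, b_end, line):
    """True if byte ranges [a_start, a_end) and [b_start, b_end) touch a common line."""
    a_lines = set(range(a_start // line, (a_end - 1) // line + 1))
    b_lines = set(range(b_start // line, (b_end - 1) // line + 1))
    return bool(a_lines & b_lines)


L1_LINE = 64        # assumed L1 cache line size
DMA_MINALIGN = 128  # assumed outer-cache line size (IIO_DMA_MINALIGN stand-in)
FIELDS_END = 40     # hypothetical: other driver state occupies bytes [0, 40)
BUF_LEN = 16        # hypothetical DMA buffer length

buf_l1 = align_up(FIELDS_END, L1_LINE)        # 64
buf_dma = align_up(FIELDS_END, DMA_MINALIGN)  # 128

# With L1-only alignment the buffer still shares a 128-byte outer-cache line
# with the preceding fields; with DMA_MINALIGN alignment it does not.
print(shares_line(0, FIELDS_END, buf_l1, buf_l1 + BUF_LEN, DMA_MINALIGN))    # True
print(shares_line(0, FIELDS_END, buf_dma, buf_dma + BUF_LEN, DMA_MINALIGN))  # False
```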

iio: commom: st_sensors: ensure proper DMA alignment [+ + +]

Author: Nuno Sa <nuno.sa@analog.com>
Date:   Wed Jan 31 10:16:47 2024 +0100

    iio: commom: st_sensors: ensure proper DMA alignment
    
    commit 862cf85fef85becc55a173387527adb4f076fab0 upstream.
    
    Aligning the buffer to the L1 cache is not sufficient on some platforms,
    as they might have larger cacheline sizes for caches after L1, and thus
    we can't guarantee DMA safety.
    
    That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same
    for st_sensors common buffer.
    
    While at it, move the odr_lock before buffer_data, as we definitely
    don't want any other data to share a cacheline with the buffer.
    
    [1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
    
    Fixes: e031d5f558f1 ("iio:st_sensors: remove buffer allocation at each buffer enable")
    Signed-off-by: Nuno Sa <nuno.sa@analog.com>
    Cc: <Stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240131-dev_dma_safety_stm-v2-1-580c07fae51b@analog.com
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: core: fix memleak in iio_device_register_sysfs [+ + +]

Author: Dinghao Liu <dinghao.liu@zju.edu.cn>
Date:   Fri Dec 8 15:31:19 2023 +0800

    iio: core: fix memleak in iio_device_register_sysfs
    
    commit 95a0d596bbd0552a78e13ced43f2be1038883c81 upstream.
    
    When iio_device_register_sysfs_group() fails, we should
    free iio_dev_opaque->chan_attr_group.attrs to prevent a
    potential memory leak.
    
    Fixes: 32f171724e5c ("iio: core: rework iio device group creation")
    Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
    Link: https://lore.kernel.org/r/20231208073119.29283-1-dinghao.liu@zju.edu.cn
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP [+ + +]

Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date:   Sun Feb 4 04:56:17 2024 -0800

    iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
    
    commit 621c6257128149e45b36ffb973a01c3f3461b893 upstream.
    
    When als_capture_sample() is called with usage ID
    HID_USAGE_SENSOR_TIME_TIMESTAMP, return 0. The HID sensor core ignores
    the return value of the capture_sample() callback, so the return value
    makes no difference, but correct it to return success instead
    of -EINVAL.
    
    Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Link: https://lore.kernel.org/r/20240204125617.2635574-1-srinivas.pandruvada@linux.intel.com
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: imu: adis: ensure proper DMA alignment [+ + +]

Author: Nuno Sa <nuno.sa@analog.com>
Date:   Wed Jan 17 14:10:49 2024 +0100

    iio: imu: adis: ensure proper DMA alignment
    
    commit 8e98b87f515d8c4bae521048a037b2cc431c3fd5 upstream.
    
    Aligning the buffer to the L1 cache is not sufficient on some platforms,
    as they might have larger cacheline sizes for caches after L1, and thus
    we can't guarantee DMA safety.
    
    That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same
    for the ADIS library.
    
    [1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
    
    Fixes: ccd2b52f4ac6 ("staging:iio: Add common ADIS library")
    Signed-off-by: Nuno Sa <nuno.sa@analog.com>
    Link: https://lore.kernel.org/r/20240117-adis-improv-v1-1-7f90e9fad200@analog.com
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: imu: bno055: serdev requires REGMAP [+ + +]

Author: Randy Dunlap <rdunlap@infradead.org>
Date:   Wed Jan 10 10:56:11 2024 -0800

    iio: imu: bno055: serdev requires REGMAP
    
    commit 35ec2d03b282a939949090bd8c39eb37a5856721 upstream.
    
    There are a ton of build errors when REGMAP is not set, so select
    REGMAP to fix all of them.
    
    Examples (not all of them):
    
    ../drivers/iio/imu/bno055/bno055_ser_core.c:495:15: error: variable 'bno055_ser_regmap_bus' has initializer but incomplete type
      495 | static struct regmap_bus bno055_ser_regmap_bus = {
    ../drivers/iio/imu/bno055/bno055_ser_core.c:496:10: error: 'struct regmap_bus' has no member named 'write'
      496 |         .write = bno055_ser_write_reg,
    ../drivers/iio/imu/bno055/bno055_ser_core.c:497:10: error: 'struct regmap_bus' has no member named 'read'
      497 |         .read = bno055_ser_read_reg,
    ../drivers/iio/imu/bno055/bno055_ser_core.c: In function 'bno055_ser_probe':
    ../drivers/iio/imu/bno055/bno055_ser_core.c:532:18: error: implicit declaration of function 'devm_regmap_init'; did you mean 'vmem_map_init'? [-Werror=implicit-function-declaration]
      532 |         regmap = devm_regmap_init(&serdev->dev, &bno055_ser_regmap_bus,
    ../drivers/iio/imu/bno055/bno055_ser_core.c:532:16: warning: assignment to 'struct regmap *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
      532 |         regmap = devm_regmap_init(&serdev->dev, &bno055_ser_regmap_bus,
    ../drivers/iio/imu/bno055/bno055_ser_core.c: At top level:
    ../drivers/iio/imu/bno055/bno055_ser_core.c:495:26: error: storage size of 'bno055_ser_regmap_bus' isn't known
      495 | static struct regmap_bus bno055_ser_regmap_bus = {
    
    Fixes: 2eef5a9cc643 ("iio: imu: add BNO055 serdev driver")
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: Andrea Merello <andrea.merello@iit.it>
    Cc: Jonathan Cameron <jic23@kernel.org>
    Cc: Lars-Peter Clausen <lars@metafoo.de>
    Cc: linux-iio@vger.kernel.org
    Cc: <Stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240110185611.19723-1-rdunlap@infradead.org
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC [+ + +]

Author: zhili.liu <zhili.liu@ucas.com.cn>
Date:   Tue Jan 2 09:07:11 2024 +0800

    iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
    
    commit 792595bab4925aa06532a14dd256db523eb4fa5e upstream.
    
    Recently, we encountered a kernel crash in rm3100_common_probe() caused
    by an out-of-bounds access of the array rm3100_samp_rates (because of
    underlying hardware failures). Add a boundary check to prevent
    out-of-bounds access.
    
    Fixes: 121354b2eceb ("iio: magnetometer: Add driver support for PNI RM3100")
    Suggested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
    Signed-off-by: zhili.liu <zhili.liu@ucas.com.cn>
    Link: https://lore.kernel.org/r/1704157631-3814-1-git-send-email-zhouzhouyi@gmail.com
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
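
The kind of boundary check the rm3100 entry above adds can be sketched in Python. `SAMP_RATES`, `TMRC_OFFSET` and `samp_rate_from_tmrc()` are hypothetical stand-ins. Neither the table contents nor the register offset are copied from the driver.

```python
# Hypothetical stand-ins, not the driver's actual table or register offset.
SAMP_RATES = ["rate0", "rate1", "rate2", "rate3"]
TMRC_OFFSET = 0x92


def samp_rate_from_tmrc(tmrc):
    """Map a raw TMRC register value to a table entry, rejecting bad reads."""
    index = tmrc - TMRC_OFFSET
    # Check both ends: in C a bad index reads out of bounds, and in Python a
    # negative index would silently wrap around to the end of the list.
    if not 0 <= index < len(SAMP_RATES):
        raise ValueError(f"TMRC value {tmrc:#x} out of range")
    return SAMP_RATES[index]
```

A value read back from faulty hardware then produces an error instead of an out-of-bounds access.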

iio: pressure: bmp280: Add missing bmp085 to SPI id table [+ + +]

Author: Sam Protsenko <semen.protsenko@linaro.org>
Date:   Wed Dec 20 12:47:53 2023 -0600

    iio: pressure: bmp280: Add missing bmp085 to SPI id table
    
    commit b67f3e653e305abf1471934d7b9fdb9ad2df3eef upstream.
    
    "bmp085" is missing from the bmp280_spi_id[] table, which leads to the
    following warning in dmesg:
    
        SPI driver bmp280 has no spi_device_id for bosch,bmp085
    
    Add "bmp085" to bmp280_spi_id[] by mimicking its existing description in
    bmp280_of_spi_match[] table to fix the above warning.
    
    Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
    Fixes: b26b4e91700f ("iio: pressure: bmp280: add SPI interface driver")
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
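
The warning in the bmp280 entry above comes from a consistency check at SPI driver registration. Roughly, each OF compatible string has its vendor prefix stripped, and the remainder is looked up in the driver's spi_device_id table. The Python sketch below is a simplified model of that check. The compatible list is an illustrative subset, not the driver's full table.

```python
def spi_id_from_compatible(compatible):
    """Strip the vendor prefix, e.g. "bosch,bmp085" -> "bmp085"."""
    return compatible.split(",", 1)[-1]


def missing_spi_ids(of_compatibles, spi_ids):
    """Compatibles that would trigger "has no spi_device_id for ..." (simplified)."""
    return [c for c in of_compatibles if spi_id_from_compatible(c) not in spi_ids]


compatibles = ["bosch,bmp085", "bosch,bmp180", "bosch,bmp280"]  # illustrative subset
print(missing_spi_ids(compatibles, {"bmp180", "bmp280"}))            # ['bosch,bmp085']
print(missing_spi_ids(compatibles, {"bmp085", "bmp180", "bmp280"}))  # []
```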

interconnect: qcom: sc8180x: Mark CO0 BCM keepalive [+ + +]

Author: Konrad Dybcio <konrad.dybcio@linaro.org>
Date:   Wed Jan 10 15:16:26 2024 +0200

    interconnect: qcom: sc8180x: Mark CO0 BCM keepalive
    
    [ Upstream commit 85e985a4f46e462a37f1875cb74ed380e7c0c2e0 ]
    
    The CO0 BCM needs to be up at all times, otherwise some hardware (like
    the UFS controller) loses its connection to the rest of the SoC,
    resulting in a hang of the platform, accompanied by a spectacular
    logspam.
    
    Mark it as keepalive to prevent such cases.
    
    Fixes: 9c8c6bac1ae8 ("interconnect: qcom: Add SC8180x providers")
    Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Link: https://lore.kernel.org/r/20231214-topic-sc8180_fixes-v1-1-421904863006@linaro.org
    Signed-off-by: Georgi Djakov <djakov@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

interconnect: qcom: sm8550: Enable sync_state [+ + +]

Author: Konrad Dybcio <konrad.dybcio@linaro.org>
Date:   Wed Jan 10 15:16:46 2024 +0200

    interconnect: qcom: sm8550: Enable sync_state
    
    [ Upstream commit 24406f6794aa631516241deb9e19de333d6a0600 ]
    
    To ensure the interconnect votes are actually meaningful and in order to
    prevent holding all buses at FMAX, introduce the sync state callback.
    
    Fixes: e6f0d6a30f73 ("interconnect: qcom: Add SM8550 interconnect provider driver")
    Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Reviewed-by: Bjorn Andersson <andersson@kernel.org>
    Link: https://lore.kernel.org/r/20231218-topic-8550_fixes-v1-2-ce1272d77540@linaro.org
    Signed-off-by: Georgi Djakov <djakov@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

io_uring/net: fix multishot accept overflow handling [+ + +]

Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Feb 14 08:23:05 2024 -0700

    io_uring/net: fix multishot accept overflow handling
    
    commit a37ee9e117ef73bbc2f5c0b31911afd52d229861 upstream.
    
    If we hit CQ ring overflow when attempting to post a multishot accept
    completion, we don't properly save the result or return code. This
    results in losing the accepted fd value.
    
    Instead, we return the result from the poll operation that triggered
    the accept retry. This is generally POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND,
    which is 0xc3, or 195. That looks like a valid file descriptor, but it
    has no connection to one.
    
    Handle this like we do for other multishot completions - assign the
    result, and return IOU_STOP_MULTISHOT to cancel any further completions
    from this request when overflow is hit. This preserves the result, as we
    should, and tells the application that the request needs to be re-armed.
    
    Cc: stable@vger.kernel.org
    Fixes: 515e26961295 ("io_uring: revert "io_uring fix multishot accept ordering"")
    Link: https://github.com/axboe/liburing/issues/1062
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
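
The 0xc3 value in the io_uring entry above follows directly from the generic Linux poll bit values, as the Python snippet below checks. The `overflow_cqe_result()` helper is a hypothetical toy model of the before/after behaviour, not io_uring code.

```python
# Values of the Linux UAPI poll bits (include/uapi/asm-generic/poll.h).
POLLIN = 0x0001
POLLPRI = 0x0002
POLLRDNORM = 0x0040
POLLRDBAND = 0x0080

poll_mask = POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND
print(hex(poll_mask), poll_mask)  # 0xc3 195


def overflow_cqe_result(accepted_fd, poll_mask, fixed):
    """Result reported for a multishot accept that hits CQ overflow (toy model)."""
    # Before the fix the poll mask leaked out as the result; after it, the
    # accepted fd is kept and the request is stopped (IOU_STOP_MULTISHOT).
    return accepted_fd if fixed else poll_mask
```

An application that receives 195 before the fix has no way to tell that it is a poll mask rather than a descriptor.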

irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Tue Feb 13 10:12:06 2024 +0000

    irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
    
    commit af9acbfc2c4b72c378d0b9a2ee023ed01055d3e2 upstream.
    
    When updating the affinity of a VPE, the VMOVP command is currently skipped
    if the two CPUs are part of the same VPE affinity.
    
    But this is wrong, as the doorbell corresponding to this VPE is still
    delivered on the 'old' CPU, which screws up the balancing.  Furthermore,
    offlining that 'old' CPU results in doorbell interrupts generated for this
    VPE being discarded.
    
    The harsh reality is that VMOVP cannot be elided when a set_affinity()
    request occurs. It needs to be obeyed, and if an optimisation is to be
    made, it is at the point where the affinity change request is made (such as
    in KVM).
    
    Drop the VMOVP elision altogether, and only use the vpe_table_mask
    to try and stay within the same ITS affinity group if at all possible.
    
    Fixes: dd3f050a216e (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
    Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240213101206.2137483-4-maz@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
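
The decision before and after the GICv4.1 VPE affinity fix can be contrasted in a hypothetical Python model. The CPU numbers, `pick_target_cpu()` and the set standing in for vpe_table_mask are illustrative assumptions. The driver itself works on cpumasks and issues real ITS commands.

```python
def pick_target_cpu(requested, its_group):
    """Prefer a requested CPU inside the current ITS affinity group (toy model)."""
    for cpu in requested:
        if cpu in its_group:
            return cpu
    return requested[0]


def vmovp_needed_old(current, target, its_group):
    # Old behaviour: VMOVP elided when both CPUs sit in the same affinity group,
    # so the doorbell kept firing on the old CPU.
    return current != target and not (current in its_group and target in its_group)


def vmovp_needed_new(current, target):
    # Fixed behaviour: any real move of the VPE is followed by a VMOVP.
    return current != target
```

For example, with an affinity group {0, 1}, moving a VPE from CPU 0 to CPU 1 issued no VMOVP under the old rule but does under the new one.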

irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Tue Feb 13 10:12:04 2024 +0000

    irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
    
    [ Upstream commit 846297e11e8ae428f8b00156a0cfe2db58100702 ]
    
    Although the GICv3 code base has gained some handling of systems failing to
    handle the shareability attributes, the GICv4 side of things has been
    firmly ignored.
    
    This is unfortunate, as the recent addition of the "dma-noncoherent"
    property is supposed to apply to all of the GICR tables, and not just the
    ones that are common to v3 and v4.
    
    Add some checks to handle the VPROPBASE/VPENDBASE shareability and
    cacheability attributes in the same way we deal with the other GICR_BASE
    registers, wrapping the flag check in a helper for improved readability.
    
    Note that this has been found by inspection only, as I don't have access to
    HW that suffers from this particular issue.
    
    Fixes: 3a0fff0fb6a3 ("irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing")
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
    Link: https://lore.kernel.org/r/20240213101206.2137483-2-maz@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Tue Feb 13 10:12:05 2024 +0000

    irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
    
    commit 8b02da04ad978827e5ccd675acf170198f747a7a upstream.
    
    While refactoring the way the ITSs are probed, the handling of quirks
    applicable to ACPI-based platforms was lost. As a result, systems such as
    HIP07 lose their GICv4 functionality, and some others may even fail to
    boot, unless they are configured to boot with DT.
    
    Move the enabling of quirks into its_probe_one(), making it common to all
    firmware implementations.
    
    Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
    Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240213101206.2137483-3-maz@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip/irq-brcmstb-l2: Add write memory barrier before exit [+ + +]

Author: Doug Berger <opendmb@gmail.com>
Date:   Fri Feb 9 17:24:49 2024 -0800

    irqchip/irq-brcmstb-l2: Add write memory barrier before exit
    
    commit b0344d6854d25a8b3b901c778b1728885dd99007 upstream.
    
    It was observed on Broadcom devices that use GIC v3 architecture L1
    interrupt controllers as the parent of brcmstb-l2 interrupt controllers
    that the deactivation of the parent interrupt could happen before the
    brcmstb-l2 deasserted its output. This would lead the GIC to reactivate the
    interrupt only to find that no L2 interrupt was pending. The result was a
    spurious interrupt invoking handle_bad_irq() with its associated
    messaging. While this did not create a functional problem, it is a waste of
    cycles.
    
    The hazard exists because the memory mapped bus writes to the brcmstb-l2
    registers are buffered and the GIC v3 architecture uses a very efficient
    system register write to deactivate the interrupt.
    
    Add a write memory barrier prior to invoking chained_irq_exit() to
    introduce a dsb(st) on those systems to ensure the system register write
    cannot be executed until the memory mapped writes are visible to the
    system.
    
    [ florian: Added Fixes tag ]
    
    Fixes: 7f646e92766e ("irqchip: brcmstb-l2: Add Broadcom Set Top Box  Level-2 interrupt controller")
    Signed-off-by: Doug Berger <opendmb@gmail.com>
    Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Acked-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240210012449.3009125-1-florian.fainelli@broadcom.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc() [+ + +]

Author: Bibo Mao <maobibo@loongson.cn>
Date:   Tue Jan 30 16:27:20 2024 +0800

    irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc()
    
    [ Upstream commit f1c2765c6afcd1f71f76ed8c9bf94acedab4cecb ]
    
    eiointc_domain_alloc() uses struct eiointc, which is not defined, for a
    pointer. Older compilers treat that as a forward declaration and due to
    assignment of a void pointer there is no warning emitted. As the variable
    is then handed in as a void pointer argument to irq_domain_set_info() the
    code is functional.
    
    Use struct eiointc_priv instead.
    
    [ tglx: Rewrote changelog ]
    
    Fixes: dd281e1a1a93 ("irqchip: Add Loongson Extended I/O interrupt controller support")
    Signed-off-by: Bibo Mao <maobibo@loongson.cn>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Huacai Chen <chenhuacai@loongson.cn>
    Link: https://lore.kernel.org/r/20240130082722.2912576-2-maobibo@loongson.cn
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kallsyms: ignore ARMv4 thunks along with others [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Feb 15 09:32:08 2024 +0100

    kallsyms: ignore ARMv4 thunks along with others
    
    [ Upstream commit a951884d82886d8453d489f84f20ac168d062b38 ]
    
    lld is now able to build ARMv4 and ARMv4T kernels, which means it can
    generate thunks for those (__ARMv4PILongThunk_*, __ARMv4PILongBXThunk_*)
    that can interfere with kallsyms table generation, since they do not get
    ignored like the corresponding ARMv5+ ones are:
    
    Inconsistent kallsyms data
    Try "make KALLSYMS_EXTRA_PASS=1" as a workaround
    
    Replace the hardcoded list of thunk symbols with a more general regex that
    covers this one along with future symbols that follow the same pattern.
    
    Fixes: 5eb6e280432d ("ARM: 9289/1: Allow pre-ARMv5 builds with ld.lld 16.0.0 and newer")
    Fixes: efe6e3068067 ("kallsyms: fix nonconverging kallsyms table with lld")
    Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kbuild: Fix changing ELF file type for output of gen_btf for big endian [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Mon Feb 12 19:05:10 2024 -0700

    kbuild: Fix changing ELF file type for output of gen_btf for big endian
    
    commit e3a9ee963ad8ba677ca925149812c5932b49af69 upstream.
    
    Commit 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF")
    changed the ELF type of .btf.vmlinux.bin.o to ET_REL via dd, which works
    fine for little endian platforms:
    
       00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
      -00000010  03 00 b7 00 01 00 00 00  00 00 00 80 00 80 ff ff  |................|
      +00000010  01 00 b7 00 01 00 00 00  00 00 00 80 00 80 ff ff  |................|
    
    However, for big endian platforms, it changes the wrong byte, resulting
    in an invalid ELF file type, which ld.lld rejects:
    
       00000000  7f 45 4c 46 02 02 01 00  00 00 00 00 00 00 00 00  |.ELF............|
      -00000010  00 03 00 16 00 00 00 01  00 00 00 00 00 10 00 00  |................|
      +00000010  01 03 00 16 00 00 00 01  00 00 00 00 00 10 00 00  |................|
    
      Type:                              <unknown>: 103
    
      ld.lld: error: .btf.vmlinux.bin.o: unknown file type
    
    Fix this by updating the entire 16-bit e_type field rather than just a
    single byte, so that everything works correctly for all platforms and
    linkers.
    
       00000000  7f 45 4c 46 02 02 01 00  00 00 00 00 00 00 00 00  |.ELF............|
      -00000010  00 03 00 16 00 00 00 01  00 00 00 00 00 10 00 00  |................|
      +00000010  00 01 00 16 00 00 00 01  00 00 00 00 00 10 00 00  |................|
    
      Type:                              REL (Relocatable file)
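The fix can be illustrated with a small C sketch. It assumes a buffer that holds the start of an ELF header. The sketch reads e_ident[EI_DATA] to learn the byte order and then writes both bytes of e_type, instead of patching a single byte. The helper set_elf_type() is hypothetical. The constants come from the system <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store 'type' into the 16-bit e_type field using the
 * byte order recorded in e_ident[EI_DATA]. 'hdr' must hold at least the
 * first 18 bytes of an ELF header. Returns 0 on success, -1 if the data
 * encoding is unknown. */
int set_elf_type(uint8_t *hdr, uint16_t type)
{
    /* e_type directly follows e_ident, at offset 16 for ELF32 and ELF64. */
    const size_t off = offsetof(Elf64_Ehdr, e_type);

    switch (hdr[EI_DATA]) {
    case ELFDATA2LSB:                   /* little endian: low byte first */
        hdr[off]     = (uint8_t)(type & 0xff);
        hdr[off + 1] = (uint8_t)(type >> 8);
        return 0;
    case ELFDATA2MSB:                   /* big endian: high byte first */
        hdr[off]     = (uint8_t)(type >> 8);
        hdr[off + 1] = (uint8_t)(type & 0xff);
        return 0;
    default:
        return -1;
    }
}
```

Applied to the big-endian dump above, this turns bytes 00 03 at offset 0x10 into 00 01, which matches the fixed output.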
    
    While in the area, update the comment to mention that binutils 2.35+
    matches LLD's behavior of rejecting an ET_EXEC input, which occurred
    after the comment was added.
    
    Cc: stable@vger.kernel.org
    Fixes: 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF")
    Link: https://github.com/llvm/llvm-project/pull/75643
    Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Reviewed-by: Fangrui Song <maskray@google.com>
    Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Justin Stitt <justinstitt@google.com>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails [+ + +]

Author: Fedor Pchelkin <pchelkin@ispras.ru>
Date:   Mon Feb 5 14:19:16 2024 +0300

    ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
    
    commit 108a020c64434fed4b69762879d78cd24088b4c7 upstream.
    
    ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it
    fails. It is the caller's responsibility to free the buffer in the
    error case.
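The ownership rule can be sketched in plain C. pin_rsp_read() below is a hypothetical stand-in for ksmbd_iov_pin_rsp_read() that keeps the buffer only on success. The hypothetical caller handle_read() therefore has to free the buffer on the error path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in: on success the response takes ownership of 'aux';
 * on failure 'aux' is left untouched and still belongs to the caller. */
static int pin_rsp_read(void **rsp_slot, void *aux, int simulate_failure)
{
    if (simulate_failure)
        return -ENOMEM;
    *rsp_slot = aux;
    return 0;
}

int handle_read(void **rsp_slot, int simulate_failure)
{
    void *aux = malloc(4096);           /* aux read buffer */
    int rc;

    if (!aux)
        return -ENOMEM;

    rc = pin_rsp_read(rsp_slot, aux, simulate_failure);
    if (rc) {
        free(aux);                      /* error path: caller still owns aux */
        return rc;
    }
    return 0;                           /* success: freed with the response */
}
```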
    
    Found by Linux Verification Center (linuxtesting.org).
    
    Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound")
    Cc: stable@vger.kernel.org
    Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
    Acked-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Fix circular locking dependency [+ + +]

Author: Sebastian Ene <sebastianene@google.com>
Date:   Wed Jan 24 09:10:28 2024 +0000

    KVM: arm64: Fix circular locking dependency
    
    commit 10c02aad111df02088d1a81792a709f6a7eca6cc upstream.
    
    The rule inside kvm enforces that the vcpu->mutex is taken *inside*
    kvm->lock. The rule is violated by the pkvm_create_hyp_vm() which acquires
    the kvm->lock while already holding the vcpu->mutex lock from
    kvm_vcpu_ioctl(). Avoid the circular locking dependency altogether by
    protecting the hyp vm handle with the config_lock, much like we already
    do for other forms of VM-scoped data.
    
    Signed-off-by: Sebastian Ene <sebastianene@google.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20240124091027.1477174-2-sebastianene@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing [+ + +]

Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Mon Jan 29 09:58:46 2024 +0100

    KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing
    
    [ Upstream commit 8ad4855273488c9bd5320b3fee80f66f0023f326 ]
    
    When X86_FEATURE_INVTSC is missing, guest_test_msrs_access() was supposed
    to skip testing the dependent Hyper-V invariant TSC feature. Unfortunately,
    'continue' does not achieve that, as stage is not incremented. Moreover,
    'vm' allocated with vm_create_with_one_vcpu() is not freed, and the test
    runs out of available file descriptors very quickly.
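A simplified model of the stage loop shows why a bare 'continue' is not enough: the stage counter must still advance when a check is skipped. run_stages() and its parameters are hypothetical and are not taken from hyperv_features.c.

```c
#include <stdbool.h>

/* Hypothetical model: stage 'invtsc_stage' depends on X86_FEATURE_INVTSC.
 * Returns how many stages were actually tested. */
int run_stages(int nr_stages, int invtsc_stage, bool have_invtsc)
{
    int stage = 0, tested = 0;

    while (stage < nr_stages) {
        if (stage == invtsc_stage && !have_invtsc) {
            /* Skip the dependent check but still advance the stage;
             * a bare 'continue' would revisit this stage forever. */
            stage++;
            continue;
        }
        tested++;   /* ...create the VM, test MSR access, free the VM... */
        stage++;
    }
    return tested;
}
```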
    
    Fixes: bd827bd77537 ("KVM: selftests: Test Hyper-V invariant TSC control")
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Link: https://lore.kernel.org/r/20240129085847.2674082-1-vkuznets@redhat.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: selftests: Delete superfluous, unused "stage" variable in AMX test [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Tue Jan 9 14:03:02 2024 -0800

    KVM: selftests: Delete superfluous, unused "stage" variable in AMX test
    
    [ Upstream commit 46fee9e38995af9ae16a8cc7d05031486d44cf35 ]
    
    Delete the AMX test's "stage" counter, as the counter is no longer used,
    which makes clang unhappy:
    
      x86_64/amx_test.c:224:6: error: variable 'stage' set but not used
              int stage, ret;
                  ^
      1 error generated.
    
    Note, "stage" was never really used, it just happened to be dumped out by
    a (failed) assertion on run->exit_reason, i.e. the AMX test has no concept
    of stages, the code was likely copy+pasted from a different test.
    
    Fixes: c96f57b08012 ("KVM: selftests: Make vCPU exit reason test assertion common")
    Reviewed-by: Jim Mattson <jmattson@google.com>
    Link: https://lore.kernel.org/r/20240109220302.399296-1-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Feb 2 15:18:31 2024 -0800

    KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
    
    [ Upstream commit ba58f873cdeec30b6da48e28dd5782c5a3e1371b ]
    
    When finishing the final iteration of dirty_log_test testcase, set
    host_quit _before_ the final "continue" so that the vCPU worker doesn't
    run an extra iteration, and delete the hack-a-fix of an extra "continue"
    from the dirty ring testcase.  This fixes a bug where the extra post to
    sem_vcpu_cont may not be consumed, which results in failures in subsequent
    runs of the testcases.  The bug likely was missed during development as
    x86 supports only a single "guest mode", i.e. there aren't any subsequent
    testcases after the dirty ring test, because for_each_guest_mode() only
    runs a single iteration.
    
    For the regular dirty log testcases, letting the vCPU run one extra
    iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
    only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
    But for the dirty ring test, which needs to periodically stop the vCPU to
    reap the dirty ring, letting the vCPU resume the guest _after_ the last
    iteration means the vCPU will get stuck without an extra "continue".
    
    However, blindly firing off a post to sem_vcpu_cont isn't guaranteed to
    be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
    the guest.  This results in a dangling sem_vcpu_cont, which leads to
    subsequent iterations getting out of sync, as the vCPU worker will
    continue on before the main task is ready for it to resume the guest,
    leading to a variety of asserts, e.g.
    
      ==== Test Assertion Failure ====
      dirty_log_test.c:384: dirty_ring_vcpu_ring_full
      pid=14854 tid=14854 errno=22 - Invalid argument
         1  0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
         2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
         3   (inlined by) run_test at dirty_log_test.c:802
         4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
         5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
         6  0x0000ffff9be173c7: ?? ??:0
         7  0x0000ffff9be1749f: ?? ??:0
         8  0x000000000040206f: _start at ??:?
      Didn't continue vcpu even without ring full
    
    Alternatively, the test could simply reset the semaphores before each
    testcase, but papering over hacks with more hacks usually ends in tears.
    
    Reported-by: Shaoqin Huang <shahuang@redhat.com>
    Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
    Link: https://lore.kernel.org/r/20240202231831.354848-1-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl [+ + +]

Author: Mingwei Zhang <mizhang@google.com>
Date:   Tue Jan 23 22:12:20 2024 +0000

    KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
    
    commit 05519c86d6997cfb9bb6c82ce1595d1015b718dc upstream.
    
    Use a u64 instead of a u8 when taking a snapshot of pmu->fixed_ctr_ctrl
    when reprogramming fixed counters, as truncating the value results in KVM
    thinking fixed counter 2 is already disabled (the bug also affects fixed
    counters 3+, but KVM doesn't yet support those).  As a result, if the
    guest disables fixed counter 2, KVM will get a false negative and fail to
    reprogram/disable emulation of the counter, which can lead to incorrect
    counts and spurious PMIs in the guest.
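The truncation can be reproduced with a small C sketch. Each fixed counter owns a 4-bit field in the fixed-counter control value, so counter 2 lives in bits 8-11, and a u8 snapshot discards those bits. The helper names are illustrative and do not correspond to KVM's actual functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each fixed counter owns a 4-bit field in the control value. */
static unsigned fixed_ctrl_field_demo(uint64_t ctrl, int idx)
{
    return (unsigned)((ctrl >> (idx * 4)) & 0xf);
}

/* Buggy variant: the snapshot is truncated to 8 bits, losing counters 2+. */
bool fixed_ctr_changed_u8(uint64_t old_ctrl, uint64_t new_ctrl, int idx)
{
    uint8_t snap = (uint8_t)old_ctrl;
    return fixed_ctrl_field_demo(snap, idx) != fixed_ctrl_field_demo(new_ctrl, idx);
}

/* Fixed variant: the snapshot keeps all 64 bits. */
bool fixed_ctr_changed_u64(uint64_t old_ctrl, uint64_t new_ctrl, int idx)
{
    uint64_t snap = old_ctrl;
    return fixed_ctrl_field_demo(snap, idx) != fixed_ctrl_field_demo(new_ctrl, idx);
}
```

Suppose the guest disables counter 2, so old = 0x333 and new = 0x033. The u8 variant reports no change, which is the false negative described above. The u64 variant detects the change.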
    
    Fixes: 76d287b2342e ("KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mingwei Zhang <mizhang@google.com>
    Link: https://lore.kernel.org/r/20240123221220.3911317-1-mizhang@google.com
    [sean: rewrite changelog to call out the effects of the bug]
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu [+ + +]

Author: Prasad Pandit <pjp@fedoraproject.org>
Date:   Wed Jan 3 13:23:43 2024 +0530

    KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
    
    commit 6231c9e1a9f35b535c66709aa8a6eda40dbc4132 upstream.
    
    The kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes a 'KVM_REQ_NMI'
    request for a vcpu even when its 'events->nmi.pending' is zero, for
    example via this call chain:
        qemu_thread_start
         kvm_vcpu_thread_fn
          qemu_wait_io_event
           qemu_wait_io_event_common
            process_queued_cpu_work
             do_kvm_cpu_synchronize_post_init/_reset
              kvm_arch_put_registers
               kvm_put_vcpu_events (cpu, level=[2|3])
    
    This causes vCPU threads in QEMU to constantly acquire and release the
    global mutex lock, delaying guest boot due to lock contention. Add a
    check to make the KVM_REQ_NMI request only if the vcpu has an NMI pending.
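The shape of the fix is sketched below with hypothetical stand-in types. In this sketch the queued NMI count is always copied from userspace. The NMI request is raised only when the pending count is non-zero. The request is modelled by a boolean rather than a call to kvm_make_request().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the userspace event data and the vCPU state. */
struct nmi_event_demo {
    unsigned int pending;
};

struct vcpu_demo {
    unsigned int nmi_queued;
    bool nmi_requested;         /* models a pending KVM_REQ_NMI request */
};

void set_nmi_events(struct vcpu_demo *vcpu, const struct nmi_event_demo *nmi)
{
    vcpu->nmi_queued = nmi->pending;
    /* Only kick the NMI processing path if there is something to inject. */
    if (nmi->pending)
        vcpu->nmi_requested = true;
}
```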
    
    Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
    Link: https://lore.kernel.org/r/20240103075343.549293-1-ppandit@redhat.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lan966x: Fix crash when adding interface under a lag [+ + +]

Author: Horatiu Vultur <horatiu.vultur@microchip.com>
Date:   Tue Feb 6 13:30:54 2024 +0100

    lan966x: Fix crash when adding interface under a lag
    
    [ Upstream commit 15faa1f67ab405d47789d4702f587ec7df7ef03e ]
    
    There is a crash when adding one of the lan966x interfaces under a lag
    interface. The issue can be reproduced like this:
    ip link add name bond0 type bond miimon 100 mode balance-xor
    ip link set dev eth0 master bond0
    
    The reason is that when an interface is added under the lag, the driver
    goes through all the ports to figure out which other ports are under
    that lag interface. However, lan966x can have ports that are NULL
    pointers because they were not probed, so iterating over these ports
    crashes when they are dereferenced.
    The fix consists of checking for NULL pointers before accessing
    anything in the ports, as is already done in other places.
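A minimal sketch of the pattern follows, assuming a simplified port array in which unprobed ports are NULL entries. struct port_demo and count_lag_ports() are hypothetical and are not the lan966x driver's types.

```c
#include <stddef.h>

/* Simplified stand-in for a lan966x port. */
struct port_demo {
    const void *bond;   /* lag device this port is enslaved to, or NULL */
};

/* Count ports under 'bond'. Unprobed ports are NULL entries in the array,
 * so they must be skipped before any member is read. */
int count_lag_ports(struct port_demo **ports, size_t n, const void *bond)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ports[i])
            continue;               /* never dereference an unprobed port */
        if (ports[i]->bond == bond)
            count++;
    }
    return count;
}
```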
    
    Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x")
    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240206123054.3052966-1-horatiu.vultur@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Linux: Linux 6.6.18 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Fri Feb 23 09:25:28 2024 +0100

    Linux 6.6.18
    
    Link: https://lore.kernel.org/r/20240220205637.572693592@linuxfoundation.org
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Allen Pais <apais@linux.microsoft.com>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Link: https://lore.kernel.org/r/20240221125953.770767246@linuxfoundation.org
    Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
    Tested-by: Allen Pais <apais@linux.microsoft.com>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Tested-by: kernelci.org bot <bot@kernelci.org>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

linux/init: remove __memexit* annotations [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Mon Oct 23 02:06:05 2023 +0900

    linux/init: remove __memexit* annotations
    
    commit 6a4e59eeedc3018cb57722eecfcbb49431aeb05f upstream.
    
    We have never used __memexit, __memexitdata, or __memexitconst.
    
    These were unneeded.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LoongArch: Fix earlycon parameter if KASAN enabled [+ + +]

Author: Huacai Chen <chenhuacai@kernel.org>
Date:   Tue Feb 6 12:32:05 2024 +0800

    LoongArch: Fix earlycon parameter if KASAN enabled
    
    commit 639420e9f6cd9ca074732b17ac450d2518d5937f upstream.
    
    The earlycon parameter is based on fixmap, and fixmap addresses are not
    supposed to be shadowed by KASAN. So return the kasan_early_shadow_page
    in kasan_mem_to_shadow() if the input address is above FIXADDR_START.
    Otherwise earlycon cannot work after kasan_init().
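A sketch of the check follows. The made-up constants FIXADDR_START_DEMO and SHADOW_OFFSET_DEMO stand in for the real LoongArch layout. Fixmap addresses return a shared early shadow page instead of a computed shadow address. The helper mem_to_shadow_demo() is hypothetical and only mirrors the idea of kasan_mem_to_shadow().

```c
#include <stdint.h>

/* Made-up constants for illustration; not the real LoongArch layout. */
#define FIXADDR_START_DEMO  0xffff800000000000ULL
#define SHADOW_OFFSET_DEMO  0x0000100000000000ULL
#define SHADOW_SCALE_SHIFT  3   /* one shadow byte covers 8 bytes */

/* Stand-in for kasan_early_shadow_page. */
static uint8_t early_shadow_page[4096];

/* Fixmap addresses (used by earlycon) have no shadow of their own, so they
 * all map to the shared early shadow page. */
void *mem_to_shadow_demo(uint64_t addr)
{
    if (addr >= FIXADDR_START_DEMO)
        return early_shadow_page;
    return (void *)(uintptr_t)((addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET_DEMO);
}
```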
    
    Cc: stable@vger.kernel.org
    Fixes: 5aa4ac64e6add3e ("LoongArch: Add KASAN (Kernel Address Sanitizer) support")
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lsm: fix default return value of the socket_getpeersec_*() hooks [+ + +]

Author: Ondrej Mosnacek <omosnace@redhat.com>
Date:   Fri Jan 26 19:45:31 2024 +0100

    lsm: fix default return value of the socket_getpeersec_*() hooks
    
    commit 5a287d3d2b9de2b3e747132c615599907ba5c3c1 upstream.
    
    For these hooks the true "neutral" value is -EOPNOTSUPP, which is
    currently what is returned when no LSM provides this hook and what LSMs
    return when there is no security context set on the socket. Correct the
    value in <linux/lsm_hooks.h> and adjust the dispatch functions in
    security/security.c to avoid issues when the BPF LSM is enabled.
    
    Cc: stable@vger.kernel.org
    Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
    Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
    [PM: subject line tweak]
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lsm: fix the logic in security_inode_getsecctx() [+ + +]

Author: Ondrej Mosnacek <omosnace@redhat.com>
Date:   Fri Jan 26 11:44:03 2024 +0100

    lsm: fix the logic in security_inode_getsecctx()
    
    commit 99b817c173cd213671daecd25ca27f56b0c7c4ec upstream.
    
    The inode_getsecctx LSM hook has previously been corrected to have
    -EOPNOTSUPP instead of 0 as the default return value to fix BPF LSM
    behavior. However, the call_int_hook()-generated loop in
    security_inode_getsecctx() was left treating 0 as the neutral value, so
    after an LSM returns 0, the loop continues to try other LSMs, and if one
    of them returns a non-zero value, the function immediately returns with
    said value. So in a situation where both SELinux and the BPF LSM
    registered this hook, -EOPNOTSUPP would be incorrectly returned
    whenever SELinux returned 0.
    
    Fix this by open-coding the call_int_hook() loop and making it use the
    correct LSM_RET_DEFAULT() value as the neutral one, similar to what
    other hooks do.
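The corrected dispatch loop can be sketched in C. Any hook result other than the neutral value (-EOPNOTSUPP for this hook) ends the loop. The hook functions selinux_like() and bpf_like() are hypothetical. They only model the SELinux-plus-BPF scenario described above and are not the real LSM implementations.

```c
#include <errno.h>
#include <stddef.h>

/* Neutral value for inode_getsecctx after the earlier fix. */
#define GETSECCTX_DEFAULT (-EOPNOTSUPP)

typedef int (*getsecctx_hook_fn)(void);

/* Hypothetical hooks modelling the scenario in the commit message. */
static int selinux_like(void) { return 0; }            /* has a context */
static int bpf_like(void)     { return -EOPNOTSUPP; }  /* no opinion */

/* Open-coded loop: skip hooks returning the neutral value and return the
 * first other result, whether success (0) or a real error. */
int call_getsecctx(const getsecctx_hook_fn *hooks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rc = hooks[i]();
        if (rc != GETSECCTX_DEFAULT)
            return rc;
    }
    return GETSECCTX_DEFAULT;
}
```

With this loop, SELinux's 0 is returned regardless of hook order. A BPF hook with no opinion no longer overrides it.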
    
    Cc: stable@vger.kernel.org
    Reported-by: Stephen Smalley <stephen.smalley.work@gmail.com>
    Link: https://lore.kernel.org/selinux/CAEjxPJ4ev-pasUwGx48fDhnmjBnq_Wh90jYPwRQRAqXxmOKD4Q@mail.gmail.com/
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2257983
    Fixes: b36995b8609a ("lsm: fix default return value for inode_getsecctx")
    Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
    Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
    [PM: subject line tweak]
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md: bypass block throttle for superblock update [+ + +]

Author: Junxiao Bi <junxiao.bi@oracle.com>
Date:   Wed Nov 8 10:22:15 2023 -0800

    md: bypass block throttle for superblock update
    
    [ Upstream commit d6e035aad6c09991da1c667fb83419329a3baed8 ]
    
    commit 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
    introduced a hang bug and will be reverted in the next patch. The issue
    that commit was fixing is caused by md superblock writes being throttled
    by wbt; to fix it, let superblock writes bypass the block layer throttle.
    
    Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
    Cc: stable@vger.kernel.org # v5.19+
    Suggested-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20231108182216.73611-1-junxiao.bi@oracle.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: ir_toy: fix a memleak in irtoy_tx [+ + +]

Author: Zhipeng Lu <alexious@zju.edu.cn>
Date:   Wed Jan 17 09:14:19 2024 +0100

    media: ir_toy: fix a memleak in irtoy_tx
    
    [ Upstream commit dc9ceb90c4b42c6e5c6757df1d6257110433788e ]
    
    When irtoy_command fails, buf should be freed since it is allocated by
    irtoy_tx; otherwise the memory is leaked.
    
    Fixes: 4114978dcd24 ("media: ir_toy: prevent device from hanging during transmit")
    Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
    Signed-off-by: Sean Young <sean@mess.org>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

media: rc: bpf attach/detach requires write permission [+ + +]

Author: Sean Young <sean@mess.org>
Date:   Thu Apr 13 10:50:32 2023 +0200

    media: rc: bpf attach/detach requires write permission
    
    commit 6a9d552483d50953320b9d3b57abdee8d436f23f upstream.
    
    Note that bpf attach/detach also requires CAP_NET_ADMIN.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Young <sean@mess.org>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: Revert "media: rkisp1: Drop IRQF_SHARED" [+ + +]

Author: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Date:   Mon Dec 18 08:54:00 2023 +0100

    media: Revert "media: rkisp1: Drop IRQF_SHARED"
    
    commit a107d643b2a3382e0a2d2c4ef08bf8c6bff4561d upstream.
    
    This reverts commit 85d2a31fe4d9be1555f621ead7a520d8791e0f74.
    
    The rkisp1 does share interrupt lines on some platforms, after all. Thus
    we need to revert this, and implement a fix for the rkisp1 shared irq
    handling in a follow-up patch.
    
    Closes: https://lore.kernel.org/all/87o7eo8vym.fsf@gmail.com/
    Link: https://lore.kernel.org/r/20231218-rkisp-shirq-fix-v1-1-173007628248@ideasonboard.com
    
    Reported-by: Mikhail Rudenko <mike.rudenko@gmail.com>
    Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
    Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler [+ + +]

Author: Guenter Roeck <linux@roeck-us.net>
Date:   Sun Feb 11 08:08:37 2024 -0800

    MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
    
    [ Upstream commit d55347bfe4e66dce2e1e7501e5492f4af3e315f8 ]
    
    After 'lib: checksum: Use aligned accesses for ip_fast_csum and
    csum_ipv6_magic tests' was applied, the test_csum_ipv6_magic unit test
    started failing for all mips platforms, both little and big endian.
    Oddly enough, adding debug code into test_csum_ipv6_magic() made the
    problem disappear.
    
    The gcc manual says:
    
    "The "memory" clobber tells the compiler that the assembly code performs
     memory reads or writes to items other than those listed in the input
     and output operands (for example, accessing the memory pointed to by one
     of the input parameters)
    "
    
    This is definitely the case for csum_ipv6_magic(). Indeed, adding the
    'memory' clobber fixes the problem.
    
    Cc: Charlie Jenkins <charlie@rivosinc.com>
    Cc: Palmer Dabbelt <palmer@rivosinc.com>
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

misc: fastrpc: Mark all sessions as invalid in cb_remove [+ + +]

Author: Ekansh Gupta <quic_ekangupt@quicinc.com>
Date:   Mon Jan 8 17:18:33 2024 +0530

    misc: fastrpc: Mark all sessions as invalid in cb_remove
    
    commit a4e61de63e34860c36a71d1a364edba16fb6203b upstream.
    
    In the remoteproc shutdown sequence, rpmsg_remove gets called, which
    depopulates all the child nodes that were created during rpmsg_probe.
    This results in a cb_remove call for every context bank of the
    remoteproc. In cb_remove, session 0 is skipped, which is not correct,
    as session 0 will never become available again. Mark session 0 as
    invalid as well.
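A minimal sketch follows, assuming a simplified session array indexed from 0. The loop must start at index 0 so that session 0 is invalidated too. struct session_demo and invalidate_sessions() are hypothetical. The real driver's session bookkeeping is not reproduced here.

```c
#include <stdbool.h>

struct session_demo {
    bool valid;
};

/* Invalidate every session of the removed context bank, including
 * session 0, which was previously skipped. */
void invalidate_sessions(struct session_demo *sess, int count)
{
    for (int i = 0; i < count; i++)     /* start at 0, not 1 */
        sess[i].valid = false;
}
```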
    
    Fixes: f6f9279f2bf0 ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
    Cc: stable <stable@kernel.org>
    Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
    Link: https://lore.kernel.org/r/20240108114833.20480-1-quic_ekangupt@quicinc.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/memory: Use exception ip to search exception tables [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Fri Feb 2 12:30:28 2024 +0000

    mm/memory: Use exception ip to search exception tables
    
    [ Upstream commit 8fa5070833886268e4fb646daaca99f725b378e9 ]
    
    On architectures with delay slots, instruction_pointer() may differ
    from the address where the exception was triggered.
    
    Use the newly introduced exception_ip() to search the exception tables
    and get rid of the problem.
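A simplified sketch of the MIPS case follows. It assumes 4-byte instructions only and ignores 16-bit MIPS16/microMIPS encodings. When the fault hits a branch delay slot, EPC points at the branch, so the faulting instruction is the next one. struct regs_demo and exception_ip_demo() are hypothetical stand-ins for pt_regs and exception_ip().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for MIPS pt_regs. */
struct regs_demo {
    uint64_t epc;           /* CP0 EPC: points at the branch if BD is set */
    bool in_delay_slot;     /* CP0 Cause.BD */
};

/* Address of the instruction that actually faulted, assuming 4-byte
 * instructions (MIPS16/microMIPS encodings are not handled). */
uint64_t exception_ip_demo(const struct regs_demo *regs)
{
    return regs->in_delay_slot ? regs->epc + 4 : regs->epc;
}
```

Searching the exception tables with this address, rather than the raw EPC, finds the fixup entry that belongs to the faulting load or store.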
    
    Fixes: 4bce37a68ff8 ("mips/mm: Convert to using lock_mm_and_find_vma()")
    Reported-by: Xi Ruoyao <xry111@xry111.site>
    Link: https://lore.kernel.org/r/75e9fd7b08562ad9b456a5bdaacb7cc220311cc9.camel@xry111.site/
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again [+ + +]

Author: Zach O'Keefe <zokeefe@google.com>
Date:   Thu Jan 18 10:19:53 2024 -0800

    mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
    
    commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream.
    
    (struct dirty_throttle_control *)->thresh is an unsigned long, but is
    passed as the u32 divisor argument to div_u64().  On architectures where
    unsigned long is 64 bits, the argument will be implicitly truncated.
    
    Use div64_u64() instead of div_u64() so that the value used in the "is
    this a safe division" check is the same as the divisor.
    
    Also, remove redundant cast of the numerator to u64, as that should happen
    implicitly.
    
    This would be difficult to exploit in the memcg domain, given the
    ratio-based arithmetic domain_dirty_limits() uses, but is much easier in
    the global writeback domain with a BDI_CAP_STRICTLIMIT-backing device,
    using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32).
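The truncation is easy to demonstrate in userspace C with fixed-width types. A thresh of 1 << 32 passes a non-zero check, yet becomes 0 once converted to a u32 divisor. The helper names below are hypothetical models, not the kernel's div_u64()/div64_u64() implementations.

```c
#include <stdint.h>

/* The kernel's div_u64() takes a u32 divisor; this models the implicit
 * conversion a 64-bit thresh undergoes when passed to it. */
static inline uint32_t div_u64_divisor(uint64_t thresh)
{
    return (uint32_t)thresh;
}

/* Hypothetical model of the fixed computation: the value checked for zero
 * is exactly the 64-bit value used as the divisor, as with div64_u64(). */
uint64_t scaled_bg_thresh(uint64_t wb_thresh, uint64_t bg_thresh,
                          uint64_t thresh)
{
    if (!thresh)
        return 0;
    return wb_thresh * bg_thresh / thresh;
}
```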
    
    Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
    Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
    Signed-off-by: Zach O'Keefe <zokeefe@google.com>
    Cc: Maxim Patlasov <MPatlasov@parallels.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE [+ + +]

Author: Prakash Sangappa <prakash.sangappa@oracle.com>
Date:   Tue Jan 23 12:04:42 2024 -0800

    mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE
    
    commit e656c7a9e59607d1672d85ffa9a89031876ffe67 upstream.
    
    For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in the
    shmget() call.  If the SHM_NORESERVE flag is specified, the hugetlb pages
    are not reserved.  However, when the shared memory is attached with the
    shmat() call, hugetlb pages are incorrectly reserved for SHM_HUGETLB
    shared memory created with SHM_NORESERVE, which is a bug.
    
    -------------------------------
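    The following test program shows the issue.
    
    $ cat shmhtb.c
    
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    
    /* SKEY and SHMSZ are not defined in the original report; supply them
     * at build time, e.g. cc -DSKEY=<key> -DSHMSZ=<size> shmhtb.c */
    
    int main(void)
    {
            int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE;
            int shmid;
    
            shmid = shmget(SKEY, SHMSZ, shmflags);
            if (shmid < 0)
            {
                    printf("shmat: shmget() failed, %d\n", errno);
                    return 1;
            }
            printf("After shmget()\n");
            system("cat /proc/meminfo | grep -i hugepages_");
    
            shmat(shmid, NULL, 0);
            printf("\nAfter shmat()\n");
            system("cat /proc/meminfo | grep -i hugepages_");
    
            shmctl(shmid, IPC_RMID, NULL);
            return 0;
    }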
    
     #sysctl -w vm.nr_hugepages=20
     #./shmhtb
    
    After shmget()
    HugePages_Total:      20
    HugePages_Free:       20
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    
    After shmat()
    HugePages_Total:      20
    HugePages_Free:       20
    HugePages_Rsvd:        5 <--
    HugePages_Surp:        0
    --------------------------------
    
    The fix is to ensure that shmat() does not reserve hugetlb pages for
    SHM_HUGETLB shared memory created with SHM_NORESERVE.
    
    Link: https://lkml.kernel.org/r/1706040282-12388-1-git-send-email-prakash.sangappa@oracle.com
    Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
    Acked-by: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci-pci-o2micro: Fix a warm reboot issue that disk can't be detected by BIOS [+ + +]

Author: Fred Ai <fred.ai@bayhubtech.com>
Date:   Sat Feb 3 02:29:08 2024 -0800

    mmc: sdhci-pci-o2micro: Fix a warm reboot issue that disk can't be detected by BIOS
    
    commit 58aeb5623c2ebdadefe6352b14f8076a7073fea0 upstream.
    
    The driver shall switch the clock source from the DLL clock to the
    OPE clock when powering off the card, to ensure that the card can be
    identified by the BIOS using the OPE clock.
    
    Signed-off-by: Fred Ai <fred.ai@bayhubtech.com>
    Fixes: 4be33cf18703 ("mmc: sdhci-pci-o2micro: Improve card input timing at SDR104/HS200 mode")
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240203102908.4683-1-fredaibayhubtech@126.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: slot-gpio: Allow non-sleeping GPIO ro [+ + +]

Author: Alexander Stein <alexander.stein@ew.tq-group.com>
Date:   Tue Feb 6 09:39:12 2024 +0100

    mmc: slot-gpio: Allow non-sleeping GPIO ro
    
    commit cc9432c4fb159a3913e0ce3173b8218cd5bad2e0 upstream.
    
    This change uses the appropriate _cansleep or non-sleeping API for
    reading the GPIO read-only state. This allows users with GPIOs that
    never sleep to be called in atomic context.
    
    Implement the same mechanism as in commit 52af318c93e97 ("mmc: Allow
    non-sleeping GPIO cd").
    
    Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240206083912.2543142-1-alexander.stein@ew.tq-group.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Tue Jan 23 15:59:55 2024 -0700

    modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS
    
    commit 397586506c3da005b9333ce5947ad01e8018a3be upstream.
    
    After the linked LLVM change, building ARCH=um defconfig results in a
    segmentation fault in modpost. Prior to commit a23e7584ecf3 ("modpost:
    unify 'sym' and 'to' in default_mismatch_handler()"), there was a
    warning:
    
      WARNING: modpost: vmlinux.o(__ex_table+0x88): Section mismatch in reference to the .ltext:(unknown)
      WARNING: modpost: The relocation at __ex_table+0x88 references
      section ".ltext" which is not in the list of
      authorized sections.  If you're adding a new section
      and/or if this reference is valid, add ".ltext" to the
      list of authorized sections to jump to on fault.
      This can be achieved by adding ".ltext" to
      OTHER_TEXT_SECTIONS in scripts/mod/modpost.c.
    
    The linked LLVM change moves global objects to the '.ltext' (and
    '.ltext.*' with '-ffunction-sections') sections with '-mcmodel=large',
    which ARCH=um uses. These sections should be handled just as '.text'
    and '.text.*' are, so add them to TEXT_SECTIONS.
    
    Cc: stable@vger.kernel.org
    Closes: https://github.com/ClangBuiltLinux/linux/issues/1981
    Link: https://github.com/llvm/llvm-project/commit/4bf8a688956a759b7b6b8d94f42d25c13c7af130
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
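
A rough Python illustration of why adding the two patterns is enough: section names are matched against glob-style patterns, so '.ltext' and the per-function '.ltext.*' sections produced with '-ffunction-sections' are now treated as text. `fnmatchcase` stands in for modpost's own matching, and the list is only the subset named in the commit message, not the full TEXT_SECTIONS definition.

```python
from fnmatch import fnmatchcase

# Illustrative subset only; the real TEXT_SECTIONS in
# scripts/mod/modpost.c contains further entries.
TEXT_SECTIONS = [".text", ".text.*", ".ltext", ".ltext.*"]


def is_text_section(name: str) -> bool:
    """Return True if `name` matches one of the TEXT_SECTIONS patterns."""
    return any(fnmatchcase(name, pattern) for pattern in TEXT_SECTIONS)
```

Note that '.ltext.*' requires the dot, so an unrelated name such as '.ltextfoo' is still rejected.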

modpost: trim leading spaces when processing source files list [+ + +]

Author: Radek Krejci <radek.krejci@oracle.com>
Date:   Wed Feb 14 10:14:07 2024 +0100

    modpost: trim leading spaces when processing source files list
    
    [ Upstream commit 5d9a16b2a4d9e8fa028892ded43f6501bc2969e5 ]
    
    get_line() does not trim leading spaces, but parse_source_files()
    expects to get lines with source file paths where the first space
    occurs after the file path.
    
    Fixes: 70f30cfe5b89 ("modpost: use read_text_file() and get_line() for reading text files")
    Signed-off-by: Radek Krejci <radek.krejci@oracle.com>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
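
The parsing problem in the modpost entry above, sketched in Python under the assumption that each line holds a path followed by a space and further fields. `first_path()` is a hypothetical helper, not the modpost C code.

```python
def first_path(line: str) -> str:
    """Return the path at the start of `line`.

    Leading whitespace is skipped first; without that step, a line such
    as "  foo.c bar" would be cut at its very first character and the
    result would be an empty string.
    """
    line = line.lstrip()
    # The path ends at the first space, if there is one.
    return line.split(" ", 1)[0]
```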

mptcp: check addrs list in userspace_pm_get_local_id [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Thu Feb 8 19:03:53 2024 +0100

    mptcp: check addrs list in userspace_pm_get_local_id
    
    commit f012d796a6de662692159c539689e47e662853a8 upstream.
    
    Before adding a new entry in mptcp_userspace_pm_get_local_id(), it's
    better to check whether this address is already in the userspace PM
    local address list. If it is, there is no need to add a new entry:
    just return its address ID and use this address.
    
    Fixes: 8b20137012d9 ("mptcp: read attributes of addr entries managed by userspace PMs")
    Cc: stable@vger.kernel.org
    Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
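
The lookup-before-insert logic from the mptcp entry above, modelled in Python. `LocalAddrList` and `get_local_id()` are hypothetical, and allocating IDs sequentially from 1 is an assumption made only for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalAddrList:
    """Hypothetical model of a userspace PM local address list."""
    ids: dict = field(default_factory=dict)  # address -> address ID
    next_id: int = 1

    def get_local_id(self, addr: str) -> int:
        # Reuse the existing entry if the address is already known.
        if addr in self.ids:
            return self.ids[addr]
        # Otherwise add a new entry with a fresh ID.
        self.ids[addr] = self.next_id
        self.next_id += 1
        return self.ids[addr]
```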

mptcp: drop the push_pending field [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:49 2024 +0100

    mptcp: drop the push_pending field
    
    commit bdd70eb68913c960acb895b00a8c62eb64715b1f upstream.
    
    This field is there to avoid acquiring the data lock in a few spots,
    but it adds complexity to the already non-trivial locking scheme.
    
    All the relevant call sites (mptcp-level re-injection, set socket
    options) are slow-path, so drop this field in favor of 'cb_flags',
    adding the relevant locking.
    
    This patch could be seen as an improvement rather than a fix, but it
    simplifies the next patch. The 'Fixes' tag has been added to help get
    this series backported to stable.
    
    Fixes: e9d09baca676 ("mptcp: avoid atomic bit manipulation when possible")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: fix data re-injection from stale subflow [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed Jan 31 22:49:46 2024 +0100

    mptcp: fix data re-injection from stale subflow
    
    commit b6c620dc43ccb4e802894e54b651cf81495e9598 upstream.
    
    When the MPTCP PM detects that a subflow is stale, the packet
    scheduler must re-inject all the mptcp-level unacked data. To avoid
    acquiring unneeded locks, it first tries to check whether any unacked
    data is present at all in the RTX queue, but that check is currently
    broken, as it uses a TCP-specific helper on an MPTCP socket.
    
    Funnily enough, fuzzers and static checkers are happy, as the accessed
    memory still belongs to the mptcp_sock struct, and even from a
    functional perspective the recovery completed successfully, as
    the short-cut test always failed.
    
    A recent unrelated TCP change - commit d5fed5addb2b ("tcp: reorganize
    tcp_sock fast path variables") - exposed the issue, as the tcp field
    reorganization makes the mptcp code always skip the re-injection.
    
    Fix the issue by dropping the bogus call: we are on a slow path, and the
    early optimization proved once again to be evil.
    
    Fixes: 1e1d9d6f119c ("mptcp: handle pending data on closed subflow")
    Cc: stable@vger.kernel.org
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/468
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-1-4c1c11e571ff@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: fix rcv space initialization [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:50 2024 +0100

    mptcp: fix rcv space initialization
    
    commit 013e3179dbd2bc756ce1dd90354abac62f65b739 upstream.
    
    mptcp_rcv_space_init() is supposed to happen under the msk socket
    lock, but the active msk socket does that without such protection.
    
    Leverage the existing mptcp_propagate_state() helper to that end.
    We need to ensure mptcp_rcv_space_init() will happen before
    mptcp_rcv_space_adjust(), and the release_cb does not assure that:
    explicitly check for such a condition.
    
    While at it, move the wnd_end initialization out of mptcp_rcv_space_init();
    it never belonged there.
    
    Note that the race does not produce ill effects in practice, but the
    change allows cleaning up and better defining the locking model.
    
    Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: really cope with fastopen race [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:54 2024 +0100

    mptcp: really cope with fastopen race
    
    commit 337cebbd850f94147cee05252778f8f78b8c337f upstream.
    
    Fastopen and PM-triggered subflow shutdown can race, as reported by
    syzkaller.
    
    In my first attempt to close such a race, I missed the fact that
    the subflow status can change again before the subflow_state_change
    callback is invoked.
    
    Address the issue by additionally coping with all the states directly
    reachable from TCP_FIN_WAIT1.
    
    Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
    Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
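
The kernel commonly tests socket states with a bitmask, `(1 << sk->sk_state) & (TCPF_... | ...)`, which makes "this state or any state reachable from it" checks cheap. The Python sketch below mirrors that idiom for FIN_WAIT1 and the states a TCP socket can move to from it under RFC 793 (FIN_WAIT2, CLOSING, TIME_WAIT). It illustrates the technique only; it is not the exact set of states the mptcp patch handles, and `past_fin_wait1()` is a hypothetical name.

```python
from enum import IntEnum


class TcpState(IntEnum):
    # Numbering follows include/net/tcp_states.h.
    ESTABLISHED = 1
    SYN_SENT = 2
    SYN_RECV = 3
    FIN_WAIT1 = 4
    FIN_WAIT2 = 5
    TIME_WAIT = 6
    CLOSE = 7
    CLOSE_WAIT = 8
    LAST_ACK = 9
    LISTEN = 10
    CLOSING = 11


def tcpf(state: TcpState) -> int:
    """Bit flag for a state, like the kernel's TCPF_* macros."""
    return 1 << state


# FIN_WAIT1 plus the states directly reachable from it per RFC 793.
# Illustrative only, not the mask used by the patch.
FIN_WAIT1_AND_SUCCESSORS = (
    tcpf(TcpState.FIN_WAIT1)
    | tcpf(TcpState.FIN_WAIT2)
    | tcpf(TcpState.CLOSING)
    | tcpf(TcpState.TIME_WAIT)
)


def past_fin_wait1(state: TcpState) -> bool:
    # One AND instead of a chain of equality comparisons.
    return bool(tcpf(state) & FIN_WAIT1_AND_SUCCESSORS)
```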

net/handshake: Fix handshake_req_destroy_test1 [+ + +]

Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Tue Feb 6 14:16:31 2024 -0500

    net/handshake: Fix handshake_req_destroy_test1
    
    [ Upstream commit 4e1d71cabb19ec2586827adfc60d68689c68c194 ]
    
    Recently, handshake_req_destroy_test1 started failing:
    
    Expected handshake_req_destroy_test == req, but
        handshake_req_destroy_test == 0000000000000000
        req == 0000000060f99b40
    not ok 11 req_destroy works
    
    This is because "sock_release(sock)" was replaced with "fput(filp)"
    to address a memory leak. Note that sock_release() is synchronous
    but fput() usually delays the final close and clean-up.
    
    The delay is not consequential in the other cases that were changed
    but handshake_req_destroy_test1 is testing that handshake_req_cancel()
    followed by closing the file actually does call the ->hp_destroy
    method. Thus the PTR_EQ test at the end has to be sure that the
    final close is complete before it checks the pointer.
    
    We cannot use a completion here because if ->hp_destroy is never
    called (i.e., there is an API bug) then the test will hang.
    
    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Closes: https://lore.kernel.org/netdev/ZcKDd1to4MPANCrn@tissot.1015granger.net/T/#mac5c6299f86799f1c71776f3a07f9c566c7c3c40
    Fixes: 4a0f07d71b04 ("net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/170724699027.91401.7839730697326806733.stgit@oracle-102.nfsv4bat.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 reads [+ + +]

Author: Andrew Lunn <andrew@lunn.ch>
Date:   Mon Jan 29 23:49:48 2024 +0100

    net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 reads
    
    commit 585b40e25dc9ff3d2b03d1495150540849009e5b upstream.
    
    Not all mv88e6xxx devices support C45 read/write operations. Those
    which do not return -EOPNOTSUPP. However, when phylib scans the bus,
    it considers this fatal, and the probe of the MDIO bus fails, which in
    turn causes the mv88e6xxx probe as a whole to fail.
    
    When there is no device on the bus for a given address, the pull-up
    resistor on the data line results in the read returning 0xffff. The
    phylib core code understands this when scanning for devices on the
    bus. C45 allows multiple devices to be supported at one address, so
    phylib will perform a few reads at each address. Although this is
    not the most efficient solution, returning 0xffff for unsupported
    C45 reads is a way to avoid fatal errors. Make use of this as a
    minimal fix for stable to fix the probing problems.
    
    Follow-up patches will rework how C45 operates to make it similar to
    C22, which considers -ENODEV as non-fatal, and switch mv88e6xxx to
    using this.
    
    Cc: stable@vger.kernel.org
    Fixes: 743a19e38d02 ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions")
    Reported-by: Tim Menninger <tmenninger@purestorage.com>
    Signed-off-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
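
The minimal fix in the mv88e6xxx entry above amounts to turning an error that aborts the bus scan into the value the scan already understands as "no device here". A Python model of that idea with hypothetical functions; phylib's real C45 scan performs several reads per address, which `scan_address()` collapses into one.

```python
import errno

NO_DEVICE = 0xFFFF  # what a read returns when nothing drives the data line


def switch_c45_read(supports_c45: bool, value: int) -> int:
    """Hypothetical switch MDIO read: negative errno on failure."""
    if not supports_c45:
        return -errno.EOPNOTSUPP
    return value


def c45_read_fixed(supports_c45: bool, value: int) -> int:
    # Report "no device" instead of an error the scan treats as fatal.
    if not supports_c45:
        return NO_DEVICE
    return switch_c45_read(supports_c45, value)


def scan_address(read) -> str:
    """Simplified bus scan: negative values abort, 0xffff means empty."""
    val = read()
    if val < 0:
        raise OSError(-val, "bus probe failed")
    return "empty" if val == NO_DEVICE else "device"
```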

net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio [+ + +]

Author: Sinthu Raja <sinthu.raja@ti.com>
Date:   Tue Feb 6 06:29:28 2024 +0530

    net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
    
    commit bc4ce46b1e3d1da4309405cd4afc7c0fcddd0b90 upstream.
    
    The commit below introduced a WARN when the PHY state is not one of
    PHY_HALTED, PHY_READY or PHY_UP:
    commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
    
    When cpsw resumes, some ports are in the PHY_NOLINK state, so the
    warning below is emitted. Set mac_managed_pm to true to tell mdio that
    PHY resume/suspend is managed by the MAC, which fixes the following warning:
    
    WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
    CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
    Hardware name: Generic AM33XX (Flattened Device Tree)
     unwind_backtrace from show_stack+0x18/0x1c
     show_stack from dump_stack_lvl+0x24/0x2c
     dump_stack_lvl from __warn+0x84/0x15c
     __warn from warn_slowpath_fmt+0x1a8/0x1c8
     warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
     mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
     dpm_run_callback from device_resume+0xb8/0x2b8
     device_resume from dpm_resume+0x144/0x314
     dpm_resume from dpm_resume_end+0x14/0x20
     dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
     suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
     pm_suspend from state_store+0x74/0xd0
     state_store from kernfs_fop_write_iter+0x104/0x1ec
     kernfs_fop_write_iter from vfs_write+0x1b8/0x358
     vfs_write from ksys_write+0x78/0xf8
     ksys_write from ret_fast_syscall+0x0/0x54
    Exception stack(0xe094dfa8 to 0xe094dff0)
    dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
    dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
    dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
    
    Cc: <stable@vger.kernel.org> # v6.0+
    Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
    Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
    Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
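
A simplified Python model of the resume check that fires in the trace above, and of how `mac_managed_pm` avoids it. Only the three accepted states and the NOLINK state come from the commit message; the other enum members, the function name and the list-of-warnings return value are illustrative assumptions.

```python
from enum import Enum, auto


class PhyState(Enum):
    DOWN = auto()
    READY = auto()
    HALTED = auto()
    UP = auto()
    RUNNING = auto()
    NOLINK = auto()


RESUME_OK_STATES = {PhyState.HALTED, PhyState.READY, PhyState.UP}


def mdio_bus_phy_resume_model(state: PhyState, mac_managed_pm: bool) -> list:
    """Return the warnings the simplified resume path would emit."""
    warnings = []
    if mac_managed_pm:
        # The MAC driver handles PHY PM itself, so the MDIO bus
        # callback is a no-op and never inspects the state.
        return warnings
    if state not in RESUME_OK_STATES:
        warnings.append(f"unexpected PHY state {state.name} on resume")
    return warnings
```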

net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio [+ + +]

Author: Sinthu Raja <sinthu.raja@ti.com>
Date:   Tue Feb 6 06:29:27 2024 +0530

    net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
    
    commit 9def04e759caa5a3d741891037ae99f81e2fff01 upstream.
    
    The commit below introduced a WARN when the PHY state is not one of
    PHY_HALTED, PHY_READY or PHY_UP:
    commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
    
    When cpsw_new resumes, some ports are in the PHY_NOLINK state, so the
    warning below is emitted. Set mac_managed_pm to true to tell mdio that
    PHY resume/suspend is managed by the MAC, which fixes the following warning:
    
    WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
    CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
    Hardware name: Generic AM33XX (Flattened Device Tree)
     unwind_backtrace from show_stack+0x18/0x1c
     show_stack from dump_stack_lvl+0x24/0x2c
     dump_stack_lvl from __warn+0x84/0x15c
     __warn from warn_slowpath_fmt+0x1a8/0x1c8
     warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
     mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
     dpm_run_callback from device_resume+0xb8/0x2b8
     device_resume from dpm_resume+0x144/0x314
     dpm_resume from dpm_resume_end+0x14/0x20
     dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
     suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
     pm_suspend from state_store+0x74/0xd0
     state_store from kernfs_fop_write_iter+0x104/0x1ec
     kernfs_fop_write_iter from vfs_write+0x1b8/0x358
     vfs_write from ksys_write+0x78/0xf8
     ksys_write from ret_fast_syscall+0x0/0x54
    Exception stack(0xe094dfa8 to 0xe094dff0)
    dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
    dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
    dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
    
    Cc: <stable@vger.kernel.org> # v6.0+
    Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
    Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
    Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame() [+ + +]

Author: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Date:   Wed Jan 24 02:21:47 2024 -0800

    net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
    
    commit 37e8c97e539015637cb920d3e6f1e404f707a06e upstream.
    
    Syzkaller reported [1] hitting a warning after failing to allocate
    resources for skb in hsr_init_skb(). Since a WARN_ONCE() call will
    not help much in this case, it might be prudent to switch to
    netdev_warn_once(). At the very least it will suppress syzkaller
    reports such as [1].
    
    Just in case, use netdev_warn_once() in send_prp_supervision_frame()
    for similar reasons.
    
    [1]
    HSR: Could not send supervision frame
    WARNING: CPU: 1 PID: 85 at net/hsr/hsr_device.c:294 send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
    RIP: 0010:send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
    ...
    Call Trace:
     <IRQ>
     hsr_announce+0x114/0x370 net/hsr/hsr_device.c:382
     call_timer_fn+0x193/0x590 kernel/time/timer.c:1700
     expire_timers kernel/time/timer.c:1751 [inline]
     __run_timers+0x764/0xb20 kernel/time/timer.c:2022
     run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
     __do_softirq+0x21a/0x8de kernel/softirq.c:553
     invoke_softirq kernel/softirq.c:427 [inline]
     __irq_exit_rcu kernel/softirq.c:632 [inline]
     irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
     sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076
     </IRQ>
     <TASK>
     asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649
    ...
    
    This issue is also found in older kernels (at least up to 5.10).
    
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+3ae0a3f42c84074b7c8e@syzkaller.appspotmail.com
    Fixes: 121c33b07b31 ("net: hsr: introduce common code for skb initialization")
    Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: openvswitch: limit the number of recursions from action sets [+ + +]

Author: Aaron Conole <aconole@redhat.com>
Date:   Wed Feb 7 08:24:15 2024 -0500

    net: openvswitch: limit the number of recursions from action sets
    
    [ Upstream commit 6e2f90d31fe09f2b852de25125ca875aabd81367 ]
    
    The ovs module allows for some actions to recursively contain an action
    list for complex scenarios, such as sampling, checking lengths, etc.
    When these actions are copied into the internal flow table, they are
    evaluated to validate that such actions make sense, and these calls
    happen recursively.
    
    The ovs-vswitchd userspace won't emit more than 16 recursion levels
    deep. However, the module has no such limit and will happily accept
    action lists nested more than 16 levels deep. Prevent this by tracking
    the number of recursions happening and manually limiting it to 16
    levels nested.
    
    The initial implementation of the sample action would track this depth
    and prevent more than 3 levels of recursion, but this was removed to
    support the clone use case, rather than being limited to the current
    userspace limit.
    
    Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use cases")
    Signed-off-by: Aaron Conole <aconole@redhat.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240207132416.1488485-2-aconole@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
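
The depth bound from the openvswitch entry above, as a Python sketch of a recursive validator. `validate_actions()` and `nested()` are hypothetical, nested lists stand in for nested netlink attributes, and the exact point at which the kernel counts a level is not claimed to match this sketch.

```python
MAX_DEPTH = 16  # the userspace limit quoted in the commit message


class NestingTooDeep(Exception):
    pass


def validate_actions(actions: list, depth: int = 0) -> None:
    """Recursively validate a nested action list, bounding the depth.

    Each action is either a plain string or a list holding a nested
    action list, as sample-like actions can.
    """
    if depth > MAX_DEPTH:
        raise NestingTooDeep(depth)
    for action in actions:
        if isinstance(action, list):
            validate_actions(action, depth + 1)


def nested(levels: int) -> list:
    """Build an action list whose innermost list sits `levels` deep."""
    acts = ["output"]
    for _ in range(levels):
        acts = [acts]
    return acts
```

The counter turns a recursion whose depth is chosen by the sender of the action list into one with a fixed upper bound.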

net: stmmac: do not clear TBS enable bit on link up/down [+ + +]

Author: Esben Haabendal <esben@geanix.com>
Date:   Fri Jan 26 10:10:41 2024 +0100

    net: stmmac: do not clear TBS enable bit on link up/down
    
    commit 4896bb7c0b31a0a3379b290ea7729900c59e0c69 upstream.
    
    With the dma conf being reallocated on each call to stmmac_open(), any
    information in there is lost, unless we specifically handle it.
    
    The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc
    therefore would stop working when link was set down and then back up.
    
    Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
    Cc: stable@vger.kernel.org
    Signed-off-by: Esben Haabendal <esben@geanix.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
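
One way to carry a queue's TBS enable bit across the DMA conf reallocation described above, sketched in Python. `TxQueueConf`, `DmaConf`, `alloc_dma_conf()` and the `TBS_EN` value are hypothetical stand-ins; the real driver's field layout and where it copies the bit may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

TBS_EN = 0x1  # hypothetical stand-in for the STMMAC_TBS_EN flag


@dataclass
class TxQueueConf:
    tbs: int = 0


@dataclass
class DmaConf:
    tx_queues: List[TxQueueConf] = field(default_factory=list)


def alloc_dma_conf(n_queues: int, old: Optional[DmaConf] = None) -> DmaConf:
    """Build a fresh DMA conf, carrying over per-queue TBS enable bits."""
    conf = DmaConf([TxQueueConf() for _ in range(n_queues)])
    if old is not None:
        for new_q, old_q in zip(conf.tx_queues, old.tx_queues):
            # Without this copy, the bit set by the etf qdisc is lost.
            new_q.tbs |= old_q.tbs & TBS_EN
    return conf
```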

net: stmmac: protect updates of 64-bit statistics counters [+ + +]

Author: Petr Tesarik <petr@tesarici.cz>
Date:   Sat Feb 3 20:09:27 2024 +0100

    net: stmmac: protect updates of 64-bit statistics counters
    
    commit 38cc3c6dcc09dc3a1800b5ec22aef643ca11eab8 upstream.
    
    As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
    u64_stats_sync must ensure mutual exclusion, or one seqcount update could
    be lost on 32-bit platforms, thus blocking readers forever. Such lockups
    have been observed in real world after stmmac_xmit() on one CPU raced with
    stmmac_napi_poll_tx() on another CPU.
    
    To fix the issue without introducing a new lock, split the statistics
    into three parts:
    
    1. fields updated only under the tx queue lock,
    2. fields updated only during NAPI poll,
    3. fields updated only from interrupt context.
    
    Updates to fields in the first two groups are already serialized through
    other locks. It is sufficient to split the existing struct u64_stats_sync
    so that each group has its own.
    
    Note that tx_set_ic_bit is updated from both contexts. Split this counter
    so that each context gets its own, and calculate their sum to get the total
    value in stmmac_get_ethtool_stats().
    
    For the third group, multiple interrupts may be processed by different CPUs
    at the same time, but interrupts on the same CPU will not nest. Move fields
    from this group to a newly created per-cpu struct stmmac_pcpu_stats.
    
    Fixes: 133466c3bbe1 ("net: stmmac: use per-queue 64 bit statistics where necessary")
    Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
    Cc: stable@vger.kernel.org
    Signed-off-by: Petr Tesarik <petr@tesarici.cz>
    Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
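
The lockup in the first paragraph of the stmmac statistics entry can be reproduced deterministically with one unlucky interleaving. This Python simulation models only the seqcount inside `u64_stats_sync` on a 32-bit build: a writer makes the count odd on entry and even on exit, and a reader waits while it is odd. The function names are hypothetical.

```python
def interleaved_unsynchronised_writers() -> int:
    """Two writers bump a seqcount with non-atomic read-modify-write."""
    seq = 0
    # write_begin on CPU A and CPU B race: both load the same value ...
    a_loaded = seq
    b_loaded = seq
    # ... and both store value + 1, so one increment is lost.
    seq = a_loaded + 1
    seq = b_loaded + 1
    # write_end then runs on each CPU in turn.
    seq = seq + 1  # CPU A
    seq = seq + 1  # CPU B
    return seq


def serialised_writers() -> int:
    """The same two updates with mutual exclusion between writers."""
    seq = 0
    for _ in range(2):
        seq += 1  # write_begin: odd while the update is in progress
        seq += 1  # write_end: even again
    return seq


def reader_can_finish(seq: int, max_tries: int = 1000) -> bool:
    """A seqcount reader retries while the count is odd."""
    for _ in range(max_tries):
        if seq % 2 == 0:
            return True
    # Nobody will ever make the count even again; give up instead of
    # spinning forever like a real reader would.
    return False
```

Two racing writers leave the count at 3, an odd value nothing will correct, which is why readers hang; giving each group of counters its own writer context restores one writer per seqcount.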

net: sysfs: Fix /sys/class/net/<iface> path for statistics [+ + +]

Author: Breno Leitao <leitao@debian.org>
Date:   Fri Feb 9 01:55:18 2024 -0800

    net: sysfs: Fix /sys/class/net/<iface> path for statistics
    
    [ Upstream commit 5b3fbd61b9d1f4ed2db95aaf03f9adae0373784d ]
    
    The Documentation/ABI/testing/sysfs-class-net-statistics documentation
    is pointing to the wrong path for the interface.  Documentation is
    pointing to /sys/class/<iface>, instead of /sys/class/net/<iface>.
    
    Fix it by adding the `net/` directory before the interface.
    
    Fixes: 6044f9700645 ("net: sysfs: document /sys/class/net/statistics/*")
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: tls: factor out tls_*crypt_async_wait() [+ + +]

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Feb 6 17:18:18 2024 -0800

    net: tls: factor out tls_*crypt_async_wait()
    
    [ Upstream commit c57ca512f3b68ddcd62bda9cc24a8f5584ab01b1 ]
    
    Factor out waiting for async encrypt and decrypt to finish.
    There are already multiple copies and a subsequent fix will
    need more. No functional changes.
    
    Note that crypto_wait_req() returns wait->err.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: aec7961916f3 ("tls: fix race between async notify and socket close")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: tls: fix returned read length with async decrypt [+ + +]

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Feb 6 17:18:24 2024 -0800

    net: tls: fix returned read length with async decrypt
    
    [ Upstream commit ac437a51ce662364062f704e321227f6728e6adc ]
    
    We double count async, non-zc rx data. The previous fix was
    lucky because if we fully zero-copy (zc), async_copy_bytes is 0,
    so we add 0. 'decrypted' already has all the bytes we handled, in
    all cases. We don't have to adjust anything; delete the erroneous line.
    
    Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto")
    Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: tls: fix use-after-free with partial reads and async decrypt [+ + +]

Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Tue Feb 6 17:18:22 2024 -0800

    net: tls: fix use-after-free with partial reads and async decrypt
    
    [ Upstream commit 32b55c5ff9103b8508c1e04bfa5a08c64e7a925f ]
    
    tls_decrypt_sg doesn't take a reference on the pages from clear_skb,
    so the put_page() in tls_decrypt_done releases them, and we trigger
    a use-after-free in process_rx_list when we try to read from the
    partially-read skb.
    
    Fixes: fd31f3996af2 ("tls: rx: decrypt into a fresh skb")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: tls: handle backlogging of crypto requests [+ + +]

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Feb 6 17:18:21 2024 -0800

    net: tls: handle backlogging of crypto requests
    
    [ Upstream commit 8590541473188741055d27b955db0777569438e3 ]
    
    Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
    requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
    -EBUSY instead of -EINPROGRESS in valid situations. For example, when
    the cryptd queue for AESNI is full (easy to trigger with an
    artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
    to the backlog but still processed. In that case, the async callback
    will also be called twice: first with err == -EINPROGRESS, which it
    seems we can just ignore, then with err == 0.
    
    Compared to Sabrina's original patch this version uses the new
    tls_*crypt_async_wait() helpers and converts the EBUSY to
    EINPROGRESS to avoid having to modify all the error handling
    paths. The handling is identical.
    
    Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator")
    Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
    Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
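
The -EBUSY handling from the TLS backlog entry above, sketched in Python. `normalize_submit_rc()` and `AsyncDone` are hypothetical names; they show the two ideas in the message: map a backlogged submission onto the existing -EINPROGRESS path, and let the completion callback skip the extra -EINPROGRESS notification.

```python
import errno


def normalize_submit_rc(rc: int) -> int:
    """Treat a backlogged request (-EBUSY) like an in-flight one.

    With CRYPTO_TFM_REQ_MAY_BACKLOG set, -EBUSY means the request was
    queued to the backlog and will still be processed, so callers can
    keep a single -EINPROGRESS path.
    """
    return -errno.EINPROGRESS if rc == -errno.EBUSY else rc


class AsyncDone:
    """Hypothetical completion callback bookkeeping."""

    def __init__(self) -> None:
        self.completed = []

    def __call__(self, err: int) -> None:
        # A backlogged request is first notified with -EINPROGRESS;
        # that call carries no result, so it is ignored here.
        if err == -errno.EINPROGRESS:
            return
        self.completed.append(err)
```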

netfilter: ipset: fix performance regression in swap operation [+ + +]

Author: Jozsef Kadlecsik <kadlec@netfilter.org>
Date:   Mon Jan 29 10:57:01 2024 +0100

    netfilter: ipset: fix performance regression in swap operation
    
    commit 97f7cf1cd80eeed3b7c808b7c12463295c751001 upstream.
    
    The patch "netfilter: ipset: fix race condition between swap/destroy
    and kernel side add/del/test", commit 28628fa9, fixes a race condition.
    But the synchronize_rcu() added to the swap function unnecessarily slows
    it down: it can safely be moved to destroy and use call_rcu() instead.
    
    Eric Dumazet pointed out that simply calling the destroy functions as
    an rcu callback does not work: sets with timeout use garbage collectors
    which need cancelling at destroy, and cancelling can wait. Therefore the
    destroy functions are split into two: the garbage collectors are
    cancelled safely while executing the command received by netlink, and
    only the remaining part is moved into the rcu callback.
    
    Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/
    Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test")
    Reported-by: Ale Crismani <ale.crismani@automattic.com>
    Reported-by: David Wang <00107082@163.com>
    Tested-by: David Wang <00107082@163.com>
    Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
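
The split destroy described in the ipset performance entry above, sketched in Python. Everything here is hypothetical: `IpSet`, `destroy()` and the `rcu_callbacks` queue (a stand-in for `call_rcu()`) only model the ordering, with the part that may wait kept in the synchronous path because RCU callbacks must not sleep.

```python
from collections import deque


class IpSet:
    """Hypothetical set with an optional garbage collector."""

    def __init__(self, name: str, has_timeout: bool) -> None:
        self.name = name
        self.gc_running = has_timeout
        self.freed = False

    def cancel_gc(self) -> None:
        # In the kernel this may wait for a running GC pass, which an
        # RCU callback must not do, so it runs synchronously.
        self.gc_running = False

    def free(self) -> None:
        self.freed = True


# Stand-in for call_rcu(): callbacks run later, after a grace period.
rcu_callbacks: deque = deque()


def destroy(s: IpSet) -> None:
    s.cancel_gc()                 # synchronous part, in the netlink command
    rcu_callbacks.append(s.free)  # deferred part, like call_rcu()


def rcu_grace_period_elapsed() -> None:
    while rcu_callbacks:
        rcu_callbacks.popleft()()
```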

netfilter: ipset: Missing gc cancellations fixed [+ + +]

Author: Jozsef Kadlecsik <kadlec@netfilter.org>
Date:   Sun Feb 4 16:26:42 2024 +0100

    netfilter: ipset: Missing gc cancellations fixed
    
    commit 27c5a095e2518975e20a10102908ae8231699879 upstream.
    
    The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
    in swap operation") missed adding the calls to gc cancellations
    at the error path of create operations and at module unload. Also,
    because half of the destroy operation is now executed by a
    function registered by call_rcu(), neither the NFNL_SUBSYS_IPSET mutex
    nor the rcu read lock is held, and therefore checking for them results
    in false warnings.
    
    Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")
    Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
    Reported-by: Brad Spengler <spender@grsecurity.net>
    Reported-by: Стас Ничипорович <stasn77@gmail.com>
    Tested-by: Brad Spengler <spender@grsecurity.net>
    Tested-by: Стас Ничипорович <stasn77@gmail.com>
    Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: nci: free rx_data_reassembly skb on NCI device cleanup [+ + +]

Author: Fedor Pchelkin <pchelkin@ispras.ru>
Date:   Thu Jan 25 12:53:09 2024 +0300

    nfc: nci: free rx_data_reassembly skb on NCI device cleanup
    
    commit bfb007aebe6bff451f7f3a4be19f4f286d0d5d9c upstream.
    
    rx_data_reassembly skb is stored during NCI data exchange for processing
    fragmented packets. It is dropped only when the last fragment is processed
    or when an NTF packet with NCI_OP_RF_DEACTIVATE_NTF opcode is received.
    However, the NCI device may be deallocated before that which leads to skb
    leak.
    
    As by design the rx_data_reassembly skb is bound to the NCI device and
    nothing prevents the device from being freed before the skb is processed
    in some way and cleaned, free it on the NCI device cleanup.
    
    Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
    
    Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+6b7c68d9c21e4ee4251b@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/lkml/000000000000f43987060043da7b@google.com/
    Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfp: enable NETDEV_XDP_ACT_REDIRECT feature flag [+ + +]

Author: James Hershaw <james.hershaw@corigine.com>
Date:   Fri Feb 2 13:37:19 2024 +0200

    nfp: enable NETDEV_XDP_ACT_REDIRECT feature flag
    
    commit 0f4d6f011bca0df2051532b41b596366aa272019 upstream.
    
    Enable previously excluded xdp feature flag for NFD3 devices. This
    feature flag is required in order to bind nfp interfaces to an xdp
    socket and the nfp driver does in fact support the feature.
    
    Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features")
    Cc: stable@vger.kernel.org # 6.3+
    Signed-off-by: James Hershaw <james.hershaw@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfp: flower: add hardware offload check for post ct entry [+ + +]

Author: Hui Zhou <hui.zhou@corigine.com>
Date:   Wed Jan 24 17:19:08 2024 +0200

    nfp: flower: add hardware offload check for post ct entry
    
    commit cefa98e806fd4e2a5e2047457a11ae5f17b8f621 upstream.
    
    The nfp offload flow payload will not allocate a mask id when the out
    port is an openvswitch internal port. This is because these flows are
    used to configure the pre_tun table and are never actually sent to the
    firmware as an add-flow message. When a tc rule whose action contains ct
    has a post ct entry whose out port is an openvswitch internal port, the
    merged offload flow payload with the wrong mask id of 0 will be sent to
    the firmware. The nfp cannot actually support hardware offload in this
    situation, so return EOPNOTSUPP.
    
    Fixes: bd0fe7f96a3c ("nfp: flower-ct: add zone table entry when handling pre/post_ct flows")
    CC: stable@vger.kernel.org # 5.14+
    Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Link: https://lore.kernel.org/r/20240124151909.31603-2-louis.peens@corigine.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfp: flower: fix hardware offload for the transfer layer port [+ + +]

Author: Hui Zhou <hui.zhou@corigine.com>
Date:   Wed Jan 24 17:19:09 2024 +0200

    nfp: flower: fix hardware offload for the transfer layer port
    
    commit 3a007b8009b5f8af021021b7a590a6da0dc4c6e0 upstream.
    
    The nfp driver merges the tp source port and tp destination port into
    one dword, whose offset must be zero for hardware offload. However, the
    mangle actions for the tp source port and tp destination port are
    separate for the tc ct action. Modify the mangle action for
    FLOW_ACT_MANGLE_HDR_TYPE_TCP and FLOW_ACT_MANGLE_HDR_TYPE_UDP to
    satisfy the nfp driver's offload check for the tp port.
    
    The mangle action provides a 4B value for source, and a 4B value for
    the destination, but only 2B of each contains the useful information.
    For offload the 2B of each is combined into a single 4B word. Since the
    incoming mask for the source is '0xFFFF<mask>' the shift-left will
    throw away the 0xFFFF part. When this gets combined together in the
    offload it will clear the destination field. Fix this by setting the
    lower bits back to 0xFFFF, effectively doing a rotate-left operation on
    the mask.
    
    Fixes: 5cee92c6f57a ("nfp: flower: support hw offload for ct nat action")
    CC: stable@vger.kernel.org # 6.1+
    Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Link: https://lore.kernel.org/r/20240124151909.31603-3-louis.peens@corigine.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfp: flower: prevent re-adding mac index for bonded port [+ + +]

Author: Daniel de Villiers <daniel.devilliers@corigine.com>
Date:   Fri Feb 2 13:37:18 2024 +0200

    nfp: flower: prevent re-adding mac index for bonded port
    
    commit 1a1c13303ff6d64e6f718dc8aa614e580ca8d9b4 upstream.
    
    When physical ports that are slaved to a Linux bond with a tunnel
    endpoint IP address on the bond device are reset (either through link
    failure or by being manually toggled down and up again), not all tunnel
    packets arriving on the bond port are decapped as expected.
    
    The bond dev assigns the same MAC address to itself and each of its
    slaves. When toggling a slave device, the same MAC address is therefore
    offloaded to the NFP multiple times with different indexes.
    
    The issue only occurs when re-adding the shared mac. The
    nfp_tunnel_add_shared_mac() function has a conditional check early on
    that checks if a mac entry already exists and if that mac entry is
    global: (entry && nfp_tunnel_is_mac_idx_global(entry->index)). In the
    case of a bonded device (for example br-ex), the mac index is obtained,
    and no new index is assigned.
    
    We therefore modify the conditional in nfp_tunnel_add_shared_mac() to
    check if the port belongs to the LAG along with the existing checks to
    prevent a new global mac index from being re-assigned to the slave port.
    
    Fixes: 20cce8865098 ("nfp: flower: enable MAC address sharing for offloadable devs")
    CC: stable@vger.kernel.org # 5.1+
    Signed-off-by: Daniel de Villiers <daniel.devilliers@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfp: use correct macro for LengthSelect in BAR config [+ + +]

Author: Daniel Basilio <daniel.basilio@corigine.com>
Date:   Fri Feb 2 13:37:17 2024 +0200

    nfp: use correct macro for LengthSelect in BAR config
    
    commit b3d4f7f2288901ed2392695919b3c0e24c1b4084 upstream.
    
    The 1st and 2nd expansion BAR configuration registers are configured,
    when the driver starts up, in variables 'barcfg_msix_general' and
    'barcfg_msix_xpb', respectively. The 'LengthSelect' field is ORed in
    from bit 0, which is incorrect. The 'LengthSelect' field should
    start from bit 27.
    
    This has largely gone unnoticed because
    NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT happens to be 0.
    
    Fixes: 4cb584e0ee7d ("nfp: add CPP access core")
    Cc: stable@vger.kernel.org # 4.11+
    Signed-off-by: Daniel Basilio <daniel.basilio@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: don't take fi_lock in nfsd_break_deleg_cb() [+ + +]

Author: NeilBrown <neilb@suse.de>
Date:   Mon Feb 5 13:22:39 2024 +1100

    nfsd: don't take fi_lock in nfsd_break_deleg_cb()
    
    commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 upstream.
    
    A recent change to check_for_locks() changed it to take ->flc_lock while
    holding ->fi_lock.  This creates a lock inversion (reported by lockdep)
    because there is a case where ->fi_lock is taken while holding
    ->flc_lock.
    
    ->flc_lock is held across ->fl_lmops callbacks, and
    nfsd_break_deleg_cb() is one of those and does take ->fi_lock.  However
    it doesn't need to.
    
    Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each
    delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list
    and so needed the lock.  Since then it doesn't walk the list and doesn't
    need the lock.
    
    Two actions are performed under the lock.  One is to call
    nfsd_break_one_deleg() which calls nfsd4_run_cb().  This doesn't act on
    the nfs4_file at all, so doesn't need the lock.
    
    The other is to set ->fi_had_conflict which is in the nfs4_file.
    This field is only ever set here (except when initialised to false)
    so there is no possible problem with multiple threads racing when
    setting it.
    
    The field is tested twice in nfs4_set_delegation().  The first test does
    not hold a lock and is documented as an opportunistic optimisation, so
    it doesn't impose any need to hold ->fi_lock while setting
    ->fi_had_conflict.
    
    The second test in nfs4_set_delegation() *is* made under ->fi_lock, so
    removing the locking when ->fi_had_conflict is set could make a change.
    The change could only be interesting if ->fi_had_conflict tested as
    false even though nfsd_break_one_deleg() ran before ->fi_lock was
    unlocked, i.e. while hash_delegation_locked() was running.
    As hash_delegation_locked() doesn't interact in any way with
    nfsd4_run_cb() there can be no importance to this interaction.
    
    So this patch removes the locking from nfsd_break_deleg_cb() and moves
    the final test on ->fi_had_conflict out of the locked region to make it
    clear that locking isn't important to the test.  It is still tested
    *after* vfs_setlease() has succeeded.  This might be significant, and as
    vfs_setlease() takes ->flc_lock and nfsd_break_one_deleg() is called
    under ->flc_lock, this "after" is a true ordering provided by a spinlock.
    
    Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER")
    Signed-off-by: NeilBrown <neilb@suse.de>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix data corruption in dsync block recovery for small block sizes [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Wed Jan 24 21:19:36 2024 +0900

    nilfs2: fix data corruption in dsync block recovery for small block sizes
    
    commit 67b8bcbaed4777871bb0dcc888fb02a614a98ab1 upstream.
    
    The helper function nilfs_recovery_copy_block() of
    nilfs_recovery_dsync_blocks(), which recovers data from logs created by
    data sync writes during a mount after an unclean shutdown, incorrectly
    calculates the on-page offset when copying repair data to the file's page
    cache.  In environments where the block size is smaller than the page
    size, this flaw can cause data corruption and leak uninitialized memory
    bytes during the recovery process.
    
    Fix these issues by correcting this byte offset calculation on the page.
    
    Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Wed Jan 31 23:56:57 2024 +0900

    nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
    
    commit 38296afe3c6ee07319e01bb249aa4bb47c07b534 upstream.
    
    Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
    and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
    
    While migrate_pages_batch() locks a folio and waits for the writeback to
    complete, the log writer thread that should bring the writeback to
    completion picks up the folio being written back in
    nilfs_lookup_dirty_data_buffers(), which it calls for subsequent log
    creation, and tries to lock the folio, thus causing a deadlock.
    
    In the first place, it is unexpected that folios/pages in the middle of
    writeback will be updated and become dirty.  Nilfs2 adds a checksum to
    verify the validity of the log being written and uses it for recovery at
    mount, so data changes during writeback are suppressed.  Since this is
    broken, an unclean shutdown could potentially cause recovery to fail.
    
    Investigation revealed that the root cause is that the wait for writeback
    completion in nilfs_page_mkwrite() is conditional, and if the backing
    device does not require stable writes, data may be modified without
    waiting.
    
    Fix these issues by making nilfs_page_mkwrite() wait for writeback to
    finish regardless of the stable write requirement of the backing device.
    
    Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com
    Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it")
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com
    Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com
    Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix potential bug in end_buffer_async_write [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Sun Feb 4 01:16:45 2024 +0900

    nilfs2: fix potential bug in end_buffer_async_write
    
    commit 5bc09b397cbf1221f8a8aacb1152650c9195b02b upstream.
    
    According to a syzbot report, end_buffer_async_write(), which handles the
    completion of block device writes, may detect an abnormal condition of the
    buffer async_write flag and cause a BUG_ON failure when using nilfs2.
    
    Nilfs2 itself does not use end_buffer_async_write().  But, the async_write
    flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue
    with race condition of competition between segments for dirty blocks") as
    a means of resolving double list insertion of dirty blocks in
    nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the
    resulting crash.
    
    This modification is safe as long as it is used for file data and b-tree
    node blocks where the page caches are independent.  However, it was
    irrelevant and redundant to also introduce async_write for segment summary
    and super root blocks that share buffers with the backing device.  This
    led to the possibility that the BUG_ON check in end_buffer_async_write
    would fail as described above, if independent writebacks of the backing
    device occurred in parallel.
    
    The use of async_write for segment summary buffers has already been
    removed in a previous change.
    
    Fix this issue by removing the manipulation of the async_write flag for
    the remaining super root block buffer.
    
    Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com
    Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks")
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com
    Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nouveau/svm: fix kvcalloc() argument order [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Feb 12 12:22:17 2024 +0100

    nouveau/svm: fix kvcalloc() argument order
    
    [ Upstream commit 2c80a2b715df75881359d07dbaacff8ad411f40e ]
    
    The conversion to kvcalloc() mixed up the object size and count
    arguments, causing a warning:
    
    drivers/gpu/drm/nouveau/nouveau_svm.c: In function 'nouveau_svm_fault_buffer_ctor':
    drivers/gpu/drm/nouveau/nouveau_svm.c:1010:40: error: 'kvcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
     1010 |         buffer->fault = kvcalloc(sizeof(*buffer->fault), buffer->entries, GFP_KERNEL);
          |                                        ^
    drivers/gpu/drm/nouveau/nouveau_svm.c:1010:40: note: earlier argument should specify number of elements, later size of each element
    
    The behavior is still correct aside from the warning, but fixing it avoids
    the warnings and can help the compiler track the individual objects better.
    
    Fixes: 71e4bbca070e ("nouveau/svm: Use kvcalloc() instead of kvzalloc()")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Danilo Krummrich <dakr@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240212112230.1117284-1-arnd@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nouveau: offload fence uevents work to workqueue [+ + +]

Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Jan 29 11:26:45 2024 +1000

    nouveau: offload fence uevents work to workqueue
    
    commit 39126abc5e20611579602f03b66627d7cd1422f0 upstream.
    
    This should break the deadlock between the fctx lock and the irq lock.
    
    This offloads the processing of the work from the irq into a workqueue.
    
    Cc: linux-stable@vger.kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/576237/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

of: property: Add in-ports/out-ports support to of_graph_get_port_parent() [+ + +]

Author: Saravana Kannan <saravanak@google.com>
Date:   Tue Feb 6 17:18:02 2024 -0800

    of: property: Add in-ports/out-ports support to of_graph_get_port_parent()
    
    commit 8f1e0d791b5281f3a38620bc7c57763dc551be15 upstream.
    
    Similar to the existing "ports" node name, coresight device tree bindings
    have added "in-ports" and "out-ports" as standard node names for a
    collection of ports.
    
    Add support for these names to of_graph_get_port_parent() so that
    remote-endpoint parsing can find the correct parent node for these
    coresight ports too.
    
    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Link: https://lore.kernel.org/r/20240207011803.2637531-4-saravanak@google.com
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

of: property: fix typo in io-channels [+ + +]

Author: Nuno Sa <nuno.sa@analog.com>
Date:   Tue Jan 23 16:14:22 2024 +0100

    of: property: fix typo in io-channels
    
    commit 8f7e917907385e112a845d668ae2832f41e64bf5 upstream.
    
    The property is io-channels and not io-channel. This was effectively
    preventing the devlink creation.
    
    Fixes: 8e12257dead7 ("of: property: Add device link support for iommus, mboxes and io-channels")
    Cc: stable@vger.kernel.org
    Signed-off-by: Nuno Sa <nuno.sa@analog.com>
    Reviewed-by: Saravana Kannan <saravanak@google.com>
    Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Link: https://lore.kernel.org/r/20240123-iio-backend-v7-1-1bff236b8693@analog.com
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

of: property: Improve finding the consumer of a remote-endpoint property [+ + +]

Author: Saravana Kannan <saravanak@google.com>
Date:   Tue Feb 6 17:18:00 2024 -0800

    of: property: Improve finding the consumer of a remote-endpoint property
    
    [ Upstream commit f4653ec9861cd96a1a6a3258c4a807898ee8cf3c ]
    
    We have a more accurate function to find the right consumer of a
    remote-endpoint property instead of searching for a parent with
    compatible string property. So, use that instead. While at it, make the
    code to find the consumer a bit more flexible and based on the property
    being parsed.
    
    Fixes: f7514a663016 ("of: property: fw_devlink: Add support for remote-endpoint")
    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Link: https://lore.kernel.org/r/20240207011803.2637531-2-saravanak@google.com
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

of: property: Improve finding the supplier of a remote-endpoint property [+ + +]

Author: Saravana Kannan <saravanak@google.com>
Date:   Tue Feb 6 17:18:01 2024 -0800

    of: property: Improve finding the supplier of a remote-endpoint property
    
    [ Upstream commit 782bfd03c3ae2c0e6e01b661b8e18f1de50357be ]
    
    After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
    remote-endpoint properties created a fwnode link from the consumer device
    to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
    trying to create device links or detecting cycles. So, improve this the
    same way we improved finding the consumer of a remote-endpoint property.
    
    Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Link: https://lore.kernel.org/r/20240207011803.2637531-3-saravanak@google.com
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

of: unittest: Fix compile in the non-dynamic case [+ + +]

Author: Christian A. Ehrhardt <lk@c--e.de>
Date:   Mon Jan 29 20:25:56 2024 +0100

    of: unittest: Fix compile in the non-dynamic case
    
    [ Upstream commit 607aad1e4356c210dbef9022955a3089377909b2 ]
    
    If CONFIG_OF_KOBJ is not set, a device_node does not contain a
    kobj and attempts to access the embedded kobj via kref_read break
    the compile.
    
    Replace affected kref_read calls with a macro that reads the
    refcount if it exists and returns 1 if there is no embedded kobj.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202401291740.VP219WIz-lkp@intel.com/
    Fixes: 4dde83569832 ("of: Fix double free in of_parse_phandle_with_args_map")
    Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
    Link: https://lore.kernel.org/r/20240129192556.403271-1-lk@c--e.de
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

parisc: BTLB: Fix crash when setting up BTLB at CPU bringup [+ + +]

Author: Helge Deller <deller@gmx.de>
Date:   Wed Jan 31 13:37:25 2024 +0100

    parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
    
    commit 913b9d443a0180cf0de3548f1ab3149378998486 upstream.
    
    When using hotplug and bringing up a 32-bit CPU, ask the firmware about the
    BTLB information to set up the static (block) TLB entries.
    
    For that, write access to the static btlb_info struct is needed, but
    since it is marked __ro_after_init, the kernel segfaults due to missing
    write permissions.
    
    Fix the crash by dropping the __ro_after_init annotation.
    
    Fixes: e5ef93d02d6c ("parisc: BTLB: Initialize BTLB tables at CPU startup")
    Signed-off-by: Helge Deller <deller@gmx.de>
    Cc: <stable@vger.kernel.org> # v6.6+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

parisc: Fix random data corruption from exception handler [+ + +]

Author: Helge Deller <deller@gmx.de>
Date:   Sat Jan 20 15:29:27 2024 +0100

    parisc: Fix random data corruption from exception handler
    
    commit 8b1d72395635af45410b66cc4c4ab37a12c4a831 upstream.
    
    The current exception handler implementation, which assists when accessing
    user space memory, may exhibit random data corruption if the compiler decides
    to use a different register than the specified register %r29 (defined in
    ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler chooses another
    register, the fault handler will nevertheless store -EFAULT into %r29 and thus
    trash whatever this register is used for.
    Looking at the assembly, I found that this sometimes happens in emulate_ldd().
    
    The easiest solution would be to somehow tell the fault handler which
    register is used to hold the error code. Using %0 or %1 in the inline
    assembly is not possible as it will show up as e.g. %r29 (with the "%r"
    prefix), which the GNU assembler cannot convert to an integer.
    
    This patch takes another, better and more flexible approach:
    We extend the __ex_table (which is out of the execution path) by one 32-bit
    word. In this word we tell the compiler to insert the assembler instruction
    "or %r0,%r0,%reg", where %reg references the register which the compiler
    chose for the error return code.
    In case of an access failure, the fault handler finds the __ex_table entry and
    can examine the opcode. The used register is encoded in the lowest 5 bits, and
    the fault handler can then store -EFAULT into this register.
    
    Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT
    config option any longer.
    
    Signed-off-by: Helge Deller <deller@gmx.de>
    Cc: <stable@vger.kernel.org> # v6.0+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

parisc: Prevent hung tasks when printing inventory on serial console [+ + +]

Author: Helge Deller <deller@gmx.de>
Date:   Fri Jan 19 21:16:39 2024 +0100

    parisc: Prevent hung tasks when printing inventory on serial console
    
    commit c8708d758e715c3824a73bf0cda97292b52be44d upstream.
    
    Printing the inventory on a serial console can be quite slow and thus may
    trigger the hung task detector (CONFIG_DETECT_HUNG_TASK=y) and possibly
    reboot the machine. Adding a cond_resched() prevents this.
    
    Signed-off-by: Helge Deller <deller@gmx.de>
    Cc: <stable@vger.kernel.org> # v6.0+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: Fix active state requirement in PME polling [+ + +]

Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Tue Jan 23 11:55:31 2024 -0700

    PCI: Fix active state requirement in PME polling
    
    [ Upstream commit 41044d5360685e78a869d40a168491a70cdb7e73 ]
    
    The commit noted in fixes added a bogus requirement that runtime PM managed
    devices need to be in the RPM_ACTIVE state for PME polling.  In fact, only
    devices in low power states should be polled.
    
    However, there's still a requirement that the device config space must be
    accessible, which has implications for both the current state of the polled
    device and the parent bridge, when present.  It's not sufficient to assume
    the bridge remains in D0 and cases have been observed where the bridge
    passes the D0 test, but the PM state indicates RPM_SUSPENDING and config
    space of the polled device becomes inaccessible during pci_pme_wakeup().
    
    Therefore, since the bridge is already effectively required to be in the
    RPM_ACTIVE state, formalize this in the code and elevate the PM usage count
    to maintain the state while polling the subordinate device.
    
    This resolves a regression reported in the bugzilla below where a
    Thunderbolt/USB4 hierarchy fails to scan for an attached NVMe endpoint
    downstream of a bridge in a D3hot power state.
    
    Link: https://lore.kernel.org/r/20240123185548.1040096-1-alex.williamson@redhat.com
    Fixes: d3fcd7360338 ("PCI: Fix runtime PM race with PME polling")
    Reported-by: Sanath S <sanath.s@amd.com>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Sanath S <sanath.s@amd.com>
    Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
    Cc: Lukas Wunner <lukas@wunner.de>
    Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf: CXL: fix mismatched cpmu event opcode [+ + +]

Author: Hojin Nam <hj96.nam@samsung.com>
Date:   Thu Feb 8 10:34:15 2024 +0900

    perf: CXL: fix mismatched cpmu event opcode
    
    [ Upstream commit 719da04f2d1285922abca72b074fb6fa75d464ea ]
    
    The S2M NDR BI-ConflictAck opcode is described as 4 in the CXL
    r3.0 3.3.9 Table 3.43. However, it is defined as 3 in the macro definition.
    
    Fixes: 5d7107c72796 ("perf: CXL Performance Monitoring Unit driver")
    Signed-off-by: Hojin Nam <hj96.nam@samsung.com>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Link: https://lore.kernel.org/r/20240208013415epcms2p2904187c8a863f4d0d2adc980fb91a2dc@epcms2p2
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pmdomain: core: Move the unused cleanup to a _sync initcall [+ + +]

Author: Konrad Dybcio <konrad.dybcio@linaro.org>
Date:   Wed Dec 27 16:21:24 2023 +0100

    pmdomain: core: Move the unused cleanup to a _sync initcall
    
    commit 741ba0134fa7822fcf4e4a0a537a5c4cfd706b20 upstream.
    
    The unused clock cleanup uses the _sync initcall to give all users at
    earlier initcalls time to probe. Do the same to avoid leaving some PDs
    dangling at "on" (which actually happened on qcom!).
    
    Fixes: 2fe71dcdfd10 ("PM / domains: Add late_initcall to disable unused PM domains")
    Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20231227-topic-pmdomain_sync_cleanup-v1-1-5f36769d538b@linaro.org
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pmdomain: mediatek: fix race conditions with genpd [+ + +]

Author: Eugen Hristev <eugen.hristev@collabora.com>
Date:   Mon Dec 25 15:36:15 2023 +0200

    pmdomain: mediatek: fix race conditions with genpd
    
    commit c41336f4d69057cbf88fed47951379b384540df5 upstream.
    
    If the power domains are registered with genpd first and *after that*
    the driver attempts to power them on in the probe sequence, a race
    condition can occur if genpd tries to power them on at the same time.
    The same is valid for powering them off before unregistering them
    from genpd.
    Attempt to fix the race conditions by first removing the domains from
    genpd and *after that* powering down the domains.
    Also, first power up the domains and *after that* register them
    with genpd.
    
    Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains")
    Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20231225133615.78993-1-eugen.hristev@collabora.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pmdomain: renesas: r8a77980-sysc: CR7 must be always on [+ + +]

Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Fri Jan 12 17:33:55 2024 +0100

    pmdomain: renesas: r8a77980-sysc: CR7 must be always on
    
    commit f0e4a1356466ec1858ae8e5c70bea2ce5e55008b upstream.
    
    The power domain containing the Cortex-R7 CPU core on the R-Car V3H SoC
    must always be in power-on state, unlike on other SoCs in the R-Car Gen3
    family.  See Table 9.4 "Power domains" in the R-Car Series, 3rd
    Generation Hardware User’s Manual Rev.1.00 and later.
    
    Fix this by marking the domain as a CPU domain without control
    registers, so the driver will not touch it.
    
    Fixes: 41d6d8bd8ae9 ("soc: renesas: rcar-sysc: add R8A77980 support")
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/fdad9a86132d53ecddf72b734dac406915c4edc0.1705076735.git.geert+renesas@glider.be
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/64: Set task pt_regs->link to the LR value on scv entry [+ + +]

Author: Naveen N Rao <naveen@kernel.org>
Date:   Fri Feb 2 21:13:16 2024 +0530

    powerpc/64: Set task pt_regs->link to the LR value on scv entry
    
    commit aad98efd0b121f63a2e1c221dcb4d4850128c697 upstream.
    
    Nysal reported that userspace backtraces are missing in offcputime bcc
    tool. As an example:
        $ sudo ./bcc/tools/offcputime.py -uU
        Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
    
        ^C
            write
            -                python (9107)
                8
    
            write
            -                sudo (9105)
                9
    
            mmap
            -                python (9107)
                16
    
            clock_nanosleep
            -                multipathd (697)
                3001604
    
    The offcputime bcc tool attaches a bpf program to a kprobe on
    finish_task_switch(), which is usually hit on a syscall from userspace.
    With the switch to system call vectored, we started setting
    pt_regs->link to zero. This is because system call vectored behaves like
    a function call with LR pointing to the system call return address, and
    with no modification to SRR0/SRR1. The LR value does indicate our next
    instruction, so it is being saved as pt_regs->nip, and pt_regs->link is
    being set to zero. This is not a problem by itself, but BPF uses perf
    callchain infrastructure for capturing stack traces, and that stores LR
    as the second entry in the stack trace. perf has code to cope with the
    second entry being zero, and skips over it. However, generic userspace
    unwinders assume that a zero entry indicates end of the stack trace,
    resulting in a truncated userspace stack trace.
    
    Rather than fixing all userspace unwinders to ignore/skip past the
    second entry, store the real LR value in pt_regs->link so that there
    continues to be a valid, though duplicate entry in the stack trace.
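    A few lines of C show the effect on unwinders. This is a hypothetical
    model with made-up addresses, not code from perf, BPF or any real
    unwinder. Entry 0 of the callchain is the NIP, entry 1 is pt_regs->link,
    and the unwinder stops at the first zero entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical generic unwinder: walks a perf-style callchain and treats
 * a zero entry as the end of the stack trace.
 */
static size_t naive_unwind_depth(const uint64_t *chain, size_t n)
{
    size_t depth = 0;
    while (depth < n && chain[depth] != 0)
        depth++;
    return depth;
}
```

    When link is 0, the walk ends after a single entry. When link holds the
    LR value, which here equals the NIP, every frame survives. That
    duplicate is the repeated "write" frame in the output below.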
    
    With this change:
        $ sudo ./bcc/tools/offcputime.py -uU
        Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
    
        ^C
            write
            write
            [unknown]
            [unknown]
            [unknown]
            [unknown]
            [unknown]
            PyObject_VectorcallMethod
            [unknown]
            [unknown]
            PyObject_CallOneArg
            PyFile_WriteObject
            PyFile_WriteString
            [unknown]
            [unknown]
            PyObject_Vectorcall
            _PyEval_EvalFrameDefault
            PyEval_EvalCode
            [unknown]
            [unknown]
            [unknown]
            _PyRun_SimpleFileObject
            _PyRun_AnyFileObject
            Py_RunMain
            [unknown]
            Py_BytesMain
            [unknown]
            __libc_start_main
            -                python (1293)
                7
    
            write
            write
            [unknown]
            sudo_ev_loop_v1
            sudo_ev_dispatch_v1
            [unknown]
            [unknown]
            [unknown]
            [unknown]
            __libc_start_main
            -                sudo (1291)
                7
    
            syscall
            syscall
            bpf_open_perf_buffer_opts
            [unknown]
            [unknown]
            [unknown]
            [unknown]
            _PyObject_MakeTpCall
            PyObject_Vectorcall
            _PyEval_EvalFrameDefault
            PyEval_EvalCode
            [unknown]
            [unknown]
            [unknown]
            _PyRun_SimpleFileObject
            _PyRun_AnyFileObject
            Py_RunMain
            [unknown]
            Py_BytesMain
            [unknown]
            __libc_start_main
            -                python (1293)
                11
    
            clock_nanosleep
            clock_nanosleep
            nanosleep
            sleep
            [unknown]
            [unknown]
            __clone
            -                multipathd (698)
                3001661
    
    Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
    Cc: stable@vger.kernel.org
    Reported-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
    Signed-off-by: Naveen N Rao <naveen@kernel.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240202154316.395276-1-naveen@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/6xx: set High BAT Enable flag on G2_LE cores [+ + +]

Author: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Date:   Wed Jan 24 11:38:38 2024 +0100

    powerpc/6xx: set High BAT Enable flag on G2_LE cores
    
    [ Upstream commit a038a3ff8c6582404834852c043dadc73a5b68b4 ]
    
    MMU_FTR_USE_HIGH_BATS is set for G2_LE cores and derivatives like e300cX,
    but the high BATs need to be enabled in HID2 to work. Add register
    definitions and add the needed setup to __setup_cpu_603.
    
    This fixes boot on CPUs like the MPC5200B with STRICT_KERNEL_RWX enabled
    on systems where the flag has not been set by the bootloader already.
    
    Fixes: e4d6654ebe6e ("powerpc/mm/32s: rework mmu_mapin_ram()")
    Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240124103838.43675-1-matthias.schiffer@ew.tq-group.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E [+ + +]

Author: David Engraf <david.engraf@sysgo.com>
Date:   Wed Feb 7 10:27:58 2024 +0100

    powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
    
    commit eb6d871f4ba49ac8d0537e051fe983a3a4027f61 upstream.
    
    Commit e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of
    cputable.h") moved the cpu_specs to separate header files. Previously
    PPC_FEATURE_BOOKE was enabled by CONFIG_PPC_BOOK3E_64. The definition in
    cpu_specs_e500mc.h for PPC64 no longer enables PPC_FEATURE_BOOKE.
    
    This breaks user space that reads the ELF hwcaps and expects
    PPC_FEATURE_BOOKE. Debugging an application with gdb no longer
    works on e5500/e6500, because the 64-bit detection relies on
    PPC_FEATURE_BOOKE for Book-E.
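    Here is a minimal sketch of the kind of user-space check this breaks. It
    assumes only the PPC_FEATURE_BOOKE bit value from the powerpc uapi
    <asm/cputable.h>. hwcap_has_booke() is a hypothetical helper, not gdb's
    code.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef PPC_FEATURE_BOOKE
#define PPC_FEATURE_BOOKE 0x00008000UL   /* bit value from the powerpc uapi header */
#endif
#define MODEL_PPC_FEATURE_64 0x40000000UL /* PPC_FEATURE_64, same header */

/* Hypothetical helper: does the hwcap word advertise a Book-E core? */
static bool hwcap_has_booke(unsigned long hwcap)
{
    return (hwcap & PPC_FEATURE_BOOKE) != 0;
}
```

    On a running system, the hwcap word would come from
    getauxval(AT_HWCAP) in <sys/auxv.h>. Without the fix, 64-bit e5500/e6500
    kernels do not set the bit, so a check like this one returns false.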
    
    Fixes: e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of cputable.h")
    Cc: stable@vger.kernel.org # v6.1+
    Signed-off-by: David Engraf <david.engraf@sysgo.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240207092758.1058893-1-david.engraf@sysgo.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/kasan: Fix addr error caused by page alignment [+ + +]

Author: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Date:   Tue Jan 23 09:45:59 2024 +0800

    powerpc/kasan: Fix addr error caused by page alignment
    
    [ Upstream commit 4a7aee96200ad281a5cc4cf5c7a2e2a49d2b97b0 ]
    
    In kasan_init_region(), when k_start is not page aligned, then at the
    beginning of the for loop k_cur = k_start & PAGE_MASK is less than
    k_start, so `va = block + k_cur - k_start` is less than block. The
    address va is invalid: the memory from va to block was not allocated by
    memblock_alloc(), so it will not be reserved by memblock_reserve() later
    and may be used by other code.
    
    As a result, memory overwriting occurs.
    
    For example:
    int __init __weak kasan_init_region(void *start, size_t size)
    {
    [...]
            /* if say block(dcd97000) k_start(feef7400) k_end(feeff3fe) */
            block = memblock_alloc(k_end - k_start, PAGE_SIZE);
            [...]
            for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
                    /* at the beginning of the for loop
                     * block(dcd97000) va(dcd96c00) k_cur(feef7000) k_start(feef7400)
                     * va(dcd96c00) is less than block(dcd97000), va is invalid
                     */
                    void *va = block + k_cur - k_start;
                    [...]
            }
    [...]
    }
    
    Therefore, page alignment is performed on k_start before
    memblock_alloc() to ensure the validity of the VA address.
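    The address arithmetic from the example above can be checked in plain C.
    This sketch assumes 4 KiB pages, which matches the example addresses, and
    uses hypothetical helper names. It models only the first loop iteration,
    before and after k_start is aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, consistent with the addresses in the example. */
#define PAGE_SIZE_4K 0x1000UL
#define PAGE_MASK_4K (~(PAGE_SIZE_4K - 1))

/* First-iteration va before the fix: k_start is left unaligned. */
static uintptr_t first_va_unfixed(uintptr_t block, uintptr_t k_start)
{
    uintptr_t k_cur = k_start & PAGE_MASK_4K;
    return block + k_cur - k_start;          /* lands below block */
}

/* First-iteration va after the fix: k_start is aligned before allocating. */
static uintptr_t first_va_fixed(uintptr_t block, uintptr_t k_start)
{
    k_start &= PAGE_MASK_4K;
    uintptr_t k_cur = k_start & PAGE_MASK_4K;
    return block + k_cur - k_start;          /* equals block */
}
```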
    
    Fixes: 663c0c9496a6 ("powerpc/kasan: Fix shadow area set up for modules.")
    Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/1705974359-43790-1-git-send-email-xiaojiangfeng@huawei.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/kasan: Limit KASAN thread size increase to 32KB [+ + +]

Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Mon Feb 12 17:42:44 2024 +1100

    powerpc/kasan: Limit KASAN thread size increase to 32KB
    
    [ Upstream commit f1acb109505d983779bbb7e20a1ee6244d2b5736 ]
    
    KASAN is seen to increase stack usage, to the point that it was reported
    to lead to stack overflow on some 32-bit machines (see link).
    
    To avoid overflows the stack size was doubled for KASAN builds in
    commit 3e8635fb2e07 ("powerpc/kasan: Force thread size increase with
    KASAN").
    
    However with a 32KB stack size to begin with, the doubling leads to a
    64KB stack, which causes build errors:
      arch/powerpc/kernel/switch.S:249: Error: operand out of range (0x000000000000fe50 is not between 0xffffffffffff8000 and 0x0000000000007fff)
    
    Although the asm could be reworked, in practice a 32KB stack seems
    sufficient even for KASAN builds - the additional usage seems to be in
    the 2-3KB range for a 64-bit KASAN build.
    
    So only increase the stack for KASAN if the stack size is < 32KB.
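    The rule can be sketched with hypothetical names; this is not the actual
    powerpc header code. With KASAN, the thread shift (log2 of the stack
    size) is incremented only while it is below 15, that is, while the stack
    is below 32KB. The second helper shows why a 64KB stack broke the build:
    the offset 0xfe50 from the error message lies outside the signed 16-bit
    immediate range quoted there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: double the stack for KASAN only below 32 KB. */
static unsigned int min_thread_shift(unsigned int config_thread_shift, bool kasan)
{
    if (kasan && config_thread_shift < 15)   /* 1 << 15 == 32 KB */
        return config_thread_shift + 1;      /* double the stack size */
    return config_thread_shift;              /* 32 KB or more: unchanged */
}

/* Does an offset fit in a signed 16-bit immediate field? */
static bool fits_si16(long offset)
{
    return offset >= -32768 && offset <= 32767;
}
```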
    
    Fixes: 18f14afe2816 ("powerpc/64s: Increase default stack size to 32KB")
    Reported-by: Spoorthy <spoorthy@linux.ibm.com>
    Reported-by: Benjamin Gray <bgray@linux.ibm.com>
    Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
    Link: https://lore.kernel.org/linuxppc-dev/bug-207129-206035@https.bugzilla.kernel.org%2F/
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240212064244.3924505-1-mpe@ellerman.id.au
    Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add [+ + +]

Author: Gaurav Batra <gbatra@linux.ibm.com>
Date:   Mon Jan 22 16:24:07 2024 -0600

    powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
    
    [ Upstream commit ed8b94f6e0acd652ce69bd69d678a0c769172df8 ]
    
    When a PCI device is dynamically added, the kernel oopses with a NULL
    pointer dereference:
    
      BUG: Kernel NULL pointer dereference on read at 0x00000030
      Faulting instruction address: 0xc0000000006bbe5c
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: rpadlpar_io rpaphp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs xsk_diag bonding nft_compat nf_tables nfnetlink rfkill binfmt_misc dm_multipath rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c mlx5_core mlxfw sd_mod t10_pi sg tls ibmvscsi ibmveth scsi_transport_srp vmx_crypto pseries_wdt psample dm_mirror dm_region_hash dm_log dm_mod fuse
      CPU: 17 PID: 2685 Comm: drmgr Not tainted 6.7.0-203405+ #66
      Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries
      NIP:  c0000000006bbe5c LR: c000000000a13e68 CTR: c0000000000579f8
      REGS: c00000009924f240 TRAP: 0300   Not tainted  (6.7.0-203405+)
      MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002220  XER: 20040006
      CFAR: c000000000a13e64 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 0
      ...
      NIP sysfs_add_link_to_group+0x34/0x94
      LR  iommu_device_link+0x5c/0x118
      Call Trace:
       iommu_init_device+0x26c/0x318 (unreliable)
       iommu_device_link+0x5c/0x118
       iommu_init_device+0xa8/0x318
       iommu_probe_device+0xc0/0x134
       iommu_bus_notifier+0x44/0x104
       notifier_call_chain+0xb8/0x19c
       blocking_notifier_call_chain+0x64/0x98
       bus_notify+0x50/0x7c
       device_add+0x640/0x918
       pci_device_add+0x23c/0x298
       of_create_pci_dev+0x400/0x884
       of_scan_pci_dev+0x124/0x1b0
       __of_scan_bus+0x78/0x18c
       pcibios_scan_phb+0x2a4/0x3b0
       init_phb_dynamic+0xb8/0x110
       dlpar_add_slot+0x170/0x3b8 [rpadlpar_io]
       add_slot_store.part.0+0xb4/0x130 [rpadlpar_io]
       kobj_attr_store+0x2c/0x48
       sysfs_kf_write+0x64/0x78
       kernfs_fop_write_iter+0x1b0/0x290
       vfs_write+0x350/0x4a0
       ksys_write+0x84/0x140
       system_call_exception+0x124/0x330
       system_call_vectored_common+0x15c/0x2ec
    
    Commit a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities
    and allow blocking domains") broke DLPAR add of PCI devices.
    
    That commit added an iommu_device structure to pci_controller. During
    system boot, PCI devices are discovered and this newly added iommu_device
    structure is initialized by a call to iommu_device_register().
    
    During DLPAR add of a PCI device, a new pci_controller structure is
    allocated, but no call is made to iommu_device_register().
    
    The fix is to register the iommu device during DLPAR add as well.
    
    Fixes: a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains")
    Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
    [mpe: Trim oops and tweak some change log wording]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240122222407.39603-1-gbatra@linux.ibm.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/pseries: fix accuracy of stolen time [+ + +]

Author: Shrikanth Hegde <sshegde@linux.ibm.com>
Date:   Tue Feb 13 10:56:35 2024 +0530

    powerpc/pseries: fix accuracy of stolen time
    
    commit cbecc9fcbbec60136b0180ba0609c829afed5c81 upstream.
    
    The PowerVM hypervisor updates the VPA fields with stolen time data.
    It currently reports enqueue_dispatch_tb and ready_enqueue_tb for
    this purpose. In Linux, these two fields are used to report the stolen time.
    
    The VPA fields are updated at the TB frequency. On PowerPC, it is mostly
    set at 512 MHz. Hence this needs a conversion to ns when reporting it
    back, as the rest of the kernel's timings are in ns. This conversion is
    already handled by the tb_to_ns() function, so use that function to report
    accurate stolen time.
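    Here is a sketch of the unit conversion, assuming a 512 MHz timebase. The
    kernel's own tb_to_ns() uses a precomputed multiplier and shift rather
    than a division. tb_ticks_to_ns() below is a hypothetical division-based
    equivalent. It splits off whole seconds so that ticks * 10^9 cannot
    overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: timebase ticks -> nanoseconds, overflow-safe split. */
static uint64_t tb_ticks_to_ns(uint64_t ticks, uint64_t tb_hz)
{
    const uint64_t nsec_per_sec = 1000000000ULL;
    return (ticks / tb_hz) * nsec_per_sec
         + (ticks % tb_hz) * nsec_per_sec / tb_hz;
}
```

    At 512 MHz, one tick is about 1.95 ns. Reporting raw ticks as ns
    therefore undercounts stolen time by roughly that factor.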
    
    This issue was observed, and a Capped Shared Processor LPAR (SPLPAR) was
    used to simplify the experiments. In all these cases, a 100% VP load is
    run using the stress-ng workload. Stolen time values are percentages as
    reported by mpstat. With the patch, the values are close to expected.
    
    EC/VP        6.8-rc1     +Patch
    12EC/12VP        0.0        0.0
    12EC/24VP       25.7       50.2
    12EC/36VP       37.3       69.2
    12EC/48VP       38.5       78.3
    
    Fixes: 0e8a63132800 ("powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING")
    Cc: stable@vger.kernel.org # v6.1+
    Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
    Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
    Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240213052635.231597-1-sshegde@linux.ibm.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ptrace: Introduce exception_ip arch hook [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Fri Feb 2 12:30:26 2024 +0000

    ptrace: Introduce exception_ip arch hook
    
    [ Upstream commit 11ba1728be3edb6928791f4c622f154ebe228ae6 ]
    
    On architectures with delay slots, the architecture-level instruction
    pointer (or program counter) in pt_regs may differ from the address at
    which the exception was triggered.
    
    Introduce an exception_ip hook that invokes architecture code to determine
    the actual instruction pointer of the exception.
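    Here is a userspace model of the idea, with two stated assumptions. The
    register struct is hypothetical. The override mimics MIPS for the 32-bit
    ISA only: a fault in a branch delay slot reports the branch's address,
    and the faulting instruction is the next one, 4 bytes later. The generic
    fallback in this sketch maps exception_ip() to instruction_pointer(),
    which is what architectures without delay slots need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register state for a delay-slot architecture. */
struct model_regs {
    unsigned long epc;     /* PC reported by the hardware */
    bool in_delay_slot;    /* e.g. the Cause.BD bit on MIPS */
};

/* Generic view: the saved PC. */
static unsigned long instruction_pointer(const struct model_regs *r)
{
    return r->epc;
}

/*
 * Arch override (MIPS-like, 32-bit ISA only): in a delay slot, epc points
 * at the branch, so the faulting instruction is the one after it.
 */
static unsigned long exception_ip(const struct model_regs *r)
{
    return r->in_delay_slot ? r->epc + 4 : instruction_pointer(r);
}
```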
    
    Link: https://lore.kernel.org/lkml/00d1b813-c55f-4365-8d81-d70258e10b16@app.fastmail.com/
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Stable-dep-of: 8fa507083388 ("mm/memory: Use exception ip to search exception tables")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "drm/amd: flush any delayed gfxoff on suspend entry" [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Wed Feb 7 23:52:54 2024 -0600

    Revert "drm/amd: flush any delayed gfxoff on suspend entry"
    
    commit 916361685319098f696b798ef1560f69ed96e934 upstream.
    
    commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring
    callbacks") caused GFXOFF control to be used more heavily and the
    codepath that was removed from commit 0dee72639533 ("drm/amd: flush any
    delayed gfxoff on suspend entry") now can be exercised at suspend again.
    
    Users report that when GNOME is used to suspend, the lock screen trigger
    causes SDMA traffic and the system can deadlock.
    
    This reverts commit 0dee726395333fea833eaaf838bc80962df886c8.
    
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "drm/msm/gpu: Push gpu lock down past runpm" [+ + +]

Author: Rob Clark <robdclark@chromium.org>
Date:   Tue Jan 9 10:22:17 2024 -0800

    Revert "drm/msm/gpu: Push gpu lock down past runpm"
    
    commit 917e9b7c2350e3e53162fcf5035e5f2d68e2cbed upstream.
    
    This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.
    
    Changing the locking order means that scheduler/msm_job_run() can race
    with the recovery kthread worker, with the result that the GPU gets an
    extra runpm get when we are trying to power it off, leaving the GPU in
    an unrecovered state.
    
    I'll need to come up with a different scheme for appeasing lockdep.
    
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Patchwork: https://patchwork.freedesktop.org/patch/573835/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "eventfs: Check for NULL ef in eventfs_set_attr()" [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:07 2024 -0500

    Revert "eventfs: Check for NULL ef in eventfs_set_attr()"
    
    This reverts commit d8f492a059728bbd397defbc9b8d2f4159d869b5.
    
    The eventfs was not designed properly and may have some hidden bugs in it.
    Linus rewrote it properly and I trust his version more than this one. Revert
    the backported patches for 6.6 and re-apply all the changes to make it
    equivalent to Linus's version.
    
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "eventfs: Delete eventfs_inode when the last dentry is freed" [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:09 2024 -0500

    Revert "eventfs: Delete eventfs_inode when the last dentry is freed"
    
    This reverts commit ea4c30a0a73fb5cb2604539db550f1e620bb949c.
    
    The eventfs was not designed properly and may have some hidden bugs in it.
    Linus rewrote it properly and I trust his version more than this one. Revert
    the backported patches for 6.6 and re-apply all the changes to make it
    equivalent to Linus's version.
    
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "eventfs: Do not allow NULL parent to eventfs_start_creating()" [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:06 2024 -0500

    Revert "eventfs: Do not allow NULL parent to eventfs_start_creating()"
    
    This reverts commit 6abb8c223ce12078a0f2c129656a13338dfe960b.
    
    The eventfs was not designed properly and may have some hidden bugs in it.
    Linus rewrote it properly and I trust his version more than this one. Revert
    the backported patches for 6.6 and re-apply all the changes to make it
    equivalent to Linus's version.
    
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "eventfs: Remove "is_freed" union with rcu head" [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:11 2024 -0500

    Revert "eventfs: Remove "is_freed" union with rcu head"
    
    This reverts commit fa18a8a0539b02cc621938091691f0b73f0b1288.
    
    The eventfs was not designed properly and may have some hidden bugs in it.
    Linus rewrote it properly and I trust his version more than this one. Revert
    the backported patches for 6.6 and re-apply all the changes to make it
    equivalent to Linus's version.
    
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "eventfs: Save ownership and mode" [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:10 2024 -0500

    Revert "eventfs: Save ownership and mode"
    
    This reverts commit 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b.
    
    The eventfs was not designed properly and may have some hidden bugs in it.
    Linus rewrote it properly and I trust his version more than this one. Revert
    the backported patches for 6.6 and re-apply all the changes to make it
    equivalent to Linus's version.
    
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "eventfs: Use simple_recursive_removal() to clean up dentries" [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:08 2024 -0500

    Revert "eventfs: Use simple_recursive_removal() to clean up dentries"
    
    This reverts commit 055907ad2c14838c90d63297f7bab8d180a5d844.
    
    The eventfs was not designed properly and may have some hidden bugs in it.
    Linus rewrote it properly and I trust his version more than this one. Revert
    the backported patches for 6.6 and re-apply all the changes to make it
    equivalent to Linus's version.
    
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "kobject: Remove redundant checks for whether ktype is NULL" [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Feb 8 16:02:50 2024 +0000

    Revert "kobject: Remove redundant checks for whether ktype is NULL"
    
    [ Upstream commit 3ca8fbabcceb8bfe44f7f50640092fd8f1de375c ]
    
    This reverts commit 1b28cb81dab7c1eedc6034206f4e8d644046ad31.
    
    It is reported to cause problems, so revert it for now until the root
    cause can be found.
    
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Fixes: 1b28cb81dab7 ("kobject: Remove redundant checks for whether ktype is NULL")
    Cc: Zhen Lei <thunder.leizhen@huawei.com>
    Closes: https://lore.kernel.org/oe-lkp/202402071403.e302e33a-oliver.sang@intel.com
    Link: https://lore.kernel.org/r/2024020849-consensus-length-6264@gregkh
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add" [+ + +]

Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Wed Feb 14 11:00:41 2024 +1100

    Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
    
    commit 1fba2bf8e9d5a27b7394856181b6200de7260b79 upstream.
    
    This reverts commit ed8b94f6e0acd652ce69bd69d678a0c769172df8.
    
    Gaurav reported that there are still problems with the patch and it
    should be reverted pending a fuller fix.
    
    Link: https://lore.kernel.org/all/4f6fc1ac-7a76-4447-9d0e-f55c0be373f8@linux.ibm.com/
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()" [+ + +]

Author: Tejun Heo <tj@kernel.org>
Date:   Mon Feb 5 15:43:41 2024 -1000

    Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"
    
    commit aac8a59537dfc704ff344f1aacfd143c089ee20f upstream.
    
    This reverts commit ca10d851b9ad0338c19e8e3089e24d565ebfffd7.
    
    The commit allowed workqueue_apply_unbound_cpumask() to clear __WQ_ORDERED
    on now removed implicitly ordered workqueues. This was incorrect in that
    a system-wide config change shouldn't break the ordering properties of all
    workqueues. The apply_workqueue_attrs() path was allowed to do so
    because it targeted a specific workqueue: either the workqueue
    had WQ_SYSFS set or the workqueue user specifically tried to change
    max_active, both of which indicate that the workqueue doesn't need to be
    ordered.
    
    The implicitly ordered workqueue promotion was removed by the previous
    commit 3bc1e711c26b ("workqueue: Don't implicitly make UNBOUND workqueues w/
    @max_active==1 ordered"). However, that commit didn't update this path and
    broke the build. Revert the commit, which was incorrect in the first place;
    this also fixes the build.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Fixes: 3bc1e711c26b ("workqueue: Don't implicitly make UNBOUND workqueues w/ @max_active==1 ordered")
    Fixes: ca10d851b9ad ("workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()")
    Cc: stable@vger.kernel.org # v6.6+
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ring-buffer: Clean ring_buffer_poll_wait() error return [+ + +]

Author: Vincent Donnefort <vdonnefort@google.com>
Date:   Wed Jan 31 14:09:55 2024 +0000

    ring-buffer: Clean ring_buffer_poll_wait() error return
    
    commit 66bbea9ed6446b8471d365a22734dc00556c4785 upstream.
    
    The return type for ring_buffer_poll_wait() is __poll_t. Behind the
    scenes, this is an unsigned type in which we set event bits. For a
    non-allocated CPU, we instead return -EINVAL (0xffffffea). Luckily,
    this ends up setting a few error bits (including EPOLLERR and EPOLLNVAL),
    so user-space at least is aware something went wrong.
    
    Nonetheless, this is incorrect code. Replace that -EINVAL with a
    proper EPOLLERR to clean up that output. As this doesn't change the
    behaviour, there's no need to treat this change as a bug fix.
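    The reinterpretation is easy to see in user space. This sketch uses its
    own copies of two event bit values from <linux/eventpoll.h>, under
    hypothetical MODEL_ names, with an unsigned int standing in for
    __poll_t.

```c
#include <assert.h>
#include <errno.h>

/* Event bit values as defined in <linux/eventpoll.h>. */
#define MODEL_EPOLLERR  0x008u
#define MODEL_EPOLLNVAL 0x020u

/* __poll_t is unsigned, so a negative errno is reinterpreted as a bit mask. */
typedef unsigned int model_poll_t;

static model_poll_t poll_ret_before(void) { return (model_poll_t)-EINVAL; }
static model_poll_t poll_ret_after(void)  { return MODEL_EPOLLERR; }
```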
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240131140955.3322792-1-vdonnefort@google.com
    
    Cc: stable@vger.kernel.org
    Fixes: 6721cb6002262 ("ring-buffer: Do not poll non allocated cpu buffers")
    Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/qeth: Fix potential loss of L3-IP@ in case of network issues [+ + +]

Author: Alexandra Winter <wintera@linux.ibm.com>
Date:   Tue Feb 6 09:58:49 2024 +0100

    s390/qeth: Fix potential loss of L3-IP@ in case of network issues
    
    commit 2fe8a236436fe40d8d26a1af8d150fc80f04ee1a upstream.
    
    Symptom:
    In case of a bad cable connection (e.g. dirty optics) a fast sequence of
    network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth
    interface. In case of a second DOWN while recovery is still ongoing, it
    can happen that the IP@ of a Layer3 qeth interface is lost and will not
    be recovered by the second UP.
    
    Problem:
    When registration of IP addresses with Layer 3 qeth devices fails (e.g.
    because of a bad address format), the respective IP address is deleted
    from its hash table in the driver. If registration fails because of an
    ENETDOWN condition, the address should stay in the hash table, so a
    subsequent recovery can restore it.
    
    3caa4af834df ("qeth: keep ip-address after LAN_OFFLINE failure")
    fixes this for registration failures during normal operation, but not
    during recovery.
    
    Solution:
    Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For
    consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE,
    i.e. for some reason the card already/still has this address registered.
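    The decision described above reduces to a small predicate. This is a
    hypothetical helper, not the qeth driver's code. -EINVAL stands in for a
    registration failure such as a bad address format.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should an address stay in the driver's hash table
 * after a registration attempt returned rc (0 or a negative errno)?
 */
static bool keep_ip_after_register(int rc)
{
    return rc == 0 || rc == -ENETDOWN || rc == -EADDRINUSE;
}
```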
    
    Fixes: 4a71df50047f ("qeth: new qeth device driver")
    Cc: stable@vger.kernel.org
    Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sched/membarrier: reduce the ability to hammer on sys_membarrier [+ + +]

Author: Linus Torvalds <torvalds@linuxfoundation.org>
Date:   Sun Feb 4 15:25:12 2024 +0000

    sched/membarrier: reduce the ability to hammer on sys_membarrier
    
    commit 944d5fe50f3f03daacfea16300e656a1691c4a23 upstream.
    
    On some systems, sys_membarrier can be very expensive, causing overall
    slowdowns for everything.  So put a lock on the path in order to
    serialize the accesses, so that the call cannot be made at too high a
    frequency and saturate the machine.
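    The serialization can be modelled in user space with a single mutex. This
    is a hypothetical sketch, not the kernel change, which takes a lock
    inside the membarrier code. The model counts how many callers are inside
    the expensive section at once. Because the lock is held, that number
    never exceeds one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight, max_in_flight;

/* Hypothetical expensive call, serialized by one global mutex. */
static void expensive_barrier(void)
{
    pthread_mutex_lock(&model_lock);
    if (++in_flight > max_in_flight)
        max_in_flight = in_flight;
    /* ... the costly cross-CPU work would happen here ... */
    in_flight--;
    pthread_mutex_unlock(&model_lock);
}

/* Each worker hammers the call in a tight loop. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        expensive_barrier();
    return NULL;
}
```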
    
    Reviewed-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Acked-by: Borislav Petkov <bp@alien8.de>
    Fixes: 22e4ebb97582 ("membarrier: Provide expedited private command")
    Fixes: c5f58bd58f43 ("membarrier: Provide GLOBAL_EXPEDITED command")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scs: add CONFIG_MMU dependency for vfree_atomic() [+ + +]

Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Mon Jan 22 09:52:01 2024 -0800

    scs: add CONFIG_MMU dependency for vfree_atomic()
    
    commit 6f9dc684cae638dda0570154509884ee78d0f75c upstream.
    
    The shadow call stack implementation fails to build without CONFIG_MMU:
    
      ld.lld: error: undefined symbol: vfree_atomic
      >>> referenced by scs.c
      >>>               kernel/scs.o:(scs_free) in archive vmlinux.a
    
    Link: https://lkml.kernel.org/r/20240122175204.2371009-1-samuel.holland@sifive.com
    Fixes: a2abe7cbd8fe ("scs: switch to vmapped shadow stacks")
    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock" [+ + +]

Author: Lee Duncan <lduncan@suse.com>
Date:   Fri Feb 9 10:07:34 2024 -0800

    scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
    
    commit 977fe773dcc7098d8eaf4ee6382cb51e13e784cb upstream.
    
    This reverts commit 1a1975551943f681772720f639ff42fbaa746212.
    
    This commit causes interrupts to be lost for FCoE devices, since it changed
    spin locks from "bh" to "irqsave".
    
    Instead, a work queue should be used; this will be addressed in a separate
    commit.
    
    Fixes: 1a1975551943 ("scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock")
    Signed-off-by: Lee Duncan <lduncan@suse.com>
    Link: https://lore.kernel.org/r/c578cdcd46b60470535c4c4a953e6a1feca0dffd.1707500786.git.lduncan@suse.com
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: storvsc: Fix ring buffer size calculation [+ + +]

Author: Michael Kelley <mhklinux@outlook.com>
Date:   Mon Jan 22 09:09:56 2024 -0800

    scsi: storvsc: Fix ring buffer size calculation
    
    commit f4469f3858352ad1197434557150b1f7086762a0 upstream.
    
    Current code uses the specified ring buffer size (either the default of 128
    Kbytes or a module parameter specified value) to encompass the one page
    ring buffer header plus the actual ring itself.  When the page size is 4K,
    carving off one page for the header isn't significant.  But when the page
    size is 64K on ARM64, only half of the default 128 Kbytes is left for the
    actual ring.  While this doesn't break anything, the smaller ring size
    could be a performance bottleneck.
    
    Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer
    size.  This macro adds a page for the header, and rounds up the size to a
    page boundary, using the page size for which the kernel is built.  Use this
    new size for subsequent ring buffer calculations.  For example, on ARM64
    with 64K page size and the default ring size, this results in the actual
    ring being 128 Kbytes, which is intended.
    
    Cc: stable@vger.kernel.org # 5.15.x
    Signed-off-by: Michael Kelley <mhklinux@outlook.com>
    Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
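The size arithmetic can be checked with a small model. It is a sketch that assumes, as the commit message states, that the header takes exactly one page; the helper names are hypothetical, and only the "add a header page, then round up to a page boundary" behaviour is taken from the description of VMBUS_RING_SIZE above.

```c
#include <stddef.h>

/* Round x up to the next multiple of align. */
static size_t round_up_to(size_t x, size_t align)
{
    return (x + align - 1) / align * align;
}

/* Old behaviour: the requested size had to cover header page + ring. */
static size_t ring_bytes_old(size_t requested, size_t page_size)
{
    return requested - page_size;
}

/* New behaviour, modelled on VMBUS_RING_SIZE: add one header page,
 * then round up to a page boundary. */
static size_t total_bytes_new(size_t requested, size_t page_size)
{
    return round_up_to(page_size + requested, page_size);
}

/* Bytes left for the actual ring once the header page is carved off. */
static size_t ring_bytes_new(size_t requested, size_t page_size)
{
    return total_bytes_new(requested, page_size) - page_size;
}
```

With 64K pages and the default 128K request, the old scheme leaves 64K for the ring, while the new one allocates 192K in total and keeps the full 128K for the ring. With 4K pages the difference is only one page.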

selftests/landlock: Fix fs_test build with old libc [+ + +]

Author: Hu Yadi <hu.yadi@h3c.com>
Date:   Wed Jan 24 10:29:08 2024 +0800

    selftests/landlock: Fix fs_test build with old libc
    
    [ Upstream commit 40b7835e74e0383be308d528c5e0e41b3bf72ade ]
    
    The following error comes up when building selftests/landlock/fs_test
    with an older toolchain (gcc 7.3/glibc-2.28/kernel-4.19):
    
    gcc -Wall -O2 -isystem   fs_test.c -lcap -o selftests/landlock/fs_test
    fs_test.c:4575:9: error: initializer element is not constant
      .mnt = mnt_tmp,
             ^~~~~~~
    
    Signed-off-by: Hu Yadi <hu.yadi@h3c.com>
    Suggested-by: Jiao <jiaoxupo@h3c.com>
    Reviewed-by: Berlin <berlin@h3c.com>
    Link: https://lore.kernel.org/r/20240124022908.42100-1-hu.yadi@h3c.com
    Fixes: 04f9070e99a4 ("selftests/landlock: Add tests for pseudo filesystems")
    [mic: Factor out mount's data string and make mnt_tmp static]
    Signed-off-by: Mickaël Salaün <mic@digikod.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
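The underlying C rule: an initializer for an object with static storage duration must consist of constant expressions, and naming another object, even a `const` one, is not a constant expression. Newer GCC releases appear to accept it as an extension, while gcc 7 rejects it. A sketch of the portable pattern, assuming a simplified `struct mnt_opt` and illustrative mount data rather than the actual fs_test.c definitions:

```c
#include <string.h>

struct mnt_opt {
    const char *type;
    const char *data;
};

/* Shared mount data string, factored out as a macro so that it can be
 * reused in several constant initializers (value is illustrative). */
#define MNT_TMP_DATA "size=4m,mode=700"

static const struct mnt_opt mnt_tmp = {
    .type = "tmpfs",
    .data = MNT_TMP_DATA,
};

struct variant {
    struct mnt_opt mnt;
};

/* Rejected by older compilers at file scope, because mnt_tmp is an object
 * and not a constant expression:
 *     static const struct variant tmp_variant = { .mnt = mnt_tmp };
 * Portable form: spell the members out with constant expressions. */
static const struct variant tmp_variant = {
    .mnt = {
        .type = "tmpfs",
        .data = MNT_TMP_DATA,
    },
};
```

The macro keeps the two copies of the data string in sync without relying on the compiler treating `mnt_tmp` as a constant.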

selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory [+ + +]

Author: Ryan Roberts <ryan.roberts@arm.com>
Date:   Mon Jan 22 12:05:54 2024 +0000

    selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory
    
    commit d021b442cf312664811783e92b3d5e4548e92a53 upstream.
    
    ksm_tests was previously mmapping a region of memory, aligning the
    returned pointer to a PMD boundary, then setting MADV_HUGEPAGE, but was
    setting it past the end of the mmapped area due to not taking the pointer
    alignment into consideration.  Fix this behaviour.
    
    Up until commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
    boundaries"), this buggy behavior was (usually) masked because the
    alignment difference was always less than PMD-size.  But since the
    mentioned commit, `ksm_tests -H -s 100` started failing.
    
    Link: https://lkml.kernel.org/r/20240122120554.3108022-1-ryan.roberts@arm.com
    Fixes: 325254899684 ("selftests: vm: add KSM huge pages merging time test")
    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
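The alignment mistake is easy to reproduce arithmetically. This sketch assumes a 2 MiB PMD (x86-64 with 4K pages) and uses hypothetical helper names; it only checks whether the PMD-aligned range handed to madvise(MADV_HUGEPAGE) stays inside the mapping. Over-allocating by one PMD, as in the usage below, is one way to make it fit, not necessarily the exact change made by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed PMD size of 2 MiB; other architectures and page sizes differ. */
#define ASSUMED_PMD_SIZE ((uintptr_t)2 << 20)

static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Returns 1 if the PMD-aligned range [aligned, aligned + len) stays inside
 * the mapping [map_start, map_start + map_len), 0 if it runs past the end. */
static int advise_range_fits(uintptr_t map_start, size_t map_len, size_t len)
{
    uintptr_t aligned = align_up(map_start, ASSUMED_PMD_SIZE);

    return aligned + len <= map_start + map_len;
}
```

With an unaligned start, mapping exactly `len` bytes is not enough once the pointer is moved up to the next PMD boundary; mapping `len + ASSUMED_PMD_SIZE` always leaves room, since the alignment shift is smaller than one PMD.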

selftests/mm: switch to bash from sh [+ + +]

Author: Muhammad Usama Anjum <usama.anjum@collabora.com>
Date:   Tue Jan 16 14:04:54 2024 +0500

    selftests/mm: switch to bash from sh
    
    commit bc29036e1da1cf66e5f8312649aeec2d51ea3d86 upstream.
    
    Running charge_reserved_hugetlb.sh generates errors if sh is set to
    dash:
    
    ./charge_reserved_hugetlb.sh: 9: [[: not found
    ./charge_reserved_hugetlb.sh: 19: [[: not found
    ./charge_reserved_hugetlb.sh: 27: [[: not found
    ./charge_reserved_hugetlb.sh: 37: [[: not found
    ./charge_reserved_hugetlb.sh: 45: Syntax error: "(" unexpected
    
    Switch to using /bin/bash instead of /bin/sh.  Make the switch for
    write_hugetlb_memory.sh as well which is called from
    charge_reserved_hugetlb.sh.
    
    Link: https://lkml.kernel.org/r/20240116090455.3407378-1-usama.anjum@collabora.com
    Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: David Laight <David.Laight@ACULAB.COM>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag [+ + +]

Author: Audra Mitchell <audra@redhat.com>
Date:   Fri Jan 19 15:58:01 2024 -0500

    selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag
    
    commit 52e63d67b5bb423b33d7a262ac7f8bd375a90145 upstream.
    
    For level 5 page tables to be in use, the CPU must have the feature
    enabled in addition to the CONFIG option. Check for the flag to be set
    to avoid false test failures on systems that do not have this CPU flag
    set.
    
    The test does a series of mmap calls including three using the
    MAP_FIXED flag and specifying an address that is 1<<47 or 1<<48.  These
    addresses are only available if you are using level 5 page tables,
    which requires both the CPU to have the capability (la57 flag) and the
    kernel to be configured.  Currently the test only checks for the kernel
    configuration option, so this test can still report a false positive.
    Here are the three failing lines:
    
    $ ./va_high_addr_switch | grep FAILED
    mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED): 0xffffffffffffffff - FAILED
    mmap(HIGH_ADDR, MAP_FIXED): 0xffffffffffffffff - FAILED
    mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED): 0xffffffffffffffff - FAILED
    
    I thought (for about a second) about refactoring the test so that these
    three mmap calls will only be run on systems with level 5 page tables
    available, but the whole point of the test is to check the level 5
    feature...
    
    Link: https://lkml.kernel.org/r/20240119205801.62769-1-audra@redhat.com
    Fixes: 4f2930c6718a ("selftests/vm: only run 128TBswitch with 5-level paging")
    Signed-off-by: Audra Mitchell <audra@redhat.com>
    Cc: Rafael Aquini <raquini@redhat.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Adam Sindelar <adam@wowsignal.io>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
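On x86-64, 4-level paging limits user space to addresses below 1<<47 (128 TiB), which is why the MAP_FIXED calls at 1<<47 and 1<<48 need 5-level paging. The added check amounts to looking for `la57` as a whole word among the CPU flags reported in /proc/cpuinfo. The selftest itself is a shell script; this is a hypothetical C helper for the same token match:

```c
#include <string.h>

/* Returns 1 if 'flag' appears as a whole, whitespace-separated token in a
 * /proc/cpuinfo "flags" value such as "fpu vme ... la57 ...", else 0. */
static int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = p == flags || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p += n; /* skip partial matches such as "xla57" */
    }
    return 0;
}
```

Matching whole tokens matters because a plain substring search would also accept a longer flag that merely contains `la57`.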

selftests/net: convert test_bridge_backup_port.sh to run it in unique namespace [+ + +]

Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Wed Dec 6 15:07:53 2023 +0800

    selftests/net: convert test_bridge_backup_port.sh to run it in unique namespace
    
    [ Upstream commit 4624a78c18c62da815f3253966b7a87995f77e1b ]
    
    There are actually no h1 and h2, so remove them. Here is the test result
    after the conversion.
    
    ]# ./test_bridge_backup_port.sh
    
    Backup port
    -----------
    TEST: Forwarding out of swp1                                        [ OK ]
    TEST: No forwarding out of vx0                                      [ OK ]
    TEST: swp1 carrier off                                              [ OK ]
    TEST: No forwarding out of swp1                                     [ OK ]
    ...
    Backup nexthop ID - ping
    ------------------------
    TEST: Ping with backup nexthop ID                                   [ OK ]
    TEST: Ping after disabling backup nexthop ID                        [ OK ]
    
    Backup nexthop ID - torture test
    --------------------------------
    TEST: Torture test                                                  [ OK ]
    
    Tests passed:  83
    Tests failed:   0
    
    Acked-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Tested-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 38ee0cb2a2e2 ("selftests: net: Fix bridge backup port test flakiness")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: bridge_mdb: Use MDB get instead of dump [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Oct 25 15:30:19 2023 +0300

    selftests: bridge_mdb: Use MDB get instead of dump
    
    [ Upstream commit e8bba9e83c88ea951dafd3319c97c55a52b3637d ]
    
    Test the new MDB get functionality by converting dump and grep to MDB
    get.
    
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: dd6b34589441 ("selftests: forwarding: Suppress grep warnings")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: forwarding: Fix bridge locked port test flakiness [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Feb 8 17:55:29 2024 +0200

    selftests: forwarding: Fix bridge locked port test flakiness
    
    [ Upstream commit f97f1fcc96908c97a240ff6cb4474e155abfa0d7 ]
    
    The redirection test case fails in the netdev CI on debug kernels
    because an FDB entry is learned despite the presence of a tc filter that
    redirects incoming traffic [1].
    
    I am unable to reproduce the failure locally, but I can see how it can
    happen given that learning is first enabled and only then the ingress tc
    filter is configured. On debug kernels the time window between these two
    operations is longer compared to regular kernels, allowing random
    packets to be transmitted and trigger learning.
    
    Fix by reversing the order and configuring the ingress tc filter before
    enabling learning.
    
    [1]
    [...]
     # TEST: Locked port MAB redirect                                      [FAIL]
     # Locked entry created for redirected traffic
    
    Fixes: 38c43a1ce758 ("selftests: forwarding: Add test case for traffic redirection from a locked port")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20240208155529.1199729-5-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: forwarding: Fix bridge MDB test flakiness [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Feb 8 17:55:27 2024 +0200

    selftests: forwarding: Fix bridge MDB test flakiness
    
    [ Upstream commit 7399e2ce4d424f426417496eb289458780eea985 ]
    
    After enabling a multicast querier on the bridge (like the test is
    doing), the bridge will wait for the Max Response Delay before starting
    to forward according to its MDB in order to give Membership Reports
    enough time to be received and processed.
    
    Currently, the test is waiting for exactly the default Max Response
    Delay (10 seconds) which is racy and leads to failures [1].
    
    Fix by reducing the Max Response Delay to 1 second.
    
    [1]
     [...]
     # TEST: IPv4 host entries forwarding tests                            [FAIL]
     # Packet locally received after flood
    
    Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20240208155529.1199729-3-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: forwarding: Fix layer 2 miss test flakiness [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Feb 8 17:55:26 2024 +0200

    selftests: forwarding: Fix layer 2 miss test flakiness
    
    [ Upstream commit 93590849a05edffaefa11695fab98f621259ded2 ]
    
    After enabling a multicast querier on the bridge (like the test is
    doing), the bridge will wait for the Max Response Delay before starting
    to forward according to its MDB in order to give Membership Reports
    enough time to be received and processed.
    
    Currently, the test is waiting for exactly the default Max Response
    Delay (10 seconds) which is racy and leads to failures [1].
    
    Fix by reducing the Max Response Delay to 1 second.
    
    [1]
     [...]
     # TEST: L2 miss - Multicast (IPv4)                                    [FAIL]
     # Unregistered multicast filter was hit after adding MDB entry
    
    Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20240208155529.1199729-2-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: forwarding: Suppress grep warnings [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Feb 8 17:55:28 2024 +0200

    selftests: forwarding: Suppress grep warnings
    
    [ Upstream commit dd6b34589441f2ad4698dd88a664811550148b41 ]
    
    Suppress the following grep warnings:
    
    [...]
    INFO: # Port group entries configuration tests - (*, G)
    TEST: Common port group entries configuration tests (IPv4 (*, G))   [ OK ]
    TEST: Common port group entries configuration tests (IPv6 (*, G))   [ OK ]
    grep: warning: stray \ before /
    grep: warning: stray \ before /
    grep: warning: stray \ before /
    TEST: IPv4 (*, G) port group entries configuration tests            [ OK ]
    grep: warning: stray \ before /
    grep: warning: stray \ before /
    grep: warning: stray \ before /
    TEST: IPv6 (*, G) port group entries configuration tests            [ OK ]
    [...]
    
    They do not fail the test, but do clutter the output.
    
    Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20240208155529.1199729-4-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: mm: fix map_hugetlb failure on 64K page size systems [+ + +]

Author: Nico Pache <npache@redhat.com>
Date:   Fri Jan 19 06:14:29 2024 -0700

    selftests: mm: fix map_hugetlb failure on 64K page size systems
    
    commit 91b80cc5b39f00399e8e2d17527cad2c7fa535e2 upstream.
    
    On systems with 64k page size and 512M huge page sizes, the allocation and
    test succeed, but the test errors out at munmap.  As the comment states,
    munmap will fail if the length is not HUGEPAGE aligned.  This is due to
    the length of the mapping being 1/2 the size of the hugepage, so the
    munmap is not hugepage aligned.  Fix this by making the mapping length the
    full hugepage if the hugepage is larger than the length of the mapping.
    
    Link: https://lkml.kernel.org/r/20240119131429.172448-1-npache@redhat.com
    Signed-off-by: Nico Pache <npache@redhat.com>
    Cc: Donet Tom <donettom@linux.vnet.ibm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
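The length adjustment described above boils down to taking the larger of the requested length and the huge page size. A hypothetical helper (not the selftest's actual code), assuming a 256 MiB request, i.e. half of a 512M huge page as in the failing case:

```c
#include <stddef.h>

/* If the huge page is larger than the requested length, map one full huge
 * page so that the later munmap() length is huge-page aligned. */
static size_t hugetlb_map_length(size_t length, size_t hpage_size)
{
    if (hpage_size > length)
        length = hpage_size;
    return length;
}
```

On systems with smaller huge pages (for example 2M), the requested length is already a multiple of the huge page size and is left unchanged.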

selftests: mptcp: add missing kconfig for NF Filter [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jan 31 22:49:47 2024 +0100

    selftests: mptcp: add missing kconfig for NF Filter
    
    commit 3645c844902bd4e173d6704fc2a37e8746904d67 upstream.
    
    Since the commit mentioned below, the 'mptcp_join' selftest uses
    IPTables to add rules to the Filter table.
    
    The IP_NF_FILTER KConfig is therefore required.
    
    This KConfig is enabled by default in many defconfigs, but we recently
    noticed that some CIs were running our selftests without it enabled.
    
    Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
    Cc: stable@vger.kernel.org
    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: add missing kconfig for NF Filter in v6 [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jan 31 22:49:48 2024 +0100

    selftests: mptcp: add missing kconfig for NF Filter in v6
    
    commit 8c86fad2cecdc6bf7283ecd298b4d0555bd8b8aa upstream.
    
    Since the commit mentioned below, the 'mptcp_join' selftest uses
    IPTables to add rules to the Filter table for IPv6.
    
    The IP6_NF_FILTER KConfig is therefore required.
    
    This KConfig is enabled by default in many defconfigs, but we recently
    noticed that some CIs were running our selftests without it enabled.
    
    Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases")
    Cc: stable@vger.kernel.org
    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-3-4c1c11e571ff@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: add missing kconfig for NF Mangle [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jan 31 22:49:49 2024 +0100

    selftests: mptcp: add missing kconfig for NF Mangle
    
    commit 2d41f10fa497182df9012d3e95d9cea24eb42e61 upstream.
    
    Since the commit mentioned below, the 'mptcp_join' selftest uses
    IPTables to add rules to the Mangle table, only in IPv4.
    
    The IP_NF_MANGLE KConfig is therefore required.
    
    This KConfig is enabled by default in many defconfigs, but we recently
    noticed that some CIs were running our selftests without it enabled.
    
    Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase")
    Cc: stable@vger.kernel.org
    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-4-4c1c11e571ff@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: add mptcp_lib_kill_wait [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Tue Nov 28 15:18:53 2023 -0800

    selftests: mptcp: add mptcp_lib_kill_wait
    
    commit bdbef0a6ff10603895b0ba39f56bf874cb2b551a upstream.
    
    To avoid duplicated code in different MPTCP selftests, we can add
    and use helpers defined in mptcp_lib.sh.
    
    Export the kill_wait() helper from userspace_pm.sh to mptcp_lib.sh and
    rename it mptcp_lib_kill_wait(). It can be used instead of kill_wait()
    in mptcp_join.sh. Use the new helper in both scripts.
    
    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-9-8d6b94150f6b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: allow changing subtests prefix [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jan 31 22:49:52 2024 +0100

    selftests: mptcp: allow changing subtests prefix
    
    commit de46d138e7735eded9756906747fd3a8c3a42225 upstream.
    
    If a CI executes the same selftest multiple times with different
    options, all results from the same subtests will have the same title,
    which confuses the CI. With the same title printed in TAP, the tests are
    considered to be the same ones.
    
    Now, it is possible to override this prefix by using MPTCP_LIB_KSFT_TEST
    env var, and have a different title.
    
    While at it, use 'basename' to remove the suffix as well instead of
    using an extra 'sed'.
    
    Fixes: c4192967e62f ("selftests: mptcp: lib: format subtests results in TAP")
    Cc: stable@vger.kernel.org
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-7-4c1c11e571ff@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: increase timeout to 30 min [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jan 31 22:49:50 2024 +0100

    selftests: mptcp: increase timeout to 30 min
    
    commit 4d4dfb2019d7010efb65926d9d1c1793f9a367c6 upstream.
    
    On very slow environments -- e.g. when QEmu is used without KVM --,
    mptcp_join.sh selftest can take a bit more than 20 minutes. Bump the
    default timeout by 50% as it seems normal to take that long on some
    environments.
    
    When a debug kernel config is used, this selftest will take even longer,
    but that's certainly not a common test env to consider for the timeout.
    
    The Fixes tag that has been picked here is there simply to help get
    this patch backported to older stable versions. It is difficult to point
    to the exact commit that made some environments reach the timeout from
    time to time.
    
    Fixes: d17b968b9876 ("selftests: mptcp: increase timeout to 20 minutes")
    Cc: stable@vger.kernel.org
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-5-4c1c11e571ff@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: net: Fix bridge backup port test flakiness [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Feb 8 14:31:10 2024 +0200

    selftests: net: Fix bridge backup port test flakiness
    
    [ Upstream commit 38ee0cb2a2e2ade077442085638eb181b0562971 ]
    
    The test toggles the carrier of a bridge port in order to test the
    bridge backup port feature.
    
    Due to the linkwatch delayed work, the carrier change is not always
    reflected in the bridge driver fast enough and packets are not forwarded
    as the test expects, resulting in failures [1].
    
    Fix by busy waiting on the bridge port state until it changes to the
    desired state following the carrier change.
    
    [1]
     # Backup port
     # -----------
     [...]
     # TEST: swp1 carrier off                                              [ OK ]
     # TEST: No forwarding out of swp1                                     [FAIL]
     [  641.995910] br0: port 1(swp1) entered disabled state
     # TEST: No forwarding out of vx0                                      [ OK ]
    
    Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20240208123110.1063930-1-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
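The selftest helpers are shell scripts, but bounded busy waiting is language-neutral: poll a condition at a fixed interval until it holds or a timeout expires, instead of checking once and hoping the linkwatch work has already run. A C sketch with hypothetical names, where a fake port-state condition stands in for querying the bridge:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Poll 'cond' every 'interval_ms' until it returns nonzero or 'timeout_ms'
 * elapses. Returns 1 if the condition was met, 0 on timeout. */
static int busywait(long timeout_ms, long interval_ms, int (*cond)(void *), void *arg)
{
    struct timespec start, now;
    struct timespec pause = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (cond(arg))
            return 1;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed = (now.tv_sec - start.tv_sec) * 1000L +
                       (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed >= timeout_ms)
            return 0;
        nanosleep(&pause, NULL);
    }
}

/* Demo condition: a fake port that reaches the desired state after a
 * given number of polls, standing in for reading the bridge port state. */
struct fake_port {
    int polls_until_disabled;
};

static int port_is_disabled(void *arg)
{
    struct fake_port *p = arg;

    return p->polls_until_disabled-- <= 0;
}
```

The timeout keeps a genuinely broken setup from hanging the test, while the short poll interval lets the common case finish as soon as the state changes.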

serial: core: introduce uart_port_tx_flags() [+ + +]

Author: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Date:   Thu Feb 1 11:55:56 2024 +0100

    serial: core: introduce uart_port_tx_flags()
    
    commit 3ee07964d407411fd578a3bc998de44fd64d266a upstream.
    
    Also add an enum with a flag, UART_TX_NOSTOP, which tells the helper NOT
    to call __port->ops->stop_tx() when the circular buffer is empty.
    mxs-auart needs this (see the next patch).
    
    Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
    Cc: stable <stable@kernel.org>
    Tested-by: Emil Kronborg <emil.kronborg@protonmail.com>
    Link: https://lore.kernel.org/r/20240201105557.28043-1-jirislaby@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
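A tiny user-space model of the semantics described above, with hypothetical names (it is not the kernel's uart_port_tx_flags() implementation): after draining the buffer, the helper calls stop_tx() unless the caller passes a NOSTOP-style flag, in which case the driver is left to stop transmission itself, as mxs-auart does.

```c
#include <stddef.h>

/* Hypothetical flag mirroring the idea of UART_TX_NOSTOP. */
enum { MODEL_TX_NOSTOP = 1u << 0 };

struct model_port {
    size_t pending;    /* characters still in the circular buffer */
    size_t fifo_room;  /* characters the hardware FIFO accepts per call */
    int stop_tx_calls; /* how often stop_tx() was invoked */
};

static void model_stop_tx(struct model_port *p)
{
    p->stop_tx_calls++;
}

/* Move as many characters as the FIFO accepts; when the buffer is empty,
 * call stop_tx() unless the driver asked to handle stopping on its own. */
static size_t model_port_tx_flags(struct model_port *p, unsigned int flags)
{
    size_t sent = 0;

    while (p->pending > 0 && sent < p->fifo_room) {
        p->pending--; /* a real driver would write the byte to the TX register */
        sent++;
    }
    if (p->pending == 0 && !(flags & MODEL_TX_NOSTOP))
        model_stop_tx(p);
    return sent;
}
```

Making the behaviour a flag keeps the common path unchanged for every other driver while letting one driver opt out.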

serial: max310x: fail probe if clock crystal is unstable [+ + +]

Author: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Date:   Tue Jan 16 16:30:00 2024 -0500

    serial: max310x: fail probe if clock crystal is unstable
    
    commit 8afa6c6decea37e7cb473d2c60473f37f46cea35 upstream.
    
    A stable clock is really required in order to use this UART, so log an
    error message and bail out if the chip reports that the clock is not
    stable.
    
    Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness")
    Cc: stable@vger.kernel.org
    Suggested-by: Jan Kundrát <jan.kundrat@cesnet.cz>
    Link: https://www.spinics.net/lists/linux-serial/msg35773.html
    Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
    Link: https://lore.kernel.org/r/20240116213001.3691629-4-hugo@hugovil.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: max310x: improve crystal stable clock detection [+ + +]

Author: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Date:   Tue Jan 16 16:29:59 2024 -0500

    serial: max310x: improve crystal stable clock detection
    
    commit 93cd256ab224c2519e7c4e5f58bb4f1ac2bf0965 upstream.
    
    Some people are seeing a warning similar to this when using a crystal:
    
        max310x 11-006c: clock is not stable yet
    
    The datasheet doesn't mention the maximum time to wait for the clock to be
    stable when using a crystal, and it seems that the 10ms delay in the driver
    is not always sufficient.
    
    Jan Kundrát reported that it took three tries (each separated by 10ms) to
    get a stable clock.
    
    Modify the behavior to check the stable clock ready bit multiple times
    (20), waiting 10ms between each try.
    
    Note: the first draft of the driver originally used a 50ms delay, without
    checking the clock stable bit.
    Then a loop with 1000 retries was implemented, each time reading the clock
    stable bit.
    
    Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness")
    Cc: stable@vger.kernel.org
    Suggested-by: Jan Kundrát <jan.kundrat@cesnet.cz>
    Link: https://www.spinics.net/lists/linux-serial/msg35773.html
    Link: https://lore.kernel.org/all/20240110174015.6f20195fde08e5c9e64e5675@hugovil.com/raw
    Link: https://github.com/boundarydevices/linux/commit/e5dfe3e4a751392515d78051973190301a37ca9a
    Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
    Link: https://lore.kernel.org/r/20240116213001.3691629-3-hugo@hugovil.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
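The retry scheme can be sketched as follows. The number of tries and the delay come from the commit message; the register bit, the callback types and the fake chip are hypothetical, and this is not the max310x driver code.

```c
#include <stdbool.h>

#define CLK_READY_TRIES    20        /* number of checks, from the commit message */
#define CLK_READY_DELAY_MS 10        /* delay between checks, from the commit message */
#define STS_CLK_READY_BIT  (1u << 0) /* hypothetical bit position */

/* Hypothetical callbacks: a status read returning 0 on success, and a sleep. */
typedef int (*read_status_fn)(void *ctx, unsigned int *val);
typedef void (*sleep_ms_fn)(void *ctx, unsigned int ms);

/* Poll the clock-ready bit up to CLK_READY_TRIES times. */
static bool wait_clock_stable(read_status_fn read_status, sleep_ms_fn sleep_ms, void *ctx)
{
    for (int attempt = 0; attempt < CLK_READY_TRIES; attempt++) {
        unsigned int val = 0; /* stays 0 ("not ready") if the read fails */

        if (read_status(ctx, &val) == 0 && (val & STS_CLK_READY_BIT))
            return true;
        sleep_ms(ctx, CLK_READY_DELAY_MS);
    }
    return false;
}

/* Fake chip for demonstration: reports ready after a number of reads. */
struct fake_chip {
    int reads_until_ready;
    unsigned int slept_ms;
};

static int fake_read_status(void *ctx, unsigned int *val)
{
    struct fake_chip *chip = ctx;

    *val = (chip->reads_until_ready-- <= 0) ? STS_CLK_READY_BIT : 0;
    return 0;
}

static void fake_sleep_ms(void *ctx, unsigned int ms)
{
    struct fake_chip *chip = ctx;

    chip->slept_ms += ms; /* record the delay instead of actually sleeping */
}
```

A chip that becomes ready on the third read (the case Jan reported) succeeds after 20ms of waiting, while a chip that never becomes ready gives up after 20 tries, which the caller can then turn into a probe failure.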

serial: max310x: prevent infinite while() loop in port startup [+ + +]

Author: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Date:   Tue Jan 16 16:30:01 2024 -0500

    serial: max310x: prevent infinite while() loop in port startup
    
    commit b35f8dbbce818b02c730dc85133dc7754266e084 upstream.
    
    If there is a problem after resetting a port, the do/while() loop that
    checks the default value of the DIVLSB register may run forever and spam
    the I2C bus.
    
    Add a delay before each read of DIVLSB, and a maximum number of tries to
    prevent that situation from happening.
    
    Also fail probe if port reset is unsuccessful.
    
    Fixes: 10d8b34a4217 ("serial: max310x: Driver rework")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
    Link: https://lore.kernel.org/r/20240116213001.3691629-5-hugo@hugovil.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: max310x: set default value when reading clock ready bit [+ + +]

Author: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Date:   Tue Jan 16 16:29:58 2024 -0500

    serial: max310x: set default value when reading clock ready bit
    
    commit 0419373333c2f2024966d36261fd82a453281e80 upstream.
    
    If regmap_read() returns a non-zero value, the 'val' variable can be left
    uninitialized.
    
    Clear it before calling regmap_read() to make sure we properly detect
    the clock ready bit.
    
    Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
    Link: https://lore.kernel.org/r/20240116213001.3691629-2-hugo@hugovil.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: mxs-auart: fix tx [+ + +]

Author: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Date:   Thu Feb 1 11:55:57 2024 +0100

    serial: mxs-auart: fix tx
    
    commit 7be50f2e8f20fc2299069b28dea59a28e3abe20a upstream.
    
    Emil reports:
      After updating Linux on an i.MX28 board, serial communication over
      AUART broke. When I TX from the board and measure on the TX pin, it
      seems like the HW fifo is not emptied before the transmission is
      stopped.
    
    MXS performs weird things with stop_tx(). The driver makes it
    conditional on uart_tx_stopped().
    
    So the driver needs special handling. Pass the brand new UART_TX_NOSTOP
    to uart_port_tx_flags() and handle the stop on its own.
    
    Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
    Reported-by: Emil Kronborg <emil.kronborg@protonmail.com>
    Cc: stable <stable@kernel.org>
    Fixes: 2d141e683e9a ("tty: serial: use uart_port_tx() helper")
    Closes: https://lore.kernel.org/all/miwgbnvy3hjpnricubg76ytpn7xoceehwahupy25bubbduu23s@om2lptpa26xw/
    Tested-by: Stefan Wahren <wahrenst@gmx.net>
    Tested-by: Emil Kronborg <emil.kronborg@protonmail.com>
    Link: https://lore.kernel.org/r/20240201105557.28043-2-jirislaby@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: set correct id, uid and cruid for multiuser automounts [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Sun Feb 11 20:19:30 2024 -0300

    smb: client: set correct id, uid and cruid for multiuser automounts
    
    commit 4508ec17357094e2075f334948393ddedbb75157 upstream.
    
    When uid, gid and cruid are not specified, we need to dynamically
    set them in the filesystem context used for automounting; otherwise
    they'll end up reusing the values from the parent mount.
    
    Fixes: 9fd29a5bae6e ("cifs: use fs_context for automounts")
    Reported-by: Shane Nehring <snehring@iastate.edu>
    Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2259257
    Cc: stable@vger.kernel.org # 6.2+
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: Fix regression in writes when non-standard maximum write size negotiated

Author: Steve French <stfrench@microsoft.com>
Date:   Tue Feb 6 16:34:22 2024 -0600

    smb: Fix regression in writes when non-standard maximum write size negotiated
    
    commit 4860abb91f3d7fbaf8147d54782149bb1fc45892 upstream.
    
    The conversion to netfs in the 6.3 kernel caused a regression when the
    server sets the maximum write size to an unexpected value that is not a
    multiple of 4096 (and similarly when the user overrides the maximum
    write size with the "wsize" mount parameter but sets it to a value that
    is not a multiple of 4096).  When the negotiated write size is not a
    multiple of 4096, the netfs code can skip the end of the final
    page when doing large sequential writes, causing data corruption.
    
    This section of code is being rewritten/removed due to a large
    netfs change, but until that point (i.e. from the 6.3 kernel until now)
    we cannot support non-standard maximum write sizes.
    
    Add a warning if a user specifies a wsize on mount that is not
    a multiple of 4096 (and round down), also add a change where we
    round down the maximum write size if the server negotiates a value
    that is not a multiple of 4096 (we also have to check to make sure that
    we do not round it down to zero).
    
    Reported-by: "R. Diez" <rdiez-2006@rd10.de>
    Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
    Suggested-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
    Acked-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
    Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
    Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
    Cc: stable@vger.kernel.org # v6.3+
    Cc: David Howells <dhowells@redhat.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
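
    A minimal C sketch of the rounding rule described above: round a
    write size down to a multiple of 4096 and warn when it changes. The
    function name is hypothetical. Falling back to 4096 when rounding
    would produce zero is an assumption; the commit only says the value
    must not be rounded down to zero.

```c
#include <stdio.h>

#define WSIZE_ALIGN 4096u

/* Hypothetical helper: round wsize down to a multiple of 4096,
 * never returning zero. */
static unsigned int sanitize_wsize(unsigned int wsize)
{
	unsigned int rounded = wsize - (wsize % WSIZE_ALIGN);

	if (rounded == 0)
		rounded = WSIZE_ALIGN; /* assumption: never go below one page */
	if (rounded != wsize)
		fprintf(stderr, "wsize %u is not a multiple of %u, using %u\n",
			wsize, WSIZE_ALIGN, rounded);
	return rounded;
}
```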

spi: imx: fix the burst length at DMA mode and CPU mode

Author: Carlos Song <carlos.song@nxp.com>
Date:   Sun Feb 4 17:19:12 2024 +0800

    spi: imx: fix the burst length at DMA mode and CPU mode
    
    [ Upstream commit c712c05e46c8ce550842951e9e2606e24dbf0475 ]
    
    For DMA mode, the bus width of the DMA is equal to the size of a data
    word, so the burst length should be configured as bits per word.
    
    For CPU mode, the SPI transfer length is in bytes, so calculate the
    total number of words from the transfer length and bits per word; the
    burst length should then be configured as the total number of data bits.
    
    Signed-off-by: Carlos Song <carlos.song@nxp.com>
    Reviewed-by: Clark Wang <xiaoning.wang@nxp.com>
    Fixes: e9b220aeacf1 ("spi: spi-imx: correctly configure burst length when using dma")
    Fixes: 5f66db08cbd3 ("spi: imx: Take in account bits per word instead of assuming 8-bits")
    Link: https://lore.kernel.org/r/20240204091912.36488-1-carlos.song@nxp.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
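
    A C sketch of the two burst-length calculations described above. The
    bytes-per-word mapping (up to 8 bits in 1 byte, up to 16 in 2,
    otherwise 4) is an assumption about how words are stored in memory.
    The register field encoding and any hardware limit are deliberately
    left out. All function names are hypothetical.

```c
/* Assumed in-memory size of one SPI word. */
static unsigned int bytes_per_word(unsigned int bits_per_word)
{
	if (bits_per_word <= 8)
		return 1;
	if (bits_per_word <= 16)
		return 2;
	return 4;
}

/* DMA mode: the DMA bus width equals one data word, so burst per word. */
static unsigned int burst_bits_dma(unsigned int bits_per_word)
{
	return bits_per_word;
}

/* CPU mode: len is in bytes, so convert to words, then burst over all
 * data bits of the transfer. */
static unsigned int burst_bits_cpu(unsigned int len_bytes,
				   unsigned int bits_per_word)
{
	unsigned int words = len_bytes / bytes_per_word(bits_per_word);

	return words * bits_per_word;
}
```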

spi: ppc4xx: Drop write-only variable

Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Date:   Sat Feb 10 17:40:08 2024 +0100

    spi: ppc4xx: Drop write-only variable
    
    [ Upstream commit b3aa619a8b4706f35cb62f780c14e68796b37f3f ]
    
    Since commit 24778be20f87 ("spi: convert drivers to use
    bits_per_word_mask") the bits_per_word variable is only written to. The
    check that was there before isn't needed any more as the spi core
    ensures that only 8 bit transfers are used, so the variable can go away
    together with all assignments to it.
    
    Fixes: 24778be20f87 ("spi: convert drivers to use bits_per_word_mask")
    Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
    Link: https://lore.kernel.org/r/20240210164006.208149-8-u.kleine-koenig@pengutronix.de
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

staging: iio: ad5933: fix type mismatch regression

Author: David Schiller <david.schiller@jku.at>
Date:   Mon Jan 22 14:49:17 2024 +0100

    staging: iio: ad5933: fix type mismatch regression
    
    commit 6db053cd949fcd6254cea9f2cd5d39f7bd64379c upstream.
    
    Commit 4c3577db3e4f ("Staging: iio: impedance-analyzer: Fix sparse
    warning") fixed a compiler warning, but introduced a bug that resulted
    in one of the two 16 bit IIO channels always being zero (when both are
    enabled).
    
    This is because int is 32 bits wide on most architectures, and since
    'val' is passed as a void pointer to 'iio_push_to_buffers()', on a
    little-endian machine its two most significant bytes end up in the
    buffer slot for the second channel.
    
    Fix by defining 'val' as u16. Tested working on ARM64.
    
    Fixes: 4c3577db3e4f ("Staging: iio: impedance-analyzer: Fix sparse warning")
    Signed-off-by: David Schiller <david.schiller@jku.at>
    Link: https://lore.kernel.org/r/20240122134916.2137957-1-david.schiller@jku.at
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
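
    A small C illustration of the type mismatch, using a hypothetical
    two-channel scan buffer in place of the driver's. Copying a 32-bit
    int writes four bytes and overwrites the second 16-bit slot. On
    little-endian, that slot receives the int's zero upper bytes. A u16
    copy touches only its own slot. The sketch assumes a 32-bit int.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(int) == 2 * sizeof(uint16_t), "sketch assumes 32-bit int");

/* Two 16-bit channels stored back to back (hypothetical layout). */
struct scan {
	uint16_t ch[2];
};

/* Buggy pattern: 4 bytes copied, so ch[1] is clobbered. */
static struct scan store_as_int(int val)
{
	struct scan s = { { 0xAAAA, 0xAAAA } };

	memcpy(&s.ch[0], &val, sizeof(val));
	return s;
}

/* Fixed pattern: a u16 value fills exactly one channel. */
static struct scan store_as_u16(uint16_t val)
{
	struct scan s = { { 0xAAAA, 0xAAAA } };

	memcpy(&s.ch[0], &val, sizeof(val));
	return s;
}
```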

thunderbolt: Fix setting the CNS bit in ROUTER_CS_5

Author: Mohammad Rahimi <rahimi.mhmmd@gmail.com>
Date:   Sat Jan 27 11:26:28 2024 +0800

    thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
    
    commit ec4d82f855ce332de26fe080892483de98cc1a19 upstream.
    
    Bit 23 of ROUTER_CS_5, CM TBT3 Not Supported (CNS), indicates
    whether a USB4 Connection Manager is TBT3-compatible and should be:
        0b for TBT3-Compatible
        1b for Not TBT3-Compatible
    
    Fixes: b04079837b20 ("thunderbolt: Add initial support for USB4")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mohammad Rahimi <rahimi.mhmmd@gmail.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
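
    A C sketch of the bit semantics stated above: clear bit 23 when the
    connection manager is TBT3-compatible, set it when it is not. The
    macro mirrors the register name, but the helper is hypothetical and
    not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROUTER_CS_5_CNS (UINT32_C(1) << 23) /* CM TBT3 Not Supported */

/* Hypothetical helper: CNS = 0 for TBT3-compatible, 1 otherwise. */
static uint32_t set_cns(uint32_t router_cs_5, bool tbt3_compatible)
{
	if (tbt3_compatible)
		router_cs_5 &= ~ROUTER_CS_5_CNS;
	else
		router_cs_5 |= ROUTER_CS_5_CNS;
	return router_cs_5;
}
```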

tls: extract context alloc/initialization out of tls_set_sw_offload

Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Mon Oct 9 22:50:46 2023 +0200

    tls: extract context alloc/initialization out of tls_set_sw_offload
    
    [ Upstream commit 615580cbc99af0da2d1c7226fab43a3d5003eb97 ]
    
    Simplify tls_set_sw_offload a bit.
    
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: aec7961916f3 ("tls: fix race between async notify and socket close")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tls: fix race between async notify and socket close

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Feb 6 17:18:19 2024 -0800

    tls: fix race between async notify and socket close
    
    [ Upstream commit aec7961916f3f9e88766e2688992da6980f11b8d ]
    
    The submitting thread (one which called recvmsg/sendmsg)
    may exit as soon as the async crypto handler calls complete()
    so any code past that point risks touching already freed data.
    
    Try to avoid the locking and extra flags altogether.
    Have the main thread hold an extra reference, this way
    we can depend solely on the atomic ref counter for
    synchronization.
    
    Don't futz with reiniting the completion, either, we are now
    tightly controlling when completion fires.
    
    Reported-by: valis <sec@valis.email>
    Fixes: 0cada33241d9 ("net/tls: fix race condition causing kernel panic")
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
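
    A userspace C sketch of the reference-counting scheme described
    above. It uses C11 atomics and a pthread-based stand-in named after
    the kernel's struct completion. All names are hypothetical.

    The submitter holds one reference of its own. Only the party that
    drops the count to zero fires the completion, and the submitter waits
    only if requests are still in flight. A real implementation that
    reuses the context would need to restore the submitter's reference
    after waiting; that step is omitted here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for a kernel completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct async_ctx {
	atomic_int pending; /* starts at 1: the submitter's own reference */
	struct completion done;
};

static void ctx_init(struct async_ctx *ctx)
{
	atomic_init(&ctx->pending, 1);
	pthread_mutex_init(&ctx->done.lock, NULL);
	pthread_cond_init(&ctx->done.cond, NULL);
	ctx->done.done = false;
}

/* Take a reference for each request before submitting it. */
static void request_get(struct async_ctx *ctx)
{
	atomic_fetch_add(&ctx->pending, 1);
}

/* Async handler: drop one reference; whoever reaches zero completes.
 * The handler must not touch ctx after complete(). */
static void request_done(struct async_ctx *ctx)
{
	if (atomic_fetch_sub(&ctx->pending, 1) == 1)
		complete(&ctx->done);
}

static void *request_worker(void *arg)
{
	request_done(arg);
	return NULL;
}

/* Submitter: drop its own reference and wait only if others remain. */
static void wait_all(struct async_ctx *ctx)
{
	if (atomic_fetch_sub(&ctx->pending, 1) != 1)
		wait_for_completion(&ctx->done);
}
```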

tls: fix race between tx work scheduling and socket close

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Feb 6 17:18:20 2024 -0800

    tls: fix race between tx work scheduling and socket close
    
    [ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
    
    As in the previous commit, the submitting thread (recvmsg/sendmsg)
    may exit as soon as the async crypto handler calls complete().
    Reorder scheduling the work before calling complete().
    This seems more logical in the first place, as it's
    the inverse order of what the submitting thread will do.
    
    Reported-by: valis <sec@valis.email>
    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tools/rtla: Exit with EXIT_SUCCESS when help is invoked

Author: John Kacur <jkacur@redhat.com>
Date:   Fri Feb 2 19:16:07 2024 -0500

    tools/rtla: Exit with EXIT_SUCCESS when help is invoked
    
    commit b5f319360371087d52070d8f3fc7789e80ce69a6 upstream.
    
    Fix rtla so that the following commands exit with 0 when help is invoked:
    
    rtla osnoise top -h
    rtla osnoise hist -h
    rtla timerlat top -h
    rtla timerlat hist -h
    
    Link: https://lore.kernel.org/linux-trace-devel/20240203001607.69703-1-jkacur@redhat.com
    
    Cc: stable@vger.kernel.org
    Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode")
    Signed-off-by: John Kacur <jkacur@redhat.com>
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
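
    A C sketch of the exit-status convention described above. A usage
    helper exits with EXIT_SUCCESS when help was explicitly requested (no
    error message) and with EXIT_FAILURE when usage is printed because of
    an error. Both functions and the usage text are hypothetical and are
    not rtla's actual code.

```c
#include <stdio.h>
#include <stdlib.h>

/* NULL error message means "help was requested". */
static int usage_exit_status(const char *err)
{
	return err ? EXIT_FAILURE : EXIT_SUCCESS;
}

/* Hypothetical usage() helper; the text printed is illustrative only. */
void usage(const char *err)
{
	fprintf(stderr, "  usage: rtla osnoise top [-h] ...\n");
	if (err)
		fprintf(stderr, "%s\n", err);
	exit(usage_exit_status(err));
}
```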

tools/rtla: Fix clang warning about mount_point var size

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Tue Feb 6 12:05:31 2024 +0100

    tools/rtla: Fix clang warning about mount_point var size
    
    commit 30369084ac6e27479a347899e74f523e6ca29b89 upstream.
    
    clang is reporting this warning:
    
    $ make HOSTCC=clang CC=clang LLVM_IAS=1
    [...]
    clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
            -fstack-protector-strong -fasynchronous-unwind-tables
            -fstack-clash-protection  -Wall -Werror=format-security
            -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
            $(pkg-config --cflags libtracefs)    -c -o src/utils.o src/utils.c
    
    src/utils.c:548:66: warning: 'fscanf' may overflow; destination buffer in argument 3 has size 1024, but the corresponding specifier may require size 1025 [-Wfortify-source]
      548 |         while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", mount_point, type) == 2) {
          |                                                                         ^
    
    Increase mount_point variable size to MAX_PATH+1 to avoid the overflow.
    
    Link: https://lkml.kernel.org/r/1b46712e93a2f4153909514a36016959dcc4021c.1707217097.git.bristot@kernel.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Bill Wendling <morbo@google.com>
    Cc: Justin Stitt <justinstitt@google.com>
    Cc: Donald Zickus <dzickus@redhat.com>
    Fixes: a957cbc02531 ("rtla: Add -C cgroup support")
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
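
    A C sketch of the sizing rule behind the fix. A "%1024s" conversion
    stores up to 1024 characters plus a terminating NUL, so the
    destination needs MAX_PATH + 1 bytes. The helper name is hypothetical,
    and it parses one /proc/mounts-style line with the same format string
    shown in the warning.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH 1024
#define STR_(x) #x
#define STR(x) STR_(x)

/* Hypothetical helper: returns 1 if the line describes a mount of type
 * fs, copying its mount point into mount_point. */
static int line_is_mount_of(const char *line, const char *fs,
			    char mount_point[MAX_PATH + 1])
{
	char type[100];

	/* "%1024s" may write 1025 bytes including the NUL. */
	if (sscanf(line, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d",
		   mount_point, type) != 2)
		return 0;
	return strcmp(type, fs) == 0;
}
```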

tools/rtla: Fix Makefile compiler options for clang

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Tue Feb 6 12:05:29 2024 +0100

    tools/rtla: Fix Makefile compiler options for clang
    
    commit bc4cbc9d260ba8358ca63662919f4bb223cb603b upstream.
    
    The following errors are showing up when compiling rtla with clang:
    
     $ make HOSTCC=clang CC=clang LLVM_IAS=1
     [...]
    
      clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
            -fexceptions -fstack-protector-strong
            -fasynchronous-unwind-tables -fstack-clash-protection  -Wall
            -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
            -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
            $(pkg-config --cflags libtracefs)    -c -o src/utils.o src/utils.c
    
      clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument]
      warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option]
      1 warning generated.
    
      clang -o rtla -ggdb  src/osnoise.o src/osnoise_hist.o src/osnoise_top.o
      src/rtla.o src/timerlat_aa.o src/timerlat.o src/timerlat_hist.o
      src/timerlat_top.o src/timerlat_u.o src/trace.o src/utils.o $(pkg-config --libs libtracefs)
    
      src/osnoise.o: file not recognized: file format not recognized
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      make: *** [Makefile:110: rtla] Error 1
    
    Solve these issues by:
      - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang
      - informing the linker about -flto=auto
    
    Link: https://lore.kernel.org/linux-trace-kernel/567ac1b94effc228ce9a0225b9df7232a9b35b55.1707217097.git.bristot@kernel.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Bill Wendling <morbo@google.com>
    Cc: Justin Stitt <justinstitt@google.com>
    Fixes: 1a7b22ab15eb ("tools/rtla: Build with EXTRA_{C,LD}FLAGS")
    Suggested-by: Donald Zickus <dzickus@redhat.com>
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tools/rtla: Fix uninitialized bucket/data->bucket_size warning

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Tue Feb 6 12:05:30 2024 +0100

    tools/rtla: Fix uninitialized bucket/data->bucket_size warning
    
    commit 64dc40f7523369912d7adb22c8cb655f71610505 upstream.
    
    When compiling rtla with clang, I am getting the following warnings:
    
    $ make HOSTCC=clang CC=clang LLVM_IAS=1
    
    [..]
    clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
            -fstack-protector-strong -fasynchronous-unwind-tables
            -fstack-clash-protection  -Wall -Werror=format-security
            -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
            $(pkg-config --cflags libtracefs)
            -c -o src/osnoise_hist.o src/osnoise_hist.c
    src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
      138 |         if (data->bucket_size)
          |             ^~~~~~~~~~~~~~~~~
    src/osnoise_hist.c:149:6: note: uninitialized use occurs here
      149 |         if (bucket < entries)
          |             ^~~~~~
    src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true
      138 |         if (data->bucket_size)
          |         ^~~~~~~~~~~~~~~~~~~~~~
      139 |                 bucket = duration / data->bucket_size;
    src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence this warning
      132 |         int bucket;
          |                   ^
          |                    = 0
    1 warning generated.
    
    [...]
    
    clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
            -fstack-protector-strong -fasynchronous-unwind-tables
            -fstack-clash-protection  -Wall -Werror=format-security
            -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
            $(pkg-config --cflags libtracefs)
            -c -o src/timerlat_hist.o src/timerlat_hist.c
    src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
      181 |         if (data->bucket_size)
          |             ^~~~~~~~~~~~~~~~~
    src/timerlat_hist.c:204:6: note: uninitialized use occurs here
      204 |         if (bucket < entries)
          |             ^~~~~~
    src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true
      181 |         if (data->bucket_size)
          |         ^~~~~~~~~~~~~~~~~~~~~~
      182 |                 bucket = latency / data->bucket_size;
    src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence this warning
      175 |         int bucket;
          |                   ^
          |                    = 0
    1 warning generated.
    
    This is a legit warning, but data->bucket_size is always > 0 (see
    timerlat_hist_parse_args()), so the if is not necessary.
    
    Remove the unneeded if (data->bucket_size) to avoid the warning.
    
    Link: https://lkml.kernel.org/r/6e1b1665cd99042ae705b3e0fc410858c4c42346.1707217097.git.bristot@kernel.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Bill Wendling <morbo@google.com>
    Cc: Justin Stitt <justinstitt@google.com>
    Cc: Donald Zickus <dzickus@redhat.com>
    Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode")
    Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
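
    A C sketch of the resulting bucket computation. Because argument
    parsing guarantees bucket_size > 0 (per the commit text), the
    division can be unconditional, so 'bucket' is always initialized.
    The helper is hypothetical. Mapping out-of-range values to index
    'entries' as an overflow slot is an assumption made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: bucket index for a sample. */
static int bucket_for(unsigned long long duration, int bucket_size, int entries)
{
	int bucket;

	assert(bucket_size > 0); /* guaranteed by argument parsing */
	bucket = duration / bucket_size;
	return bucket < entries ? bucket : entries; /* overflow slot */
}
```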

tools/rtla: Remove unused sched_getattr() function

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Tue Feb 6 12:05:32 2024 +0100

    tools/rtla: Remove unused sched_getattr() function
    
    commit 084ce16df0f060efd371092a09a7ae74a536dc11 upstream.
    
    Clang is reporting:
    
    $ make HOSTCC=clang CC=clang LLVM_IAS=1
    [...]
    clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs)    -c -o src/utils.o src/utils.c
    src/utils.c:241:19: warning: unused function 'sched_getattr' [-Wunused-function]
      241 | static inline int sched_getattr(pid_t pid, struct sched_attr *attr,
          |                   ^~~~~~~~~~~~~
    1 warning generated.
    
    Which is correct, so remove the unused function.
    
    Link: https://lkml.kernel.org/r/eaed7ba122c4ae88ce71277c824ef41cbf789385.1707217097.git.bristot@kernel.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Bill Wendling <morbo@google.com>
    Cc: Justin Stitt <justinstitt@google.com>
    Cc: Donald Zickus <dzickus@redhat.com>
    Fixes: b1696371d865 ("rtla: Helper functions for rtla")
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tools/rtla: Replace setting prio with nice for SCHED_OTHER

Author: limingming3 <limingming890315@gmail.com>
Date:   Wed Feb 7 14:51:42 2024 +0800

    tools/rtla: Replace setting prio with nice for SCHED_OTHER
    
    commit 14f08c976ffe0d2117c6199c32663df1cbc45c65 upstream.
    
    Since sched_priority for SCHED_OTHER is always 0, it makes no
    sense to set it. Setting nice for SCHED_OTHER is more meaningful.
    
    Link: https://lkml.kernel.org/r/20240207065142.1753909-1-limingming3@lixiang.com
    
    Cc: stable@vger.kernel.org
    Fixes: b1696371d865 ("rtla: Helper functions for rtla")
    Signed-off-by: limingming3 <limingming3@lixiang.com>
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
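
    A C sketch of the policy-dependent field choice described above. The
    struct is a hypothetical local copy of a few fields of the kernel's
    struct sched_attr, not the real definition. For SCHED_OTHER the
    user's value becomes a nice level; for real-time policies it stays a
    priority.

```c
#include <sched.h>
#include <stdint.h>

/* Hypothetical subset of struct sched_attr, for illustration only. */
struct sched_attr_sketch {
	uint32_t sched_policy;
	int32_t sched_nice;
	uint32_t sched_priority;
};

static struct sched_attr_sketch make_attr(int policy, int prio)
{
	struct sched_attr_sketch attr = { .sched_policy = policy };

	if (policy == SCHED_OTHER)
		attr.sched_nice = prio;     /* priority must stay 0 */
	else
		attr.sched_priority = prio; /* e.g. SCHED_FIFO, SCHED_RR */
	return attr;
}
```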

tools/rv: Fix curr_reactor uninitialized variable

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Tue Feb 6 12:05:34 2024 +0100

    tools/rv: Fix curr_reactor uninitialized variable
    
    commit 61ec586bc0815959d3314cf7ce242529c977b357 upstream.
    
    clang is reporting:
    
    $ make HOSTCC=clang CC=clang LLVM_IAS=1
    
    clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
            -fstack-protector-strong -fasynchronous-unwind-tables
            -fstack-clash-protection  -Wall -Werror=format-security
            -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
            $(pkg-config --cflags libtracefs)  -I include
            -c -o src/in_kernel.o src/in_kernel.c
    [...]
    
    src/in_kernel.c:227:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
      227 |         if (!end)
          |             ^~~~
    src/in_kernel.c:242:9: note: uninitialized use occurs here
      242 |         return curr_reactor;
          |                ^~~~~~~~~~~~
    src/in_kernel.c:227:2: note: remove the 'if' if its condition is always false
      227 |         if (!end)
          |         ^~~~~~~~~
      228 |                 goto out_free;
          |                 ~~~~~~~~~~~~~
    src/in_kernel.c:221:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
      221 |         if (!start)
          |             ^~~~~~
    src/in_kernel.c:242:9: note: uninitialized use occurs here
      242 |         return curr_reactor;
          |                ^~~~~~~~~~~~
    src/in_kernel.c:221:2: note: remove the 'if' if its condition is always false
      221 |         if (!start)
          |         ^~~~~~~~~~~
      222 |                 goto out_free;
          |                 ~~~~~~~~~~~~~
    src/in_kernel.c:215:20: note: initialize the variable 'curr_reactor' to silence this warning
      215 |         char *curr_reactor;
          |                           ^
          |                            = NULL
    2 warnings generated.
    
    Which is correct. Setting curr_reactor to NULL avoids the problem.
    
    Link: https://lkml.kernel.org/r/3a35551149e5ee0cb0950035afcb8082c3b5d05b.1707217097.git.bristot@kernel.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Bill Wendling <morbo@google.com>
    Cc: Justin Stitt <justinstitt@google.com>
    Cc: Donald Zickus <dzickus@redhat.com>
    Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface")
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
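
    A C sketch of the pattern the fix relies on: initialize the result to
    NULL so that every early "goto" path returns a defined value. The
    helper is hypothetical. The input format, a list like
    "nop [printk] panic" with the current entry in brackets, is an
    assumption made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd copy of the bracketed entry, or NULL. */
static char *get_current_reactor(const char *list)
{
	char *curr_reactor = NULL; /* the fix: defined on every path */
	const char *start, *end;
	size_t len;

	start = strchr(list, '[');
	if (!start)
		goto out;

	end = strchr(start, ']');
	if (!end)
		goto out;

	len = (size_t)(end - start - 1);
	curr_reactor = malloc(len + 1);
	if (!curr_reactor)
		goto out;
	memcpy(curr_reactor, start + 1, len);
	curr_reactor[len] = '\0';
out:
	return curr_reactor;
}
```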

tools/rv: Fix Makefile compiler options for clang

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Tue Feb 6 12:05:33 2024 +0100

    tools/rv: Fix Makefile compiler options for clang
    
    commit f9b2c87105c989a7b259c6da87673ada96dce2f8 upstream.
    
    The following errors are showing up when compiling rv with clang:
    
     $ make HOSTCC=clang CC=clang LLVM_IAS=1
     [...]
      clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
      -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables
      -fstack-clash-protection  -Wall -Werror=format-security
      -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
      -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs)
      -I include   -c -o src/utils.o src/utils.c
      clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument]
      warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option]
      1 warning generated.
    
      clang -o rv -ggdb  src/in_kernel.o src/rv.o src/trace.o src/utils.o $(pkg-config --libs libtracefs)
      src/in_kernel.o: file not recognized: file format not recognized
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      make: *** [Makefile:110: rv] Error 1
    
    Solve these issues by:
      - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang
      - informing the linker about -flto=auto
    
    Link: https://lkml.kernel.org/r/ed94a8ddc2ca8c8ef663cfb7ae9dd196c4a66b33.1707217097.git.bristot@kernel.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Bill Wendling <morbo@google.com>
    Cc: Justin Stitt <justinstitt@google.com>
    Fixes: 4bc4b131d44c ("rv: Add rv tool")
    Suggested-by: Donald Zickus <dzickus@redhat.com>
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracefs/eventfs: Modify mismatched function name

Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Date:   Tue Feb 6 07:09:16 2024 -0500

    tracefs/eventfs: Modify mismatched function name
    
    commit 64bf2f685c795e75dd855761c75a193ee5998731 upstream.
    
    No functional modification involved.
    
    fs/tracefs/event_inode.c:864: warning: expecting prototype for eventfs_remove(). Prototype was for eventfs_remove_dir() instead.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20231019031353.73846-1-jiapeng.chong@linux.alibaba.com
    
    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6939
    Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracefs/eventfs: Use root and instance inodes as default ownership

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:42 2024 -0500

    tracefs/eventfs: Use root and instance inodes as default ownership
    
    commit 8186fff7ab649085e2c60d032d9a20a85af1d87c upstream.
    
    Instead of walking the dentries on mount/remount to update the gid values of
    all the dentries if a gid option is specified on mount, just update the root
    inode. Add .getattr, .setattr, and .permission on the tracefs inode
    operations to update the permissions of the files and directories.
    
    For all files and directories in the top level instance:
    
     /sys/kernel/tracing/*
    
    the root inode, i.e. the inode that represents /sys/kernel/tracing (or
    wherever tracefs is mounted), provides the default permissions.
    
    When an instance is created:
    
     mkdir /sys/kernel/tracing/instances/foo
    
    the directory "foo" and all files and directories underneath it will use
    the ownership that foo had when it was created as their default. A
    remount of tracefs will not affect it.
    
    If a user modifies the permissions of any file or directory in tracefs,
    that file or directory will no longer be affected by an ownership change
    on remount.
    
    The events directory, if it is in the top level instance, will use the
    tracefs root inode as the default ownership for itself and all the files and
    directories below it.
    
    For the events directory in an instance ("foo"), it will keep the ownership
    of what it was when it was created, and that will be used as the default
    ownership for the files and directories beneath it.
    
    Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wjVdGkjDXBbvLn2wbZnqP4UsH46E3gqJ9m7UG6DpX2+WA@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240103215016.1e0c9811@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
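
    A hypothetical C model of the ownership rules described above, not
    the tracefs implementation. An entry whose ownership the user changed
    keeps it. An entry inside an instance uses the ownership the instance
    had at creation. Everything else follows the root inode, so it tracks
    remounts. The real code handles uid and gid independently; this
    sketch treats them as a pair for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

struct owner {
	uid_t uid;
	gid_t gid;
};

struct entry {
	bool user_set;            /* ownership changed by the user */
	struct owner own;         /* valid when user_set */
	const struct owner *inst; /* instance ownership at mkdir time, or NULL */
};

static struct owner effective_owner(const struct entry *e,
				    const struct owner *root)
{
	if (e->user_set)
		return e->own;   /* no longer follows remounts */
	if (e->inst)
		return *e->inst; /* instance: fixed at creation time */
	return *root;            /* top level: follows the root inode */
}
```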

tracefs: Avoid using the ei->dentry pointer unnecessarily

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:54 2024 -0500

    tracefs: Avoid using the ei->dentry pointer unnecessarily
    
    commit 99c001cb617df409dac275a059d6c3f187a2da7a upstream.
    
    The eventfs_find_events() code tries to walk up the tree to find the
    event directory that a dentry belongs to, in order to then find the
    eventfs inode that is associated with that event directory.
    
    However, it uses an odd combination of walking the dentry parent,
    looking up the eventfs inode associated with that, and then looking up
    the dentry from there.  Repeat.
    
    But the code shouldn't have back-pointers to dentries in the first
    place, and it should just walk the dentry parenthood chain directly.
    
    Similarly, 'set_top_events_ownership()' looks up the dentry from the
    eventfs inode, but the only reason it wants a dentry is to look up the
    superblock in order to look up the root dentry.
    
    But it already has the real filesystem inode, which has that same
    superblock pointer.  So just pass in the superblock pointer using the
    information that's already there, instead of looking up extraneous data
    that is irrelevant.
    
    Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.638645365@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
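
    A hypothetical C sketch of "just walk the dentry parenthood chain
    directly". A simplified node stands in for a dentry, and the root
    points to itself as a dentry's d_parent does. Locking and RCU, which
    the kernel needs, are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry. */
struct node {
	struct node *parent; /* root points to itself */
	bool is_events_dir;
};

/* Walk up the parent chain until an events directory is found. */
static struct node *find_events_dir(struct node *n)
{
	for (;;) {
		if (n->is_events_dir)
			return n;
		if (n->parent == n) /* reached the root */
			return NULL;
		n = n->parent;
	}
}
```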

tracefs: Check for dentry->d_inode exists in set_gid()

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:38 2024 -0500

    tracefs: Check for dentry->d_inode exists in set_gid()
    
    commit ad579864637af46447208254719943179b69d41a upstream.
    
    If getdents() is called on the tracefs directory but does not read all
    the files, it can leave a "cursor" dentry in the d_subdirs list of the
    tracefs dentry. This cursor dentry has no d_inode. Before referencing a
    tracefs_inode from a dentry, its d_inode must first be checked for NULL.
    If it is NULL, the dentry is not a tracefs_inode and can be ignored.
    
    The following caused a crash:
    
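     #define _GNU_SOURCE
     #include <fcntl.h>
     #include <sys/mount.h>
     #include <sys/stat.h>
     #include <sys/syscall.h>
     #include <unistd.h>

     #define getdents64(fd, dirp, count) syscall(SYS_getdents64, fd, dirp, count)
     #define BUF_SIZE 256
     #define TDIR "/tmp/file0"
    
     int main(void)
     {
            char buf[BUF_SIZE];
            int fd;
            int n;
            int ret;
    
            mkdir(TDIR, 0777);
            mount(NULL, TDIR, "tracefs", 0, NULL);
            fd = openat(AT_FDCWD, TDIR, O_RDONLY);
            /* A buffer this small leaves a cursor dentry behind. */
            n = getdents64(fd, buf, BUF_SIZE);
            ret = mount(NULL, TDIR, NULL, MS_NOSUID|MS_REMOUNT|MS_RELATIME|MS_LAZYTIME,
                        "gid=1000");
            (void)n;
            (void)ret;
            return 0;
     }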
    
    That's because a BUF_SIZE of 256 was not big enough to read all the
    dentries of the tracefs file system, so it left a "cursor" dentry in the
    subdirs of the tracefs root inode. Then remounting with "gid=1000"
    caused an iteration over all dentries, which hit:
    
            ti = get_tracefs(dentry->d_inode);
            if (ti && (ti->flags & TRACEFS_EVENT_INODE))
                    eventfs_update_gid(dentry, gid);
    
    This crashed because the cursor dentry had a NULL d_inode, which was
    then dereferenced.
    
    In the subdir loop of the dentry lookup of set_gid(), if a child has a
    NULL d_inode, simply skip it.
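    The fix amounts to a NULL check inside the child loop. The userspace
    sketch below models that loop with hypothetical, heavily simplified
    stand-ins (fake_inode, fake_dentry, update_children_gid()), not the real
    tracefs types; a child whose d_inode is NULL plays the role of the
    getdents() cursor dentry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's dentry/inode types. */
struct fake_inode {
	unsigned int gid;
	int is_event_inode;		/* models the TRACEFS_EVENT_INODE flag */
};

struct fake_dentry {
	struct fake_inode *d_inode;	/* NULL for a "cursor" dentry */
	struct fake_dentry *next;	/* next sibling in the parent's list */
};

/*
 * Walk the children of a directory and update the gid of event inodes.
 * Returns the number of inodes that were updated.
 */
static int update_children_gid(struct fake_dentry *first_child, unsigned int gid)
{
	int updated = 0;

	for (struct fake_dentry *d = first_child; d; d = d->next) {
		/* A cursor dentry has no inode: skip it instead of dereferencing. */
		if (!d->d_inode)
			continue;
		if (d->d_inode->is_event_inode) {
			d->d_inode->gid = gid;
			updated++;
		}
	}
	return updated;
}
```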
    
    Link: https://lore.kernel.org/all/20240102135637.3a21fb10@gandalf.local.home/
    Link: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da244d@gandalf.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership")
    Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracefs: dentry lookup crapectomy [+ + +]

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:55 2024 -0500

    tracefs: dentry lookup crapectomy
    
    commit 49304c2b93e4f7468b51ef717cbe637981397115 upstream.
    
    The dentry lookup for eventfs files was very broken, and had lots of
    signs of the old situation where the filesystem names were all created
    statically in the dentry tree, rather than being looked up dynamically
    based on the eventfs data structures.
    
    You could see it in the naming - how it claimed to "create" dentries
    rather than just look up the dentries that were given to it.
    
    You could see it in various nonsensical and very incorrect operations,
    like using "simple_lookup()" on the dentries that were passed in, which
    only results in those dentries becoming negative dentries.  Which meant
    that any other lookup would possibly return ENOENT if it saw that
    negative dentry before the data was then later filled in.
    
    You could see it in the immense amount of nonsensical code that didn't
    actually just do lookups.
    
    Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
    Link: https://lore.kernel.org/linux-trace-kernel/20240131233227.73db55e1@gandalf.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracefs: remove stale update_gid code [+ + +]

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 6 07:09:51 2024 -0500

    tracefs: remove stale update_gid code
    
    commit 29142dc92c37d3259a33aef15b03e6ee25b0d188 upstream.
    
    The 'eventfs_update_gid()' function is no longer called, so remove it
    (and the helper function it uses).
    
    Link: https://lore.kernel.org/all/CAHk-=wj+DsZZ=2iTUkJ-Nojs9fjYMvPs1NuoM3yK7aTDtJfPYQ@mail.gmail.com/
    
    Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracefs: Zero out the tracefs_inode when allocating it [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Feb 6 07:09:52 2024 -0500

    tracefs: Zero out the tracefs_inode when allocating it
    
    commit d81786f53aec14fd4d56263145a0635afbc64617 upstream.
    
    eventfs uses the tracefs_inode and assumes that it's already initialized
    to zero. That is, it doesn't set fields to zero (like ti->private) after
    getting its tracefs_inode. This causes bugs due to stale values.
    
    Just initialize the entire structure to zero on allocation so there
    aren't any more surprises.
    
    This is a partial fix for accesses to ti->private. The assignment still
    needs to be made before the dentry is instantiated.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.315825944@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Cc: Ajay Kaher <ajay.kaher@broadcom.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing/probes: Fix to search structure fields correctly [+ + +]

Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date:   Sat Feb 17 21:25:42 2024 +0900

    tracing/probes: Fix to search structure fields correctly
    
    commit 9704669c386f9bbfef2e002e7e690c56b7dcf5de upstream.
    
    Fix searching for a field in a structure that contains an anonymous
    union. Since the reference `type` pointer was updated in the loop, the
    search loop aborted as soon as it hit an anonymous union, so it could not
    find any field after the anonymous union. This fix avoids updating the
    cursor `type` pointer in the loop.
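    A hypothetical userspace sketch of the principle: a member search that
    descends into anonymous unions through a separate recursive call, so
    the loop's cursor type is never overwritten and members after the union
    are still visited. The member/type structures are simplified stand-ins,
    not the kernel's BTF types, and recursion is an illustrative choice, not
    necessarily how the upstream code is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct type;

/* Hypothetical, simplified model of a struct/union member. */
struct member {
	const char *name;		/* NULL for an anonymous union/struct */
	const struct type *type;	/* nested type for anonymous members */
};

struct type {
	const struct member *members;
	size_t nr_members;
};

/*
 * Search `t` for a member called `name`, descending into anonymous
 * unions/structs. `t` itself is never reassigned inside the loop.
 */
static const struct member *find_member(const struct type *t, const char *name)
{
	for (size_t i = 0; i < t->nr_members; i++) {
		const struct member *m = &t->members[i];

		if (m->name == NULL) {
			/* Look inside the anonymous member without touching `t`. */
			const struct member *found = find_member(m->type, name);
			if (found)
				return found;
			continue;	/* keep scanning members after the union */
		}
		if (strcmp(m->name, name) == 0)
			return m;
	}
	return NULL;
}
```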
    
    Link: https://lore.kernel.org/all/170791694361.389532.10047514554799419688.stgit@devnote2/
    
    Fixes: 302db0f5b3d8 ("tracing/probes: Add a function to search a member of a struct/union")
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing/probes: Fix to set arg size and fmt after setting type from BTF [+ + +]

Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date:   Wed Jan 24 00:03:02 2024 +0900

    tracing/probes: Fix to set arg size and fmt after setting type from BTF
    
    commit 9a571c1e275cedacd48c66a6bddd0c23f1dffdbf upstream.
    
    Since the BTF type setting updates probe_arg::type, the type size
    calculation and setting print-fmt should be done after that.
    Without this fix, the argument size and print-fmt can be wrong.
    
    Link: https://lore.kernel.org/all/170602218196.215583.6417859469540955777.stgit@devnote2/
    
    Fixes: b576e09701c7 ("tracing/probes: Support function parameters if BTF is available")
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing/probes: Fix to show a parse error for bad type for $comm [+ + +]

Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date:   Wed Jan 24 00:02:34 2024 +0900

    tracing/probes: Fix to show a parse error for bad type for $comm
    
    commit 8c427cc2fa73684ea140999e121b7b6c1c717632 upstream.
    
    Fix to show a parse error for a bad (non-string) type for $comm/$COMM
    and immediate-string. With this fix, the error_log file shows an
    appropriate error message, as below.
    
     /sys/kernel/tracing # echo 'p vfs_read $comm:u32' >> kprobe_events
    sh: write error: Invalid argument
     /sys/kernel/tracing # echo 'p vfs_read \"hoge":u32' >> kprobe_events
    sh: write error: Invalid argument
     /sys/kernel/tracing # cat error_log
    
    [   30.144183] trace_kprobe: error: $comm and immediate-string only accepts string type
      Command: p vfs_read $comm:u32
                                ^
    [   62.618500] trace_kprobe: error: $comm and immediate-string only accepts string type
      Command: p vfs_read \"hoge":u32
                                  ^
    Link: https://lore.kernel.org/all/170602215411.215583.2238016352271091852.stgit@devnote2/
    
    Fixes: 3dd1f7f24f8c ("tracing: probeevent: Fix to make the type of $comm string")
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing/synthetic: Fix trace_string() return value [+ + +]

Author: Thorsten Blum <thorsten.blum@toblux.com>
Date:   Wed Feb 14 23:05:56 2024 +0100

    tracing/synthetic: Fix trace_string() return value
    
    commit 9b6326354cf9a41521b79287da3bfab022ae0b6d upstream.
    
    Fix trace_string() by assigning the string length to the return variable
    which got lost in commit ddeea494a16f ("tracing/synthetic: Use union
    instead of casts") and caused trace_string() to always return 0.
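    The bug class here is a helper that computes a length but never stores
    it in the variable it returns. A hypothetical userspace sketch
    (copy_event_string() is not the kernel's trace_string()) of a string
    copy helper that does assign and return the copied length:

```c
#include <assert.h>
#include <string.h>

/*
 * Copy `src` into `dst` (truncating if needed) and return the number of
 * characters copied, excluding the terminating NUL. Dropping the
 * `len = ...` assignment below would make this always return 0, which is
 * the kind of bug fixed above.
 */
static size_t copy_event_string(char *dst, size_t dst_size, const char *src)
{
	size_t len = 0;

	if (dst_size == 0)
		return 0;

	len = strlen(src);
	if (len >= dst_size)
		len = dst_size - 1;	/* leave room for the NUL */

	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```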
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240214220555.711598-1-thorsten.blum@toblux.com
    
    Cc: stable@vger.kernel.org
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Fixes: ddeea494a16f ("tracing/synthetic: Use union instead of casts")
    Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing/timerlat: Move hrtimer_init to timerlat_fd open() [+ + +]

Author: Daniel Bristot de Oliveira <bristot@kernel.org>
Date:   Thu Feb 1 16:13:39 2024 +0100

    tracing/timerlat: Move hrtimer_init to timerlat_fd open()
    
    commit 1389358bb008e7625942846e9f03554319b7fecc upstream.
    
    Currently, the timerlat's hrtimer is initialized at the first read of
    timerlat_fd and destroyed at close(). This works, but it causes an error
    if the user program opens and closes the file without reading it.
    
    Here's an example:
    
     # echo NO_OSNOISE_WORKLOAD > /sys/kernel/debug/tracing/osnoise/options
     # echo timerlat > /sys/kernel/debug/tracing/current_tracer
    
     # cat <<EOF > ./timerlat_load.py
     #!/usr/bin/env python3
    
     timerlat_fd = open("/sys/kernel/tracing/osnoise/per_cpu/cpu0/timerlat_fd", 'r')
     timerlat_fd.close();
     EOF
    
     # taskset -c 0 ./timerlat_load.py
    <BOOM>
    
     BUG: kernel NULL pointer dereference, address: 0000000000000010
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0000) - not-present page
     PGD 0 P4D 0
     Oops: 0000 [#1] PREEMPT SMP NOPTI
     CPU: 1 PID: 2673 Comm: python3 Not tainted 6.6.13-200.fc39.x86_64 #1
     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
     RIP: 0010:hrtimer_active+0xd/0x50
     Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 57 30 <8b> 42 10 a8 01 74 09 f3 90 8b 42 10 a8 01 75 f7 80 7f 38 00 75 1d
     RSP: 0018:ffffb031009b7e10 EFLAGS: 00010286
     RAX: 000000000002db00 RBX: ffff9118f786db08 RCX: 0000000000000000
     RDX: 0000000000000000 RSI: ffff9117a0e64400 RDI: ffff9118f786db08
     RBP: ffff9118f786db80 R08: ffff9117a0ddd420 R09: ffff9117804d4f70
     R10: 0000000000000000 R11: 0000000000000000 R12: ffff9118f786db08
     R13: ffff91178fdd5e20 R14: ffff9117840978c0 R15: 0000000000000000
     FS:  00007f2ffbab1740(0000) GS:ffff9118f7840000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 0000000000000010 CR3: 00000001b402e000 CR4: 0000000000750ee0
     PKRU: 55555554
     Call Trace:
      <TASK>
      ? __die+0x23/0x70
      ? page_fault_oops+0x171/0x4e0
      ? srso_alias_return_thunk+0x5/0x7f
      ? avc_has_extended_perms+0x237/0x520
      ? exc_page_fault+0x7f/0x180
      ? asm_exc_page_fault+0x26/0x30
      ? hrtimer_active+0xd/0x50
      hrtimer_cancel+0x15/0x40
      timerlat_fd_release+0x48/0xe0
      __fput+0xf5/0x290
      __x64_sys_close+0x3d/0x80
      do_syscall_64+0x60/0x90
      ? srso_alias_return_thunk+0x5/0x7f
      ? __x64_sys_ioctl+0x72/0xd0
      ? srso_alias_return_thunk+0x5/0x7f
      ? syscall_exit_to_user_mode+0x2b/0x40
      ? srso_alias_return_thunk+0x5/0x7f
      ? do_syscall_64+0x6c/0x90
      ? srso_alias_return_thunk+0x5/0x7f
      ? exit_to_user_mode_prepare+0x142/0x1f0
      ? srso_alias_return_thunk+0x5/0x7f
      ? syscall_exit_to_user_mode+0x2b/0x40
      ? srso_alias_return_thunk+0x5/0x7f
      ? do_syscall_64+0x6c/0x90
      entry_SYSCALL_64_after_hwframe+0x6e/0xd8
     RIP: 0033:0x7f2ffb321594
     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 cd 0d 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d
     RSP: 002b:00007ffe8d8eef18 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
     RAX: ffffffffffffffda RBX: 00007f2ffba4e668 RCX: 00007f2ffb321594
     RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
     RBP: 00007ffe8d8eef40 R08: 0000000000000000 R09: 0000000000000000
     R10: 55c926e3167eae79 R11: 0000000000000202 R12: 0000000000000003
     R13: 00007ffe8d8ef030 R14: 0000000000000000 R15: 00007f2ffba4e668
      </TASK>
     CR2: 0000000000000010
     ---[ end trace 0000000000000000 ]---
    
    Move hrtimer_init to timerlat_fd open() to avoid this problem.
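    The lifecycle problem can be modeled in userspace: if initialization
    happens only on the first read, a release that follows an open with no
    read operates on an uninitialized object. A minimal sketch with
    hypothetical fake_timer/state_* names (not the kernel's hrtimer API),
    where initialization has moved into open:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an hrtimer: only tracks its state. */
struct fake_timer {
	bool initialized;
	bool active;
};

struct fake_file_state {
	struct fake_timer timer;
};

/* open(): initialize the timer up front. */
static void state_open(struct fake_file_state *s)
{
	s->timer.initialized = true;
	s->timer.active = false;
}

/* read(): arms the timer; it must already be initialized. */
static void state_read(struct fake_file_state *s)
{
	assert(s->timer.initialized);
	s->timer.active = true;
}

/* release(): cancelling is only valid on an initialized timer. */
static int state_release(struct fake_file_state *s)
{
	if (!s->timer.initialized)
		return -1;	/* corresponds to the pre-fix crash path */
	s->timer.active = false;
	return 0;
}
```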
    
    Link: https://lore.kernel.org/linux-trace-kernel/7324dd3fc0035658c99b825204a66049389c56e3.1706798888.git.bristot@kernel.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: stable@vger.kernel.org
    Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing/trigger: Fix to return error if failed to alloc snapshot [+ + +]

Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date:   Fri Jan 26 09:42:58 2024 +0900

    tracing/trigger: Fix to return error if failed to alloc snapshot
    
    commit 0958b33ef5a04ed91f61cef4760ac412080c4e08 upstream.
    
    Fix register_snapshot_trigger() to return an error code, instead of 0
    (success), if it fails to allocate a snapshot. Otherwise, it registers
    the snapshot trigger without reporting an error.
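    The fix follows the usual kernel convention of passing a negative errno
    back to the caller instead of falling through to a success return. A
    hypothetical sketch; alloc_snapshot() and register_trigger() are
    illustrative stand-ins, not the kernel functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical allocator that can be told to fail. */
static int alloc_snapshot(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/*
 * Propagate the allocation error so the trigger is not registered when
 * the snapshot buffer is missing.
 */
static int register_trigger(bool fail_alloc, int *registered)
{
	int ret = alloc_snapshot(fail_alloc);

	if (ret < 0)
		return ret;

	*registered = 1;
	return 0;
}
```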
    
    Link: https://lore.kernel.org/linux-trace-kernel/170622977792.270660.2789298642759362200.stgit@devnote2
    
    Fixes: 0bbe7f719985 ("tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation")
    Cc: stable@vger.kernel.org
    Cc: Vincent Donnefort <vdonnefort@google.com>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Fix a NULL vs IS_ERR() bug in event_subsystem_dir() [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Fri Oct 20 16:52:45 2023 +0300

    tracing: Fix a NULL vs IS_ERR() bug in event_subsystem_dir()
    
    commit 5264a2f4bb3baf712e19f1f053caaa8d7d3afa2e upstream.
    
    The eventfs_create_dir() function returns error pointers; it never
    returns NULL. Update the check to reflect that.
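    In the kernel, functions that report failure through error pointers are
    checked with IS_ERR() and decoded with PTR_ERR() from
    include/linux/err.h, not with a NULL test. The sketch below re-creates
    those helpers in userspace to show why a NULL check could never fire;
    it assumes the kernel's MAX_ERRNO value of 4095, and create_dir() and
    make_subsystem_dir() are hypothetical stand-ins for eventfs_create_dir()
    and its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation: the top MAX_ERRNO addresses encode -errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical constructor that, on failure, returns an error pointer, never NULL. */
static void *create_dir(int fail)
{
	static int dummy_dir;

	return fail ? ERR_PTR(-ENOMEM) : &dummy_dir;
}

static int make_subsystem_dir(int fail)
{
	void *dir = create_dir(fail);

	/* A `!dir` test would never be true here; IS_ERR() catches the failure. */
	if (IS_ERR(dir))
		return (int)PTR_ERR(dir);
	return 0;
}
```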
    
    Link: https://lore.kernel.org/linux-trace-kernel/ff641474-84e2-46a7-9d7a-62b251a1050c@moroto.mountain
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef [+ + +]

Author: Petr Pavlu <petr.pavlu@suse.com>
Date:   Tue Feb 13 14:24:34 2024 +0100

    tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
    
    commit bdbddb109c75365d22ec4826f480c5e75869e1cb upstream.
    
    Commit a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by
    default") attempted to fix an issue with direct trampolines on x86, see
    its description for details. However, it wrongly referenced the
    HAVE_DYNAMIC_FTRACE_WITH_REGS config option and the problem is still
    present.
    
    Add the missing "CONFIG_" prefix for the logic to work as intended.
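    Kconfig options are visible to C code only as macros carrying the
    CONFIG_ prefix, so testing the bare option name silently evaluates to
    false. A minimal illustration, where the #define at the top stands in
    for the generated kernel configuration header and the two *_sees_option
    variables are hypothetical:

```c
#include <assert.h>

/* Simulates a kernel configuration that has the option enabled. */
#define CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS 1

/* Buggy test: the unprefixed name is never defined, so this branch is dead. */
#ifdef HAVE_DYNAMIC_FTRACE_WITH_REGS
static const int buggy_sees_option = 1;
#else
static const int buggy_sees_option = 0;
#endif

/* Fixed test: uses the macro name that Kconfig actually generates. */
#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS
static const int fixed_sees_option = 1;
#else
static const int fixed_sees_option = 0;
#endif
```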
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240213132434.22537-1-petr.pavlu@suse.com
    
    Fixes: a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default")
    Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Fix wasted memory in saved_cmdlines logic [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Fri Feb 9 06:36:22 2024 -0500

    tracing: Fix wasted memory in saved_cmdlines logic
    
    commit 44dc5c41b5b1267d4dd037d26afc0c4d3a568acb upstream.
    
    While looking at improving the saved_cmdlines cache I found a huge amount
    of wasted memory that should be used for the cmdlines.
    
    The tracing data saves pids during the trace. At sched switch, if a trace
    occurred, it will save the comm of the task that did the trace. This is
    saved in a "cache" that maps pids to comms and exposed to user space via
    the /sys/kernel/tracing/saved_cmdlines file. By default, it currently
    caches only 128 comms.
    
    The structure that uses this creates an array to store the pids using
    PID_MAX_DEFAULT (which is usually set to 32768). This makes the structure
    131104 bytes in size on 64-bit machines.
    
    In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
    powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
    this structure. That leaves 131040 bytes of wasted space.
    
    Worse, the structure points to an allocated array to store the comm names,
    which is 16 bytes times the amount of names to save (currently 128), which
    is 2048 bytes. Instead of allocating a separate array, make the structure
    end with a variable length string and use the extra space for that.
    
    This is similar to a recommendation that Linus had made about eventfs_inode names:
    
      https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/
    
    Instead of allocating a separate string array to hold the saved comms,
    have the structure end with: char saved_cmdlines[]; and round the
    allocation up to the next power of two above
    sizeof(struct saved_cmdlines_buffer) + num_cmdlines * TASK_COMM_LEN.
    The saved_cmdlines portion then uses this extra space.
    
    Now, instead of saving only 128 comms by default, by using this wasted
    space at the end of the structure it can save over 8000 comms and even
    saves space by removing the need for allocating the other array.
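    The sizing described above can be checked with a short userspace sketch.
    It assumes, as the text does, that the allocator rounds requests up to a
    power of two (malloc() does not, so the rounding is done by hand), and
    that TASK_COMM_LEN is 16. The struct below is a hypothetical model of the
    description, with the comm array moved into a trailing flexible array
    member; alloc_cmdlines() is an illustrative name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TASK_COMM_LEN 16
#define PID_MAX_DEFAULT 0x8000

/* Hypothetical layout modeled on the description above. */
struct cmdlines_buffer {
	unsigned map_pid_to_cmdline[PID_MAX_DEFAULT + 1];
	unsigned *map_cmdline_to_pid;
	unsigned cmdline_num;
	int cmdline_idx;
	char saved_cmdlines[];	/* comm slots live in the rounded-up tail */
};

/* Smallest power of two >= n (n > 0). */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static struct cmdlines_buffer *alloc_cmdlines(unsigned requested)
{
	size_t want = sizeof(struct cmdlines_buffer) +
		      (size_t)requested * TASK_COMM_LEN;
	size_t size = round_up_pow2(want);
	struct cmdlines_buffer *s = malloc(size);

	if (!s)
		return NULL;
	memset(s, 0, sizeof(*s));
	/* Use every byte past the header for comm slots. */
	s->cmdline_num = (unsigned)((size - sizeof(*s)) / TASK_COMM_LEN);
	return s;
}
```

    On a typical LP64 build this model is 131096 bytes, so a request for
    128 comms rounds up to 262144 bytes and leaves room for 8190 slots;
    exact numbers depend on the struct layout.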
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Vincent Donnefort <vdonnefort@google.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Mete Durlu <meted@linux.ibm.com>
    Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Inform kmemleak of saved_cmdlines allocation [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Wed Feb 14 11:20:46 2024 -0500

    tracing: Inform kmemleak of saved_cmdlines allocation
    
    commit 2394ac4145ea91b92271e675a09af2a9ea6840b7 upstream.
    
    The allocation of the struct saved_cmdlines_buffer structure changed from:
    
            s = kmalloc(sizeof(*s), GFP_KERNEL);
            s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
    
    to:
    
            orig_size = sizeof(*s) + val * TASK_COMM_LEN;
            order = get_order(orig_size);
            size = 1 << (order + PAGE_SHIFT);
            page = alloc_pages(GFP_KERNEL, order);
            if (!page)
                    return NULL;
    
            s = page_address(page);
            memset(s, 0, sizeof(*s));
    
            s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
    
    The s->saved_cmdlines allocation then looks like a dangling allocation
    to kmemleak. That's because kmemleak only keeps track of kmalloc()
    allocations. For allocations that use alloc_pages() directly, kmemleak
    needs to be informed explicitly.
    
    Add kmemleak_alloc() and kmemleak_free() around the page allocation so
    that it doesn't give the following false positive:
    
    unreferenced object 0xffff8881010c8000 (size 32760):
      comm "swapper", pid 0, jiffies 4294667296
      hex dump (first 32 bytes):
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
      backtrace (crc ae6ec1b9):
        [<ffffffff86722405>] kmemleak_alloc+0x45/0x80
        [<ffffffff8414028d>] __kmalloc_large_node+0x10d/0x190
        [<ffffffff84146ab1>] __kmalloc+0x3b1/0x4c0
        [<ffffffff83ed7103>] allocate_cmdlines_buffer+0x113/0x230
        [<ffffffff88649c34>] tracer_alloc_buffers.isra.0+0x124/0x460
        [<ffffffff8864a174>] early_trace_init+0x14/0xa0
        [<ffffffff885dd5ae>] start_kernel+0x12e/0x3c0
        [<ffffffff885f5758>] x86_64_start_reservations+0x18/0x30
        [<ffffffff885f582b>] x86_64_start_kernel+0x7b/0x80
        [<ffffffff83a001c3>] secondary_startup_64_no_verify+0x15e/0x16b
    
    Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r.fsf@kernel.org/
    Link: https://lore.kernel.org/linux-trace-kernel/20240214112046.09a322d6@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic")
    Reported-by: Kalle Valo <kvalo@kernel.org>
    Tested-by: Kalle Valo <kvalo@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Make system_callback() function static [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Thu Oct 5 10:47:45 2023 -0400

    tracing: Make system_callback() function static
    
    commit 5ddd8baa4857709b4e5d84b376d735152851955b upstream.
    
    The system_callback() function in trace_events.c is only used within that
    file. The "static" annotation was missed.
    
    Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202310051743.y9EobbUr-lkp@intel.com/
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

um: Fix adding '-no-pie' for clang [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Tue Jan 23 15:59:54 2024 -0700

    um: Fix adding '-no-pie' for clang
    
    commit 846cfbeed09b45d985079a9173cf390cc053715b upstream.
    
    The kernel builds with -fno-PIE, so commit 883354afbc10 ("um: link
    vmlinux with -no-pie") added the compiler linker flag '-no-pie' via
    cc-option because '-no-pie' was only supported in GCC 6.1.0 and newer.
    
    While this works for GCC, this does not work for clang because cc-option
    uses '-c', which stops the pipeline right before linking, so '-no-pie'
    is unconsumed and clang warns, causing cc-option to fail just as it
    would if the option was entirely unsupported:
    
      $ clang -Werror -no-pie -c -o /dev/null -x c /dev/null
      clang-16: error: argument unused during compilation: '-no-pie' [-Werror,-Wunused-command-line-argument]
    
    A recent version of clang exposes this because it generates a relocation
    under '-mcmodel=large' that is not supported in PIE mode:
    
      /usr/sbin/ld: init/main.o: relocation R_X86_64_32 against symbol `saved_command_line' can not be used when making a PIE object; recompile with -fPIE
      /usr/sbin/ld: failed to set dynamic section sizes: bad value
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
    
    Remove the cc-option check altogether. It is wasteful to invoke the
    compiler to check for '-no-pie' because only one supported compiler
    version does not support it, GCC 5.x (as it is supported with the
    minimum version of clang and GCC 6.1.0+). Use a combination of the
    gcc-min-version macro and CONFIG_CC_IS_CLANG to unconditionally add
    '-no-pie' with CONFIG_LD_SCRIPT_DYN=y, so that it is enabled with all
    compilers that support this. Furthermore, using gcc-min-version can help
    turn this back into
    
      LINK-$(CONFIG_LD_SCRIPT_DYN) += -no-pie
    
    when the minimum version of GCC is bumped past 6.1.0.
    
    Cc: stable@vger.kernel.org
    Closes: https://github.com/ClangBuiltLinux/linux/issues/1982
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux: update workarounds for gcc "asm goto" issue [+ + +]

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Feb 15 11:14:33 2024 -0800

    update workarounds for gcc "asm goto" issue
    
    commit 68fb3ca0e408e00db1c3f8fccdfa19e274c033be upstream.
    
    In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with
    outputs") I did the gcc workaround unconditionally, because the cause of
    the bad code generation wasn't entirely clear.
    
    In the meantime, Jakub Jelinek debugged the issue, and has come up with
    a fix in gcc [2], which also got backported to the still maintained
    branches of gcc-11, gcc-12 and gcc-13.
    
    Note that while the fix technically wasn't in the original gcc-14
    branch, Jakub says:
    
     "while it is true that no GCC 14 snapshots until today (or whenever the
      fix will be committed) have the fix, for GCC trunk it is up to the
      distros to use the latest snapshot if they use it at all and would
      allow better testing of the kernel code without the workaround, so
      that if there are other issues they won't be discovered years later.
      Most userland code doesn't actually use asm goto with outputs..."
    
    so we will consider gcc-14 to be fixed - if somebody is using gcc-14
    snapshots from before the fix, they should upgrade.
    
    Note that while the bug goes back to gcc-11, in practice other gcc
    changes seem to have effectively hidden it since gcc-12.1 as per a
    bisect by Jakub.  So even a gcc-14 snapshot without the fix likely
    doesn't show actual problems.
    
    Also, make the default 'asm_goto_output()' macro mark the asm as
    volatile by hand, because of an unrelated gcc issue [1] where it doesn't
    match the documented behavior ("asm goto is always volatile").
    
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1]
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2]
    Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
    Requested-by: Jakub Jelinek <jakub@redhat.com>
    Cc: Uros Bizjak <ubizjak@gmail.com>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Andrew Pinski <quic_apinski@quicinc.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: chipidea: core: handle power lost in workqueue [+ + +]

Author: Xu Yang <xu.yang_2@nxp.com>
Date:   Fri Jan 19 20:35:37 2024 +0800

    usb: chipidea: core: handle power lost in workqueue
    
    commit cc509b6a47e7c8998d9e41c273191299d5d9d631 upstream.
    
    When power to the USB controller is cycled during system power management,
    the controller recognizes it and switches role if the role changed while
    power was lost. In the current design, this is done in the resume()
    function. However, this may cause issues because USB class devices have
    their own PM operations, and their resume() functions have not yet been
    called at this point. When the USB controller recognizes that the host
    role should be stopped, these USB class devices are removed at this point.
    But some USB class devices, such as SCSI devices, can't be removed in this
    state: the SCSI driver may sync data to the U-disk, but it blocks there
    because the SCSI driver can only handle PM requests while it is in the
    suspended state. Therefore, there may be a dependency between ci_resume()
    and the USB class devices' resume(). To break this potential dependency,
    handle the power-lost work in a workqueue.
    
    Fixes: 74494b33211d ("usb: chipidea: core: add controller resume support when controller is powered off")
    cc: stable@vger.kernel.org
    Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
    Link: https://lore.kernel.org/r/20240119123537.3614838-1-xu.yang_2@nxp.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: core: Prevent null pointer dereference in update_port_device_state [+ + +]

Author: Udipto Goswami <quic_ugoswami@quicinc.com>
Date:   Wed Jan 10 15:28:14 2024 +0530

    usb: core: Prevent null pointer dereference in update_port_device_state
    
    commit 12783c0b9e2c7915a50d5ec829630ff2da50472c upstream.
    
    Currently, the function update_port_device_state gets the usb_hub from
    udev->parent by calling usb_hub_to_struct_hub.
    However, if actconfig is NULL or maxchild is 0, usb_hub will be NULL,
    and the subsequent access to get port_dev results in a NULL pointer
    dereference.
    
    Fix this by introducing an if check after the usb_hub is populated.
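    A hypothetical userspace sketch of the same guard, with simplified
    fake_hub/fake_port types standing in for the USB core structures and
    hub_of() modeling usb_hub_to_struct_hub() returning NULL when the hub
    has no ports:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the USB hub structures. */
struct fake_port { int state; };
struct fake_hub { struct fake_port *ports; int maxchild; };

/* Models the lookup returning NULL when there are no child ports. */
static struct fake_hub *hub_of(struct fake_hub *parent)
{
	if (!parent || parent->maxchild == 0)
		return NULL;
	return parent;
}

static int update_port_state(struct fake_hub *parent, int port1, int state)
{
	struct fake_hub *hub = hub_of(parent);

	/* Bail out before touching hub->ports. */
	if (!hub)
		return -1;
	hub->ports[port1 - 1].state = state;
	return 0;
}
```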
    
    Fixes: 83cb2604f641 ("usb: core: add sysfs entry for usb device state")
    Cc: stable@vger.kernel.org
    Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
    Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
    Link: https://lore.kernel.org/r/20240110095814.7626-1-quic_ugoswami@quicinc.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend [+ + +]

Author: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Date:   Fri Jan 19 15:18:25 2024 +0530

    usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend
    
    commit 61a348857e869432e6a920ad8ea9132e8d44c316 upstream.
    
    If plug-out and plug-in are performed repeatedly in quick succession,
    a NULL pointer dereference may occur while dwc3_gadget_suspend checks
    dwc->gadget_driver.
    
    Call Stack:
    
            CPU1:                           CPU2:
            gadget_unbind_driver            dwc3_suspend_common
            dwc3_gadget_stop                dwc3_gadget_suspend
                                            dwc3_disconnect_gadget
    
    CPU1 clears the variable and CPU2 checks it. Suppose CPU1 is running
    and is just about to clear gadget_driver, while in parallel CPU2
    executes dwc3_gadget_suspend, finds that dwc->gadget_driver is not
    NULL and continues. CPU1 then completes execution. When CPU2 reaches
    dwc3_disconnect_gadget, dwc->gadget_driver is already NULL, which
    causes the NULL pointer dereference.
    
    Cc: stable@vger.kernel.org
    Fixes: 9772b47a4c29 ("usb: dwc3: gadget: Fix suspend/resume during device mode")
    Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
    Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
    Link: https://lore.kernel.org/r/20240119094825.26530-1-quic_uaggarwa@quicinc.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: f_mass_storage: forbid async queue when shutdown happen

Author: yuan linyu <yuanlinyu@hihonor.com>
Date:   Tue Jan 23 11:48:29 2024 +0800

    usb: f_mass_storage: forbid async queue when shutdown happen
    
    commit b2d2d7ea0dd09802cf5a0545bf54d8ad8987d20c upstream.
    
    When UDC is written with an empty string and the gadget driver is
    unbound from the gadget device, there may be many queue failures for
    the mass storage function.
    
    The root cause is that the mass storage main thread always tries to
    queue a request to receive a command from the host while the running
    flag is on. On platforms like dwc3, once pull-down has been called, the
    UDC no longer queues requests and returns -ESHUTDOWN, but this does not
    affect the running flag of the mass storage function.
    
    Check the return code in the mass storage function and clear the running
    flag if it is -ESHUTDOWN; also report start in/out transfer failures so
    that the loops break.
    
    Cc: stable <stable@kernel.org>
    Signed-off-by: yuan linyu <yuanlinyu@hihonor.com>
    Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
    Link: https://lore.kernel.org/r/20240123034829.3848409-1-yuanlinyu@hihonor.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
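
The following Python model is a sketch of the fix, not the f_mass_storage code. It assumes the UDC rejects every request with -ESHUTDOWN after pull-down. All names are hypothetical, and the old behaviour is capped at max_attempts so the model terminates. Clearing the running flag on -ESHUTDOWN and reporting the failure ends the loop after a single attempt.

```python
import errno


class MassStorage:
    """Toy model of the mass storage main thread (illustrative names)."""

    def __init__(self, udc_pulled_down=False, clear_on_shutdown=True):
        self.running = True
        self.udc_pulled_down = udc_pulled_down
        self.clear_on_shutdown = clear_on_shutdown  # False models the old code
        self.queue_failures = 0

    def ep_queue(self) -> int:
        # After pull-down, a UDC such as dwc3 rejects new requests.
        return -errno.ESHUTDOWN if self.udc_pulled_down else 0

    def start_out_transfer(self) -> bool:
        rc = self.ep_queue()
        if rc == 0:
            return True
        self.queue_failures += 1
        if rc == -errno.ESHUTDOWN and self.clear_on_shutdown:
            self.running = False  # stop trying to receive commands
        return False  # report the failure to the caller

    def get_next_command(self, max_attempts=1000) -> int:
        """Try to queue a command request; return the number of attempts."""
        attempts = 0
        while self.running and attempts < max_attempts:
            attempts += 1
            if self.start_out_transfer():
                break  # request queued; now wait for the host
        return attempts
```

Without the fix (clear_on_shutdown=False), every iteration fails and the loop spins until the cap. This corresponds to the flood of queue failures described above.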

USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT

Author: Oliver Neukum <oneukum@suse.com>
Date:   Mon Jan 22 16:35:32 2024 +0100

    USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
    
    commit f17c34ffc792bbb520e4b61baa16b6cfc7d44b13 upstream.
    
    The OTG 1.3 spec has the feature A_ALT_HNP_SUPPORT, which tells
    a device that it is connected to the wrong port. Some devices
    refuse to operate if you enable that feature, because it indicates
    to them that they ought to request to be connected to another port.
    
    According to the spec, this feature may be used only under the following
    three conditions:
    
    6.5.3 a_alt_hnp_support
    Setting this feature indicates to the B-device that it is connected to
    an A-device port that is not capable of HNP, but that the A-device does
    have an alternate port that is capable of HNP.
    The A-device is required to set this feature under the following conditions:
    • the A-device has multiple receptacles
    • the A-device port that connects to the B-device does not support HNP
    • the A-device has another port that does support HNP
    
    Checks for the first and third conditions are missing. Add them.
    
    Signed-off-by: Oliver Neukum <oneukum@suse.com>
    Cc: stable <stable@kernel.org>
    Fixes: 7d2d641c44269 ("usb: otg: don't set a_alt_hnp_support feature for OTG 2.0 device")
    Link: https://lore.kernel.org/r/20240122153545.12284-1-oneukum@suse.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
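
The three spec conditions quoted above combine as a single AND. The Python sketch below encodes only the spec text; it is not the hub driver's logic, and the Port type is a hypothetical stand-in for per-port HNP capability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    supports_hnp: bool


def should_set_a_alt_hnp_support(ports: List[Port], connected: int) -> bool:
    """OTG 1.3, 6.5.3: set a_alt_hnp_support only if all three conditions hold."""
    multiple_receptacles = len(ports) > 1
    this_port_lacks_hnp = not ports[connected].supports_hnp
    other_port_has_hnp = any(
        p.supports_hnp for i, p in enumerate(ports) if i != connected
    )
    return multiple_receptacles and this_port_lacks_hnp and other_port_has_hnp
```

Checking only the second condition, as the code did before the fix, would wrongly return True for a single non-HNP port. It would do the same for two ports where neither supports HNP.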

usb: typec: tpcm: Fix issues with power being removed during reset

Author: Mark Brown <broonie@kernel.org>
Date:   Mon Feb 12 18:42:13 2024 +0000

    usb: typec: tpcm: Fix issues with power being removed during reset
    
    commit 69f89168b310878be82d7d97bc0d22068ad858c0 upstream.
    
    Since the merge of b717dfbf73e8 ("Revert "usb: typec: tcpm: fix
    cc role at port reset"") into mainline, the LibreTech Renegade
    Elite/Firefly has died during boot; the main symptom observed in testing
    is a sudden stop in console output.  Gábor Stefanik identified in review
    that the patch would cause power to be removed from devices without
    batteries (like this board), observing that while the patch is correct
    according to the spec, this appears to be an oversight in the spec.
    
    Given that the change makes previously working systems unusable, let's
    revert it. There was some discussion of identifying systems that have
    alternative power and implementing the standards-conforming behaviour
    only in that case.
    
    Fixes: b717dfbf73e8 ("Revert "usb: typec: tcpm: fix cc role at port reset"")
    Cc: stable <stable@kernel.org>
    Cc: Badhri Jagan Sridharan <badhri@google.com>
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Reviewed-by: Guenter Roeck <linux@roeck-us.net>
    Link: https://lore.kernel.org/r/20240212-usb-fix-renegade-v1-1-22c43c88d635@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: ucsi: Add missing ppm_lock

Author: Christian A. Ehrhardt <lk@c--e.de>
Date:   Sun Jan 21 21:41:21 2024 +0100

    usb: ucsi: Add missing ppm_lock
    
    commit c9aed03a0a683fd1600ea92f2ad32232d4736272 upstream.
    
    Calling ->sync_write must be done while holding the PPM lock as
    the mailbox logic does not support concurrent commands.
    
    At least since the addition of partner tasks, this means that
    ucsi_acknowledge_connector_change should be called with the
    PPM lock held, as it calls ->sync_write.
    
    Thus, protect the only call to ucsi_acknowledge_connector_change
    with the PPM lock. All other calls to ->sync_write already happen
    under the PPM lock.
    
    Fixes: b9aa02ca39a4 ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking")
    Cc: stable@vger.kernel.org
    Signed-off-by: "Christian A. Ehrhardt" <lk@c--e.de>
    Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/20240121204123.275441-2-lk@c--e.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
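
As a rough illustration of the locking rule above, here is a Python model, not the UCSI driver. The class and method names only mirror the commit text, and the command string is illustrative. sync_write refuses to run unless ppm_lock is held, which loosely approximates lockdep_assert_held(). A threading.Lock cannot tell which thread holds it, so this check is weaker than lockdep.

```python
import threading


class Ucsi:
    """Toy model: the PPM mailbox handles one command at a time."""

    def __init__(self):
        self.ppm_lock = threading.Lock()
        self.log = []

    def sync_write(self, cmd: str) -> None:
        # Rough stand-in for lockdep_assert_held(&ppm_lock).
        if not self.ppm_lock.locked():
            raise RuntimeError("sync_write called without ppm_lock")
        self.log.append(cmd)

    def acknowledge_connector_change(self) -> None:
        self.sync_write("ACK_CC_CI")

    def handle_connector_change(self) -> None:
        # The fix: hold the lock around the previously unprotected call site.
        with self.ppm_lock:
            self.acknowledge_connector_change()
```

Calling acknowledge_connector_change() directly, as the unprotected call site effectively did, trips the check. Going through handle_connector_change() does not.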

usb: ucsi_acpi: Fix command completion handling

Author: Christian A. Ehrhardt <lk@c--e.de>
Date:   Sun Jan 21 21:41:22 2024 +0100

    usb: ucsi_acpi: Fix command completion handling
    
    commit 2840143e393a4ddc1caab4372969ea337371168c upstream.
    
    In case of a spurious or otherwise delayed notification it is
    possible that CCI still reports the previous completion. The
    UCSI spec is aware of this and provides two completion bits in
    CCI, one for normal commands and one for acks. As acks and commands
    alternate, the notification handler can determine if the completion
    bit is from the current command.
    
    The initial UCSI code correctly handled this, but the distinction
    between the two completion bits was lost with the introduction of
    the new API.
    
    To fix this, revive the ACK_PENDING bit for ucsi_acpi and only complete
    commands if the completion bit matches.
    
    Fixes: f56de278e8ec ("usb: typec: ucsi: acpi: Move to the new API")
    Cc: stable@vger.kernel.org
    Signed-off-by: "Christian A. Ehrhardt" <lk@c--e.de>
    Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/20240121204123.275441-3-lk@c--e.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
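
The completion-bit matching described above can be sketched in Python. The bit positions are illustrative assumptions and are not taken from the UCSI specification. The driver remembers whether an ack is pending and completes only when the matching bit is set, so a stale bit left over from the previous operation is ignored.

```python
# Illustrative bit positions (assumption, not copied from the UCSI spec).
CMD_COMPLETE = 1 << 31
ACK_COMPLETE = 1 << 29


class UcsiAcpi:
    """Toy model of ucsi_acpi completion handling with ACK_PENDING."""

    def __init__(self):
        self.ack_pending = False
        self.completed = False

    def send(self, is_ack: bool) -> None:
        self.ack_pending = is_ack  # the revived ACK_PENDING state
        self.completed = False

    def notify(self, cci: int) -> None:
        # Commands and acks alternate, so only the matching bit counts.
        wanted = ACK_COMPLETE if self.ack_pending else CMD_COMPLETE
        if cci & wanted:
            self.completed = True
```

A delayed notification that still reports CMD_COMPLETE while an ack is outstanding leaves the ack incomplete until ACK_COMPLETE arrives.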

usb: ulpi: Fix debugfs directory leak

Author: Sean Anderson <sean.anderson@seco.com>
Date:   Fri Jan 26 17:38:00 2024 -0500

    usb: ulpi: Fix debugfs directory leak
    
    commit 3caf2b2ad7334ef35f55b95f3e1b138c6f77b368 upstream.
    
    The ULPI per-device debugfs root is named after the ulpi device's
    parent, but ulpi_unregister_interface tries to remove a debugfs
    directory named after the ulpi device itself. This results in the
    directory sticking around and preventing subsequent (deferred) probes
    from succeeding. Change the directory name to match the ulpi device.
    
    Fixes: bd0a0a024f2a ("usb: ulpi: Add debugfs support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Anderson <sean.anderson@seco.com>
    Link: https://lore.kernel.org/r/20240126223800.2864613-1-sean.anderson@seco.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
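
The leak comes from a name mismatch between create and remove. The Python model below uses a dict as a toy debugfs, and the device names are hypothetical. The toy create_dir refuses duplicate names, so a directory that was never removed makes the next (deferred) probe fail. Using the same name for create and remove avoids this.

```python
class DebugFs:
    """Toy debugfs: maps directory names to their contents."""

    def __init__(self):
        self.dirs = {}

    def create_dir(self, name: str) -> None:
        if name in self.dirs:
            raise FileExistsError(name)  # a later (deferred) probe fails here
        self.dirs[name] = {}

    def lookup_and_remove(self, name: str) -> None:
        self.dirs.pop(name, None)  # does nothing if the name does not match


def probe(fs: DebugFs, dev_name: str, parent_name: str, fixed: bool) -> None:
    # Before the fix the directory was named after the parent device.
    fs.create_dir(dev_name if fixed else parent_name)


def unregister(fs: DebugFs, dev_name: str) -> None:
    # Removal has always looked up the ulpi device's own name.
    fs.lookup_and_remove(dev_name)
```

With fixed=False, the parent-named directory survives unregister() and the second probe raises FileExistsError. With fixed=True, repeated probe/unregister cycles leave nothing behind.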

userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb

Author: Lokesh Gidra <lokeshgidra@google.com>
Date:   Wed Jan 17 14:37:29 2024 -0800

    userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
    
    commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream.
    
    In mfill_atomic_hugetlb(), mmap_changing isn't being checked
    again if we drop mmap_lock and reacquire it. When the lock is not held,
    mmap_changing could have been incremented. This is also inconsistent
    with the behavior in mfill_atomic().
    
    Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com
    Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
    Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Brian Geffon <bgeffon@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Kalesh Singh <kaleshsingh@google.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Nicolas Geoffray <ngeoffray@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
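
The missing re-check can be modelled in Python. This is a simplified sketch, not mm/userfaultfd.c: the page copy is reduced to a counter, and the lock is dropped once midway to stand in for the retry path. After the lock is re-taken, mmap_changing is tested again. As assumed here, the function then returns the partial count if any pages were copied, or -EAGAIN otherwise.

```python
import errno
import threading


class Mm:
    """Toy mm: a lock plus the mmap_changing counter."""

    def __init__(self):
        self.mmap_lock = threading.Lock()
        self.mmap_changing = 0  # bumped by non-cooperative events


def mfill_atomic_hugetlb(mm, pages, on_unlock=None):
    """Copy `pages` pages, dropping the lock once midway to model a retry."""
    mm.mmap_lock.acquire()
    try:
        if mm.mmap_changing:
            return -errno.EAGAIN
        copied = 0
        for i in range(pages):
            if i == pages // 2:
                mm.mmap_lock.release()  # e.g. to fault in the source page
                if on_unlock is not None:
                    on_unlock()  # a concurrent event may run here
                mm.mmap_lock.acquire()
                # The re-check the fix adds: the mm may have changed meanwhile.
                if mm.mmap_changing:
                    return copied if copied else -errno.EAGAIN
            copied += 1
        return copied
    finally:
        mm.mmap_lock.release()
```

Without the re-check, the loop would carry on copying into an address space that a non-cooperative event has just modified.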

wifi: cfg80211: fix wiphy delayed work queueing

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Thu Jan 25 09:51:09 2024 +0100

    wifi: cfg80211: fix wiphy delayed work queueing
    
    commit b743287d7a0007493f5cada34ed2085d475050b4 upstream.
    
    When a wiphy work is queued with a timer, and then again
    without a delay, it's started immediately but *also*
    started again after the timer expires. This can lead,
    for example, to warnings in mac80211's offchannel code
    as reported by Jouni. Running the same work twice isn't
    expected, of course. Fix this by deleting the timer at
    this point, when queuing immediately due to delay=0.
    
    Cc: stable@vger.kernel.org
    Reported-by: Jouni Malinen <j@w1.fi>
    Fixes: a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics")
    Link: https://msgid.link/20240125095108.2feb0eaaa446.I4617f3210ed0e7f252290d5970dac6a876aa595b@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
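
The double run can be reproduced in a Python model with a fake clock. This sketch is not the cfg80211 code: "running immediately" stands in for putting the work on the wiphy work list. Queueing with delay 0 after a delayed queue runs the work once and, with the fix, also disarms the pending timer.

```python
class DelayedWork:
    """Toy wiphy delayed work driven by a fake clock (illustrative only)."""

    def __init__(self, fixed=True):
        self.fixed = fixed
        self.timer_expires = None  # None means no timer armed
        self.runs = 0

    def queue(self, now: int, delay: int) -> None:
        if delay == 0:
            if self.fixed:
                self.timer_expires = None  # the fix: delete the pending timer
            self.runs += 1  # stands in for queueing the work right away
        else:
            self.timer_expires = now + delay

    def tick(self, now: int) -> None:
        if self.timer_expires is not None and now >= self.timer_expires:
            self.timer_expires = None
            self.runs += 1
```

In the unfixed model (fixed=False), the same sequence runs the work twice: once immediately and again when the stale timer fires.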

wifi: iwlwifi: fix double-free bug

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Tue Jan 23 20:08:11 2024 +0200

    wifi: iwlwifi: fix double-free bug
    
    commit 353d321f63f7dbfc9ef58498cc732c9fe886a596 upstream.
    
    The storage for the TLV PC register data wasn't done like all
    the other storage in the drv->fw area, which is cleared at the
    end of deallocation. Therefore, the freeing must also be done
    differently, explicitly NULL'ing it out after the free, since
    otherwise there's a nasty double-free bug here if a file fails
    to load after this has been parsed, and we get another free
    later (e.g. because no other file exists.) Fix that by adding
    the missing NULL assignment.
    
    Cc: stable@vger.kernel.org
    Fixes: 5e31b3df86ec ("wifi: iwlwifi: dbg: print pc register data once fw dump occurred")
    Reported-by: Guy Kaplan <guy.kaplan@intel.com>
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://msgid.link/20240123200528.675f3c24ec0d.I6ab4015cd78d82dd95471f840629972ef0331de3@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
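
The free-then-NULL pattern can be modelled in Python. This is not iwlwifi code: pc_data is a hypothetical field name for the TLV PC register data, and the allocator simply tracks live buffers. Freeing None is a no-op, like kfree(NULL), so clearing the pointer after the free makes a second deallocation harmless.

```python
class Allocator:
    """Tracks live buffers so a second free of the same one is detected."""

    def __init__(self):
        self.live = set()
        self.next_id = 1

    def alloc(self) -> int:
        buf = self.next_id
        self.next_id += 1
        self.live.add(buf)
        return buf

    def free(self, buf) -> None:
        if buf is None:  # like kfree(NULL): nothing to do
            return
        if buf not in self.live:
            raise RuntimeError(f"double free of buffer {buf}")
        self.live.remove(buf)


class Drv:
    def __init__(self):
        self.pc_data = None  # hypothetical name for the PC register data


def dealloc(drv: Drv, a: Allocator, null_after_free: bool) -> None:
    a.free(drv.pc_data)
    if null_after_free:
        drv.pc_data = None  # the missing assignment the fix adds
```

If the pointer is not cleared, a second dealloc() (for example after a later firmware file fails to load) frees the same buffer again.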

wifi: iwlwifi: Fix some error codes

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Thu Feb 8 13:17:06 2024 +0300

    wifi: iwlwifi: Fix some error codes
    
    [ Upstream commit c6ebb5b67641994de8bc486b33457fe0b681d6fe ]
    
    This saves the error as PTR_ERR(wifi_pkg).  The problem is that
    "wifi_pkg" is a valid pointer, not an error pointer.  Set the error code
    to -EINVAL instead.
    
    Fixes: 2a8084147bff ("iwlwifi: acpi: support reading and storing WRDS revision 1 and 2")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://msgid.link/9620bb77-2d7c-4d76-b255-ad824ebf8e35@moroto.mountain
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
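
PTR_ERR() only makes sense on an error pointer. The Python model below is a simplified model of the ERR_PTR()/IS_ERR()/PTR_ERR() encoding on a 64-bit machine, not the kernel headers, and the pointer value is an arbitrary example. Errors occupy the top MAX_ERRNO addresses, so applying PTR_ERR() to an ordinary pointer yields a large garbage value rather than an errno.

```python
import errno

MAX_ERRNO = 4095
MASK = (1 << 64) - 1  # model a 64-bit unsigned long


def ERR_PTR(err: int) -> int:
    return err & MASK  # a small negative errno becomes a very high address


def IS_ERR(ptr: int) -> bool:
    return ptr >= ((-MAX_ERRNO) & MASK)


def PTR_ERR(ptr: int) -> int:
    # Reinterpret the address as a signed long.
    return ptr - (1 << 64) if ptr >= (1 << 63) else ptr


valid_pkg = 0xFFFF888012345678  # arbitrary example of a valid kernel pointer
bogus = PTR_ERR(valid_pkg)       # what the old code stored: not an errno
fixed = -errno.EINVAL            # what the fix stores instead
```

Here IS_ERR(valid_pkg) is False and bogus lies far outside the errno range, while ERR_PTR(-EINVAL) round-trips to -EINVAL.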

wifi: iwlwifi: mvm: fix a crash when we run out of stations

Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date:   Tue Feb 6 18:02:04 2024 +0200

    wifi: iwlwifi: mvm: fix a crash when we run out of stations
    
    commit b7198383ef2debe748118996f627452281cf27d7 upstream.
    
    A DoS tool that injects loads of authentication frames made our AP
    crash. The iwl_mvm_is_dup() function couldn't find the per-queue
    dup_data which was not allocated.
    
    The root cause for that is that we ran out of stations in the firmware
    and we didn't really add the station to the firmware, yet we didn't
    return an error to mac80211.
    mac80211 thought that we had the station and, because of that,
    sta_info::uploaded was set to 1. This allowed
    ieee80211_find_sta_by_ifaddr() to return a valid station object, but
    that ieee80211_sta didn't have any iwl_mvm_sta object initialized and
    that caused the crash mentioned earlier when we got Rx on that station.
    
    Cc: stable@vger.kernel.org
    Fixes: 57974a55d995 ("wifi: iwlwifi: mvm: refactor iwl_mvm_mac_sta_state_common()")
    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://msgid.link/20240206175739.1f76c44b2486.I6a00955e2842f15f0a089db2f834adb9d10fbe35@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Thu Feb 8 13:17:31 2024 +0300

    wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
    
    [ Upstream commit 65c6ee90455053cfd3067c17aaa4a42b0c766543 ]
    
    This is an error path and Smatch complains that "tbl_rev" is uninitialized
    on this path.  All the other functions follow this same pattern where they
    set the error code and goto out_free so that's probably what was intended
    here as well.
    
    Fixes: e8e10a37c51c ("iwlwifi: acpi: move ppag code from mvm to fw/acpi")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://msgid.link/09900c01-6540-4a32-9451-563da0029cb6@moroto.mountain
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Wed Jan 31 16:49:10 2024 +0100

    wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
    
    commit c98d8836b817d11fdff4ca7749cbbe04ff7f0c64 upstream.
    
    This pointer can change here since the SKB can change, so we
    actually later open-coded IEEE80211_SKB_CB() again. Reload
    the pointer where needed, so the monitor-mode case using it
    gets fixed, and then use info-> later as well.
    
    Cc: stable@vger.kernel.org
    Fixes: 531682159092 ("mac80211: fix VLAN handling with TXQs")
    Link: https://msgid.link/20240131164910.b54c28d583bc.I29450cec84ea6773cff5d9c16ff92b836c331471@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

work around gcc bugs with 'asm goto' with outputs

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Feb 9 12:39:31 2024 -0800

    work around gcc bugs with 'asm goto' with outputs
    
    commit 4356e9f841f7fbb945521cef3577ba394c65f3fc upstream.
    
    We've had issues with gcc and 'asm goto' before, and we created a
    'asm_volatile_goto()' macro for that in the past: see commits
    3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
    bug") and a9f180345f53 ("compiler/gcc4: Make quirk for
    asm_volatile_goto() unconditional").
    
    Then, much later, we ended up removing the workaround in commit
    43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR
    58670") because we no longer supported building the kernel with the
    affected gcc versions, but we left the macro uses around.
    
    Now, Sean Christopherson reports a new version of a very similar
    problem, which is fixed by re-applying that ancient workaround.  But the
    problem in question is limited to only the 'asm goto with outputs'
    cases, so instead of re-introducing the old workaround as-is, let's
    rename and limit the workaround to just that much less common case.
    
    It looks like there are at least two separate issues that all hit in
    this area:
    
     (a) some versions of gcc don't mark the asm goto as 'volatile' when it
         has outputs:
    
            https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
            https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
    
         which is easy to work around by just adding the 'volatile' by hand.
    
     (b) Internal compiler errors:
    
            https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
    
         which are worked around by adding the extra empty 'asm' as a
         barrier, as in the original workaround.
    
    but the problem Sean sees may be a third thing since it involves bad
    code generation (not an ICE) even with the manually added 'volatile'.
    
    But the same old workaround works for this case, even if this feels a
    bit like voodoo programming and may only be hiding the issue.
    
    Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Uros Bizjak <ubizjak@gmail.com>
    Cc: Jakub Jelinek <jakub@redhat.com>
    Cc: Andrew Pinski <quic_apinski@quicinc.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/barrier: Do not serialize MSR accesses on AMD

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Fri Oct 27 14:24:16 2023 +0200

    x86/barrier: Do not serialize MSR accesses on AMD
    
    commit 04c3024560d3a14acd18d0a51a1d0a89d29b7eb5 upstream.
    
    AMD does not have the requirement for a synchronization barrier when
    accessing a certain group of MSRs. Do not incur that unnecessary
    penalty there.
    
    There will be a CPUID bit which explicitly states that an MFENCE is not
    needed. Once that bit is added to the APM, this will be extended with
    it.
    
    While at it, move to processor.h to avoid include hell. Untangling that
    file properly is a matter for another day.
    
    Some notes on the performance aspect of why this is relevant, courtesy
    of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:
    
    On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM
    shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The
    ipi-bench is modified so that the IPIs are sent between two vCPUs in the
    same CCX. This also requires pinning each vCPU to a physical core to
    prevent additional latency. This simulates the use case of pinning vCPUs
    to the threads of a single CCX to avoid interrupt IPI latency.
    
    To avoid run-to-run variance (for both x2AVIC and AVIC), the following
    configuration is used:
    
      1) Disable Power States in BIOS (to prevent the system from going to
         lower power state)
    
      2) Run the system at a fixed frequency of 2500MHz (to prevent the
         system from increasing the frequency when the load increases)
    
    With the above configuration:
    
    *) Performance measured using ipi-bench for AVIC:
      Average Latency:  1124.98ns [Time to send IPI from one vCPU to another vCPU]
    
      Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from
                                         48 vCPUs simultaneously]
    
    *) Performance measured using ipi-bench for x2AVIC:
      Average Latency:  1172.42ns [Time to send IPI from one vCPU to another vCPU]
    
      Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from
                                         48 vCPUs simultaneously]
    
    From the above, x2AVIC latency is ~4% higher than AVIC. However, x2AVIC
    performance is expected to be better than or equivalent to AVIC. Analysis
    of the perf captures shows that significant time is spent in
    weak_wrmsr_fence() invoked by x2apic_send_IPI().
    
    With the fix to skip weak_wrmsr_fence():
    
    *) Performance measured using ipi-bench for x2AVIC:
      Average Latency:  1117.44ns [Time to send IPI from one vCPU to another vCPU]
    
      Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from
                                         48 vCPUs simultaneously]
    
    Comparing the performance of x2AVIC with and without the fix, it can be
    seen that performance improves by ~4%.
    
    Performance captured using an unmodified ipi-bench with the 'mesh-ipi'
    option, with and without weak_wrmsr_fence() on a Zen4 system, also showed
    a significant performance improvement without weak_wrmsr_fence(). The
    'mesh-ipi' option ignores CCX or CCD and just picks a random vCPU.
    
      Average throughput (10 iterations) with weak_wrmsr_fence(),
            Cumulative throughput: 4933374 IPI/s
    
      Average throughput (10 iterations) without weak_wrmsr_fence(),
            Cumulative throughput: 6355156 IPI/s
    
    [1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
    Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
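
The percentages quoted above can be re-derived from the raw figures in the commit message. The short Python script below is not part of the patch; it only computes relative changes. By this arithmetic, x2AVIC latency is about 4.2% above AVIC and drops about 4.7% with the fix, x2AVIC throughput rises about 4.9%, and the mesh-ipi throughput gain is roughly 29%.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures quoted in the commit message.
avic_lat, x2avic_lat, x2avic_fixed_lat = 1124.98, 1172.42, 1117.44  # ns
x2avic_tput, x2avic_fixed_tput = 40.9432, 42.9608  # M IPI/s
mesh_with_fence, mesh_without_fence = 4933374, 6355156  # IPI/s

print(f"x2AVIC vs AVIC latency:     {pct_change(avic_lat, x2avic_lat):+.1f}%")
print(f"x2AVIC latency with fix:    {pct_change(x2avic_lat, x2avic_fixed_lat):+.1f}%")
print(f"x2AVIC throughput with fix: {pct_change(x2avic_tput, x2avic_fixed_tput):+.1f}%")
print(f"mesh-ipi throughput:        {pct_change(mesh_with_fence, mesh_without_fence):+.1f}%")
```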

x86/boot: Construct PE/COFF .text section from assembler

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:29 2023 +0000

    x86/boot: Construct PE/COFF .text section from assembler
    
    commit efa089e63b56bdc5eca754b995cb039dd7a5457e upstream.
    
    Now that the size of the setup block is visible to the assembler, it is
    possible to populate the PE/COFF header fields from the asm code
    directly, instead of poking the values into the binary using the build
    tool. This will make it easier to reorganize the section layout without
    having to tweak the build tool in lockstep.
    
    This change has no impact on the resulting bzImage binary.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-15-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Define setup size in linker script

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:27 2023 +0000

    x86/boot: Define setup size in linker script
    
    commit 093ab258e3fb1d1d3afdfd4a69403d44ce90e360 upstream.
    
    The setup block contains the real mode startup code that is used when
    booting from a legacy BIOS, along with the boot_params/setup_data that
    is used by legacy x86 bootloaders to pass the command line and initial
    ramdisk parameters, among other things.
    
    The setup block also contains the PE/COFF header of the entire combined
    image, which includes the compressed kernel image, the decompressor and
    the EFI stub.
    
    This PE header describes the layout of the executable image in memory,
    and currently, the fact that the setup block precedes it makes it rather
    fiddly to get the right values into the right place in the final image.
    
    Let's make things a bit easier by defining the setup_size in the linker
    script so it can be referenced from the asm code directly, rather than
    having to rely on the build tool to calculate it. For the time being,
    add 64 bytes of fixed padding for the .reloc and .compat sections - this
    will be removed in a subsequent patch after the PE/COFF header has been
    reorganized.
    
    This change has no impact on the resulting bzImage binary when
    configured with CONFIG_EFI_MIXED=y.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-13-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Derive file size from _edata symbol

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:28 2023 +0000

    x86/boot: Derive file size from _edata symbol
    
    commit aeb92067f6ae994b541d7f9752fe54ed3d108bcc upstream.
    
    Tweak the linker script so that the value of _edata represents the
    decompressor binary's file size rounded up to the appropriate alignment.
    This removes the need to calculate it in the build tool, and will make
    it easier to refer to the file size from the header directly in
    subsequent changes to the PE header layout.
    
    While adding _edata to the sed regex that parses the compressed
    vmlinux's symbol list, tweak the regex a bit for conciseness.
    
    This change has no impact on the resulting bzImage binary when
    configured with CONFIG_EFI_STUB=y.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-14-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Drop PE/COFF .reloc section

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:30 2023 +0000

    x86/boot: Drop PE/COFF .reloc section
    
    commit fa5750521e0a4efbc1af05223da9c4bbd6c21c83 upstream.
    
    Ancient buggy EFI loaders may have required a .reloc section to be
    present at some point in time, but this has not been true for a long
    time so the .reloc section can just be dropped.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-16-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Drop redundant code setting the root device

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:57 2023 +0000

    x86/boot: Drop redundant code setting the root device
    
    commit 7448e8e5d15a3c4df649bf6d6d460f78396f7e1e upstream.
    
    The root device defaults to 0,0 and is no longer configurable at build
    time [0], so there is no need for the build tool to ever write to this
    field.
    
    [0] 079f85e624189292 ("x86, build: Do not set the root_dev field in bzImage")
    
    This change has no impact on the resulting bzImage binary.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230912090051.4014114-23-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Drop references to startup_64

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:59 2023 +0000

    x86/boot: Drop references to startup_64
    
    commit b618d31f112bea3d2daea19190d63e567f32a4db upstream.
    
    The x86 boot image generation tool assigns a default value to startup_64
    and subsequently parses the actual value from zoffset.h but it never
    actually uses the value anywhere. So remove this code.
    
    This change has no impact on the resulting bzImage binary.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230912090051.4014114-25-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Grab kernel_info offset from zoffset header directly

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:25 2023 +0000

    x86/boot: Grab kernel_info offset from zoffset header directly
    
    commit 2e765c02dcbfc2a8a4527c621a84b9502f6b9bd2 upstream.
    
    Instead of parsing zoffset.h and poking the kernel_info offset value
    into the header from the build tool, just grab the value directly in the
    asm file that describes this header.
    
    This change has no impact on the resulting bzImage binary.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-11-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Increase section and file alignment to 4k/512

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:32 2023 +0000

    x86/boot: Increase section and file alignment to 4k/512
    
    commit 3e3eabe26dc88692d34cf76ca0e0dd331481cc15 upstream.
    
    Align x86 with other EFI architectures, and increase the section
    alignment to the EFI page size (4k), so that firmware is able to honour
    the section permission attributes and map code read-only and data
    non-executable.
    
    There are a number of requirements that have to be taken into account:
    - the sign tools get cranky when there are gaps between sections in the
      file view of the image
    - the virtual offset of each section must be aligned to the image's
      section alignment
    - the file offset *and size* of each section must be aligned to the
      image's file alignment
    - the image size must be aligned to the section alignment
    - each section's virtual offset must be greater than or equal to the
      size of the headers.
    
    In order to meet all these requirements, while avoiding the need for
    lots of padding to accommodate the .compat section, the latter is placed
    at an arbitrary offset towards the end of the image, but aligned to the
    minimum file alignment (512 bytes). The space before the .text section
    is therefore distributed between the PE header, the .setup section and
    the .compat section, leaving no gaps in the file coverage, making the
    signing tools happy.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-18-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Omit compression buffer from PE/COFF image memory footprint [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:56 2023 +0000

    x86/boot: Omit compression buffer from PE/COFF image memory footprint
    
    commit 8eace5b3555606e684739bef5bcdfcfe68235257 upstream.
    
    Now that the EFI stub decompresses the kernel and hands over to the
    decompressed image directly, there is no longer a need to provide a
    decompression buffer as part of the .BSS allocation of the PE/COFF
    image. It also means the PE/COFF image can be loaded anywhere in memory,
    and setting the preferred image base is unnecessary. So drop the
    handling of this from the header and from the build tool.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230912090051.4014114-22-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Remove the 'bugger off' message [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:55 2023 +0000

    x86/boot: Remove the 'bugger off' message
    
    commit 768171d7ebbce005210e1cf8456f043304805c15 upstream.
    
    Ancient (pre-2003) x86 kernels could boot from a floppy disk straight from
    the BIOS, using a small real mode boot stub at the start of the image
    where the BIOS would expect the boot record (or boot block) to appear.
    
    Due to its limitations (kernel size < 1 MiB, no support for IDE, USB or
    El Torito floppy emulation), this support was dropped, and a Linux aware
    bootloader is now always required to boot the kernel from a legacy BIOS.
    
    To smooth this transition, the boot stub was not removed entirely, but
    replaced with one that just prints an error message telling the user to
    install a bootloader.
    
    As it is unlikely that anyone doing direct floppy boot with such an
    ancient kernel is going to upgrade to v6.5+ and expect that this boot
    method still works, printing this message is kind of pointless, and so
    it should be possible to remove the logic that emits it.
    
    Let's free up this space so it can be used to expand the PE header in a
    subsequent patch.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
    Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Set EFI handover offset directly in header asm [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:26 2023 +0000

    x86/boot: Set EFI handover offset directly in header asm
    
    commit eac956345f99dda3d68f4ae6cf7b494105e54780 upstream.
    
    The offsets of the EFI handover entrypoints are available to the
    assembler when constructing the header, so there is no need to set them
    from the build tool afterwards.
    
    This change has no impact on the resulting bzImage binary.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-12-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/boot: Split off PE/COFF .data section [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Sep 15 17:16:31 2023 +0000

    x86/boot: Split off PE/COFF .data section
    
    commit 34951f3c28bdf6481d949a20413b2ce7693687b2 upstream.
    
    Describe the code and data of the decompressor binary using separate
    .text and .data PE/COFF sections, so that we will be able to map them
    using restricted permissions once we increase the section and file
    alignment sufficiently. This avoids the need for memory mappings that
    are writable and executable at the same time, which is something that
    is best avoided for security reasons.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230915171623.655440-17-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/efi: Disregard setup header of loaded image [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:53 2023 +0000

    x86/efi: Disregard setup header of loaded image
    
    commit 7e50262229faad0c7b8c54477cd1c883f31cc4a7 upstream.
    
    The native EFI entrypoint does not take a struct boot_params from the
    loader, but instead, it constructs one from scratch, using the setup
    header data placed at the start of the image.
    
    This setup header is placed in a way that permits legacy loaders to
    manipulate the contents (i.e., to pass the kernel command line or the
    address and size of an initial ramdisk), but EFI boot does not use it in
    that way: it only copies the contents that were placed there at build
    time, and EFI loaders will not (and should not) manipulate the setup
    header to configure the boot. (Commit 63bf28ceb3ebbe76 "efi: x86: Wipe
    setup_data on pure EFI boot" deals with some of the fallout of using
    setup_data in a way that breaks EFI boot.)
    
    Given that none of the non-zero values that are copied from the setup
    header into the EFI stub's struct boot_params are relevant to the boot
    now that the EFI stub no longer enters via the legacy decompressor, the
    copy can be omitted altogether.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230912090051.4014114-19-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/efi: Drop alignment flags from PE section headers [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:54 2023 +0000

    x86/efi: Drop alignment flags from PE section headers
    
    commit bfab35f552ab3dd6d017165bf9de1d1d20f198cc upstream.
    
    The section header flags for alignment are documented in the PE/COFF
    spec as being applicable to PE object files only, not to PE executables
    such as the Linux bzImage, so let's drop them from the PE header.
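The alignment flags in question are the `IMAGE_SCN_ALIGN_*` values, which occupy bits 20 to 23 of a section header's Characteristics field (mask `0x00F00000`, e.g. `0x00500000` for 16 bytes and `0x00D00000` for 4096 bytes). The sketch below shows how such a field decodes and how it can be cleared without touching the other flags; the helper names are hypothetical, and this is not the kernel's header code, which is written in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 20-23 of Characteristics hold the IMAGE_SCN_ALIGN_* value, which the
 * PE/COFF spec defines for object files only. */
#define IMAGE_SCN_ALIGN_MASK 0x00F00000u
static_assert(IMAGE_SCN_ALIGN_MASK == (0xFu << 20), "alignment field is bits 20-23");

/* Return the alignment in bytes encoded in the flags, or 0 if none is set.
 * Field value k (1..14) means 2^(k-1) bytes. */
static uint32_t scn_alignment(uint32_t characteristics)
{
    uint32_t field = (characteristics & IMAGE_SCN_ALIGN_MASK) >> 20;
    return field ? (1u << (field - 1)) : 0;
}

/* Clear the alignment bits, leaving all other characteristics intact. */
static uint32_t drop_alignment_flags(uint32_t characteristics)
{
    return characteristics & ~IMAGE_SCN_ALIGN_MASK;
}
```

For example, `0x60500020` (code, execute, read, 16-byte alignment) becomes `0x60000020` once the alignment field is dropped.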
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230912090051.4014114-20-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/efi: Drop EFI stub .bss from .data section [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Tue Sep 12 09:00:52 2023 +0000

    x86/efi: Drop EFI stub .bss from .data section
    
    commit 5f51c5d0e905608ba7be126737f7c84a793ae1aa upstream.
    
    Now that the EFI stub always zero inits its BSS section upon entry,
    there is no longer a need to place the BSS symbols carried by the stub
    into the .data section.
    
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230912090051.4014114-18-ardb@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Mon Feb 5 09:11:07 2024 +0100

    x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
    
    commit 1ad55cecf22f05f1c884adf63cc09d3c3e609ebf upstream.
    
    The .compat section is a dummy PE section that contains the address of
    the 32-bit entrypoint of the 64-bit kernel image if it is bootable from
    32-bit firmware (i.e., CONFIG_EFI_MIXED=y).
    
    This section is only 8 bytes in size and is only referenced from the
    loader, and so it is placed at the end of the memory view of the image,
    to avoid the need for padding it to 4k, which is required for sections
    appearing in the middle of the image.
    
    Unfortunately, this violates the PE/COFF spec, and even though most EFI
    loaders work correctly (including the Tianocore reference
    implementation), PE loaders do exist that reject such images, on the
    basis that both the file and memory views of the file contents should be
    described by the section headers in a monotonically increasing manner
    without leaving any gaps.
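The stricter loaders' rule can be sketched as a check that, taken in header order, the sections tile both the file view and the memory view without gaps. This is a simplified reading of the requirement under stated assumptions: `struct pe_section`, `align_up()` and `sections_monotonic()` are hypothetical names, and the memory view is assumed to advance to the next section-aligned address after each section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified PE section header. */
struct pe_section {
    uint32_t vaddr, vsize;  /* memory view */
    uint32_t foff, fsize;   /* file view */
};

static uint32_t align_up(uint32_t v, uint32_t a)
{
    assert(a != 0);
    return (v + a - 1) / a * a;
}

/* Return 1 if consecutive sections increase monotonically, without gaps,
 * in both the file view and the memory view; 0 otherwise. */
static int sections_monotonic(const struct pe_section *s, int n, uint32_t sect_align)
{
    for (int i = 1; i < n; i++) {
        const struct pe_section *p = &s[i - 1];
        if (s[i].foff != p->foff + p->fsize)
            return 0;  /* file view: gap, overlap or out of order */
        if (s[i].vaddr != align_up(p->vaddr + p->vsize, sect_align))
            return 0;  /* memory view: gap, overlap or out of order */
    }
    return 1;
}
```

A table ordered `.setup`, `.compat`, `.text` whose `.compat` has a small file offset but a virtual address at the end of the image fails this check; giving `.compat` its own 4k page between `.setup` and `.text` passes it, at the cost of the sub-4k padding mentioned below.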
    
    So reorganize the sections to avoid this issue. This results in a slight
    padding overhead (< 4k) which can be avoided if desired by disabling
    CONFIG_EFI_MIXED (which is only needed in rare cases these days).
    
    Fixes: 3e3eabe26dc8 ("x86/boot: Increase section and file alignment to 4k/512")
    Reported-by: Mike Beaton <mjsbeaton@gmail.com>
    Link: https://lkml.kernel.org/r/CAHzAAWQ6srV6LVNdmfbJhOwhBw5ZzxxZZ07aHt9oKkfYAdvuQQ%40mail.gmail.com
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/fpu: Stop relying on userspace for info to fault in xsave buffer [+ + +]

Author: Andrei Vagin <avagin@google.com>
Date:   Mon Jan 29 22:36:03 2024 -0800

    x86/fpu: Stop relying on userspace for info to fault in xsave buffer
    
    commit d877550eaf2dc9090d782864c96939397a3c6835 upstream.
    
    Before this change, the expected size of the user space buffer was
    taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed
    from user-space, so it is possible to construct a sigreturn frame where:
    
     * fx_sw->xstate_size is smaller than the size required by valid bits in
       fx_sw->xfeatures.
     * user-space unmaps parts of the sigframe fpu buffer so that not all of
       the buffer required by xrstor is accessible.
    
    In this case, xrstor tries to restore and accesses the unmapped area
    which results in a fault. But fault_in_readable succeeds because buf +
    fx_sw->xstate_size is within the still mapped area, so it goes back and
    tries xrstor again. It will spin in this loop forever.
    
    Instead, fault in the maximum size which can be touched by XRSTOR (taken
    from fpstate->user_size).
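The livelock and the fix can be reproduced with a toy model of the retry loop. This is not kernel code: `struct sigframe_model`, `fault_in_ok()`, `xrstor_ok()` and `restore()` are hypothetical stand-ins in which the buffer is readable only up to `mapped` bytes, XRSTOR needs `needed` bytes, and an iteration cap replaces the infinite loop so the difference is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the signal frame's FPU buffer (hypothetical). */
struct sigframe_model {
    size_t mapped;       /* bytes user space left mapped */
    size_t needed;       /* bytes XRSTOR touches, derived from xfeatures */
    size_t xstate_size;  /* fx_sw->xstate_size: user-controlled */
    size_t user_size;    /* fpstate->user_size: kernel-controlled maximum */
};

/* Stand-in for fault_in_readable(): succeeds if [0, len) is mapped. */
static bool fault_in_ok(const struct sigframe_model *f, size_t len)
{
    return len <= f->mapped;
}

/* Stand-in for XRSTOR: succeeds if everything it reads is mapped. */
static bool xrstor_ok(const struct sigframe_model *f)
{
    return f->needed <= f->mapped;
}

/* Returns the attempt on which XRSTOR succeeded (> 0), -1 when fault-in
 * fails and the restore gives up, or -2 when the cap is hit (the livelock). */
static int restore(const struct sigframe_model *f, bool use_user_size, int max_tries)
{
    size_t len = use_user_size ? f->user_size : f->xstate_size;

    assert(max_tries > 0);
    for (int tries = 1; tries <= max_tries; tries++) {
        if (xrstor_ok(f))
            return tries;
        if (!fault_in_ok(f, len))
            return -1;
    }
    return -2;
}
```

With `mapped = 512`, `needed = 1024` and a forged `xstate_size = 256`, the old sizing keeps faulting in only the still-mapped prefix and spins until the cap, whereas faulting in `user_size` fails immediately and lets the restore report an error.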
    
    [ dhansen: tweak subject / changelog ]
    
    Fixes: fcb3635f5018 ("x86/fpu/signal: Handle #PF in the direct restore path")
    Reported-by: Konstantin Bogomolov <bogomolov@google.com>
    Suggested-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrei Vagin <avagin@google.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6 [+ + +]

Author: Aleksander Mazur <deweloper@wp.pl>
Date:   Tue Jan 23 14:43:00 2024 +0100

    x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
    
    commit f6a1892585cd19e63c4ef2334e26cd536d5b678d upstream.
    
    The kernel built with MCRUSOE is unbootable on Transmeta Crusoe.  It shows
    the following error message:
    
      This kernel requires an i686 CPU, but only detected an i586 CPU.
      Unable to boot - please use a kernel appropriate for your CPU.
    
    Remove MCRUSOE from the condition introduced by the commit in Fixes, effectively
    changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which matches the
    CPU family given by CPUID.
    
      [ bp: Massage commit message. ]
    
    Fixes: 25d76ac88821 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig")
    Signed-off-by: Aleksander Mazur <deweloper@wp.pl>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: H. Peter Anvin <hpa@zytor.com>
    Cc: <stable@kernel.org>
    Link: https://lore.kernel.org/r/20240123134309.1117782-1-deweloper@wp.pl
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm/ident_map: Use gbpages only where full GB page should be mapped. [+ + +]

Author: Steve Wahl <steve.wahl@hpe.com>
Date:   Fri Jan 26 10:48:41 2024 -0600

    x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
    
    commit d794734c9bbfe22f86686dc2909c25f5ffe1a572 upstream.
    
    When ident_pud_init() uses only gbpages to create identity maps, large
    ranges of addresses not actually requested can be included in the
    resulting table; a 4K request will map a full GB.  On UV systems, this
    ends up including regions that will cause hardware to halt the system
    if accessed (these are marked "reserved" by BIOS).  Even processor
    speculation into these regions is enough to trigger the system halt.
    
    Only use gbpages when map creation requests include the full GB page
    of space.  Fall back to using smaller 2M pages when only portions of a
    GB page are included in the request.
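The rule above boils down to a per-chunk test: a 1 GiB page is used only when the PUD-sized chunk being mapped both starts and ends on a 1 GiB boundary. The sketch below expresses that condition in plain C; `use_gbpage()` and `pud_chunk_end()` are hypothetical helpers, and `PUD_SIZE`/`PUD_MASK` are redefined locally with their x86-64 meaning rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PUD_SIZE (1ULL << 30)          /* one PUD entry maps 1 GiB */
#define PUD_MASK (~(PUD_SIZE - 1))

/* End of the PUD-level chunk that starts at addr, clipped to the request. */
static uint64_t pud_chunk_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr & PUD_MASK) + PUD_SIZE;

    assert(addr < end);
    return boundary < end ? boundary : end;
}

/* A gbpage may map [addr, next) only if gbpages are enabled and the chunk
 * covers the whole aligned GiB; otherwise fall back to 2 MiB (PMD) pages. */
static bool use_gbpage(bool gbpages_enabled, uint64_t addr, uint64_t next)
{
    return gbpages_enabled &&
           (addr & ~PUD_MASK) == 0 &&
           (next & ~PUD_MASK) == 0;
}
```

A 4K request at 1 GiB yields the chunk `[0x40000000, 0x40001000)`, which fails the end-alignment test, so only 2 MiB pages are used instead of pulling in the whole surrounding GiB.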
    
    No attempt is made to coalesce mapping requests. If a request requires
    a map entry at the 2M (pmd) level, subsequent mapping requests within
    the same 1G region will also be at the pmd level, even if adjacent or
    overlapping such requests could have been combined to map a full
    gbpage.  Existing usage starts with larger regions and then adds
    smaller regions, so this should not have any great consequence.
    
    [ dhansen: fix up comment formatting, simplify changelog ]
    
    Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen-netback: properly sync TX responses [+ + +]

Author: Jan Beulich <jbeulich@suse.com>
Date:   Mon Jan 29 14:03:08 2024 +0100

    xen-netback: properly sync TX responses
    
    commit 7b55984c96ffe9e236eb9c82a2196e0b1f84990d upstream.
    
    Invoking the make_tx_response() / push_tx_responses() pair with no lock
    held would be acceptable only if all such invocations happened from the
    same context (NAPI instance or dealloc thread). Since this isn't the
    case, and since the interface "spec" also doesn't demand that multicast
    operations may only be performed with no in-flight transmits,
    MCAST_{ADD,DEL} processing also needs to acquire the response lock
    around the invocations.
    
    To prevent similar mistakes going forward, "downgrade" the present
    functions to private helpers of just the two remaining ones using them
    directly, with no forward declarations anymore. This involves renaming
    the former make_tx_response(), so that a new function of that name can
    serve the new (wrapper) purpose.
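The wrapper pattern described above can be sketched in user space, with a pthread mutex standing in for the kernel's response spinlock. Everything here is a hypothetical, heavily simplified model (`struct tx_queue`, its counters and the helper bodies are not the driver's code); it only shows the structure: the make/push helpers are private and assume the lock is held, and every caller goes through the locking wrapper.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of a TX response ring. */
struct tx_queue {
    pthread_mutex_t response_lock;  /* stands in for the response spinlock */
    unsigned int rsp_prod_pvt;      /* responses produced, not yet pushed */
    unsigned int rsp_prod;          /* responses made visible to the frontend */
};

/* Private helper: caller must hold response_lock. */
static void _make_tx_response(struct tx_queue *q, int status)
{
    (void)status;                   /* a real ring slot would record it */
    q->rsp_prod_pvt++;
}

/* Private helper: caller must hold response_lock. */
static void push_tx_responses(struct tx_queue *q)
{
    assert(q->rsp_prod_pvt >= q->rsp_prod);  /* never publish more than produced */
    q->rsp_prod = q->rsp_prod_pvt;
}

/* Public wrapper used by every context (NAPI, dealloc thread and
 * MCAST_{ADD,DEL} processing), so the make/push pair is always serialised. */
static void make_tx_response(struct tx_queue *q, int status)
{
    pthread_mutex_lock(&q->response_lock);
    _make_tx_response(q, status);
    push_tx_responses(q);
    pthread_mutex_unlock(&q->response_lock);
}
```

Keeping the unlocked helpers `static` and reachable only through the wrapper is what makes the "no lock held" mistake hard to repeat.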
    
    While there,
    - constify the txp parameters,
    - correct xenvif_idx_release()'s status parameter's type,
    - rename {,_}make_tx_response()'s status parameters for consistency with
      xenvif_idx_release()'s.
    
    Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Paul Durrant <paul@xen.org>
    Link: https://lore.kernel.org/r/980c6c3d-e10e-4459-8565-e8fbde122f00@suse.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

zonefs: Improve error handling [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Thu Feb 8 17:26:59 2024 +0900

    zonefs: Improve error handling
    
    commit 14db5f64a971fce3d8ea35de4dfc7f443a3efb92 upstream.
    
    Write error handling is racy and can sometimes lead to the error recovery
    path changing the inode size of a sequential zone file to an
    incorrect value, which results in garbage data being readable at the end
    of a file. There are 2 problems:
    
    1) zonefs_file_dio_write() updates a zone file write pointer offset
       after issuing a direct IO with iomap_dio_rw(). For synchronous direct
       writes, this update is done only if the IO succeeds. For
       asynchronous direct writes, however, the update is done without
       waiting for the IO completion so that the next asynchronous IO can be
       immediately issued. If an asynchronous IO completes with a failure
       right before the update (protected by the i_truncate_mutex lock) is
       applied, the update may change the value of the inode write pointer
       offset that was corrected by the error path (zonefs_io_error() function).
    
    2) zonefs_io_error() is called when a read or write error occurs. This
       function executes a report zone operation using the callback function
       zonefs_io_error_cb(), which does all the error recovery handling
       based on the current zone condition, write pointer position and
       according to the mount options being used. However, depending on the
       zoned device being used, a report zone callback may be executed in a
       context that is different from the context of __zonefs_io_error(). As
       a result, zonefs_io_error_cb() may be executed without the inode
       truncate mutex lock held, which can lead to invalid error processing.
    
    Fix both problems as follows:
    - Problem 1: Perform the inode write pointer offset update before a
      direct write is issued with iomap_dio_rw(). This is safe to do as
      partial direct writes are not supported (IOMAP_DIO_PARTIAL is not
      set) and any failed IO will trigger the execution of zonefs_io_error()
      which will correct the inode write pointer offset to reflect the
      current state of the one on the device.
    - Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error()
      and call this function directly from __zonefs_io_error() after
      obtaining the zone information using blkdev_report_zones() with a
      simple callback function that copies the struct blk_zone obtained
      from the device into a local stack variable. This ensures that error
      handling is performed holding the inode truncate mutex.
      This change also simplifies error handling for conventional zone files
      by bypassing the execution of report zones entirely. This is safe to
      do because the condition of conventional zones cannot be read-only or
      offline and conventional zone files are always fully mapped with a
      constant file size.
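The Problem 2 fix follows a common pattern: the report-zones callback does nothing but copy the zone descriptor into the caller's stack variable, and all decisions are made afterwards in the caller's own context, with the lock held. The sketch below models that in user space. `struct blk_zone` is trimmed to a few fields, `mock_report_zones()` replaces blkdev_report_zones(), a pthread mutex stands in for i_truncate_mutex, and `handle_io_error()` is a hypothetical, much simpler stand-in for zonefs_handle_io_error() that only resynchronises the write pointer offset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for kernel types (not the real definitions). */
struct blk_zone { uint64_t start, len, wp; int cond; };  /* sectors */
typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx, void *data);

/* Mock device report: calls cb once; a real driver may do so from another context. */
static int mock_report_zones(const struct blk_zone *dev_zone, report_zones_cb cb, void *data)
{
    struct blk_zone tmp = *dev_zone;
    int ret = cb(&tmp, 0, data);
    return ret ? ret : 1;  /* number of zones reported, or the callback's error */
}

/* Simple callback: only copies the reported zone into the caller's storage. */
static int copy_zone_cb(struct blk_zone *zone, unsigned int idx, void *data)
{
    (void)idx;
    memcpy(data, zone, sizeof(*zone));
    return 0;
}

struct zone_file {
    pthread_mutex_t truncate_mutex;  /* models i_truncate_mutex */
    uint64_t wpoffset;               /* inode write pointer offset, bytes */
};

/* Runs in the caller's context; resyncs the offset from the device's wp. */
static void handle_io_error(struct zone_file *zf, const struct blk_zone *z)
{
    assert(z->wp >= z->start);
    zf->wpoffset = (z->wp - z->start) << 9;  /* 512-byte sectors to bytes */
}

static int io_error(struct zone_file *zf, const struct blk_zone *dev_zone)
{
    struct blk_zone zone;  /* local stack copy filled by the callback */
    int ret;

    pthread_mutex_lock(&zf->truncate_mutex);
    ret = mock_report_zones(dev_zone, copy_zone_cb, &zone);
    if (ret == 1)
        handle_io_error(zf, &zone);
    pthread_mutex_unlock(&zf->truncate_mutex);
    return ret == 1 ? 0 : -1;
}
```

Because the callback has no side effects beyond the copy, it no longer matters which context the driver invokes it from.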
    
    Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Changelog for Linux 6.6.18