
DLPX-86177 Azure Accelerated networking broken because Mellanox drivers absent in kernel #27

Merged
palash-gandhi merged 1 commit into develop from dlpx/pr/pgandhi-delphix/4df2fea5-f639-4e7c-a411-61d5a4698992 on May 22, 2023

palash-gandhi commented May 20, 2023

Problem

ESCL-4467 was filed after a customer saw no throughput improvement from accelerated networking (AN). There were other indications that the kernel was ignoring the new virtual device.

In 7.0, we disabled a number of kernel modules as part of DLPX-83442 "Disable various kernel modules which we don't use" (delphix/linux-kernel-azure Pull Request #14, by prakashsurya). That change also disabled the Mellanox drivers, which broke AN.

Solution

Re-enable the Mellanox modules required for AN.

Testing Done

ab-pre-push: http://selfservice.jenkins.delphix.com/job/appliance-build-orchestrator-pre-push/5508/

delphix@pg-develop-mlx-fix:~$ get-appliance-version
12.0.0.0-snapshot.20230520044431399+jenkins-selfservice-appliance-build-develop-pre-push-211

delphix@pg-develop-mlx-fix:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:fc:df:0c brd ff:ff:ff:ff:ff:ff
    inet 10.39.241.180/20 brd 10.39.255.255 scope global eth0
       valid_lft forever preferred_lft forever
3: enP38618s1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth0 state UP group default qlen 1000
    link/ether 00:0d:3a:fc:df:0c brd ff:ff:ff:ff:ff:ff

delphix@pg-develop-mlx-fix:~$ ethtool -S eth0 | grep vf_
     vf_rx_packets: 523
     vf_rx_bytes: 73079
     vf_tx_packets: 799
     vf_tx_bytes: 166492
     vf_tx_dropped: 0
     ...

delphix@pg-develop-mlx-fix:~$ grep MLX5 /boot/config-5.4.0-1107-dx2023052002-113599c15-azure
CONFIG_MLX5_CORE=m
CONFIG_MLX5_ACCEL=y
CONFIG_MLX5_FPGA=y
CONFIG_MLX5_CORE_EN=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_MLX5_MPFS=y
CONFIG_MLX5_ESWITCH=y
CONFIG_MLX5_CORE_EN_DCB=y
CONFIG_MLX5_CORE_IPOIB=y
CONFIG_MLX5_FPGA_IPSEC=y
CONFIG_MLX5_EN_IPSEC=y
CONFIG_MLX5_FPGA_TLS=y
CONFIG_MLX5_TLS=y
CONFIG_MLX5_EN_TLS=y
CONFIG_MLX5_SW_STEERING=y


delphix@pg-develop-mlx-fix:~$ ls /lib/modules/5.4.0-1107-dx2023052002-113599c15-azure/kernel/drivers/net/ethernet/mellanox/
mlx4  mlx5  mlxfw  mlxsw
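
The manual checks above could be wrapped in small helpers so they can be rerun on other builds. This is only a sketch: the function names are hypothetical and not part of this change, and each helper reads command output on stdin so it can be tried against captured text as well as a live VM.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers mirroring the manual checks above.

# Succeeds if `ip a` output (on stdin) contains an interface flagged SLAVE
# whose master is the device named in $1 (e.g. enP38618s1 with master eth0).
has_vf_slave() {
    grep -Eq "^[0-9]+: [^:]+: <[^>]*SLAVE[^>]*>.* master $1 "
}

# Prints the value of the counter named in $1 from `ethtool -S` output
# (on stdin); fails if the counter is not present.
vf_counter() {
    awk -v key="$1:" '$1 == key { print $2; found = 1 } END { exit !found }'
}

# Succeeds if the kernel config (on stdin) has CONFIG_MLX5_CORE built in (=y)
# or built as a module (=m).
mlx5_core_enabled() {
    grep -Eq '^CONFIG_MLX5_CORE=[ym]$'
}
```

On a live system these would be fed from the same commands used above, for example `ip a | has_vf_slave eth0` and `ethtool -S eth0 | vf_counter vf_rx_packets`.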

palash-gandhi force-pushed the dlpx/pr/pgandhi-delphix/4df2fea5-f639-4e7c-a411-61d5a4698992 branch from e0079ca to 13a7350 May 20, 2023 18:16
palash-gandhi marked this pull request as ready for review May 20, 2023 18:20

sebroy commented May 22, 2023

@pgandhi-delphix @david-mendez1, do we have no end-to-end automation that tests Azure accelerated networking? If that's a gap, can we plan on filling it?

palash-gandhi commented

> @pgandhi-delphix @david-mendez1, do we have no end-to-end automation that tests Azure accelerated networking? If that's a gap, can we plan on filling it?

@sebroy, we do have a test, but it is incomplete. I have filed https://delphix.atlassian.net/browse/QA-41154 to add more checks to it.
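
As an illustration of the kind of extra check such a test could make (this is an assumption, not the content of QA-41154), one option is to confirm that a `vf_` counter actually grows while traffic is generated. The helper names and the ping target argument below are hypothetical.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not taken from the existing test.

# Succeeds if counter sample $2 is strictly greater than sample $1.
# Returns 2 if either sample is empty or not a non-negative integer.
counter_grew() {
    [ -n "$1" ] && [ -n "$2" ] || return 2
    case "$1$2" in
        *[!0-9]*) return 2 ;;
    esac
    [ "$2" -gt "$1" ]
}

# Samples vf_tx_packets on device $1, pings host $2 a few times, then samples
# again. Assumes ethtool and ping are installed and that $2 is reachable.
vf_traffic_flows() {
    before=$(ethtool -S "$1" | awk '$1 == "vf_tx_packets:" { print $2 }')
    ping -c 3 -q "$2" >/dev/null 2>&1 || true
    after=$(ethtool -S "$1" | awk '$1 == "vf_tx_packets:" { print $2 }')
    counter_grew "$before" "$after"
}
```

If the Mellanox driver is missing, no `vf_` counters are expected to advance, so `vf_traffic_flows eth0 <some-reachable-host>` would fail rather than pass silently.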

prakashsurya commented

This might not be necessary, but we might want to port this to all the other platforms simply for consistency.

palash-gandhi merged commit 8dcd234 into develop May 22, 2023
palash-gandhi deleted the dlpx/pr/pgandhi-delphix/4df2fea5-f639-4e7c-a411-61d5a4698992 branch May 22, 2023 17:48
jwk404 pushed a commit to jwk404/linux-kernel-azure that referenced this pull request Mar 20, 2024
jwk404 pushed a commit to jwk404/linux-kernel-azure that referenced this pull request Mar 23, 2024
jwk404 pushed a commit to jwk404/linux-kernel-azure that referenced this pull request Mar 25, 2024