popen pm #476

Closed
plasorak wants to merge 29 commits into develop from plasorak/popen-pm

Conversation

@plasorak
Collaborator

This PR creates a new process manager that doesn't use SSH, just bare subprocesses on the host it is running on. Of course this only works for processes that need to be started on localhost, and the process manager should throw an error if you try to launch a process anywhere else.
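
A minimal sketch of that idea, for illustration only: it is not the code in this PR, and `LOCAL_HOSTS`, `is_local` and `launch_local` are hypothetical names. It assumes the requested host is available as a plain string and shows a bare `subprocess.Popen` launch that refuses anything that is not the local machine.

```python
import socket
import subprocess

# Hypothetical helper names; none of this is drunc's actual API.
# Names under which we accept "this machine".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", socket.gethostname(), socket.getfqdn()}


def is_local(host: str) -> bool:
    """Return True if `host` names the machine we are running on."""
    return host in LOCAL_HOSTS


def launch_local(host: str, argv: list[str]) -> subprocess.Popen:
    """Start `argv` as a bare subprocess, refusing any non-local host."""
    if not is_local(host):
        raise ValueError(f"cannot start a process on {host!r}: only localhost is supported")
    # Merge stderr into stdout so one pipe carries the whole log.
    return subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
```

Calling `launch_local("localhost", [...])` returns the `Popen` handle, while any other host raises before anything is started.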

To test, run:

drunc-unified-shell popen-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test

Integration tests can also run with it after a very simple modification; PR incoming.

This would remove all the SSH shenanigans that happen with integrationtests.

@PawelPlesniak
Collaborator

Tested with full integration test - as expected for SSH process manager testing, nothing has changed. Also tested with a local interactive run using local-1x1-config and ehn1-local-1x1-config.
Found missing host name in the ps table.

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┓
┃ session   ┃ friendly name           ┃ user     ┃ host ┃ uuid   ┃ alive ┃ exit-code ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━┩
│ PawelTest │ root-controller         │ pplesnia │      │ 896103 │ False │ -9        │
│ PawelTest │   ru-controller         │ pplesnia │      │ 896135 │ False │ -9        │
│ PawelTest │     ru-01               │ pplesnia │      │ 896139 │ False │ -9        │
│ PawelTest │   df-controller         │ pplesnia │      │ 896159 │ False │ -9        │
│ PawelTest │     tp-stream-writer    │ pplesnia │      │ 896162 │ False │ -9        │
│ PawelTest │     dfo-01              │ pplesnia │      │ 896197 │ False │ -9        │
│ PawelTest │     df-01               │ pplesnia │      │ 896259 │ False │ -9        │
│ PawelTest │   trg-controller        │ pplesnia │      │ 896344 │ False │ -9        │
│ PawelTest │     tc-maker-1          │ pplesnia │      │ 896464 │ False │ -9        │
│ PawelTest │     mlt                 │ pplesnia │      │ 896511 │ False │ -9        │
│ PawelTest │   hsi-fake-controller   │ pplesnia │      │ 896523 │ False │ -9        │
│ PawelTest │     hsi-fake-01         │ pplesnia │      │ 896539 │ False │ -9        │
│ PawelTest │     hsi-fake-to-tc-app  │ pplesnia │      │ 896597 │ False │ -9        │
│ PawelTest │ local-connection-server │ pplesnia │      │ 896024 │ False │ -9        │
└───────────┴─────────────────────────┴──────────┴──────┴────────┴───────┴───────────┘
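
One possible way to fill the empty `host` column, sketched under assumptions: because every process runs on the local machine, the host can be taken from `socket.gethostname()` when the row is built. `ProcessRow` and `describe_local_process` are hypothetical names, not drunc types.

```python
import getpass
import socket
from dataclasses import dataclass


@dataclass
class ProcessRow:
    """Hypothetical stand-in for one row of the ps table."""
    session: str
    name: str
    user: str
    host: str
    pid: int


def describe_local_process(session: str, name: str, pid: int) -> ProcessRow:
    # Every process is local, so the host is simply this machine's name.
    return ProcessRow(
        session=session,
        name=name,
        user=getpass.getuser(),
        host=socket.gethostname(),
        pid=pid,
    )
```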

Also tried doing

drunc-unified-shell popen-standalone config/daqsystemtest/example-configs.data.xml ehn1-local-1x1-config PawelTest

which failed on boot. Checking the logs also failed, with:

drunc-unified-shell > logs --name root-controller
object async_generator can't be used in 'await' expression
Traceback (most recent call last):
  File "/nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/sourcecode/drunc/src/drunc/broadcast/server/decorators.py", line 91, in wrap
    async for a in cmd(obj, request, context):
  File "/nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/sourcecode/drunc/src/drunc/authoriser/decorators.py", line 67, in check_token
    async for a in cmd(obj, request, context):
  File "/nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/sourcecode/drunc/src/drunc/process_manager/process_manager.py", line 542, in logs
    async for r in await self._logs_impl(data):
TypeError: object async_generator can't be used in 'await' expression
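
The error message says that `self._logs_impl(data)` returned an async generator, which can be iterated with `async for` but cannot be awaited. The standalone sketch below reproduces the failure and shows one possible fix (dropping the `await`). The `Demo` class and `collect` helper are hypothetical, and the right fix in drunc may differ if other process managers implement `_logs_impl` as a coroutine that returns an iterator.

```python
import asyncio


class Demo:
    async def _logs_impl(self, data):
        # `async def` with `yield` makes this an async generator function:
        # calling it returns an async generator, which is not awaitable.
        for line in data:
            yield line

    async def logs_broken(self, data):
        # Mirrors the failing line in the traceback: raises TypeError.
        async for r in await self._logs_impl(data):
            yield r

    async def logs_fixed(self, data):
        # Iterate the async generator directly, without `await`.
        async for r in self._logs_impl(data):
            yield r


async def collect(agen):
    """Drain an async generator into a list."""
    return [item async for item in agen]
```

Running `asyncio.run(collect(Demo().logs_fixed(["a", "b"])))` yields the lines, while the `logs_broken` variant raises `TypeError` on the first iteration.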

@PawelPlesniak
Collaborator

PawelPlesniak commented Jul 2, 2025

Testing also raises issues with how the subprocesses are exited

pplesnia 1000263  0.0  0.1 2084988 105660 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n ru-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000264  0.0  0.1 2084988 105020 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n ru-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000308  0.0  0.1 2233476 104720 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n df-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000309  0.0  0.1 2233476 104132 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n df-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000462  0.0  0.1 2159744 104132 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n trg-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000467  0.0  0.1 2159744 103524 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n trg-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000517  0.0  0.1 2158728 104076 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO
pplesnia 1000518  0.0  0.1 2158728 103436 ?      S    12:18   0:00 /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/python /nfs/home/pplesnia/nightlyDev/fddaq-v5.3.2-rc4-a9/.venv/bin/drunc-controller -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-controller -c grpc://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml -l INFO

This is found after shutdown

[pplesnia@np04-srv-019 nightlyDev]$ ps aux | grep pplesnia | grep daq_application
pplesnia  896144  387  6.9 9442920 4516224 ?     Sl   12:03  71:15 daq_application -s PawelTest -k local-1x1-config -n ru-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896164  3.7  0.2 1119832 171252 ?      Sl   12:03   0:41 daq_application -s PawelTest -k local-1x1-config -n tp-stream-writer -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896217  2.0  0.2 1586572 145064 ?      Sl   12:03   0:22 daq_application -s PawelTest -k local-1x1-config -n dfo-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896302  4.4  0.2 1130516 173964 ?      Sl   12:03   0:48 daq_application -s PawelTest -k local-1x1-config -n df-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896479  5.7  0.2 1532352 191428 ?      Sl   12:03   1:03 daq_application -s PawelTest -k local-1x1-config -n tc-maker-1 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896518  9.0  0.3 1903656 204664 ?      Sl   12:03   1:39 daq_application -s PawelTest -k local-1x1-config -n mlt -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896560  2.9  0.3 1618320 207360 ?      Sl   12:03   0:32 daq_application -s PawelTest -k local-1x1-config -n hsi-fake-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  896636  0.9  0.2 1388996 143672 ?      Sl   12:03   0:10 daq_application -s PawelTest -k local-1x1-config -n hsi-fake-to-tc-app -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927227 51.8  0.7 4702988 479020 ?      Sl   12:07   7:22 daq_application -s PawelTest -k ehn1-local-1x1-config -n ru-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927234  1.2  0.2 1337984 158180 ?      Sl   12:07   0:10 daq_application -s PawelTest -k ehn1-local-1x1-config -n tp-stream-writer -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927237  0.5  0.2 1411504 145996 ?      Sl   12:07   0:04 daq_application -s PawelTest -k ehn1-local-1x1-config -n dfo-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927244  1.8  0.2 1569840 192444 ?      Sl   12:07   0:15 daq_application -s PawelTest -k ehn1-local-1x1-config -n df-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927271  2.2  0.2 1725656 171396 ?      Sl   12:07   0:19 daq_application -s PawelTest -k ehn1-local-1x1-config -n tc-maker-1 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927274  4.5  0.3 2768492 237180 ?      Sl   12:07   0:38 daq_application -s PawelTest -k ehn1-local-1x1-config -n mlt -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927328  1.8  0.3 2062412 225308 ?      Sl   12:07   0:16 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  927447  0.7  0.2 1336804 150520 ?      Sl   12:07   0:06 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-to-tc-app -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972217  0.3  0.5 5235860 329872 ?      Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n ru-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972295  0.3  0.2 918944 137856 ?       Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tp-stream-writer -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972351  0.3  0.2 919284 140884 ?       Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n dfo-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972394  0.3  0.2 921428 138768 ?       Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n df-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972518  0.3  0.2 1037080 167988 ?      Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tc-maker-1 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972562  0.4  0.2 1113632 169948 ?      Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n mlt -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972650  0.3  0.2 966484 171508 ?       Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  972722  0.3  0.2 984536 138308 ?       Sl   12:14   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-to-tc-app -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978142  0.9  0.5 5604744 347420 ?      Sl   12:15   0:03 daq_application -s PawelTest -k ehn1-local-1x1-config -n ru-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978149  0.3  0.2 927136 144200 ?       Sl   12:15   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tp-stream-writer -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978152  0.4  0.2 1354388 152656 ?      Sl   12:15   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n dfo-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978157  0.4  0.2 1288512 149052 ?      Sl   12:15   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n df-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978261  0.4  0.2 1288476 158388 ?      Sl   12:15   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tc-maker-1 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978317  0.5  0.2 1431644 160248 ?      Sl   12:15   0:02 daq_application -s PawelTest -k ehn1-local-1x1-config -n mlt -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978421  0.5  0.2 1306256 180320 ?      Sl   12:15   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  978473  0.4  0.2 1287956 162784 ?      Sl   12:15   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-to-tc-app -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  993734  1.4  0.4 5604744 280260 ?      Sl   12:17   0:03 daq_application -s PawelTest -k ehn1-local-1x1-config -n ru-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  994534  0.4  0.1 927136 128172 ?       Sl   12:17   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tp-stream-writer -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  994745  0.5  0.2 1354404 142852 ?      Sl   12:17   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n dfo-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  994977  0.5  0.2 1288512 135720 ?      Sl   12:17   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n df-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  995804  0.5  0.2 1288476 136272 ?      Sl   12:17   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tc-maker-1 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999416  1.4  0.4 5604752 282216 ?      Sl   12:18   0:03 daq_application -s PawelTest -k ehn1-local-1x1-config -n ru-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999434  0.4  0.1 927136 128880 ?       Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tp-stream-writer -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999445  0.6  0.2 1354404 135724 ?      Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n dfo-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999475  0.6  0.2 1288512 135336 ?      Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n df-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999530  0.6  0.2 1288476 139552 ?      Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n tc-maker-1 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999602  0.7  0.2 1431664 142152 ?      Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n mlt -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999687  0.6  0.2 1306256 146576 ?      Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-01 -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml
pplesnia  999734  0.6  0.2 1287956 135740 ?      Sl   12:18   0:01 daq_application -s PawelTest -k ehn1-local-1x1-config -n hsi-fake-to-tc-app -c rest://localhost:0 -d oksconflibs:config/daqsystemtest/example-configs.data.xml

@plasorak
Collaborator Author

plasorak commented Jul 2, 2025

Huhuh, I thought I had fixed that problem of lingering processes. That's not good
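
A common POSIX pattern for avoiding lingering descendants, sketched here as an assumption rather than as what this PR does: start each child in its own session with `start_new_session=True`, then signal the whole process group on shutdown. If the `uuid` column in the table above holds PIDs, `ru-01` is tracked as 896139 while the lingering `daq_application` for `ru-01` is 896144, which would be consistent with only a wrapper process being killed. `start_in_own_group` and `stop_group` are hypothetical names.

```python
import os
import signal
import subprocess


def start_in_own_group(argv):
    """Start `argv` as the leader of a new session (and process group)."""
    return subprocess.Popen(argv, start_new_session=True)


def stop_group(proc, grace=5.0):
    """SIGTERM the whole group, then SIGKILL anything still in it."""
    pgid = proc.pid  # the leader's PID is the group ID after start_new_session
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(pgid, sig)
        except ProcessLookupError:
            break  # no process is left in the group
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            pass  # leader still alive; escalate to the next signal
    return proc.wait()  # reap the leader and return its exit code
```

In this sketch the grace period applies only to the group leader: once the leader has exited, any remaining group members receive `SIGKILL` immediately. A descendant that moves itself into another session or process group would not be reached by `os.killpg`.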

@PawelPlesniak
Collaborator

Processes are still lingering

@PawelPlesniak
Collaborator

processes are still lingering as of now

jcfreeman2 pushed a commit to DUNE-DAQ/integrationtest that referenced this pull request Jul 31, 2025
…sts can work with DUNE-DAQ/drunc#476 (provided develop is merged into the corresponding drunc feature branch)
PawelPlesniak self-assigned this Sep 17, 2025
@PawelPlesniak
Collaborator

Development on hold - addressing #590 first

@PawelPlesniak
Collaborator

On hold until controller cleanup is complete (#607)


2 participants