-
Notifications
You must be signed in to change notification settings - Fork 1.3k
agent, server: improve packet framing and use TLS 1.3 #11503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This pull request refactors the TLS framing and buffer management in the `Link` class to improve correctness and maintainability, and updates the SSL context initialization to use TLS 1.3 for enhanced security. CloudStack uses a 4-byte header for TLS packets. Earlier, it was not sent within the TLS application data, which affected maintainability and the implementation of agent-server communication using a different language. The most important changes are grouped below. * Reworked the TLS buffer handling in `Link.java`, replacing legacy header and packet assembly logic with a more robust system using `netBuffer`, `appBuffer`, and an explicit `headerBuffer` for frame length management. This improves frame parsing and avoids buffer overflows. * Refactored the read and write logic: the `read` method now correctly assembles frames from TLS streams, handling buffer resizing and edge cases, while the `doWrite` method builds TLS packets with a 4-byte length header and payload, ensuring correct framing and handshake handling. * Simplified the message sending and writing logic by removing manual header prepending and using the new framing system; the write queue now contains only payload buffers, and the header is added during the TLS wrap process. * Updated SSL context initialization in `Link.java` to use `SSLUtils.getSSLContextWithLatestVersion()`, ensuring that TLS 1.3 is used for all server, client, and management SSL contexts. * Added a new method `getSSLContextWithLatestVersion()` in `SSLUtils.java`, which returns an `SSLContext` instance for TLS 1.3. Signed-off-by: Abhishek Kumar <[email protected]>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #11503 +/- ##
============================================
- Coverage 17.76% 17.76% -0.01%
+ Complexity 15859 15858 -1
============================================
Files 5923 5923
Lines 530470 530495 +25
Branches 64823 64825 +2
============================================
- Hits 94243 94239 -4
- Misses 425682 425712 +30
+ Partials 10545 10544 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@blueorangutan package |
|
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14723 |
Signed-off-by: Abhishek Kumar <[email protected]>
|
@blueorangutan test |
|
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
@blueorangutan test |
|
@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-14133)
|
|
[LL] Trillian Build Failed (tid-7129) |
|
@blueorangutan package |
|
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15690 |
|
@blueorangutan test |
|
@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
@blueorangutan test |
|
@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
Some issue with smoke test runs. Will investigate and make the required fixes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR upgrades the CloudStack agent-server communication from TLS 1.2 to TLS 1.3 and refactors packet framing to include the 4-byte length header within TLS application data rather than outside it. This improves maintainability and enables proper TLS 1.3 operation. The changes include significant rewrites to buffer management in the Link class and updates to SSL context initialization.
Key Changes:
- Introduced
getSSLContextWithLatestProtocolVersion()method to support TLS 1.3 contexts - Refactored Link class buffer management to use explicit headerBuffer, netBuffer, and appBuffer with frame-aware reading logic
- Modified doWrite to prepend the 4-byte frame header within TLS application data before encryption
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| utils/src/main/java/org/apache/cloudstack/utils/security/SSLUtils.java | Adds new method to create TLS 1.3 SSL contexts |
| utils/src/main/java/com/cloud/utils/nio/NioConnection.java | Enhances SSL handshake logging with protocol and cipher suite details |
| utils/src/main/java/com/cloud/utils/nio/NioClient.java | Enhances SSL handshake logging with protocol and cipher suite details |
| utils/src/main/java/com/cloud/utils/nio/Link.java | Major refactor: updates buffer management, frame parsing, read/write logic, and switches all SSL contexts to TLS 1.3 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| appBuffer.compact(); | ||
| break; | ||
| case BUFFER_OVERFLOW: | ||
| appBuffer = enlargeBuffer(appBuffer, _sslEngine.getSession().getApplicationBufferSize()); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BUFFER_OVERFLOW case allocates a new buffer without preserving data from appBuffer. When overflow occurs during unwrap, appBuffer may contain unconsumed application data from a previous compact operation (line 261). The enlargeBuffer call discards this data, corrupting the frame assembly. The data should be preserved by flipping appBuffer, copying to the new buffer, and then compacting.
| appBuffer = enlargeBuffer(appBuffer, _sslEngine.getSession().getApplicationBufferSize()); | |
| // Preserve existing data in appBuffer while enlarging it for the next unwrap | |
| appBuffer.flip(); | |
| int requiredCapacity = _sslEngine.getSession().getApplicationBufferSize() + appBuffer.remaining(); | |
| ByteBuffer newAppBuffer = ByteBuffer.allocate(requiredCapacity); | |
| newAppBuffer.put(appBuffer); | |
| appBuffer = newAppBuffer; | |
| appBuffer.compact(); |
| public static SSLContext getSSLContextWithLatestProtocolVersion() throws NoSuchAlgorithmException { | ||
| return SSLContext.getInstance("TLSv1.3"); | ||
| } |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name in the PR description says 'getSSLContextWithLatestVersion()' but the actual implementation uses 'getSSLContextWithLatestProtocolVersion()'. The description should be updated to match the implementation, or vice versa.
| dataRemaining -= count; | ||
| if (res.getHandshakeStatus() == HandshakeStatus.NEED_UNWRAP) { | ||
| // Unusual during application writes; upper layer should drive handshake | ||
| break; |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When NEED_UNWRAP occurs during write, the method breaks out of the loop without completing the write operation. This leaves some data in appSeq unsent, but the caller assumes the write succeeded. During renegotiation, NEED_UNWRAP can occur during wrap, and the handshake must be driven to completion before continuing the application data write. Consider throwing an exception or implementing handshake handling to ensure data is not silently dropped.
| break; | |
| throw new IOException("SSLEngine reported NEED_UNWRAP during write; renegotiation during write is not supported and application data may remain unsent"); |
| public static SSLContext getSSLContext() throws NoSuchAlgorithmException { | ||
| return SSLContext.getInstance("TLSv1.2"); | ||
| } | ||
|
|
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new method getSSLContextWithLatestProtocolVersion lacks documentation. Add a Javadoc comment explaining that it returns an SSLContext configured for TLS 1.3, and clarify when this should be used versus the TLS 1.2 method to help maintainers understand the distinction.
| /** | |
| * Returns an {@link SSLContext} configured for the latest supported TLS protocol version (currently TLSv1.3). | |
| * <p> | |
| * Use this method when both client and server support TLSv1.3 and you want to take advantage of the | |
| * latest protocol improvements. For environments that require or are limited to TLSv1.2, use | |
| * {@link #getSSLContext()} instead. | |
| * | |
| * @return an {@code SSLContext} instance using the {@code TLSv1.3} protocol | |
| * @throws NoSuchAlgorithmException if the TLSv1.3 protocol is not available in the current environment | |
| */ |
| int totalLen = 0; | ||
| for (ByteBuffer buffer : buffers) { | ||
| totalLen += buffer.limit(); | ||
| } | ||
|
|
||
| int processedLen = 0; | ||
| while (processedLen < totalLen) { | ||
| headBuf.clear(); | ||
| pkgBuf.clear(); | ||
| engResult = sslEngine.wrap(buffers, pkgBuf); | ||
| if (engResult.getHandshakeStatus() != HandshakeStatus.FINISHED && engResult.getHandshakeStatus() != HandshakeStatus.NOT_HANDSHAKING && | ||
| engResult.getStatus() != SSLEngineResult.Status.OK) { | ||
| throw new IOException("SSL: SSLEngine return bad result! " + engResult); | ||
| for (ByteBuffer b : buffers) totalLen += b.remaining(); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The totalLen calculation could overflow if the sum of buffer.remaining() across all buffers exceeds Integer.MAX_VALUE. While unlikely in practice (would require >2GB of buffered data), an overflow would result in a corrupted negative frame length being written to the header. Consider validating totalLen is positive after the loop, or using long arithmetic with a range check.
| frameRemaining = headerBuffer.getInt(); | ||
| headerBuffer.clear(); | ||
| if (frameRemaining < 0) { | ||
| throw new IOException("Negative frame length"); | ||
| } |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frame length is read as a signed integer without an upper bound check. A malicious or corrupted header could specify a very large frame length (e.g., Integer.MAX_VALUE = ~2GB), causing an OutOfMemoryError at line 234 when allocating plainTextBuffer. Add a maximum frame size check (e.g., 16MB) to prevent resource exhaustion attacks.
Description
This pull request refactors the TLS framing and buffer management in the
Linkclass to improve correctness and maintainability, and updates the SSL context initialization to use TLS 1.3 for enhanced security. CloudStack uses a 4-byte header for TLS packets. Earlier, it was not sent within the TLS application data, which affected maintainability (simply using TLS1.3 without packet changes didn't work, and it resulted in errors like [1]) and the implementation of agent-server communication using a different language. The most important changes are grouped below.TLS Framing and Buffer Management
Link.java, replacing legacy header and packet assembly logic with a more robust system usingnetBuffer,appBuffer, and an explicitheaderBufferfor frame length management. This improves frame parsing and avoids buffer overflows.readmethod now correctly assembles frames from TLS streams, handling buffer resizing and edge cases, while thedoWritemethod builds TLS packets with a 4-byte length header and payload, ensuring correct framing and handshake handling.Security Improvements
Link.javato useSSLUtils.getSSLContextWithLatestVersion(), ensuring that TLS 1.3 is used for all server, client, and management SSL contexts.getSSLContextWithLatestVersion()inSSLUtils.java, which returns anSSLContextinstance for TLS 1.3.[1] Error in agent-server connection with TLS1.3 without packet framing changes
2025-08-25 18:41:41,698 INFO [utils.nio.NioClient] (main:[]) (logid:) Connecting to 172.120.0.67:8250
2025-08-25 18:41:41,702 INFO [utils.nio.NioClient] (main:[]) (logid:) Connected to 172.120.0.67:8250
2025-08-25 18:41:41,704 INFO [utils.nio.Link] (main:[]) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2025-08-25 18:41:41,941 INFO [utils.nio.NioClient] (main:[]) (logid:) SSL: Handshake done
2025-08-25 18:41:41,950 DEBUG [utils.nio.NioClient] (Agent-NioConnectionHandler-1:[]) (logid:) Location 1: Socket Socket[addr=/172.120.0.67,port=8250,localport=59004] closed on read. Probably -1 returned: Input record too big: max = 16709 len = 22679
2025-08-25 18:41:41,950 DEBUG [utils.nio.NioClient] (Agent-NioConnectionHandler-1:[]) (logid:) Closing socket Socket[addr=/172.120.0.67,port=8250,localport=59004]
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
Logs from management server:
Logs from one of the host:
Communication with hosts, system VMs and MS seemed fine
How did you try to break this feature and the system with this change?