A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
I think I found a bug in the http_client source implementation.
Consider the following pipeline:
[sources.native_messages]
type = "http_client"
endpoint = "http://127.0.0.1:8080/native"
decoding.codec = "native"
[sinks.print]
type = "console"
inputs = ["native_messages"]
encoding.codec = "json"
Expectation
Data in Vector's native binary format, fetched from http://127.0.0.1:8080/native, is re-encoded as JSON and printed to stdout.
Reality
2023-03-16T03:14:28.853215Z ERROR source{component_kind="source" component_id=native_messages component_type=http_client component_name=native_messages}: vector::internal_events::codecs: Failed deserializing frame. error=failed to decode Protobuf message: invalid tag value: 0 error_type="parser_failed" stage="processing" internal_log_rate_limit=true
This error isn't triggered when I use other source types (everything works as expected with the "stdin" source and decoding.codec = "native").
I traced the error down. I think the root cause is the on_response callback in http_client::HttpClientContext:
fn on_response(&mut self, _url: &Uri, _header: &Parts, body: &Bytes) -> Option<Vec<Event>> {
    // get the body into a byte array
    let mut buf = BytesMut::new();
    let body = String::from_utf8_lossy(body);
    buf.extend_from_slice(body.as_bytes());

    // decode and enrich
    let mut events = self.decode_events(&mut buf);
    self.enrich_events(&mut events);

    Some(events)
}
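This callback unconditionally invokes String::from_utf8_lossy on the response body before passing it to the decoder. The lossy conversion replaces every byte sequence that is not valid UTF-8 with U+FFFD (encoded as the three bytes EF BF BD), so a binary payload no longer matches what the server sent by the time the decoder sees it. This affects any binary data received by the "http_client" source, which is relevant for the "native" and "bytes" codecs.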
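The effect can be reproduced outside Vector with the standard library alone. The sketch below is a minimal illustration, not Vector's code: `lossy_round_trip`, `raw_copy`, and `sample_payload` are hypothetical names, and the payload is an arbitrary byte sequence that happens to be invalid UTF-8, not real native-codec output. It uses `Vec<u8>` in place of `BytesMut` to stay dependency-free.

```rust
/// Mirrors the conversion in `on_response`: the body is passed through
/// `String::from_utf8_lossy` and the resulting string's bytes are copied out.
fn lossy_round_trip(body: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(body).as_bytes().to_vec()
}

/// Byte-preserving alternative (an assumption about what a fix could look
/// like): copy the body unchanged and let the decoder interpret it.
fn raw_copy(body: &[u8]) -> Vec<u8> {
    body.to_vec()
}

/// Hypothetical binary payload. 0xff and 0x80 are not valid UTF-8 on their
/// own, so the lossy conversion rewrites them.
fn sample_payload() -> Vec<u8> {
    vec![0x0a, 0x03, 0xff, 0x80, 0x01]
}
```

Calling `lossy_round_trip(&sample_payload())` returns a buffer that differs from the input, while `raw_copy(&sample_payload())` returns it unchanged. Text payloads that are already valid UTF-8 pass through both functions untouched, which would explain why the problem only shows up with binary codecs.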
Configuration
[sources.native_messages]
type = "http_client"
endpoint = "http://127.0.0.1:8080/native"
decoding.codec = "native"
[sinks.print]
type = "console"
inputs = ["native_messages"]
encoding.codec = "json"
Version
vector 0.28.1 (x86_64-unknown-linux-gnu ff15924 2023-03-06)
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response
The on_response callback quoted in the Problem section is at vector/src/sources/http_client/client.rs, lines 328 to 338 in be9e2c4.