http_client source corrupts binary data before passing it to the decoder #16814

@Dnnd

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I think I found a bug in the http_client source implementation.

Consider the following pipeline:

[sources.native_messages]
type = "http_client"
endpoint = "http://127.0.0.1:8080/native"
decoding.codec = "native"

[sinks.print]
type = "console"
inputs = ["native_messages"]
encoding.codec = "json"

Expectation

Data in Vector's native binary format served from http://127.0.0.1:8080/native is re-encoded as JSON and printed to stdout.

Reality

2023-03-16T03:14:28.853215Z ERROR source{component_kind="source" component_id=native_messages component_type=http_client component_name=native_messages}: vector::internal_events::codecs: Failed deserializing frame. error=failed to decode Protobuf message: invalid tag value: 0 error_type="parser_failed" stage="processing" internal_log_rate_limit=true

This error isn't triggered with other source types: everything works as expected with the "stdin" source and decoding.codec = "native".

I traced the error down. I think the root cause is the on_response callback in http_client::HttpClientContext:

fn on_response(&mut self, _url: &Uri, _header: &Parts, body: &Bytes) -> Option<Vec<Event>> {
    // get the body into a byte array
    let mut buf = BytesMut::new();
    let body = String::from_utf8_lossy(body);
    buf.extend_from_slice(body.as_bytes());
    // decode and enrich
    let mut events = self.decode_events(&mut buf);
    self.enrich_events(&mut events);
    Some(events)
}

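This callback unconditionally calls String::from_utf8_lossy on the response body before passing it to the decoder. from_utf8_lossy replaces every invalid UTF-8 sequence with U+FFFD (the bytes EF BF BD), so any response body that is not valid UTF-8 is altered before the decoder sees it. This affects binary data fetched by the "http_client" source, which is relevant for the "native" and "bytes" codecs.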
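To illustrate the effect outside of Vector, here is a minimal standalone sketch using only the standard library. The helper names lossy_copy and raw_copy are hypothetical and do not exist in Vector. A plain Vec<u8> stands in for BytesMut, and the payload is an arbitrary non-UTF-8 byte sequence, not real native-codec output. The second helper shows one possible direction for a fix (copying the raw bytes unchanged), which is an assumption on my side rather than a reviewed patch.

```rust
/// Mimics what `on_response` currently does: the body goes through
/// `String::from_utf8_lossy` before it is copied into the decode buffer.
fn lossy_copy(body: &[u8]) -> Vec<u8> {
    let text = String::from_utf8_lossy(body);
    text.as_bytes().to_vec()
}

/// Hypothetical alternative: copy the raw bytes without any UTF-8 conversion.
fn raw_copy(body: &[u8]) -> Vec<u8> {
    body.to_vec()
}

fn main() {
    // Arbitrary binary payload: 0xFF and a lone 0x80 are not valid UTF-8.
    let body: [u8; 4] = [0x0A, 0xFF, 0x80, 0x01];

    let lossy = lossy_copy(&body);
    let raw = raw_copy(&body);

    println!("original: {:02X?}", body);
    println!("lossy:    {:02X?}", lossy);
    println!("raw:      {:02X?}", raw);

    // The lossy path changes the payload; the raw path preserves it.
    assert_ne!(lossy.as_slice(), &body[..]);
    assert_eq!(raw.as_slice(), &body[..]);
}
```

In the lossy path each invalid byte becomes the three-byte sequence EF BF BD, so the buffer handed to the decoder is both longer and different from what the server sent. In the real callback, the equivalent of raw_copy would presumably be to extend buf from body directly, since Bytes dereferences to a byte slice.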

Configuration

[sources.native_messages]
type = "http_client"
endpoint = "http://127.0.0.1:8080/native"
decoding.codec = "native"

[sinks.print]
type = "console"
inputs = ["native_messages"]
encoding.codec = "json"

Version

vector 0.28.1 (x86_64-unknown-linux-gnu ff15924 2023-03-06)

Labels

source: http_client
type: bug