@dolfim-ibm dolfim-ibm commented Oct 13, 2025

This PR replaces the StandardPdfPipeline with the new multi-threaded version, which was previously only experimental. The old behavior is still available as legacy.

More details on the new pipeline design and performance in #1951.

Parameters controlling the pipeline:

# batch size in the given stage
pipeline_options.ocr_batch_size = 4
pipeline_options.layout_batch_size = 4
pipeline_options.table_batch_size = 4

# maximum number of pages held in a stage queue
# up to this many pages may be open at the same time (watch memory usage)
pipeline_options.queue_max_size = 100
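
To illustrate how a per-stage batch size and a bounded stage queue interact, here is a minimal standard-library sketch. It is not the docling implementation: `producer`, `batched_stage`, `run_pipeline` and the constants are hypothetical names chosen for this example, and the "processing" step only records which pages were grouped together. The point is that a full queue blocks the producer (backpressure), so the number of pages in flight stays bounded by the queue size.

```python
import queue
import threading

# Hypothetical stand-ins that mirror the option names above.
LAYOUT_BATCH_SIZE = 4  # cf. pipeline_options.layout_batch_size
QUEUE_MAX_SIZE = 8     # cf. pipeline_options.queue_max_size (100 above)

_STOP = object()  # sentinel marking the end of the page stream


def producer(pages, out_q):
    """Feed pages into the bounded queue; put() blocks while the queue is full."""
    for page in pages:
        out_q.put(page)
    out_q.put(_STOP)


def batched_stage(in_q, results, batch_size):
    """Consume pages and group them into batches of at most batch_size."""
    batch = []
    while True:
        item = in_q.get()
        if item is _STOP:
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.append(list(batch))  # a real stage would run a model here
            batch.clear()
    if batch:
        results.append(list(batch))  # flush the final, possibly smaller batch


def run_pipeline(num_pages, batch_size=LAYOUT_BATCH_SIZE, max_size=QUEUE_MAX_SIZE):
    """Run one producer and one batched consumer stage over num_pages pages."""
    q = queue.Queue(maxsize=max_size)
    results = []
    worker = threading.Thread(target=batched_stage, args=(q, results, batch_size))
    worker.start()
    producer(range(num_pages), q)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(10))
```

With a smaller `max_size` the producer stalls sooner, which trades throughput for a lower memory peak. That is the trade-off the "watch memory" remark above points at.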

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@dolfim-ibm dolfim-ibm requested a review from cau-git October 13, 2025 10:53
github-actions bot commented Oct 13, 2025

DCO Check Passed

Thanks @dolfim-ibm, all your commits are properly signed off. 🎉

dosubot bot commented Oct 13, 2025

Related Documentation

Checked 3 published document(s) in 1 knowledge base(s). No updates required.

mergify bot commented Oct 13, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
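
For a quick local check of a title against this rule, here is a small sketch using Python's `re` module. The pattern is copied from the condition above. The helper name `is_conventional` is made up for this example, and using `re.search` to mimic the `~=` operator is an assumption, not a statement about how Mergify evaluates it.

```python
import re

# Pattern copied verbatim from the merge-protection condition.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    title = "feat: Use threading in the standard pipeline and move old behavior to legacy"
    print(is_conventional(title))
```

The optional `(scope)` and `!` groups mean titles such as `fix(ocr): ...` or `feat!: ...` also pass, while a title without one of the listed type prefixes does not.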

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 65.91928% with 152 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
docling/pipeline/legacy_standard_pdf_pipeline.py | 0.00% | 102 Missing ⚠️
docling/pipeline/standard_pdf_pipeline.py | 84.93% | 50 Missing ⚠️

@dolfim-ibm dolfim-ibm changed the title from "feat: Use threaded pipeline default and move old behavior to legacy" to "feat: Use threading in the standard pipeline and move old behavior to legacy" Oct 31, 2025
PeterStaar-IBM previously approved these changes Oct 31, 2025
@PeterStaar-IBM PeterStaar-IBM left a comment

lgtm!

Signed-off-by: Michele Dolfi <[email protected]>
@PeterStaar-IBM PeterStaar-IBM left a comment

wonderful!

@dolfim-ibm dolfim-ibm merged commit 268d027 into main Oct 31, 2025
27 of 28 checks passed
@dolfim-ibm dolfim-ibm deleted the feat-threaded-pipeline-default branch October 31, 2025 13:42