Add deepseek ocr by molbap · Pull Request #41797 · huggingface/transformers

molbap · 2025-10-22T21:04:03Z

What does this PR do?

As per title. Architecturally, Llava-next is used as the skeleton, with a modified SamModel and a modified ClipVisionModel. The deepseekV2 decoder is kept untouched (instantiated through AutoModel) and adapted through its config only.
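To make the "config only" idea concrete, here is a minimal sketch of a composite config that nests two vision sub-configs and one text sub-config. It uses plain dataclasses rather than the transformers config classes, and every class name, field name, and default value below is a hypothetical placeholder, not the real DeepSeek-OCR hyperparameters.

```python
from dataclasses import dataclass, field


@dataclass
class SamVisionConfig:
    # Placeholder fields, chosen for illustration only.
    hidden_size: int = 768
    patch_size: int = 16


@dataclass
class ClipVisionConfig:
    # Placeholder fields, chosen for illustration only.
    hidden_size: int = 1024
    patch_size: int = 14


@dataclass
class TextConfig:
    # The decoder is selected and sized purely through config values.
    model_type: str = "deepseek_v2"
    hidden_size: int = 1280


@dataclass
class OcrConfig:
    """Hypothetical top-level config holding the three sub-configs."""

    sam_vision_config: SamVisionConfig = field(default_factory=SamVisionConfig)
    clip_vision_config: ClipVisionConfig = field(default_factory=ClipVisionConfig)
    text_config: TextConfig = field(default_factory=TextConfig)

    @classmethod
    def from_dict(cls, data: dict) -> "OcrConfig":
        # Each sub-dict overrides only the fields it names; the rest keep defaults.
        return cls(
            sam_vision_config=SamVisionConfig(**data.get("sam_vision_config", {})),
            clip_vision_config=ClipVisionConfig(**data.get("clip_vision_config", {})),
            text_config=TextConfig(**data.get("text_config", {})),
        )


config = OcrConfig.from_dict({"text_config": {"hidden_size": 64}})
print(config.text_config.hidden_size, config.sam_vision_config.patch_size)
```

The point of the layout is that swapping or resizing the decoder touches only `text_config`, leaving the two vision encoders alone.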

Working config + random weights init
Modular draft with subconfigs (two vision configs)
Conversion from original checkpoint done
Modular model finished
Integration tests/OCR tests working as in original codebase
Make modular slimmer
Make processor faster
Complete test suite for transformers
Remap weights to avoid conversion / on-the-fly conversion? (cc @ArthurZucker )
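For the last item, on-the-fly remapping could look roughly like the sketch below: a list of regex rules is applied to each checkpoint key at load time, so no converted checkpoint has to be written to disk. The key names and rules here are made up for illustration; the real checkpoint layout is not described in this thread, and this is not the mechanism transformers itself uses.

```python
import re

# Hypothetical (pattern, replacement) rules. The first matching rule wins.
RENAME_RULES = [
    (re.compile(r"^vision\.sam\."), "model.sam_model."),
    (re.compile(r"^vision\.clip\."), "model.clip_model."),
    (re.compile(r"^layers\.(\d+)\."), r"model.language_model.layers.\1."),
]


def rename_key(key: str) -> str:
    """Return the remapped name for one checkpoint key, or the key unchanged."""
    for pattern, replacement in RENAME_RULES:
        new_key, count = pattern.subn(replacement, key, count=1)
        if count:
            return new_key
    return key


def remap_state_dict(state_dict: dict) -> dict:
    """Rename every key, failing loudly if two keys collapse onto one name."""
    remapped = {}
    for old_key, value in state_dict.items():
        new_key = rename_key(old_key)
        if new_key in remapped:
            raise ValueError(f"{old_key!r} collides on {new_key!r}")
        remapped[new_key] = value
    return remapped


example = {"vision.sam.patch_embed.weight": 1, "layers.3.mlp.weight": 2}
print(remap_state_dict(example))
```

Raising on collisions matters because a silent overwrite would drop a tensor without any error at load time.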

Current branch is functional. You don't need to convert the weights, just run the following on your image and you'll get a nice OCR output.

import torch
from PIL import Image

from transformers import AutoProcessor, DeepseekOcrForConditionalGeneration

# Load the processor and model straight from the original checkpoint (no conversion step).
processor = AutoProcessor.from_pretrained("deepseek-ai/DeepSeek-OCR")
model = DeepseekOcrForConditionalGeneration.from_pretrained("deepseek-ai/DeepSeek-OCR", dtype=torch.bfloat16)

# Optional sanity check that the file opens; the conversation below passes the image by path.
image = Image.open("handwritten_letter_small.png").convert("RGB")

conversation = [
    {
        "role": "<|User|>",
        "content": [
            {"type": "image", "path": "./handwritten_letter_small.png"},
            {"type": "text", "text": "<|grounding|>Convert the document to markdown."},
        ],
    }
]

# Build the tokenized model inputs from the conversation.
inputs = processor.apply_chat_template(
    conversation,
    return_dict=True,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

# Generation stops after at most 50 new tokens.
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=50)

# Special tokens are kept in the decoded text.
text = processor.batch_decode(generated, skip_special_tokens=False)[0]
print(text.strip())
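For decoder-only generation, the sequences returned by `generate` normally start with the prompt tokens, so the decoded text above includes the prompt. A minimal sketch of trimming it off before decoding follows; `strip_prompt` is a hypothetical helper, not part of transformers, and the token ids are dummy values.

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first `prompt_length` tokens from every generated sequence.

    Works on a list of token-id lists; iterating a 2D tensor row by row
    and slicing each row behaves the same way.
    """
    return [sequence[prompt_length:] for sequence in sequences]


# Dummy ids: the first three tokens stand in for the prompt.
generated_ids = [[101, 102, 103, 7, 8, 9]]
print(strip_prompt(generated_ids, prompt_length=3))
```

With the snippet above, the prompt length would be `inputs["input_ids"].shape[1]`, and the trimmed sequences can then be passed to `processor.batch_decode`.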

molbap · 2025-10-28T18:25:13Z

The implementation works. The processor still needs to be optimized, but it gives results similar to those of the original repository.

HuggingFaceDocBuilderDev · 2025-10-28T18:33:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-01-19T17:00:23Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_ocr

molbap · 2026-01-19T17:01:08Z

What's missing here is a solid mapping between the checkpoint state dict and the canonical one 😅 Once that's solid, it'll be a good step towards using the model in vLLM/SGLang etc. with the transformers modelling backend.
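One way to judge whether such a mapping is "solid" is to check that, after renaming, the checkpoint keys match the expected parameter names exactly, with nothing missing, nothing left over, and no two keys landing on the same name. The sketch below does that with plain sets; `check_mapping` and the example keys are hypothetical and only illustrate the check.

```python
def check_mapping(checkpoint_keys, expected_keys, rename):
    """Report how well `rename` maps checkpoint keys onto the expected names."""
    renamed = {}      # new name -> original checkpoint key
    collisions = []   # (first key, second key, shared new name)
    for key in checkpoint_keys:
        new_key = rename(key)
        if new_key in renamed:
            collisions.append((renamed[new_key], key, new_key))
        else:
            renamed[new_key] = key
    expected = set(expected_keys)
    return {
        "missing": sorted(expected - set(renamed)),     # expected, never produced
        "unexpected": sorted(set(renamed) - expected),  # produced, not expected
        "collisions": collisions,
    }


report = check_mapping(
    checkpoint_keys=["a.weight", "b.weight"],
    expected_keys=["x.weight", "c.weight"],
    rename=lambda key: key.replace("a.", "x."),
)
print(report)
```

A mapping is complete for loading purposes when all three lists in the report are empty.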

github-actions · 2026-01-19T17:10:21Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41797&sha=6b3375

ducviet00 · 2026-03-01T09:13:59Z

Hi @molbap, thanks for the work on this!! I'd like to ask about the status of this PR.

molbap added 24 commits October 22, 2025 12:05

hop

e931114

iterate

60a825b

fix

72c640c

fixup

690455e

make things simple

e2182c3

update conversion

20e3f6c

I believe this is not needed

7099e23

imports breathing better

edbbd0a

650 loc modular

20c5e0f

conversion and test running

0031e8f

add modular of course

7c0a2f2

naming

3a148ba

no mla deepseek

1b36afb

up

b63c11a

update

92d13ca

update

cfe15ed

Merge branch 'main' into deepseek_ocr

c820daa

fix 'template'

165014d

tosquash

b5acad8

much better (squash too)

34b41d4

tweak

9a2e47f

ugly moe_infer path but nice generation

a52ec39

nice

cec4fb3

cleaner routing

903cb2a

molbap marked this pull request as ready for review October 28, 2025 18:25

molbap changed the title ~~[WIP] add deepseek ocr~~ Add deepseek ocr Oct 28, 2025

molbap added 2 commits October 30, 2025 14:17

Merge branch 'main' into deepseek_ocr

48fdb92

doc, draft tests

39b2683

dolfim-ibm mentioned this pull request Dec 19, 2025

feat: Add DeepSeek-OCR integration docling-project/docling#2721

Open

molbap added 4 commits January 5, 2026 10:58

Merge branch 'main' into deepseek_ocr

a2fdd09

Merge branch 'main' into deepseek_ocr

6e21e66

update with main 5.0 changes

baa554c

cleanup

ee59160

molbap added the New model label Jan 8, 2026

molbap added 17 commits January 16, 2026 10:34

Merge branch 'main' into deepseek_ocr

4fd7a26

cleanup

88986e0

clean up further

78b8f82

fix modular

3199876

fix from config

6a9232d

remove some stuff

8564b89

remove hack(s)

f19e81a

small modifs

9ed0801

post init + remove unrelated

bb45134

missing attr

6033747

weird

b6582f8

fix tied weights?

6f8709d

fixup mapping

6264f3e

init pos ids

a7003f6

make things simpler

823dfd8

cleaner patterns

5058261

👀

a5e42f2

fixup

6b33753

evalstate mentioned this pull request Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

hmellor mentioned this pull request May 1, 2026

DeepSeek OCR specifies an incorrect tokenizer class on the Hub #45739

Merged

molbap wants to merge 109 commits into main from deepseek_ocr


7 participants
