
feat(ai): Fashion Decode detail enrichment pipeline (brand / group / artist entity matching) #262

@cocoyoon

Description


Summary

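The parsing pipeline (#TBD-2) extracts string fields (brand, celebrity_name, group_name, product_name). This pipeline matches those strings against existing DB entities (brands, identities, groups) and links them. It then enriches the matched entities with metadata such as Instagram handles, wiki pages, and official URLs.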

Related:


Architecture

ARQ job (every 30 minutes)
  → select source_media WHERE parse_status='parsed' AND enrich_status='pending'
  → extract brand from parse_result.items, and celebrity_name / group_name from parse_result
  → entity matching:
       [Brand Resolver]
         1) exact match: brands.name / brands.aliases
         2) fuzzy match: trigram similarity (pg_trgm)
         3) Instagram handle lookup (if a handle is available)
         4) register as a new entity candidate (entity_candidates, entity_type='brand', status='pending')
       [Identity Resolver]
         1) identities.name / aliases
         2) fuzzy match
         3) new candidate
       [Group Resolver] (K-pop, etc.)
         1) groups.name / aliases
  → UPDATE seed_solutions.brand_id, seed_posts.identity_id / group_id
  → UPDATE source_media SET enrich_status='enriched' | 'partial' | 'needs_review'
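The extraction step in the flow above can be illustrated with a minimal Python sketch. It pulls the resolver inputs out of a single row's parse_result. The sketch assumes parse_result is a dict shaped like {"celebrity_name": ..., "group_name": ..., "items": [{"brand": ...}, ...]}. The ExtractedNames container and the extract_names function are hypothetical names for illustration and are not part of the planned enrichment package.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractedNames:
    """Hypothetical container for the raw strings handed to the resolvers."""
    brands: list[str] = field(default_factory=list)
    celebrity_name: str | None = None
    group_name: str | None = None


def extract_names(parse_result: dict[str, Any]) -> ExtractedNames:
    # Assumed parse_result shape (not confirmed by the issue):
    # {"celebrity_name": str, "group_name": str, "items": [{"brand": str, ...}, ...]}
    seen: set[str] = set()
    brands: list[str] = []
    for item in parse_result.get("items") or []:
        raw = (item.get("brand") or "").strip()
        # Deduplicate case-insensitively so a brand that appears on several
        # items is resolved only once per source_media row
        if raw and raw.casefold() not in seen:
            seen.add(raw.casefold())
            brands.append(raw)
    return ExtractedNames(
        brands=brands,
        celebrity_name=(parse_result.get("celebrity_name") or "").strip() or None,
        group_name=(parse_result.get("group_name") or "").strip() or None,
    )
```

Deduplicating brands per row keeps the resolver calls, and any later occurrence_count increments, from being inflated by one post that tags the same brand on several items.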

Schema

ALTER TABLE warehouse.source_media
  ADD COLUMN enrich_status text DEFAULT 'pending' NOT NULL
    CHECK (enrich_status IN ('pending','enriching','enriched','partial','needs_review','skipped'));
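The job must end each row in one of the statuses allowed by the CHECK constraint above. The issue lists 'enriched', 'partial', and 'needs_review' as outcomes but does not state the rule for choosing among them. The following Python sketch therefore assumes a plausible rule:

- Nothing to resolve → 'skipped'.
- Every extracted name resolved → 'enriched'.
- Some resolved → 'partial'.
- None resolved → 'needs_review'.

The final_status function and the EnrichStatus enum are hypothetical.

```python
from enum import Enum


class EnrichStatus(str, Enum):
    # Mirrors the CHECK constraint on warehouse.source_media.enrich_status
    PENDING = "pending"
    ENRICHING = "enriching"
    ENRICHED = "enriched"
    PARTIAL = "partial"
    NEEDS_REVIEW = "needs_review"
    SKIPPED = "skipped"


def final_status(total: int, resolved: int) -> EnrichStatus:
    """Pick the terminal status for one source_media row.

    Assumed rule (not specified in the issue): nothing to resolve -> skipped,
    all resolved -> enriched, some resolved -> partial, none -> needs_review.
    """
    if not 0 <= resolved <= total:
        raise ValueError("expected 0 <= resolved <= total")
    if total == 0:
        return EnrichStatus.SKIPPED
    if resolved == total:
        return EnrichStatus.ENRICHED
    if resolved > 0:
        return EnrichStatus.PARTIAL
    return EnrichStatus.NEEDS_REVIEW
```

Because EnrichStatus subclasses str, `.value` can be written straight into the text column.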

-- New-entity candidates that failed matching (queued for Admin review)
CREATE TABLE warehouse.entity_candidates (
    id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
    entity_type text NOT NULL CHECK (entity_type IN ('brand','identity','group')),
    raw_name text NOT NULL,
    normalized_name text NOT NULL,
    occurrence_count int DEFAULT 1 NOT NULL,
    first_source_media_id uuid REFERENCES warehouse.source_media(id) ON DELETE SET NULL,
    status text DEFAULT 'pending' NOT NULL
        CHECK (status IN ('pending','approved','merged','rejected')),
    suggested_instagram_handle text,
    suggested_metadata jsonb,
    created_at timestamptz DEFAULT now() NOT NULL,
    updated_at timestamptz DEFAULT now() NOT NULL,
    UNIQUE(entity_type, normalized_name)
);
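The UNIQUE(entity_type, normalized_name) constraint means that repeated misses for the same name should bump occurrence_count rather than insert duplicate rows. The Python sketch below models that behavior in memory. The normalization rule (NFKC, casefold, collapsing punctuation and whitespace) is an assumption, because the issue does not define how normalized_name is computed. normalize_name, Candidate, and CandidateStore are hypothetical names.

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass

ENTITY_TYPES = {"brand", "identity", "group"}  # mirrors the entity_type CHECK


def normalize_name(raw: str) -> str:
    """Hypothetical normalized_name: NFKC folds full-width forms, casefold()
    lowercases (including non-ASCII), and runs of punctuation/underscores/
    whitespace collapse to a single space."""
    text = unicodedata.normalize("NFKC", raw).casefold()
    return re.sub(r"[\W_]+", " ", text).strip()


@dataclass
class Candidate:
    entity_type: str
    raw_name: str
    normalized_name: str
    first_source_media_id: str | None
    occurrence_count: int = 1


class CandidateStore:
    """In-memory stand-in for warehouse.entity_candidates."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], Candidate] = {}

    def record_miss(self, entity_type: str, raw_name: str,
                    source_media_id: str | None) -> Candidate:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity_type: {entity_type}")
        key = (entity_type, normalize_name(raw_name))
        row = self._rows.get(key)
        if row is None:
            # First sighting: keep the original spelling and the originating media
            row = Candidate(entity_type, raw_name, key[1], source_media_id)
            self._rows[key] = row
        else:
            # Repeat sighting: only the counter changes
            row.occurrence_count += 1
        return row
```

In Postgres, the same effect is a single INSERT ... ON CONFLICT (entity_type, normalized_name) DO UPDATE SET occurrence_count = entity_candidates.occurrence_count + 1. That form is race-free when several workers process misses concurrently.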

Resolver strategy

Brand Resolver

  1. Exact: brands.name ILIKE raw_name OR raw_name = ANY(brands.aliases)
  2. Fuzzy: pg_trgm similarity(brands.name, raw_name) > 0.8
  3. Instagram handle: look up the official Instagram account (v2, external API)
  4. Miss: INSERT into entity_candidates (or increment occurrence_count if the candidate already exists)
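The Brand resolver cascade can be sketched in pure Python. The sketch approximates pg_trgm's `similarity()`: each word is lowercased and padded with two leading spaces and one trailing space, and the score is shared trigrams divided by all distinct trigrams. It covers steps 1 and 2 only. Step 3 (Instagram, v2) is left out, and step 4 is left to the caller. Brand, Match, and resolve_brand are hypothetical names, and the in-memory brand list stands in for the brands table.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def trigrams(text: str) -> set[str]:
    """Approximates pg_trgm's show_trgm(): lowercase, split into
    alphanumeric words, pad each as '  word ' and take every 3-char slice."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    # Shared trigrams / distinct trigrams across both strings
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Brand:
    id: str
    name: str
    aliases: list[str] = field(default_factory=list)


@dataclass
class Match:
    brand_id: str | None
    method: str  # "exact" | "fuzzy" | "miss"
    score: float


def resolve_brand(raw_name: str, brands: list[Brand], threshold: float = 0.8) -> Match:
    needle = raw_name.strip()
    # 1) Exact: case-insensitive name match, or verbatim alias match (like = ANY(aliases))
    for b in brands:
        if b.name.casefold() == needle.casefold() or needle in b.aliases:
            return Match(b.id, "exact", 1.0)
    # 2) Fuzzy: best trigram similarity, accepted only if strictly above the threshold
    best = max(brands, key=lambda b: similarity(b.name, needle), default=None)
    if best is not None:
        score = similarity(best.name, needle)
        if score > threshold:
            return Match(best.id, "fuzzy", score)
    # 3) Instagram lookup is deferred to v2; 4) the caller records an entity_candidates miss
    return Match(None, "miss", 0.0)
```

The sketch compares case-folded strings instead of using ILIKE. `brands.name ILIKE raw_name` would treat any `%` or `_` inside raw_name as a wildcard unless it is escaped.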

Identity / Group Resolver

  • Mixed Korean/English names: the aliases jsonb stores multilingual variants of each name
  • K-pop groups: the groups table plus a reverse lookup through membership (group_members)
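The following Python sketch illustrates how the two bullets above could work together. It assumes aliases is shaped like {"ko": [...], "en": [...]}; the issue only says aliases holds multilingual variants. It also assumes group_members can be loaded as a group_id → set of identity_id mapping. Every name and alias is folded into a single lookup index. When only the celebrity resolves, the group is filled from membership, but only if that identity belongs to exactly one group. fold, Entity, build_alias_index, and resolve_identity_and_group are hypothetical.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass, field


def fold(name: str) -> str:
    # NFKC + casefold so "ＪＥＮＮＩＥ", "JENNIE" and "jennie" compare equal
    return unicodedata.normalize("NFKC", name).casefold().strip()


@dataclass
class Entity:
    id: str
    name: str
    # Assumed jsonb shape: {"ko": ["제니"], "en": ["Jennie Kim"]}
    aliases: dict[str, list[str]] = field(default_factory=dict)


def build_alias_index(entities: list[Entity]) -> dict[str, str]:
    """Map every folded name and alias, in every language, to its entity id.
    On a collision, the first entity wins."""
    index: dict[str, str] = {}
    for e in entities:
        for variant in [e.name, *(a for names in e.aliases.values() for a in names)]:
            index.setdefault(fold(variant), e.id)
    return index


def resolve_identity_and_group(
    celebrity_name: str | None,
    group_name: str | None,
    identity_index: dict[str, str],
    group_index: dict[str, str],
    group_members: dict[str, set[str]],  # assumed: group_id -> {identity_id}
) -> tuple[str | None, str | None]:
    identity_id = identity_index.get(fold(celebrity_name)) if celebrity_name else None
    group_id = group_index.get(fold(group_name)) if group_name else None
    if identity_id and group_id is None:
        # Reverse lookup via membership: fill the group only when unambiguous
        groups = [g for g, members in group_members.items() if identity_id in members]
        if len(groups) == 1:
            group_id = groups[0]
    return identity_id, group_id
```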

Auto-approval thresholds

  • occurrence_count >= 5 AND fuzzy_score > 0.9 → automatic merge (v2)
  • Default: manual approval by an Admin
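The threshold rule above can be written as a small Python decision function. The numeric thresholds come from the issue. The auto_merge_enabled flag is a hypothetical way to keep v1 on manual review until the v2 rollout, and decide and Decision are likewise illustrative names. The comparisons match the stated operators: occurrence_count uses `>=` and fuzzy_score uses a strict `>`.

```python
from dataclasses import dataclass

AUTO_MERGE_MIN_OCCURRENCES = 5  # occurrence_count >= 5
AUTO_MERGE_MIN_SCORE = 0.9      # fuzzy_score must be strictly greater


@dataclass(frozen=True)
class Decision:
    action: str  # "auto_merge" | "manual_review"
    reason: str


def decide(occurrence_count: int, fuzzy_score: float,
           auto_merge_enabled: bool = False) -> Decision:
    # v1 default: every candidate goes to Admin review
    if not auto_merge_enabled:
        return Decision("manual_review", "auto-merge disabled (v1)")
    if occurrence_count >= AUTO_MERGE_MIN_OCCURRENCES and fuzzy_score > AUTO_MERGE_MIN_SCORE:
        return Decision("auto_merge", f"seen {occurrence_count}x, score {fuzzy_score:.2f}")
    return Decision("manual_review", "below auto-merge thresholds")
```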

Admin UI (may be split into a separate issue)

  • /admin/entity-candidates: candidate list with approve / reject / merge actions
  • On approval: INSERT into brands / identities / groups, then re-resolve the affected source_media.parse_result

Files

New

File Role
supabase/migrations/..._add_entity_candidates.sql Schema
packages/ai-server/src/services/media/enrichment/__init__.py Package
packages/ai-server/src/services/media/enrichment/resolvers/brand.py Brand resolver
packages/ai-server/src/services/media/enrichment/resolvers/identity.py Identity resolver
packages/ai-server/src/services/media/enrichment/resolvers/group.py Group resolver
packages/ai-server/src/services/media/enrichment/instagram_lookup.py Instagram handle lookup (v2)
packages/ai-server/src/services/media/enrichment/jobs.py ARQ enrichment job
packages/api-server/src/routes/admin/entity_candidates.rs Admin API

Modified

File Change
supabase/migrations/..._enable_pg_trgm.sql Enable the pg_trgm extension for fuzzy matching
packages/ai-server/src/bootstrap.py Schedule the enrichment job every 30 minutes

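Verification

  • Run 100 sample parse results and measure entity-matching accuracy (precision / recall)
  • Candidates accumulated in entity_candidates appear in the Admin UI
  • On Admin approval, brands / identities rows are created and the related source_media rows are re-resolved
  • Tune the fuzzy-matching threshold to minimize false positives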
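For the accuracy check and the threshold tuning, a minimal Python sketch follows. It assumes a hand-labelled sample that maps each mention key (for example "<source_media_id>:brand:0", a hypothetical key format) to the correct entity id, or to None when no entity should be linked. A wrong link counts against both precision and recall. The sweep helper re-thresholds scored fuzzy matches so that false positives can be compared across candidate thresholds. precision_recall and sweep are illustrative names.

```python
from __future__ import annotations


def precision_recall(predicted: dict[str, str | None],
                     gold: dict[str, str | None]) -> tuple[float, float]:
    """Entity-linking precision/recall over mention keys."""
    tp = sum(1 for k, p in predicted.items() if p is not None and gold.get(k) == p)
    n_pred = sum(1 for p in predicted.values() if p is not None)  # links we made
    n_gold = sum(1 for g in gold.values() if g is not None)       # links we should make
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


def sweep(scored: dict[str, tuple[str, float]], gold: dict[str, str | None],
          thresholds: list[float]) -> list[tuple[float, float, float]]:
    """For each threshold, keep only fuzzy links scoring strictly above it."""
    rows = []
    for t in thresholds:
        pred = {k: (eid if score > t else None) for k, (eid, score) in scored.items()}
        rows.append((t, *precision_recall(pred, gold)))
    return rows
```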

🤖 Generated with Claude Code

Metadata


Assignees

Labels

Type

No type

Projects

Status
Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
