Summary
Match and link the string fields extracted by the parsing pipeline (#TBD-2) (brand, celebrity_name, group_name, product_name) to existing DB entities (brands, identities, groups), and enrich them with metadata such as Instagram handles, wiki pages, and official URLs.
Related:
Architecture
ARQ job (every 30 minutes)
→ source_media WHERE parse_status='parsed' AND enrich_status='pending'
→ extract brand from parse_result.items, and parse_result.celebrity_name / group_name
→ entity matching:
  [Brand Resolver]
    1) exact match: brands.name / brands.aliases
    2) fuzzy match: trigram similarity (pg_trgm)
    3) Instagram handle lookup (when a handle is present)
    4) register as a new entity candidate (entity_candidates, entity_type='brand', status='pending')
  [Identity Resolver]
    1) identities.name / aliases
    2) fuzzy
    3) new candidate
  [Group Resolver] (K-pop, etc.)
    1) groups.name / aliases
→ UPDATE seed_solutions.brand_id, seed_posts.identity_id/group_id
→ UPDATE source_media SET enrich_status = 'enriched' | 'partial' | 'needs_review'
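A minimal sketch of how the job could fold the resolver outcomes for one source_media row into a single enrich_status. The issue does not define when a row counts as 'partial' versus 'needs_review', so the rule below (everything resolved → enriched, some resolved → partial, nothing resolved → needs_review, nothing to resolve → skipped) is an assumption, as are the names ResolveOutcome and decide_enrich_status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolveOutcome:
    """Result of one resolver for one extracted string (hypothetical type)."""
    raw_name: str
    entity_id: Optional[str]  # None means no match; a candidate was queued


def decide_enrich_status(outcomes: list[ResolveOutcome]) -> str:
    """Map resolver outcomes to an enrich_status value (assumed rule)."""
    if not outcomes:
        return "skipped"  # nothing was extracted, so nothing to resolve
    resolved = sum(1 for o in outcomes if o.entity_id is not None)
    if resolved == len(outcomes):
        return "enriched"
    if resolved > 0:
        return "partial"
    return "needs_review"
```

Keeping the decision in one pure function would make it easy to change the rule once the intended meaning of each status is settled.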
Schema
ALTER TABLE warehouse.source_media
  ADD COLUMN enrich_status text DEFAULT 'pending' NOT NULL
  CHECK (enrich_status IN ('pending','enriching','enriched','partial','needs_review','skipped'));

-- New entity candidates that failed matching (subject to Admin review)
CREATE TABLE warehouse.entity_candidates (
  id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
  entity_type text NOT NULL CHECK (entity_type IN ('brand','identity','group')),
  raw_name text NOT NULL,
  normalized_name text NOT NULL,
  occurrence_count int DEFAULT 1 NOT NULL,
  first_source_media_id uuid REFERENCES warehouse.source_media(id) ON DELETE SET NULL,
  status text DEFAULT 'pending' NOT NULL
    CHECK (status IN ('pending','approved','merged','rejected')),
  suggested_instagram_handle text,
  suggested_metadata jsonb,
  created_at timestamptz DEFAULT now() NOT NULL,
  updated_at timestamptz DEFAULT now() NOT NULL,
  UNIQUE(entity_type, normalized_name)
);
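The UNIQUE(entity_type, normalized_name) constraint lets "insert a candidate, or bump occurrence_count if it already exists" be a single upsert. The sketch below uses Python's bundled sqlite3 as a stand-in for Postgres so it runs without a database server; the trimmed table and the helper record_miss are illustrative assumptions, not part of the planned code.

```python
import sqlite3

# Trimmed stand-in for warehouse.entity_candidates (SQLite, not Postgres).
DDL = """
CREATE TABLE entity_candidates (
    entity_type      TEXT NOT NULL,
    raw_name         TEXT NOT NULL,
    normalized_name  TEXT NOT NULL,
    occurrence_count INTEGER NOT NULL DEFAULT 1,
    status           TEXT NOT NULL DEFAULT 'pending',
    UNIQUE (entity_type, normalized_name)
)
"""

# In SQLite the unqualified column on the right-hand side refers to the
# existing row. PostgreSQL would likely need it qualified, e.g.
# entity_candidates.occurrence_count + 1.
UPSERT = """
INSERT INTO entity_candidates (entity_type, raw_name, normalized_name)
VALUES (?, ?, ?)
ON CONFLICT (entity_type, normalized_name)
DO UPDATE SET occurrence_count = occurrence_count + 1
"""


def record_miss(conn, entity_type, raw_name, normalized_name):
    """Insert a new candidate or bump occurrence_count; return the new count."""
    conn.execute(UPSERT, (entity_type, raw_name, normalized_name))
    row = conn.execute(
        "SELECT occurrence_count FROM entity_candidates "
        "WHERE entity_type = ? AND normalized_name = ?",
        (entity_type, normalized_name),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
first = record_miss(conn, "brand", "Acme Studio", "acme studio")
second = record_miss(conn, "brand", "ACME  Studio", "acme studio")
```

Both calls share a normalized_name, so the second one updates the existing row instead of inserting a duplicate; raw_name keeps the first spelling seen.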
Resolver strategy
Brand Resolver
- Exact:
  brands.name ILIKE raw_name OR raw_name = ANY(brands.aliases)
- Fuzzy: pg_trgm
  similarity(brands.name, raw_name) > 0.8
- Instagram handle: look up the official Instagram account (v2, external API)
- Miss:
  INSERT into entity_candidates (occurrence_count++)
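The exact → fuzzy → miss cascade above can be sketched in pure Python. The trigram function approximates pg_trgm's similarity() (lowercase, split on non-alphanumerics, pad each word with two leading spaces and one trailing space, then shared trigrams over distinct trigrams); real pg_trgm output may differ, especially for non-ASCII text under some locales. The in-memory brands list and resolve_brand are hypothetical, the Instagram step (v2) is omitted, and plain case-insensitive equality stands in for ILIKE because % and _ in raw_name would act as wildcards in an ILIKE pattern.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set in the style of pg_trgm."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def resolve_brand(raw_name, brands, threshold=0.8):
    """Return (brand_id, how), trying exact, then fuzzy, then miss.

    `brands` is a hypothetical list of dicts with id / name / aliases.
    """
    needle = raw_name.lower()
    for b in brands:
        if needle == b["name"].lower() or raw_name in b["aliases"]:
            return b["id"], "exact"
    best = max(brands, key=lambda b: similarity(b["name"], raw_name), default=None)
    if best is not None and similarity(best["name"], raw_name) > threshold:
        return best["id"], "fuzzy"
    return None, "miss"  # caller would upsert into entity_candidates


brands = [
    {"id": "b1", "name": "Maison Margiela", "aliases": ["MM6"]},
    {"id": "b2", "name": "Balenciaga", "aliases": []},
]
```

With this sample data, "maison-margiela" resolves through the fuzzy step because punctuation and case do not affect the trigram sets, while the single-letter typo "Balenciagga" scores 10/13 (about 0.77) and falls through to a miss. A 0.8 cut-off is fairly strict for short names, which may be worth checking against real data.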
Identity / Group Resolver
- Mixed Korean/English names:
  the aliases jsonb column includes multilingual variants
- K-pop groups:
  groups table + reverse membership lookup (group_members)
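One way to make alias lookups robust to mixed Korean/English input is to normalize both the stored variants and the incoming string before comparing. A minimal sketch, assuming NFKC normalization plus case folding and whitespace collapsing is acceptable for normalized_name; normalize_name, build_alias_index, and the sample row are illustrative, not the planned implementation.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """NFKC-normalize, casefold, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw).casefold()
    return " ".join(text.split())


def build_alias_index(rows):
    """Map every normalized name/alias to its entity id.

    `rows` is a hypothetical list of (id, name, aliases) tuples, where
    aliases mirrors the jsonb array of multilingual variants.
    """
    index = {}
    for entity_id, name, aliases in rows:
        for variant in [name, *aliases]:
            index[normalize_name(variant)] = entity_id
    return index


# Sample data for illustration only.
groups = [("g1", "BLACKPINK", ["블랙핑크", "Black Pink"])]
index = build_alias_index(groups)
```

NFKC also folds full-width Latin letters and decomposed Hangul into their canonical forms, so strings copied from different sources land on the same key.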
Auto-approval threshold
- occurrence_count >= 5 AND fuzzy_score > 0.9 → auto-merge (v2)
- Default: manual approval by an Admin
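The threshold rule can be written as a small predicate. This is a sketch only: the issue does not say where fuzzy_score is stored or how it is computed for a candidate, so it is treated here as an input, and the function and parameter names are assumptions.

```python
def should_auto_merge(
    occurrence_count: int,
    fuzzy_score: float,
    min_occurrences: int = 5,
    min_score: float = 0.9,
) -> bool:
    """True when a candidate meets the v2 auto-merge rule.

    occurrence_count is inclusive (>=); fuzzy_score is strict (>).
    """
    return occurrence_count >= min_occurrences and fuzzy_score > min_score


def next_action(occurrence_count: int, fuzzy_score: float) -> str:
    """Fall back to manual Admin approval when the rule is not met."""
    if should_auto_merge(occurrence_count, fuzzy_score):
        return "auto_merge"
    return "manual_review"
```

Keeping the two bounds as parameters would make them easy to tune once real candidate data is available.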
Admin UI (can be split into a separate issue)
- /admin/entity-candidates: candidate list with approve / reject / merge
- On approval:
  INSERT into brands/identities/groups + re-resolve source_media.parse_result
Files
New
| File | Role |
| --- | --- |
| supabase/migrations/..._add_entity_candidates.sql | Schema |
| packages/ai-server/src/services/media/enrichment/__init__.py | Package |
| packages/ai-server/src/services/media/enrichment/resolvers/brand.py | Brand resolver |
| packages/ai-server/src/services/media/enrichment/resolvers/identity.py | Identity resolver |
| packages/ai-server/src/services/media/enrichment/resolvers/group.py | Group resolver |
| packages/ai-server/src/services/media/enrichment/instagram_lookup.py | Instagram handle lookup (v2) |
| packages/ai-server/src/services/media/enrichment/jobs.py | ARQ enrichment job |
| packages/api-server/src/routes/admin/entity_candidates.rs | Admin API |
Modified
| File | Change |
| --- | --- |
| supabase/migrations/..._enable_pg_trgm.sql | Extension for fuzzy matching |
| packages/ai-server/src/bootstrap.py | Schedule the enrichment job every 30 minutes |
Verification
- Candidates accumulated in entity_candidates are shown in the Admin UI
- brands/identities creation + re-resolution of the related source_media
🤖 Generated with Claude Code