18 KiB
Scenario C: Applying Formatting / Templates
When to Use
Use Scenario C when:
- The user has an existing document and wants to apply a different visual style
- The user wants to rebrand a document (new fonts, colors, heading styles)
- The user provides a template DOCX and wants its look applied to a content document
- The user wants consistent formatting across multiple documents
Do NOT use when: the user wants to edit content (→ Scenario B) or create from scratch (→ Scenario A).
Workflow
1. Analyze source → CLI: analyze source.docx (list styles, fonts, structure)
2. Analyze template → CLI: analyze template.docx (list styles, fonts, structure)
3. Map styles → Create mapping plan (source style → template style)
4. Apply template → CLI: apply-template source.docx --template template.docx --output result.docx
5. Validate (XSD) → CLI: validate result.docx --xsd wml-subset.xsd
6. GATE-CHECK → CLI: validate result.docx --xsd business-rules.xsd ← MUST PASS
7. Diff verify → CLI: diff source.docx result.docx --text-only (content must be identical)
What Gets Copied from Template
| Part | File | Description |
|---|---|---|
| Styles | word/styles.xml |
All style definitions (paragraph, character, table, numbering) |
| Theme | word/theme/theme1.xml |
Color scheme, font scheme, format scheme |
| Numbering | word/numbering.xml |
List and numbering definitions |
| Headers | word/header*.xml |
Header content and formatting |
| Footers | word/footer*.xml |
Footer content and formatting |
| Section props | w:sectPr |
Margins, page size, orientation, columns |
What Does NOT Get Copied
| Part | Reason |
|---|---|
| Document content | Paragraphs, tables, images stay from source |
| Comments | Belong to source document's review history |
| Tracked changes | Belong to source document's revision history |
| Custom XML parts | Application-specific data, not visual |
| Document properties | Title, author, dates belong to source |
| Glossary document | Template's building blocks are not transferred |
Template Structure Analysis (REQUIRED)
Before choosing Overlay or Base-Replace, you MUST analyze the template's internal structure. This is the #1 cause of failure when skipped.
Step 1: Count template paragraphs and identify structural zones
Run $CLI analyze --input template.docx or manually inspect:
# Quick structure scan
scripts/docx_preview.sh template.docx
Identify these zones in the template:
Zone A: Front matter (cover page, declaration, abstract, TOC)
→ These are KEPT from template, never replaced
Zone B: Example/placeholder body content ("第1章 XXX", sample paragraphs)
→ This is REPLACED with user's actual content
Zone C: Back matter (appendices, acknowledgments, blank pages)
→ These are KEPT from template or removed
Zone D: Final sectPr
→ ALWAYS kept from template
Step 2: Find Zone B boundaries (replacement range)
Search the template's document.xml for anchor text that marks the start and end of example content:
Start anchor patterns (first paragraph of example body):
- "第1章", "第一章", "Chapter 1", "1 Introduction", "绪论"
- The first paragraph with a Heading1-equivalent style after TOC
End anchor patterns (last paragraph before back matter):
- "参考文献", "References", "致谢", "Acknowledgments"
- The last paragraph before appendices or final sectPr
# Pseudocode for finding replacement range
for i, element in enumerate(template_body_elements):
text = get_text(element)
style = get_style(element)
if style in heading1_styles and ("第1章" in text or "Chapter 1" in text):
replace_start = i
if "参考文献" in text or "References" in text:
replace_end = i
break
CRITICAL: Verify the range by printing what's inside:
Template elements [0..replace_start-1]: front matter (KEEP)
Template elements [replace_start..replace_end]: example content (REPLACE)
Template elements [replace_end+1..end]: back matter (KEEP)
If replace_start or replace_end cannot be found, DO NOT proceed. Ask the user to identify the replacement boundaries.
Step 3: Decide Overlay vs Base-Replace
Now that you know the structure:
| Observation | Decision |
|---|---|
| Template has ≤30 paragraphs, no cover/TOC | C-1: Overlay (pure style template) |
| Template has >100 paragraphs with cover/TOC/example sections | C-2: Base-Replace |
| Template paragraph count ≈ user document | C-1: Overlay (similar structure) |
| Template paragraph count >> user document (e.g., 263 vs 134) | C-2: Base-Replace |
Step 4: For Base-Replace, execute the replacement
- Load template as base (all files)
- Extract user content elements using
list(body)— NOTfindall('w:p')(which misses tables) - Build new body:
template[0:replace_start] + cleaned_user_content + template[replace_end+1:] - Apply style mapping to every paragraph
- Clean direct formatting (see rules below)
- Rebuild document.xml, keeping template's namespace declarations
- Merge relationships (images + hyperlinks)
- Write output using template as ZIP base
Style Mapping Strategy
When template style names differ from source style names, a mapping is required. This step is mandatory — skipping it is the #1 cause of formatting failures in template application.
Step 0: Extract StyleIds from Both Documents (REQUIRED)
Before any template application, extract and compare styleIds from both documents:
# Extract all styleIds from source
$CLI analyze --input source.docx --styles-only
# Output example:
# Heading1 (paragraph, basedOn: Normal)
# Heading2 (paragraph, basedOn: Normal)
# Normal (paragraph)
# ListBullet (paragraph, basedOn: Normal)
# Extract all styleIds from template
$CLI analyze --input template.docx --styles-only
# Output example:
# 1 (paragraph, basedOn: a, name: "heading 1")
# 2 (paragraph, basedOn: a, name: "heading 2")
# 3 (paragraph, basedOn: a, name: "heading 3")
# a (paragraph, name: "Normal")
# a0 (character, name: "Default Paragraph Font")
Critical distinction: w:styleId vs w:name:
<!-- styleId="1" but name="heading 1" -->
<w:style w:type="paragraph" w:styleId="1">
<w:name w:val="heading 1"/>
<w:basedOn w:val="a"/>
</w:style>
The w:styleId attribute is what <w:pStyle w:val="..."/> references. The w:name attribute is the human-readable display name. They can be completely different. Many CJK templates use numeric styleIds (1, 2, 3, a, a0) instead of English names.
Tier 1: Exact StyleId Match
If source uses Heading1 and template defines Heading1 as a styleId, map directly. No action needed.
Tier 2: Name-Based Match
If no exact styleId match, try matching by w:name attribute:
- Source
Heading1(name="heading 1") → Template styleId1(name="heading 1") - Match is case-insensitive on the name value
Within the same type, also try matching by:
- Built-in style ID (Word's internal ID, e.g., heading 1 = built-in ID 1)
- Style type (paragraph → paragraph, character → character, table → table)
Tier 3: Manual Mapping
For renamed or custom styles, provide an explicit mapping:
{
"styleMap": {
"Heading1": "1",
"Heading2": "2",
"Heading3": "3",
"Heading4": "3",
"Normal": "a",
"BodyText": "a",
"ListBullet": "a",
"CompanyName": "Title",
"OldTableStyle": "TableGrid"
}
}
Common Non-Standard StyleId Patterns
| Template Origin | StyleId Pattern | Example |
|---|---|---|
| Chinese Word (default) | Numeric/alphabetic | 1, 2, 3, a, a0 |
| English Word (default) | English names | Heading1, Normal, Title |
| Google Docs export | Prefixed | Subtitle, NormalWeb |
| WPS Office | Mixed | 1, Heading1, custom names |
| Academic templates | Custom | ThesisHeading1, ThesisBody |
Building the Mapping Table
Follow this algorithm:
-
List source styleIds actually used in
document.xml(not all defined instyles.xml):# Pseudocode: find all unique pStyle values in source document.xml used_styles = set() for p in body.iter('w:p'): pStyle = p.find('w:pPr/w:pStyle') if pStyle is not None: used_styles.add(pStyle.get('val')) -
For each used style, find the best match in template:
- First try: exact styleId match
- Second try: match by
w:namevalue (case-insensitive) - Third try: match by style purpose (any heading → template's heading style)
- Fallback: map to template's default paragraph style (usually
Normalora)
-
Validate the mapping — every source styleId must map to an existing template styleId:
✓ Heading1 → 1 (name match: "heading 1") ✓ Heading2 → 2 (name match: "heading 2") ✓ Normal → a (name match: "Normal") ✗ CustomCallout → ??? (no match found, will fallback to 'a') -
Apply the mapping when copying content — update every
<w:pStyle w:val="..."/>:<!-- Source --> <w:pPr><w:pStyle w:val="Heading1"/></w:pPr> <!-- After mapping --> <w:pPr><w:pStyle w:val="1"/></w:pPr>
Unmapped Styles
Styles in the source document that have no match in the template are logged as warnings:
WARNING: Style 'CustomCallout' has no mapping in template. Content will fall back to 'a' (Normal).
The content is preserved; only the style reference is updated to the template's default paragraph style.
C-2 BASE-REPLACE: Additional StyleId Considerations
When using the template as a base document (C-2 strategy), the template's styles.xml is already in place. You must:
- Never copy source
styles.xml— the template's styles are the authority - Map every content paragraph's pStyle to the template's styleId before insertion
- Strip direct formatting selectively (see detailed rules below) — let the template style control appearance
- Verify table styles — if source tables use
TableGridbut template defines it asa3or similar, remap<w:tblStyle>too - Check character styles —
rPrinside runs may reference character styles likeHyperlinkorStrongthat have different IDs in the template
Direct Formatting Cleanup Rules (Detailed)
When copying content from source to template, apply these rules to EACH paragraph and run:
REMOVE from <w:rPr>:
<w:rFonts w:ascii="..." w:hAnsi="..."/>— Latin font overrides (EXCEPT: keepw:eastAsia)<w:sz>,<w:szCs>— font size (let style control)<w:color>— text color<w:highlight>— highlight color<w:shd>— shading<w:b>,<w:i>— bold/italic UNLESS the source style requires it (e.g., emphasis)<w:u>— underline<w:spacing>— character spacing
KEEP in <w:rPr>:
<w:rFonts w:eastAsia="宋体"/>— CJK font declaration (MUST keep, or Chinese text renders wrong)<w:rFonts w:eastAsia="华文中宋"/>— same reason- Anything inside
<w:drawing>— image references (handle separately via rId remapping)
REMOVE from <w:pPr>:
<w:pBdr>— paragraph borders<w:shd>— paragraph shading<w:spacing>— line/paragraph spacing (let style control)<w:jc>— justification (let style control)<w:tabs>— custom tab stops<w:rPr>inside pPr — default run formatting for the paragraph
KEEP in <w:pPr>:
<w:pStyle>— style reference (after mapping to template's styleId)<w:sectPr>— section properties (if intentionally inserting section breaks)<w:numPr>— numbering reference (after mapping numId to template's numbering)
Table cells (<w:tc>):
Apply the same rPr/pPr cleanup to every paragraph inside every cell. Also:
- Keep
<w:tcPr>structural properties (column span, row span, width) - Remove
<w:tcPr><w:shd>(cell shading — let table style control)
Relationship ID Remapping
When copying parts (headers, footers, images) from the template into the source package, relationship IDs (r:id) may collide.
Problem:
- Source has
rId7→image1.png - Template has
rId7→header1.xml - Copying template's
rId7overwrites source's image reference
Solution:
- Scan source's
document.xml.relsfor all existingrIdvalues - Find the maximum numeric ID (e.g.,
rId12) - Remap all template relationship IDs starting from
rId13 - Update all references in copied parts to use new IDs
<!-- Template original -->
<Relationship Id="rId1" Type="...header" Target="header1.xml" />
<!-- After remapping into source package -->
<Relationship Id="rId13" Type="...header" Target="header1.xml" />
<!-- Update sectPr reference -->
<w:headerReference w:type="default" r:id="rId13" />
Hyperlink Relationship Merging
When the source document contains external hyperlinks (e.g., URLs in references or footnotes), these are stored as relationships in word/_rels/document.xml.rels:
<Relationship Id="rId15" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
Target="https://example.com/paper" TargetMode="External"/>
The corresponding text in document.xml references this rId:
<w:hyperlink r:id="rId15">
<w:r><w:t>https://example.com/paper</w:t></w:r>
</w:hyperlink>
Merging steps:
- Scan source document.xml for all
<w:hyperlink r:id="...">elements - For each, find the corresponding relationship in source's rels file
- Check if template already has a relationship with the same Target URL
- If yes: reuse the existing rId, update the hyperlink reference
- If no: assign a new rId (starting from template's max rId + 1), add the relationship to template's rels, update the hyperlink reference
- Also check for hyperlink relationships used in footnotes (
word/_rels/footnotes.xml.rels) and endnotes
Common mistake: Copying hyperlink paragraphs without merging rels → hyperlinks silently break (clicking does nothing in Word).
XSD Gate-Check
What It Is
After template application, the output document MUST pass business-rules.xsd validation. This is a hard gate — if it fails, the document is NOT deliverable.
What business-rules.xsd Checks
| Rule | What It Validates |
|---|---|
| Template styles exist | All styles referenced by content paragraphs are defined in styles.xml |
| Margins match | Page margins match template specification |
| Fonts correct | w:docDefaults fonts match template's font scheme |
| Heading hierarchy | Heading levels are sequential (no H1 → H3 without H2) |
| Required styles present | Normal, Heading1-Heading3, TableGrid exist |
| Page size | Matches template's declared page size |
Handling Failures
GATE-CHECK FAILED:
- Style 'CustomStyle1' referenced in paragraph 14 but not defined in styles.xml
- Margin w:left=1080 does not match template requirement 1440
Fix each failure:
- Missing style: Add the style definition to
styles.xml, or remap the paragraph to an existing style - Margin mismatch: Update
w:sectPrmargins to match template - Font mismatch: Update
w:docDefaultsto match template font scheme - Heading hierarchy gap: Insert intermediate heading levels or adjust existing levels
Re-validate after every fix until gate-check passes.
Common Pitfalls
1. Orphaned Numbering References
Problem: Source document uses w:numId="5" in list paragraphs, but after replacing numbering.xml with the template's version, numbering ID 5 doesn't exist.
Symptom: Lists appear as plain paragraphs (no bullets/numbers).
Fix:
- Map source numbering IDs to template numbering IDs
- Update all
w:numIdreferences in document content - Or merge source numbering definitions into template's
numbering.xml
2. Missing Theme Colors
Problem: Source document's styles reference theme colors (w:themeColor="accent1") that have different values in the template's theme.
Symptom: Colors change unexpectedly (usually acceptable — this IS the point of re-theming). But if a style uses w:color with both w:val and w:themeColor, the theme color wins in Word.
Fix: Review color changes. If specific colors must be preserved, use explicit w:val without w:themeColor.
3. Section Property Conflicts
Problem: Source document has multiple sections (e.g., portrait + landscape pages), but the template assumes a single section.
Symptom: All sections get the same margins/orientation, breaking landscape pages.
Fix:
- Only apply template section properties to the final
w:sectPrinw:body - Preserve intermediate
w:sectPrelements (insidew:pPr) from the source - Or apply template properties to all sections but preserve orientation overrides
4. Embedded Font Conflicts
Problem: Template specifies fonts not available on the target system.
Fix: Either embed fonts in the DOCX (word/fonts/) or use web-safe alternatives:
- Calibri → available on Windows/Mac/Office online
- Arial → universal fallback
- Times New Roman → universal serif fallback
5. Broken Style Inheritance
Problem: Template has Heading1 based on Normal, but after applying template, Normal has different properties, cascading unwanted changes to headings.
Fix: Verify the w:basedOn chain for all critical styles. Ensure base styles are also correctly transferred from template.
Verification Checklist
After template application, verify:
- Content preserved — text diff shows zero content changes
- Gate-check passed —
business-rules.xsdvalidation succeeds - Styles applied — headings, body text, tables use template formatting
- Images intact — all images render correctly (relationship IDs valid)
- Lists working — numbered and bulleted lists display correctly
- Headers/footers — template headers/footers appear on all pages
- Page layout — margins, page size, orientation match template
- No corruption — file opens without errors in Word