7.8 KiB
Scenario B: Editing / Filling Content in Existing DOCX
Core Principle
"First, do no harm." When editing an existing document, minimize changes. Touch only what needs to change. Preserve all formatting, styles, relationships, and structure that are not directly involved in the edit.
When to Use
- Replacing placeholder text (
{{name}},$DATE$,[PLACEHOLDER]) - Updating specific paragraphs or table cells
- Filling in form fields
- Adding or removing paragraphs in a known location
- Inserting tracked changes for review workflows
Do NOT use when: the user wants to change the look/style of the entire document (→ Scenario C) or create from scratch (→ Scenario A).
Workflow
1. Preview → CLI: analyze <input.docx>
2. Analyze → Understand structure: sections, styles, headings, tables
3. Identify → Locate exact edit targets (paragraph index, table index, placeholder text)
4. Edit → Apply surgical changes via CLI or direct XML
5. Validate → CLI: validate <output.docx>
6. Diff → Compare before/after to verify only intended changes were made
When to Use API vs Direct XML
Use CLI Edit Command When:
- Replacing placeholder text (e.g.,
{{fieldName}}→ actual value) - Filling table data from JSON
- Updating document properties (title, author)
- Simple text insertions or deletions
Use Direct XML Manipulation When:
- Text spans multiple runs with different formatting (run-boundary issues)
- Adding complex structures (nested tables, multi-image layouts)
- Manipulating Track Changes markup
- Modifying header/footer content
- Adjusting section properties
Placeholder Patterns
The CLI natively supports {{fieldName}} placeholders:
# Replace all {{placeholders}} from a JSON map
dotnet run ... edit input.docx --fill-placeholders data.json --output filled.docx
Where data.json:
{
"companyName": "Acme Corp",
"date": "March 21, 2026",
"amount": "$15,000.00",
"recipientName": "Jane Smith"
}
Other placeholder formats ($FIELD$, [PLACEHOLDER]) require text replacement:
dotnet run ... edit input.docx --replace "$DATE$" "March 21, 2026" --output updated.docx
Text Replacement Strategies
Simple Replacement
When the entire search text is within a single w:r (run):
<!-- Before -->
<w:r>
<w:rPr><w:b /></w:rPr>
<w:t>{{companyName}}</w:t>
</w:r>
<!-- After — formatting preserved -->
<w:r>
<w:rPr><w:b /></w:rPr>
<w:t>Acme Corp</w:t>
</w:r>
Direct replacement. The run's w:rPr is untouched.
Complex Replacement (Split Runs)
When the search text is split across multiple runs (common when Word applies spell-check or formatting mid-text):
<!-- "{{companyName}}" split into 3 runs -->
<w:r><w:rPr><w:b /></w:rPr><w:t>{{company</w:t></w:r>
<w:r><w:rPr><w:b /><w:i /></w:rPr><w:t>Na</w:t></w:r>
<w:r><w:rPr><w:b /></w:rPr><w:t>me}}</w:t></w:r>
Strategy:
- Concatenate text across runs to find the match
- Place the replacement text in the first run (preserving its
w:rPr) - Remove the text from subsequent runs (or remove the runs entirely if empty)
<!-- After -->
<w:r><w:rPr><w:b /></w:rPr><w:t>Acme Corp</w:t></w:r>
Rule: Always preserve the formatting of the first run in the match.
Table Editing
By Index
Tables are 0-indexed in document order:
dotnet run ... edit input.docx --table-index 0 --table-data data.json --output updated.docx
By Header Matching
Find a table by its header row content:
dotnet run ... edit input.docx --table-match "Name,Amount,Date" --table-data data.json
Table Data JSON Format
{
"rows": [
["Alice Johnson", "$5,000", "2026-03-15"],
["Bob Smith", "$3,200", "2026-03-18"]
],
"appendRows": true
}
appendRows: true— add rows after existing dataappendRows: false(default) — replace all data rows (keeps header row)
Direct XML Table Editing
To modify a specific cell, locate it by row/column index:
<!-- Row 2 (0-indexed), Column 1 -->
<w:tr> <!-- tr[2] -->
<w:tc>...</w:tc>
<w:tc> <!-- tc[1] — target cell -->
<w:p>
<w:r><w:t>Old Value</w:t></w:r>
</w:p>
</w:tc>
</w:tr>
Replace the w:t content. Do NOT modify w:tcPr (cell properties) or w:tblPr (table properties).
Track Changes Guidance
When to Add Revision Marks
- User explicitly requests tracked changes
- Document already has tracking enabled (
w:trackChangesin settings) - Collaborative review workflow
When NOT to Add Revision Marks
- Form filling / placeholder replacement (these are "completing" the document, not "revising" it)
- Direct edits where the user wants a clean result
- Batch data filling operations
Adding Tracked Changes
See references/track_changes_guide.md for full XML examples.
Quick reference — inserting text with tracking:
<w:ins w:id="1" w:author="MiniMaxAI" w:date="2026-03-21T10:00:00Z">
<w:r>
<w:t>New text here</w:t>
</w:r>
</w:ins>
Deleting text with tracking:
<w:del w:id="2" w:author="MiniMaxAI" w:date="2026-03-21T10:00:00Z">
<w:r>
<w:delText>Removed text</w:delText> <!-- MUST use delText, not t -->
</w:r>
</w:del>
Common Pitfalls
1. Breaking Run Boundaries
Problem: Replacing text that spans runs by naively modifying individual runs destroys inline formatting.
Fix: Concatenate run text, find match boundaries, consolidate into the first run, remove consumed runs.
2. Hyperlink Content
Problem: Replacing text inside a w:hyperlink element without preserving the hyperlink wrapper removes the link.
<w:hyperlink r:id="rId5">
<w:r>
<w:rPr><w:rStyle w:val="Hyperlink" /></w:rPr>
<w:t>Click here</w:t> <!-- Only replace this text -->
</w:r>
</w:hyperlink>
Fix: Only modify the w:t inside the hyperlink's run. Never remove or replace the w:hyperlink element itself.
3. Tracked Change Context
Problem: Replacing text that is inside a w:ins or w:del element without understanding the revision context creates invalid markup.
Fix: If the target text is inside a revision mark, either:
- Replace within the revision context (preserving the
w:ins/w:delwrapper) - Or delete the old revision and create a new one
4. Style Preservation
Problem: Adding new paragraphs without specifying a style causes them to inherit Normal, which may not match the surrounding context.
Fix: When inserting paragraphs, copy the w:pStyle from an adjacent paragraph of the same type.
5. Numbering Continuity
Problem: Inserting a new list item breaks numbering sequence.
Fix: Ensure the new paragraph has the same w:numId and w:ilvl as adjacent list items. If continuing a sequence, set w:numPr to match.
6. XML Special Characters
Problem: User content contains &, <, >, ", ' — these must be escaped in XML.
Fix: Always XML-escape user-provided text before inserting into w:t elements:
&→&<→<>→>"→"'→'
7. Whitespace Preservation
Problem: Leading/trailing spaces in w:t are stripped by XML parsers.
Fix: Add xml:space="preserve" attribute:
<w:t xml:space="preserve"> text with leading space</w:t>
Diff Verification
After editing, always compare the before and after states:
# Structural diff — shows only changed elements
dotnet run ... diff original.docx modified.docx
# Text-only diff — shows content changes
dotnet run ... diff original.docx modified.docx --text-only
Verify:
- Only intended text changed
- No styles were modified
- No relationships were added/removed unexpectedly
- Table structure intact (same number of rows/columns unless intentionally changed)
- Images and other media unchanged