# Scenario B: Editing / Filling Content in Existing DOCX

## Core Principle

**"First, do no harm."** When editing an existing document, minimize changes. Touch only what needs to change. Preserve all formatting, styles, relationships, and structure that are not directly involved in the edit.

---

## When to Use

- Replacing placeholder text (`{{name}}`, `$DATE$`, `[PLACEHOLDER]`)
- Updating specific paragraphs or table cells
- Filling in form fields
- Adding or removing paragraphs in a known location
- Inserting tracked changes for review workflows

Do NOT use when the user wants to change the look/style of the entire document (→ Scenario C) or create a document from scratch (→ Scenario A).

---

## Workflow

```
1. Preview   → CLI: analyze
2. Analyze   → Understand structure: sections, styles, headings, tables
3. Identify  → Locate exact edit targets (paragraph index, table index, placeholder text)
4. Edit      → Apply surgical changes via CLI or direct XML
5. Validate  → CLI: validate
6. Diff      → Compare before/after to verify only intended changes were made
```

---

## When to Use CLI vs Direct XML

### Use CLI Edit Command When:

- Replacing placeholder text (e.g., `{{fieldName}}` → actual value)
- Filling table data from JSON
- Updating document properties (title, author)
- Simple text insertions or deletions

### Use Direct XML Manipulation When:

- Text spans multiple runs with different formatting (run-boundary issues)
- Adding complex structures (nested tables, multi-image layouts)
- Manipulating Track Changes markup
- Modifying header/footer content
- Adjusting section properties

---

## Placeholder Patterns

The CLI natively supports `{{fieldName}}` placeholders:

```bash
# Replace all {{placeholders}} from a JSON map
dotnet run ... edit input.docx --fill-placeholders data.json --output filled.docx
```

Where `data.json` is:

```json
{
  "companyName": "Acme Corp",
  "date": "March 21, 2026",
  "amount": "$15,000.00",
  "recipientName": "Jane Smith"
}
```

Other placeholder formats (`$FIELD$`, `[PLACEHOLDER]`) require explicit text replacement:

```bash
# Single quotes keep the shell from expanding $DATE as a variable
dotnet run ... edit input.docx --replace '$DATE$' "March 21, 2026" --output updated.docx
```

---

## Text Replacement Strategies

### Simple Replacement

When the entire search text is within a single `w:r` (run):

```xml
<!-- Before -->
<w:r>
  <w:rPr><w:b/></w:rPr>
  <w:t>{{companyName}}</w:t>
</w:r>

<!-- After -->
<w:r>
  <w:rPr><w:b/></w:rPr>
  <w:t>Acme Corp</w:t>
</w:r>
```

Replace the text directly. The run's `w:rPr` is untouched.

### Complex Replacement (Split Runs)

When the search text is split across multiple runs (common when Word applies spell-check or formatting mid-text):

```xml
<w:r><w:rPr><w:b/></w:rPr><w:t>{{company</w:t></w:r>
<w:r><w:rPr><w:b/></w:rPr><w:t>Na</w:t></w:r>
<w:r><w:rPr><w:b/></w:rPr><w:t>me}}</w:t></w:r>
```

Strategy:

1. Concatenate text across runs to find the match
2. Place the replacement text in the **first** run (preserving its `w:rPr`)
3. Remove the matched text from subsequent runs (or remove the runs entirely if they end up empty)

```xml
<w:r><w:rPr><w:b/></w:rPr><w:t>Acme Corp</w:t></w:r>
```

**Rule**: Always preserve the formatting of the first run in the match.

---

## Table Editing

### By Index

Tables are 0-indexed in document order:

```bash
dotnet run ... edit input.docx --table-index 0 --table-data data.json --output updated.docx
```

### By Header Matching

Find a table by its header row content:

```bash
dotnet run ... edit input.docx --table-match "Name,Amount,Date" --table-data data.json
```

### Table Data JSON Format

```json
{
  "rows": [
    ["Alice Johnson", "$5,000", "2026-03-15"],
    ["Bob Smith", "$3,200", "2026-03-18"]
  ],
  "appendRows": true
}
```

- `appendRows: true`: add rows after existing data
- `appendRows: false` (default): replace all data rows (keeps header row)

### Direct XML Table Editing

To modify a specific cell, locate it by row/column index:

```xml
<w:tc>
  <w:tcPr>...</w:tcPr>
  <w:p>
    <w:r>
      <w:t>Old Value</w:t>
    </w:r>
  </w:p>
</w:tc>
```

Replace only the `w:t` content. Do NOT modify `w:tcPr` (cell properties) or `w:tblPr` (table properties).

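
The direct cell edit described above can be sketched in Python with the standard library. This is a minimal illustration, not the CLI's implementation: `set_cell_text` is a hypothetical helper, the sample table is invented, and the sketch assumes `w:tc` elements are direct children of `w:tr` with no merged cells or content controls. It writes the new value into the first `w:t` of the cell and leaves `w:tcPr` and `w:tblPr` alone.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"w": W_NS}

# Keep the "w:" prefix when the tree is serialized again.
ET.register_namespace("w", W_NS)


def set_cell_text(table, row, col, new_text):
    """Replace the text of one cell, leaving w:tcPr and w:tblPr untouched.

    Hypothetical helper: assumes w:tc elements are direct children of w:tr
    and ignores merged cells (w:gridSpan / w:vMerge) and content controls.
    """
    rows = table.findall("w:tr", NS)
    cell = rows[row].findall("w:tc", NS)[col]
    texts = cell.findall(".//w:t", NS)
    if not texts:
        raise ValueError("cell has no w:t element to edit")
    # Put the new value in the first w:t so that run's w:rPr is preserved.
    texts[0].text = new_text
    if new_text != new_text.strip():
        # Mark leading/trailing spaces as significant.
        texts[0].set(f"{{{XML_NS}}}space", "preserve")
    # Blank any remaining text nodes instead of deleting structure.
    for extra in texts[1:]:
        extra.text = ""
    return cell


# Invented sample table for the demo.
SAMPLE = f"""
<w:tbl xmlns:w="{W_NS}">
  <w:tblPr><w:tblStyle w:val="TableGrid"/></w:tblPr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>Amount</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>Alice Johnson</w:t></w:r></w:p></w:tc>
    <w:tc>
      <w:tcPr><w:tcW w:w="2400" w:type="dxa"/></w:tcPr>
      <w:p><w:r><w:t>Old Value</w:t></w:r></w:p>
    </w:tc>
  </w:tr>
</w:tbl>
""".strip()

table = ET.fromstring(SAMPLE)
cell = set_cell_text(table, row=1, col=1, new_text="$5,000")
cell_text = "".join(t.text or "" for t in cell.findall(".//w:t", NS))
cell_props_kept = cell.find("w:tcPr/w:tcW", NS) is not None
print(cell_text, cell_props_kept)
```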

---

## Track Changes Guidance

### When to Add Revision Marks

- User explicitly requests tracked changes
- Document already has tracking enabled (`w:trackRevisions` in settings)
- Collaborative review workflow

### When NOT to Add Revision Marks

- Form filling / placeholder replacement (these are "completing" the document, not "revising" it)
- Direct edits where the user wants a clean result
- Batch data filling operations

### Adding Tracked Changes

See `references/track_changes_guide.md` for full XML examples. Quick reference for inserting text with tracking (attribute values are illustrative):

```xml
<w:ins w:id="1" w:author="Reviewer" w:date="2026-03-21T00:00:00Z">
  <w:r>
    <w:t>New text here</w:t>
  </w:r>
</w:ins>
```

Deleting text with tracking:

```xml
<w:del w:id="2" w:author="Reviewer" w:date="2026-03-21T00:00:00Z">
  <w:r>
    <w:delText>Removed text</w:delText>
  </w:r>
</w:del>
```

---

## Common Pitfalls

### 1. Breaking Run Boundaries

**Problem**: Replacing text that spans runs by naively modifying individual runs destroys inline formatting.

**Fix**: Concatenate run text, find match boundaries, consolidate into the first run, remove consumed runs.

### 2. Hyperlink Content

**Problem**: Replacing text inside a `w:hyperlink` element without preserving the hyperlink wrapper removes the link.

```xml
<w:hyperlink r:id="rId5">
  <w:r>
    <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
    <w:t>Click here</w:t>
  </w:r>
</w:hyperlink>
```

**Fix**: Only modify the `w:t` inside the hyperlink's run. Never remove or replace the `w:hyperlink` element itself.

### 3. Tracked Change Context

**Problem**: Replacing text that is inside a `w:ins` or `w:del` element without understanding the revision context creates invalid markup.

**Fix**: If the target text is inside a revision mark, either:

- Replace within the revision context (preserving the `w:ins`/`w:del` wrapper)
- Or delete the old revision and create a new one

### 4. Style Preservation

**Problem**: Adding new paragraphs without specifying a style causes them to inherit `Normal`, which may not match the surrounding context.

**Fix**: When inserting paragraphs, copy the `w:pStyle` from an adjacent paragraph of the same type.

### 5. Numbering Continuity

**Problem**: Inserting a new list item breaks the numbering sequence.

**Fix**: Ensure the new paragraph has the same `w:numId` and `w:ilvl` as adjacent list items. If continuing a sequence, set `w:numPr` to match.

### 6. XML Special Characters

**Problem**: User content contains `&`, `<`, `>`, `"`, `'`. These must be escaped in XML.

**Fix**: Always XML-escape user-provided text before inserting it into `w:t` elements:

- `&` → `&amp;`
- `<` → `&lt;`
- `>` → `&gt;`
- `"` → `&quot;`
- `'` → `&apos;`

### 7. Whitespace Preservation

**Problem**: Leading/trailing spaces in `w:t` are dropped when the document is read unless the element is marked as whitespace-preserving.

**Fix**: Add the `xml:space="preserve"` attribute:

```xml
<w:t xml:space="preserve"> text with leading space</w:t>
```

---

## Diff Verification

After editing, always compare the before and after states:

```bash
# Structural diff: shows only changed elements
dotnet run ... diff original.docx modified.docx

# Text-only diff: shows content changes
dotnet run ... diff original.docx modified.docx --text-only
```

Verify:

- Only intended text changed
- No styles were modified
- No relationships were added/removed unexpectedly
- Table structure intact (same number of rows/columns unless intentionally changed)
- Images and other media unchanged
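
As a complement to the CLI diff, a coarse package-level check of the list above can be sketched in Python. This is an illustration under stated assumptions, not part of the CLI: `part_digests`, `changed_parts`, and `_make_package` are hypothetical names, and the two in-memory packages are stand-ins for real `.docx` files. It hashes every part in each ZIP package and reports which parts were added, removed, or changed, which is enough to spot an unexpected change to `word/styles.xml`, a relationships part, or media.

```python
import hashlib
import io
import zipfile


def part_digests(docx):
    """Map each package part name to a SHA-256 digest of its bytes.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as pkg:
        return {name: hashlib.sha256(pkg.read(name)).hexdigest()
                for name in pkg.namelist()}


def changed_parts(original, modified):
    """Return (added, removed, changed) part names between two packages."""
    before = part_digests(original)
    after = part_digests(modified)
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(n for n in set(before) & set(after)
                     if before[n] != after[n])
    return added, removed, changed


def _make_package(parts):
    """Build an in-memory ZIP standing in for a .docx (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        for name, data in parts.items():
            pkg.writestr(name, data)
    buf.seek(0)
    return buf


# Invented stand-ins: only the document body differs.
original = _make_package({
    "word/document.xml": "<w:t>{{companyName}}</w:t>",
    "word/styles.xml": "<w:styles/>",
    "word/_rels/document.xml.rels": "<Relationships/>",
})
modified = _make_package({
    "word/document.xml": "<w:t>Acme Corp</w:t>",
    "word/styles.xml": "<w:styles/>",
    "word/_rels/document.xml.rels": "<Relationships/>",
})

added, removed, changed = changed_parts(original, modified)
print(added, removed, changed)
```

For a placeholder fill, the expectation is that only the body part (and any edited header or footer parts) shows up as changed. Because the comparison is byte-level, a part that was re-serialized without a semantic change will also be flagged, so treat a hit as a prompt to inspect that part rather than as proof of a regression.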