24 KiB
Minimal-Invasive Editing of Existing xlsx
Make precise, surgical changes to existing xlsx files while preserving everything you do not touch: styles, macros, pivot tables, charts, sparklines, named ranges, data validation, conditional formatting, and all other embedded content.
1. When to Use This Path
Use the edit (unpack → XML edit → pack) path whenever the task involves modifying an existing xlsx file:
- Template filling — populating designated input cells with values or formulas
- Data updates — replacing outdated numbers, text, or dates in a live file
- Content corrections — fixing wrong values, broken formulas, or mistyped labels
- Adding new data rows to an existing table
- Renaming a sheet
- Applying a new style to specific cells
Do NOT use this path for creating a brand-new workbook from scratch. For that, see create.md.
2. Why openpyxl round-trip Is Forbidden for Existing Files
openpyxl load_workbook() followed by workbook.save() is a destructive operation on any file that contains advanced features. The library silently drops content it does not understand:
| Feature | openpyxl behavior | Consequence |
|---|---|---|
VBA macros (vbaProject.bin) |
Dropped entirely | All automation is lost; file saved as .xlsx not .xlsm |
Pivot tables (xl/pivotTables/) |
Dropped | Interactive analysis is destroyed |
| Slicers | Dropped | Filter UI is lost |
Sparklines (<sparklineGroups>) |
Dropped | In-cell mini-charts disappear |
| Chart formatting details | Partially lost | Series colors, custom axes may revert |
| Print area / page breaks | Sometimes lost | Print layout changes |
| Custom XML parts | Dropped | Third-party data bindings broken |
| Theme-linked colors | May be de-themed | Colors converted to absolute, breaking theme switching |
Even on a "plain" file without these features, openpyxl may normalize whitespace in XML that Excel relies on, alter namespace declarations, or reset calcMode flags.
The rule is absolute: never open an existing file with openpyxl for the purpose of re-saving it.
The XML direct-edit approach is safe because it operates on the raw bytes. You only change the nodes you touch. Everything else is byte-equivalent to the original.
3. Standard Operating Procedure
Step 1 — Unpack
python3 SKILL_DIR/scripts/xlsx_unpack.py input.xlsx /tmp/xlsx_work/
The script unzips the xlsx, pretty-prints every XML and .rels file, and prints a categorized inventory of key files plus a warning if high-risk content is detected (VBA, pivot tables, charts).
Read the printed output carefully before proceeding. If the script reports xl/vbaProject.bin or xl/pivotTables/, follow the constraints in Section 7.
Step 2 — Reconnaissance
Map the structure before touching anything.
Identify sheet names and their XML files:
xl/workbook.xml → <sheet name="Revenue" sheetId="1" r:id="rId1"/>
xl/_rels/workbook.xml.rels → <Relationship Id="rId1" Target="worksheets/sheet1.xml"/>
The sheet named "Revenue" lives in xl/worksheets/sheet1.xml. Always resolve this mapping before editing a worksheet.
Understand the shared strings table:
# Count existing entries in xl/sharedStrings.xml
grep -c "<si>" /tmp/xlsx_work/xl/sharedStrings.xml
Every text cell uses a zero-based index into this table. Know the current count before appending.
Understand the styles table:
# Count existing cellXfs entries
grep -c "<xf " /tmp/xlsx_work/xl/styles.xml
New style slots are appended after existing ones. The index of the first new slot = current count.
Scan for high-risk XML regions in the target worksheet:
Look for these elements in the target sheet*.xml before editing:
<mergeCell>— merged cell ranges; row/column insertion shifts these<conditionalFormatting>— condition ranges; row/column insertion shifts these<dataValidations>— validation ranges; row/column insertion shifts these<tableParts>— table definitions; row insertion inside a table needs<tableColumn>updates<sparklineGroups>— sparklines; preserve without modification
Step 3 — Map Intent to Minimal XML Changes
Before writing a single character, produce a written list of exactly which XML nodes change. This prevents scope creep.
| User intent | Files to change | Nodes to change |
|---|---|---|
| Change a cell's numeric value | xl/worksheets/sheetN.xml |
<v> inside target <c> |
| Change a cell's text | xl/sharedStrings.xml (append) + xl/worksheets/sheetN.xml |
New <si>, update cell <v> index |
| Change a cell's formula | xl/worksheets/sheetN.xml |
<f> text inside target <c> |
| Add a new data row at the bottom | xl/worksheets/sheetN.xml + possibly xl/sharedStrings.xml |
Append <row> element |
| Apply a new style to cells | xl/styles.xml + xl/worksheets/sheetN.xml |
Append <xf> in <cellXfs>, update s attribute on <c> |
| Rename a sheet | xl/workbook.xml |
name attribute on <sheet> element |
| Rename a sheet (with cross-sheet formulas) | xl/workbook.xml + all xl/worksheets/*.xml |
name attribute + <f> text referencing old name |
Step 4 — Execute Changes
Use the Edit tool. Edit the minimum. Never rewrite whole files.
See Section 4 for precise XML patterns for each operation type.
Step 5 — Cascade Check
After any change that shifts row or column positions, audit all affected XML regions. See Section 5.
Step 6 — Pack and Validate
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
The pack script validates XML well-formedness before creating the ZIP. Fix any reported parse errors before packing. After packing, run formula_check.py to confirm no formula errors were introduced.
4. Precise XML Patterns for Common Edits
4.1 Changing a Numeric Cell Value
Find the <c r="B5"> element in the worksheet XML and replace the <v> text.
Before:
<c r="B5">
<v>1000</v>
</c>
After (new value 1500):
<c r="B5">
<v>1500</v>
</c>
Rules:
- Do not add or remove the
sattribute (style) unless explicitly changing the style. - Do not add a
tattribute — numbers omittor uset="n". - Do not change the
rattribute (cell reference).
4.2 Changing a Text Cell Value
Text cells reference the shared strings table by index (t="s"). You cannot edit the string in-place without affecting every other cell that uses the same index. The safe approach is to append a new entry.
Before — shared strings file (xl/sharedStrings.xml):
<sst count="4" uniqueCount="4">
<si><t>Revenue</t></si>
<si><t>Cost</t></si>
<si><t>Margin</t></si>
<si><t>Old Label</t></si>
</sst>
After — append new string, increment counts:
<sst count="5" uniqueCount="5">
<si><t>Revenue</t></si>
<si><t>Cost</t></si>
<si><t>Margin</t></si>
<si><t>Old Label</t></si>
<si><t>New Label</t></si>
</sst>
New string is at index 4 (zero-based).
Before — cell in worksheet XML:
<c r="A7" t="s">
<v>3</v>
</c>
After — point to new index:
<c r="A7" t="s">
<v>4</v>
</c>
Rules:
- Never modify or delete existing
<si>entries. Only append. - Both
countanduniqueCountmust be incremented together. - If the new string contains
&,<, or>, escape them:&,<,>. - If the string has leading or trailing spaces, add
xml:space="preserve"to<t>:<si><t xml:space="preserve"> indented text </t></si>
4.3 Changing a Formula
Formulas are stored in <f> elements without a leading = (unlike what you type in Excel's UI).
Before:
<c r="C10">
<f>SUM(C2:C9)</f>
<v>4800</v>
</c>
After (extended range):
<c r="C10">
<f>SUM(C2:C11)</f>
<v></v>
</c>
Rules:
- Clear
<v>to an empty string when changing the formula. The cached value is now stale. - Do not add
t="s"or any type attribute to formula cells. Thetattribute is absent or uses a result-type value, not a formula marker. - Cross-sheet references use
SheetName!CellRef. If the sheet name contains spaces, wrap in single quotes:'Q1 Data'!B5. - The
<f>text must not include the leading=.
Before (converting a hardcoded value to a live formula):
<c r="D15">
<v>95000</v>
</c>
After:
<c r="D15">
<f>SUM(D2:D14)</f>
<v></v>
</c>
4.4 Adding a New Data Row
Append after the last <row> element inside <sheetData>. Row numbers in OOXML are 1-based and must be sequential.
Before (last row is row 10):
<row r="10">
<c r="A10" t="s"><v>3</v></c>
<c r="B10"><v>2023</v></c>
<c r="C10"><v>88000</v></c>
<c r="D10"><f>C10*1.1</f><v></v></c>
</row>
</sheetData>
After (new row 11 appended):
<row r="10">
<c r="A10" t="s"><v>3</v></c>
<c r="B10"><v>2023</v></c>
<c r="C10"><v>88000</v></c>
<c r="D10"><f>C10*1.1</f><v></v></c>
</row>
<row r="11">
<c r="A11" t="s"><v>4</v></c>
<c r="B11"><v>2024</v></c>
<c r="C11"><v>96000</v></c>
<c r="D11"><f>C11*1.1</f><v></v></c>
</row>
</sheetData>
Rules:
- Every
<c>inside the row must haverset to the correct cell address (e.g.,A11). - Text cells need
t="s"and a sharedStrings index in<v>. Numeric cells omitt. - Formula cells use
<f>and an empty<v>. - Copy the
sattribute from the row above if you want matching styles. Do not invent a style index that does not exist instyles.xml. - If the sheet contains a
<dimension>element (e.g.,<dimension ref="A1:D10"/>), update it to include the new row:<dimension ref="A1:D11"/>. - If the sheet contains a
<tableparts>referencing a table, update the table'srefattribute in the correspondingxl/tables/tableN.xmlfile.
4.5 Adding a New Column
Append new <c> elements to each existing <row> and, if present, update the <cols> section.
Before (rows have columns A–C):
<cols>
<col min="1" max="3" width="14" customWidth="1"/>
</cols>
<sheetData>
<row r="1">
<c r="A1" t="s"><v>0</v></c>
<c r="B1" t="s"><v>1</v></c>
<c r="C1" t="s"><v>2</v></c>
</row>
<row r="2">
<c r="A2"><v>100</v></c>
<c r="B2"><v>200</v></c>
<c r="C2"><v>300</v></c>
</row>
</sheetData>
After (adding column D):
<cols>
<col min="1" max="3" width="14" customWidth="1"/>
<col min="4" max="4" width="14" customWidth="1"/>
</cols>
<sheetData>
<row r="1">
<c r="A1" t="s"><v>0</v></c>
<c r="B1" t="s"><v>1</v></c>
<c r="C1" t="s"><v>2</v></c>
<c r="D1" t="s"><v>5</v></c>
</row>
<row r="2">
<c r="A2"><v>100</v></c>
<c r="B2"><v>200</v></c>
<c r="C2"><v>300</v></c>
<c r="D2"><f>A2+B2+C2</f><v></v></c>
</row>
</sheetData>
Rules:
- Adding a column at the end (after the last existing column) is safe — no existing formula references shift.
- Inserting a column in the middle shifts all columns to the right, which requires the same cascade updates as row insertion (see Section 5).
- Update the
<dimension>element if present.
4.6 Modifying or Adding Styles
Styles use a multi-level indirect reference chain. Read ooxml-cheatsheet.md for the full chain. The key rule: only append new entries, never modify existing ones.
Scenario: Add a blue-font style (for hardcoded input cells) that doesn't yet exist.
Step 1 — Check if a matching font already exists in xl/styles.xml:
<!-- Look inside <fonts> for an existing blue font -->
<font>
<color rgb="000000FF"/>
<!-- other attributes -->
</font>
If found, note its index (zero-based position in the <fonts> list). If not found, append.
Step 2 — Append the new font if needed:
Before:
<fonts count="3">
<font>...</font> <!-- index 0 -->
<font>...</font> <!-- index 1 -->
<font>...</font> <!-- index 2 -->
</fonts>
After:
<fonts count="4">
<font>...</font> <!-- index 0 -->
<font>...</font> <!-- index 1 -->
<font>...</font> <!-- index 2 -->
<font>
<b/>
<sz val="11"/>
<color rgb="000000FF"/>
<name val="Calibri"/>
</font> <!-- index 3 (new) -->
</fonts>
Step 3 — Append a new <xf> in <cellXfs>:
Before:
<cellXfs count="5">
<xf .../> <!-- index 0 -->
<xf .../> <!-- index 1 -->
<xf .../> <!-- index 2 -->
<xf .../> <!-- index 3 -->
<xf .../> <!-- index 4 -->
</cellXfs>
After:
<cellXfs count="6">
<xf .../> <!-- index 0 -->
<xf .../> <!-- index 1 -->
<xf .../> <!-- index 2 -->
<xf .../> <!-- index 3 -->
<xf .../> <!-- index 4 -->
<xf numFmtId="0" fontId="3" fillId="0" borderId="0" xfId="0"
applyFont="1"/> <!-- index 5 (new) -->
</cellXfs>
Step 4 — Apply to target cells:
Before:
<c r="B3">
<v>0.08</v>
</c>
After:
<c r="B3" s="5">
<v>0.08</v>
</c>
Rules:
- Never delete or reorder existing entries in
<fonts>,<fills>,<borders>,<cellXfs>. - Always update the
countattribute when appending. - The new
cellXfsindex = the oldcountvalue before appending (zero-based: if count was 5, new index is 5). - Custom
numFmtIDs must be 164 or above. IDs 0–163 are built-in and must not be re-declared. - If the desired style already exists elsewhere in the file (on a similar cell), reuse its
sindex rather than creating a duplicate.
4.7 Renaming a Sheet
Only xl/workbook.xml needs to change — unless cross-sheet formulas reference the old name.
Before (xl/workbook.xml):
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
After:
<sheet name="Revenue" sheetId="1" r:id="rId1"/>
If any formula in any worksheet references the old name, update those too:
Before (xl/worksheets/sheet2.xml):
<c r="B5"><f>Sheet1!C10</f><v></v></c>
After:
<c r="B5"><f>Revenue!C10</f><v></v></c>
If the new name contains spaces:
<c r="B5"><f>'Q1 Revenue'!C10</f><v></v></c>
Scan all worksheet XML files for the old name:
grep -r "Sheet1!" /tmp/xlsx_work/xl/worksheets/
Rules:
- The
.relsfile and[Content_Types].xmldo NOT need to change — they reference the XML file path, not the sheet name. sheetIdmust not change; it is a stable internal identifier.- Sheet names are case-sensitive in formula references.
5. High-Risk Operations — Cascade Effects
5.1 Inserting a Row in the Middle
Inserting a row at position N shifts all rows from N downward. Every reference to those rows in every XML file must be updated.
Files to check and update:
| XML region | What to update | Example shift |
|---|---|---|
Worksheet <row r="..."> attributes |
Increment row number for all rows >= N | r="7" → r="8" |
All <c r="..."> within those rows |
Increment row number in cell address | r="A7" → r="A8" |
All <f> formula text in any sheet |
Shift absolute row references >= N | B7 → B8 |
<mergeCell ref="..."> |
Shift start and end rows | A7:C7 → A8:C8 |
<conditionalFormatting sqref="..."> |
Shift range | A5:D20 → A5:D21 |
<dataValidations sqref="..."> |
Shift range | B6:B50 → B7:B51 |
xl/charts/chartN.xml data source ranges |
Shift series ranges | Sheet1!$B$5:$B$20 → Sheet1!$B$6:$B$21 |
xl/pivotTables/*.xml source ranges |
Shift source data range | Handle with extreme care — see Section 7 |
<dimension ref="..."> |
Expand to include new extent | A1:D20 → A1:D21 |
xl/tables/tableN.xml ref attribute |
Expand table boundary | A1:D20 → A1:D21 |
Do not attempt row insertion manually in large or formula-heavy files. Use the dedicated shift script instead:
# Insert 1 row at row 5: all rows 5 and below shift down by 1
python3 SKILL_DIR/scripts/xlsx_shift_rows.py /tmp/xlsx_work/ insert 5 1
# Delete 1 row at row 8: all rows 9 and above shift up by 1
python3 SKILL_DIR/scripts/xlsx_shift_rows.py /tmp/xlsx_work/ delete 8 1
The script updates in one pass: <row r="..."> attributes, <c r="..."> cell addresses, all <f> formula text across every worksheet, <mergeCell> ranges, <conditionalFormatting sqref="...">, <dataValidation sqref="...">, <dimension ref="...">, table ref attributes in xl/tables/, chart series ranges in xl/charts/, and pivot cache source ranges in xl/pivotCaches/.
After running the shift script, always repack and validate:
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
What the script does NOT update (review manually):
- Named ranges in
xl/workbook.xml<definedNames>— check and update if they reference shifted rows. - Structured table references (
Table[@Column]) inside formulas. - External workbook links in
xl/externalLinks/.
5.2 Inserting a Column in the Middle
Same cascade logic as row insertion, but for columns. Column references in formulas (B, $C, etc.) and in merged cell ranges, conditional formatting ranges, and chart data sources all need updating.
Column letter shifting is harder to automate safely. Prefer appending columns at the end whenever possible.
5.3 Deleting a Row or Column
Deletion is more dangerous than insertion because any formula that referenced a deleted row or column will become #REF!. Before deleting:
- Search all
<f>elements for references to the deleted range. - If any formula references a cell in the deleted row/column, do not delete — instead, either clear the row's data or consult the user.
- After deletion, shift all references to rows/columns beyond the deletion point downward/leftward.
6. Template Filling — Identifying and Populating Input Cells
Templates designate certain cells as input zones. Common patterns to recognize them:
6.1 How Templates Signal Input Zones
| Signal | XML manifestation | What to look for |
|---|---|---|
| Blue font color | s attribute pointing to a cellXfs entry with fontId → <color rgb="000000FF"/> |
Check styles.xml to decode s values |
| Yellow fill (highlight) | s → fillId → <fill><patternFill><fgColor rgb="00FFFF00"/> |
|
Empty <v> element |
<c r="B5"><v></v></c> or cell entirely absent from <row> |
The cell has no value yet |
| Comment/annotation near cell | xl/comments1.xml with ref="B5" |
Comments often label input fields |
| Named ranges | xl/workbook.xml <definedName> elements |
Template may define InputRevenue etc. |
6.2 Filling a Template Cell
Do not change s attributes. Do not change t attributes unless you must change from empty to typed. Only change <v> or add <f>.
Before (empty input cell with style preserved):
<c r="C5" s="3">
<v></v>
</c>
After (filled with a number, style unchanged):
<c r="C5" s="3">
<v>125000</v>
</c>
After (filled with text — requires shared string entry first):
<!-- 1. Append to sharedStrings.xml: <si><t>North Region</t></si> at index 7 -->
<c r="C5" t="s" s="3">
<v>7</v>
</c>
After (filled with a formula, preserving style):
<c r="C5" s="3">
<f>Assumptions!D12</f>
<v></v>
</c>
6.3 Locating Input Zones Without Opening the File in Excel
After unpacking, decode the style index on suspected input cells to determine if they have the template's input color:
- Note the
svalue on the cell (e.g.,s="4"). - In
xl/styles.xml, find<cellXfs>and look at the 5th entry (index 4). - Note its
fontId(e.g.,fontId="2"). - In
<fonts>, look at the 3rd entry (index 2) and check for<color rgb="000000FF"/>(blue) or other input marker.
If the template uses named ranges as input fields, read them from xl/workbook.xml:
<definedNames>
<definedName name="InputGrowthRate">Assumptions!$B$5</definedName>
<definedName name="InputDiscountRate">Assumptions!$B$6</definedName>
</definedNames>
Fill the target cells (Assumptions!B5, Assumptions!B6) directly.
6.4 Template Filling Rules
- Fill only cells the template designated as inputs. Do not fill cells that are formula-driven.
- Do not apply new styles when filling. The template's formatting is the deliverable.
- Do not add or remove rows inside the template's data area unless the template explicitly has an "append here" zone.
- After filling, verify that no formula errors were introduced: some templates have input-validation formulas that produce
#VALUE!if the wrong data type is entered.
7. Files You Must Never Modify
7.1 Absolute no-touch list
| File / location | Why |
|---|---|
xl/vbaProject.bin |
Binary VBA bytecode. Any byte modification corrupts the macro project. Editing even one bit makes the macros fail to load. |
xl/pivotCaches/pivotCacheDefinition*.xml |
The cache definition ties the pivot table to its source data. Editing it without also updating the corresponding pivotTable*.xml will corrupt the pivot. |
xl/pivotTables/*.xml |
Pivot table XML is tightly coupled with the cache definition and with internal state Excel rebuilds on load. Do not edit. If you shifted rows and the pivot's source range now points to wrong data, update only the <cacheSource> range in the cache definition, and only the ref attribute in the pivot table — no other changes. |
xl/slicers/*.xml |
Slicers are connected to specific cache IDs and pivot fields. Breaking these connections silently corrupts the file. |
xl/connections.xml |
External data connections. Editing breaks live data refresh. |
xl/externalLinks/ |
External workbook links. The binary .bin files in here must not be modified. |
7.2 Conditionally safe files (update only specific attributes)
| File | What you may update | What to leave alone |
|---|---|---|
xl/charts/chartN.xml |
Data series range references (<numRef><f>) after a row/column shift |
Chart type, formatting, layout |
xl/tables/tableN.xml |
ref attribute on <table> after adding rows |
Column definitions, style info |
xl/pivotCaches/pivotCacheDefinition*.xml |
ref attribute on <cacheSource><worksheetSource> after shifting source data |
All other content |
8. Validation After Every Edit
Never skip validation. Even a one-character change in a formula can cause cascading errors.
# Pack
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
# Static formula validation (always run)
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
# Dynamic validation (if LibreOffice available)
python3 SKILL_DIR/scripts/libreoffice_recalc.py output.xlsx /tmp/recalc.xlsx
python3 SKILL_DIR/scripts/formula_check.py /tmp/recalc.xlsx
If formula_check.py reports any error:
- Unpack the output file again (it is the packed version).
- Locate the reported cell in the worksheet XML.
- Fix the
<f>element. - Repack and re-validate.
Do not deliver the file until formula_check.py reports zero errors.
9. Absolute Rules Summary
| Rule | Rationale |
|---|---|
Never use openpyxl load_workbook + save on an existing file |
Round-trip destroys pivot tables, VBA, sparklines, slicers |
Never delete or reorder existing <si> entries in sharedStrings |
Breaks every cell referencing that index |
Never delete or reorder existing <xf> entries in <cellXfs> |
Breaks every cell using that style index |
Never modify vbaProject.bin |
Binary file; any change corrupts VBA |
Never change sheetId when renaming a sheet |
Internal ID is stable; changing it breaks relationships |
| Never skip post-edit validation | Leaves broken references undetected |
| Never edit more XML nodes than required | Extra changes risk introducing subtle corruption |
Clear <v> to empty string when changing a formula |
Prevents stale cached value from misleading downstream consumers |
| Append-only to sharedStrings | Existing indexes must remain valid |
| Append-only to styles collections | Existing style indexes must remain valid |