Files
shihao 6487becf60 Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:52:49 +08:00

24 KiB
Raw Permalink Blame History

Minimal-Invasive Editing of Existing xlsx

Make precise, surgical changes to existing xlsx files while preserving everything you do not touch: styles, macros, pivot tables, charts, sparklines, named ranges, data validation, conditional formatting, and all other embedded content.


1. When to Use This Path

Use the edit (unpack → XML edit → pack) path whenever the task involves modifying an existing xlsx file:

  • Template filling — populating designated input cells with values or formulas
  • Data updates — replacing outdated numbers, text, or dates in a live file
  • Content corrections — fixing wrong values, broken formulas, or mistyped labels
  • Adding new data rows to an existing table
  • Renaming a sheet
  • Applying a new style to specific cells

Do NOT use this path for creating a brand-new workbook from scratch. For that, see create.md.


2. Why openpyxl round-trip Is Forbidden for Existing Files

openpyxl load_workbook() followed by workbook.save() is a destructive operation on any file that contains advanced features. The library silently drops content it does not understand:

Feature openpyxl behavior Consequence
VBA macros (vbaProject.bin) Dropped entirely All automation is lost; file saved as .xlsx not .xlsm
Pivot tables (xl/pivotTables/) Dropped Interactive analysis is destroyed
Slicers Dropped Filter UI is lost
Sparklines (<sparklineGroups>) Dropped In-cell mini-charts disappear
Chart formatting details Partially lost Series colors, custom axes may revert
Print area / page breaks Sometimes lost Print layout changes
Custom XML parts Dropped Third-party data bindings broken
Theme-linked colors May be de-themed Colors converted to absolute, breaking theme switching

Even on a "plain" file without these features, openpyxl may normalize whitespace in XML that Excel relies on, alter namespace declarations, or reset calcMode flags.

The rule is absolute: never open an existing file with openpyxl for the purpose of re-saving it.

The XML direct-edit approach is safe because it operates on the raw bytes. You only change the nodes you touch. Everything else is byte-equivalent to the original.


3. Standard Operating Procedure

Step 1 — Unpack

python3 SKILL_DIR/scripts/xlsx_unpack.py input.xlsx /tmp/xlsx_work/

The script unzips the xlsx, pretty-prints every XML and .rels file, and prints a categorized inventory of key files plus a warning if high-risk content is detected (VBA, pivot tables, charts).

Read the printed output carefully before proceeding. If the script reports xl/vbaProject.bin or xl/pivotTables/, follow the constraints in Section 7.

Step 2 — Reconnaissance

Map the structure before touching anything.

Identify sheet names and their XML files:

xl/workbook.xml  →  <sheet name="Revenue" sheetId="1" r:id="rId1"/>
xl/_rels/workbook.xml.rels  →  <Relationship Id="rId1" Target="worksheets/sheet1.xml"/>

The sheet named "Revenue" lives in xl/worksheets/sheet1.xml. Always resolve this mapping before editing a worksheet.

Understand the shared strings table:

# Count existing entries in xl/sharedStrings.xml
grep -c "<si>" /tmp/xlsx_work/xl/sharedStrings.xml

Every text cell uses a zero-based index into this table. Know the current count before appending.

Understand the styles table:

# Count existing cellXfs entries
grep -c "<xf " /tmp/xlsx_work/xl/styles.xml

New style slots are appended after existing ones. The index of the first new slot = current count.

Scan for high-risk XML regions in the target worksheet:

Look for these elements in the target sheet*.xml before editing:

  • <mergeCell> — merged cell ranges; row/column insertion shifts these
  • <conditionalFormatting> — condition ranges; row/column insertion shifts these
  • <dataValidations> — validation ranges; row/column insertion shifts these
  • <tableParts> — table definitions; row insertion inside a table needs <tableColumn> updates
  • <sparklineGroups> — sparklines; preserve without modification

Step 3 — Map Intent to Minimal XML Changes

Before writing a single character, produce a written list of exactly which XML nodes change. This prevents scope creep.

User intent Files to change Nodes to change
Change a cell's numeric value xl/worksheets/sheetN.xml <v> inside target <c>
Change a cell's text xl/sharedStrings.xml (append) + xl/worksheets/sheetN.xml New <si>, update cell <v> index
Change a cell's formula xl/worksheets/sheetN.xml <f> text inside target <c>
Add a new data row at the bottom xl/worksheets/sheetN.xml + possibly xl/sharedStrings.xml Append <row> element
Apply a new style to cells xl/styles.xml + xl/worksheets/sheetN.xml Append <xf> in <cellXfs>, update s attribute on <c>
Rename a sheet xl/workbook.xml name attribute on <sheet> element
Rename a sheet (with cross-sheet formulas) xl/workbook.xml + all xl/worksheets/*.xml name attribute + <f> text referencing old name

Step 4 — Execute Changes

Use the Edit tool. Edit the minimum. Never rewrite whole files.

See Section 4 for precise XML patterns for each operation type.

Step 5 — Cascade Check

After any change that shifts row or column positions, audit all affected XML regions. See Section 5.

Step 6 — Pack and Validate

python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
python3 SKILL_DIR/scripts/formula_check.py output.xlsx

The pack script validates XML well-formedness before creating the ZIP. Fix any reported parse errors before packing. After packing, run formula_check.py to confirm no formula errors were introduced.


4. Precise XML Patterns for Common Edits

4.1 Changing a Numeric Cell Value

Find the <c r="B5"> element in the worksheet XML and replace the <v> text.

Before:

<c r="B5">
  <v>1000</v>
</c>

After (new value 1500):

<c r="B5">
  <v>1500</v>
</c>

Rules:

  • Do not add or remove the s attribute (style) unless explicitly changing the style.
  • Do not add a t attribute — numbers omit t or use t="n".
  • Do not change the r attribute (cell reference).

4.2 Changing a Text Cell Value

Text cells reference the shared strings table by index (t="s"). You cannot edit the string in-place without affecting every other cell that uses the same index. The safe approach is to append a new entry.

Before — shared strings file (xl/sharedStrings.xml):

<sst count="4" uniqueCount="4">
  <si><t>Revenue</t></si>
  <si><t>Cost</t></si>
  <si><t>Margin</t></si>
  <si><t>Old Label</t></si>
</sst>

After — append new string, increment counts:

<sst count="5" uniqueCount="5">
  <si><t>Revenue</t></si>
  <si><t>Cost</t></si>
  <si><t>Margin</t></si>
  <si><t>Old Label</t></si>
  <si><t>New Label</t></si>
</sst>

New string is at index 4 (zero-based).

Before — cell in worksheet XML:

<c r="A7" t="s">
  <v>3</v>
</c>

After — point to new index:

<c r="A7" t="s">
  <v>4</v>
</c>

Rules:

  • Never modify or delete existing <si> entries. Only append.
  • Both count and uniqueCount must be incremented together.
  • If the new string contains &, <, or >, escape them: &amp;, &lt;, &gt;.
  • If the string has leading or trailing spaces, add xml:space="preserve" to <t>:
    <si><t xml:space="preserve">  indented text  </t></si>
    

4.3 Changing a Formula

Formulas are stored in <f> elements without a leading = (unlike what you type in Excel's UI).

Before:

<c r="C10">
  <f>SUM(C2:C9)</f>
  <v>4800</v>
</c>

After (extended range):

<c r="C10">
  <f>SUM(C2:C11)</f>
  <v></v>
</c>

Rules:

  • Clear <v> to an empty string when changing the formula. The cached value is now stale.
  • Do not add t="s" or any type attribute to formula cells. The t attribute is absent or uses a result-type value, not a formula marker.
  • Cross-sheet references use SheetName!CellRef. If the sheet name contains spaces, wrap in single quotes: 'Q1 Data'!B5.
  • The <f> text must not include the leading =.

Before (converting a hardcoded value to a live formula):

<c r="D15">
  <v>95000</v>
</c>

After:

<c r="D15">
  <f>SUM(D2:D14)</f>
  <v></v>
</c>

4.4 Adding a New Data Row

Append after the last <row> element inside <sheetData>. Row numbers in OOXML are 1-based and must be sequential.

Before (last row is row 10):

  <row r="10">
    <c r="A10" t="s"><v>3</v></c>
    <c r="B10"><v>2023</v></c>
    <c r="C10"><v>88000</v></c>
    <c r="D10"><f>C10*1.1</f><v></v></c>
  </row>
</sheetData>

After (new row 11 appended):

  <row r="10">
    <c r="A10" t="s"><v>3</v></c>
    <c r="B10"><v>2023</v></c>
    <c r="C10"><v>88000</v></c>
    <c r="D10"><f>C10*1.1</f><v></v></c>
  </row>
  <row r="11">
    <c r="A11" t="s"><v>4</v></c>
    <c r="B11"><v>2024</v></c>
    <c r="C11"><v>96000</v></c>
    <c r="D11"><f>C11*1.1</f><v></v></c>
  </row>
</sheetData>

Rules:

  • Every <c> inside the row must have r set to the correct cell address (e.g., A11).
  • Text cells need t="s" and a sharedStrings index in <v>. Numeric cells omit t.
  • Formula cells use <f> and an empty <v>.
  • Copy the s attribute from the row above if you want matching styles. Do not invent a style index that does not exist in styles.xml.
  • If the sheet contains a <dimension> element (e.g., <dimension ref="A1:D10"/>), update it to include the new row: <dimension ref="A1:D11"/>.
  • If the sheet contains a <tableparts> referencing a table, update the table's ref attribute in the corresponding xl/tables/tableN.xml file.

4.5 Adding a New Column

Append new <c> elements to each existing <row> and, if present, update the <cols> section.

Before (rows have columns AC):

<cols>
  <col min="1" max="3" width="14" customWidth="1"/>
</cols>
<sheetData>
  <row r="1">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
    <c r="C1" t="s"><v>2</v></c>
  </row>
  <row r="2">
    <c r="A2"><v>100</v></c>
    <c r="B2"><v>200</v></c>
    <c r="C2"><v>300</v></c>
  </row>
</sheetData>

After (adding column D):

<cols>
  <col min="1" max="3" width="14" customWidth="1"/>
  <col min="4" max="4" width="14" customWidth="1"/>
</cols>
<sheetData>
  <row r="1">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
    <c r="C1" t="s"><v>2</v></c>
    <c r="D1" t="s"><v>5</v></c>
  </row>
  <row r="2">
    <c r="A2"><v>100</v></c>
    <c r="B2"><v>200</v></c>
    <c r="C2"><v>300</v></c>
    <c r="D2"><f>A2+B2+C2</f><v></v></c>
  </row>
</sheetData>

Rules:

  • Adding a column at the end (after the last existing column) is safe — no existing formula references shift.
  • Inserting a column in the middle shifts all columns to the right, which requires the same cascade updates as row insertion (see Section 5).
  • Update the <dimension> element if present.

4.6 Modifying or Adding Styles

Styles use a multi-level indirect reference chain. Read ooxml-cheatsheet.md for the full chain. The key rule: only append new entries, never modify existing ones.

Scenario: Add a blue-font style (for hardcoded input cells) that doesn't yet exist.

Step 1 — Check if a matching font already exists in xl/styles.xml:

<!-- Look inside <fonts> for an existing blue font -->
<font>
  <color rgb="000000FF"/>
  <!-- other attributes -->
</font>

If found, note its index (zero-based position in the <fonts> list). If not found, append.

Step 2 — Append the new font if needed:

Before:

<fonts count="3">
  <font>...</font>   <!-- index 0 -->
  <font>...</font>   <!-- index 1 -->
  <font>...</font>   <!-- index 2 -->
</fonts>

After:

<fonts count="4">
  <font>...</font>   <!-- index 0 -->
  <font>...</font>   <!-- index 1 -->
  <font>...</font>   <!-- index 2 -->
  <font>
    <b/>
    <sz val="11"/>
    <color rgb="000000FF"/>
    <name val="Calibri"/>
  </font>             <!-- index 3 (new) -->
</fonts>

Step 3 — Append a new <xf> in <cellXfs>:

Before:

<cellXfs count="5">
  <xf .../>   <!-- index 0 -->
  <xf .../>   <!-- index 1 -->
  <xf .../>   <!-- index 2 -->
  <xf .../>   <!-- index 3 -->
  <xf .../>   <!-- index 4 -->
</cellXfs>

After:

<cellXfs count="6">
  <xf .../>   <!-- index 0 -->
  <xf .../>   <!-- index 1 -->
  <xf .../>   <!-- index 2 -->
  <xf .../>   <!-- index 3 -->
  <xf .../>   <!-- index 4 -->
  <xf numFmtId="0" fontId="3" fillId="0" borderId="0" xfId="0"
      applyFont="1"/>   <!-- index 5 (new) -->
</cellXfs>

Step 4 — Apply to target cells:

Before:

<c r="B3">
  <v>0.08</v>
</c>

After:

<c r="B3" s="5">
  <v>0.08</v>
</c>

Rules:

  • Never delete or reorder existing entries in <fonts>, <fills>, <borders>, <cellXfs>.
  • Always update the count attribute when appending.
  • The new cellXfs index = the old count value before appending (zero-based: if count was 5, new index is 5).
  • Custom numFmt IDs must be 164 or above. IDs 0163 are built-in and must not be re-declared.
  • If the desired style already exists elsewhere in the file (on a similar cell), reuse its s index rather than creating a duplicate.

4.7 Renaming a Sheet

Only xl/workbook.xml needs to change — unless cross-sheet formulas reference the old name.

Before (xl/workbook.xml):

<sheet name="Sheet1" sheetId="1" r:id="rId1"/>

After:

<sheet name="Revenue" sheetId="1" r:id="rId1"/>

If any formula in any worksheet references the old name, update those too:

Before (xl/worksheets/sheet2.xml):

<c r="B5"><f>Sheet1!C10</f><v></v></c>

After:

<c r="B5"><f>Revenue!C10</f><v></v></c>

If the new name contains spaces:

<c r="B5"><f>'Q1 Revenue'!C10</f><v></v></c>

Scan all worksheet XML files for the old name:

grep -r "Sheet1!" /tmp/xlsx_work/xl/worksheets/

Rules:

  • The .rels file and [Content_Types].xml do NOT need to change — they reference the XML file path, not the sheet name.
  • sheetId must not change; it is a stable internal identifier.
  • Sheet names are case-sensitive in formula references.

5. High-Risk Operations — Cascade Effects

5.1 Inserting a Row in the Middle

Inserting a row at position N shifts all rows from N downward. Every reference to those rows in every XML file must be updated.

Files to check and update:

XML region What to update Example shift
Worksheet <row r="..."> attributes Increment row number for all rows >= N r="7"r="8"
All <c r="..."> within those rows Increment row number in cell address r="A7"r="A8"
All <f> formula text in any sheet Shift absolute row references >= N B7B8
<mergeCell ref="..."> Shift start and end rows A7:C7A8:C8
<conditionalFormatting sqref="..."> Shift range A5:D20A5:D21
<dataValidations sqref="..."> Shift range B6:B50B7:B51
xl/charts/chartN.xml data source ranges Shift series ranges Sheet1!$B$5:$B$20Sheet1!$B$6:$B$21
xl/pivotTables/*.xml source ranges Shift source data range Handle with extreme care — see Section 7
<dimension ref="..."> Expand to include new extent A1:D20A1:D21
xl/tables/tableN.xml ref attribute Expand table boundary A1:D20A1:D21

Do not attempt row insertion manually in large or formula-heavy files. Use the dedicated shift script instead:

# Insert 1 row at row 5: all rows 5 and below shift down by 1
python3 SKILL_DIR/scripts/xlsx_shift_rows.py /tmp/xlsx_work/ insert 5 1

# Delete 1 row at row 8: all rows 9 and above shift up by 1
python3 SKILL_DIR/scripts/xlsx_shift_rows.py /tmp/xlsx_work/ delete 8 1

The script updates in one pass: <row r="..."> attributes, <c r="..."> cell addresses, all <f> formula text across every worksheet, <mergeCell> ranges, <conditionalFormatting sqref="...">, <dataValidation sqref="...">, <dimension ref="...">, table ref attributes in xl/tables/, chart series ranges in xl/charts/, and pivot cache source ranges in xl/pivotCaches/.

After running the shift script, always repack and validate:

python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
python3 SKILL_DIR/scripts/formula_check.py output.xlsx

What the script does NOT update (review manually):

  • Named ranges in xl/workbook.xml <definedNames> — check and update if they reference shifted rows.
  • Structured table references (Table[@Column]) inside formulas.
  • External workbook links in xl/externalLinks/.

5.2 Inserting a Column in the Middle

Same cascade logic as row insertion, but for columns. Column references in formulas (B, $C, etc.) and in merged cell ranges, conditional formatting ranges, and chart data sources all need updating.

Column letter shifting is harder to automate safely. Prefer appending columns at the end whenever possible.

5.3 Deleting a Row or Column

Deletion is more dangerous than insertion because any formula that referenced a deleted row or column will become #REF!. Before deleting:

  1. Search all <f> elements for references to the deleted range.
  2. If any formula references a cell in the deleted row/column, do not delete — instead, either clear the row's data or consult the user.
  3. After deletion, shift all references to rows/columns beyond the deletion point downward/leftward.

6. Template Filling — Identifying and Populating Input Cells

Templates designate certain cells as input zones. Common patterns to recognize them:

6.1 How Templates Signal Input Zones

Signal XML manifestation What to look for
Blue font color s attribute pointing to a cellXfs entry with fontId<color rgb="000000FF"/> Check styles.xml to decode s values
Yellow fill (highlight) sfillId<fill><patternFill><fgColor rgb="00FFFF00"/>
Empty <v> element <c r="B5"><v></v></c> or cell entirely absent from <row> The cell has no value yet
Comment/annotation near cell xl/comments1.xml with ref="B5" Comments often label input fields
Named ranges xl/workbook.xml <definedName> elements Template may define InputRevenue etc.

6.2 Filling a Template Cell

Do not change s attributes. Do not change t attributes unless you must change from empty to typed. Only change <v> or add <f>.

Before (empty input cell with style preserved):

<c r="C5" s="3">
  <v></v>
</c>

After (filled with a number, style unchanged):

<c r="C5" s="3">
  <v>125000</v>
</c>

After (filled with text — requires shared string entry first):

<!-- 1. Append to sharedStrings.xml: <si><t>North Region</t></si> at index 7 -->
<c r="C5" t="s" s="3">
  <v>7</v>
</c>

After (filled with a formula, preserving style):

<c r="C5" s="3">
  <f>Assumptions!D12</f>
  <v></v>
</c>

6.3 Locating Input Zones Without Opening the File in Excel

After unpacking, decode the style index on suspected input cells to determine if they have the template's input color:

  1. Note the s value on the cell (e.g., s="4").
  2. In xl/styles.xml, find <cellXfs> and look at the 5th entry (index 4).
  3. Note its fontId (e.g., fontId="2").
  4. In <fonts>, look at the 3rd entry (index 2) and check for <color rgb="000000FF"/> (blue) or other input marker.

If the template uses named ranges as input fields, read them from xl/workbook.xml:

<definedNames>
  <definedName name="InputGrowthRate">Assumptions!$B$5</definedName>
  <definedName name="InputDiscountRate">Assumptions!$B$6</definedName>
</definedNames>

Fill the target cells (Assumptions!B5, Assumptions!B6) directly.

6.4 Template Filling Rules

  • Fill only cells the template designated as inputs. Do not fill cells that are formula-driven.
  • Do not apply new styles when filling. The template's formatting is the deliverable.
  • Do not add or remove rows inside the template's data area unless the template explicitly has an "append here" zone.
  • After filling, verify that no formula errors were introduced: some templates have input-validation formulas that produce #VALUE! if the wrong data type is entered.

7. Files You Must Never Modify

7.1 Absolute no-touch list

File / location Why
xl/vbaProject.bin Binary VBA bytecode. Any byte modification corrupts the macro project. Editing even one bit makes the macros fail to load.
xl/pivotCaches/pivotCacheDefinition*.xml The cache definition ties the pivot table to its source data. Editing it without also updating the corresponding pivotTable*.xml will corrupt the pivot.
xl/pivotTables/*.xml Pivot table XML is tightly coupled with the cache definition and with internal state Excel rebuilds on load. Do not edit. If you shifted rows and the pivot's source range now points to wrong data, update only the <cacheSource> range in the cache definition, and only the ref attribute in the pivot table — no other changes.
xl/slicers/*.xml Slicers are connected to specific cache IDs and pivot fields. Breaking these connections silently corrupts the file.
xl/connections.xml External data connections. Editing breaks live data refresh.
xl/externalLinks/ External workbook links. The binary .bin files in here must not be modified.

7.2 Conditionally safe files (update only specific attributes)

File What you may update What to leave alone
xl/charts/chartN.xml Data series range references (<numRef><f>) after a row/column shift Chart type, formatting, layout
xl/tables/tableN.xml ref attribute on <table> after adding rows Column definitions, style info
xl/pivotCaches/pivotCacheDefinition*.xml ref attribute on <cacheSource><worksheetSource> after shifting source data All other content

8. Validation After Every Edit

Never skip validation. Even a one-character change in a formula can cause cascading errors.

# Pack
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx

# Static formula validation (always run)
python3 SKILL_DIR/scripts/formula_check.py output.xlsx

# Dynamic validation (if LibreOffice available)
python3 SKILL_DIR/scripts/libreoffice_recalc.py output.xlsx /tmp/recalc.xlsx
python3 SKILL_DIR/scripts/formula_check.py /tmp/recalc.xlsx

If formula_check.py reports any error:

  1. Unpack the output file again (it is the packed version).
  2. Locate the reported cell in the worksheet XML.
  3. Fix the <f> element.
  4. Repack and re-validate.

Do not deliver the file until formula_check.py reports zero errors.


9. Absolute Rules Summary

Rule Rationale
Never use openpyxl load_workbook + save on an existing file Round-trip destroys pivot tables, VBA, sparklines, slicers
Never delete or reorder existing <si> entries in sharedStrings Breaks every cell referencing that index
Never delete or reorder existing <xf> entries in <cellXfs> Breaks every cell using that style index
Never modify vbaProject.bin Binary file; any change corrupts VBA
Never change sheetId when renaming a sheet Internal ID is stable; changing it breaks relationships
Never skip post-edit validation Leaves broken references undetected
Never edit more XML nodes than required Extra changes risk introducing subtle corruption
Clear <v> to empty string when changing a formula Prevents stale cached value from misleading downstream consumers
Append-only to sharedStrings Existing indexes must remain valid
Append-only to styles collections Existing style indexes must remain valid