Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
691
minimax-xlsx/references/create.md
Normal file
@@ -0,0 +1,691 @@
# Build New xlsx from Scratch

Create new, production-quality xlsx files using the XML approach. NEVER use openpyxl
for writing. NEVER hardcode Python-computed values: every derived number must be a
live Excel formula.

---

## When to Use This Path

Use this document when the user wants:
- A brand-new Excel file that does not yet exist
- A generated report, financial model, or data table
- Any "create / build / generate / make" request

If the user provides an existing file to modify, switch to `edit.md` instead.

---

## The Non-Negotiable Rules

Before touching any file, internalize these four rules:

1. **Formula-First**: Every calculated value (`SUM`, growth rate, ratio, subtotal, etc.)
   MUST be written as `<f>SUM(B2:B9)</f>`, not as a hardcoded `<v>5000</v>`. Hardcoded
   numbers go stale when source data changes. Only raw inputs and assumption parameters
   may be hardcoded values.

2. **No openpyxl for writing**: The entire file is built by editing XML directly. Python
   is only allowed for reading/analysis (`pandas.read_excel()`) and for running helper
   scripts (`xlsx_pack.py`, `formula_check.py`).

3. **Style encodes meaning**: Blue font = user input/assumption. Black font = formula
   result. Green font = cross-sheet reference. See `format.md` for the full color system
   and style index table.

4. **Validate before delivery**: Run `formula_check.py` and fix all errors before
   handing the file to the user.

---

## Complete Creation Workflow

### Step 1: Plan Before Writing

Define the full structure on paper before touching any XML:

- **Sheets**: names, order, purpose (e.g., Assumptions / Model / Summary)
- **Layout per sheet**: which rows are headers, inputs, formulas, totals
- **String inventory**: collect all text labels you will need in sharedStrings
- **Style choices**: what number format each column needs (currency, %, integer, year)
- **Cross-sheet links**: which sheets pull data from other sheets

This planning step prevents the costly cycle of adding strings to sharedStrings
mid-way and recomputing all indices.

---

### Step 2: Copy Minimal Template

```bash
cp -r SKILL_DIR/templates/minimal_xlsx/ /tmp/xlsx_work/
```

The template gives you a complete, valid 7-file xlsx skeleton:

```
/tmp/xlsx_work/
├── [Content_Types].xml          ← MIME type registry
├── _rels/
│   └── .rels                    ← root relationship (points to workbook.xml)
└── xl/
    ├── workbook.xml             ← sheet list and calc settings
    ├── styles.xml               ← 13 pre-built financial style slots
    ├── sharedStrings.xml        ← text string table (starts empty)
    ├── _rels/
    │   └── workbook.xml.rels    ← maps rId → file paths
    └── worksheets/
        └── sheet1.xml           ← one empty sheet
```

After copying, rename sheets and add content. Do not create files from scratch;
always start from the template.

---

### Step 3: Configure Sheet Structure

#### Single-Sheet Workbook

The template already has one sheet named "Sheet1". Just change the `name` attribute
in `xl/workbook.xml`:

```xml
<sheets>
  <sheet name="Revenue Model" sheetId="1" r:id="rId1"/>
</sheets>
```

No other files need to change for a single-sheet workbook.

#### Multi-Sheet Workbook

Four files must be kept in sync. Work through them in this order:

**IMPORTANT (rId collision rule)**: In the template's `workbook.xml.rels`, the IDs
`rId1`, `rId2`, and `rId3` are already taken:
- `rId1` → `worksheets/sheet1.xml`
- `rId2` → `styles.xml`
- `rId3` → `sharedStrings.xml`

New worksheet entries MUST start at `rId4` and count upward.

**File 1 of 4: `xl/workbook.xml`** (sheet list):

```xml
<sheets>
  <sheet name="Assumptions" sheetId="1" r:id="rId1"/>
  <sheet name="Model" sheetId="2" r:id="rId4"/>
  <sheet name="Summary" sheetId="3" r:id="rId5"/>
</sheets>
```

Special characters in sheet names:
- `&` → `&amp;` in XML: `<sheet name="P&amp;L" .../>`
- Max 31 characters
- Forbidden: `/ \ ? * [ ] :`
- Sheet names with spaces need single quotes in formula references: `'Q1 Data'!B5`
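
The sheet-name rules above can be checked mechanically before any XML is written. The following is a minimal sketch using only the Python standard library; the function names `validate_sheet_name`, `sheet_element`, and `formula_ref` are hypothetical helpers for illustration and are not part of the skill's scripts. The quoting logic is deliberately conservative: it also quotes names that look like cell addresses (such as `Q1`), on the assumption that a quoted name is always accepted.

```python
import re
from xml.sax.saxutils import quoteattr

FORBIDDEN = set('/\\?*[]:')  # characters the rules above disallow in sheet names


def validate_sheet_name(name: str) -> list[str]:
    """Return a list of rule violations (an empty list means the name is acceptable)."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > 31:
        problems.append(f"name has {len(name)} characters (max 31)")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    return problems


def sheet_element(name: str, sheet_id: int, rid: str) -> str:
    """Build a <sheet> element; quoteattr escapes & and < in the name."""
    return f'<sheet name={quoteattr(name)} sheetId="{sheet_id}" r:id="{rid}"/>'


def formula_ref(sheet: str, cell: str) -> str:
    """Build a cross-sheet reference, quoting the sheet name unless it is a plain identifier."""
    plain = re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", sheet)
    looks_like_cell = re.fullmatch(r"[A-Za-z]{1,3}[0-9]+", sheet)
    if plain and not looks_like_cell:
        return f"{sheet}!{cell}"
    # Inside a quoted sheet name, an apostrophe is written twice.
    return "'" + sheet.replace("'", "''") + "'!" + cell


print(sheet_element("P&L", 2, "rId4"))
print(formula_ref("Q1 Data", "B5"))
```

The reference returned by `formula_ref` is plain formula text; it still needs XML escaping when it is placed inside `<f>`.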

**File 2 of 4: `xl/_rels/workbook.xml.rels`** (ID → file mapping):

```xml
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
    Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
    Target="styles.xml"/>
  <Relationship Id="rId3"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings"
    Target="sharedStrings.xml"/>
  <Relationship Id="rId4"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
    Target="worksheets/sheet2.xml"/>
  <Relationship Id="rId5"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
    Target="worksheets/sheet3.xml"/>
</Relationships>
```

**File 3 of 4: `[Content_Types].xml`** (MIME type declarations):

```xml
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/xl/workbook.xml"
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
  <Override PartName="/xl/worksheets/sheet1.xml"
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/worksheets/sheet2.xml"
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/worksheets/sheet3.xml"
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/styles.xml"
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
  <Override PartName="/xl/sharedStrings.xml"
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
</Types>
```

**File 4 of 4: Create new worksheet XML files**

Copy `sheet1.xml` to `sheet2.xml` and `sheet3.xml`, then clear the `<sheetData>` content:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet
  xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheetViews>
    <sheetView workbookViewId="0"/>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"
    xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"/>
  <sheetData>
    <!-- Data rows go here -->
  </sheetData>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>
```

**Sync checklist**: every time you add a sheet, verify all four are consistent:

| Check | What to verify |
|-------|---------------|
| `workbook.xml` | New `<sheet name="..." sheetId="N" r:id="rIdX"/>` exists |
| `workbook.xml.rels` | New `<Relationship Id="rIdX" ... Target="worksheets/sheetN.xml"/>` exists |
| `[Content_Types].xml` | New `<Override PartName="/xl/worksheets/sheetN.xml" .../>` exists |
| Filesystem | `xl/worksheets/sheetN.xml` file actually exists |
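
The four-way sync checklist lends itself to an automated read-only check. The sketch below is one possible way to do it with the standard library; `check_sheet_sync` is a hypothetical helper, not one of the skill's scripts, and it assumes relationship targets are written relative to `xl/` as in the template. It only reads the unpacked directory and never writes to it.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
NS_CT = "http://schemas.openxmlformats.org/package/2006/content-types"


def check_sheet_sync(root: Path) -> list[str]:
    """Return sync problems between the four places that describe worksheets."""
    problems = []
    wb = ET.parse(root / "xl" / "workbook.xml").getroot()
    rels = ET.parse(root / "xl" / "_rels" / "workbook.xml.rels").getroot()
    types = ET.parse(root / "[Content_Types].xml").getroot()

    # rId -> Target, and the set of declared part names
    targets = {r.get("Id"): r.get("Target")
               for r in rels.iter(f"{{{NS_PKG_REL}}}Relationship")}
    overrides = {o.get("PartName") for o in types.iter(f"{{{NS_CT}}}Override")}

    for sheet in wb.iter(f"{{{NS_MAIN}}}sheet"):
        name = sheet.get("name")
        rid = sheet.get(f"{{{NS_DOC_REL}}}id")
        target = targets.get(rid)
        if target is None:
            problems.append(f"{name}: {rid} missing from workbook.xml.rels")
            continue
        if not target.startswith("worksheets/"):
            # Typical symptom of the rId collision described above
            problems.append(f"{name}: {rid} points at {target}, not a worksheet")
            continue
        if f"/xl/{target}" not in overrides:
            problems.append(f"{name}: no <Override> for /xl/{target}")
        if not (root / "xl" / target).is_file():
            problems.append(f"{name}: file xl/{target} does not exist")
    return problems
```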

---

### Step 4: Populate sharedStrings

All text values (headers, row labels, category names, any string the user will read)
must be stored in `xl/sharedStrings.xml`. Cells reference them by 0-based index.

**Recommended workflow**: collect ALL text you need first, write the complete table once,
then fill in indices while writing worksheet XML. This avoids re-counting indices mid-way.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
     count="10" uniqueCount="10">
  <si><t>Item</t></si>                <!-- index 0 -->
  <si><t>FY2023A</t></si>             <!-- index 1 -->
  <si><t>FY2024E</t></si>             <!-- index 2 -->
  <si><t>FY2025E</t></si>             <!-- index 3 -->
  <si><t>YoY Growth</t></si>          <!-- index 4 -->
  <si><t>Revenue</t></si>             <!-- index 5 -->
  <si><t>Cost of Goods Sold</t></si>  <!-- index 6 -->
  <si><t>Gross Profit</t></si>        <!-- index 7 -->
  <si><t>EBITDA</t></si>              <!-- index 8 -->
  <si><t>Net Income</t></si>          <!-- index 9 -->
</sst>
```

**Attribute rules**:
- `uniqueCount` = number of `<si>` elements (unique strings in the table)
- `count` = total number of cell references to strings across the entire workbook
  (if "Revenue" is referenced by 3 cells and every other string by one cell, `count` is `uniqueCount + 2`)
- For new files where each string appears once, `count == uniqueCount`
- Both attributes MUST be accurate: wrong values trigger warnings in some Excel versions
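
To make the attribute rules concrete, the sketch below derives the two values from the XML itself: `uniqueCount` from the `<si>` elements and `count` from the `t="s"` cells across all worksheets. `expected_sst_counts` is a hypothetical name introduced for illustration, and the function assumes it is handed the text of `sharedStrings.xml` plus the text of every worksheet part.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def expected_sst_counts(shared_strings_xml: str, sheet_xmls: list[str]) -> tuple[int, int]:
    """Return (count, uniqueCount) implied by the actual XML content."""
    sst = ET.fromstring(shared_strings_xml)
    unique_count = len(sst.findall("m:si", NS))  # one <si> per unique string
    count = 0
    for xml_text in sheet_xmls:
        sheet = ET.fromstring(xml_text)
        # Every cell with t="s" is one reference into the shared string table
        count += len(sheet.findall(".//m:c[@t='s']", NS))
    return count, unique_count
```

Comparing the returned pair with the `count` and `uniqueCount` attributes written in `<sst>` shows immediately whether either one has drifted.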

**Special character escaping**:

```xml
<si><t>R&amp;D Expenses</t></si>                 <!-- & must be written as &amp; -->
<si><t>Revenue &lt; Target</t></si>              <!-- < must be written as &lt; -->
<si><t xml:space="preserve"> (note) </t></si>    <!-- preserve leading/trailing spaces -->
```
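
The escaping rules can be applied uniformly when the table is generated from a list. The following is a rough sketch of that idea, not the implementation of `shared_strings_builder.py` (whose source is not shown here); `build_shared_strings` is a hypothetical name, and the sketch assumes every string is unique and referenced by exactly one cell, so that `count` equals `uniqueCount`.

```python
from xml.sax.saxutils import escape


def build_shared_strings(strings: list[str]) -> str:
    """Render sharedStrings.xml text where every string is referenced exactly once."""
    items = []
    for text in strings:
        # Leading/trailing whitespace needs xml:space="preserve" to survive
        attr = ' xml:space="preserve"' if text != text.strip() else ""
        # escape() turns & into &amp;, < into &lt; and > into &gt;
        items.append(f"  <si><t{attr}>{escape(text)}</t></si>")
    n = len(strings)
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
        f'count="{n}" uniqueCount="{n}">\n' + "\n".join(items) + "\n</sst>\n"
    )


print(build_shared_strings(["R&D Expenses", "Revenue < Target", " (note) "]))
```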

**Helper script**: use `shared_strings_builder.py` to generate the complete
`sharedStrings.xml` from a plain list of strings:

```bash
python3 SKILL_DIR/scripts/shared_strings_builder.py \
  "Item" "FY2024" "FY2025" "Revenue" "Gross Profit" \
  > /tmp/xlsx_work/xl/sharedStrings.xml
```

Or from a file listing one string per line:

```bash
python3 SKILL_DIR/scripts/shared_strings_builder.py --file strings.txt \
  > /tmp/xlsx_work/xl/sharedStrings.xml
```

---

### Step 5: Write Worksheet Data

Edit each `xl/worksheets/sheetN.xml`. Replace the empty `<sheetData>` with rows
and cells.

#### Cell XML Anatomy

```
<c r="B5" t="s" s="4">
   ↑      ↑     ↑
   │      │     └─ style index (from cellXfs in styles.xml)
   │      └─ type
   └─ address

  <v>3</v>
     ↑
     value (for t="s": sharedStrings index; for numbers: the number itself)
```

#### Data Type Reference

| Data | `t` attr | XML Example | Notes |
|------|---------|-------------|-------|
| Shared string (text) | `s` | `<c r="A1" t="s" s="4"><v>0</v></c>` | `<v>` = sharedStrings index |
| Number | omit | `<c r="B2" s="5"><v>1000000</v></c>` | default type, `t` omitted |
| Percentage (as decimal) | omit | `<c r="C2" s="7"><v>0.125</v></c>` | 12.5% stored as 0.125 |
| Boolean | `b` | `<c r="D1" t="b"><v>1</v></c>` | 1=TRUE, 0=FALSE |
| Formula | omit | `<c r="B4" s="2"><f>SUM(B2:B3)</f><v></v></c>` | `<v>` left empty |
| Cross-sheet formula | omit | `<c r="C1" s="3"><f>Assumptions!B2</f><v></v></c>` | use s=3 (green) |
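
When many cells are written, the table above can be turned into a small text-generating helper so that the `t` attribute and the `<f>`/`<v>` layout stay consistent. This is only a sketch under stated assumptions: `cell_xml` is a hypothetical helper, it emits XML text for pasting into `<sheetData>` (it does not write xlsx files), and it always emits an `s` attribute even for booleans.

```python
from xml.sax.saxutils import escape


def cell_xml(ref: str, style: int, *, value=None, formula=None, string_index=None) -> str:
    """Render one <c> element following the data type table above."""
    if formula is not None:
        # Formula text has no leading '=' and must be XML-escaped (&, <, >)
        return f'<c r="{ref}" s="{style}"><f>{escape(formula.lstrip("="))}</f><v></v></c>'
    if string_index is not None:
        return f'<c r="{ref}" t="s" s="{style}"><v>{string_index}</v></c>'
    if isinstance(value, bool):  # checked before int because bool is an int subclass
        return f'<c r="{ref}" t="b" s="{style}"><v>{int(value)}</v></c>'
    if isinstance(value, (int, float)):
        return f'<c r="{ref}" s="{style}"><v>{value}</v></c>'
    raise TypeError("provide formula=, string_index=, or a bool/int/float value")


print(cell_xml("A1", 4, string_index=0))
print(cell_xml("B4", 2, formula="=SUM(B2:B3)"))
```

Keeping `formula=` as the first branch reflects the Formula-First rule: a derived number is passed as formula text, never as a precomputed `value=`.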

#### A Full Sheet Data Example

```xml
<cols>
  <col min="1" max="1" width="26" customWidth="1"/>  <!-- A: label column -->
  <col min="2" max="5" width="14" customWidth="1"/>  <!-- B-E: data columns -->
</cols>
<sheetData>

  <!-- Row 1: headers (style 4 = bold header) -->
  <row r="1" ht="18" customHeight="1">
    <c r="A1" t="s" s="4"><v>0</v></c>  <!-- "Item" -->
    <c r="B1" t="s" s="4"><v>1</v></c>  <!-- "FY2023A" -->
    <c r="C1" t="s" s="4"><v>2</v></c>  <!-- "FY2024E" -->
    <c r="D1" t="s" s="4"><v>3</v></c>  <!-- "FY2025E" -->
    <c r="E1" t="s" s="4"><v>4</v></c>  <!-- "YoY Growth" -->
  </row>

  <!-- Row 2: Revenue, actual value (input) + formula (computed) -->
  <row r="2">
    <c r="A2" t="s" s="1"><v>5</v></c>  <!-- "Revenue", blue input label -->
    <c r="B2" s="5"><v>85000000</v></c>  <!-- FY2023A actual: $85M, currency input -->
    <c r="C2" s="6"><f>B2*(1+Assumptions!C3)</f><v></v></c>  <!-- formula, currency -->
    <c r="D2" s="6"><f>C2*(1+Assumptions!D3)</f><v></v></c>
    <c r="E2" s="8"><f>D2/C2-1</f><v></v></c>  <!-- YoY growth, percentage formula -->
  </row>

  <!-- Row 3: Gross Profit -->
  <row r="3">
    <c r="A3" t="s" s="2"><v>7</v></c>  <!-- "Gross Profit", black formula label -->
    <c r="B3" s="6"><f>B2*Assumptions!B4</f><v></v></c>
    <c r="C3" s="6"><f>C2*Assumptions!C4</f><v></v></c>
    <c r="D3" s="6"><f>D2*Assumptions!D4</f><v></v></c>
    <c r="E3" s="8"><f>D3/C3-1</f><v></v></c>
  </row>

  <!-- Row 5: SUM total row -->
  <row r="5">
    <c r="A5" t="s" s="4"><v>8</v></c>  <!-- "EBITDA" -->
    <c r="B5" s="6"><f>SUM(B2:B4)</f><v></v></c>
    <c r="C5" s="6"><f>SUM(C2:C4)</f><v></v></c>
    <c r="D5" s="6"><f>SUM(D2:D4)</f><v></v></c>
    <c r="E5" s="8"><f>D5/C5-1</f><v></v></c>
  </row>

</sheetData>
```

#### Column Width and Freeze Pane

Column widths go **before** `<sheetData>`, freeze pane goes inside `<sheetView>`:

```xml
<!-- Inside <sheetViews><sheetView ...>: freeze the header row -->
<pane ySplit="1" topLeftCell="A2" activePane="bottomLeft" state="frozen"/>

<!-- Before <sheetData>: set column widths -->
<cols>
  <col min="1" max="1" width="28" customWidth="1"/>
  <col min="2" max="8" width="14" customWidth="1"/>
</cols>
```

---

### Step 6: Apply Styles

The template's `xl/styles.xml` has 13 pre-built semantic style slots (indices 0–12).
**Read `format.md` for the complete style index table, color system, and how to add new styles.**

Quick reference for the most common slots:

| `s` | Role | Example |
|-----|------|---------|
| 4 | Header (bold) | Column/row titles |
| 5 / 6 | Currency input (blue) / formula (black) | `$#,##0` |
| 7 / 8 | Percentage input / formula | `0.0%` |
| 11 | Year (no comma) | 2024 not 2,024 |

Design principle: Blue = human sets this. Black = Excel computes this. Green = cross-sheet.

If you need a style not in the 13 pre-built slots, follow the append-only procedure in `format.md` section 3.2.

---

### Step 7: Formula Cookbook

#### XML Formula Syntax Reminder

Formulas in XML have **no leading `=`**:

```xml
<!-- Excel UI: =SUM(B2:B9) → XML: -->
<c r="B10" s="6"><f>SUM(B2:B9)</f><v></v></c>
```

#### Basic Aggregations

```xml
<c r="B10" s="6"><f>SUM(B2:B9)</f><v></v></c>
<c r="B11" s="6"><f>AVERAGE(B2:B9)</f><v></v></c>
<c r="B12" s="10"><f>COUNT(B2:B9)</f><v></v></c>
<c r="B13" s="10"><f>COUNTA(A2:A100)</f><v></v></c>
<c r="B14" s="6"><f>MAX(B2:B9)</f><v></v></c>
<c r="B15" s="6"><f>MIN(B2:B9)</f><v></v></c>
```

#### Financial Calculations

```xml
<!-- YoY growth rate: current / prior - 1 -->
<c r="E5" s="8"><f>D5/C5-1</f><v></v></c>

<!-- Gross profit: revenue × gross margin -->
<c r="B6" s="6"><f>B4*B3</f><v></v></c>

<!-- EBITDA margin: EBITDA / Revenue -->
<c r="B9" s="8"><f>B8/B4</f><v></v></c>

<!-- Suppress #DIV/0! when denominator may be zero -->
<c r="E5" s="8"><f>IF(C5=0,0,D5/C5-1)</f><v></v></c>

<!-- NPV and IRR (cash flows in B2:B7, discount rate in B1) -->
<c r="C1" s="6"><f>NPV(B1,B3:B7)+B2</f><v></v></c>
<c r="C2" s="8"><f>IRR(B2:B7)</f><v></v></c>
```

#### Cross-Sheet References

```xml
<!-- No spaces in name: no quotes needed -->
<c r="B3" s="3"><f>Assumptions!B5</f><v></v></c>

<!-- Space in sheet name: single quotes required -->
<c r="B3" s="3"><f>'Q1 Data'!B5</f><v></v></c>

<!-- Ampersand in sheet name: XML-escape it as &amp; in workbook.xml and in the formula text alike -->
<c r="B3" s="3"><f>'R&amp;D'!B5</f><v></v></c>

<!-- Cross-sheet range: SUM of a range in another sheet -->
<c r="B10" s="6"><f>SUM(Data!C2:C1000)</f><v></v></c>

<!-- 3D reference: sum same cell across multiple sheets -->
<c r="B5" s="6"><f>SUM(Jan:Dec!B5)</f><v></v></c>
```

Cross-sheet formula cells should use `s="3"` (green) to signal the data origin.

#### Shared Formulas (Same Pattern Repeated Down a Column)

When many consecutive cells share the same formula structure with only the row number
changing, use shared formulas to keep the XML compact:

```xml
<!-- D2: defines the shared group (si="0", ref="D2:D11") -->
<c r="D2" s="8"><f t="shared" ref="D2:D11" si="0">C2/B2-1</f><v></v></c>

<!-- D3 through D11: reference the same group, no formula text needed -->
<c r="D3" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D4" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D5" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D6" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D7" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D8" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D9" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D10" s="8"><f t="shared" si="0"/><v></v></c>
<c r="D11" s="8"><f t="shared" si="0"/><v></v></c>
```

Excel adjusts relative references automatically (D3 computes `C3/B3-1`, etc.).
If you have multiple shared formula groups, assign sequential `si` values (0, 1, 2, …).
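
Writing the follower cells by hand is repetitive, so the pattern can be generated as text. The sketch below is a minimal illustration with a hypothetical helper name, `shared_formula_cells`; it assumes the formula text is already XML-escaped and that each returned `<c>` element is then placed inside the `<row>` whose number matches its address.

```python
def shared_formula_cells(col: str, first_row: int, last_row: int,
                         formula: str, style: int, si: int) -> list[str]:
    """Render a shared-formula group: one master cell plus its followers."""
    ref = f"{col}{first_row}:{col}{last_row}"
    # The master cell carries the formula text and the range covered by the group
    cells = [
        f'<c r="{col}{first_row}" s="{style}">'
        f'<f t="shared" ref="{ref}" si="{si}">{formula}</f><v></v></c>'
    ]
    # Followers only name the group; relative references shift per row
    for row in range(first_row + 1, last_row + 1):
        cells.append(f'<c r="{col}{row}" s="{style}"><f t="shared" si="{si}"/><v></v></c>')
    return cells


for cell in shared_formula_cells("D", 2, 11, "C2/B2-1", style=8, si=0):
    print(cell)
```

Called with the arguments shown, it reproduces the ten cells of the D2:D11 example above.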

#### Absolute References

```xml
<!-- $B$2 locks to that cell when the formula is copied -->
<c r="C5" s="8"><f>B5/$B$2</f><v></v></c>
```

The `$` character needs no XML escaping; write it literally.

#### Lookup Formulas

```xml
<!-- VLOOKUP: exact match (last arg 0) -->
<c r="C5" s="6"><f>VLOOKUP(A5,Assumptions!A:C,2,0)</f><v></v></c>

<!-- INDEX/MATCH: more flexible -->
<c r="C5" s="6"><f>INDEX(B:B,MATCH(A5,A:A,0))</f><v></v></c>

<!-- XLOOKUP (Excel 2021 and Microsoft 365 only; stored in file XML with the _xlfn. prefix) -->
<c r="C5" s="6"><f>_xlfn.XLOOKUP(A5,A:A,B:B)</f><v></v></c>
```

---

### Step 8: Pack and Validate

**Pack**:

```bash
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ /path/to/output.xlsx
```

`xlsx_pack.py` will:
1. Check that `[Content_Types].xml` exists at the root
2. Parse every `.xml` and `.rels` file for well-formedness and abort if any fail
3. Create the ZIP archive with correct compression

**Validate**:

```bash
python3 SKILL_DIR/scripts/formula_check.py /path/to/output.xlsx
```

`formula_check.py` will:
1. Scan every cell for `<c t="e">` entries (cached error values), covering all 7 error types
2. Extract sheet name references from every `<f>` formula
3. Verify each referenced sheet exists in `workbook.xml`

Fix every reported error before delivery. Exit code 0 = safe to deliver.
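
To illustrate what the sheet-reference part of such a check involves, here is a rough sketch of steps 2 and 3. It is not the source of `formula_check.py`, which is not shown in this document; `referenced_sheets` and `missing_sheets` are hypothetical names. The regular expression is a simplification: for a 3D reference such as `Jan:Dec!B5` it only picks up the last sheet name, and it can be fooled by a `!` inside a string literal.

```python
import re

# Matches 'Quoted Name'! or UnquotedName! in formula text
SHEET_REF = re.compile(r"'((?:[^']|'')+)'!|([A-Za-z_][A-Za-z0-9_.]*)!")


def referenced_sheets(formula: str) -> set[str]:
    """Collect sheet names that a formula refers to."""
    names = set()
    for quoted, bare in SHEET_REF.findall(formula):
        # Inside quotes, a doubled apostrophe stands for one apostrophe
        names.add(quoted.replace("''", "'") if quoted else bare)
    return names


def missing_sheets(formulas: list[str], existing: set[str]) -> set[str]:
    """Return referenced sheet names that are absent from the workbook's sheet list."""
    missing = set()
    for formula in formulas:
        missing |= referenced_sheets(formula) - existing
    return missing


print(missing_sheets(["Assumptions!B2", "'Q1 Data'!B5"], {"Assumptions"}))
```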

---

## Pre-Delivery Checklist

Run through this list before handing the file to the user:

- [ ] `formula_check.py` reports 0 errors
- [ ] Every calculated cell has `<f>`, not just `<v>` with a number
- [ ] `sharedStrings.xml` `uniqueCount` matches the actual `<si>` count, and `count` matches the total number of `t="s"` cell references
- [ ] Every cell `s` attribute value is in range `0` to `cellXfs count - 1`
- [ ] Every sheet in `workbook.xml` has a matching entry in `workbook.xml.rels`
- [ ] Every `worksheets/sheetN.xml` file has a matching `<Override>` in `[Content_Types].xml`
- [ ] Year columns use `s="11"` (format `0`, no thousands separator)
- [ ] Cross-sheet reference formulas use `s="3"` (green font)
- [ ] Assumption inputs use `s="1"` or `s="5"` or `s="7"` (blue font)
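
The style-range item on this checklist is easy to get wrong by eye, so a small read-only check can help. The sketch below assumes the text of `styles.xml` and of one worksheet part is passed in; `out_of_range_styles` is a hypothetical helper name. It counts the actual `<xf>` children of `<cellXfs>` rather than trusting the `count` attribute, and it ignores the `<xf>` entries under `<cellStyleXfs>`.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def out_of_range_styles(styles_xml: str, sheet_xml: str) -> list[tuple[str, int]]:
    """Return (cell address, s value) pairs whose style index has no <xf> in cellXfs."""
    styles = ET.fromstring(styles_xml)
    cell_xfs = styles.find("m:cellXfs", NS)
    slots = len(cell_xfs.findall("m:xf", NS))  # valid indices are 0 .. slots - 1
    bad = []
    for cell in ET.fromstring(sheet_xml).iter(f"{{{NS['m']}}}c"):
        s = int(cell.get("s", "0"))  # a missing s attribute means style 0
        if s >= slots:
            bad.append((cell.get("r"), s))
    return bad
```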

---

## Common Mistakes and Fixes

| Mistake | Symptom | Fix |
|---------|---------|-----|
| Formula has leading `=` | Cell shows `=SUM(...)` as text | Remove `=` from `<f>` content |
| sharedStrings `count` not updated | Excel warning or blank cells | Set `uniqueCount` to the number of `<si>` elements and `count` to the total number of `t="s"` cell references |
| Style index out of range | File corruption / Excel repair | Ensure `s` < `cellXfs count`; append new `<xf>` if needed |
| New sheet rId conflicts with styles/sharedStrings rId | Sheet missing or styles lost | New sheets use rId4, rId5, … (rId1-3 are reserved in template) |
| Sheet name has `&` unescaped in XML | XML parse error | Use `&amp;` in `workbook.xml` name attribute |
| Cross-sheet ref to sheet with space, no quotes | `#REF!` error | Wrap sheet name in single quotes: `'Sheet Name'!B5` |
| Cross-sheet ref to non-existent sheet | `#REF!` error | Check `workbook.xml` sheet list vs formula |
| Number stored as text (`t="s"`) | Left-aligned, can't sum | Remove `t` attribute from number cells |
| Year displayed as `2,024` | Readability issue | Use `s="11"` (numFmtId=1, format `0`) |
| Hardcoded Python result instead of formula | "Dead table": won't update | Replace `<v>N</v>` with `<f>formula</f><v></v>` |

---

## Column Letter Reference

| Col # | Letter | Col # | Letter | Col # | Letter |
|-------|--------|-------|--------|-------|--------|
| 1 | A | 26 | Z | 27 | AA |
| 28 | AB | 52 | AZ | 53 | BA |
| 54 | BB | 78 | BZ | 79 | CA |

Python conversion (use when building formulas programmatically):

```python
def col_letter(n: int) -> str:
    """Convert 1-based column number to Excel letter (A, B, ..., Z, AA, AB, ...)."""
    result = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        result = chr(65 + rem) + result
    return result

def col_number(s: str) -> int:
    """Convert Excel column letter to 1-based number."""
    n = 0
    for c in s.upper():
        n = n * 26 + (ord(c) - 64)
    return n
```

---

## Typical Scenario Walkthroughs

### Scenario A: Three-Year Financial Model (Single Sheet)

Layout: rows 1-12 = Assumptions (blue inputs) / rows 14-30 = Model (black formulas).

```xml
<!-- sharedStrings.xml (excerpt) -->
<sst count="8" uniqueCount="8">
  <si><t>Metric</t></si>          <!-- 0 -->
  <si><t>FY2023A</t></si>         <!-- 1 -->
  <si><t>FY2024E</t></si>         <!-- 2 -->
  <si><t>FY2025E</t></si>         <!-- 3 -->
  <si><t>Revenue Growth</t></si>  <!-- 4 -->
  <si><t>Gross Margin</t></si>    <!-- 5 -->
  <si><t>Revenue</t></si>         <!-- 6 -->
  <si><t>Gross Profit</t></si>    <!-- 7 -->
</sst>

<!-- sheet1.xml (excerpt) -->
<sheetData>
  <!-- Header -->
  <row r="1">
    <c r="A1" t="s" s="4"><v>0</v></c>
    <c r="B1" t="s" s="4"><v>1</v></c>
    <c r="C1" t="s" s="4"><v>2</v></c>
    <c r="D1" t="s" s="4"><v>3</v></c>
  </row>
  <!-- Assumptions (rows 2-3) -->
  <row r="2">
    <c r="A2" t="s" s="1"><v>4</v></c>  <!-- "Revenue Growth", blue -->
    <c r="B2" s="7"><v>0</v></c>        <!-- FY2023A: n/a, 0% placeholder -->
    <c r="C2" s="7"><v>0.12</v></c>     <!-- FY2024E: 12.0% input -->
    <c r="D2" s="7"><v>0.15</v></c>     <!-- FY2025E: 15.0% input -->
  </row>
  <row r="3">
    <c r="A3" t="s" s="1"><v>5</v></c>  <!-- "Gross Margin", blue -->
    <c r="B3" s="7"><v>0.45</v></c>
    <c r="C3" s="7"><v>0.46</v></c>
    <c r="D3" s="7"><v>0.47</v></c>
  </row>
  <!-- Model (rows 14-15) -->
  <row r="14">
    <c r="A14" t="s" s="2"><v>6</v></c>  <!-- "Revenue", black -->
    <c r="B14" s="5"><v>85000000</v></c>  <!-- actual, currency input -->
    <c r="C14" s="6"><f>B14*(1+C2)</f><v></v></c>
    <c r="D14" s="6"><f>C14*(1+D2)</f><v></v></c>
  </row>
  <row r="15">
    <c r="A15" t="s" s="2"><v>7</v></c>  <!-- "Gross Profit", black -->
    <c r="B15" s="6"><f>B14*B3</f><v></v></c>
    <c r="C15" s="6"><f>C14*C3</f><v></v></c>
    <c r="D15" s="6"><f>D14*D3</f><v></v></c>
  </row>
</sheetData>
```
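
As a sanity check on the formulas in Scenario A, the same arithmetic can be replayed outside Excel and compared with what the workbook displays after it is opened and recalculated. The sketch below is for verification only: under the Formula-First rule these results must never be written into `<v>` elements. The variable names are illustrative.

```python
# Inputs copied from the Scenario A XML above
revenue_fy2023 = 85_000_000
growth = {"FY2024E": 0.12, "FY2025E": 0.15}
margin = {"FY2023A": 0.45, "FY2024E": 0.46, "FY2025E": 0.47}

revenue = {"FY2023A": revenue_fy2023}
revenue["FY2024E"] = revenue["FY2023A"] * (1 + growth["FY2024E"])  # C14 = B14*(1+C2)
revenue["FY2025E"] = revenue["FY2024E"] * (1 + growth["FY2025E"])  # D14 = C14*(1+D2)

# Row 15: gross profit = revenue * gross margin, per year
gross_profit = {year: revenue[year] * margin[year] for year in revenue}

for year in revenue:
    print(f"{year}: revenue {revenue[year]:,.0f}, gross profit {gross_profit[year]:,.0f}")
```

With these inputs, FY2024E revenue comes to 95,200,000 and FY2025E revenue to 109,480,000, so C14 and D14 should display those figures once Excel has calculated the sheet.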

### Scenario B: Data + Summary (Two Sheets)

The `Summary` sheet pulls from `Data` using cross-sheet formulas (green, `s="3"`):

```xml
<!-- Summary/sheet2.xml sheetData excerpt -->
<sheetData>
  <row r="1">
    <c r="A1" t="s" s="4"><v>0</v></c>  <!-- "Metric" -->
    <c r="B1" t="s" s="4"><v>1</v></c>  <!-- "Value" -->
  </row>
  <row r="2">
    <c r="A2" t="s" s="0"><v>2</v></c>  <!-- "Total Revenue" -->
    <c r="B2" s="3"><f>SUM(Data!C2:C10000)</f><v></v></c>
  </row>
  <row r="3">
    <c r="A3" t="s" s="0"><v>3</v></c>  <!-- "Deal Count" -->
    <c r="B3" s="3"><f>COUNTA(Data!A2:A10000)</f><v></v></c>
  </row>
  <row r="4">
    <c r="A4" t="s" s="0"><v>4</v></c>  <!-- "Avg Deal Size" -->
    <c r="B4" s="3"><f>IF(B3=0,0,B2/B3)</f><v></v></c>
  </row>
</sheetData>
```

### Scenario C: Multi-Department Consolidation

`Consolidated` sheet sums the same cells from multiple department sheets:

```xml
<!-- Consolidated/sheet4.xml: summing across Dept_Engineering and Dept_Marketing -->
<sheetData>
  <row r="5">
    <c r="A5" t="s" s="2"><v>0</v></c>
    <!-- No spaces in sheet names → no quotes needed -->
    <c r="B5" s="3"><f>Dept_Engineering!B5+Dept_Marketing!B5</f><v></v></c>
  </row>
  <row r="6">
    <c r="A6" t="s" s="2"><v>1</v></c>
    <c r="B6" s="3"><f>SUM(Dept_Engineering!B6,Dept_Marketing!B6)</f><v></v></c>
  </row>
</sheetData>
```

---

## What You Must NOT Do

- Do NOT use openpyxl or any Python library to write the final xlsx file
- Do NOT hardcode any calculated value; use `<f>` formulas for every derived number
- Do NOT deliver without running `formula_check.py` first
- Do NOT set a cell's `s` attribute to a value >= `cellXfs count`
- Do NOT modify an existing `<xf>` entry in `styles.xml`; only append new ones
- Do NOT add a new sheet without updating all four sync points (workbook.xml,
  workbook.xml.rels, [Content_Types].xml, actual .xml file)
- Do NOT assign new worksheet rIds that overlap with rId1, rId2, or rId3 (reserved
  for sheet1, styles, sharedStrings in the template)

684
minimax-xlsx/references/edit.md
Normal file
@@ -0,0 +1,684 @@
|
||||
# Minimal-Invasive Editing of Existing xlsx
|
||||
|
||||
Make precise, surgical changes to existing xlsx files while preserving everything you do not touch: styles, macros, pivot tables, charts, sparklines, named ranges, data validation, conditional formatting, and all other embedded content.
|
||||
|
||||
---
|
||||
|
||||
## 1. When to Use This Path
|
||||
|
||||
Use the edit (unpack → XML edit → pack) path whenever the task involves **modifying an existing xlsx file**:
|
||||
|
||||
- Template filling — populating designated input cells with values or formulas
|
||||
- Data updates — replacing outdated numbers, text, or dates in a live file
|
||||
- Content corrections — fixing wrong values, broken formulas, or mistyped labels
|
||||
- Adding new data rows to an existing table
|
||||
- Renaming a sheet
|
||||
- Applying a new style to specific cells
|
||||
|
||||
Do NOT use this path for creating a brand-new workbook from scratch. For that, see `create.md`.
|
||||
|
||||
---
|
||||
|
||||
## 2. Why openpyxl round-trip Is Forbidden for Existing Files
|
||||
|
||||
openpyxl `load_workbook()` followed by `workbook.save()` is a **destructive operation** on any file that contains advanced features. The library silently drops content it does not understand:
|
||||
|
||||
| Feature | openpyxl behavior | Consequence |
|
||||
|---------|-------------------|-------------|
|
||||
| VBA macros (`vbaProject.bin`) | Dropped entirely | All automation is lost; file saved as `.xlsx` not `.xlsm` |
|
||||
| Pivot tables (`xl/pivotTables/`) | Dropped | Interactive analysis is destroyed |
|
||||
| Slicers | Dropped | Filter UI is lost |
|
||||
| Sparklines (`<sparklineGroups>`) | Dropped | In-cell mini-charts disappear |
|
||||
| Chart formatting details | Partially lost | Series colors, custom axes may revert |
|
||||
| Print area / page breaks | Sometimes lost | Print layout changes |
|
||||
| Custom XML parts | Dropped | Third-party data bindings broken |
|
||||
| Theme-linked colors | May be de-themed | Colors converted to absolute, breaking theme switching |
|
||||
|
||||
Even on a "plain" file without these features, openpyxl may normalize whitespace in XML that Excel relies on, alter namespace declarations, or reset `calcMode` flags.
|
||||
|
||||
**The rule is absolute: never open an existing file with openpyxl for the purpose of re-saving it.**
|
||||
|
||||
The XML direct-edit approach is safe because it operates on the raw bytes. You only change the nodes you touch. Everything else is byte-equivalent to the original.
|
||||
|
||||
---
|
||||
|
||||
## 3. Standard Operating Procedure
|
||||
|
||||
### Step 1 — Unpack
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_unpack.py input.xlsx /tmp/xlsx_work/
|
||||
```
|
||||
|
||||
The script unzips the xlsx, pretty-prints every XML and `.rels` file, and prints a categorized inventory of key files plus a warning if high-risk content is detected (VBA, pivot tables, charts).
|
||||
|
||||
Read the printed output carefully before proceeding. If the script reports `xl/vbaProject.bin` or `xl/pivotTables/`, follow the constraints in Section 7.
|
||||
|
||||
### Step 2 — Reconnaissance
|
||||
|
||||
Map the structure before touching anything.
|
||||
|
||||
**Identify sheet names and their XML files:**
|
||||
|
||||
```
|
||||
xl/workbook.xml → <sheet name="Revenue" sheetId="1" r:id="rId1"/>
|
||||
xl/_rels/workbook.xml.rels → <Relationship Id="rId1" Target="worksheets/sheet1.xml"/>
|
||||
```
|
||||
|
||||
The sheet named "Revenue" lives in `xl/worksheets/sheet1.xml`. Always resolve this mapping before editing a worksheet.
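
The lookup above can also be done programmatically. The following is a minimal read-only sketch, assuming the two parts have already been read into strings or bytes; the function name `map_sheets` is hypothetical and is not one of the skill's helper scripts.

```python
import xml.etree.ElementTree as ET

NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def map_sheets(workbook_xml, rels_xml):
    """Return {sheet name: part path inside the package} (hypothetical helper)."""
    # Relationship Id -> Target, taken from xl/_rels/workbook.xml.rels
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{NS_PKG_REL}}}Relationship")
    }
    mapping = {}
    for sheet in ET.fromstring(workbook_xml).iter(f"{{{NS_MAIN}}}sheet"):
        target = targets[sheet.get(f"{{{NS_DOC_REL}}}id")]
        if target.startswith("/"):
            path = target.lstrip("/")   # absolute within the package
        else:
            path = "xl/" + target       # relative to xl/workbook.xml
        mapping[sheet.get("name")] = path
    return mapping
```

Feeding it the contents of `/tmp/xlsx_work/xl/workbook.xml` and `/tmp/xlsx_work/xl/_rels/workbook.xml.rels` should yield a dictionary such as `{"Revenue": "xl/worksheets/sheet1.xml"}` for the example above. The sketch only reads; all edits still go through the Edit tool.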
|
||||
|
||||
**Understand the shared strings table:**
|
||||
|
||||
```bash
|
||||
# Count existing entries in xl/sharedStrings.xml
|
||||
grep -c "<si>" /tmp/xlsx_work/xl/sharedStrings.xml
|
||||
```
|
||||
|
||||
Every text cell uses a zero-based index into this table. Know the current count before appending.
|
||||
|
||||
**Understand the styles table:**
|
||||
|
||||
```bash
|
||||
# Count existing cellXfs entries
|
||||
grep -o '<cellXfs count="[0-9]*"' /tmp/xlsx_work/xl/styles.xml  # grep -c "<xf " would also count <cellStyleXfs> entries
|
||||
```
|
||||
|
||||
New style slots are appended after existing ones. The index of the first new slot = current count.
|
||||
|
||||
**Scan for high-risk XML regions in the target worksheet:**
|
||||
|
||||
Look for these elements in the target `sheet*.xml` before editing:
|
||||
|
||||
- `<mergeCell>` — merged cell ranges; row/column insertion shifts these
|
||||
- `<conditionalFormatting>` — condition ranges; row/column insertion shifts these
|
||||
- `<dataValidations>` — validation ranges; row/column insertion shifts these
|
||||
- `<tableParts>` — table definitions; row insertion inside a table needs the table's `ref` updated in `xl/tables/tableN.xml` (column insertion also needs `<tableColumn>` updates)
|
||||
- `<sparklineGroups>` — sparklines; preserve without modification
|
||||
|
||||
### Step 3 — Map Intent to Minimal XML Changes
|
||||
|
||||
Before writing a single character, produce a written list of exactly which XML nodes change. This prevents scope creep.
|
||||
|
||||
| User intent | Files to change | Nodes to change |
|
||||
|-------------|----------------|-----------------|
|
||||
| Change a cell's numeric value | `xl/worksheets/sheetN.xml` | `<v>` inside target `<c>` |
|
||||
| Change a cell's text | `xl/sharedStrings.xml` (append) + `xl/worksheets/sheetN.xml` | New `<si>`, update cell `<v>` index |
|
||||
| Change a cell's formula | `xl/worksheets/sheetN.xml` | `<f>` text inside target `<c>` |
|
||||
| Add a new data row at the bottom | `xl/worksheets/sheetN.xml` + possibly `xl/sharedStrings.xml` | Append `<row>` element |
|
||||
| Apply a new style to cells | `xl/styles.xml` + `xl/worksheets/sheetN.xml` | Append `<xf>` in `<cellXfs>`, update `s` attribute on `<c>` |
|
||||
| Rename a sheet | `xl/workbook.xml` | `name` attribute on `<sheet>` element |
|
||||
| Rename a sheet (with cross-sheet formulas) | `xl/workbook.xml` + all `xl/worksheets/*.xml` | `name` attribute + `<f>` text referencing old name |
|
||||
|
||||
### Step 4 — Execute Changes
|
||||
|
||||
Use the Edit tool. Edit the minimum. Never rewrite whole files.
|
||||
|
||||
See Section 4 for precise XML patterns for each operation type.
|
||||
|
||||
### Step 5 — Cascade Check
|
||||
|
||||
After any change that shifts row or column positions, audit all affected XML regions. See Section 5.
|
||||
|
||||
### Step 6 — Pack and Validate
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
|
||||
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
|
||||
```
|
||||
|
||||
The pack script validates XML well-formedness before creating the ZIP. If it reports parse errors, fix them and run it again. After packing, run `formula_check.py` to confirm no formula errors were introduced.
|
||||
|
||||
---
|
||||
|
||||
## 4. Precise XML Patterns for Common Edits
|
||||
|
||||
### 4.1 Changing a Numeric Cell Value
|
||||
|
||||
Find the `<c r="B5">` element in the worksheet XML and replace the `<v>` text.
|
||||
|
||||
**Before:**
|
||||
```xml
|
||||
<c r="B5">
|
||||
<v>1000</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After (new value 1500):**
|
||||
```xml
|
||||
<c r="B5">
|
||||
<v>1500</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Do not add or remove the `s` attribute (style) unless explicitly changing the style.
|
||||
- Do not add a `t` attribute — numbers omit `t` or use `t="n"`.
|
||||
- Do not change the `r` attribute (cell reference).
|
||||
|
||||
---
|
||||
|
||||
### 4.2 Changing a Text Cell Value
|
||||
|
||||
Text cells reference the shared strings table by index (`t="s"`). You cannot edit the string in-place without affecting every other cell that uses the same index. The safe approach is to append a new entry.
|
||||
|
||||
**Before — shared strings file (`xl/sharedStrings.xml`):**
|
||||
```xml
|
||||
<sst count="4" uniqueCount="4">
|
||||
<si><t>Revenue</t></si>
|
||||
<si><t>Cost</t></si>
|
||||
<si><t>Margin</t></si>
|
||||
<si><t>Old Label</t></si>
|
||||
</sst>
|
||||
```
|
||||
|
||||
**After — append new string, increment counts:**
|
||||
```xml
|
||||
<sst count="5" uniqueCount="5">
|
||||
<si><t>Revenue</t></si>
|
||||
<si><t>Cost</t></si>
|
||||
<si><t>Margin</t></si>
|
||||
<si><t>Old Label</t></si>
|
||||
<si><t>New Label</t></si>
|
||||
</sst>
|
||||
```
|
||||
|
||||
New string is at index 4 (zero-based).
|
||||
|
||||
**Before — cell in worksheet XML:**
|
||||
```xml
|
||||
<c r="A7" t="s">
|
||||
<v>3</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After — point to new index:**
|
||||
```xml
|
||||
<c r="A7" t="s">
|
||||
<v>4</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Never modify or delete existing `<si>` entries. Only append.
|
||||
- Both `count` and `uniqueCount` must be incremented together.
|
||||
- If the new string contains `&`, `<`, or `>`, escape them: `&amp;`, `&lt;`, `&gt;`.
|
||||
- If the string has leading or trailing spaces, add `xml:space="preserve"` to `<t>`:
|
||||
```xml
|
||||
<si><t xml:space="preserve"> indented text </t></si>
|
||||
```
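
The escaping and whitespace rules above can be sketched as a small builder for the `<si>` text to append. This is an illustration only: the name `shared_string_item` is hypothetical, and it covers plain strings, not rich-text runs.

```python
from xml.sax.saxutils import escape


def shared_string_item(text):
    """Return the <si> element text for one plain shared string (hypothetical helper)."""
    # Leading or trailing whitespace is only kept if xml:space="preserve" is set.
    needs_preserve = text != text.strip()
    attr = ' xml:space="preserve"' if needs_preserve else ""
    # escape() replaces &, < and > with their XML entities.
    return f"<si><t{attr}>{escape(text)}</t></si>"
```

The returned text is what gets pasted before `</sst>`; the `count` and `uniqueCount` attributes still have to be incremented by hand.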
|
||||
|
||||
---
|
||||
|
||||
### 4.3 Changing a Formula
|
||||
|
||||
Formulas are stored in `<f>` elements **without a leading `=`** (unlike what you type in Excel's UI).
|
||||
|
||||
**Before:**
|
||||
```xml
|
||||
<c r="C10">
|
||||
<f>SUM(C2:C9)</f>
|
||||
<v>4800</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After (extended range):**
|
||||
```xml
|
||||
<c r="C10">
|
||||
<f>SUM(C2:C11)</f>
|
||||
<v></v>
|
||||
</c>
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Clear `<v>` to an empty string when changing the formula. The cached value is now stale.
|
||||
- Do not add `t="s"` or any type attribute to formula cells. The `t` attribute is absent or uses a result-type value, not a formula marker.
|
||||
- Cross-sheet references use `SheetName!CellRef`. If the sheet name contains spaces, wrap in single quotes: `'Q1 Data'!B5`.
|
||||
- The `<f>` text must not include the leading `=`.
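
As an illustration of the cross-sheet quoting rule, here is a deliberately conservative sketch. The helper name `sheet_ref` is hypothetical, and it does not implement Excel's full quoting rules: it quotes every name that contains anything other than letters and underscores, which is more often than strictly necessary but should be harmless, since a quoted sheet name is accepted wherever an unquoted one is.

```python
import re

# Assumption: names made only of letters/underscores are left unquoted.
_PLAIN_NAME = re.compile(r"^[A-Za-z_]+$")


def sheet_ref(sheet_name, cell):
    """Return <f> text (no leading '=') for a cell on another sheet."""
    if _PLAIN_NAME.match(sheet_name):
        return f"{sheet_name}!{cell}"
    # Wrap in single quotes; an apostrophe inside the name is doubled.
    return "'" + sheet_name.replace("'", "''") + "'!" + cell
```

With this rule `Sheet1` would be written as `'Sheet1'!C10`, which differs from the unquoted form shown elsewhere in this guide but refers to the same cell.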
|
||||
|
||||
**Before (converting a hardcoded value to a live formula):**
|
||||
```xml
|
||||
<c r="D15">
|
||||
<v>95000</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After:**
|
||||
```xml
|
||||
<c r="D15">
|
||||
<f>SUM(D2:D14)</f>
|
||||
<v></v>
|
||||
</c>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.4 Adding a New Data Row
|
||||
|
||||
Append after the last `<row>` element inside `<sheetData>`. Row numbers in OOXML are 1-based, and `<row>` elements must appear in ascending order of `r` (gaps between row numbers are allowed).
|
||||
|
||||
**Before (last row is row 10):**
|
||||
```xml
|
||||
<row r="10">
|
||||
<c r="A10" t="s"><v>3</v></c>
|
||||
<c r="B10"><v>2023</v></c>
|
||||
<c r="C10"><v>88000</v></c>
|
||||
<c r="D10"><f>C10*1.1</f><v></v></c>
|
||||
</row>
|
||||
</sheetData>
|
||||
```
|
||||
|
||||
**After (new row 11 appended):**
|
||||
```xml
|
||||
<row r="10">
|
||||
<c r="A10" t="s"><v>3</v></c>
|
||||
<c r="B10"><v>2023</v></c>
|
||||
<c r="C10"><v>88000</v></c>
|
||||
<c r="D10"><f>C10*1.1</f><v></v></c>
|
||||
</row>
|
||||
<row r="11">
|
||||
<c r="A11" t="s"><v>4</v></c>
|
||||
<c r="B11"><v>2024</v></c>
|
||||
<c r="C11"><v>96000</v></c>
|
||||
<c r="D11"><f>C11*1.1</f><v></v></c>
|
||||
</row>
|
||||
</sheetData>
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Every `<c>` inside the row must have `r` set to the correct cell address (e.g., `A11`).
|
||||
- Text cells need `t="s"` and a sharedStrings index in `<v>`. Numeric cells omit `t`.
|
||||
- Formula cells use `<f>` and an empty `<v>`.
|
||||
- Copy the `s` attribute from the row above if you want matching styles. Do not invent a style index that does not exist in `styles.xml`.
|
||||
- If the sheet contains a `<dimension>` element (e.g., `<dimension ref="A1:D10"/>`), update it to include the new row: `<dimension ref="A1:D11"/>`.
|
||||
- If the sheet contains a `<tableParts>` element referencing a table, update the table's `ref` attribute in the corresponding `xl/tables/tableN.xml` file.
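
The `<dimension>` and table `ref` updates in the rules above follow the same pattern: keep the corners, grow the last row. A minimal sketch, assuming a simple two-corner range such as `A1:D10` (the name `extend_ref` is hypothetical):

```python
import re


def extend_ref(ref, new_last_row):
    """Grow an 'A1:D10'-style ref so it covers new_last_row; never shrink it."""
    m = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref)
    if m is None:
        # Single-cell refs and multi-area refs are out of scope for this sketch.
        raise ValueError(f"unsupported ref: {ref!r}")
    first_col, first_row, last_col, last_row = m.groups()
    return f"{first_col}{first_row}:{last_col}{max(int(last_row), new_last_row)}"
```

The function only computes the new attribute value; writing it into the worksheet or table XML remains a targeted edit.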
|
||||
|
||||
---
|
||||
|
||||
### 4.5 Adding a New Column
|
||||
|
||||
Append new `<c>` elements to each existing `<row>` and, if present, update the `<cols>` section.
|
||||
|
||||
**Before (rows have columns A–C):**
|
||||
```xml
|
||||
<cols>
|
||||
<col min="1" max="3" width="14" customWidth="1"/>
|
||||
</cols>
|
||||
<sheetData>
|
||||
<row r="1">
|
||||
<c r="A1" t="s"><v>0</v></c>
|
||||
<c r="B1" t="s"><v>1</v></c>
|
||||
<c r="C1" t="s"><v>2</v></c>
|
||||
</row>
|
||||
<row r="2">
|
||||
<c r="A2"><v>100</v></c>
|
||||
<c r="B2"><v>200</v></c>
|
||||
<c r="C2"><v>300</v></c>
|
||||
</row>
|
||||
</sheetData>
|
||||
```
|
||||
|
||||
**After (adding column D):**
|
||||
```xml
|
||||
<cols>
|
||||
<col min="1" max="3" width="14" customWidth="1"/>
|
||||
<col min="4" max="4" width="14" customWidth="1"/>
|
||||
</cols>
|
||||
<sheetData>
|
||||
<row r="1">
|
||||
<c r="A1" t="s"><v>0</v></c>
|
||||
<c r="B1" t="s"><v>1</v></c>
|
||||
<c r="C1" t="s"><v>2</v></c>
|
||||
<c r="D1" t="s"><v>5</v></c>
|
||||
</row>
|
||||
<row r="2">
|
||||
<c r="A2"><v>100</v></c>
|
||||
<c r="B2"><v>200</v></c>
|
||||
<c r="C2"><v>300</v></c>
|
||||
<c r="D2"><f>A2+B2+C2</f><v></v></c>
|
||||
</row>
|
||||
</sheetData>
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Adding a column at the end (after the last existing column) is safe — no existing formula references shift.
|
||||
- Inserting a column in the middle shifts all columns to the right, which requires the same cascade updates as row insertion (see Section 5).
|
||||
- Update the `<dimension>` element if present.
|
||||
|
||||
---
|
||||
|
||||
### 4.6 Modifying or Adding Styles
|
||||
|
||||
Styles use a multi-level indirect reference chain. Read `ooxml-cheatsheet.md` for the full chain. The key rule: **only append new entries, never modify existing ones**.
|
||||
|
||||
**Scenario:** Add a blue-font style (for hardcoded input cells) that doesn't yet exist.
|
||||
|
||||
**Step 1 — Check if a matching font already exists in `xl/styles.xml`:**
|
||||
```xml
|
||||
<!-- Look inside <fonts> for an existing blue font -->
|
||||
<font>
|
||||
<color rgb="000000FF"/>
|
||||
<!-- other attributes -->
|
||||
</font>
|
||||
```
|
||||
|
||||
If found, note its index (zero-based position in the `<fonts>` list). If not found, append.
|
||||
|
||||
**Step 2 — Append the new font if needed:**
|
||||
|
||||
Before:
|
||||
```xml
|
||||
<fonts count="3">
|
||||
<font>...</font> <!-- index 0 -->
|
||||
<font>...</font> <!-- index 1 -->
|
||||
<font>...</font> <!-- index 2 -->
|
||||
</fonts>
|
||||
```
|
||||
|
||||
After:
|
||||
```xml
|
||||
<fonts count="4">
|
||||
<font>...</font> <!-- index 0 -->
|
||||
<font>...</font> <!-- index 1 -->
|
||||
<font>...</font> <!-- index 2 -->
|
||||
<font>
|
||||
<b/>
|
||||
<sz val="11"/>
|
||||
<color rgb="000000FF"/>
|
||||
<name val="Calibri"/>
|
||||
</font> <!-- index 3 (new) -->
|
||||
</fonts>
|
||||
```
|
||||
|
||||
**Step 3 — Append a new `<xf>` in `<cellXfs>`:**
|
||||
|
||||
Before:
|
||||
```xml
|
||||
<cellXfs count="5">
|
||||
<xf .../> <!-- index 0 -->
|
||||
<xf .../> <!-- index 1 -->
|
||||
<xf .../> <!-- index 2 -->
|
||||
<xf .../> <!-- index 3 -->
|
||||
<xf .../> <!-- index 4 -->
|
||||
</cellXfs>
|
||||
```
|
||||
|
||||
After:
|
||||
```xml
|
||||
<cellXfs count="6">
|
||||
<xf .../> <!-- index 0 -->
|
||||
<xf .../> <!-- index 1 -->
|
||||
<xf .../> <!-- index 2 -->
|
||||
<xf .../> <!-- index 3 -->
|
||||
<xf .../> <!-- index 4 -->
|
||||
<xf numFmtId="0" fontId="3" fillId="0" borderId="0" xfId="0"
|
||||
applyFont="1"/> <!-- index 5 (new) -->
|
||||
</cellXfs>
|
||||
```
|
||||
|
||||
**Step 4 — Apply to target cells:**
|
||||
|
||||
Before:
|
||||
```xml
|
||||
<c r="B3">
|
||||
<v>0.08</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
After:
|
||||
```xml
|
||||
<c r="B3" s="5">
|
||||
<v>0.08</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Never delete or reorder existing entries in `<fonts>`, `<fills>`, `<borders>`, `<cellXfs>`.
|
||||
- Always update the `count` attribute when appending.
|
||||
- The new `cellXfs` index = the old `count` value before appending (zero-based: if count was 5, new index is 5).
|
||||
- Custom `numFmt` IDs must be 164 or above. IDs 0–163 are built-in and must not be re-declared.
|
||||
- If the desired style already exists elsewhere in the file (on a similar cell), reuse its `s` index rather than creating a duplicate.
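
One way to catch an invented style index before packing is a read-only check that every `s` attribute in a worksheet points at an existing `<xf>` in `<cellXfs>`. The sketch below assumes both parts are passed in as strings or bytes; `invalid_style_refs` is a hypothetical name, not an existing script.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def invalid_style_refs(styles_xml, sheet_xml):
    """Return [(cell address, s value)] for cells whose s index has no <xf> in <cellXfs>."""
    cell_xfs = ET.fromstring(styles_xml).find("m:cellXfs", NS)
    # Count the actual <xf> children rather than trusting the count attribute.
    xf_count = len(cell_xfs.findall("m:xf", NS)) if cell_xfs is not None else 0
    bad = []
    for cell in ET.fromstring(sheet_xml).iter(f"{{{NS['m']}}}c"):
        s = cell.get("s")
        if s is not None and int(s) >= xf_count:
            bad.append((cell.get("r"), int(s)))
    return bad
```

An empty list means every referenced style slot exists. Note that entries under `<cellStyleXfs>` are ignored on purpose, because cell `s` values index `<cellXfs>` only.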
|
||||
|
||||
---
|
||||
|
||||
### 4.7 Renaming a Sheet
|
||||
|
||||
**Only `xl/workbook.xml` needs to change** — unless cross-sheet formulas reference the old name.
|
||||
|
||||
**Before (`xl/workbook.xml`):**
|
||||
```xml
|
||||
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
|
||||
```
|
||||
|
||||
**After:**
|
||||
```xml
|
||||
<sheet name="Revenue" sheetId="1" r:id="rId1"/>
|
||||
```
|
||||
|
||||
**If any formula in any worksheet references the old name, update those too:**
|
||||
|
||||
Before (`xl/worksheets/sheet2.xml`):
|
||||
```xml
|
||||
<c r="B5"><f>Sheet1!C10</f><v></v></c>
|
||||
```
|
||||
|
||||
After:
|
||||
```xml
|
||||
<c r="B5"><f>Revenue!C10</f><v></v></c>
|
||||
```
|
||||
|
||||
If the new name contains spaces:
|
||||
```xml
|
||||
<c r="B5"><f>'Q1 Revenue'!C10</f><v></v></c>
|
||||
```
|
||||
|
||||
Scan all worksheet XML files for the old name:
|
||||
```bash
|
||||
grep -r "Sheet1!" /tmp/xlsx_work/xl/worksheets/
|
||||
```
|
||||
|
||||
Rules:
|
||||
- The `.rels` file and `[Content_Types].xml` do NOT need to change — they reference the XML file path, not the sheet name.
|
||||
- `sheetId` must not change; it is a stable internal identifier.
|
||||
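- Match the sheet name's spelling and case exactly in formula references. Excel itself compares sheet names case-insensitively, but other tools that read the file may not.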
|
||||
|
||||
---
|
||||
|
||||
## 5. High-Risk Operations — Cascade Effects
|
||||
|
||||
### 5.1 Inserting a Row in the Middle
|
||||
|
||||
Inserting a row at position N shifts all rows from N downward. Every reference to those rows in every XML file must be updated.
|
||||
|
||||
**Files to check and update:**
|
||||
|
||||
| XML region | What to update | Example shift |
|
||||
|------------|---------------|---------------|
|
||||
| Worksheet `<row r="...">` attributes | Increment row number for all rows >= N | `r="7"` → `r="8"` |
|
||||
| All `<c r="...">` within those rows | Increment row number in cell address | `r="A7"` → `r="A8"` |
|
||||
| All `<f>` formula text in any sheet | Shift every row reference >= N, whether relative or `$`-absolute | `B7` → `B8` |
|
||||
| `<mergeCell ref="...">` | Shift start and end rows | `A7:C7` → `A8:C8` |
|
||||
| `<conditionalFormatting sqref="...">` | Shift range | `A5:D20` → `A5:D21` |
|
||||
| `<dataValidation sqref="...">` | Shift range | `B6:B50` → `B7:B51` |
|
||||
| `xl/charts/chartN.xml` data source ranges | Shift series ranges | `Sheet1!$B$5:$B$20` → `Sheet1!$B$6:$B$21` |
|
||||
| `xl/pivotTables/*.xml` source ranges | Shift source data range | Handle with extreme care — see Section 7 |
|
||||
| `<dimension ref="...">` | Expand to include new extent | `A1:D20` → `A1:D21` |
|
||||
| `xl/tables/tableN.xml` `ref` attribute | Expand table boundary | `A1:D20` → `A1:D21` |
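
For the attribute-style references in the table above (`mergeCell ref`, `sqref`, `dimension ref`, table `ref`), the shift itself is mechanical. The sketch below shows the idea under a loud assumption: it is meant for plain A1 references and ranges only, not for `<f>` formula text, where function names such as `LOG10`, string literals, and sheet names would be mangled. The name `shift_ref` is hypothetical.

```python
import re

# Column part (with optional $ markers) followed by the row number.
_CELL = re.compile(r"(\$?[A-Z]{1,3}\$?)(\d+)")


def shift_ref(ref, at_row, count):
    """Shift every row number >= at_row in a plain A1 ref or range by count."""
    def bump(match):
        row = int(match.group(2))
        return match.group(1) + str(row + count if row >= at_row else row)
    return _CELL.sub(bump, ref)
```

A range that straddles the insertion point grows instead of moving, because only its end row is at or below the insertion row.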
|
||||
|
||||
**Do not attempt row insertion manually in large or formula-heavy files.** Use the dedicated shift script instead:
|
||||
|
||||
```bash
|
||||
# Insert 1 row at row 5: all rows 5 and below shift down by 1
|
||||
python3 SKILL_DIR/scripts/xlsx_shift_rows.py /tmp/xlsx_work/ insert 5 1
|
||||
|
||||
# Delete 1 row at row 8: all rows 9 and below shift up by 1
|
||||
python3 SKILL_DIR/scripts/xlsx_shift_rows.py /tmp/xlsx_work/ delete 8 1
|
||||
```
|
||||
|
||||
The script updates in one pass: `<row r="...">` attributes, `<c r="...">` cell addresses, all `<f>` formula text across every worksheet, `<mergeCell>` ranges, `<conditionalFormatting sqref="...">`, `<dataValidation sqref="...">`, `<dimension ref="...">`, table `ref` attributes in `xl/tables/`, chart series ranges in `xl/charts/`, and pivot cache source ranges in `xl/pivotCaches/`.
|
||||
|
||||
**After running the shift script, always repack and validate:**
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
|
||||
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
|
||||
```
|
||||
|
||||
**What the script does NOT update (review manually):**
|
||||
- Named ranges in `xl/workbook.xml` `<definedNames>` — check and update if they reference shifted rows.
|
||||
- Structured table references (`Table[@Column]`) inside formulas.
|
||||
- External workbook links in `xl/externalLinks/`.
|
||||
|
||||
### 5.2 Inserting a Column in the Middle
|
||||
|
||||
Same cascade logic as row insertion, but for columns. Column references in formulas (`B`, `$C`, etc.) and in merged cell ranges, conditional formatting ranges, and chart data sources all need updating.
|
||||
|
||||
Column letter shifting is harder to automate safely. Prefer **appending columns at the end** whenever possible.
|
||||
|
||||
### 5.3 Deleting a Row or Column
|
||||
|
||||
Deletion is more dangerous than insertion because any formula that referenced a deleted row or column will become `#REF!`. Before deleting:
|
||||
|
||||
1. Search all `<f>` elements for references to the deleted range.
|
||||
2. If any formula references a cell in the deleted row/column, do not delete — instead, either clear the row's data or consult the user.
|
||||
3. After deletion, shift all references to rows/columns beyond the deletion point upward/leftward.
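
Step 1 can be approximated with a coarse, read-only pre-check. The sketch below is a heuristic, not a formula parser: it uses a regular expression, so it over-reports (a function name like `LOG10` looks like a reference, and references to other sheets are not filtered out), and it also reports ranges that merely span the row. The name `formulas_touching_row` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# A cell reference, optionally followed by ":<cell>" to form a range.
_REF = re.compile(r"\$?[A-Z]{1,3}\$?(\d+)(?::\$?[A-Z]{1,3}\$?(\d+))?")


def formulas_touching_row(sheet_xml, row):
    """Return [(cell address, formula text)] for formulas that mention the given row."""
    hits = []
    for cell in ET.fromstring(sheet_xml).iter(f"{{{NS_MAIN}}}c"):
        f = cell.find(f"{{{NS_MAIN}}}f")
        if f is None or not f.text:
            continue
        for m in _REF.finditer(f.text):
            first = int(m.group(1))
            last = int(m.group(2)) if m.group(2) else first
            if min(first, last) <= row <= max(first, last):
                hits.append((cell.get("r"), f.text))
                break  # one hit per cell is enough
    return hits
```

Treat any hit as a prompt to inspect the formula by hand before deleting; an empty result from this sketch is not proof that the deletion is safe.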
|
||||
|
||||
---
|
||||
|
||||
## 6. Template Filling — Identifying and Populating Input Cells
|
||||
|
||||
Templates designate certain cells as input zones. Common patterns to recognize them:
|
||||
|
||||
### 6.1 How Templates Signal Input Zones
|
||||
|
||||
| Signal | XML manifestation | What to look for |
|
||||
|--------|-------------------|-----------------|
|
||||
| Blue font color | `s` attribute pointing to a `cellXfs` entry with `fontId` → `<color rgb="000000FF"/>` | Check `styles.xml` to decode `s` values |
|
||||
| Yellow fill (highlight) | `s` → `fillId` → `<fill><patternFill><fgColor rgb="00FFFF00"/>` | |
|
||||
| Empty `<v>` element | `<c r="B5"><v></v></c>` or cell entirely absent from `<row>` | The cell has no value yet |
|
||||
| Comment/annotation near cell | `xl/comments1.xml` with `ref="B5"` | Comments often label input fields |
|
||||
| Named ranges | `xl/workbook.xml` `<definedName>` elements | Template may define `InputRevenue` etc. |
|
||||
|
||||
### 6.2 Filling a Template Cell
|
||||
|
||||
Do not change `s` attributes. Do not change `t` attributes unless you must change from empty to typed. Only change `<v>` or add `<f>`.
|
||||
|
||||
**Before (empty input cell with style preserved):**
|
||||
```xml
|
||||
<c r="C5" s="3">
|
||||
<v></v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After (filled with a number, style unchanged):**
|
||||
```xml
|
||||
<c r="C5" s="3">
|
||||
<v>125000</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After (filled with text — requires shared string entry first):**
|
||||
```xml
|
||||
<!-- 1. Append to sharedStrings.xml: <si><t>North Region</t></si> at index 7 -->
|
||||
<c r="C5" t="s" s="3">
|
||||
<v>7</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
**After (filled with a formula, preserving style):**
|
||||
```xml
|
||||
<c r="C5" s="3">
|
||||
<f>Assumptions!D12</f>
|
||||
<v></v>
|
||||
</c>
|
||||
```
|
||||
|
||||
### 6.3 Locating Input Zones Without Opening the File in Excel
|
||||
|
||||
After unpacking, decode the style index on suspected input cells to determine if they have the template's input color:
|
||||
|
||||
1. Note the `s` value on the cell (e.g., `s="4"`).
|
||||
2. In `xl/styles.xml`, find `<cellXfs>` and look at the 5th entry (index 4).
|
||||
3. Note its `fontId` (e.g., `fontId="2"`).
|
||||
4. In `<fonts>`, look at the 3rd entry (index 2) and check for `<color rgb="000000FF"/>` (blue) or other input marker.
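
The four steps above can be sketched as a read-only lookup that follows `s` to the `<cellXfs>` entry, then `fontId` to the `<fonts>` entry. The name `font_rgb_for_style` is hypothetical, and the sketch only handles explicit `rgb` colors; a theme-linked color yields `None`.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def font_rgb_for_style(styles_xml, s):
    """Return the font's rgb value for style index s, or None if no rgb color is set."""
    root = ET.fromstring(styles_xml)
    xf = root.findall("m:cellXfs/m:xf", NS)[s]              # steps 1-2: s -> <xf>
    font_id = int(xf.get("fontId", "0"))                    # step 3: fontId
    font = root.findall("m:fonts/m:font", NS)[font_id]      # step 4: <font>
    color = font.find("m:color", NS)
    return None if color is None else color.get("rgb")
```

A return value of `000000FF` on a suspected input cell matches the blue input marker described above.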
|
||||
|
||||
If the template uses named ranges as input fields, read them from `xl/workbook.xml`:
|
||||
```xml
|
||||
<definedNames>
|
||||
<definedName name="InputGrowthRate">Assumptions!$B$5</definedName>
|
||||
<definedName name="InputDiscountRate">Assumptions!$B$6</definedName>
|
||||
</definedNames>
|
||||
```
|
||||
|
||||
Fill the target cells (`Assumptions!B5`, `Assumptions!B6`) directly.
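
Resolving named ranges to their target cells can be sketched as follows. This is a read-only illustration: `named_input_cells` is a hypothetical name, it skips names that are not sheet-qualified, and it does not unescape doubled apostrophes inside quoted sheet names.

```python
import xml.etree.ElementTree as ET

NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def named_input_cells(workbook_xml):
    """Return {defined name: (sheet name, cell address without $ markers)}."""
    cells = {}
    for dn in ET.fromstring(workbook_xml).iter(f"{{{NS_MAIN}}}definedName"):
        sheet, sep, ref = (dn.text or "").rpartition("!")
        if not sep:
            continue  # not a sheet-qualified reference
        cells[dn.get("name")] = (sheet.strip("'"), ref.replace("$", ""))
    return cells
```

Combined with the sheet-name mapping from Step 2, this tells you which worksheet file and which `<c>` element to edit for each input.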
|
||||
|
||||
### 6.4 Template Filling Rules
|
||||
|
||||
- Fill only cells the template designated as inputs. Do not fill cells that are formula-driven.
|
||||
- Do not apply new styles when filling. The template's formatting is the deliverable.
|
||||
- Do not add or remove rows inside the template's data area unless the template explicitly has an "append here" zone.
|
||||
- After filling, verify that no formula errors were introduced: some templates have input-validation formulas that produce `#VALUE!` if the wrong data type is entered.
|
||||
|
||||
---
|
||||
|
||||
## 7. Files You Must Never Modify
|
||||
|
||||
### 7.1 Absolute no-touch list
|
||||
|
||||
| File / location | Why |
|
||||
|-----------------|-----|
|
||||
| `xl/vbaProject.bin` | Binary VBA bytecode. Any modification, even of a single byte, corrupts the macro project and the macros fail to load. |
|
||||
| `xl/pivotCaches/pivotCacheDefinition*.xml` | The cache definition ties the pivot table to its source data. Editing it without also updating the corresponding `pivotTable*.xml` will corrupt the pivot. The only permitted change is the source-range `ref` update listed in 7.2. |
|
||||
| `xl/pivotTables/*.xml` | Pivot table XML is tightly coupled with the cache definition and with internal state Excel rebuilds on load. Do not edit. If you shifted rows and the pivot's source range now points to wrong data, update only the `<cacheSource>` range in the cache definition, and only the `ref` attribute in the pivot table — no other changes. |
|
||||
| `xl/slicers/*.xml` | Slicers are connected to specific cache IDs and pivot fields. Breaking these connections silently corrupts the file. |
|
||||
| `xl/connections.xml` | External data connections. Editing breaks live data refresh. |
|
||||
| `xl/externalLinks/` | External workbook links. The binary `.bin` files in here must not be modified. |
|
||||
|
||||
### 7.2 Conditionally safe files (update only specific attributes)
|
||||
|
||||
| File | What you may update | What to leave alone |
|
||||
|------|--------------------|--------------------|
|
||||
| `xl/charts/chartN.xml` | Data series range references (`<numRef><f>`) after a row/column shift | Chart type, formatting, layout |
|
||||
| `xl/tables/tableN.xml` | `ref` attribute on `<table>` after adding rows | Column definitions, style info |
|
||||
| `xl/pivotCaches/pivotCacheDefinition*.xml` | `ref` attribute on `<cacheSource><worksheetSource>` after shifting source data | All other content |
|
||||
|
||||
---
|
||||
|
||||
## 8. Validation After Every Edit
|
||||
|
||||
Never skip validation. Even a one-character change in a formula can cause cascading errors.
|
||||
|
||||
```bash
|
||||
# Pack
|
||||
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
|
||||
|
||||
# Static formula validation (always run)
|
||||
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
|
||||
|
||||
# Dynamic validation (if LibreOffice available)
|
||||
python3 SKILL_DIR/scripts/libreoffice_recalc.py output.xlsx /tmp/recalc.xlsx
|
||||
python3 SKILL_DIR/scripts/formula_check.py /tmp/recalc.xlsx
|
||||
```
|
||||
|
||||
If `formula_check.py` reports any error:
|
||||
1. Unpack the output file again (it is the packed version).
|
||||
2. Locate the reported cell in the worksheet XML.
|
||||
3. Fix the `<f>` element.
|
||||
4. Repack and re-validate.
|
||||
|
||||
Do not deliver the file until `formula_check.py` reports zero errors.
|
||||
|
||||
---
|
||||
|
||||
## 9. Absolute Rules Summary
|
||||
|
||||
| Rule | Rationale |
|
||||
|------|-----------|
|
||||
| Never use openpyxl `load_workbook` + `save` on an existing file | Round-trip destroys pivot tables, VBA, sparklines, slicers |
|
||||
| Never delete or reorder existing `<si>` entries in sharedStrings | Breaks every cell referencing that index |
|
||||
| Never delete or reorder existing `<xf>` entries in `<cellXfs>` | Breaks every cell using that style index |
|
||||
| Never modify `vbaProject.bin` | Binary file; any change corrupts VBA |
|
||||
| Never change `sheetId` when renaming a sheet | Internal ID is stable; changing it breaks relationships |
|
||||
| Never skip post-edit validation | Leaves broken references undetected |
|
||||
| Never edit more XML nodes than required | Extra changes risk introducing subtle corruption |
|
||||
| Clear `<v>` to empty string when changing a formula | Prevents stale cached value from misleading downstream consumers |
|
||||
| Append-only to sharedStrings | Existing indexes must remain valid |
|
||||
| Append-only to styles collections | Existing style indexes must remain valid |
|
||||
37
minimax-xlsx/references/fix.md
Normal file
@@ -0,0 +1,37 @@
|
||||
# FIX — Repair Broken Formulas in an Existing xlsx
|
||||
|
||||
This is an EDIT task. You MUST preserve all original sheets and data. Never create a new workbook.
|
||||
|
||||
## Workflow
|
||||
|
||||
```bash
|
||||
# Step 1: Identify errors
|
||||
python3 SKILL_DIR/scripts/formula_check.py input.xlsx --json
|
||||
|
||||
# Step 2: Unpack
|
||||
python3 SKILL_DIR/scripts/xlsx_unpack.py input.xlsx /tmp/xlsx_work/
|
||||
|
||||
# Step 3: Fix each broken <f> element in the worksheet XML using the Edit tool
|
||||
# (see Error-to-Fix mapping below)
|
||||
|
||||
# Step 4: Pack and validate
|
||||
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ output.xlsx
|
||||
python3 SKILL_DIR/scripts/formula_check.py output.xlsx
|
||||
```
|
||||
|
||||
## Error-to-Fix Mapping
|
||||
|
||||
| Error | Fix Strategy |
|
||||
|-------|-------------|
|
||||
| `#DIV/0!` | Wrap: `IFERROR(original_formula, "-")` |
|
||||
| `#NAME?` | Fix misspelled function (e.g. `SUMM` → `SUM`) |
|
||||
| `#REF!` | Reconstruct the broken reference |
|
||||
| `#VALUE!` | Fix type mismatch |
|
||||
|
||||
For the full list of Excel error types and advanced diagnostics, see `validate.md`.
|
||||
|
||||
## Critical Rules
|
||||
|
||||
- The output MUST contain the same sheets as the input. Do NOT create a new workbook.
|
||||
- Only modify the specific `<f>` elements that are broken — everything else must be untouched.
|
||||
- After packing, always run `formula_check.py` to confirm all errors are resolved.
|
||||
768
minimax-xlsx/references/format.md
Normal file
@@ -0,0 +1,768 @@
|
||||
# Financial Formatting & Output Standards — Complete Agent Guide
|
||||
|
||||
> This document is the complete reference manual for the agent when applying professional financial formatting to xlsx files. All operations target direct XML surgery on `xl/styles.xml` without using openpyxl. Every operational step provides ready-to-use XML snippets.
|
||||
|
||||
---
|
||||
|
||||
## 1. When to Use This Path
|
||||
|
||||
This document (FORMAT path) applies to the following two scenarios:
|
||||
|
||||
**Scenario A — Dedicated Formatting of an Existing File**
|
||||
The user provides an existing xlsx file and requests that financial modeling formatting standards be applied or unified. The starting point is to unpack the file, audit the existing `styles.xml`, then append missing styles and batch-update cell `s` attributes. No cell values or formulas are modified.
|
||||
|
||||
**Scenario B — Applying Format Standards After CREATE/EDIT**
|
||||
After completing data entry or formula writing, formatting is applied as the final step. At this point, `styles.xml` may come from the minimal_xlsx template (which pre-defines 13 style slots) or from a user file. In either case, follow the principle of "append only, never modify existing xf entries."
|
||||
|
||||
**Not applicable**: Reading or analyzing file contents only (use the READ path); modifying formulas or data (use the EDIT path).
|
||||
|
||||
---
|
||||
|
||||
## 2. Financial Format Semantic System
|
||||
|
||||
### 2.1 Font Color = Cell Role (Color = Role)
|
||||
|
||||
The primary convention of financial modeling: **font color encodes the cell's role, not decoration**. A reviewer can glance at colors to determine which cells are adjustable parameters and which are model-calculated results. This is an industry-wide convention (followed by investment banks, the Big Four, and corporate finance teams).
|
||||
|
||||
| Role | Font Color | AARRGGBB | Use Case |
|
||||
|------|-----------|----------|----------|
|
||||
| Hard-coded input / assumption | Blue | `000000FF` | Growth rates, discount rates, tax rates, and other user-modifiable parameters |
|
||||
| Formula / calculated result | Black | `00000000` | All cells containing a `<f>` element |
|
||||
| Same-workbook cross-sheet reference | Green | `00008000` | Cells whose formula references another sheet via `SheetName!` |
|
||||
| External file link | Red | `00FF0000` | Cells whose formula contains `[FileName.xlsx]` (flagged as fragile links) |
|
||||
| Label / text | Black (default) | theme color | Row labels, category headings |
|
||||
| Key assumption requiring review | Blue font + yellow fill | Font `000000FF` / Fill `00FFFF00` | Provisional values, parameters pending confirmation |
|
||||
|
||||
**Decision tree**:
|
||||
```
|
||||
Does the cell contain a <f> element?
|
||||
+-- Yes -> Does the formula contain [FileName]?
|
||||
| +-- Yes -> Red (external link)
|
||||
| +-- No -> Does the formula contain SheetName!?
|
||||
| +-- Yes -> Green (cross-sheet reference)
|
||||
| +-- No -> Black (same-sheet formula)
|
||||
+-- No -> Is the value a user-adjustable parameter?
|
||||
+-- Yes -> Blue (input/assumption)
|
||||
+-- No -> Black default (label)
|
||||
```
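
The decision tree can be sketched as a small classifier. It is a heuristic under stated assumptions: the external-link test looks for a bracketed token followed by a sheet name and `!`, the cross-sheet test looks for any `!`, and neither understands string literals or every structured-reference shape. The name `font_role` and the returned color names are hypothetical; whether a value counts as a user-adjustable parameter is passed in by the caller.

```python
import re

# "[Budget.xlsx]Sheet1!" or "[1]Sheet1!": brackets, then a sheet name, then "!".
_EXTERNAL = re.compile(r"\[[^\]]+\][^\[\]!]*!")


def font_role(formula=None, is_adjustable_input=False):
    """Return the font color for a cell; formula is the <f> text or None."""
    if formula:
        if _EXTERNAL.search(formula):
            return "red"      # external file link
        if "!" in formula:
            return "green"    # same-workbook cross-sheet reference
        return "black"        # same-sheet formula
    return "blue" if is_adjustable_input else "black"
```

Because a cell with a formula can never come out blue here, the sketch also respects the prohibition on blue font coexisting with an `<f>` element.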
|
||||
|
||||
**Strictly prohibited**: Blue font + `<f>` element coexisting (color role contradiction — must be corrected).
|
||||
|
||||
### 2.2 Number Format Matrix
|
||||
|
||||
| Data Type | formatCode | numFmtId | Display Example | Applicable Scenario |
|
||||
|-----------|-----------|----------|-----------------|---------------------|
|
||||
| Standard currency (whole dollars) | `$#,##0;($#,##0);"-"` | 164 | $1,234 / ($1,234) / - | P&L, balance sheet amount rows |
|
||||
| Standard currency (with cents) | `$#,##0.00;($#,##0.00);"-"` | 169 | $1,234.56 / ($1,234.56) / - | Unit prices, detailed costs |
|
||||
| Thousands (K) | `#,##0,"K"` | 171 | 1,234K | Simplified display for management reports |
|
||||
| Millions (M) | `#,##0,,"M"` | 172 | 1M | Macro-level summary rows |
|
||||
| Percentage (1 decimal) | `0.0%` | 165 | 12.5% | Growth rates, gross margins |
|
||||
| Percentage (2 decimals) | `0.00%` | 170 | 12.50% | IRR, precise interest rates |
|
||||
| Multiple / valuation multiplier | `0.0x` | 166 | 8.5x | EV/EBITDA, P/E |
|
||||
| Integer (thousands separator) | `#,##0` | 167 | 12,345 | Employee count, unit quantities |
|
||||
| Year | `0` | 1 (built-in, no declaration needed) | 2024 | Column header years, prevents 2,024 |
|
||||
| Date | `m/d/yyyy` | 14 (built-in, no declaration needed) | 3/21/2026 | Timelines |
|
||||
| General text | General | 0 (built-in, no declaration needed) | — | Label rows, cells with no format requirement |
|
||||
|
||||
numFmtId 169–172 are custom formats that are not among the four (164–167) pre-defined in the minimal_xlsx template, so they must be appended. When appending, assign IDs according to the rules (see Section 3.4).
|
||||
|
||||
**Built-in format IDs do not need to be declared in `<numFmts>`** (IDs 0–163 are reserved for formats built into Excel/LibreOffice; simply reference the numFmtId in `<xf>`):
|
||||
|
||||
| numFmtId | formatCode | Description |
|
||||
|----------|-----------|-------------|
|
||||
| 0 | General | General format |
|
||||
| 1 | `0` | Integer, no thousands separator (use this ID for years) |
|
||||
| 3 | `#,##0` | Thousands-separated integer (no decimals) |
|
||||
| 9 | `0%` | Percentage integer |
|
||||
| 10 | `0.00%` | Percentage with two decimals |
|
||||
| 14 | `m/d/yyyy` | Short date |
|
||||
|
||||
### 2.3 Negative Number Display Standards
|
||||
|
||||
Financial reports have two mainstream conventions for negative numbers — choose one and **maintain consistency** throughout the entire workbook:
|
||||
|
||||
**Parenthetical style (investment banking standard, recommended for external deliverables)**
|
||||
|
||||
```
|
||||
Positive: $1,234 Negative: ($1,234) Zero: -
|
||||
formatCode: $#,##0;($#,##0);"-"
|
||||
```
|
||||
|
||||
**Red minus sign style (suitable for internal operational analysis reports)**
|
||||
|
||||
```
|
||||
Positive: $1,234 Negative: -$1,234 (red)
|
||||
formatCode: $#,##0;[Red]-$#,##0;"-"
|
||||
```
|
||||
|
||||
Rule: Once a style is determined, maintain it across the entire workbook. Do not mix two negative number display styles within the same workbook.
|
||||
|
||||
### 2.4 Zero Value Display Standards
|
||||
|
||||
In financial models, "0" and "no data" have different semantics and should be visually distinct:
|
||||
|
||||
| Scenario | Recommended Display | formatCode Third Segment |
|
||||
|----------|-------------------|--------------------------|
|
||||
| Sparse matrix (most rows have zero-value periods) | Dash `-` | `"-"` |
|
||||
| Quantity counts (zero itself is meaningful) | `0` | `0` or omit |
|
||||
| Placeholder row (explicitly empty) | Leave blank | Do not write to cell |
|
||||
|
||||
Four-segment format syntax: `positive format;negative format;zero value format;text format`
|
||||
|
||||
Zero as dash: `$#,##0;($#,##0);"-"`
|
||||
Zero preserved as 0: `#,##0;(#,##0);0`
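
The section-selection rule above can be sketched in Python. This is a simplified illustration, not Excel's actual format parser: `split_sections` and `pick_section` are hypothetical helper names, and conditional sections such as `[>100]` or backslash-escaped semicolons are not handled.

```python
def split_sections(format_code: str) -> list:
    """Split a formatCode on ';' while ignoring semicolons inside quotes."""
    sections, current, in_quotes = [], [], False
    for ch in format_code:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            sections.append("".join(current))
            current = []
        else:
            current.append(ch)
    sections.append("".join(current))
    return sections


def pick_section(format_code: str, value: float) -> str:
    """Return the format section that applies to a numeric value.

    Simplified rule: one section covers every number; with two sections
    the second handles negatives; with three or more the order is
    positive;negative;zero.
    """
    sections = split_sections(format_code)
    if len(sections) == 1:
        return sections[0]
    if value < 0:
        return sections[1]
    if value == 0 and len(sections) >= 3:
        return sections[2]
    return sections[0]
```

Calling `pick_section('$#,##0;($#,##0);"-"', 0)` selects the quoted dash section, which is why a zero renders as `-` under that format.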
|
||||
|
||||
---
|
||||
|
||||
## 3. styles.xml Surgical Operations
|
||||
|
||||
### 3.1 Auditing Existing Styles: Understanding the cellXfs Indirect Reference Chain
|
||||
|
||||
A cell's `s` attribute points to a position index (0-based) in `cellXfs`, and each `<xf>` entry in `cellXfs` references its respective definition libraries through `fontId`, `fillId`, `borderId`, and `numFmtId`.
|
||||
|
||||
Reference chain diagram:
|
||||
|
||||
```
|
||||
Cell <c s="6">
|
||||
| Look up cellXfs by 0-based index
|
||||
cellXfs[6] -> numFmtId="164" fontId="2" fillId="0" borderId="0"
|
||||
| | | |
|
||||
numFmts fonts[2] fills[0] borders[0]
|
||||
id=164 color=00000000 (no fill) (no border)
|
||||
$#,##0... black
|
||||
```
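
For read-only auditing, the same chain can be followed programmatically. The sketch below uses only the standard library; `resolve_style` is a hypothetical helper name, and it assumes `styles.xml` uses the standard SpreadsheetML main namespace.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def resolve_style(styles_root: ET.Element, s: int) -> dict:
    """Follow cell s -> cellXfs[s] -> numFmts / fonts for one style index."""
    xf = styles_root.findall("m:cellXfs/m:xf", NS)[s]
    num_fmt_id = int(xf.get("numFmtId", "0"))
    font = styles_root.findall("m:fonts/m:font", NS)[int(xf.get("fontId", "0"))]
    color = font.find("m:color", NS)
    # Only custom formats are declared in <numFmts>; built-in IDs are not listed.
    codes = {
        int(n.get("numFmtId")): n.get("formatCode")
        for n in styles_root.findall("m:numFmts/m:numFmt", NS)
    }
    return {
        "numFmtId": num_fmt_id,
        "formatCode": codes.get(num_fmt_id),  # None when the ID is built-in
        "fontRgb": color.get("rgb") if color is not None else None,
        "fillId": int(xf.get("fillId", "0")),
        "borderId": int(xf.get("borderId", "0")),
    }
```

This only reads the file; it does not write anything back, which keeps it within the read/analysis role this skill allows for Python.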
|
||||
|
||||
Audit steps:
|
||||
|
||||
**Step 1**: Read `<numFmts>` and record all declared custom formats and their IDs:
|
||||
```xml
|
||||
<numFmts count="4">
|
||||
  <numFmt numFmtId="164" formatCode="$#,##0;($#,##0);&quot;-&quot;"/>
|
||||
<numFmt numFmtId="165" formatCode="0.0%"/>
|
||||
<numFmt numFmtId="166" formatCode="0.0x"/>
|
||||
<numFmt numFmtId="167" formatCode="#,##0"/>
|
||||
</numFmts>
|
||||
```
|
||||
Record: current maximum custom numFmtId = 167, next available ID = 168.
|
||||
|
||||
**Step 2**: Read `<fonts>` and list each `<font>` by 0-based index with its color and style:
|
||||
```
|
||||
fontId=0 -> No explicit color (theme default black)
|
||||
fontId=1 -> color rgb="000000FF" (blue, input role)
|
||||
fontId=2 -> color rgb="00000000" (black, formula role)
|
||||
fontId=3 -> color rgb="00008000" (green, cross-sheet reference role)
|
||||
fontId=4 -> <b/> + color rgb="00000000" (bold black, header)
|
||||
```
|
||||
|
||||
**Step 3**: Read `<fills>` and confirm that fills[0] and fills[1] are spec-mandated reserved entries (never delete):
|
||||
```
|
||||
fillId=0 -> patternType="none" (spec-mandated)
|
||||
fillId=1 -> patternType="gray125" (spec-mandated)
|
||||
fillId=2 -> Yellow highlight (if present)
|
||||
```
|
||||
|
||||
**Step 4**: Read `<cellXfs>` and list each `<xf>` entry by 0-based index with its combination:
|
||||
```
|
||||
index 0 -> numFmtId=0, fontId=0, fillId=0 -> Default style
|
||||
index 1 -> numFmtId=0, fontId=1, fillId=0 -> Blue font general (input)
|
||||
index 5 -> numFmtId=164, fontId=1, fillId=0 -> Blue font currency (currency input)
|
||||
index 6 -> numFmtId=164, fontId=2, fillId=0 -> Black font currency (currency formula)
|
||||
...
|
||||
```
|
||||
|
||||
**Step 5**: Verify that all count attributes match the actual number of elements (a count mismatch can make Excel flag the file as corrupt or refuse to open it).
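
The count check in Step 5 can be automated with a small read-only script. This is a sketch under assumptions: `count_mismatches` is a hypothetical helper, and the list of collections checked is a reasonable selection rather than an exhaustive one.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
COLLECTIONS = ("numFmts", "fonts", "fills", "borders", "cellStyleXfs", "cellXfs")


def count_mismatches(styles_xml: str) -> dict:
    """Return {collection: (declared, actual)} for every mismatched count."""
    root = ET.fromstring(styles_xml)
    problems = {}
    for name in COLLECTIONS:
        node = root.find(f"{{{NS_URI}}}{name}")
        if node is None or node.get("count") is None:
            continue  # collection absent or count attribute omitted
        declared = int(node.get("count"))
        actual = len(list(node))  # child elements only; comments are not counted
        if declared != actual:
            problems[name] = (declared, actual)
    return problems
```

An empty dictionary means every declared count agrees with the number of child elements.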
|
||||
|
||||
### 3.2 Safely Appending New Styles (Golden Rule: Append Only, Never Modify Existing xf)
|
||||
|
||||
**Never modify existing `<xf>` entries**. Modifications will affect all cells that already reference that index, breaking existing formatting. Only append new entries at the end.
|
||||
|
||||
Complete atomic operation sequence for appending new styles (all 5 steps must be executed):
|
||||
|
||||
**Step 1**: Determine if a new `<numFmt>` is needed
|
||||
|
||||
Built-in formats (ID 0–163) skip this step. Custom formats are appended to the end of `<numFmts>`:
|
||||
```xml
|
||||
<numFmts count="5"> <!-- count +1 -->
|
||||
<!-- Keep existing entries unchanged -->
|
||||
  <numFmt numFmtId="164" formatCode="$#,##0;($#,##0);&quot;-&quot;"/>
|
||||
<numFmt numFmtId="165" formatCode="0.0%"/>
|
||||
<numFmt numFmtId="166" formatCode="0.0x"/>
|
||||
<numFmt numFmtId="167" formatCode="#,##0"/>
|
||||
<!-- Newly appended -->
|
||||
  <numFmt numFmtId="168" formatCode="$#,##0.00;($#,##0.00);&quot;-&quot;"/>
|
||||
</numFmts>
|
||||
```
|
||||
|
||||
**Step 2**: Determine if a new `<font>` is needed
|
||||
|
||||
Check whether the existing fonts already contain a matching color+style combination. If not, append to the end of `<fonts>`:
|
||||
```xml
|
||||
<fonts count="6"> <!-- count +1 -->
|
||||
<!-- Keep existing entries unchanged -->
|
||||
...
|
||||
<!-- Newly appended: red font (external link role), new fontId = 5 -->
|
||||
<font>
|
||||
<sz val="11"/>
|
||||
<name val="Calibri"/>
|
||||
<color rgb="00FF0000"/>
|
||||
</font>
|
||||
</fonts>
|
||||
```
|
||||
New fontId = the count value before appending (when original count=5, new fontId=5).
|
||||
|
||||
**Step 3**: Determine if a new `<fill>` is needed
|
||||
|
||||
If a new background color is needed, append to the end of `<fills>` (note: fills[0] and fills[1] must never be modified):
|
||||
```xml
|
||||
<fills count="4"> <!-- count +1 -->
|
||||
<fill><patternFill patternType="none"/></fill> <!-- 0: spec-mandated -->
|
||||
<fill><patternFill patternType="gray125"/></fill> <!-- 1: spec-mandated -->
|
||||
<fill> <!-- 2: yellow highlight -->
|
||||
<patternFill patternType="solid">
|
||||
<fgColor rgb="00FFFF00"/>
|
||||
<bgColor indexed="64"/>
|
||||
</patternFill>
|
||||
</fill>
|
||||
<!-- Newly appended: light gray fill (projection period distinction), new fillId = 3 -->
|
||||
<fill>
|
||||
<patternFill patternType="solid">
|
||||
<fgColor rgb="00D3D3D3"/>
|
||||
<bgColor indexed="64"/>
|
||||
</patternFill>
|
||||
</fill>
|
||||
</fills>
|
||||
```
|
||||
|
||||
**Step 4**: Append a new `<xf>` combination at the end of `<cellXfs>`
|
||||
```xml
|
||||
<cellXfs count="14"> <!-- count +1 -->
|
||||
<!-- Keep existing entries 0-12 unchanged -->
|
||||
...
|
||||
<!-- Newly appended index=13: currency with cents formula (black font + numFmtId=168) -->
|
||||
<xf numFmtId="168" fontId="2" fillId="0" borderId="0" xfId="0"
|
||||
applyFont="1" applyNumberFormat="1"/>
|
||||
</cellXfs>
|
||||
```
|
||||
New style index = the count value before appending (when original count=13, new index=13).
|
||||
|
||||
**Step 5**: Record the new style index; subsequently set the `s` attribute of corresponding cells in the sheet XML to this value.
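
The append-and-bump-count pattern for `<cellXfs>` can be sketched as a plain string edit, in keeping with this skill's direct-XML approach. `append_xf` is a hypothetical helper, and it assumes the `count` attribute is the first attribute on `<cellXfs>`, as in the examples in this section.

```python
import re


def append_xf(styles_xml: str, xf_xml: str) -> tuple:
    """Append one <xf> to <cellXfs>, bump count, return (new_xml, new_index)."""
    match = re.search(r'<cellXfs count="(\d+)"', styles_xml)
    if match is None:
        raise ValueError("no <cellXfs count=...> found")
    old_count = int(match.group(1))
    # Update the count attribute first, then insert before the closing tag.
    updated = styles_xml.replace(
        match.group(0), f'<cellXfs count="{old_count + 1}"', 1
    )
    updated = updated.replace("</cellXfs>", f"{xf_xml}</cellXfs>", 1)
    # The new index equals the count before appending (0-based indexing).
    return updated, old_count
```

Existing `<xf>` entries are never touched, so cells that already reference indices 0 to `old_count - 1` keep their formatting.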
|
||||
|
||||
### 3.3 AARRGGBB Color Format Explanation
|
||||
|
||||
OOXML's `rgb` attribute uses **8-digit hexadecimal AARRGGBB** format (not HTML's 6-digit RRGGBB):
|
||||
|
||||
```
|
||||
AA RR GG BB
|
||||
| | | |
|
||||
Alpha Red Green Blue
|
||||
```
|
||||
|
||||
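- Alpha channel: the value is ARGB. Excel itself normally writes `FF` here and generally ignores the alpha byte when rendering cell fonts and fills, so the prefix does not change what is displayed; this skill standardizes on `00` so that color values match the template exactly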
|
||||
- Financial color standards always use `00` as the Alpha prefix
|
||||
|
||||
| Color | AARRGGBB | Corresponding Role |
|
||||
|-------|----------|-------------------|
|
||||
| Blue (input) | `000000FF` | Hard-coded assumptions |
|
||||
| Black (formula) | `00000000` | Calculated results |
|
||||
| Green (cross-sheet reference) | `00008000` | Same-workbook cross-sheet |
|
||||
| Red (external link) | `00FF0000` | References to other files |
|
||||
| Yellow (review-required fill) | `00FFFF00` | Key assumption highlight |
|
||||
| Light gray (projection period fill) | `00D3D3D3` | Distinguishing historical vs. forecast periods |
|
||||
| White | `00FFFFFF` | Pure white fill |
|
||||
|
||||
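**Common mistake**: Converting HTML `#0000FF` to `FF0000FF`, or leaving it at 6 digits. `FF0000FF` generally still renders as blue in Excel, but it no longer matches the template's `00`-prefixed values, so exact-match color checks will miss it. Correct format: `000000FF`.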
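
A minimal conversion helper, assuming the `00` alpha prefix this document prescribes. `html_to_argb` is a hypothetical name introduced for illustration only.

```python
def html_to_argb(html_color: str, alpha: str = "00") -> str:
    """Convert '#RRGGBB' (or 'RRGGBB') to the 8-digit AARRGGBB form."""
    hex_part = html_color.lstrip("#").upper()
    if len(hex_part) != 6 or any(c not in "0123456789ABCDEF" for c in hex_part):
        raise ValueError(f"expected 6 hex digits, got {html_color!r}")
    return alpha + hex_part
```

Rejecting anything other than six hex digits catches values that are already 8 digits long, which would otherwise end up with a doubled prefix.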
|
||||
|
||||
### 3.4 numFmtId Assignment Rules
|
||||
|
||||
```
|
||||
ID 0-163 -> Excel/LibreOffice built-in formats, no declaration needed in <numFmts>, reference directly in <xf>
|
||||
ID 164+ -> Custom formats, must be explicitly declared as <numFmt> elements in <numFmts>
|
||||
```
|
||||
|
||||
Rules for assigning new IDs:
|
||||
1. Read all `numFmtId` attribute values in the current `<numFmts>`
|
||||
2. Take the maximum value + 1 as the next custom format ID
|
||||
3. Do not reuse existing IDs; do not skip numbers
|
||||
|
||||
The minimal_xlsx template pre-defines IDs: 164, 165, 166, 167. The next available ID is 168.
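
The ID assignment rule can be sketched as a read-only lookup. `next_num_fmt_id` is a hypothetical helper; the regular expression assumes `<numFmt>` elements are written in the attribute style shown in this document.

```python
import re


def next_num_fmt_id(styles_xml: str) -> int:
    """Return max declared custom numFmtId + 1, or 164 if none are declared."""
    # '<numFmt' followed by whitespace excludes '<numFmts' and '<xf numFmtId=...>'.
    ids = [
        int(m)
        for m in re.findall(r'<numFmt\s[^>]*numFmtId="(\d+)"', styles_xml)
    ]
    # 163 is the last reserved built-in ID, so the floor for custom IDs is 164.
    return max(ids + [163]) + 1
```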
|
||||
|
||||
---
|
||||
|
||||
## 4. Pre-defined Style Index Complete Reference Table (13 Slots)
|
||||
|
||||
The following are the 13 style slots (cellXfs index 0–12) pre-defined in the minimal_xlsx template's `styles.xml`, which can be directly referenced in the cell `s` attribute in sheet XML:

| Index | Semantic Role | Font Color | Fill | numFmtId | Format Display | Typical Use |
|-------|--------------|------------|------|----------|---------------|-------------|
| **0** | Default style | Theme black | None | 0 | General | Cells requiring no special formatting |
| **1** | Input / assumption (general) | Blue `000000FF` | None | 0 | General | Text-type assumptions, flags |
| **2** | Formula / calculated result (general) | Black `00000000` | None | 0 | General | Text concatenation formulas, non-numeric calculations |
| **3** | Cross-sheet reference (general) | Green `00008000` | None | 0 | General | Values pulled from cross-sheet (general format) |
| **4** | Header (bold) | Bold black | None | 0 | General | Row/column headings |
| **5** | Currency input | Blue `000000FF` | None | 164 | $1,234 / ($1,234) / - | Amount inputs in the assumptions area |
| **6** | Currency formula | Black `00000000` | None | 164 | $1,234 / ($1,234) / - | Amount calculations in the model area (revenue, EBITDA) |
| **7** | Percentage input | Blue `000000FF` | None | 165 | 12.5% | Rate inputs in the assumptions area (growth rate, gross margin assumptions) |
| **8** | Percentage formula | Black `00000000` | None | 165 | 12.5% | Rate calculations in the model area (actual gross margin) |
| **9** | Integer (comma) input | Blue `000000FF` | None | 167 | 12,345 | Quantity inputs in the assumptions area (employee count) |
| **10** | Integer (comma) formula | Black `00000000` | None | 167 | 12,345 | Quantity calculations in the model area |
| **11** | Year input | Blue `000000FF` | None | 1 | 2024 | Column header years (no thousands separator) |
| **12** | Key assumption highlight | Blue `000000FF` | Yellow `00FFFF00` | 0 | General | Key parameters pending review or confirmation |

**Selection guide**:
|
||||
- Determine "input" vs. "formula" -> Choose odd-numbered (input/blue) or even-numbered (formula/black) paired slots
|
||||
- Determine data type -> Choose the corresponding currency (5/6) / percentage (7/8) / integer (9/10) / year (11) slot
|
||||
- Cross-sheet reference needing number format -> Append a new green + number format combination (see Section 5.4)
|
||||
- Parameter pending review -> index 12
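
The selection guide can be expressed as a lookup table. This is a sketch: the role and data-type labels (`"input"`, `"xref"`, `"percent"`, and so on) are names invented here for illustration, while the index values come from the 13-slot table in this section.

```python
from typing import Optional

# (role, data type) -> pre-defined cellXfs index in the minimal_xlsx template
STYLE_SLOTS = {
    ("input", "general"): 1,
    ("formula", "general"): 2,
    ("xref", "general"): 3,
    ("header", "general"): 4,
    ("input", "currency"): 5,
    ("formula", "currency"): 6,
    ("input", "percent"): 7,
    ("formula", "percent"): 8,
    ("input", "integer"): 9,
    ("formula", "integer"): 10,
    ("input", "year"): 11,
    ("review", "general"): 12,
}


def pick_style(role: str, dtype: str = "general") -> Optional[int]:
    """Return the pre-defined slot, or None when a style must be appended."""
    return STYLE_SLOTS.get((role, dtype))
```

A `None` result, for example for a green cross-sheet reference with currency format, signals that a new `<xf>` combination has to be appended before the cell can be styled.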
|
||||
|
||||
---
|
||||
|
||||
## 5. Assumption Separation Principle: XML-Level Implementation
|
||||
|
||||
### 5.1 Structural Design
|
||||
|
||||
Assumption separation principle: **Input assumptions are centralized in a dedicated area (sheet or block); the model calculation area contains only formulas, no hard-coded values**.
|
||||
|
||||
Recommended structure:
|
||||
```
|
||||
Workbook sheet layout
|
||||
sheet 1 "Assumptions" -> All blue-font cells (style 1/5/7/9/11/12)
|
||||
sheet 2 "Model" -> All black or green-font cells (style 2/3/4/6/8/10)
|
||||
```
|
||||
|
||||
Same-sheet zoning approach for simple models:
|
||||
```
|
||||
Rows 1-5: [Assumptions block - blue font]
|
||||
Row 6: [Empty row separator]
|
||||
Rows 7+: [Model block - black/green font formulas referencing assumptions area]
|
||||
```
|
||||
|
||||
### 5.2 Assumptions Area XML Example
|
||||
|
||||
```xml
|
||||
<!-- Assumptions sheet (sheet1.xml) example -->
|
||||
|
||||
<!-- Row 1: Block title -->
|
||||
<row r="1">
|
||||
<c r="A1" s="4" t="inlineStr"><is><t>Model Assumptions</t></is></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 2: Growth rate assumption - blue font percentage input, s="7" -->
|
||||
<row r="2">
|
||||
<c r="A2" t="inlineStr"><is><t>Revenue Growth Rate</t></is></c>
|
||||
<c r="B2" s="7"><v>0.08</v></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 3: Gross margin assumption - blue font percentage input, s="7" -->
|
||||
<row r="3">
|
||||
<c r="A3" t="inlineStr"><is><t>Gross Margin</t></is></c>
|
||||
<c r="B3" s="7"><v>0.65</v></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 4: Base revenue - blue font currency input, s="5" -->
|
||||
<row r="4">
|
||||
<c r="A4" t="inlineStr"><is><t>Base Revenue (Year 0)</t></is></c>
|
||||
<c r="B4" s="5"><v>1000000</v></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 5: Key assumption (pending review) - blue font yellow fill, s="12" -->
|
||||
<row r="5">
|
||||
<c r="A5" t="inlineStr"><is><t>Terminal Growth Rate</t></is></c>
|
||||
<c r="B5" s="12"><v>0.03</v></c>
|
||||
</row>
|
||||
```
|
||||
|
||||
### 5.3 Model Area XML Example (Referencing Assumptions Area)
|
||||
|
||||
```xml
|
||||
<!-- Model sheet (sheet2.xml) example -->
|
||||
|
||||
<!-- Row 1: Column headers (years) - bold header, s="4"; year cells, s="11" -->
|
||||
<row r="1">
|
||||
<c r="A1" s="4" t="inlineStr"><is><t>Metric</t></is></c>
|
||||
<c r="B1" s="11"><v>2024</v></c>
|
||||
<c r="C1" s="11"><v>2025</v></c>
|
||||
<c r="D1" s="11"><v>2026</v></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 2: Revenue row -->
|
||||
<row r="2">
|
||||
<c r="A2" t="inlineStr"><is><t>Revenue</t></is></c>
|
||||
<!-- B2: Base year revenue, cross-sheet reference from Assumptions, green, s="3" (general format) -->
|
||||
<!-- If currency format is needed, append new style s="13" (see Section 5.4) -->
|
||||
<c r="B2" s="3"><f>Assumptions!B4</f><v></v></c>
|
||||
<!-- C2, D2: Next year revenue = prior year * (1 + growth rate), black font currency formula, s="6" -->
|
||||
<c r="C2" s="6"><f>B2*(1+Assumptions!B2)</f><v></v></c>
|
||||
<c r="D2" s="6"><f>C2*(1+Assumptions!B2)</f><v></v></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 3: Gross profit row - black font currency formula, s="6" -->
|
||||
<row r="3">
|
||||
<c r="A3" t="inlineStr"><is><t>Gross Profit</t></is></c>
|
||||
<c r="B3" s="6"><f>B2*Assumptions!B3</f><v></v></c>
|
||||
<c r="C3" s="6"><f>C2*Assumptions!B3</f><v></v></c>
|
||||
<c r="D3" s="6"><f>D2*Assumptions!B3</f><v></v></c>
|
||||
</row>
|
||||
|
||||
<!-- Row 4: Gross margin row - black font percentage formula, s="8" -->
|
||||
<row r="4">
|
||||
<c r="A4" t="inlineStr"><is><t>Gross Margin %</t></is></c>
|
||||
<c r="B4" s="8"><f>B3/B2</f><v></v></c>
|
||||
<c r="C4" s="8"><f>C3/C2</f><v></v></c>
|
||||
<c r="D4" s="8"><f>D3/D2</f><v></v></c>
|
||||
</row>
|
||||
```
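
When a model spans many years, the repeated growth formulas can be generated as text and pasted into the sheet XML. The sketch below only builds strings and writes no files; `col_letter` and `growth_row` are hypothetical helpers, and the empty `<v></v>` mirrors the examples in this section.

```python
def col_letter(index: int) -> str:
    """Convert a 1-based column number to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def growth_row(row: int, first_col: int, last_col: int,
               rate_ref: str, style: int = 6) -> str:
    """Build cells first_col..last_col, each = previous column * (1 + rate)."""
    cells = []
    for col in range(first_col, last_col + 1):
        prev, cur = col_letter(col - 1), col_letter(col)
        cells.append(
            f'<c r="{cur}{row}" s="{style}">'
            f"<f>{prev}{row}*(1+{rate_ref})</f><v></v></c>"
        )
    return "".join(cells)
```

For instance, `growth_row(2, 3, 4, "Assumptions!B2")` produces the `C2` and `D2` cells of the revenue row, each a live formula rather than a computed number.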
|
||||
|
||||
### 5.4 Appending "Green + Number Format" Combinations
|
||||
|
||||
Pre-defined index 3 is green font + general format. If a cross-sheet reference involves a currency amount, a green style with a number format must be appended:
|
||||
|
||||
```xml
|
||||
<!-- Append at the end of <cellXfs> in styles.xml (assuming current count=13, new index=13) -->
|
||||
<!-- index 13: cross-sheet reference + currency format (green font + $#,##0) -->
|
||||
<xf numFmtId="164" fontId="3" fillId="0" borderId="0" xfId="0"
|
||||
applyFont="1" applyNumberFormat="1"/>
|
||||
<!-- Update count to 14 -->
|
||||
```
|
||||
|
||||
After appending, cross-sheet reference currency cells use `s="13"`.
|
||||
|
||||
---
|
||||
|
||||
## 6. Complete Operational Workflow
|
||||
|
||||
### 6.1 Workflow Overview
|
||||
|
||||
```
|
||||
[Existing xlsx or file after CREATE/EDIT]
|
||||
|
|
||||
Step 1: Unpack (extract to temporary directory)
|
||||
|
|
||||
Step 2: Audit styles.xml (review existing styles, build index mapping table)
|
||||
|
|
||||
Step 3: Audit sheet XML (identify cells needing formatting and their semantic roles)
|
||||
|
|
||||
Step 4: Append missing styles (numFmt -> font -> fill -> xf, update counts)
|
||||
|
|
||||
Step 5: Batch-update the s attribute of each cell in the sheet XML
|
||||
|
|
||||
Step 6: XML validity + style reference integrity verification
|
||||
|
|
||||
Step 7: Pack (recompress as xlsx)
|
||||
```
|
||||
|
||||
### 6.2 Step 1 — Unpack
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_unpack.py input.xlsx /tmp/xlsx_fmt/
|
||||
```
|
||||
|
||||
If the script is unavailable, unpack manually:
|
||||
```bash
|
||||
mkdir -p /tmp/xlsx_fmt && cp input.xlsx /tmp/xlsx_fmt/input.xlsx
|
||||
cd /tmp/xlsx_fmt && unzip input.xlsx -d unpacked/
|
||||
```
|
||||
|
||||
### 6.3 Step 2 — Audit styles.xml
|
||||
|
||||
Execute according to the method in Section 3.1. Quick check for minimal_xlsx template initial state:
|
||||
- `<cellXfs count="13">` and `<numFmts count="4">` -> Template initial state, all 13 pre-defined slots can be used directly
|
||||
- Otherwise -> A complete review of the existing index mapping is required
|
||||
|
||||
### 6.4 Step 3 — Audit Sheet XML, Build Formatting Plan
|
||||
|
||||
Read `xl/worksheets/sheet*.xml` and evaluate each cell:
|
||||
1. Does it contain a `<f>` element (formula)? -> Requires black/green/red style
|
||||
2. Is it a hard-coded numeric parameter? -> Requires blue style
|
||||
3. Is the data type currency/percentage/integer/year? -> Select the corresponding number format slot
|
||||
4. Is it a header? -> Bold style (index 4)
|
||||
|
||||
Build a formatting mapping table: `{cell coordinate: target style index}`
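
A first pass at the role classification can be sketched as follows. The heuristics are deliberately rough assumptions: `!` is taken to mean a cross-sheet reference and `[` an external workbook, which can misfire on string literals or structured table references, and deciding whether a hard-coded number is truly a user-adjustable parameter still needs human judgment. `font_role` and `role_map` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": NS_URI}


def font_role(cell: ET.Element) -> str:
    """Map a <c> element to a color role: formula, xref, external, input, label."""
    f = cell.find("m:f", NS)
    if f is not None:
        formula = f.text or ""
        if "[" in formula:   # [FileName.xlsx] marks an external workbook
            return "external"
        if "!" in formula:   # SheetName! marks a cross-sheet reference
            return "xref"
        return "formula"
    if cell.get("t") in ("s", "str", "inlineStr"):
        return "label"       # text cell, not a numeric input
    return "input" if cell.find("m:v", NS) is not None else "label"


def role_map(sheet_xml: str) -> dict:
    """Return {cell coordinate: role} for every <c> in a sheet XML string."""
    root = ET.fromstring(sheet_xml)
    return {c.get("r"): font_role(c) for c in root.iter(f"{{{NS_URI}}}c")}
```

The resulting roles still have to be combined with the data type (currency, percentage, integer, year) to arrive at the target style index.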
|
||||
|
||||
### 6.5 Step 4 — Append Styles
|
||||
|
||||
Execute according to the atomic operation sequence in Section 3.2. Update the corresponding count attribute immediately after appending each component.
|
||||
|
||||
### 6.6 Step 5 — Batch-Update Cell s Attributes
|
||||
|
||||
```xml
|
||||
<!-- Before formatting: no style -->
|
||||
<c r="B5"><v>0.08</v></c>
|
||||
|
||||
<!-- After formatting: growth rate assumption, blue font percentage, s="7" -->
|
||||
<c r="B5" s="7"><v>0.08</v></c>
|
||||
```
|
||||
|
||||
```xml
|
||||
<!-- Before formatting: formula without style -->
|
||||
<c r="C10"><f>B10*(1+Assumptions!B2)</f><v></v></c>
|
||||
|
||||
<!-- After formatting: currency formula, black font, s="6" -->
|
||||
<c r="C10" s="6"><f>B10*(1+Assumptions!B2)</f><v></v></c>
|
||||
```
|
||||
|
||||
For consecutive rows of the same type, a row-level default style can be declared. The row's `s` applies only to cells that are not written in the row (empty cells); a `<c>` element without its own `s` attribute falls back to style 0, so every written cell still carries an explicit `s`:

```xml
<!-- Row default is style 6; it formats only the unwritten (empty) cells of the row -->
<row r="5" s="6" customFormat="1">
  <c r="A5" s="0" t="inlineStr"><is><t>Operating Income</t></is></c>  <!-- Text uses the default style -->
  <c r="B5" s="6"><f>B3-B4</f><v></v></c>  <!-- Explicit s="6": a written cell does not inherit the row style -->
  <c r="C5" s="6"><f>C3-C4</f><v></v></c>
</row>
```
|
||||
|
||||
### 6.7 Step 6 — Verification
|
||||
|
||||
```bash
|
||||
# XML validity verification is handled automatically by xlsx_pack.py, no need to manually run xmllint
|
||||
# The pack script validates styles.xml and sheet XML legality before packaging; it aborts and reports on errors
|
||||
|
||||
# Style audit (optional, audit the entire unpacked directory after formatting is complete)
|
||||
python3 SKILL_DIR/scripts/style_audit.py /tmp/xlsx_fmt/unpacked/
|
||||
|
||||
# Formula error static scan (must specify a single .xlsx file, does not accept directories)
|
||||
# Pack first, then scan:
|
||||
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_fmt/unpacked/ /tmp/output.xlsx
|
||||
python3 SKILL_DIR/scripts/formula_check.py /tmp/output.xlsx
|
||||
```
|
||||
|
||||
Manual style reference integrity check:
|
||||
```bash
|
||||
# Find the maximum s attribute value in the sheet XML
|
||||
grep -o 's="[0-9]*"' /tmp/xlsx_fmt/unpacked/xl/worksheets/sheet1.xml \
|
||||
| grep -o '[0-9]*' | sort -n | tail -1
|
||||
|
||||
# Compare with the cellXfs count attribute (max s value must be < count)
|
||||
grep 'cellXfs count' /tmp/xlsx_fmt/unpacked/xl/styles.xml
|
||||
```
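
The same integrity check can be done in Python across rows and cells at once. `out_of_range_styles` is a hypothetical helper, and the regular expression assumes attributes are written with double quotes, as in the examples in this document.

```python
import re


def out_of_range_styles(sheet_xml: str, styles_xml: str) -> list:
    """Return sorted style indices used by <c>/<row> that are >= cellXfs count."""
    match = re.search(r'<cellXfs count="(\d+)"', styles_xml)
    if match is None:
        raise ValueError("no <cellXfs count=...> found")
    count = int(match.group(1))
    # '\b' after c|row keeps <col> and <cols> out; '\s' before s= skips spans=.
    used = {
        int(s)
        for s in re.findall(r'<(?:c|row)\b[^>]*?\ss="(\d+)"', sheet_xml)
    }
    return sorted(s for s in used if s >= count)
```

An empty list means every referenced style index exists in `cellXfs`.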
|
||||
|
||||
### 6.8 Step 7 — Pack
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_fmt/unpacked/ output.xlsx
|
||||
```
|
||||
|
||||
If the script is unavailable, pack manually:
|
||||
```bash
|
||||
cd /tmp/xlsx_fmt/unpacked/
|
||||
zip -r ../output.xlsx . -x "*.DS_Store"
|
||||
```
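
If neither the pack script nor the `zip` command is available, the standard-library `zipfile` module can do the same job. This is a fallback sketch, not a description of what `xlsx_pack.py` does: `pack_xlsx` is a hypothetical helper, it performs no XML validation, and writing `[Content_Types].xml` first follows a common convention rather than a stated requirement. The output path should be outside the source directory so the archive does not include itself.

```python
import zipfile
from pathlib import Path


def pack_xlsx(src_dir: str, out_path: str) -> list:
    """Zip an unpacked xlsx directory; return the archive member names."""
    src = Path(src_dir)
    files = sorted(
        p for p in src.rglob("*") if p.is_file() and p.name != ".DS_Store"
    )
    # Stable sort: [Content_Types].xml moves to the front, the rest keep order.
    files.sort(key=lambda p: p.relative_to(src).as_posix() != "[Content_Types].xml")
    names = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            arcname = path.relative_to(src).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```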
|
||||
|
||||
---
|
||||
|
||||
## 7. Formatting Completeness Checklist
|
||||
|
||||
Verify each item before delivery:
|
||||
|
||||
### Color Role Consistency
|
||||
- [ ] All numeric cells containing `<f>` elements: fontId corresponds to black (formula) or green (cross-sheet reference)
|
||||
- [ ] All hard-coded numeric values that are user-adjustable parameters: fontId corresponds to blue (input)
|
||||
- [ ] Cross-sheet references (formula contains `SheetName!`): fontId corresponds to green
|
||||
- [ ] External file references (formula contains `[FileName.xlsx]`): fontId corresponds to red
|
||||
- [ ] No cell simultaneously contains a `<f>` element and uses blue font (color role contradiction)
|
||||
|
||||
### Number Format Correctness
|
||||
- [ ] Year columns: numFmtId="1" (`0` format), displays as 2024 not 2,024
|
||||
- [ ] Currency rows: numFmtId="164" or variant; under the parenthetical convention, negative numbers display as ($1,234) not -$1,234
|
||||
- [ ] Percentage rows: values stored as decimals (0.08 = 8%), format numFmtId="165", displays as 8.0%
|
||||
- [ ] Zero values: displayed as `-` in sparse matrices rather than `0` (formatCode third segment contains `"-"`)
|
||||
- [ ] Multiple rows (EV/EBITDA, etc.): numFmtId="166" (`0.0x` format)
|
||||
- [ ] Negative number display style is consistent throughout the entire workbook (parenthetical or red minus sign)
|
||||
|
||||
### styles.xml Structural Integrity
|
||||
- [ ] `<numFmts count>` = actual number of `<numFmt>` elements
|
||||
- [ ] `<fonts count>` = actual number of `<font>` elements
|
||||
- [ ] `<fills count>` = actual number of `<fill>` elements (including spec-mandated fills[0] and fills[1])
|
||||
- [ ] `<cellXfs count>` = actual number of `<xf>` elements
|
||||
- [ ] fills[0] is `patternType="none"`, fills[1] is `patternType="gray125"` (spec-mandated)
|
||||
- [ ] All `<xf>` referenced fontId / fillId / borderId are within the valid range of their respective collections
|
||||
- [ ] All cell `s` attribute values < `cellXfs count` (no out-of-bounds references)
|
||||
|
||||
### Assumption Separation Verification
|
||||
- [ ] No black-font numeric cells in the assumptions area/sheet (black numeric = formula, should not be in assumptions)
|
||||
- [ ] No blue-font non-year numeric cells in the model area/sheet (blue numeric = hard-coded, should be in assumptions)
|
||||
- [ ] Input parameters in the model area reference the assumptions area via formulas, not by directly copying values
|
||||
|
||||
### Formula and Format Linkage
|
||||
- [ ] All cells with `<f>` elements have an explicit `s` attribute (must not use default style=0, whose font color is not explicitly black)
|
||||
- [ ] SUM summary rows: style uses black font + corresponding number format (e.g., s="6" for currency summaries)
|
||||
- [ ] Percentage formulas: values stored as decimals, format is `0.0%`; do not multiply values by 100 before applying percentage format
|
||||
|
||||
### Visual Hierarchy
|
||||
- [ ] Header rows (years/metric names): style=4 (bold black)
|
||||
- [ ] Summary rows (Total/EBITDA/Net Income): bold + corresponding number format (append style if needed)
|
||||
- [ ] Unit description rows (e.g., "$ thousands"): use style=0 or style=2 (blue not needed)
|
||||
|
||||
---
|
||||
|
||||
## 8. Prohibited Actions (What You Must NOT Do)
|
||||
|
||||
- **Do not modify existing `<xf>` entries**: This will batch-change the style of all cells referencing that index
|
||||
- **Do not delete fills[0] and fills[1]**: Required by OOXML specification; deletion causes file corruption
|
||||
- **Do not modify cell values or formulas**: The FORMAT path only changes styles, not content
|
||||
- **Do not use openpyxl for formatting**: openpyxl rewrites the entire styles.xml on save, losing unsupported features
|
||||
- **Do not apply global override styles**: Do not cover the entire workbook with a single style; assign precisely by semantic role
|
||||
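- **Do not write FF in the Alpha channel**: `rgb="FF0000FF"` departs from the template's `00`-prefixed color values (Excel generally ignores the alpha byte, so the displayed color is unaffected, but exact-match color role checks break); the correct format is `rgb="000000FF"`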
|
||||
|
||||
---
|
||||
|
||||
## 9. Common Errors and Fixes
|
||||
|
||||
### Error 1: Year displays as 2,024
|
||||
|
||||
Cause: The year cell's `s` attribute uses a format with thousands separator (e.g., numFmtId="3" or numFmtId="167").
|
||||
|
||||
```xml
|
||||
<!-- Incorrect -->
|
||||
<c r="B1" s="9"><v>2024</v></c>
|
||||
|
||||
<!-- Fix: Change to s="11" (numFmtId="1", format 0) -->
|
||||
<c r="B1" s="11"><v>2024</v></c>
|
||||
```
|
||||
|
||||
### Error 2: Percentage displays as 800% (value was multiplied by 100)
|
||||
|
||||
Cause: 8% was stored as `<v>8</v>` instead of `<v>0.08</v>`. Excel's `%` format automatically multiplies the value by 100 for display.
|
||||
|
||||
```xml
|
||||
<!-- Incorrect -->
|
||||
<c r="B2" s="7"><v>8</v></c>
|
||||
|
||||
<!-- Fix: Value must be stored in decimal form -->
|
||||
<c r="B2" s="7"><v>0.08</v></c>
|
||||
```
|
||||
|
||||
### Error 3: File corruption after appending styles without updating count
|
||||
|
||||
Cause: A `<font>` or `<xf>` element was appended but the count attribute was not updated; Excel reads beyond bounds using the old count.
|
||||
|
||||
Fix: Update the corresponding count immediately after appending each element:
|
||||
```xml
|
||||
<!-- After appending the 6th font, count must be changed from 5 to 6 -->
|
||||
<fonts count="6">
|
||||
...
|
||||
</fonts>
|
||||
```
|
||||
|
||||
### Error 4: Blue font + formula (color role contradiction)
|
||||
|
||||
Cause: A formula cell mistakenly uses an input style (e.g., s="5" for currency input).
|
||||
|
||||
```xml
|
||||
<!-- Incorrect: Formula cell uses blue input style -->
|
||||
<c r="C5" s="5"><f>B5*1.08</f><v></v></c>
|
||||
|
||||
<!-- Fix: Change formula cell to corresponding black formula style (5->6, 7->8, 9->10) -->
|
||||
<c r="C5" s="6"><f>B5*1.08</f><v></v></c>
|
||||
```
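
This contradiction can be detected with a read-only scan before delivery. The sketch assumes blue is encoded exactly as `000000FF`, per this document's convention; `blue_formula_cells` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": NS_URI}
BLUE = "000000FF"


def blue_formula_cells(sheet_xml: str, styles_xml: str) -> list:
    """Return coordinates of cells that hold a formula but use a blue font."""
    styles = ET.fromstring(styles_xml)
    fonts = styles.findall("m:fonts/m:font", NS)
    xfs = styles.findall("m:cellXfs/m:xf", NS)

    def is_blue(style_index: int) -> bool:
        font = fonts[int(xfs[style_index].get("fontId", "0"))]
        color = font.find("m:color", NS)
        return color is not None and color.get("rgb", "").upper() == BLUE

    sheet = ET.fromstring(sheet_xml)
    return [
        c.get("r")
        for c in sheet.iter(f"{{{NS_URI}}}c")
        if c.find("m:f", NS) is not None and is_blue(int(c.get("s", "0")))
    ]
```

Each coordinate returned should be switched to the matching black formula style (5 to 6, 7 to 8, 9 to 10).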
|
||||
|
||||
### Error 5: AARRGGBB color missing Alpha (only 6 digits)
|
||||
|
||||
```xml
|
||||
<!-- Incorrect: 6-digit format, behavior depends on implementation, usually causes wrong color -->
|
||||
<color rgb="0000FF"/>
|
||||
|
||||
<!-- Fix: Always use 8-digit AARRGGBB, Alpha fixed at 00 -->
|
||||
<color rgb="000000FF"/>
|
||||
```
|
||||
|
||||
### Error 6: Modifying existing xf (affects all cells referencing that index)
|
||||
|
||||
Cause: Directly modifying attributes of the Nth `<xf>` in cellXfs, causing all cells with `s="N"` to be batch-changed.
|
||||
|
||||
Fix: Keep existing entries unchanged, append a new entry at the end, and only change the `s` attribute of cells that need the new style to the new index:
|
||||
```xml
|
||||
<!-- Incorrect: Modified the existing xf at index=6 -->
|
||||
<xf numFmtId="164" fontId="2" fillId="0" borderId="0" xfId="0"
|
||||
applyFont="1" applyNumberFormat="1" applyAlignment="1">
|
||||
<alignment horizontal="right"/> <!-- New attribute added, affects ALL cells already using s="6" -->
|
||||
</xf>
|
||||
|
||||
<!-- Fix: Append new index (when original count=13, new index=13), only change the s attribute of cells needing right alignment -->
|
||||
<!-- Keep index=6 as-is -->
|
||||
<xf numFmtId="164" fontId="2" fillId="0" borderId="0" xfId="0"
|
||||
applyFont="1" applyNumberFormat="1" applyAlignment="1">
|
||||
<alignment horizontal="right"/>
|
||||
</xf> <!-- New index=13 -->
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 10. Financial Model Structure Conventions
|
||||
|
||||
### 10.1 Header Rows
|
||||
|
||||
- Bold font (corresponds to style index 4 in this skill's template)
|
||||
- Year columns: use number format `0` (numFmtId="1", no thousands separator) to prevent 2024 from displaying as 2,024
|
||||
- A unit description row may be added below headers: gray or italic text, e.g., "$ thousands" or "% of Revenue"
|
||||
|
||||
### 10.2 Row Type Standards
|
||||
|
||||
| Row Type | Style Recommendation | Example |
|
||||
|----------|---------------------|---------|
|
||||
| Category heading row | Bold, optionally with fill color | "Revenue" |
|
||||
| Line item row | Normal style | "Product A", "Product B" |
|
||||
| Subtotal row | Bold + top border | "Total Revenue" |
|
||||
| Operating metric row | Normal style | "Gross Margin %" |
|
||||
| Separator row | Empty row | (empty) |
|
||||
|
||||
### 10.3 Multi-Year Model Column Layout
|
||||
|
||||
```
|
||||
Col A: Label column (width 28, left-aligned text, s="4" for headers or s="0" for labels)
|
||||
Col B: FY2022 Actual (width 12, year header s="11", data cells styled by semantic role)
|
||||
Col C: FY2023 Actual
|
||||
Col D: FY2024E (forecast period - can use light gray fill fillId=3 to differentiate)
|
||||
Col E: FY2025E
|
||||
Col F: FY2026E
|
||||
```
|
||||
|
||||
### 10.4 Cross-Sheet Reference Patterns
|
||||
|
||||
Complete XML example of parameters passing from assumptions sheet to model sheet:
|
||||
|
||||
```xml
|
||||
<!-- Assumptions sheet, cell B5: 8% growth rate, blue percentage input -->
|
||||
<c r="B5" s="7"><v>0.08</v></c>
|
||||
|
||||
<!-- Model sheet, cell C10: references assumption area growth rate, green percentage formula -->
|
||||
<!-- Requires appending a green + percentage style (fontId=3, numFmtId=165); index=13 assumes it is the first style appended to the 13-slot template -->
|
||||
<c r="C10" s="13"><f>Assumptions!B5</f><v></v></c>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 11. Assumption Categories
|
||||
|
||||
In the assumptions area (Assumptions sheet or assumptions block), organize assumptions in the following standard order for ease of review and maintenance:
|
||||
|
||||
1. **Revenue assumptions**: Growth rates, pricing, sales volume
|
||||
2. **Cost assumptions**: Gross margin, fixed/variable cost ratios
|
||||
3. **Working capital**: DSO (Days Sales Outstanding), DPO (Days Payable Outstanding), inventory days
|
||||
4. **Capital expenditures (CapEx)**: As a percentage of revenue or absolute amounts
|
||||
5. **Financing assumptions**: Interest rates, debt repayment schedules
|
||||
6. **Tax and other**: Effective tax rate, depreciation & amortization (D&A)
|
||||
|
||||
---
|
||||
|
||||
## 12. Audit Trail Best Practices
|
||||
|
||||
- Use `s="12"` (blue font + yellow fill highlight) to mark cells requiring review or pending changes, making them immediately visible to reviewers
|
||||
- In sensitivity analysis rows or a separate Sensitivity tab, show the impact of +/-1% changes in key assumptions on results
|
||||
- **Do not hide rows containing assumptions**: Assumption rows must be visible to reviewers; do not use the `hidden="1"` attribute
|
||||
- Note a "Last Updated" date at the top of the assumptions area or in a dedicated cell, recording the last modification time of the model
|
||||
|
||||
---
|
||||
|
||||
## 13. Pre-Delivery Checklist (Common Financial Model Checklist)
|
||||
|
||||
Before outputting the final file, confirm each item:
|
||||
|
||||
- [ ] Formula rows contain no hard-coded values (can use `formula_check.py` to scan the packaged `.xlsx` file)
|
||||
- [ ] Year columns display as 2024 not 2,024 (numFmtId="1", format `0`)
|
||||
- [ ] Negative numbers display as (1,234) not -1,234 (use parenthetical style for externally delivered financial reports)
|
||||
- [ ] Zero values display as `-` in sparse rows rather than `0` (formatCode third segment is `"-"`)
|
||||
- [ ] Growth rates and percentages are stored as decimals (0.08 = 8%), format is `0.0%`
|
||||
- [ ] All cross-sheet reference cells use green font (style index 3 or an appended green + number format combination)
|
||||
- [ ] Assumptions block and model block are clearly separated (different sheets or separated by empty rows within the same sheet)
|
||||
- [ ] Summary rows use `SUM()` formulas, not manually hard-coded totals
|
||||
- [ ] Balance verification: summary rows = sum of their respective line items (a check row can be added at the end of the model to verify)
|
||||
231
minimax-xlsx/references/ooxml-cheatsheet.md
Normal file
@@ -0,0 +1,231 @@
|
||||
# OOXML SpreadsheetML Cheat Sheet
|
||||
|
||||
Quick reference for XML manipulation of xlsx files.
|
||||
|
||||
---
|
||||
|
||||
## Package Structure
|
||||
|
||||
```
my_file.xlsx  (ZIP archive)
├── [Content_Types].xml          ← declares MIME types for all files
├── _rels/
│   └── .rels                    ← root relationship: points to xl/workbook.xml
└── xl/
    ├── workbook.xml             ← sheet list, calc settings
    ├── styles.xml               ← ALL style definitions
    ├── sharedStrings.xml        ← ALL text strings (referenced by index)
    ├── _rels/
    │   └── workbook.xml.rels    ← maps r:id → worksheet/styles/sharedStrings files
    ├── worksheets/
    │   ├── sheet1.xml           ← Sheet 1 data
    │   ├── sheet2.xml           ← Sheet 2 data
    │   └── ...
    ├── charts/                  ← chart XML (if any)
    ├── pivotTables/             ← pivot table XML (if any)
    └── theme/
        └── theme1.xml           ← color/font theme
```
|
||||
|
||||
---
|
||||
|
||||
## Cell Reference Format
|
||||
|
||||
```
|
||||
A1 → column A (1), row 1
|
||||
B5 → column B (2), row 5
|
||||
AA1 → column 27, row 1
|
||||
```
|
||||
|
||||
Column letter ↔ number conversion:
|
||||
```python
def col_letter(n):   # 1-based column number → letter(s)
    r = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        r = chr(65 + rem) + r
    return r

def col_number(s):   # letter(s) → 1-based column number
    n = 0
    for c in s.upper():
        n = n * 26 + (ord(c) - 64)
    return n
```
|
||||
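Generated formulas usually need whole cell references, not only column letters. The following is a minimal sketch of splitting and building A1-style references; `split_cell_ref` and `make_cell_ref` are hypothetical helper names, not part of the skill's scripts. Rejecting row 0 up front helps avoid the off-by-one references that later surface as `#REF!`.

```python
import re

# One to three column letters followed by a row number that starts at 1.
# Optional "$" markers (absolute references) are accepted and ignored.
_CELL_RE = re.compile(r"^\$?([A-Za-z]{1,3})\$?([1-9][0-9]*)$")


def split_cell_ref(ref):
    """Split an A1-style reference such as 'AA10' into (column, row), both 1-based."""
    m = _CELL_RE.match(ref)
    if m is None:
        raise ValueError(f"not a valid A1 reference: {ref!r}")
    col = 0
    for ch in m.group(1).upper():
        col = col * 26 + (ord(ch) - 64)
    return col, int(m.group(2))


def make_cell_ref(col, row):
    """Build an A1-style reference from 1-based column and row numbers."""
    if col < 1 or row < 1:
        raise ValueError("column and row are 1-based; 0 or negative is invalid")
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(65 + rem) + letters
    return f"{letters}{row}"
```

For example, `split_cell_ref("AA10")` yields `(27, 10)`, and `make_cell_ref(64, 1)` yields `"BL1"`.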
|
||||
---
|
||||
|
||||
## Cell XML Reference
|
||||
|
||||
### Data Types
|
||||
|
||||
| Type | `t` attr | XML Example | Value |
|------|---------|-------------|-------|
| Number | omit | `<c r="B2"><v>1000</v></c>` | 1000 |
| String (shared) | `s` | `<c r="A1" t="s"><v>0</v></c>` | sharedStrings[0] |
| String (inline) | `inlineStr` | `<c r="A1" t="inlineStr"><is><t>Hi</t></is></c>` | "Hi" |
| Boolean | `b` | `<c r="D1" t="b"><v>1</v></c>` | TRUE |
| Error | `e` | `<c r="E1" t="e"><v>#REF!</v></c>` | #REF! |
| Formula | omit | `<c r="B4"><f>SUM(B2:B3)</f><v></v></c>` | computed |
|
||||
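As an illustration of the cell types above, here is a minimal sketch that builds number, inline-string, and formula cells with the standard library's `xml.etree.ElementTree`. The helper names (`number_cell`, `inline_string_cell`, `formula_cell`, `to_xml`) are assumptions made for this example, and namespace handling is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def _base_cell(ref, style=None, cell_type=None):
    # Attributes are added in the order r, s, t to match the examples in the table.
    attrs = {"r": ref}
    if style is not None:
        attrs["s"] = str(style)
    if cell_type is not None:
        attrs["t"] = cell_type
    return ET.Element("c", attrs)


def number_cell(ref, value, style=None):
    cell = _base_cell(ref, style)
    ET.SubElement(cell, "v").text = str(value)
    return cell


def inline_string_cell(ref, text, style=None):
    cell = _base_cell(ref, style, "inlineStr")
    is_el = ET.SubElement(cell, "is")
    ET.SubElement(is_el, "t").text = text  # "&" and "<" are escaped on serialization
    return cell


def formula_cell(ref, formula, style=None):
    cell = _base_cell(ref, style)
    ET.SubElement(cell, "f").text = formula.lstrip("=")  # no leading "=" in XML
    ET.SubElement(cell, "v")  # empty cache, filled in on recalculation
    return cell


def to_xml(cell):
    # short_empty_elements=False writes <v></v> rather than <v />.
    return ET.tostring(cell, encoding="unicode", short_empty_elements=False)
```

With these helpers, `to_xml(formula_cell("B4", "=SUM(B2:B3)"))` produces the same string as the Formula row of the table.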
|
||||
### Formula Types
|
||||
|
||||
```xml
|
||||
<!-- Basic formula (no leading = in XML!) -->
|
||||
<c r="B4"><f>SUM(B2:B3)</f><v></v></c>
|
||||
|
||||
<!-- Cross-sheet -->
|
||||
<c r="C1"><f>Assumptions!B5</f><v></v></c>
|
||||
<c r="C1"><f>'Sheet With Spaces'!B5</f><v></v></c>
|
||||
|
||||
<!-- Shared formula: D2:D100 all use B*C with relative row offset -->
|
||||
<c r="D2"><f t="shared" ref="D2:D100" si="0">B2*C2</f><v></v></c>
|
||||
<c r="D3"><f t="shared" si="0"/><v></v></c>
|
||||
|
||||
<!-- Array formula -->
|
||||
<c r="E1"><f t="array" ref="E1:E5">SORT(A1:A5)</f><v></v></c>
|
||||
```
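Cross-sheet references need quoting whenever the sheet name contains spaces or other special characters. A minimal sketch of a helper that produces the `Sheet!Cell` prefix follows; `sheet_ref` is a hypothetical name, and the sketch deliberately quotes more often than strictly necessary (for example names that look like cell addresses), on the assumption that a quoted name is always accepted.

```python
import re

# Names made only of letters, digits and underscores can be written bare.
_BARE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Names such as "Q1" look like cell addresses, so they are quoted to be safe.
_LOOKS_LIKE_CELL = re.compile(r"^[A-Za-z]{1,3}[0-9]+$")


def sheet_ref(sheet_name, cell_ref):
    """Return formula text that refers to cell_ref on sheet_name."""
    if _BARE_NAME.match(sheet_name) and not _LOOKS_LIKE_CELL.match(sheet_name):
        return f"{sheet_name}!{cell_ref}"
    escaped = sheet_name.replace("'", "''")  # apostrophes are doubled inside quotes
    return f"'{escaped}'!{cell_ref}"
```

The result goes inside `<f>` as-is, for example `<f>'Sheet With Spaces'!B5</f>`.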
|
||||
|
||||
---
|
||||
|
||||
## styles.xml Reference
|
||||
|
||||
### Indirect Reference Chain
|
||||
|
||||
```
|
||||
Cell s="3"
|
||||
↓
|
||||
cellXfs[3] → fontId="2", fillId="0", borderId="0", numFmtId="165"
|
||||
↓ ↓ ↓ ↓ ↓
|
||||
fonts[2] fills[0] borders[0] numFmts: id=165
|
||||
blue color no fill no border "0.0%"
|
||||
```
|
||||
|
||||
### Adding a New Style (step-by-step)
|
||||
|
||||
1. In `<numFmts>`: add `<numFmt numFmtId="168" formatCode="0.00%"/>`, update `count`
|
||||
2. In `<fonts>`: add font entry, note its index
|
||||
3. In `<cellXfs>`: append `<xf numFmtId="168" fontId="N" .../>`, update `count`
|
||||
4. New style index = old `cellXfs count` value (before incrementing)
|
||||
5. Apply to cells: `<c r="B5" s="NEW_INDEX">...</c>`
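The index bookkeeping in these steps can be sketched as follows, assuming `styles.xml` has already been parsed with `xml.etree.ElementTree`. `append_cell_xf` is a hypothetical helper; it covers steps 3 and 4 only and assumes the font and number format entries already exist.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def append_cell_xf(styles_root, num_fmt_id, font_id, fill_id=0, border_id=0):
    """Append an <xf> to <cellXfs> and return the new style index."""
    cell_xfs = styles_root.find(f"{{{NS}}}cellXfs")
    if cell_xfs is None:
        raise ValueError("styles.xml has no <cellXfs> element")
    new_index = len(cell_xfs)  # number of existing <xf> children = next free index
    ET.SubElement(cell_xfs, f"{{{NS}}}xf", {
        "numFmtId": str(num_fmt_id),
        "fontId": str(font_id),
        "fillId": str(fill_id),
        "borderId": str(border_id),
        "xfId": "0",
        "applyFont": "1",
        "applyNumberFormat": "1",
    })
    cell_xfs.set("count", str(len(cell_xfs)))  # keep the count attribute in sync
    return new_index
```

Counting the existing `<xf>` children instead of trusting the `count` attribute keeps the returned index correct even if an earlier edit left `count` stale.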
|
||||
|
||||
### Color Format
|
||||
|
||||
`AARRGGBB`: Alpha + Red + Green + Blue. This guide always writes the alpha byte as `00`; Excel ignores alpha for cell fonts and fills, so the color renders opaque.
|
||||
|
||||
```
|
||||
000000FF → Blue
|
||||
00000000 → Black
|
||||
00008000 → Green (dark)
|
||||
00FF0000 → Red
|
||||
00FFFF00 → Yellow (for fills)
|
||||
00FFFFFF → White
|
||||
```
|
||||
|
||||
### Built-in numFmtIds (no declaration needed)
|
||||
|
||||
| ID | Format | Display |
|----|--------|---------|
| 0 | General | as-is |
| 1 | 0 | 2024 (use for years!) |
| 2 | 0.00 | 1000.00 |
| 3 | #,##0 | 1,000 |
| 4 | #,##0.00 | 1,000.00 |
| 9 | 0% | 15% |
| 10 | 0.00% | 15.25% |
| 14 | m/d/yyyy | 3/21/2026 |
|
||||
|
||||
---
|
||||
|
||||
## sharedStrings.xml Reference
|
||||
|
||||
```xml
|
||||
<sst count="3" uniqueCount="3">
|
||||
<si><t>Revenue</t></si> <!-- index 0 -->
|
||||
<si><t>Cost</t></si> <!-- index 1 -->
|
||||
<si><t>Margin</t></si> <!-- index 2 -->
|
||||
</sst>
|
||||
```
|
||||
|
||||
Text with leading/trailing spaces:
|
||||
```xml
|
||||
<si><t xml:space="preserve"> indented </t></si>
|
||||
```
|
||||
|
||||
Special characters:
|
||||
```xml
|
||||
<si><t>R&amp;D Expenses</t></si>  <!-- & must be written as &amp; -->
|
||||
```
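When shared strings are generated from arbitrary text, escaping and whitespace preservation can be handled together. This is a minimal sketch using `xml.sax.saxutils.escape` from the standard library; `shared_string_item` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def shared_string_item(text):
    """Return an <si> element for sharedStrings.xml as a string."""
    # Leading or trailing whitespace is only kept if xml:space="preserve" is set.
    attr = ' xml:space="preserve"' if text != text.strip() else ""
    # escape() converts &, < and > to &amp;, &lt; and &gt;.
    return f"<si><t{attr}>{escape(text)}</t></si>"
```

For example, `shared_string_item("R&D Expenses")` returns `<si><t>R&amp;D Expenses</t></si>`.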
|
||||
|
||||
---
|
||||
|
||||
## workbook.xml / .rels Sync
|
||||
|
||||
Every `<sheet>` in workbook.xml needs a matching `<Relationship>` in workbook.xml.rels:
|
||||
|
||||
```xml
|
||||
<!-- workbook.xml -->
|
||||
<!-- NOTE: rId numbering depends on what rIds are already in workbook.xml.rels.
|
||||
The minimal template reserves rId1=sheet1, rId2=styles, rId3=sharedStrings.
|
||||
When ADDING sheets to the template, start from rId4 to avoid conflicts.
|
||||
The rId3 here is just a generic illustration — use the next available rId. -->
|
||||
<sheet name="Summary" sheetId="3" r:id="rId3"/>
|
||||
|
||||
<!-- workbook.xml.rels -->
|
||||
<Relationship Id="rId3"
|
||||
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
|
||||
Target="worksheets/sheet3.xml"/>
|
||||
```
|
||||
|
||||
And a matching `<Override>` in `[Content_Types].xml`:
|
||||
```xml
|
||||
<Override PartName="/xl/worksheets/sheet3.xml"
|
||||
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
|
||||
```
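The sheet-to-relationship half of this sync can be checked mechanically. A minimal sketch, assuming both XML parts have been read into strings; `unmatched_sheets` is a hypothetical helper and does not check the `[Content_Types].xml` overrides.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def unmatched_sheets(workbook_xml, rels_xml):
    """Return (sheet name, r:id) pairs that have no matching <Relationship>."""
    workbook = ET.fromstring(workbook_xml)
    rels = ET.fromstring(rels_xml)
    rel_ids = {rel.get("Id") for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")}
    missing = []
    for sheet in workbook.iter(f"{{{MAIN_NS}}}sheet"):
        rid = sheet.get(f"{{{DOC_REL_NS}}}id")  # the r:id attribute
        if rid not in rel_ids:
            missing.append((sheet.get("name"), rid))
    return missing
```

An empty list means every `<sheet>` has a relationship entry; anything else names the sheets that would fail to load.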
|
||||
|
||||
---
|
||||
|
||||
## Column / Row Dimensions
|
||||
|
||||
```xml
|
||||
<!-- Before <sheetData> -->
|
||||
<cols>
|
||||
<col min="1" max="1" width="28" customWidth="1"/> <!-- A: 28 chars -->
|
||||
<col min="2" max="6" width="14" customWidth="1"/> <!-- B-F: 14 chars -->
|
||||
</cols>
|
||||
|
||||
<!-- Row height on individual rows -->
|
||||
<row r="1" ht="20" customHeight="1">
|
||||
...
|
||||
</row>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Freeze Panes
|
||||
|
||||
Inside `<sheetView>`:
|
||||
```xml
|
||||
<!-- Freeze row 1 (header row stays visible) -->
|
||||
<pane ySplit="1" topLeftCell="A2" activePane="bottomLeft" state="frozen"/>
|
||||
|
||||
<!-- Freeze column A -->
|
||||
<pane xSplit="1" topLeftCell="B1" activePane="topRight" state="frozen"/>
|
||||
|
||||
<!-- Freeze both row 1 and column A -->
|
||||
<pane xSplit="1" ySplit="1" topLeftCell="B2" activePane="bottomRight" state="frozen"/>
|
||||
```
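The three variants above follow one pattern: `topLeftCell` is the first unfrozen cell, and `activePane` depends on which axes are frozen. A minimal sketch that generates the element for any number of frozen rows and columns; `freeze_pane_xml` is a hypothetical helper name.

```python
def freeze_pane_xml(rows=0, cols=0):
    """Return a <pane> element string freezing the first `rows` rows and `cols` columns."""
    if rows < 0 or cols < 0:
        raise ValueError("rows and cols must be non-negative")
    if rows == 0 and cols == 0:
        return ""  # nothing frozen, no <pane> element needed

    # Column letter of the first unfrozen column (1-based number cols + 1).
    n, letters = cols + 1, ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(65 + rem) + letters
    top_left = f"{letters}{rows + 1}"

    if rows and cols:
        active = "bottomRight"
    elif rows:
        active = "bottomLeft"
    else:
        active = "topRight"

    attrs = []
    if cols:
        attrs.append(f'xSplit="{cols}"')
    if rows:
        attrs.append(f'ySplit="{rows}"')
    attrs += [f'topLeftCell="{top_left}"', f'activePane="{active}"', 'state="frozen"']
    return "<pane " + " ".join(attrs) + "/>"
```

Calling `freeze_pane_xml(rows=1)` reproduces the header-row example above character for character.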
|
||||
|
||||
---
|
||||
|
||||
## 7 Excel Error Types (All Must Be Absent at Delivery)
|
||||
|
||||
| Error | Meaning | Detect in XML |
|-------|---------|---------------|
| `#REF!` | Invalid cell reference | `<c t="e"><v>#REF!</v></c>` |
| `#DIV/0!` | Divide by zero | `<c t="e"><v>#DIV/0!</v></c>` |
| `#VALUE!` | Wrong data type | `<c t="e"><v>#VALUE!</v></c>` |
| `#NAME?` | Unknown function/name | `<c t="e"><v>#NAME?</v></c>` |
| `#NULL!` | Empty intersection | `<c t="e"><v>#NULL!</v></c>` |
| `#NUM!` | Number out of range | `<c t="e"><v>#NUM!</v></c>` |
| `#N/A` | Value not found | `<c t="e"><v>#N/A</v></c>` |
|
||||
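The `t="e"` detection pattern can be sketched in a few lines of standard-library Python. This is an illustration only and not the implementation of `formula_check.py`; `find_error_cells` is a hypothetical name and takes the worksheet XML as a string.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def find_error_cells(sheet_xml):
    """Return (cell reference, error string, formula text) for every t="e" cell."""
    root = ET.fromstring(sheet_xml)
    found = []
    for cell in root.iter(f"{{{NS}}}c"):
        if cell.get("t") != "e":
            continue
        v = cell.find(f"{{{NS}}}v")
        f = cell.find(f"{{{NS}}}f")
        found.append((
            cell.get("r"),
            v.text if v is not None else None,  # None marks a malformed error cell
            f.text if f is not None else None,  # None when the cell has no formula
        ))
    return found
```

A freshly generated file with empty `<v>` caches returns an empty list here, which is why a clean result before recalculation does not prove the formulas are correct.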
97
minimax-xlsx/references/read-analyze.md
Normal file
@@ -0,0 +1,97 @@
|
||||
# Data Reading & Analysis Guide
|
||||
|
||||
> Reference for the READ path. Use `xlsx_reader.py` for structure discovery and data quality auditing,
|
||||
> then pandas for custom analysis. **Never modify the source file.**
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Path
|
||||
|
||||
The user asks to read, analyze, view, summarize, extract, or answer questions about an Excel/CSV file's contents,
|
||||
without requiring file modification. If modification is needed, hand off to `edit.md`.
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1 — Structure Discovery
|
||||
|
||||
Run `xlsx_reader.py` first. It handles format detection, encoding fallback, structure exploration, and data quality audit:
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx # full report
|
||||
python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx --sheet Sales # single sheet
|
||||
python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx --quality # quality audit only
|
||||
python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx --json # machine-readable
|
||||
```
|
||||
|
||||
Supported formats: `.xlsx`, `.xlsm`, `.csv`, `.tsv`. The script tries multiple encodings for CSV (utf-8-sig, gbk, utf-8, latin-1).
|
||||
|
||||
### Step 2 — Custom Analysis with pandas
|
||||
|
||||
Load data and perform the analysis the user requests:
|
||||
|
||||
```python
import pandas as pd

# sheet_name=None loads every sheet: a dict mapping sheet name -> DataFrame
sheets = pd.read_excel("input.xlsx", sheet_name=None)
df = next(iter(sheets.values()))  # first sheet as a DataFrame
# For CSV: df = pd.read_csv("input.csv")
```
|
||||
|
||||
**Header handling** (when the default `header=0` doesn't work):
|
||||
|
||||
| Situation | Code |
|-----------|------|
| Header on row 3 | `pd.read_excel(path, header=2)` |
| Multi-level merged header | `pd.read_excel(path, header=[0, 1])` |
| No header | `pd.read_excel(path, header=None)` |
|
||||
|
||||
**Analysis quick reference:**
|
||||
|
||||
| Scenario | Pattern |
|----------|---------|
| Descriptive stats | `df.describe()` or `df['Col'].agg(['sum', 'mean', 'min', 'max'])` |
| Group aggregation | `df.groupby('Region')['Revenue'].agg(Total='sum', Avg='mean')` |
| Top N | `df.groupby('Region')['Revenue'].sum().sort_values(ascending=False).head(5)` |
| Pivot table | `df.pivot_table(values='Revenue', index='Region', columns='Quarter', aggfunc='sum', margins=True)` |
| Time series | `df.set_index(pd.to_datetime(df['Date'])).resample('ME')['Revenue'].sum()` |
| Cross-sheet merge | `pd.merge(sales, customers, on='CustomerID', how='left', validate='m:1')` |
| Stack sheets | `pd.concat([df.assign(Source=name) for name, df in sheets.items()], ignore_index=True)` |
| Large files (>50MB) | `pd.read_excel(path, usecols=['Date', 'Revenue'])` or `pd.read_csv(path, chunksize=10000)` |
|
||||
|
||||
### Step 3 — Output
|
||||
|
||||
If the user specifies an output file path, write results to it (highest priority). Format the report as:
|
||||
|
||||
```
|
||||
## Analysis Report: {filename}
|
||||
### File Overview — format, sheets, row counts
|
||||
### Data Quality — nulls, duplicates, mixed types (or "no issues")
|
||||
### Key Findings — direct answer to the user's question
|
||||
### Additional Notes — formula NaN, encoding issues, caveats
|
||||
```
|
||||
|
||||
**Numeric display**: monetary `1,234,567.89`, percentage `12.3%`, multiples `8.5x`, counts as integers.
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
| Pitfall | Cause | Fix |
|---------|-------|-----|
| Formula cells read as NaN | `<v>` cache empty in freshly generated files | Inform user; suggest opening in Excel and re-saving; or use `libreoffice_recalc.py` |
| CSV encoding errors | Chinese Windows exports use GBK | `xlsx_reader.py` auto-tries multiple encodings; manually specify if all fail |
| Mixed types in column | Column has both numbers and text (e.g., "N/A") | `pd.to_numeric(df['Col'], errors='coerce')`, then report unconvertible rows |
| Year shows as 2,024 | Thousands separator format applied to year | `df['Year'].astype(int).astype(str)` |
| Multi-level headers | Two-row header merged | `pd.read_excel(path, header=[0, 1])`, then flatten with `' - '.join()` |
| Row number mismatch | pandas 0-indexed vs Excel 1-indexed | `excel_row = pandas_index + 2` (+1 for 1-index, +1 for header) |
|
||||
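The row-number rule generalizes to files whose header is not on the first row. A minimal sketch, assuming a default `RangeIndex`, a single header row, and no `skiprows`; `excel_row` is a hypothetical helper name.

```python
def excel_row(pandas_index, header=0):
    """Map a 0-based pandas row index to the 1-based Excel row number.

    `header` mirrors the read_excel argument: the 0-based row that holds
    the column names, or None when the file has no header row.
    """
    if header is None:
        return pandas_index + 1           # only the 0-based to 1-based shift
    return pandas_index + header + 2      # shift past the header row, then to 1-based
```

With the default `header=0`, this reduces to `pandas_index + 2`, matching the table above.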
|
||||
**Critical**: Never open with `data_only=True` then `save()` — this permanently destroys all formulas.
|
||||
|
||||
---
|
||||
|
||||
## Prohibitions
|
||||
|
||||
- Never modify the source file (no `save()`, no XML edits)
|
||||
- Never report formula NaN as "data is zero" — explain it's a formula cache issue
|
||||
- Never report pandas indices as Excel row numbers
|
||||
- Never make speculative conclusions unsupported by the data
|
||||
772
minimax-xlsx/references/validate.md
Normal file
@@ -0,0 +1,772 @@
|
||||
# Formula Validation & Recalculation Guide
|
||||
|
||||
Ensure every formula in an xlsx file is provably correct before delivery. A file that opens without visible errors is not a passing file — only a file that has cleared both validation tiers is a passing file.
|
||||
|
||||
---
|
||||
|
||||
## Foundational Rules
|
||||
|
||||
- **Never declare PASS without running `formula_check.py` first.** Visual inspection of a spreadsheet is not validation.
|
||||
- **Tier 1 (static) is mandatory in every scenario.** Tier 2 (dynamic) is mandatory when LibreOffice is available. If it is unavailable, you must state this explicitly in the report — you may not silently skip it.
|
||||
- **Never use openpyxl with `data_only=True` to check formula values.** Opening and saving a workbook in `data_only=True` mode permanently replaces all formulas with their last cached values. Formulas cannot be recovered afterward.
|
||||
- **Auto-fix only deterministic errors.** Any fix that requires understanding business logic must be flagged for human review.
|
||||
|
||||
---
|
||||
|
||||
## Two-Tier Validation Architecture
|
||||
|
||||
```
|
||||
Tier 1 — Static Validation (XML scan, no external tools)
|
||||
│
|
||||
├── Detect: all 7 Excel error types already cached in <v> elements
|
||||
├── Detect: cross-sheet references pointing to nonexistent sheets
|
||||
├── Detect: formula cells with t="e" attribute (error type marker)
|
||||
└── Tool: formula_check.py + manual XML inspection
|
||||
│
|
||||
▼ (if LibreOffice is present)
|
||||
Tier 2 — Dynamic Validation (LibreOffice headless recalculation)
|
||||
│
|
||||
├── Executes all formulas via the LibreOffice Calc engine
|
||||
├── Populates <v> cache values with real computed results
|
||||
├── Exposes runtime errors invisible before recalculation
|
||||
└── Follow-up: re-run Tier 1 on the recalculated file
|
||||
```
|
||||
|
||||
**Why two tiers?**
|
||||
|
||||
Python xlsx writers such as openpyxl, and direct XML editing, store formula text (e.g. `SUM(B2:B9)`) in `<f>` elements but do not evaluate it. A freshly generated file therefore has empty `<v>` cache elements for every formula cell. This means:
|
||||
|
||||
- Tier 1 can only catch errors that are already encoded in the XML — either as `t="e"` cells or as structurally broken cross-sheet references.
|
||||
- Tier 2 uses LibreOffice as the actual calculation engine, runs every formula, fills `<v>` with real results, and surfaces runtime errors (`#DIV/0!`, `#N/A`, etc.) that can only appear after computation.
|
||||
|
||||
Neither tier alone is sufficient. Together they cover both structural errors in the XML and runtime calculation errors.
|
||||
|
||||
---
|
||||
|
||||
## Tier 1 — Static Validation
|
||||
|
||||
Static validation requires no external tools. It works directly on the ZIP/XML structure of the xlsx file.
|
||||
|
||||
### Step 1: Run formula_check.py
|
||||
|
||||
**Standard (human-readable) output:**
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/formula_check.py /path/to/file.xlsx
|
||||
```
|
||||
|
||||
**JSON output (for programmatic processing):**
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/formula_check.py /path/to/file.xlsx --json
|
||||
```
|
||||
|
||||
**Single-sheet mode (faster for targeted checks):**
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/formula_check.py /path/to/file.xlsx --sheet Summary
|
||||
```
|
||||
|
||||
**Summary mode (counts only, no per-cell detail):**
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/formula_check.py /path/to/file.xlsx --summary
|
||||
```
|
||||
|
||||
Exit codes:
|
||||
- `0` — no hard errors (PASS or PASS with heuristic warnings)
|
||||
- `1` — hard errors detected, or file cannot be opened (FAIL)
|
||||
|
||||
#### What formula_check.py examines
|
||||
|
||||
The script opens the xlsx as a ZIP archive without using any Excel library. It reads `xl/workbook.xml` to enumerate sheet names and named ranges, reads `xl/_rels/workbook.xml.rels` to map each sheet to its XML file, then iterates every `<c>` element in every worksheet.
|
||||
|
||||
It performs five checks:
|
||||
|
||||
1. **Error-value detection**: If the cell has `t="e"`, its `<v>` element contains an Excel error string. The cell is recorded with its sheet name, cell reference (e.g. `C5`), the error value, and the formula text if present.
|
||||
|
||||
2. **Broken cross-sheet reference detection**: If the cell has an `<f>` element, the script extracts all sheet names referenced in the formula (both `SheetName!` and `'Sheet Name'!` syntax). Each name is compared against the list of sheets in `workbook.xml`. A mismatch is a broken reference.
|
||||
|
||||
3. **Unknown named-range detection (heuristic)**: Identifiers in formulas that are not function names, not cell references, and not found in `workbook.xml`'s `<definedNames>` are flagged as `unknown_name_ref` warnings. This is a heuristic — false positives are possible; always verify manually.
|
||||
|
||||
4. **Shared formula integrity**: Shared formula consumer cells (those with only `<f t="shared" si="N"/>`) are skipped for formula counting and cross-ref checks because they inherit the primary cell's formula. Only the primary cell (with `ref="..."` attribute and formula text) is checked and counted.
|
||||
|
||||
5. **Malformed error cells**: Cells with `t="e"` but no `<v>` child element are flagged as structural XML issues.
|
||||
|
||||
Hard errors (exit code 1): `error_value`, `broken_sheet_ref`, `malformed_error_cell`, `file_error`
|
||||
Soft warnings (exit code 0): `unknown_name_ref` — must be verified manually but do not block delivery alone
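To make check 2 concrete, here is a minimal sketch of extracting sheet names from formula text with a regular expression and comparing them against the workbook's sheet list. It is an illustration under simplifying assumptions, not the script's actual code: `referenced_sheets` and `missing_sheets` are hypothetical names, and string literals that contain `!` would produce false matches.

```python
import re

# Matches 'Quoted Name'! (with '' as an escaped apostrophe) or BareName!
_SHEET_REF = re.compile(r"'((?:[^']|'')+)'!|([A-Za-z_][A-Za-z0-9_.]*)!")


def referenced_sheets(formula):
    """Return the distinct sheet names referenced in a formula, in order of appearance."""
    names = []
    for quoted, bare in _SHEET_REF.findall(formula):
        name = quoted.replace("''", "'") if quoted else bare
        if name not in names:
            names.append(name)
    return names


def missing_sheets(formula, valid_sheets):
    """Return referenced sheet names that are not in the workbook's sheet list."""
    valid = set(valid_sheets)
    return [name for name in referenced_sheets(formula) if name not in valid]
```

For the sample workbook used below, `missing_sheets("Q5!D15", ["Summary", "Q1", "Q2", "Q3", "Q4", "Assumptions"])` returns `["Q5"]`.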
|
||||
|
||||
#### Reading formula_check.py human-readable output
|
||||
|
||||
A clean file looks like this:
|
||||
|
||||
```
|
||||
File : /tmp/budget_2024.xlsx
|
||||
Sheets : Summary, Q1, Q2, Q3, Q4, Assumptions
|
||||
Formulas checked : 312 distinct formula cells
|
||||
Shared formula ranges : 4 ranges
|
||||
Errors found : 0
|
||||
|
||||
PASS — No formula errors detected
|
||||
```
|
||||
|
||||
A file with errors looks like this:
|
||||
|
||||
```
|
||||
File : /tmp/budget_2024.xlsx
|
||||
Sheets : Summary, Q1, Q2, Q3, Q4, Assumptions
|
||||
Formulas checked : 312 distinct formula cells
|
||||
Shared formula ranges : 4 ranges
|
||||
Errors found : 4
|
||||
|
||||
── Error Details ──
|
||||
[FAIL] [Summary!C12] contains #REF! (formula: Q1!A0/Q1!A1)
|
||||
[FAIL] [Summary!D15] references missing sheet 'Q5'
|
||||
Formula: Q5!D15
|
||||
Valid sheets: ['Assumptions', 'Q1', 'Q2', 'Q3', 'Q4', 'Summary']
|
||||
[FAIL] [Q1!F8] contains #DIV/0!
|
||||
[WARN] [Q2!B10] uses unknown name 'GrowthAssumptions' (heuristic — verify manually)
|
||||
Formula: SUM(GrowthAssumptions)
|
||||
Defined names: ['RevenueRange', 'CostRange']
|
||||
|
||||
FAIL — 3 error(s) must be fixed before delivery
|
||||
WARN — 1 heuristic warning(s) require manual review
|
||||
```
|
||||
|
||||
Interpretation of each line:
|
||||
- `[FAIL] [Summary!C12] contains #REF! (formula: Q1!A0/Q1!A1)` — The cell has `t="e"` and `<v>#REF!</v>`. The formula references row 0, which does not exist in Excel's 1-based system. This is an off-by-one error in a generated reference.
|
||||
- `[FAIL] [Summary!D15] references missing sheet 'Q5'` — The formula contains `Q5!D15`, but no sheet named `Q5` exists in the workbook. The valid sheet list is provided for comparison.
|
||||
- `[FAIL] [Q1!F8] contains #DIV/0!` — This cell's `<v>` is already an error value (the file was previously recalculated). The formula divided by zero.
|
||||
- `[WARN] [Q2!B10] uses unknown name 'GrowthAssumptions'` — The identifier `GrowthAssumptions` appears in the formula but is not in `<definedNames>`. This may be a typo or a name that was accidentally omitted. It is a heuristic warning — verify manually. The warning alone does not block delivery.
|
||||
|
||||
#### Reading formula_check.py JSON output
|
||||
|
||||
```json
{
  "file": "/tmp/budget_2024.xlsx",
  "sheets_checked": ["Summary", "Q1", "Q2", "Q3", "Q4", "Assumptions"],
  "formula_count": 312,
  "shared_formula_ranges": 4,
  "error_count": 4,
  "errors": [
    {
      "type": "error_value",
      "error": "#REF!",
      "sheet": "Summary",
      "cell": "C12",
      "formula": "Q1!A0/Q1!A1"
    },
    {
      "type": "broken_sheet_ref",
      "sheet": "Summary",
      "cell": "D15",
      "formula": "Q5!D15",
      "missing_sheet": "Q5",
      "valid_sheets": ["Assumptions", "Q1", "Q2", "Q3", "Q4", "Summary"]
    },
    {
      "type": "error_value",
      "error": "#DIV/0!",
      "sheet": "Q1",
      "cell": "F8",
      "formula": null
    },
    {
      "type": "unknown_name_ref",
      "sheet": "Q2",
      "cell": "B10",
      "formula": "SUM(GrowthAssumptions)",
      "unknown_name": "GrowthAssumptions",
      "defined_names": ["RevenueRange", "CostRange"],
      "note": "Heuristic check — verify manually if this is a false positive"
    }
  ]
}
```
|
||||
|
||||
Field reference:
|
||||
|
||||
| Field | Meaning |
|-------|---------|
| `type: "error_value"` | Cell has `t="e"`: an Excel error is stored in the `<v>` element |
| `type: "broken_sheet_ref"` | Formula references a sheet name not present in workbook.xml |
| `type: "unknown_name_ref"` | Formula references an identifier not in `<definedNames>` (heuristic, soft warning) |
| `type: "malformed_error_cell"` | Cell has `t="e"` but no `<v>` child: structural XML problem |
| `type: "file_error"` | The file could not be opened (bad ZIP, not found, etc.) |
| `sheet` | The sheet where the error was found |
| `cell` | Cell reference in A1 notation |
| `formula` | The full formula text from the `<f>` element (null if not present) |
| `error` | The error string from `<v>` (for `error_value` type) |
| `missing_sheet` | The sheet name extracted from the formula that does not exist |
| `valid_sheets` | All sheet names actually present in workbook.xml |
| `unknown_name` | The identifier that was not found in `<definedNames>` |
| `defined_names` | All named ranges actually present in workbook.xml |
| `shared_formula_ranges` | Count of shared formula definitions (top-level `<f t="shared" ref="...">` elements) |
|
||||
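When the JSON report is consumed programmatically, the hard/soft distinction can be applied to the `errors` array. A minimal sketch, assuming the report has the shape shown above; `split_report` is a hypothetical helper name.

```python
import json

# Types that block delivery, as listed under "Hard errors" above.
HARD_ERROR_TYPES = {"error_value", "broken_sheet_ref", "malformed_error_cell", "file_error"}


def split_report(report_json):
    """Split the report's errors into (hard errors, soft warnings)."""
    report = json.loads(report_json)
    entries = report.get("errors", [])
    hard = [e for e in entries if e.get("type") in HARD_ERROR_TYPES]
    soft = [e for e in entries if e.get("type") not in HARD_ERROR_TYPES]
    return hard, soft
```

Applied to the sample report, this yields three hard errors and one soft warning, which agrees with the FAIL and WARN counts in the human-readable output.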
|
||||
### Step 2: Manual XML inspection
|
||||
|
||||
When formula_check.py reports errors, unpack the file to inspect the raw XML:
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/xlsx_unpack.py /path/to/file.xlsx /tmp/xlsx_inspect/
|
||||
```
|
||||
|
||||
Navigate to the worksheet file for the reported sheet. The sheet-to-file mapping is in `xl/_rels/workbook.xml.rels`. For example, if `rId1` maps to `worksheets/sheet1.xml`, then sheet1.xml is the file for the sheet with `r:id="rId1"` in `xl/workbook.xml`.
|
||||
|
||||
For each reported error cell, locate the `<c r="CELLREF">` element and examine:
|
||||
|
||||
**For `error_value` errors:**
|
||||
```xml
|
||||
<!-- This is what an error cell looks like in XML -->
|
||||
<c r="C12" t="e">
|
||||
<f>Q1!C10/Q1!C11</f>
|
||||
<v>#DIV/0!</v>
|
||||
</c>
|
||||
```
|
||||
|
||||
Ask:
|
||||
- Is the `<f>` formula syntactically correct?
|
||||
- Does the cell reference in the formula point to a row/column that exists?
|
||||
- If it is a division, is it possible the denominator cell is empty or zero?
|
||||
|
||||
**For `broken_sheet_ref` errors:**
|
||||
|
||||
Check `xl/workbook.xml` for the actual sheet list:
|
||||
|
||||
```xml
|
||||
<sheets>
|
||||
<sheet name="Summary" sheetId="1" r:id="rId1"/>
|
||||
<sheet name="Q1" sheetId="2" r:id="rId2"/>
|
||||
<sheet name="Q2" sheetId="3" r:id="rId3"/>
|
||||
</sheets>
|
||||
```
|
||||
|
||||
Sheet names are case-sensitive. `q1` and `Q1` are different sheets. Compare the name in the formula exactly against the names here.
|
||||
|
||||
### Step 3: Cross-sheet reference audit (multi-sheet workbooks)
|
||||
|
||||
For workbooks with 3 or more sheets, run a broader cross-reference audit after unpacking:
|
||||
|
||||
```bash
|
||||
# Extract all formulas containing cross-sheet references
|
||||
grep -ohE '<f[^>]*>[^<]*![^<]*</f>' /tmp/xlsx_inspect/xl/worksheets/*.xml
|
||||
|
||||
# List all actual sheet names from workbook.xml
|
||||
grep -oE '<sheet [^>]*name="[^"]*"' /tmp/xlsx_inspect/xl/workbook.xml
|
||||
```
|
||||
|
||||
Every sheet name appearing in formulas (in the form `SheetName!` or `'Sheet Name'!`) must appear in the workbook sheet list. If any do not match, that is a broken reference even if formula_check.py did not catch it (which can happen with shared formulas where only the primary cell is examined).
|
||||
|
||||
To check shared formulas specifically, look for `<f t="shared" ref="...">` elements:
|
||||
|
||||
```xml
|
||||
<!-- Shared formula: defined on D2, applied to D2:D100 -->
|
||||
<c r="D2"><f t="shared" ref="D2:D100" si="0">Q1!B2*C2</f><v></v></c>
|
||||
|
||||
<!-- Shared formula consumers: only si is present, no formula text -->
|
||||
<c r="D3"><f t="shared" si="0"/><v></v></c>
|
||||
```
|
||||
|
||||
formula_check.py reads the formula text from the primary cell (`D2` above). The referenced sheet `Q1` in that formula applies to the entire range `D2:D100`. If the sheet is broken, all 99 rows are broken even though they appear as empty `<f>` elements.
|
||||
|
||||
---
|
||||
|
||||
## Tier 2 — Dynamic Validation (LibreOffice Headless)
|
||||
|
||||
### Check LibreOffice availability
|
||||
|
||||
```bash
|
||||
# Check macOS (typical install location)
|
||||
which soffice
|
||||
/Applications/LibreOffice.app/Contents/MacOS/soffice --version
|
||||
|
||||
# Check Linux
|
||||
which libreoffice || which soffice
|
||||
libreoffice --version
|
||||
```
|
||||
|
||||
If neither command returns a path, LibreOffice is not installed. Record "Tier 2: SKIPPED — LibreOffice not available" in the report and proceed to delivery with Tier 1 results only.
|
||||
|
||||
### Install LibreOffice (if permitted in the environment)
|
||||
|
||||
macOS:
|
||||
```bash
|
||||
brew install --cask libreoffice
|
||||
```
|
||||
|
||||
Ubuntu/Debian:
|
||||
```bash
|
||||
sudo apt-get install -y libreoffice
|
||||
```
|
||||
|
||||
### Run headless recalculation
|
||||
|
||||
Use the dedicated recalculation script. It handles binary discovery across macOS and Linux, works from a temporary copy of the input (preserving the original), and provides structured output and exit codes compatible with the validation pipeline.
|
||||
|
||||
```bash
|
||||
# Check LibreOffice availability first
|
||||
python3 SKILL_DIR/scripts/libreoffice_recalc.py --check
|
||||
|
||||
# Run recalculation (default timeout: 60s)
|
||||
python3 SKILL_DIR/scripts/libreoffice_recalc.py /path/to/input.xlsx /tmp/recalculated.xlsx
|
||||
|
||||
# For large or complex files, extend the timeout
|
||||
python3 SKILL_DIR/scripts/libreoffice_recalc.py /path/to/input.xlsx /tmp/recalculated.xlsx --timeout 120
|
||||
```
|
||||
|
||||
Exit codes from `libreoffice_recalc.py`:
|
||||
- `0` — recalculation succeeded, output file written
|
||||
- `2` — LibreOffice not found (note as SKIPPED in report; not a hard failure)
|
||||
- `1` — LibreOffice found but failed (timeout, crash, malformed file)
|
||||
|
||||
**What the script does internally:**
|
||||
|
||||
LibreOffice's `--convert-to xlsx` command opens the file using the full Calc engine with the `--infilter="Calc MS Excel 2007 XML"` filter, executes every formula, writes computed values into the `<v>` cache elements, and saves the output. This is the closest server-side equivalent of "open in Excel and press Save." The script also passes `--norestore` to prevent LibreOffice from attempting to restore previous sessions, which can cause hangs in automated environments.
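The invocation described above can be sketched as a command builder. This is an assumption-laden illustration, not the contents of `libreoffice_recalc.py`: the function names are hypothetical, and the real script may pass different or additional options.

```python
import os
import shutil

# Typical macOS install location, as used in the availability check.
MAC_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def find_soffice():
    """Return the path to a LibreOffice binary, or None if none is found."""
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    if os.path.isfile(MAC_SOFFICE) and os.access(MAC_SOFFICE, os.X_OK):
        return MAC_SOFFICE
    return None


def build_recalc_command(soffice, input_path, out_dir):
    """Build the argument list for a headless convert-to-xlsx run."""
    return [
        soffice,
        "--headless",                            # no GUI
        "--norestore",                           # skip session recovery prompts
        "--infilter=Calc MS Excel 2007 XML",     # force the xlsx import filter
        "--convert-to", "xlsx",
        "--outdir", out_dir,
        input_path,
    ]
```

The list is meant for `subprocess.run(cmd, timeout=...)`. LibreOffice writes the converted file into `out_dir` under the input's base name, so running it on a temporary copy keeps the original untouched.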
|
||||
|
||||
**If LibreOffice is not installed:**
|
||||
|
||||
macOS:
|
||||
```bash
|
||||
brew install --cask libreoffice
|
||||
```
|
||||
|
||||
Ubuntu/Debian:
|
||||
```bash
|
||||
sudo apt-get install -y libreoffice
|
||||
```
|
||||
|
||||
**If the script times out (libreoffice_recalc.py exits with code 1 and "timed out" message):**
|
||||
|
||||
Record "Tier 2: TIMEOUT — LibreOffice did not complete within Ns" in the report. Do not retry in a loop. Investigate whether the file has circular references or extremely large data ranges.
|
||||
|
||||
### Re-run Tier 1 after recalculation
|
||||
|
||||
After LibreOffice recalculation, the `<v>` elements contain real computed values. Errors that were invisible before (because `<v>` was empty in a freshly generated file) now appear as `t="e"` cells with actual error strings.
|
||||
|
||||
```bash
|
||||
python3 SKILL_DIR/scripts/formula_check.py /tmp/recalculated.xlsx
|
||||
```
|
||||
|
||||
This second Tier 1 pass is the definitive runtime error check. Any errors it finds are real calculation failures that must be fixed.

---

## All 7 Error Types — Causes and Fix Strategies

### #REF! — Invalid Cell Reference

**What it means:** The formula references a cell, range, or sheet that no longer exists or never existed.

**Common causes in generated files:**
- Off-by-one error in row/column calculation (e.g., referencing row 0 which does not exist in Excel's 1-based system)
- Column letter computed incorrectly (e.g., column 64 maps to `BL`, not `BK`)
- Formula references a sheet that was never created or was renamed

**XML signature:**
```xml
<c r="D5" t="e">
  <f>Sheet2!A0</f>
  <v>#REF!</v>
</c>
```

**Fix — correct the reference:**
```xml
<c r="D5">
  <f>Sheet2!A1</f>
  <v></v>
</c>
```

Note: remove `t="e"` and clear `<v>` after correcting the formula. The error type marker belongs to the cached state, not the formula.

**Auto-fixable?** Only if the correct target can be determined with certainty from the surrounding context. Otherwise flag for human review.
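
Both off-by-one causes listed above usually come from the code that generates cell references. The following is a minimal sketch of a 1-based column converter that refuses row or column 0; `col_letter` and `cell_ref` are hypothetical helper names, not part of the skill's scripts.

```python
def col_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError(f"Excel columns are 1-based; got {index}")
    letters = ""
    while index > 0:
        # Subtract 1 first: Excel column letters have no zero digit
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_ref(row, col):
    """Build an A1-style reference from 1-based row and column numbers."""
    if row < 1:
        raise ValueError(f"Excel rows are 1-based; got {row}")
    return f"{col_letter(col)}{row}"


print(col_letter(64))   # the example from the text
print(cell_ref(5, 4))
```

Raising on index 0 at generation time surfaces the bug before a reference such as `A0` ever reaches the XML.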

---

### #DIV/0! — Division by Zero

**What it means:** The formula divides by a value that is zero or an empty cell (empty cells evaluate to 0 in arithmetic context).

**Common causes in generated files:**
- Percentage change formula `=(B2-B1)/B1` where `B1` is empty or zero
- Rate formula `=Value/Total` where the total row hasn't been populated yet

**XML signature:**
```xml
<c r="C8" t="e">
  <f>B8/B7</f>
  <v>#DIV/0!</v>
</c>
```

**Fix — wrap with IFERROR:**
```xml
<c r="C8">
  <f>IFERROR(B8/B7,0)</f>
  <v></v>
</c>
```

Alternative — explicit zero check:
```xml
<c r="C8">
  <f>IF(B7=0,0,B8/B7)</f>
  <v></v>
</c>
```

**Auto-fixable?** Yes. Wrapping with `IFERROR(...,0)` is safe for most financial formulas. If the business expectation is that the result should display as blank rather than zero, use `IFERROR(...,"")` instead.
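
If the wrap is applied programmatically, it helps to make it idempotent so a second validation pass does not nest `IFERROR` calls. A minimal sketch, assuming the formula text is taken from the `<f>` element (which stores no leading `=`); `wrap_iferror` is a hypothetical helper.

```python
def wrap_iferror(formula, default="0"):
    """Wrap a formula in IFERROR(...) unless it already starts with one.

    `default` is inserted verbatim: pass "0" for zero or '""' for blank.
    Heuristic only: a formula such as IFERROR(A1,0)+B1/C1 starts with
    IFERROR( and is left unchanged even though B1/C1 is unguarded.
    """
    body = formula.strip().lstrip("=").strip()
    if body.upper().startswith("IFERROR("):
        return body
    return f"IFERROR({body},{default})"


print(wrap_iferror("B8/B7"))
print(wrap_iferror("B8/B7", default='""'))
print(wrap_iferror("IFERROR(B8/B7,0)"))
```

The result still has to be XML-escaped before it is written back into `<f>` if it contains `<` or `&`.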

---

### #VALUE! — Wrong Data Type

**What it means:** The formula attempts an arithmetic or logical operation on a value of the wrong type (e.g., adding a text string to a number).

**Common causes in generated files:**
- A cell intended to hold a number was written as a string type (`t="s"` or `t="inlineStr"`) instead of a numeric type
- A formula references a cell containing text (e.g., a unit label like "thousands") and treats it as a number

**XML signature:**
```xml
<c r="F3" t="e">
  <f>E3+D3</f>
  <v>#VALUE!</v>
</c>
```

**Fix — check source cells for incorrect type:**

If `D3` was incorrectly written as a string:
```xml
<!-- Wrong: numeric value stored as string -->
<c r="D3" t="inlineStr"><is><t>1000</t></is></c>

<!-- Correct: numeric value stored as number (t attribute omitted or "n") -->
<c r="D3"><v>1000</v></c>
```

Alternatively, wrap the formula with `VALUE()` conversion:
```xml
<c r="F3">
  <f>VALUE(E3)+VALUE(D3)</f>
  <v></v>
</c>
```

**Auto-fixable?** Partially. If the source cell type is visibly wrong (a number stored as string), fix the type. If the cause is ambiguous (the cell is supposed to contain text), flag for human review.
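
The "visibly wrong" case can be detected mechanically: an inline string whose text is a plain decimal number. Below is a minimal sketch that rewrites such a cell in place and leaves genuine text alone. It covers `t="inlineStr"` only (cells with `t="s"` would need a lookup in `xl/sharedStrings.xml`), and `inline_string_to_number` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", NS_URI)  # serialize without an ns0: prefix

# Deliberately strict: plain integers and decimals only
_NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def inline_string_to_number(cell):
    """Turn <c t="inlineStr"><is><t>1000</t></is></c> into <c><v>1000</v></c>.

    Returns True if the cell was rewritten, False if it was left untouched.
    """
    if cell.get("t") != "inlineStr":
        return False
    text = (cell.findtext(f"{{{NS_URI}}}is/{{{NS_URI}}}t") or "").strip()
    if not _NUMERIC.fullmatch(text):
        return False  # genuine text such as "thousands": flag, do not convert
    for child in list(cell):
        cell.remove(child)
    del cell.attrib["t"]  # omitted t means numeric
    ET.SubElement(cell, f"{{{NS_URI}}}v").text = text
    return True


cell = ET.fromstring(
    f'<c xmlns="{NS_URI}" r="D3" t="inlineStr"><is><t>1000</t></is></c>'
)
inline_string_to_number(cell)
print(ET.tostring(cell, encoding="unicode"))
```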

---

### #NAME? — Unrecognized Name

**What it means:** The formula contains an identifier that Excel does not recognize: a misspelled function name, an undefined named range, or a function that is not available in the target Excel version.

**Common causes in generated files:**
- LLM misspells a function name, or uses a function that the target Excel version does not support (e.g., `XLOOKUP` in a workbook targeting Excel 2010)
- Named range referenced in formula does not exist in `xl/workbook.xml`

**XML signature:**
```xml
<c r="B2" t="e">
  <f>SUM(RevenuRange)</f>
  <v>#NAME?</v>
</c>
```

**Fix — verify function name and named ranges:**

Check named ranges in `xl/workbook.xml`:
```xml
<definedNames>
  <definedName name="RevenueRange">Sheet1!$B$2:$B$13</definedName>
</definedNames>
```

If the formula references `RevenuRange` (typo), correct it to `RevenueRange`:
```xml
<c r="B2">
  <f>SUM(RevenueRange)</f>
  <v></v>
</c>
```

**Auto-fixable?** Only if the correct name is unambiguous (e.g., a single close match exists). Otherwise flag for human review — function name fixes require understanding the intended calculation.
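
One way to operationalize "a single close match exists" is a string-similarity lookup against the names declared in `<definedNames>`. A minimal sketch using the standard library's `difflib`; the `0.8` cutoff and the helper name `suggest_defined_name` are assumptions for illustration, not values taken from formula_check.py.

```python
import difflib


def suggest_defined_name(unknown, defined_names, cutoff=0.8):
    """Return the single close match for `unknown`, or None if there are
    zero or several candidates (ambiguous cases go to human review)."""
    matches = difflib.get_close_matches(unknown, defined_names, n=2, cutoff=cutoff)
    return matches[0] if len(matches) == 1 else None


names = ["RevenueRange", "CostRange"]
print(suggest_defined_name("RevenuRange", names))

# Two similar names: refuse to guess
print(suggest_defined_name("RevenuRange", ["RevenueRange", "RevenueRanges"]))
```

Asking for up to two matches is what makes the ambiguity visible; requesting only the best match would hide the second candidate.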

---

### #N/A — Value Not Available

**What it means:** A lookup function (VLOOKUP, HLOOKUP, MATCH, INDEX/MATCH, XLOOKUP) searched for a value that does not exist in the lookup table.

**Common causes in generated files:**
- Lookup key exists in the formula but the lookup table is empty or not yet populated
- Key format mismatch (text "2024" vs numeric 2024)

**XML signature:**
```xml
<c r="G5" t="e">
  <f>VLOOKUP(F5,Assumptions!$A$2:$B$20,2,0)</f>
  <v>#N/A</v>
</c>
```

**Fix — wrap with IFERROR for missing-match tolerance:**
```xml
<c r="G5">
  <f>IFERROR(VLOOKUP(F5,Assumptions!$A$2:$B$20,2,0),0)</f>
  <v></v>
</c>
```

**Auto-fixable?** Adding `IFERROR` is safe if a zero default is acceptable. If the lookup failure indicates a data integrity problem (the key should always be present), do not auto-fix — flag for human review.

---

### #NULL! — Empty Intersection

**What it means:** The space operator (which computes the intersection of two ranges) was applied to two ranges that do not intersect.

**Common causes in generated files:**
- Accidental space between two range references: `=SUM(A1:A5 C1:C5)` instead of `=SUM(A1:A5,C1:C5)`
- Rarely seen in typical financial models; usually indicates a formula generation error

**XML signature:**
```xml
<c r="H10" t="e">
  <f>SUM(A1:A5 C1:C5)</f>
  <v>#NULL!</v>
</c>
```

**Fix — replace space with comma (union) or colon (range):**
```xml
<!-- Union of two separate ranges -->
<c r="H10">
  <f>SUM(A1:A5,C1:C5)</f>
  <v></v>
</c>
```

**Auto-fixable?** Yes. The space operator is almost never intentional in generated formulas. Replacing with a comma is safe.
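
The replacement can be sketched as a regular expression that only touches whitespace sitting directly between two A1-style references. This is a simplified illustration: it does not handle sheet-qualified references or skip quoted string literals, and `replace_space_operator` is a hypothetical helper.

```python
import re

# A1-style cell or range, with optional $ anchors (no sheet prefix)
_REF = r"\$?[A-Z]{1,3}\$?\d+(?::\$?[A-Z]{1,3}\$?\d+)?"

# A reference, then spaces, then (lookahead) another reference
_SPACE_BETWEEN_REFS = re.compile(rf"({_REF}) +(?={_REF})")


def replace_space_operator(formula):
    """Replace the intersection (space) operator between two references
    with the union operator (comma)."""
    return _SPACE_BETWEEN_REFS.sub(r"\1,", formula)


print(replace_space_operator("SUM(A1:A5 C1:C5)"))
print(replace_space_operator("SUM(A1:A5, C1:C5)"))  # already a union: unchanged
```

Spaces that follow a comma or surround an arithmetic operator are left alone, because the pattern requires a reference on both sides of the gap.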

---

### #NUM! — Numeric Error

**What it means:** A formula produced a number that Excel cannot represent (overflow, underflow) or a mathematical operation that has no real-number result (square root of negative, LOG of zero or negative).

**Common causes in generated files:**
- `IRR` (or another iterative function such as `RATE` or `XIRR`) where the cash flow series has no convergent solution
- `SQRT()` applied to a cell that can be negative
- Very large exponentiation

**XML signature:**
```xml
<c r="J15" t="e">
  <f>IRR(B5:B15)</f>
  <v>#NUM!</v>
</c>
```

**Fix — add a conditional guard:**
```xml
<c r="J15">
  <f>IFERROR(IRR(B5:B15),"")</f>
  <v></v>
</c>
```

For SQRT:
```xml
<c r="K5">
  <f>IF(A5>=0,SQRT(A5),"")</f>
  <v></v>
</c>
```

**Auto-fixable?** Partially. Wrapping with `IFERROR` suppresses the error display but does not fix the underlying calculation issue. Flag the cell for human review even after applying the IFERROR wrapper.

---

## Auto-Fix vs. Human Review Decision Matrix

| Error Type | Auto-Fix Safe? | Condition | Action |
|------------|----------------|-----------|--------|
| `#DIV/0!` | Yes | Always | Wrap with `IFERROR(formula,0)` |
| `#NULL!` | Yes | Always | Replace space operator with comma |
| `#REF!` | Conditional | Only if correct target is unambiguous from context | Correct reference; otherwise flag |
| `#NAME?` | Conditional | Only if typo has exactly one plausible correction | Fix name; otherwise flag |
| `#N/A` | Conditional | If a zero/blank default is business-acceptable | Add IFERROR wrapper; document assumption |
| `#VALUE!` | Conditional | Only if source cell type is clearly wrong | Fix type; otherwise flag |
| `#NUM!` | No | Always | Add IFERROR to suppress display, then flag |
| Broken sheet ref | Conditional | Only if renamed sheet can be identified from workbook.xml | Correct name |
| Business logic errors | Never | Any case | Human review only |

**What counts as a business logic error (never auto-fix):**
- A formula that produces a wrong number but no Excel error (e.g., `=SUM(B2:B8)` when the intent was `=SUM(B2:B9)`)
- A formula where the IFERROR default value is meaningful (e.g., whether to use 0, blank, or a prior-period value)
- Any formula where fixing the error requires knowing what the formula was supposed to calculate
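
The decision matrix can be encoded as a small lookup so that every error is dispatched the same way. A minimal sketch, assuming the caller has already decided whether the row's condition holds (`condition_met`); the policy labels and the function name `disposition` are hypothetical.

```python
# One entry per row of the matrix; anything unknown falls through to review
POLICY = {
    "#DIV/0!": "auto",
    "#NULL!": "auto",
    "#REF!": "conditional",
    "#NAME?": "conditional",
    "#N/A": "conditional",
    "#VALUE!": "conditional",
    "broken_sheet_ref": "conditional",
    "#NUM!": "suppress_and_flag",
}


def disposition(error_type, condition_met=False):
    """Map an error type to the action the matrix prescribes."""
    policy = POLICY.get(error_type, "flag")
    if policy == "auto":
        return "auto-fix"
    if policy == "conditional":
        return "auto-fix" if condition_met else "flag for human review"
    if policy == "suppress_and_flag":
        return "wrap with IFERROR, then flag for human review"
    return "flag for human review"


for err in ("#DIV/0!", "#REF!", "#NUM!", "business_logic"):
    print(err, "->", disposition(err))
```

Defaulting unknown types to human review keeps the conservative behavior the matrix asks for when a new category appears.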

---

## Delivery Standard — Validation Report

Every validation task must produce a structured report. This report is the deliverable, regardless of whether errors were found.

### Required report format

```markdown
## Formula Validation Report

**File**: /path/to/filename.xlsx
**Date**: YYYY-MM-DD
**Sheets checked**: Sheet1, Sheet2, Sheet3
**Total formulas scanned**: N

---

### Tier 1 — Static Validation

**Status**: PASS / FAIL
**Tool**: formula_check.py (direct XML scan)

| Sheet | Cell | Error Type | Detail | Fix Applied |
|-------|------|-----------|--------|-------------|
| Summary | C12 | #REF! | Formula: Q1!A0 | Corrected to Q1!A1 |
| Summary | D15 | broken_sheet_ref | References missing sheet 'Q5' | Renamed to Q4 |

_(If no errors: "No errors detected.")_

---

### Tier 2 — Dynamic Validation

**Status**: PASS / FAIL / SKIPPED
**Tool**: LibreOffice headless (version X.Y.Z) / Not available

_(If SKIPPED: state the reason — LibreOffice not installed, timeout, etc.)_

| Sheet | Cell | Error Type | Detail | Fix Applied |
|-------|------|-----------|--------|-------------|
| Q1 | F8 | #DIV/0! | Formula: C8/C7 | Wrapped with IFERROR |

_(If no errors: "No runtime errors detected after recalculation.")_

---

### Summary

- **Total errors found**: N
- **Auto-fixed**: N (list types)
- **Flagged for human review**: N (list cells and reason)
- **Final status**: PASS (ready for delivery) / FAIL (blocked)

### Human Review Required

| Cell | Error | Reason Auto-Fix Not Applied |
|------|-------|----------------------------|
| Q2!B15 | #NUM! | IRR formula — business must confirm cash flow inputs |
```

### Minimum required fields

The report is invalid (and delivery is blocked) if any of these are missing:
- File path and date
- Which sheets were checked
- Total formula count
- Tier 1 status with explicit PASS/FAIL
- Tier 2 status with explicit PASS/FAIL/SKIPPED and reason if SKIPPED
- For every error: sheet, cell, error type, and disposition (fixed or flagged)
- Final delivery status
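
If the report is assembled from structured data before being rendered to Markdown, the minimum-field rule can be checked mechanically. A minimal sketch under that assumption; the dictionary keys (`file`, `tier2_skip_reason`, and so on) and the function `report_problems` are hypothetical and not a schema defined by this skill.

```python
REQUIRED_FIELDS = (
    "file", "date", "sheets_checked", "total_formulas",
    "tier1_status", "tier2_status", "final_status",
)


def report_problems(report):
    """Return a list of reasons the report is invalid (empty list = valid)."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if report.get(name) in (None, "", [])  # a count of 0 is still present
    ]
    if report.get("tier1_status") not in ("PASS", "FAIL"):
        problems.append("tier1_status must be PASS or FAIL")
    tier2 = report.get("tier2_status")
    if tier2 not in ("PASS", "FAIL", "SKIPPED"):
        problems.append("tier2_status must be PASS, FAIL, or SKIPPED")
    elif tier2 == "SKIPPED" and not report.get("tier2_skip_reason"):
        problems.append("tier2_status is SKIPPED but no reason is given")
    for i, err in enumerate(report.get("errors", [])):
        for key in ("sheet", "cell", "error_type", "disposition"):
            if not err.get(key):
                problems.append(f"errors[{i}] is missing {key}")
    return problems


example = {
    "file": "/path/to/filename.xlsx", "date": "2024-01-31",
    "sheets_checked": ["Summary", "Q1"], "total_formulas": 42,
    "tier1_status": "PASS", "tier2_status": "SKIPPED",
    "final_status": "PASS", "errors": [],
}
print(report_problems(example))
```

The example is intentionally incomplete: Tier 2 is marked SKIPPED without a reason, so the check reports one problem and delivery would be blocked.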

---

## Common Scenarios

### Scenario 1: Validate immediately after creating a new file

When `create.md` workflow produces a new xlsx, run validation before any delivery response.

```bash
# Step 1: Static check on the freshly written file
python3 SKILL_DIR/scripts/formula_check.py /path/to/output.xlsx

# Step 2: Dynamic check (if LibreOffice available)
python3 SKILL_DIR/scripts/libreoffice_recalc.py /path/to/output.xlsx /tmp/recalculated.xlsx
python3 SKILL_DIR/scripts/formula_check.py /tmp/recalculated.xlsx
```

Expected behavior on a freshly created file: Tier 1 will find zero `error_value` errors (because `<v>` elements are empty, not error-valued). It will find any broken cross-sheet references if sheet names were misspelled. Tier 2 will populate `<v>` and reveal runtime errors like `#DIV/0!`.

If Tier 2 reveals errors, fix them in the source XML (not the recalculated copy), repack, and re-run both tiers.

### Scenario 2: Validate after editing an existing file

When `edit.md` workflow modifies an existing xlsx, validate only the affected sheets if the edit was surgical. If the edit touched shared formulas or cross-sheet references, validate all sheets.

```bash
# Targeted static check — look at specific sheet
# (formula_check.py checks all sheets; examine only the relevant section of output)
python3 SKILL_DIR/scripts/formula_check.py /path/to/edited.xlsx --json \
  | python3 -c "
import json, sys
r = json.load(sys.stdin)
for e in r['errors']:
    if e.get('sheet') in ['Summary', 'Q1']:
        print(e)
"
```

Always run Tier 2 after edits that modify formulas, even if Tier 1 passes. Edits to data ranges can cause previously-valid formulas to produce runtime errors.

### Scenario 3: User provides a file with suspected formula errors

When a user submits a file and reports wrong values or visible errors:

```bash
# Step 1: Static scan — find all error cells
python3 SKILL_DIR/scripts/formula_check.py /path/to/user_file.xlsx --json > /tmp/validation_results.json

# Step 2: Unpack for manual inspection
python3 SKILL_DIR/scripts/xlsx_unpack.py /path/to/user_file.xlsx /tmp/xlsx_inspect/

# Step 3: Dynamic recalculation
python3 SKILL_DIR/scripts/libreoffice_recalc.py /path/to/user_file.xlsx /tmp/user_file_recalc.xlsx

# Step 4: Re-validate recalculated file
python3 SKILL_DIR/scripts/formula_check.py /tmp/user_file_recalc.xlsx --json > /tmp/validation_after_recalc.json

# Step 5: Compare before and after
python3 - <<'EOF'
import json
before = json.load(open("/tmp/validation_results.json"))
after = json.load(open("/tmp/validation_after_recalc.json"))
print(f"Before recalc: {before['error_count']} errors")
print(f"After recalc:  {after['error_count']} errors")
EOF
```

If errors appear only after recalculation (not in the original static scan), the formulas were syntactically correct but evaluate to error values at runtime. These are runtime errors that require formula-level fixes, not XML-structure fixes.

If errors appear in both scans, they were already cached in `<v>` before recalculation — the file was previously opened by Excel/LibreOffice and the errors persisted.

---

## Critical Pitfalls

**Pitfall 1: openpyxl `data_only=True` destroys formulas.**
Opening a workbook with `data_only=True` reads cached values instead of formulas. If you then save the workbook, all `<f>` elements are permanently removed and replaced with their last-cached values. Never use this mode for validation workflows.

**Pitfall 2: Empty `<v>` is not the same as a passing formula.**
A freshly generated file has empty `<v>` elements for all formula cells. formula_check.py will not report these as errors — they are not yet errors. They become errors only after recalculation if the calculated value is an error type. This is why Tier 2 is mandatory.

**Pitfall 3: Shared formula errors affect the entire range.**
If a shared formula's primary cell has a broken reference, every cell in the shared range (`ref="D2:D100"`) inherits that broken reference. The count of logical errors can be much larger than the count of distinct error entries in formula_check.py output. When fixing a broken shared formula, fix the primary cell's `<f t="shared" ref="...">` element; the consumers (`<f t="shared" si="N"/>`) automatically inherit the corrected formula.

**Pitfall 4: formula_check.py compares sheet names case-sensitively.**
Excel treats `=q1!B5` and `=Q1!B5` as the same reference, but formula_check.py's string comparison is case-sensitive. If a formula uses a lowercase sheet name that matches an uppercase sheet in the workbook, it will be flagged as a broken reference. The fix is to match the exact case in `workbook.xml`.
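
Finding the exact-case name to substitute is a case-insensitive lookup against the sheet names declared in `workbook.xml`. A minimal sketch; `resolve_sheet_name` is a hypothetical helper and the sheet list is example data.

```python
def resolve_sheet_name(name, sheet_names):
    """Return the workbook's exact-case spelling of `name`, or None if no
    sheet (or more than one) matches when case is ignored."""
    matches = [s for s in sheet_names if s.casefold() == name.casefold()]
    return matches[0] if len(matches) == 1 else None


sheets = ["Summary", "Q1", "Q2"]
print(resolve_sheet_name("q1", sheets))  # exact-case name to write into the formula
print(resolve_sheet_name("Q5", sheets))  # no such sheet: a genuinely broken reference
```

A `None` result means the flagged reference is not a casing issue and should be handled as a real broken sheet reference.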

**Pitfall 5: `--convert-to xlsx` does not guarantee formula preservation.**
LibreOffice's conversion can occasionally alter certain formula types (array formulas, dynamic array functions like `SORT`, `UNIQUE`). After Tier 2, if the recalculated file shows formula changes unrelated to error fixing, do not deliver the recalculated file directly — use the original file with targeted XML fixes instead.