# Build New xlsx from Scratch
Create new, production-quality xlsx files using the XML approach. NEVER use openpyxl
for writing. NEVER hardcode Python-computed values — every derived number must be a
live Excel formula.
---
## When to Use This Path
Use this document when the user wants:
- A brand-new Excel file that does not yet exist
- A generated report, financial model, or data table
- Any "create / build / generate / make" request
If the user provides an existing file to modify, switch to `edit.md` instead.
---
## The Non-Negotiable Rules
Before touching any file, internalize these four rules:
1. **Formula-First**: Every calculated value (`SUM`, growth rate, ratio, subtotal, etc.)
MUST be written as `SUM(B2:B9)`, not as a hardcoded `5000`. Hardcoded
numbers go stale when source data changes. Only raw inputs and assumption parameters
may be hardcoded values.
2. **No openpyxl for writing**: The entire file is built by editing XML directly. Python
is only allowed for reading/analysis (`pandas.read_excel()`) and for running helper
scripts (`xlsx_pack.py`, `formula_check.py`).
3. **Style encodes meaning**: Blue font = user input/assumption. Black font = formula
result. Green font = cross-sheet reference. See `format.md` for the full color system
and style index table.
4. **Validate before delivery**: Run `formula_check.py` and fix all errors before
handing the file to the user.
---
## Complete Creation Workflow
### Step 1 — Plan Before Writing
Define the full structure on paper before touching any XML:
- **Sheets**: names, order, purpose (e.g., Assumptions / Model / Summary)
- **Layout per sheet**: which rows are headers, inputs, formulas, totals
- **String inventory**: collect all text labels you will need in sharedStrings
- **Style choices**: what number format each column needs (currency, %, integer, year)
- **Cross-sheet links**: which sheets pull data from other sheets
This planning step prevents the costly cycle of adding strings to sharedStrings
mid-way and recomputing all indices.
---
### Step 2 — Copy Minimal Template
```bash
cp -r SKILL_DIR/templates/minimal_xlsx/ /tmp/xlsx_work/
```
The template gives you a complete, valid 7-file xlsx skeleton:
```
/tmp/xlsx_work/
├── [Content_Types].xml ← MIME type registry
├── _rels/
│ └── .rels ← root relationship (points to workbook.xml)
└── xl/
├── workbook.xml ← sheet list and calc settings
├── styles.xml ← 13 pre-built financial style slots
├── sharedStrings.xml ← text string table (starts empty)
├── _rels/
│ └── workbook.xml.rels ← maps rId → file paths
└── worksheets/
└── sheet1.xml ← one empty sheet
```
After copying, rename sheets and add content. Do not create files from scratch —
always start from the template.
---
### Step 3 — Configure Sheet Structure
#### Single-Sheet Workbook
The template already has one sheet named "Sheet1". Just change the `name` attribute
in `xl/workbook.xml`:
```xml
```
No other files need to change for a single-sheet workbook.
#### Multi-Sheet Workbook
Four files must be kept in sync. Work through them in this order:
**IMPORTANT — rId collision rule**: In the template's `workbook.xml.rels`, the IDs
`rId1`, `rId2`, and `rId3` are already taken:
- `rId1` → `worksheets/sheet1.xml`
- `rId2` → `styles.xml`
- `rId3` → `sharedStrings.xml`
New worksheet entries MUST start at `rId4` and count upward.
**File 1 of 4 — `xl/workbook.xml`** (sheet list):
```xml
```
Special characters in sheet names:
- `&` → `&` in XML: ``
- Max 31 characters
- Forbidden: `/ \ ? * [ ] :`
- Sheet names with spaces need single quotes in formula references: `'Q1 Data'!B5`
**File 2 of 4 — `xl/_rels/workbook.xml.rels`** (ID → file mapping):
```xml
```
**File 3 of 4 — `[Content_Types].xml`** (MIME type declarations):
```xml
```
**File 4 of 4 — Create new worksheet XML files**
Copy `sheet1.xml` to `sheet2.xml` and `sheet3.xml`, then clear the `` content:
```xml
```
**Sync checklist** — every time you add a sheet, verify all four are consistent:
| Check | What to verify |
|-------|---------------|
| `workbook.xml` | New `` exists |
| `workbook.xml.rels` | New `` exists |
| `[Content_Types].xml` | New `` exists |
| Filesystem | `xl/worksheets/sheetN.xml` file actually exists |
---
### Step 4 — Populate sharedStrings
All text values (headers, row labels, category names, any string the user will read)
must be stored in `xl/sharedStrings.xml`. Cells reference them by 0-based index.
**Recommended workflow**: collect ALL text you need first, write the complete table once,
then fill in indices while writing worksheet XML. This avoids re-counting indices mid-way.
```xml
Item
FY2023A
FY2024E
FY2025E
YoY Growth
Revenue
Cost of Goods Sold
Gross Profit
EBITDA
Net Income
```
**Attribute rules**:
- `uniqueCount` = number of `` elements (unique strings in the table)
- `count` = total number of cell references to strings across the entire workbook
(if "Revenue" appears in 3 sheets, count is `uniqueCount + 2`)
- For new files where each string appears once, `count == uniqueCount`
- Both attributes MUST be accurate — wrong values trigger warnings in some Excel versions
**Special character escaping**:
```xml
R&D Expenses
Revenue < Target
(note)
```
**Helper script**: use `shared_strings_builder.py` to generate the complete
`sharedStrings.xml` from a plain list of strings:
```bash
python3 SKILL_DIR/scripts/shared_strings_builder.py \
"Item" "FY2024" "FY2025" "Revenue" "Gross Profit" \
> /tmp/xlsx_work/xl/sharedStrings.xml
```
Or interactively from a file listing one string per line:
```bash
python3 SKILL_DIR/scripts/shared_strings_builder.py --file strings.txt \
> /tmp/xlsx_work/xl/sharedStrings.xml
```
---
### Step 5 — Write Worksheet Data
Edit each `xl/worksheets/sheetN.xml`. Replace the empty `` with rows
and cells.
#### Cell XML Anatomy
```
↑ ↑ ↑
address type style index (from cellXfs in styles.xml)
3
↑
value (for t="s": sharedStrings index; for numbers: the number itself)
```
#### Data Type Reference
| Data | `t` attr | XML Example | Notes |
|------|---------|-------------|-------|
| Shared string (text) | `s` | `0` | `` = sharedStrings index |
| Number | omit | `1000000` | default type, `t` omitted |
| Percentage (as decimal) | omit | `0.125` | 12.5% stored as 0.125 |
| Boolean | `b` | `1` | 1=TRUE, 0=FALSE |
| Formula | omit | `SUM(B2:B3)` | `` left empty |
| Cross-sheet formula | omit | `Assumptions!B2` | use s=3 (green) |
#### A Full Sheet Data Example
```xml
0
1
2
3
4
5
85000000
B2*(1+Assumptions!C3)
C2*(1+Assumptions!D3)
D2/C2-1
7
B2*Assumptions!B4
C2*Assumptions!C4
D2*Assumptions!D4
D3/C3-1
8
SUM(B2:B4)
SUM(C2:C4)
SUM(D2:D4)
D5/C5-1
```
#### Column Width and Freeze Pane
Column widths go **before** ``, freeze pane goes inside ``:
```xml
```
---
### Step 6 — Apply Styles
The template's `xl/styles.xml` has 13 pre-built semantic style slots (indices 0–12).
**Read `format.md` for the complete style index table, color system, and how to add new styles.**
Quick reference for the most common slots:
| `s` | Role | Example |
|-----|------|---------|
| 4 | Header (bold) | Column/row titles |
| 5 / 6 | Currency input (blue) / formula (black) | `$#,##0` |
| 7 / 8 | Percentage input / formula | `0.0%` |
| 11 | Year (no comma) | 2024 not 2,024 |
Design principle: Blue = human sets this. Black = Excel computes this. Green = cross-sheet.
If you need a style not in the 13 pre-built slots, follow the append-only procedure in `format.md` section 3.2.
---
### Step 7 — Formula Cookbook
#### XML Formula Syntax Reminder
Formulas in XML have **no leading `=`**:
```xml
SUM(B2:B9)
```
#### Basic Aggregations
```xml
SUM(B2:B9)
AVERAGE(B2:B9)
COUNT(B2:B9)
COUNTA(A2:A100)
MAX(B2:B9)
MIN(B2:B9)
```
#### Financial Calculations
```xml
D5/C5-1
B4*B3
B8/B4
IF(C5=0,0,D5/C5-1)
NPV(B1,B3:B7)+B2
IRR(B2:B7)
```
#### Cross-Sheet References
```xml
Assumptions!B5
'Q1 Data'!B5
'R&D'!B5
SUM(Data!C2:C1000)
SUM(Jan:Dec!B5)
```
Cross-sheet formula cells should use `s="3"` (green) to signal the data origin.
#### Shared Formulas (Same Pattern Repeated Down a Column)
When many consecutive cells share the same formula structure with only the row number
changing, use shared formulas to keep the XML compact:
```xml
C2/B2-1
```
Excel adjusts relative references automatically (D3 computes `C3/B3-1`, etc.).
If you have multiple shared formula groups, assign sequential `si` values (0, 1, 2, …).
#### Absolute References
```xml
B5/$B$2
```
The `$` character needs no XML escaping — write it literally.
#### Lookup Formulas
```xml
VLOOKUP(A5,Assumptions!A:C,2,0)
INDEX(B:B,MATCH(A5,A:A,0))
XLOOKUP(A5,A:A,B:B)
```
---
### Step 8 — Pack and Validate
**Pack**:
```bash
python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ /path/to/output.xlsx
```
`xlsx_pack.py` will:
1. Check that `[Content_Types].xml` exists at the root
2. Parse every `.xml` and `.rels` file for well-formedness — abort if any fail
3. Create the ZIP archive with correct compression
**Validate**:
```bash
python3 SKILL_DIR/scripts/formula_check.py /path/to/output.xlsx
```
`formula_check.py` will:
1. Scan every cell for `` entries (cached error values) — all 7 error types
2. Extract sheet name references from every `` formula
3. Verify each referenced sheet exists in `workbook.xml`
Fix every reported error before delivery. Exit code 0 = safe to deliver.
---
## Pre-Delivery Checklist
Run through this list before handing the file to the user:
- [ ] `formula_check.py` reports 0 errors
- [ ] Every calculated cell has `` — not just `` with a number
- [ ] `sharedStrings.xml` `count` and `uniqueCount` match actual `` count
- [ ] Every cell `s` attribute value is in range `0` to `cellXfs count - 1`
- [ ] Every sheet in `workbook.xml` has a matching entry in `workbook.xml.rels`
- [ ] Every `worksheets/sheetN.xml` file has a matching `` in `[Content_Types].xml`
- [ ] Year columns use `s="11"` (format `0`, no thousands separator)
- [ ] Cross-sheet reference formulas use `s="3"` (green font)
- [ ] Assumption inputs use `s="1"` or `s="5"` or `s="7"` (blue font)
---
## Common Mistakes and Fixes
| Mistake | Symptom | Fix |
|---------|---------|-----|
| Formula has leading `=` | Cell shows `=SUM(...)` as text | Remove `=` from `` content |
| sharedStrings `count` not updated | Excel warning or blank cells | Count `` elements, update both `count` and `uniqueCount` |
| Style index out of range | File corruption / Excel repair | Ensure `s` < `cellXfs count`; append new `` if needed |
| New sheet rId conflicts with styles/sharedStrings rId | Sheet missing or styles lost | New sheets use rId4, rId5, … (rId1-3 are reserved in template) |
| Sheet name has `&` unescaped in XML | XML parse error | Use `&` in `workbook.xml` name attribute |
| Cross-sheet ref to sheet with space, no quotes | `#REF!` error | Wrap sheet name in single quotes: `'Sheet Name'!B5` |
| Cross-sheet ref to non-existent sheet | `#REF!` error | Check `workbook.xml` sheet list vs formula |
| Number stored as text (`t="s"`) | Left-aligned, can't sum | Remove `t` attribute from number cells |
| Year displayed as `2,024` | Readability issue | Use `s="11"` (numFmtId=1, format `0`) |
| Hardcoded Python result instead of formula | "Dead table" — won't update | Replace `N` with `formula` |
---
## Column Letter Reference
| Col # | Letter | Col # | Letter | Col # | Letter |
|-------|--------|-------|--------|-------|--------|
| 1 | A | 26 | Z | 27 | AA |
| 28 | AB | 52 | AZ | 53 | BA |
| 54 | BB | 78 | BZ | 79 | CA |
Python conversion (use when building formulas programmatically):
```python
def col_letter(n: int) -> str:
"""Convert 1-based column number to Excel letter (A, B, ..., Z, AA, AB, ...)."""
result = ""
while n > 0:
n, rem = divmod(n - 1, 26)
result = chr(65 + rem) + result
return result
def col_number(s: str) -> int:
"""Convert Excel column letter to 1-based number."""
n = 0
for c in s.upper():
n = n * 26 + (ord(c) - 64)
return n
```
---
## Typical Scenario Walkthroughs
### Scenario A — Three-Year Financial Model (Single Sheet)
Layout: rows 1-12 = Assumptions (blue inputs) / rows 14-30 = Model (black formulas).
```xml
Metric
FY2023A
FY2024E
FY2025E
Revenue Growth
Gross Margin
Revenue
Gross Profit
0
1
2
3
4
0
0.12
0.15
5
0.45
0.46
0.47
6
85000000
B14*(1+C2)
C14*(1+D2)
7
B14*B3
C14*C3
D14*D3
```
### Scenario B — Data + Summary (Two Sheets)
The `Summary` sheet pulls from `Data` using cross-sheet formulas (green, `s="3"`):
```xml
0
1
2
SUM(Data!C2:C10000)
3
COUNTA(Data!A2:A10000)
4
IF(B3=0,0,B2/B3)
```
### Scenario C — Multi-Department Consolidation
`Consolidated` sheet sums the same cells from multiple department sheets:
```xml
0
Dept_Engineering!B5+Dept_Marketing!B5
1
SUM(Dept_Engineering!B6,Dept_Marketing!B6)
```
---
## What You Must NOT Do
- Do NOT use openpyxl or any Python library to write the final xlsx file
- Do NOT hardcode any calculated value — use `` formulas for every derived number
- Do NOT deliver without running `formula_check.py` first
- Do NOT set a cell's `s` attribute to a value >= `cellXfs count`
- Do NOT modify an existing `` entry in `styles.xml` — only append new ones
- Do NOT add a new sheet without updating all four sync points (workbook.xml,
workbook.xml.rels, [Content_Types].xml, actual .xml file)
- Do NOT assign new worksheet rIds that overlap with rId1, rId2, or rId3 (reserved
for sheet1, styles, sharedStrings in the template)