# Build New xlsx from Scratch Create new, production-quality xlsx files using the XML approach. NEVER use openpyxl for writing. NEVER hardcode Python-computed values — every derived number must be a live Excel formula. --- ## When to Use This Path Use this document when the user wants: - A brand-new Excel file that does not yet exist - A generated report, financial model, or data table - Any "create / build / generate / make" request If the user provides an existing file to modify, switch to `edit.md` instead. --- ## The Non-Negotiable Rules Before touching any file, internalize these four rules: 1. **Formula-First**: Every calculated value (`SUM`, growth rate, ratio, subtotal, etc.) MUST be written as `SUM(B2:B9)`, not as a hardcoded `5000`. Hardcoded numbers go stale when source data changes. Only raw inputs and assumption parameters may be hardcoded values. 2. **No openpyxl for writing**: The entire file is built by editing XML directly. Python is only allowed for reading/analysis (`pandas.read_excel()`) and for running helper scripts (`xlsx_pack.py`, `formula_check.py`). 3. **Style encodes meaning**: Blue font = user input/assumption. Black font = formula result. Green font = cross-sheet reference. See `format.md` for the full color system and style index table. 4. **Validate before delivery**: Run `formula_check.py` and fix all errors before handing the file to the user. --- ## Complete Creation Workflow ### Step 1 — Plan Before Writing Define the full structure on paper before touching any XML: - **Sheets**: names, order, purpose (e.g., Assumptions / Model / Summary) - **Layout per sheet**: which rows are headers, inputs, formulas, totals - **String inventory**: collect all text labels you will need in sharedStrings - **Style choices**: what number format each column needs (currency, %, integer, year) - **Cross-sheet links**: which sheets pull data from other sheets This planning step prevents the costly cycle of adding strings to sharedStrings mid-way and recomputing all indices. --- ### Step 2 — Copy Minimal Template ```bash cp -r SKILL_DIR/templates/minimal_xlsx/ /tmp/xlsx_work/ ``` The template gives you a complete, valid 7-file xlsx skeleton: ``` /tmp/xlsx_work/ ├── [Content_Types].xml ← MIME type registry ├── _rels/ │ └── .rels ← root relationship (points to workbook.xml) └── xl/ ├── workbook.xml ← sheet list and calc settings ├── styles.xml ← 13 pre-built financial style slots ├── sharedStrings.xml ← text string table (starts empty) ├── _rels/ │ └── workbook.xml.rels ← maps rId → file paths └── worksheets/ └── sheet1.xml ← one empty sheet ``` After copying, rename sheets and add content. Do not create files from scratch — always start from the template. --- ### Step 3 — Configure Sheet Structure #### Single-Sheet Workbook The template already has one sheet named "Sheet1". Just change the `name` attribute in `xl/workbook.xml`: ```xml ``` No other files need to change for a single-sheet workbook. #### Multi-Sheet Workbook Four files must be kept in sync. Work through them in this order: **IMPORTANT — rId collision rule**: In the template's `workbook.xml.rels`, the IDs `rId1`, `rId2`, and `rId3` are already taken: - `rId1` → `worksheets/sheet1.xml` - `rId2` → `styles.xml` - `rId3` → `sharedStrings.xml` New worksheet entries MUST start at `rId4` and count upward. **File 1 of 4 — `xl/workbook.xml`** (sheet list): ```xml ``` Special characters in sheet names: - `&` → `&` in XML: `` - Max 31 characters - Forbidden: `/ \ ? * [ ] :` - Sheet names with spaces need single quotes in formula references: `'Q1 Data'!B5` **File 2 of 4 — `xl/_rels/workbook.xml.rels`** (ID → file mapping): ```xml ``` **File 3 of 4 — `[Content_Types].xml`** (MIME type declarations): ```xml ``` **File 4 of 4 — Create new worksheet XML files** Copy `sheet1.xml` to `sheet2.xml` and `sheet3.xml`, then clear the `` content: ```xml ``` **Sync checklist** — every time you add a sheet, verify all four are consistent: | Check | What to verify | |-------|---------------| | `workbook.xml` | New `` exists | | `workbook.xml.rels` | New `` exists | | `[Content_Types].xml` | New `` exists | | Filesystem | `xl/worksheets/sheetN.xml` file actually exists | --- ### Step 4 — Populate sharedStrings All text values (headers, row labels, category names, any string the user will read) must be stored in `xl/sharedStrings.xml`. Cells reference them by 0-based index. **Recommended workflow**: collect ALL text you need first, write the complete table once, then fill in indices while writing worksheet XML. This avoids re-counting indices mid-way. ```xml Item FY2023A FY2024E FY2025E YoY Growth Revenue Cost of Goods Sold Gross Profit EBITDA Net Income ``` **Attribute rules**: - `uniqueCount` = number of `` elements (unique strings in the table) - `count` = total number of cell references to strings across the entire workbook (if "Revenue" appears in 3 sheets, count is `uniqueCount + 2`) - For new files where each string appears once, `count == uniqueCount` - Both attributes MUST be accurate — wrong values trigger warnings in some Excel versions **Special character escaping**: ```xml R&D Expenses Revenue < Target (note) ``` **Helper script**: use `shared_strings_builder.py` to generate the complete `sharedStrings.xml` from a plain list of strings: ```bash python3 SKILL_DIR/scripts/shared_strings_builder.py \ "Item" "FY2024" "FY2025" "Revenue" "Gross Profit" \ > /tmp/xlsx_work/xl/sharedStrings.xml ``` Or interactively from a file listing one string per line: ```bash python3 SKILL_DIR/scripts/shared_strings_builder.py --file strings.txt \ > /tmp/xlsx_work/xl/sharedStrings.xml ``` --- ### Step 5 — Write Worksheet Data Edit each `xl/worksheets/sheetN.xml`. Replace the empty `` with rows and cells. #### Cell XML Anatomy ``` ↑ ↑ ↑ address type style index (from cellXfs in styles.xml) 3 ↑ value (for t="s": sharedStrings index; for numbers: the number itself) ``` #### Data Type Reference | Data | `t` attr | XML Example | Notes | |------|---------|-------------|-------| | Shared string (text) | `s` | `0` | `` = sharedStrings index | | Number | omit | `1000000` | default type, `t` omitted | | Percentage (as decimal) | omit | `0.125` | 12.5% stored as 0.125 | | Boolean | `b` | `1` | 1=TRUE, 0=FALSE | | Formula | omit | `SUM(B2:B3)` | `` left empty | | Cross-sheet formula | omit | `Assumptions!B2` | use s=3 (green) | #### A Full Sheet Data Example ```xml 0 1 2 3 4 5 85000000 B2*(1+Assumptions!C3) C2*(1+Assumptions!D3) D2/C2-1 7 B2*Assumptions!B4 C2*Assumptions!C4 D2*Assumptions!D4 D3/C3-1 8 SUM(B2:B4) SUM(C2:C4) SUM(D2:D4) D5/C5-1 ``` #### Column Width and Freeze Pane Column widths go **before** ``, freeze pane goes inside ``: ```xml ``` --- ### Step 6 — Apply Styles The template's `xl/styles.xml` has 13 pre-built semantic style slots (indices 0–12). **Read `format.md` for the complete style index table, color system, and how to add new styles.** Quick reference for the most common slots: | `s` | Role | Example | |-----|------|---------| | 4 | Header (bold) | Column/row titles | | 5 / 6 | Currency input (blue) / formula (black) | `$#,##0` | | 7 / 8 | Percentage input / formula | `0.0%` | | 11 | Year (no comma) | 2024 not 2,024 | Design principle: Blue = human sets this. Black = Excel computes this. Green = cross-sheet. If you need a style not in the 13 pre-built slots, follow the append-only procedure in `format.md` section 3.2. --- ### Step 7 — Formula Cookbook #### XML Formula Syntax Reminder Formulas in XML have **no leading `=`**: ```xml SUM(B2:B9) ``` #### Basic Aggregations ```xml SUM(B2:B9) AVERAGE(B2:B9) COUNT(B2:B9) COUNTA(A2:A100) MAX(B2:B9) MIN(B2:B9) ``` #### Financial Calculations ```xml D5/C5-1 B4*B3 B8/B4 IF(C5=0,0,D5/C5-1) NPV(B1,B3:B7)+B2 IRR(B2:B7) ``` #### Cross-Sheet References ```xml Assumptions!B5 'Q1 Data'!B5 'R&D'!B5 SUM(Data!C2:C1000) SUM(Jan:Dec!B5) ``` Cross-sheet formula cells should use `s="3"` (green) to signal the data origin. #### Shared Formulas (Same Pattern Repeated Down a Column) When many consecutive cells share the same formula structure with only the row number changing, use shared formulas to keep the XML compact: ```xml C2/B2-1 ``` Excel adjusts relative references automatically (D3 computes `C3/B3-1`, etc.). If you have multiple shared formula groups, assign sequential `si` values (0, 1, 2, …). #### Absolute References ```xml B5/$B$2 ``` The `$` character needs no XML escaping — write it literally. #### Lookup Formulas ```xml VLOOKUP(A5,Assumptions!A:C,2,0) INDEX(B:B,MATCH(A5,A:A,0)) XLOOKUP(A5,A:A,B:B) ``` --- ### Step 8 — Pack and Validate **Pack**: ```bash python3 SKILL_DIR/scripts/xlsx_pack.py /tmp/xlsx_work/ /path/to/output.xlsx ``` `xlsx_pack.py` will: 1. Check that `[Content_Types].xml` exists at the root 2. Parse every `.xml` and `.rels` file for well-formedness — abort if any fail 3. Create the ZIP archive with correct compression **Validate**: ```bash python3 SKILL_DIR/scripts/formula_check.py /path/to/output.xlsx ``` `formula_check.py` will: 1. Scan every cell for `` entries (cached error values) — all 7 error types 2. Extract sheet name references from every `` formula 3. Verify each referenced sheet exists in `workbook.xml` Fix every reported error before delivery. Exit code 0 = safe to deliver. --- ## Pre-Delivery Checklist Run through this list before handing the file to the user: - [ ] `formula_check.py` reports 0 errors - [ ] Every calculated cell has `` — not just `` with a number - [ ] `sharedStrings.xml` `count` and `uniqueCount` match actual `` count - [ ] Every cell `s` attribute value is in range `0` to `cellXfs count - 1` - [ ] Every sheet in `workbook.xml` has a matching entry in `workbook.xml.rels` - [ ] Every `worksheets/sheetN.xml` file has a matching `` in `[Content_Types].xml` - [ ] Year columns use `s="11"` (format `0`, no thousands separator) - [ ] Cross-sheet reference formulas use `s="3"` (green font) - [ ] Assumption inputs use `s="1"` or `s="5"` or `s="7"` (blue font) --- ## Common Mistakes and Fixes | Mistake | Symptom | Fix | |---------|---------|-----| | Formula has leading `=` | Cell shows `=SUM(...)` as text | Remove `=` from `` content | | sharedStrings `count` not updated | Excel warning or blank cells | Count `` elements, update both `count` and `uniqueCount` | | Style index out of range | File corruption / Excel repair | Ensure `s` < `cellXfs count`; append new `` if needed | | New sheet rId conflicts with styles/sharedStrings rId | Sheet missing or styles lost | New sheets use rId4, rId5, … (rId1-3 are reserved in template) | | Sheet name has `&` unescaped in XML | XML parse error | Use `&` in `workbook.xml` name attribute | | Cross-sheet ref to sheet with space, no quotes | `#REF!` error | Wrap sheet name in single quotes: `'Sheet Name'!B5` | | Cross-sheet ref to non-existent sheet | `#REF!` error | Check `workbook.xml` sheet list vs formula | | Number stored as text (`t="s"`) | Left-aligned, can't sum | Remove `t` attribute from number cells | | Year displayed as `2,024` | Readability issue | Use `s="11"` (numFmtId=1, format `0`) | | Hardcoded Python result instead of formula | "Dead table" — won't update | Replace `N` with `formula` | --- ## Column Letter Reference | Col # | Letter | Col # | Letter | Col # | Letter | |-------|--------|-------|--------|-------|--------| | 1 | A | 26 | Z | 27 | AA | | 28 | AB | 52 | AZ | 53 | BA | | 54 | BB | 78 | BZ | 79 | CA | Python conversion (use when building formulas programmatically): ```python def col_letter(n: int) -> str: """Convert 1-based column number to Excel letter (A, B, ..., Z, AA, AB, ...).""" result = "" while n > 0: n, rem = divmod(n - 1, 26) result = chr(65 + rem) + result return result def col_number(s: str) -> int: """Convert Excel column letter to 1-based number.""" n = 0 for c in s.upper(): n = n * 26 + (ord(c) - 64) return n ``` --- ## Typical Scenario Walkthroughs ### Scenario A — Three-Year Financial Model (Single Sheet) Layout: rows 1-12 = Assumptions (blue inputs) / rows 14-30 = Model (black formulas). ```xml Metric FY2023A FY2024E FY2025E Revenue Growth Gross Margin Revenue Gross Profit 0 1 2 3 4 0 0.12 0.15 5 0.45 0.46 0.47 6 85000000 B14*(1+C2) C14*(1+D2) 7 B14*B3 C14*C3 D14*D3 ``` ### Scenario B — Data + Summary (Two Sheets) The `Summary` sheet pulls from `Data` using cross-sheet formulas (green, `s="3"`): ```xml 0 1 2 SUM(Data!C2:C10000) 3 COUNTA(Data!A2:A10000) 4 IF(B3=0,0,B2/B3) ``` ### Scenario C — Multi-Department Consolidation `Consolidated` sheet sums the same cells from multiple department sheets: ```xml 0 Dept_Engineering!B5+Dept_Marketing!B5 1 SUM(Dept_Engineering!B6,Dept_Marketing!B6) ``` --- ## What You Must NOT Do - Do NOT use openpyxl or any Python library to write the final xlsx file - Do NOT hardcode any calculated value — use `` formulas for every derived number - Do NOT deliver without running `formula_check.py` first - Do NOT set a cell's `s` attribute to a value >= `cellXfs count` - Do NOT modify an existing `` entry in `styles.xml` — only append new ones - Do NOT add a new sheet without updating all four sync points (workbook.xml, workbook.xml.rels, [Content_Types].xml, actual .xml file) - Do NOT assign new worksheet rIds that overlap with rId1, rId2, or rId3 (reserved for sheet1, styles, sharedStrings in the template)