Initial commit: add all skills files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:52:49 +08:00
commit 6487becf60
396 changed files with 108871 additions and 0 deletions
--- a/minimax-xlsx/references/read-analyze.md
+++ b/minimax-xlsx/references/read-analyze.md
@@ -0,0 +1,97 @@
+# Data Reading & Analysis Guide
+
+> Reference for the READ path. Use `xlsx_reader.py` for structure discovery and data quality auditing,
+> then pandas for custom analysis. **Never modify the source file.**
+
+---
+
+## When to Use This Path
+
+The user asks to read, analyze, view, summarize, extract, or answer questions about an Excel/CSV file's contents,
+without requiring file modification. If modification is needed, hand off to `edit.md`.
+
+---
+
+## Workflow
+
+### Step 1 — Structure Discovery
+
+Run `xlsx_reader.py` first. It handles format detection, encoding fallback, structure exploration, and data quality audit:
+
+```bash
+python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx                 # full report
+python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx --sheet Sales   # single sheet
+python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx --quality       # quality audit only
+python3 SKILL_DIR/scripts/xlsx_reader.py input.xlsx --json          # machine-readable
+```
+
+Supported formats: `.xlsx`, `.xlsm`, `.csv`, `.tsv`. The script tries multiple encodings for CSV (utf-8-sig, gbk, utf-8, latin-1).
+
+### Step 2 — Custom Analysis with pandas
+
+Load data and perform the analysis the user requests:
+
+```python
+import pandas as pd
+df = pd.read_excel("input.xlsx", sheet_name=None)  # dict of all sheets
+# For CSV: pd.read_csv("input.csv")
+```
+
+**Header handling** (when the default `header=0` doesn't work):
+
+| Situation | Code |
+|-----------|------|
+| Header on row 3 | `pd.read_excel(path, header=2)` |
+| Multi-level merged header | `pd.read_excel(path, header=[0, 1])` |
+| No header | `pd.read_excel(path, header=None)` |
+
+**Analysis quick reference:**
+
+| Scenario | Pattern |
+|----------|---------|
+| Descriptive stats | `df.describe()` or `df['Col'].agg(['sum', 'mean', 'min', 'max'])` |
+| Group aggregation | `df.groupby('Region')['Revenue'].agg(Total='sum', Avg='mean')` |
+| Top N | `df.groupby('Region')['Revenue'].sum().sort_values(ascending=False).head(5)` |
+| Pivot table | `df.pivot_table(values='Revenue', index='Region', columns='Quarter', aggfunc='sum', margins=True)` |
+| Time series | `df.set_index(pd.to_datetime(df['Date'])).resample('ME')['Revenue'].sum()` |
+| Cross-sheet merge | `pd.merge(sales, customers, on='CustomerID', how='left', validate='m:1')` |
+| Stack sheets | `pd.concat([df.assign(Source=name) for name, df in sheets.items()], ignore_index=True)` |
+| Large files (>50MB) | `pd.read_excel(path, usecols=['Date', 'Revenue'])` or `pd.read_csv(path, chunksize=10000)` |
+
+### Step 3 — Output
+
+If the user specifies an output file path, write results to it (highest priority). Format the report as:
+
+```
+## Analysis Report: {filename}
+### File Overview     — format, sheets, row counts
+### Data Quality      — nulls, duplicates, mixed types (or "no issues")
+### Key Findings      — direct answer to the user's question
+### Additional Notes  — formula NaN, encoding issues, caveats
+```
+
+**Numeric display**: monetary `1,234,567.89`, percentage `12.3%`, multiples `8.5x`, counts as integers.
+
+---
+
+## Common Pitfalls
+
+| Pitfall | Cause | Fix |
+|---------|-------|-----|
+| Formula cells read as NaN | `<v>` cache empty in freshly generated files | Inform user; suggest opening in Excel and re-saving; or use `libreoffice_recalc.py` |
+| CSV encoding errors | Chinese Windows exports use GBK | `xlsx_reader.py` auto-tries multiple encodings; manually specify if all fail |
+| Mixed types in column | Column has both numbers and text (e.g., "N/A") | `pd.to_numeric(df['Col'], errors='coerce')` — report unconvertible rows |
+| Year shows as 2,024 | Thousands separator format applied to year | `df['Year'].astype(int).astype(str)` |
+| Multi-level headers | Two-row header merged | `pd.read_excel(path, header=[0, 1])`, then flatten with `' - '.join()` |
+| Row number mismatch | pandas 0-indexed vs Excel 1-indexed | `excel_row = pandas_index + 2` (+1 for 1-index, +1 for header) |
+
+**Critical**: Never open with `data_only=True` then `save()` — this permanently destroys all formulas.
+
+---
+
+## Prohibitions
+
+- Never modify the source file (no `save()`, no XML edits)
+- Never report formula NaN as "data is zero" — explain it's a formula cache issue
+- Never report pandas indices as Excel row numbers
+- Never make speculative conclusions unsupported by the data