skills/minimax-docx/references/xsd_validation_guide.md
shihao 6487becf60 Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:52:49 +08:00

XSD Validation Guide

Running Validation

# Validate against the WML subset schema
dotnet run --project minimax-docx validate input.docx --xsd assets/xsd/wml-subset.xsd

# Validate against business rules (REQUIRED for Scenario C gate-check)
dotnet run --project minimax-docx validate input.docx --xsd assets/xsd/business-rules.xsd

# Validate against both
dotnet run --project minimax-docx validate input.docx --xsd assets/xsd/wml-subset.xsd --xsd assets/xsd/business-rules.xsd
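When validation is driven from a script, the repeated --xsd flag can be assembled programmatically. The following is a minimal Python sketch that assumes only the command shape shown above; build_validate_command and run_validation are hypothetical helpers, not part of the tool, and nothing is assumed about the tool's exit codes.

```python
import subprocess
from typing import List, Sequence


def build_validate_command(docx_path: str, xsd_paths: Sequence[str]) -> List[str]:
    """Build the argv for the validate command, one --xsd flag per schema."""
    cmd = ["dotnet", "run", "--project", "minimax-docx", "validate", docx_path]
    for xsd in xsd_paths:
        cmd += ["--xsd", xsd]
    return cmd


def run_validation(docx_path: str, xsd_paths: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the validator and capture its output without raising on failure."""
    return subprocess.run(
        build_validate_command(docx_path, xsd_paths),
        capture_output=True,
        text=True,
        check=False,
    )


command = build_validate_command(
    "input.docx",
    ["assets/xsd/wml-subset.xsd", "assets/xsd/business-rules.xsd"],
)
print(" ".join(command))
```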

What wml-subset.xsd Covers

The subset schema validates the most common WordprocessingML elements:

| Area | Elements Validated |
|------|--------------------|
| Document structure | w:document, w:body, w:sectPr |
| Paragraphs | w:p, w:pPr, w:r, w:rPr, w:t |
| Tables | w:tbl, w:tblPr, w:tblGrid, w:tr, w:tc |
| Styles | w:styles, w:style, w:docDefaults |
| Lists | w:numbering, w:abstractNum, w:num |
| Headers/Footers | w:hdr, w:ftr |
| Track Changes | w:ins, w:del, w:rPrChange, w:pPrChange |
| Comments | w:comment, w:commentRangeStart, w:commentRangeEnd |

What It Does NOT Cover

  • DrawingML elements (a:, pic:, wp:) — image/shape internals
  • VML elements (v:, o:) — legacy shapes
  • Math elements (m:) — equations
  • Extended namespaces (w14, w15, w16*) — vendor extensions
  • Custom XML data parts
  • Relationship and content type validation (structural, not schema-based)

Interpreting Errors

Element Ordering Error

ERROR: Element 'w:jc' is not expected at this position.
Expected: w:spacing, w:ind, w:contextualSpacing, ...
Location: /word/document.xml, line 45

Cause: Child elements are in the wrong order. See references/openxml_element_order.md.
Fix: Reorder the children to match the schema sequence.
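As an illustration of the reordering fix, here is a minimal Python sketch that sorts the children of a w:pPr element by schema position. PPR_ORDER is deliberately partial and is an assumption of this example; the complete sequence should be taken from references/openxml_element_order.md before using anything like this on real documents. reorder_ppr is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Partial w:pPr child order (assumption: only a few common children are listed;
# complete it from references/openxml_element_order.md before real use).
PPR_ORDER = ["pStyle", "keepNext", "keepLines", "numPr", "spacing", "ind",
             "contextualSpacing", "jc", "outlineLvl", "rPr"]


def reorder_ppr(ppr: ET.Element) -> None:
    """Sort the children of a w:pPr element in place by schema order.

    Children missing from PPR_ORDER keep their relative order and move to the
    end, which is only a placeholder behaviour for this sketch.
    """
    def rank(child: ET.Element) -> int:
        local = child.tag.split("}")[-1]
        return PPR_ORDER.index(local) if local in PPR_ORDER else len(PPR_ORDER)

    ppr[:] = sorted(list(ppr), key=rank)  # sorted() is stable


xml = (
    f'<w:pPr xmlns:w="{W_NS}">'
    '<w:jc w:val="center"/><w:spacing w:after="120"/><w:ind w:left="720"/>'
    "</w:pPr>"
)
ppr = ET.fromstring(xml)
reorder_ppr(ppr)
order_after = [child.tag.split("}")[-1] for child in ppr]
print(order_after)
```

In this input w:jc precedes w:spacing, which is the situation reported in the sample error; after sorting, w:jc follows w:spacing and w:ind.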

Missing Required Element

ERROR: Element 'w:tbl' missing required child 'w:tblPr'.
Location: /word/document.xml, line 102

Cause: A required child element is absent.
Fix: Add the missing element. Tables require both w:tblPr and w:tblGrid.
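A quick pre-check for this class of error can be written outside the validator. The Python sketch below scans a parsed part for w:tbl elements lacking either required child; tables_missing_children is a hypothetical helper and it does not check child order or any other table rule.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def tables_missing_children(root: ET.Element) -> list:
    """Return (table_index, missing_names) for each w:tbl lacking w:tblPr or w:tblGrid."""
    problems = []
    for index, tbl in enumerate(root.iter(f"{W}tbl")):
        # find() with a bare tag only looks at direct children of the table.
        missing = [name for name in ("tblPr", "tblGrid")
                   if tbl.find(f"{W}{name}") is None]
        if missing:
            problems.append((index, missing))
    return problems


body = ET.fromstring(
    f'<w:body xmlns:w="{W_NS}">'
    "<w:tbl><w:tblPr/><w:tblGrid/><w:tr/></w:tbl>"
    "<w:tbl><w:tblGrid/><w:tr/></w:tbl>"
    "</w:body>"
)
problems = tables_missing_children(body)
print(problems)
```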

Invalid Attribute Value

ERROR: Attribute 'w:val' has invalid value 'middle'.
Expected: 'left', 'center', 'right', 'both', 'distribute'
Location: /word/document.xml, line 78

Cause: An attribute value is not in the allowed enumeration.
Fix: Use one of the valid values listed in the error.

Unexpected Element

ERROR: Element 'w:customTag' is not expected.
Location: /word/document.xml, line 200

Cause: The element is not defined in the subset schema. It may be a vendor extension.
Fix: Check whether it belongs to a known extension namespace (w14/w15/w16*). If so, it is likely safe to keep. If it is unknown, investigate or remove it.
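The extension check can be sketched as a namespace test. The Python example below assumes the element name is available in '{namespace}local' form, as in validator messages that print the full namespace. Matching on the http://schemas.microsoft.com/office/word/ prefix is a heuristic chosen for this sketch, and classify_unexpected is a hypothetical helper.

```python
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Heuristic (assumption): the w14/w15/w16* namespaces share this URI prefix.
VENDOR_NS_PREFIX = "http://schemas.microsoft.com/office/word/"


def classify_unexpected(clark_name: str) -> str:
    """Classify a '{namespace}local' element name reported as unexpected.

    Returns 'vendor-extension', 'main-namespace', or 'unknown'.
    """
    if not clark_name.startswith("{") or "}" not in clark_name:
        return "unknown"  # no namespace information to go on
    namespace = clark_name[1:clark_name.index("}")]
    if namespace.startswith(VENDOR_NS_PREFIX):
        return "vendor-extension"
    if namespace == W_NS:
        return "main-namespace"
    return "unknown"


shadow = classify_unexpected(
    "{http://schemas.microsoft.com/office/word/2010/wordml}shadow"
)
custom = classify_unexpected(f"{{{W_NS}}}customTag")
print(shadow, custom)
```

An element classified as 'main-namespace' or 'unknown' is the case that needs investigation; only 'vendor-extension' matches the "likely safe" path described above.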


Business Rules XSD

The business-rules.xsd schema enforces project-specific constraints beyond standard OpenXML validity:

| Rule | What It Checks |
|------|----------------|
| Required styles | Normal, Heading1-Heading3, TableGrid must exist in styles.xml |
| Font consistency | w:docDefaults fonts match expected values |
| Margin ranges | Page margins within acceptable range (720-2160 DXA) |
| Page size | Must be A4 or Letter |
| Heading hierarchy | No skipped levels (e.g., H1 followed directly by H3 without an H2 is rejected) |
| Style chain | w:basedOn references must resolve to existing styles |
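Two of these rules are easy to express as plain checks, which can help when working out why a document fails. The Python sketch below is an illustration of the rule logic only, not how business-rules.xsd implements it; both function names are hypothetical, and treating the start of the document as level 0 (so the first heading must be H1) is an assumption of this example.

```python
from typing import Dict, List

MIN_MARGIN_DXA = 720   # 0.5 inch (1440 DXA = 1 inch)
MAX_MARGIN_DXA = 2160  # 1.5 inches


def margins_out_of_range(margins: Dict[str, int]) -> List[str]:
    """Return the sides whose margin falls outside the 720-2160 DXA range."""
    return [side for side, value in margins.items()
            if not MIN_MARGIN_DXA <= value <= MAX_MARGIN_DXA]


def heading_gaps(levels: List[int]) -> List[int]:
    """Return indexes of headings that sit more than one level below the previous heading."""
    gaps = []
    previous = 0  # assumption: document start counts as level 0
    for index, level in enumerate(levels):
        if level > previous + 1:
            gaps.append(index)
        previous = level
    return gaps


bad_sides = margins_out_of_range({"top": 1440, "bottom": 1440, "left": 600, "right": 2400})
gaps = heading_gaps([1, 3, 2, 3])
print(bad_sides, gaps)
```

Moving back up the hierarchy (for example H3 followed by H2) is not treated as a gap; only a downward jump of more than one level is flagged.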

Extending Business Rules

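To add project-specific rules, add xs:restriction facets or xs:assert elements. Note that xs:assert is an XSD 1.1 feature, so it only works if the validator supports XSD 1.1: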

<!-- Require minimum 1-inch margins (1440 DXA) -->
<xs:element name="pgMar">
  <xs:complexType>
    <xs:attribute name="top">
      <!-- A restriction must live inside a simpleType and name its base type -->
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="1440" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Gate-Check: Scenario C Hard Gate

In Scenario C (Apply Template), the output document MUST pass business-rules.xsd validation before delivery:

1. Apply template  →  output.docx
2. Validate        →  dotnet run ... validate output.docx --xsd business-rules.xsd
3. PASS?           →  Deliver to user
4. FAIL?           →  Fix issues, re-validate, repeat until PASS

This is a hard gate. A document that fails business-rules validation is NOT deliverable, even if it opens correctly in Word.
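The validate-fix-revalidate loop above can be sketched in Python as follows. The validate and fix callables are placeholders to be supplied by the caller (for instance, a wrapper around the validate command and whatever repair step applies). The max_rounds cap is an assumption added here so that a document that never passes does not loop forever; it is not part of the gate definition.

```python
from typing import Callable, List


def gate_check(validate: Callable[[], List[str]],
               fix: Callable[[List[str]], None],
               max_rounds: int = 5) -> bool:
    """Return True once validate() reports no errors.

    Returns False if errors remain after max_rounds fix attempts, in which
    case the document must not be delivered.
    """
    for round_no in range(max_rounds + 1):
        errors = validate()
        if not errors:
            return True   # PASS: deliverable
        if round_no == max_rounds:
            break         # out of attempts, still failing
        fix(errors)
    return False


# Simulated run: each fix resolves one outstanding issue.
pending = ["missing style Heading2", "top margin out of range"]
delivered = gate_check(lambda: list(pending), lambda errors: pending.pop(0))
print(delivered)
```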


False Positives

Vendor Extensions

Elements from extended namespaces (w14, w15, w16*) are not in the subset schema and may trigger warnings:

WARNING: Element '{http://schemas.microsoft.com/office/word/2010/wordml}shadow' is not expected.

These are generally safe to ignore — they are Microsoft extensions for newer features (e.g., advanced text effects, comment extensions).

Markup Compatibility

Documents may contain mc:AlternateContent blocks with fallback content. The subset schema does not perform mc: namespace processing, so these blocks may be reported as unexpected elements. Such reports are safe to ignore if the document opens correctly in Word.

Recommended Workflow

  1. Run validation
  2. Treat errors as must-fix
  3. Review warnings — ignore known vendor extensions, investigate unknown elements
  4. After fixing errors, re-validate to confirm
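The triage steps above can be sketched as a small Python function that sorts validator output into buckets. It assumes messages start with ERROR: or WARNING: as in the samples in this guide, and it uses the Microsoft Word namespace URI prefix as a heuristic for known vendor extensions. triage is a hypothetical helper, and the urn:example:custom warning in the sample report is invented for illustration.

```python
from typing import Dict, List

# Heuristic (assumption): known vendor extensions use this namespace prefix.
VENDOR_NS_PREFIX = "http://schemas.microsoft.com/office/word/"


def triage(report: str) -> Dict[str, List[str]]:
    """Split validator output into must-fix errors, ignorable warnings, and warnings to investigate."""
    buckets: Dict[str, List[str]] = {"must_fix": [], "ignore": [], "investigate": []}
    for raw_line in report.splitlines():
        line = raw_line.strip()
        if line.startswith("ERROR:"):
            buckets["must_fix"].append(line)
        elif line.startswith("WARNING:"):
            key = "ignore" if VENDOR_NS_PREFIX in line else "investigate"
            buckets[key].append(line)
        # Other lines (Expected:, Location:) are context and are skipped here.
    return buckets


report = """ERROR: Element 'w:tbl' missing required child 'w:tblPr'.
Location: /word/document.xml, line 102
WARNING: Element '{http://schemas.microsoft.com/office/word/2010/wordml}shadow' is not expected.
WARNING: Element '{urn:example:custom}tag' is not expected.
"""
buckets = triage(report)
print({name: len(lines) for name, lines in buckets.items()})
```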