Show HN: Xmloxide – an agent-made Rust replacement for libxml2
xmloxide
A pure Rust reimplementation of libxml2 — the de facto standard XML/HTML parsing library in the open-source world.
libxml2 became officially unmaintained in December 2025 with known security issues. xmloxide aims to be a memory-safe, high-performance replacement that passes the same conformance test suites.
Features
- Memory-safe: arena-based tree with zero `unsafe` in the public API
- Conformant: 100% pass rate on the W3C XML Conformance Test Suite (1727/1727 applicable tests)
- Error recovery: parse malformed XML and still produce a usable tree, just like libxml2
- Multiple parsing APIs: DOM tree, SAX2 streaming, XmlReader pull, push/incremental
- HTML parser: error-tolerant HTML 4.01 parsing with auto-closing and void elements
- XPath 1.0: full expression parser and evaluator with all core functions
- Validation: DTD, RelaxNG, and XML Schema (XSD) validation
- Canonical XML: C14N 1.0 and Exclusive C14N serialization
- XInclude: document inclusion processing
- XML Catalogs: OASIS XML Catalogs for URI resolution
- `xmllint` CLI: command-line tool for parsing, validating, and querying XML
- Zero-copy where possible: string interning for fast comparisons
- No global state: each `Document` is self-contained and `Send + Sync`
- C/C++ FFI: full C API with header file (`include/xmloxide.h`) for embedding in C/C++ projects
- Minimal dependencies: only `encoding_rs` (library has zero other deps; `clap` is CLI-only)
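The "string interning for fast comparisons" point in the feature list can be illustrated with a small standalone sketch. It is not xmloxide's actual interner: the `Symbol` and `Interner` names are hypothetical, and the only idea it shows is that each distinct name is stored once, so comparing two names becomes an integer comparison.

```rust
use std::collections::HashMap;

/// Hypothetical handle for an interned string (not an xmloxide type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

/// Hypothetical interner: stores each distinct string once.
#[derive(Default)]
struct Interner {
    map: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.names.push(s.to_owned());
        self.map.insert(s.to_owned(), sym);
        sym
    }

    /// Looks up the text behind a symbol.
    fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}
```

Once element names are interned, a name test such as "is this node called `title`?" compares two `Symbol` values instead of two byte strings.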
Quick Start
use xmloxide::Document;
let doc = Document::parse_str("
Serialization
use xmloxide::Document;
use xmloxide::serial::serialize;
let doc = Document::parse_str("
XPath Queries
use xmloxide::Document;
use xmloxide::xpath::{evaluate, XPathValue};
let doc = Document::parse_str("
SAX2 Streaming
use xmloxide::sax::{parse_sax, SaxHandler, DefaultHandler};
use xmloxide::parser::ParseOptions;

struct MyHandler;

impl SaxHandler for MyHandler {
    // Called once per start tag; the unnamed arguments are ignored here.
    fn start_element(
        &mut self,
        name: &str,
        _: Option<&str>,
        _: Option<&str>,
        _: &[(String, String, Option<String>, Option<String>)],
    ) {
        println!("Element: {name}");
    }
}
parse_sax("
HTML Parsing
use xmloxide::html::parse_html;
let doc = parse_html("
Hello
World").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("html"));
Error Recovery
use xmloxide::parser::{parse_str_with_options, ParseOptions};
let opts = ParseOptions::default().recover(true);
let doc = parse_str_with_options("
CLI Tool
Parse and pretty-print
xmllint --format document.xml
Validate against a schema
xmllint --schema schema.xsd document.xml
xmllint --relaxng schema.rng document.xml
xmllint --dtdvalid schema.dtd document.xml
XPath query
xmllint --xpath "//title" document.xml
Canonical XML
xmllint --c14n document.xml
Parse HTML
xmllint --html page.html
Module Overview
| Module | Description |
| --- | --- |
| `tree` | Arena-based DOM tree (`Document`, `NodeId`, `NodeKind`) |
| `parser` | XML 1.0 recursive descent parser with error recovery |
| `parser::push` | Push/incremental parser for chunked input |
| `html` | Error-tolerant HTML 4.01 parser |
| `sax` | SAX2 streaming event-driven parser |
| `reader` | XmlReader pull-based parsing API |
| `serial` | XML serializer and Canonical XML (C14N) |
| `xpath` | XPath 1.0 expression parser and evaluator |
| `validation::dtd` | DTD parsing and validation |
| `validation::relaxng` | RelaxNG schema validation |
| `validation::xsd` | XML Schema (XSD) validation |
| `xinclude` | XInclude 1.0 document inclusion |
| `catalog` | OASIS XML Catalogs for URI resolution |
| `encoding` | Character encoding detection and transcoding |
| `ffi` | C/C++ FFI bindings (`include/xmloxide.h`) |
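To make the "arena-based DOM tree" idea concrete, here is a minimal standalone sketch. It is an illustration under assumptions, not xmloxide's implementation: the `Tree` and `Node` types and every method are hypothetical, and `NodeId` and `NodeKind` only borrow the names listed for the `tree` module. All nodes live in one `Vec`, and links between them are plain indices rather than pointers.

```rust
/// Index of a node inside the arena (hypothetical definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// Simplified node kinds for the sketch.
#[derive(Debug)]
enum NodeKind {
    Element(String),
    Text(String),
}

struct Node {
    kind: NodeKind,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// The arena: every node is owned by this one vector.
#[derive(Default)]
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Allocates a detached node and returns its index.
    fn create(&mut self, kind: NodeKind) -> NodeId {
        self.nodes.push(Node { kind, parent: None, children: Vec::new() });
        NodeId(self.nodes.len() - 1)
    }

    /// Links `child` as the last child of `parent`.
    fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        self.nodes[parent.0].children.push(child);
    }

    fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0].parent
    }

    /// Concatenates all descendant text in document order.
    fn text_content(&self, id: NodeId) -> String {
        let mut out = String::new();
        self.collect_text(id, &mut out);
        out
    }

    fn collect_text(&self, id: NodeId, out: &mut String) {
        let node = &self.nodes[id.0];
        if let NodeKind::Text(text) = &node.kind {
            out.push_str(text);
        }
        for &child in &node.children {
            self.collect_text(child, out);
        }
    }
}
```

Because node handles are `Copy` indices, there are no reference cycles or raw pointers to manage, and the whole tree is freed when the owning value is dropped.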
Performance
Parsing throughput is competitive with libxml2: within 3-4% on most documents, about 13% slower on the XHTML page, and 12% faster on SVG. Serialization is 1.5-2.4x faster thanks to the arena-based tree design. XPath is 1.1-2.7x faster across all benchmarks.
Parsing:
| Document | Size | xmloxide | libxml2 | Result |
| --- | --- | --- | --- | --- |
| Atom feed | 4.9 KB | 26.7 µs (176 MiB/s) | 25.5 µs (184 MiB/s) | ~4% slower |
| SVG drawing | 6.3 KB | 58.5 µs (103 MiB/s) | 65.6 µs (92 MiB/s) | 12% faster |
| Maven POM | 11.5 KB | 76.9 µs (142 MiB/s) | 74.2 µs (148 MiB/s) | ~4% slower |
| XHTML page | 10.2 KB | 69.5 µs (139 MiB/s) | 61.5 µs (157 MiB/s) | ~13% slower |
| Large (374 KB) | 374 KB | 2.15 ms (169 MiB/s) | 2.08 ms (175 MiB/s) | ~3% slower |
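The throughput figures in the parsing results can be reproduced from the size and time columns. The sketch below assumes that "KB" in the table means 1024 bytes (the source does not say); the function names are illustrative only. Under that assumption the "Large" row works out to roughly 169 MiB/s for xmloxide and 175 MiB/s for libxml2, matching the reported values, with a time difference of a little over 3%.

```rust
/// Throughput in MiB/s for `size_kib` KiB processed in `micros` microseconds.
/// Assumes 1 KB in the benchmark table means 1024 bytes.
fn throughput_mib_s(size_kib: f64, micros: f64) -> f64 {
    let bytes = size_kib * 1024.0;
    let seconds = micros / 1_000_000.0;
    bytes / seconds / (1024.0 * 1024.0)
}

/// How much longer `a_micros` is than `b_micros`, in percent.
/// Positive means `a` is slower.
fn slowdown_percent(a_micros: f64, b_micros: f64) -> f64 {
    (a_micros / b_micros - 1.0) * 100.0
}
```

Rows with small inputs (for example the 4.9 KB Atom feed) will not match to the last digit, most likely because the sizes in the table are rounded.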
Serialization:
| Document | Size | xmloxide | libxml2 | Result |
| --- | --- | --- | --- | --- |
| Atom feed | 4.9 KB | 11.3 µs | 17.5 µs | 1.5x faster |
| Maven POM | 11.5 KB | 20.1 µs | 47.5 µs | 2.4x faster |
| Large (374 KB) | 374 KB | 614 µs | 1397 µs | 2.3x faster |
XPath:
| Expression | xmloxide | libxml2 | Result |
| --- | --- | --- | --- |
| Simple path (`//entry/title`) | 1.51 µs | 1.63 µs | 8% faster |
| Attribute predicate (`//book[@id]`) | 5.91 µs | 15.99 µs | 2.7x faster |
| `count()` function | 1.09 µs | 1.67 µs | 1.5x faster |
| `string()` function | 1.32 µs | 1.77 µs | 1.3x faster |
Key optimizations: arena-based tree for fast serialization, byte-level pre-checks for character validation, bulk text scanning, ASCII fast paths for name parsing, zero-copy element name splitting, inline entity resolution, XPath // step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes.
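As an illustration of what an "ASCII fast path for name parsing" can look like, here is a standalone sketch. It is not taken from xmloxide's source; `scan_name` and the helper names are hypothetical. The character classes follow the `NameStartChar` and `NameChar` productions of XML 1.0 (Fifth Edition). ASCII bytes are checked directly, and the code decodes `char`s only once a non-ASCII byte is seen.

```rust
/// XML 1.0 (Fifth Edition) NameStartChar.
fn is_name_start_char(c: char) -> bool {
    matches!(c,
        ':' | 'A'..='Z' | '_' | 'a'..='z'
        | '\u{C0}'..='\u{D6}' | '\u{D8}'..='\u{F6}' | '\u{F8}'..='\u{2FF}'
        | '\u{370}'..='\u{37D}' | '\u{37F}'..='\u{1FFF}'
        | '\u{200C}'..='\u{200D}' | '\u{2070}'..='\u{218F}'
        | '\u{2C00}'..='\u{2FEF}' | '\u{3001}'..='\u{D7FF}'
        | '\u{F900}'..='\u{FDCF}' | '\u{FDF0}'..='\u{FFFD}'
        | '\u{10000}'..='\u{EFFFF}')
}

/// XML 1.0 (Fifth Edition) NameChar.
fn is_name_char(c: char) -> bool {
    is_name_start_char(c)
        || matches!(c,
            '-' | '.' | '0'..='9' | '\u{B7}'
            | '\u{300}'..='\u{36F}' | '\u{203F}'..='\u{2040}')
}

/// Returns the byte length of the XML Name at the start of `input`,
/// or 0 if `input` does not start with a Name.
fn scan_name(input: &str) -> usize {
    let bytes = input.as_bytes();
    let mut i = 0;
    // Fast path: classify ASCII bytes without decoding UTF-8.
    while i < bytes.len() && bytes[i].is_ascii() {
        let b = bytes[i];
        let ok = if i == 0 {
            b.is_ascii_alphabetic() || b == b'_' || b == b':'
        } else {
            b.is_ascii_alphanumeric() || matches!(b, b'_' | b':' | b'-' | b'.')
        };
        if !ok {
            return i;
        }
        i += 1;
    }
    // Slow path: a non-ASCII byte was reached (or the input ended).
    // `i` is a char boundary because everything before it is ASCII.
    for (offset, c) in input[i..].char_indices() {
        let pos = i + offset;
        let ok = if pos == 0 { is_name_start_char(c) } else { is_name_char(c) };
        if !ok {
            return pos;
        }
    }
    input.len()
}
```

For typical documents, where element and attribute names are plain ASCII, the loop never leaves the fast path.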
Run benchmarks (requires libxml2 system library)
cargo bench --features bench-libxml2 --bench comparison_bench
Testing
- 785 unit tests across all modules
- 112 FFI tests covering the full C API surface (including SAX streaming)
- libxml2 compatibility suite — 119/119 tests passing (100%) covering XML parsing, namespaces, error detection, and HTML parsing
- W3C XML Conformance Test Suite — 1727/1727 applicable tests passing (100%)
- Integration tests covering real-world XML documents, edge cases, and error recovery
cargo test --all-features
C/C++ FFI
xmloxide provides a C-compatible API for embedding in C/C++ projects (like Chromium, game engines, or any codebase that currently uses libxml2).
Build shared + static libraries (uses the included Makefile)
make
Or build individually:
make shared # .so / .dylib / .dll
make static # .a / .lib
Build and run the C example
make example
#include "xmloxide.h"
xmloxide_document *doc = xmloxide_parse_str("
xmloxide_free_string(name);
xmloxide_free_string(text);
xmloxide_free_doc(doc);
The full API — including tree navigation and mutation, XPath evaluation, serialization (plain and pretty-printed), HTML parsing, DTD/RelaxNG/XSD validation, C14N, and XML Catalogs — is declared in include/xmloxide.h.
Migrating from libxml2
| libxml2 | xmloxide (Rust) | xmloxide (C FFI) |
| --- | --- | --- |
| `xmlReadMemory` | `Document::parse_str` | `xmloxide_parse_str` |
| `xmlReadFile` | `Document::parse_file` | `xmloxide_parse_file` |
| `xmlParseDoc` | `Document::parse_bytes` | `xmloxide_parse_bytes` |
| `htmlReadMemory` | `html::parse_html` | `xmloxide_parse_html` |
| `xmlFreeDoc` | (drop `Document`) | `xmloxide_free_doc` |
| `xmlDocGetRootElement` | `doc.root_element()` | `xmloxide_doc_root_element` |
| `xmlNodeGetContent` | `doc.text_content(id)` | `xmloxide_node_text_content` |
| `xmlNodeSetContent` | `doc.set_text_content(id, s)` | `xmloxide_set_text_content` |
| `xmlGetProp` | `doc.attribute(id, name)` | `xmloxide_node_attribute` |
| `xmlSetProp` | `doc.set_attribute(...)` | `xmloxide_set_attribute` |
| `xmlNewNode` | `doc.create_node(...)` | `xmloxide_create_element` |
| `xmlNewText` | `doc.create_node(Text{..})` | `xmloxide_create_text` |
| `xmlAddChild` | `doc.append_child(p, c)` | `xmloxide_append_child` |
| `xmlAddPrevSibling` | `doc.insert_before(ref, c)` | `xmloxide_insert_before` |
| `xmlUnlinkNode` | `doc.remove_node(id)` | `xmloxide_remove_node` |
| `xmlCopyNode` | `doc.clone_node(id, deep)` | `xmloxide_clone_node` |
| `xmlGetID` | `doc.element_by_id(s)` | `xmloxide_element_by_id` |
| `xmlDocDumpMemory` | `serial::serialize(&doc)` | `xmloxide_serialize` |
| `xmlDocDumpFormatMemory` | `serial::serialize_with_options` | `xmloxide_serialize_pretty` |
| `htmlDocDumpMemory` | `serial::html::serialize_html` | `xmloxide_serialize_html` |
| `xmlC14NDocDumpMemory` | `serial::c14n::canonicalize` | `xmloxide_canonicalize` |
| `xmlXPathEvalExpression` | `xpath::evaluate` | `xmloxide_xpath_eval` |
| `xmlValidateDtd` | `validation::dtd::validate` | `xmloxide_validate_dtd` |
| `xmlRelaxNGValidateDoc` | `validation::relaxng::validate` | `xmloxide_validate_relaxng` |
| `xmlSchemaValidateDoc` | `validation::xsd::validate_xsd` | `xmloxide_validate_xsd` |
| `xmlXIncludeProcess` | `xinclude::process_xincludes` | `xmloxide_process_xincludes` |
| `xmlLoadCatalog` | `Catalog::parse` | `xmloxide_parse_catalog` |
| `xmlSAX2...` callbacks | `sax::SaxHandler` trait | `xmloxide_sax_parse` |
| `xmlTextReaderRead` | `reader::XmlReader` | `xmloxide_reader_read` |
| `xmlCreatePushParserCtxt` | `parser::PushParser` | `xmloxide_push_parser_new` |
| `xmlParseChunk` | `PushParser::push` | `xmloxide_push_parser_push` |
Thread safety: Unlike libxml2, xmloxide has no global state. Each Document is self-contained and Send + Sync. The FFI layer uses thread-local storage for the last error message — each thread has its own error state. No initialization or cleanup functions are needed.
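The per-thread "last error" pattern described above can be sketched in a few lines of standard Rust. This is an illustration of the technique only, not the FFI layer's actual code: `LAST_ERROR`, `set_last_error`, and `last_error` are hypothetical names.

```rust
use std::cell::RefCell;

thread_local! {
    // Each thread gets its own independent slot (hypothetical name).
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

/// Records an error message for the calling thread only.
fn set_last_error(msg: &str) {
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(msg.to_owned()));
}

/// Returns a copy of the calling thread's last error, if any.
fn last_error() -> Option<String> {
    LAST_ERROR.with(|slot| slot.borrow().clone())
}
```

An error recorded on one thread is invisible to every other thread, so no locking and no global init or cleanup call is required.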
Fuzzing
xmloxide includes fuzz targets for security testing:
Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz
Run a fuzz target
cargo +nightly fuzz run fuzz_xml_parse
cargo +nightly fuzz run fuzz_html_parse
cargo +nightly fuzz run fuzz_xpath
cargo +nightly fuzz run fuzz_roundtrip
Building
cargo build
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo bench
Minimum supported Rust version: 1.81
Limitations
- No XML 1.1: xmloxide implements XML 1.0 (Fifth Edition) only. XML 1.1 is rarely used and not planned.
- No XSLT: XSLT is a separate specification (libxslt) and is out of scope.
- No Schematron: Schematron validation is not implemented. DTD, RelaxNG, and XSD are supported.
- HTML 4.01 only: the HTML parser targets HTML 4.01, not the HTML5 parsing algorithm.
- Push parser buffers internally: the push/incremental parser API (`PushParser`) currently buffers all pushed data and performs the full parse on `finish()`, rather than truly streaming like libxml2's `xmlParseChunk`. SAX streaming (`parse_sax`) is available as an alternative for memory-constrained large-document processing.
- XPath `namespace::` axis: the `namespace::` axis returns the element node when in-scope namespaces match (rather than materializing separate namespace nodes), following the same pattern as the attribute axis.
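The push-parser limitation is easier to see with a small model of a parser that buffers and defers. The sketch below is purely illustrative and does not use xmloxide's `PushParser` API: `BufferingPushParser` is a hypothetical type, and the real parse step is replaced by a caller-supplied closure. It shows why memory use grows with the total input size: nothing is parsed until `finish`.

```rust
/// Hypothetical model of a push parser that only accumulates input.
struct BufferingPushParser {
    buf: Vec<u8>,
}

impl BufferingPushParser {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Appends a chunk; no parsing happens here.
    fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Bytes held in memory so far (equals total input pushed).
    fn buffered_len(&self) -> usize {
        self.buf.len()
    }

    /// Runs the full parse over the complete buffer in one go.
    /// `parse` stands in for the real parser.
    fn finish<T>(self, parse: impl FnOnce(&[u8]) -> T) -> T {
        parse(&self.buf)
    }
}
```

A truly streaming design would instead consume each chunk as it arrives and keep only the unparsed tail, which is what the SAX alternative mentioned above is for.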
Contributing
See CONTRIBUTING.md for development setup and guidelines.
Changelog
See CHANGELOG.md for version history.
License
MIT