# sax js
A sax-style parser for XML and HTML.
Designed with [node](http://nodejs.org/) in mind, but should work fine in the browser or other CommonJS implementations.
## What This Is
* A very simple tool to parse through an XML string.
* A stepping stone to a streaming HTML parser.
* A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML docs.
## What This Is (probably) Not
* An HTML Parser - That's a fine goal, but this isn't it. It's just XML.
* A DOM Builder - You can use it to build an object model out of XML, but it doesn't do that out of the box.
* XSLT - No DOM = no querying.
* 100% Compliant with (some other SAX implementation) - Most SAX implementations are in Java and do a lot more than this does.
* An XML Validator - It does a little validation when in strict mode, but not much.
* A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic masochism.
* A DTD-aware Thing - Fetching DTDs is a much bigger job.
## Regarding `<!DOCTYPE`s and `<!ENTITY`s
The parser will handle the basic XML entities in text nodes and attribute values: `&amp;`, `&lt;`, `&gt;`, `&apos;`, and `&quot;`. It's possible to define additional entities in XML by putting them in the DTD. This parser doesn't do anything with that. If you want to listen to the `ondoctype` event, and then fetch the doctypes, and read the entities and add them to `parser.ENTITIES`, then be my guest.
Unknown entities will fail in strict mode, and in loose mode, will pass through unmolested.
## Usage
```javascript
var fs = require("fs"),
  sax = require("sax"),
  strict = true, // set to false for html-mode
  parser = sax.parser(strict);

parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text.  t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag.  node has "name" and "attributes"
};
parser.onattribute = function (attr) {
  // an attribute.  attr has "name" and "value"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();

// stream usage
// takes the same options as the parser
var options = {} // e.g. { trim: true, lowercase: true }
var saxStream = sax.createStream(strict, options)
saxStream.on("error", function (e) {
  // unhandled errors will throw, since this is a proper node
  // event emitter.
  console.error("error!", e)
  // clear the error
  this._parser.error = null
  this._parser.resume()
})
saxStream.on("opentag", function (node) {
  // same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.xml")
  .pipe(saxStream)
  .pipe(fs.createWriteStream("file-copy.xml"))
```
## Arguments
Pass the following arguments to the parser function.  All are optional.
`strict` - Boolean. Whether or not to be a jerk. Default: `false`.
`opt` - Object bag of settings regarding string formatting. Unless noted otherwise, all default to `false`.
Settings supported:
* `trim` - Boolean. Whether or not to trim text and comment nodes.
* `normalize` - Boolean. If true, then collapse each run of whitespace into a single space.
* `lowercase` - Boolean. If true, then lowercase tag names and attribute names in loose mode, rather than uppercasing them.
* `xmlns` - Boolean. If true, then namespaces are supported.
* `position` - Boolean. If false, then don't track line/col/position (tracking is on by default).
* `strictEntities` - Boolean. If true, only parse [predefined XML entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent) (`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`).
* `unquotedAttributeValues` - Boolean. If true, then unquoted attribute values are allowed. Defaults to `false` when `strict` is true, `true` otherwise.
## Methods
`write` - Write bytes onto the stream. You don't have to do this all at once. You can keep writing as much as you want.
`close` - Close the stream. Once closed, no more data may be written until it is done processing the buffer, which is signaled by the `end` event.
`resume` - To gracefully handle errors, assign a listener to the `error` event. Then, when the error is taken care of, you can call `resume` to continue parsing. Otherwise, the parser will not continue while in an error state.
## Members
At all times, the parser object will have the following members:
`line`, `column`, `position` - Indications of the position in the XML document where the parser currently is looking.
`startTagPosition` - Indicates the position where the current tag starts.
`closed` - Boolean indicating whether or not the parser can be written to. If it's `true`, then wait for the `ready` event to write again.
`strict` - Boolean indicating whether or not the parser is a jerk.
`opt` - Any options passed into the constructor.
`tag` - The current tag being dealt with.
And a bunch of other stuff that you probably shouldn't touch.
## Events
All events emit with a single argument. To listen to an event, assign a function to `on<eventname>`. Functions get executed with `this` set to the parser object. The list of supported events is also in the exported `EVENTS` array.
When using the stream interface, assign handlers using the EventEmitter `on` function in the normal fashion.
`error` - Indication that something bad happened. The error will be hanging out on `parser.error`, and must be deleted before parsing can continue. By listening to this event, you can keep an eye on that kind of stuff. Note: this happens *much* more in strict mode. Argument: instance of `Error`.
`text` - Text node. Argument: string of text.
`doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.
`processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument: object with `name` and `body` members. Attributes are not parsed, as processing instructions have implementation dependent semantics.
`sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>` would trigger this kind of event. This is a weird thing to support, so it might go away at some point. SAX isn't intended to be used to parse SGML, after all.
`opentagstart` - Emitted immediately when the tag name is available, but before any attributes are encountered. Argument: object with a `name` field and an empty `attributes` set. Note that this is the same object that will later be emitted in the `opentag` event.
`opentag` - An opening tag. Argument: object with `name` and `attributes`. In non-strict mode, tag names are uppercased, unless the `lowercase` option is set. If the `xmlns` option is set, then it will contain namespace binding information on the `ns` member, and will have a `local`, `prefix`, and `uri` member.
`closetag` - A closing tag. In loose mode, tags are auto-closed if their parent closes. In strict mode, well-formedness is enforced. Note that self-closing tags will have `closetag` emitted immediately after `opentag`. Argument: tag name.
`attribute` - An attribute node. Argument: object with `name` and `value`. In non-strict mode, attribute names are uppercased, unless the `lowercase` option is set. If the `xmlns` option is set, it will also contain namespace information.
`comment` - A comment node.  Argument: the string of the comment.
`opencdata` - The opening tag of a `<![CDATA[` block.
`cdata` - The text of a `<![CDATA[` block. Since `<![CDATA[` blocks can get quite large, this event may fire multiple times for a single block, if it is broken up into multiple `write()`s. Argument: the string of random character data.
`closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.
`opennamespace` - If the `xmlns` option is set, then this event will signal the start of a new namespace binding.
`closenamespace` - If the `xmlns` option is set, then this event will signal the end of a namespace binding.
`end` - Indication that the closed stream has ended.
`ready` - Indication that the stream has reset, and is ready to be written to.
`script` - In non-strict mode, `<script>` tags trigger a `"script"` event, and their contents are not checked for special XML characters. If you pass `noscript: true`, then this behavior is suppressed. Argument: the script text.
## Reporting Problems
It's best to write a failing test if you find an issue. I will always accept pull requests with failing tests if they demonstrate intended behavior, but it is very hard to figure out what issue you're describing without a test. Writing a test is also the best way for you yourself to figure out if you really understand the issue you think you have with sax-js.