/*
 * The following notes assume that you are familiar with the YAML specification
 * (http://yaml.org/spec/cvs/current.html).  We mostly follow it, although in
 * some cases we are less restrictive than it requires.
 *
 * The process of transforming a YAML stream into a sequence of events is
 * divided into two steps: Scanning and Parsing.
 *
 * The Scanner transforms the input stream into a sequence of tokens, while the
 * Parser transforms the sequence of tokens produced by the Scanner into a
 * sequence of parsing events.
 *
 * The Scanner is rather clever and complicated.  The Parser, by contrast,
 * is a straightforward implementation of a recursive-descent parser (or
 * LL(1) parser, as it is usually called).
 *
 * Actually, only two aspects of Scanning might be called "clever"; the
 * rest is quite straightforward.  These are "block collection start" and
 * "simple keys".  Both are explained in detail below.
 *
 * Here the Scanning step is explained and implemented.  We start with the list
 * of all the tokens produced by the Scanner together with short descriptions.
 *
 *      STREAM-START(encoding)          # The stream start.
 *      STREAM-END                      # The stream end.
 *      VERSION-DIRECTIVE(major,minor)  # The '%YAML' directive.
 *      TAG-DIRECTIVE(handle,prefix)    # The '%TAG' directive.
 *      DOCUMENT-START                  # '---'
 *      DOCUMENT-END                    # '...'
 *      BLOCK-SEQUENCE-START            # Indentation increase denoting a block
 *      BLOCK-MAPPING-START             # sequence or a block mapping.
 *      BLOCK-END                       # Indentation decrease.
 *      FLOW-SEQUENCE-START             # '['
 *      FLOW-SEQUENCE-END               # ']'
 *      FLOW-MAPPING-START              # '{'
 *      FLOW-MAPPING-END                # '}'
 *      BLOCK-ENTRY                     # '-'
 *      FLOW-ENTRY                      # ','
 *      KEY                             # '?' or nothing (simple keys).
 *      VALUE                           # ':'
 *      ALIAS(anchor)                   # '*anchor'
 *      ANCHOR(anchor)                  # '&anchor'
 *      TAG(handle,suffix)              # '!handle!suffix'
 *      SCALAR(value,style)             # A scalar.
 *
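 * As a rough illustration, the token kinds described in these notes could be
 * modelled in C along the following lines.  This is only a sketch: the names
 * `sk_token_type` and `sk_token_name` are hypothetical and are not taken from
 * the library.
 *
```c
#include <stddef.h>

// Hypothetical token kinds, one per token described in these notes.
typedef enum {
    SK_STREAM_START,
    SK_STREAM_END,
    SK_VERSION_DIRECTIVE,
    SK_TAG_DIRECTIVE,
    SK_DOCUMENT_START,
    SK_DOCUMENT_END,
    SK_BLOCK_SEQUENCE_START,
    SK_BLOCK_MAPPING_START,
    SK_BLOCK_END,
    SK_FLOW_SEQUENCE_START,
    SK_FLOW_SEQUENCE_END,
    SK_FLOW_MAPPING_START,
    SK_FLOW_MAPPING_END,
    SK_BLOCK_ENTRY,
    SK_FLOW_ENTRY,
    SK_KEY,
    SK_VALUE,
    SK_ALIAS,
    SK_ANCHOR,
    SK_TAG,
    SK_SCALAR,
    SK_TOKEN_COUNT          // number of token kinds, not a token itself
} sk_token_type;

// Returns the name used in these notes for a token kind, or NULL if the
// value is out of range.
static const char *sk_token_name(sk_token_type type)
{
    static const char *const names[SK_TOKEN_COUNT] = {
        [SK_STREAM_START]         = "STREAM-START",
        [SK_STREAM_END]           = "STREAM-END",
        [SK_VERSION_DIRECTIVE]    = "VERSION-DIRECTIVE",
        [SK_TAG_DIRECTIVE]        = "TAG-DIRECTIVE",
        [SK_DOCUMENT_START]       = "DOCUMENT-START",
        [SK_DOCUMENT_END]         = "DOCUMENT-END",
        [SK_BLOCK_SEQUENCE_START] = "BLOCK-SEQUENCE-START",
        [SK_BLOCK_MAPPING_START]  = "BLOCK-MAPPING-START",
        [SK_BLOCK_END]            = "BLOCK-END",
        [SK_FLOW_SEQUENCE_START]  = "FLOW-SEQUENCE-START",
        [SK_FLOW_SEQUENCE_END]    = "FLOW-SEQUENCE-END",
        [SK_FLOW_MAPPING_START]   = "FLOW-MAPPING-START",
        [SK_FLOW_MAPPING_END]     = "FLOW-MAPPING-END",
        [SK_BLOCK_ENTRY]          = "BLOCK-ENTRY",
        [SK_FLOW_ENTRY]           = "FLOW-ENTRY",
        [SK_KEY]                  = "KEY",
        [SK_VALUE]                = "VALUE",
        [SK_ALIAS]                = "ALIAS",
        [SK_ANCHOR]               = "ANCHOR",
        [SK_TAG]                  = "TAG",
        [SK_SCALAR]               = "SCALAR",
    };

    if ((int)type < 0 || type >= SK_TOKEN_COUNT)
        return NULL;
    return names[type];
}
```
 *
 * Keeping the names in a table indexed by the enumeration makes it easy to
 * print token sequences in the notation used by the examples in these notes.
 *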
 * The following two tokens are "virtual" tokens denoting the beginning and the
 * end of the stream:
 *
 *      STREAM-START(encoding)
 *      STREAM-END
 *
 * We pass the information about the input stream encoding with the
 * STREAM-START token.
 *
 * The next two tokens are responsible for directives:
 *
 *      VERSION-DIRECTIVE(major,minor)
 *      TAG-DIRECTIVE(handle,prefix)
 *
 *      %YAML 1.1
 *      %TAG ! !foo
 *      %TAG !yaml! tag:yaml.org,2002:
 *
 * The corresponding sequence of tokens:
 *
 *      VERSION-DIRECTIVE(1,1)
 *      TAG-DIRECTIVE("!","!foo")
 *      TAG-DIRECTIVE("!yaml","tag:yaml.org,2002:")
 *
 * Note that the VERSION-DIRECTIVE and TAG-DIRECTIVE tokens occupy a whole
 * line.
 *
 * The document start and end indicators are represented by:
 *
 *      DOCUMENT-START
 *      DOCUMENT-END
 *
 * Note that if a YAML stream contains an implicit document (without '---'
 * and '...' indicators), no DOCUMENT-START and DOCUMENT-END tokens will be
 * produced.
 *
 * In the following examples, we present whole documents together with the
 * produced tokens.
 *
 *      1. An implicit document:
 *
 *          STREAM-START(utf-8)
 *          SCALAR("a scalar",single-quoted)
 *
 *      2. An explicit document:
 *
 *          STREAM-START(utf-8)
 *          SCALAR("a scalar",single-quoted)
 *
 *      3. Several documents in a stream:
 *
 *          'yet another scalar'
 *
 *          STREAM-START(utf-8)
 *          SCALAR("a scalar",single-quoted)
 *          SCALAR("another scalar",single-quoted)
 *          SCALAR("yet another scalar",single-quoted)
 *
 * We have already introduced the SCALAR token above.  The following tokens are
 * used to describe aliases, anchors, tags, and scalars:
 *
 *      ALIAS(anchor)
 *      ANCHOR(anchor)
 *      TAG(handle,suffix)
 *      SCALAR(value,style)
 *
 * The following series of examples illustrates the usage of these tokens:
 *
 *      1. A recursive sequence:
 *
 *          STREAM-START(utf-8)
 *          FLOW-SEQUENCE-START
 *
 *      2. A tagged scalar:
 *
 *          !!float "3.14"  # A good approximation.
 *
 *          STREAM-START(utf-8)
 *          SCALAR("3.14",double-quoted)
 *
 *      3. Various scalar styles:
 *
 *          --- # Implicit empty plain scalars do not produce tokens.
 *          --- 'a single-quoted scalar'
 *          --- "a double-quoted scalar"
 *
 *          STREAM-START(utf-8)
 *          SCALAR("a plain scalar",plain)
 *          SCALAR("a single-quoted scalar",single-quoted)
 *          SCALAR("a double-quoted scalar",double-quoted)
 *          SCALAR("a literal scalar",literal)
 *          SCALAR("a folded scalar",folded)
 *
 * Now it's time to review collection-related tokens.  We will start with
 * flow collections:
 *
 *      FLOW-SEQUENCE-START
 *      FLOW-SEQUENCE-END
 *      FLOW-MAPPING-START
 *      FLOW-MAPPING-END
 *      FLOW-ENTRY
 *      KEY
 *      VALUE
 *
 * The tokens FLOW-SEQUENCE-START, FLOW-SEQUENCE-END, FLOW-MAPPING-START, and
 * FLOW-MAPPING-END represent the indicators '[', ']', '{', and '}'
 * respectively.  FLOW-ENTRY represents the ',' indicator.  Finally, the
 * indicators '?' and ':', which are used for denoting mapping keys and values,
 * are represented by the KEY and VALUE tokens.
 *
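 * The correspondence between flow indicators and tokens can be sketched as a
 * small lookup.  The function name `sk_flow_indicator_token` is hypothetical,
 * and the sketch deliberately ignores context: a real scanner must also decide
 * whether '?' or ':' acts as an indicator at a given position, and it emits
 * KEY for simple keys that have no '?' at all.
 *
```c
#include <stddef.h>

// Maps a flow-context indicator character to the name of the token that
// represents it.  Returns NULL for characters that are not flow indicators.
// Assumption: the caller has already established that the character is
// used as an indicator rather than as part of a scalar.
static const char *sk_flow_indicator_token(char ch)
{
    switch (ch) {
        case '[': return "FLOW-SEQUENCE-START";
        case ']': return "FLOW-SEQUENCE-END";
        case '{': return "FLOW-MAPPING-START";
        case '}': return "FLOW-MAPPING-END";
        case ',': return "FLOW-ENTRY";
        case '?': return "KEY";
        case ':': return "VALUE";
        default:  return NULL;
    }
}
```
 *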
 * The following examples show flow collections:
 *
 *      1. A flow sequence:
 *
 *          [item 1, item 2, item 3]
 *
 *          STREAM-START(utf-8)
 *          FLOW-SEQUENCE-START
 *          SCALAR("item 1",plain)
 *          SCALAR("item 2",plain)
 *          SCALAR("item 3",plain)
 *
 *          a simple key: a value,  # Note that the KEY token is produced.
 *          ? a complex key: another value,
 *
 *          STREAM-START(utf-8)
 *          SCALAR("a simple key",plain)
 *          SCALAR("a value",plain)
 *          SCALAR("a complex key",plain)
 *          SCALAR("another value",plain)
 *
 * A simple key is a key that is not denoted by the '?' indicator.  Note that
 * the Scanner still produces the KEY token whenever it encounters a simple key.
 *
 * For scanning block collections, the following tokens are used (note that we
 * repeat KEY and VALUE here):
 *
 *      BLOCK-SEQUENCE-START
 *      BLOCK-MAPPING-START
 *      BLOCK-END
 *      BLOCK-ENTRY
 *      KEY
 *      VALUE
 *
 * The tokens BLOCK-SEQUENCE-START and BLOCK-MAPPING-START denote the
 * indentation increase that precedes a block collection (cf. the INDENT token
 * in Python).  The token BLOCK-END denotes the indentation decrease that ends
 * a block collection (cf. the DEDENT token in Python).  However, YAML has some
 * syntax peculiarities that make detection of these tokens more complex.
 *
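 * The INDENT/DEDENT analogy can be sketched with a stack of indentation
 * columns.  This is a simplified illustration, not the scanner's actual code:
 * all names (`sk_indents`, `sk_add_indent`, `sk_remove_indent`) are
 * hypothetical, tokens are recorded as text in a log, and the YAML
 * peculiarities mentioned above are ignored.
 *
```c
#include <stddef.h>
#include <stdio.h>

#define SK_MAX_DEPTH 64

typedef struct {
    int levels[SK_MAX_DEPTH];   // saved indentation columns of outer collections
    int depth;                  // number of saved columns
    int indent;                 // current indentation column; -1 outside any block
    char log[256];              // names of emitted tokens, space separated
    size_t len;                 // length of the text in log
} sk_indents;

static void sk_init(sk_indents *s)
{
    s->depth = 0;
    s->indent = -1;
    s->log[0] = '\0';
    s->len = 0;
}

// Appends a token name to the log; a name that does not fit is dropped.
static void sk_emit(sk_indents *s, const char *name)
{
    size_t room = sizeof(s->log) - s->len;
    int n = snprintf(s->log + s->len, room, "%s%s", s->len ? " " : "", name);

    if (n > 0 && (size_t)n < room)
        s->len += (size_t)n;
    else
        s->log[s->len] = '\0';  // undo a truncated write
}

// Called when a block collection may start at `column`.  Emits a START token
// only if the column is deeper than the current indentation.
// Returns 0 if the nesting limit is exceeded, 1 otherwise.
static int sk_add_indent(sk_indents *s, int column, int is_mapping)
{
    if (s->indent < column) {
        if (s->depth == SK_MAX_DEPTH)
            return 0;
        s->levels[s->depth++] = s->indent;
        s->indent = column;
        sk_emit(s, is_mapping ? "BLOCK-MAPPING-START" : "BLOCK-SEQUENCE-START");
    }
    return 1;
}

// Called when a token starts at `column`.  Emits one BLOCK-END for every
// open collection that is indented deeper than that column.
static void sk_remove_indent(sk_indents *s, int column)
{
    while (s->indent > column && s->depth > 0) {
        sk_emit(s, "BLOCK-END");
        s->indent = s->levels[--s->depth];
    }
}
```
 *
 * In this sketch, a sequence opened at column 0 followed by a mapping opened
 * at column 2 pushes two levels; calling sk_remove_indent with column -1 (for
 * example at the end of the stream) then closes both with BLOCK-END.
 *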
 * The tokens BLOCK-ENTRY, KEY, and VALUE are used to represent the indicators
 * '-', '?', and ':' respectively.
 *
 * The following examples show how the tokens BLOCK-SEQUENCE-START,
 * BLOCK-MAPPING-START, and BLOCK-END are emitted by the Scanner:
 *
 *      1. Block sequences:
 *
 *          STREAM-START(utf-8)
 *          BLOCK-SEQUENCE-START
 *          SCALAR("item 1",plain)
 *          SCALAR("item 2",plain)
 *          BLOCK-SEQUENCE-START
 *          SCALAR("item 3.1",plain)
 *          SCALAR("item 3.2",plain)
 *          BLOCK-MAPPING-START
 *          SCALAR("key 1",plain)
 *          SCALAR("value 1",plain)
 *          SCALAR("key 2",plain)
 *          SCALAR("value 2",plain)
 *
 *          a simple key: a value   # The KEY token is produced here.
 *
 *          STREAM-START(utf-8)
 *          BLOCK-MAPPING-START
 *          SCALAR("a simple key",plain)
 *          SCALAR("a value",plain)
 *          SCALAR("a complex key",plain)
 *          SCALAR("another value",plain)
 *          SCALAR("a mapping",plain)
 *          BLOCK-MAPPING-START
 *          SCALAR("key 1",plain)
 *          SCALAR("value 1",plain)
 *          SCALAR("key 2",plain)
 *          SCALAR("value 2",plain)
 *          SCALAR("a sequence",plain)
 *          BLOCK-SEQUENCE-START
 *          SCALAR("item 1",plain)
 *          SCALAR("item 2",plain)
 *
 * YAML does not always require a new block collection to start on a new
 * line.  If the current line contains only '-', '?', and ':' indicators, a new
 * block collection may start on the current line.  The following examples
 * illustrate this case:
 *
 *      1. Collections in a sequence:
 *
 *          STREAM-START(utf-8)
 *          BLOCK-SEQUENCE-START
 *          BLOCK-SEQUENCE-START
 *          SCALAR("item 1",plain)
 *          SCALAR("item 2",plain)
 *          BLOCK-MAPPING-START
 *          SCALAR("key 1",plain)
 *          SCALAR("value 1",plain)
 *          SCALAR("key 2",plain)
 *          SCALAR("value 2",plain)
 *          BLOCK-MAPPING-START
 *          SCALAR("complex key",plain)
 *          SCALAR("complex value",plain)
 *
 *      2. Collections in a mapping:
 *
 *          STREAM-START(utf-8)
 *          BLOCK-MAPPING-START
 *          SCALAR("a sequence",plain)
 *          BLOCK-SEQUENCE-START
 *          SCALAR("item 1",plain)
 *          SCALAR("item 2",plain)
 *          SCALAR("a mapping",plain)
 *          BLOCK-MAPPING-START
 *          SCALAR("key 1",plain)
 *          SCALAR("value 1",plain)
 *          SCALAR("key 2",plain)
 *          SCALAR("value 2",plain)
 *
 * YAML also permits non-indented sequences if they are included in a block
 * mapping.  In this case, the token BLOCK-SEQUENCE-START is not produced:
 *
 *      - item 1    # BLOCK-SEQUENCE-START is NOT produced here.
 *
 *      STREAM-START(utf-8)
 *      BLOCK-MAPPING-START
 *      SCALAR("key",plain)
 *      SCALAR("item 1",plain)
 *      SCALAR("item 2",plain)
 */

#include <yaml/yaml.h>

/*
 * Public API declarations.
 */

YAML_DECLARE(yaml_token_t *)
yaml_parser_get_token(yaml_parser_t *parser);

YAML_DECLARE(yaml_token_t *)
yaml_parser_peek_token(yaml_parser_t *parser);
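
/*
 * Judging only by their names, the two functions above suggest the usual
 * split between consuming the next token and looking at it without consuming
 * it.  The sketch below illustrates that split on a plain array.  It is an
 * assumption, not the library's implementation: `sk_token`, `sk_queue`,
 * `sk_peek_token`, and `sk_get_token` are hypothetical names.
 */

```c
#include <stddef.h>

/* Hypothetical stand-ins for a token and a queue of already scanned tokens. */
typedef struct {
    const char *name;
} sk_token;

typedef struct {
    const sk_token *tokens;     /* scanned tokens, in stream order */
    size_t count;               /* number of tokens in the array */
    size_t head;                /* index of the next token to hand out */
} sk_queue;

/* Returns the next token without consuming it, or NULL if none is left. */
static const sk_token *sk_peek_token(const sk_queue *q)
{
    return q->head < q->count ? &q->tokens[q->head] : NULL;
}

/* Returns the next token and advances past it, or NULL if none is left. */
static const sk_token *sk_get_token(sk_queue *q)
{
    const sk_token *token = sk_peek_token(q);

    if (token)
        q->head++;
    return token;
}
```

/*
 * With this split, a parser can peek repeatedly to decide which rule applies
 * and call the consuming function only once it commits to that rule.
 */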

/*
 * High-level token API.
 */

yaml_parser_fetch_more_tokens(yaml_parser_t *parser);

yaml_parser_fetch_next_token(yaml_parser_t *parser);

/*
 * Potential simple keys.
 */

yaml_parser_stale_simple_keys(yaml_parser_t *parser);

yaml_parser_save_simple_key(yaml_parser_t *parser);

yaml_parser_remove_simple_key(yaml_parser_t *parser);

/*
 * Indentation treatment.
 */

yaml_parser_add_indent(yaml_parser_t *parser);

yaml_parser_remove_indent(yaml_parser_t *parser);

yaml_parser_fetch_stream_start(yaml_parser_t *parser);

yaml_parser_fetch_stream_end(yaml_parser_t *parser);

yaml_parser_fetch_directive(yaml_parser_t *parser);

yaml_parser_fetch_document_start(yaml_parser_t *parser);

yaml_parser_fetch_document_end(yaml_parser_t *parser);

yaml_parser_fetch_document_indicator(yaml_parser_t *parser,
        yaml_token_type_t type);

yaml_parser_fetch_flow_sequence_start(yaml_parser_t *parser);

yaml_parser_fetch_flow_mapping_start(yaml_parser_t *parser);

yaml_parser_fetch_flow_collection_start(yaml_parser_t *parser,
        yaml_token_type_t type);

yaml_parser_fetch_flow_sequence_end(yaml_parser_t *parser);

yaml_parser_fetch_flow_mapping_end(yaml_parser_t *parser);

yaml_parser_fetch_flow_collection_end(yaml_parser_t *parser,
        yaml_token_type_t type);

yaml_parser_fetch_flow_entry(yaml_parser_t *parser);

yaml_parser_fetch_block_entry(yaml_parser_t *parser);

yaml_parser_fetch_key(yaml_parser_t *parser);

yaml_parser_fetch_value(yaml_parser_t *parser);

yaml_parser_fetch_alias(yaml_parser_t *parser);

yaml_parser_fetch_anchor(yaml_parser_t *parser);

yaml_parser_fetch_tag(yaml_parser_t *parser);

yaml_parser_fetch_literal_scalar(yaml_parser_t *parser);

yaml_parser_fetch_folded_scalar(yaml_parser_t *parser);

yaml_parser_fetch_block_scalar(yaml_parser_t *parser, int literal);

yaml_parser_fetch_single_quoted_scalar(yaml_parser_t *parser);

yaml_parser_fetch_double_quoted_scalar(yaml_parser_t *parser);

yaml_parser_fetch_flow_scalar(yaml_parser_t *parser, int single);

yaml_parser_fetch_plain_scalar(yaml_parser_t *parser);

yaml_parser_scan_to_next_token(yaml_parser_t *parser);

static yaml_token_t *
yaml_parser_scan_directive(yaml_parser_t *parser);

yaml_parser_scan_directive_name(yaml_parser_t *parser,
        yaml_mark_t start_mark, yaml_char_t **name);

yaml_parser_scan_yaml_directive_value(yaml_parser_t *parser,
        yaml_mark_t start_mark, int *major, int *minor);

yaml_parser_scan_yaml_directive_number(yaml_parser_t *parser,
        yaml_mark_t start_mark, int *number);

yaml_parser_scan_tag_directive_value(yaml_parser_t *parser,
        yaml_char_t **handle, yaml_char_t **prefix);

static yaml_token_t *
yaml_parser_scan_anchor(yaml_parser_t *parser,
        yaml_token_type_t type);

static yaml_token_t *
yaml_parser_scan_tag(yaml_parser_t *parser);

yaml_parser_scan_tag_handle(yaml_parser_t *parser, int directive,
        yaml_mark_t start_mark, yaml_char_t **handle);

yaml_parser_scan_tag_uri(yaml_parser_t *parser, int directive,
        yaml_mark_t start_mark, yaml_char_t **url);

static yaml_token_t *
yaml_parser_scan_block_scalar(yaml_parser_t *parser, int literal);

yaml_parser_scan_block_scalar_indicators(yaml_parser_t *parser,
        yaml_mark_t start_mark, int *chomping, int *increment);

static yaml_token_t *
yaml_parser_scan_flow_scalar(yaml_parser_t *parser, int single);

static yaml_token_t *
yaml_parser_scan_plain_scalar(yaml_parser_t *parser);