do sgml-parse (control structure)
Syntax
do sgml-parse document (with id-checking switch-expression)? 
                       (with utf-8 switch-expression)?
                       (creating sgml-dtds{string-expression})? scan string-source-expression
   local-declaration*
   action*
done
do sgml-parse subdocument (with id-checking switch-expression)? 
                          (creating sgml-dtds{string-expression})? scan string-source-expression
   local-declaration*
   action*
done
do sgml-parse instance (with document-element string-expression)? 
                       with (sgml-dtds{string-expression} | current sgml-dtd) 
                       (with id-checking switch-expression)? scan string-source-expression
   local-declaration*
   action*
done
    
 do sgml-parse is used to invoke the SGML parser. Every do sgml-parse block must name the kind of parse
          (document, subdocument, or instance), and its body must use either #content, to consume the
          parsed markup, or a parse continuation operator (%c or suppress), to initiate processing of the
          data by markup rules.
          
 The simplest use of do sgml-parse is to process a complete SGML document:
          
  do sgml-parse document scan file "my-sgml.sgml"
     output "%c"
  done
            
        
 This assumes that the file my-sgml.sgml contains a complete SGML document. If the DTD and the instance
          are in different files, they can be joined with the || operator:
          
  do sgml-parse document scan file "my-dtd.dtd" || file "my-sgml.sgml"
     output "%c"
  done
            
      
 If the same DTD is to be used to parse several input instances, it is best to pre-compile the DTD and store
          it on the built-in sgml-dtds shelf:
          
  do sgml-parse document creating sgml-dtds{"my-dtd"} scan file "my-dtd.dtd" 
     suppress
  done
            
 
 If the instance file names are stored on a shelf named my-instances, each instance can then be
          processed in turn:
          
  repeat over my-instances
     do sgml-parse instance with sgml-dtds{"my-dtd"} scan file my-instances
        output "%c"
     done
  again
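
 The two steps above can be combined into one program. The following is a minimal sketch, not a
          definitive implementation: the instance file names chapter1.sgml and chapter2.sgml are hypothetical,
          and the element rules that a real program would need in order to process the content are omitted.

  ; Hypothetical list of instance files that all conform to my-dtd.dtd.
  global stream my-instances variable initial {"chapter1.sgml", "chapter2.sgml"}

  process
     ; Compile the DTD once and keep it on the built-in sgml-dtds shelf.
     do sgml-parse document creating sgml-dtds{"my-dtd"} scan file "my-dtd.dtd"
        suppress
     done

     ; Parse each instance against the pre-compiled DTD.
     repeat over my-instances
        do sgml-parse instance with sgml-dtds{"my-dtd"} scan file my-instances
           output "%c"
        done
     again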
            
      
 A nested SGML parse can use the same DTD as an outer SGML parse to validate its own input. For instance:
          
  process
     using group "one"
     do sgml-parse document scan "<!doctype a ["
                              || "<!element a - - (b | #pcdata)*>"
                              || "<!element b - - (#pcdata)>]>"
                              || "<a><b>Hello, World!</b></a>"
        output "%c"
     done
  
  
  group "one"
     element "a"
        using group "b"
        do sgml-parse instance with current sgml-dtd scan "<a>Salut, Monde!</a>"
           output "%c"
        done
        output "%c"
  
  
     element "b"
        output "%c"
  
  
  group "b"
     element "a"
        output "%c"
            
        
 In this program, the SGML parse launched in the element rule for a inside group
          one uses the same DTD as the parse launched in the process rule.
      
 It is possible to parse a partial instance: a piece of data consisting of an element from a DTD that is not
          the doctype element of that DTD. In this case, the element to be used as the effective doctype for
          parsing the data is specified using the document-element argument:
          
  do sgml-parse instance with document-element "lamb" with sgml-dtds{"my-dtd"} scan file "partinst.sgml" 
     output "%c"
  done
            
 
          The element's start and end tags can be present, or they can be omitted if the element allows. SGML comments,
          processing instructions and even marked sections can precede and follow the element's start and end tags, but
          anything else (particularly other elements, data, entity references or usemap declarations) is an
          error.
      
 do sgml-parse can be used to parse an SGML subdocument. Subdocument processing can only occur in the
          middle of parsing another SGML document that includes the subdocument reference. The concrete syntax defined
          by the document currently being processed is used to parse the subdocument. In accordance with the SGML
          standard, the subdocument's text must not contain an SGML declaration. 
        
 The following example shows how to make references to SGML subdocument entities trigger parsing of those
          entities. The source of the subdocument entity text is assumed to be a file whose name is one of the
          following: the system identifier (provided by a library rule), the public text description portion of
          the public identifier, or the name of the entity (uppercased, with a .ent file extension appended).
          
  external-data-entity #implied when entity is subdoc-entity
     local stream file-name
  
     output "subdoc depth exceeded!%n"
        when number of current subdocuments > 100
  
     do when entity is system
        set file-name to "%eq"
  
     else when entity is in-library
        set file-name to "%epq"
  
     else when entity is public
        do scan "%pq"
        match (["+-"] "//")? ((lookahead ! "//") any)* "//"
              [ \ " "]* " " "-//"?
              ((lookahead ! "//") any)* => public-text-description
           set file-name to public-text-description
        done
  
     else
        set file-name to "%uq.ent"
     done
  
     do sgml-parse subdocument scan file file-name
        output "%c"
     done
            
        
 Processing a subdocument increments the integer value returned by the number of current
          subdocuments, and decrements it when the action has finished. However, OmniMark does not issue an
          error message when the subdocument nesting level exceeds that allowed by the concrete syntax, or when
          the concrete syntax specifies subdoc no.
      
 By default, OmniMark checks all SGML IDREF attributes to make sure they reference valid IDs. This checking
          takes time, and it may not be appropriate when processing a partial instance. It can be disabled using
          with id-checking followed by a switch expression that evaluates to false. The following code parses
          the specified document without checking IDREFs:
          
  do sgml-parse document with id-checking false scan file "my-sgml.sgml"
     output "%c"
  done
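
 Because with id-checking takes a switch expression, the decision can also be held in a variable rather
          than written as a literal. The following is a sketch under that assumption; the global switch
          check-ids is a hypothetical name introduced only for illustration.

  ; Hypothetical switch: set the initial value to false to skip IDREF checking.
  global switch check-ids initial {true}

  process
     do sgml-parse document with id-checking check-ids scan file "my-sgml.sgml"
        output "%c"
     done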
            
      
 SGML is an ASCII-based language. This means that character references greater than 127 (for example, &#233;)
          have no predefined encoding method appropriate to them. The OmniMark parser outputs character
          references between 128 and 255 as equivalent binary byte values. Character references greater than 255
          cause a markup error.
        
If the document being processed contains numerical character references greater than 127, the parser can be instructed to output them as UTF-8 byte sequences. This will allow character references above 255 to be output as UTF-8 byte encodings. This is appropriate if, and only if, your output will be encoded and interpreted as a UTF-8 document.
 To turn on UTF-8 output of character references, use the with utf-8 modifier with a switch expression that evaluates to true:
          
  process
      do sgml-parse document with utf-8 true scan file "myfile.sgml"
          output "%c"
      done
            
        
 Actual UTF-8 encoded characters in your input data are unaffected by this setting.
 Note that with utf-8 can only be used with a document parse, not with a subdocument or instance
          parse. Subdocument processing inherits the UTF-8 setting of the parent parse.
      
 When parsing a document, markup rules are fired as follows (if specified in your code):
          sgml-declaration-end, dtd-start, dtd-end, prolog-end, and epilog-start.
          
 The same rules fire when parsing a subdocument, except for sgml-declaration-end.
        
 When compiling a DTD, markup rules are fired as follows (if specified in your code):
          dtd-start, dtd-end, and prolog-end.
          
 When parsing an instance, the following markup rules fire: prolog-end and epilog-start.
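
 To observe which of the markup rules listed above fire for a given kind of parse, each rule can report
          itself. The following is a minimal sketch; the message texts are illustrative only, and writing to
          #error is simply one way to keep the trace separate from the program's main output.

  ; Each rule writes a trace line when the parser reaches that point.
  dtd-start
     put #error "dtd-start fired%n"

  dtd-end
     put #error "dtd-end fired%n"

  prolog-end
     put #error "prolog-end fired%n"

  epilog-start
     put #error "epilog-start fired%n"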
          
 As with subdocument, instance saves and resets the integer value returned by the number of current
          subdocuments and restores the saved value when the action is finished.
        
 do sgml-parse saves the current settings of sgml-in and sgml-out and
          restores them at the end of the parse.
        
 If there are errors in the SGML declaration or prolog (DTD), the prolog-in-error rule is fired
          instead of prolog-end, and processing of the remaining input of do sgml-parse terminates.
          Execution resumes with the actions following the most recently executed %c or suppress.
          However, the amount of input read is undefined in this situation. That is, OmniMark may consume the
          entire input source, it may stop reading the input immediately, or it may do something in between.
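
 A program can use the prolog-in-error rule to report that the parse was abandoned. The following is a
          minimal sketch; the message text is illustrative only.

  ; Fired instead of prolog-end when the SGML declaration or DTD is in error.
  prolog-in-error
     put #error "Errors in the SGML declaration or DTD; the parse was abandoned.%n"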
    
Copyright © Stilo International plc, 1988-2008.