DTD - Quick Guide

The XML Document Type Definition, commonly known as DTD, is a way to describe an XML language precisely. DTDs check the validity of the structure and vocabulary of an XML document against the grammatical rules of the appropriate XML language.

An XML document can be described as:

  • Well-formed − If the XML document adheres to the general XML rules (tags must be properly nested, opening and closing tags must be balanced, and empty tags must end with '/>'), it is called well-formed.

    OR

  • Valid − An XML document is said to be valid when it is not only well-formed but also conforms to the available DTD, which specifies which tags it uses, which attributes those tags can contain, and which tags can occur inside other tags, among other properties.

A DTD is used to define the structure of the XML document.

Types

A DTD can be classified on the basis of where it is declared relative to the XML document:

  • Internal DTD

  • External DTD

When a DTD is declared within the XML file, it is called an internal DTD; when it is declared in a separate file, it is called an external DTD.

We will learn more about these in the DTD Syntax chapter.

Features

Following are some important points that a DTD describes:

  • the elements that can appear in an XML document.

  • the order in which they can appear.

  • optional and mandatory elements.

  • element attributes and whether they are optional or mandatory.

  • whether attributes can have default values.

Advantages of using DTD

  • Documentation − You can define your own format for XML files. By looking at the DTD, a user or developer can understand the structure of the data.

  • Validation − It gives a way to check the validity of XML files by checking whether the elements appear in the right order, mandatory elements and attributes are in place, elements and attributes have not been inserted incorrectly, and so on.

Disadvantages of using DTD

  • It does not support namespaces. A namespace is a mechanism by which element and attribute names can be assigned to groups. In a DTD, however, namespaces have to be defined within the DTD, which defeats the purpose of using namespaces.

  • It supports only the text string data type.

  • It is not object oriented. Hence, the concept of inheritance cannot be applied to DTDs.

  • It offers limited possibilities for expressing the cardinality of elements.

An XML DTD can either be specified inside the document, or be kept in a separate document to which the XML document is then linked.

Syntax

The basic syntax of a DTD is as follows:

<!DOCTYPE element DTD identifier
[
   declaration1
   declaration2
   ........
]>

In the above syntax:

  • The DTD starts with the <!DOCTYPE delimiter.

  • element tells the parser to parse the document from the specified root element.

  • DTD identifier is an identifier for the document type definition, which may be the path to a file on the system or the URL of a file on the internet. If the DTD points to an external path, it is called the external subset.

  • The square brackets [ ] enclose an optional list of declarations called the internal subset.

Internal DTD

A DTD is referred to as an internal DTD if the elements are declared within the XML file. In that case the standalone attribute in the XML declaration can be set to yes, which states that the document works independently of any external source.

Syntax

The syntax of an internal DTD is as shown:

<!DOCTYPE root-element [element-declarations]>

where root-element is the name of the root element and element-declarations is where you declare the elements.

Example

Following is a simple example of an internal DTD:

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

<!DOCTYPE address [
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>

<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>

Let us go through the above code:

Start Declaration − Begin the XML declaration with the following statement.

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

DTD − Immediately after the XML header, the document type declaration follows, commonly referred to as the DOCTYPE −

<!DOCTYPE address [

The DOCTYPE declaration has an exclamation mark (!) immediately after the opening angle bracket, before the DOCTYPE keyword. The DOCTYPE informs the parser that a DTD is associated with this XML document.

DTD Body − The DOCTYPE declaration is followed by the body of the DTD, where you declare elements, attributes, entities, and notations −

<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

Several elements are declared here that make up the vocabulary of the address document. <!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here #PCDATA means parsed character data, that is, text that the parser scans for markup.

End Declaration − Finally, the declaration section of the DTD is closed using a closing square bracket and a closing angle bracket (]>). This effectively ends the definition, and thereafter the XML document follows immediately.

Rules

  • The document type declaration must appear at the start of the document (preceded only by the XML header); it is not permitted anywhere else within the document.

  • Similar to the DOCTYPE declaration, the element declarations must start with an exclamation mark.

  • The name in the document type declaration must match the element type of the root element.

External DTD

In an external DTD, elements are declared outside the XML file. They are accessed by specifying a system identifier, which may be either a valid .dtd file or a valid URL. In that case the standalone attribute in the XML declaration should be set to no, which states that the declaration includes information from the external source.

Syntax

Following is the syntax for an external DTD:

<!DOCTYPE root-element SYSTEM "file-name">

where file-name is the file with the .dtd extension.

Example

The following example shows external DTD usage:

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">

<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</address>

The content of the DTD file address.dtd is as shown −

<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

Types

You can refer to an external DTD by using either system identifiers or public identifiers.

System Identifiers

A system identifier enables you to specify the location of an external file containing DTD declarations. The syntax is as follows:

<!DOCTYPE name SYSTEM "address.dtd" [...]>

As you can see, it contains the keyword SYSTEM and a URI reference pointing to the location of the document.
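
As a rough sketch of how the two subsets combine, the document below references an external subset through a system identifier and adds one declaration in the internal subset. It assumes that address.dtd declares the address, name, company, and phone elements as in the external DTD example above; the employer entity is a hypothetical name introduced only for illustration.

```xml
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>

<!-- external subset: address.dtd; internal subset: the declarations inside [ ] -->
<!DOCTYPE address SYSTEM "address.dtd" [
   <!-- hypothetical entity, declared only in this document -->
   <!ENTITY employer "TutorialsPoint">
]>

<address>
   <name>Tanmay Patil</name>
   <company>&employer;</company>
   <phone>(011) 123-4567</phone>
</address>
```

The element declarations come from the external file, while the entity is local to this one document, which is a typical reason for using both subsets together.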

Public Identifiers

Public identifiers provide a mechanism to locate DTD resources and are written as shown below:

<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">

As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier. In XML, the public identifier must in turn be followed by a system identifier (here address.dtd), which the parser falls back on when it cannot resolve the public identifier. Public identifiers are used to identify an entry in a catalog. They can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.

This chapter discusses XML components from the DTD perspective. A DTD basically contains declarations of the following XML components:

  • Elements

  • Attributes

  • Entities

Elements

XML elements can be defined as building blocks of an XML document. Elements can behave as a container to hold text, elements, attributes, media objects, or a mix of all of these.

Each XML document contains one or more elements, the boundaries of which are delimited by start-tags and end-tags, or by empty-element tags.

Example

Following is a simple example of XML elements −

<name>
   Tutorials Point
</name>

As you can see, we have defined a <name> tag. There is text between the start and end tags of <name>. Elements, when used in an XML DTD, need to be declared, which is discussed in detail in the DTD Elements chapter.

Attributes

Attributes are part of XML elements. An element can have any number of unique attributes. Attributes give more information about the XML element or, more precisely, define a property of the element. An XML attribute is always a name-value pair.

Example

Following is a simple example of XML attributes:

<img src = "flower.jpg"/>

Here img is the element name, whereas src is an attribute name and flower.jpg is the value given for the attribute src.

If attributes are used in an XML DTD, they need to be declared, which is discussed in detail in the DTD Attributes chapter.

Entities

Entities are placeholders in XML. They can be declared in the document prolog or in a DTD. Entities can be primarily categorized as:

  • Built-in entities

  • Character entities

  • General entities

  • Parameter entities

There are five built-in entities that play their role in well-formed XML:

  • Ampersand: &amp;

  • Single quote: &apos;

  • Greater than: &gt;

  • Less than: &lt;

  • Double quote: &quot;

We will study entity declarations in an XML DTD in detail in the DTD Entities chapter.

XML elements can be defined as building blocks of an XML document. Elements can behave as a container to hold text, elements, attributes, media objects, or a mix of all of these.

A DTD element is declared with an ELEMENT declaration. When an XML file is validated against a DTD, the parser first looks for the root element and then validates the child elements.

Syntax

All DTD element declarations have this general form:

<!ELEMENT elementname (content)>
  • The ELEMENT declaration is used to tell the parser that you are about to define an element.

  • elementname is the name of the element (also called the generic identifier) that you are defining.

  • content defines what content (if any) can go within the element.

Element Content Types

The content of an element declaration in a DTD can be categorized as follows:

  • Empty content

  • Element content

  • Mixed content

  • Any content

Empty Content

This is a special case of element declaration: the element has no content. Such elements are declared with the keyword EMPTY.

Syntax

Following is the syntax for an empty element declaration:

<!ELEMENT elementname EMPTY >

In the above syntax:

  • ELEMENT is the element declaration, here of category EMPTY.

  • elementname is the name of the empty element.

Example

Following is a simple example demonstrating an empty element declaration:

<?xml version = "1.0"?>

<!DOCTYPE address [
   <!ELEMENT address EMPTY>    
]>
<address />

In this example, address is declared as an empty element. The markup for the address element would appear as <address />.

Element Content

In an element declaration with element content, the content model lists the allowable child elements within parentheses. More than one element can be included.

Syntax

Following is a syntax of element declaration with element content −

<!ELEMENT elementname (child1, child2...)>
  • ELEMENT is the element declaration tag

  • elementname is the name of the element.

  • child1, child2.. are the elements and each element must have its own definition within the DTD.

Example

The following example demonstrates an element declaration with element content −

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

<!DOCTYPE address [
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>

<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>

In the above example, address is the parent element and name, company and phone are its child elements.

List of Operators and Syntax Rules

The following list shows the operators and syntax rules that can be applied in defining child elements −

  • + (one or more) − Syntax: <!ELEMENT element-name (child1+)>. The child element can occur one or more times inside the parent element. Example: <!ELEMENT address (name+)> means that the child element name can occur one or more times inside the element address.

  • * (zero or more) − Syntax: <!ELEMENT element-name (child1*)>. The child element can occur zero or more times inside the parent element. Example: <!ELEMENT address (name*)> means that the child element name can occur zero or more times inside the element address.

  • ? (zero or one) − Syntax: <!ELEMENT element-name (child1?)>. The child element can occur zero or one time inside the parent element. Example: <!ELEMENT address (name?)> means that the child element name can occur zero or one time inside the element address.

  • , (sequence) − Syntax: <!ELEMENT element-name (child1, child2)>. It gives a comma-separated sequence of child elements that must be included in element-name. Example: <!ELEMENT address (name, company)> means that the child elements name and company must occur in this order inside the element address.

  • | (choice) − Syntax: <!ELEMENT element-name (child1 | child2)>. It allows a choice between child elements. Example: <!ELEMENT address (name | company)> means that either name or company must occur inside the element address.
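
A minimal sketch, not taken from the original tutorial, of how these operators can be combined in a single content model. The mobile, landline, and note elements and the second phone number are hypothetical, added only for illustration.

```xml
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

<!DOCTYPE address [
   <!-- one name, an optional company, at least one phone entry, any number of notes -->
   <!ELEMENT address (name, company?, (mobile | landline)+, note*)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT mobile (#PCDATA)>
   <!ELEMENT landline (#PCDATA)>
   <!ELEMENT note (#PCDATA)>
]>

<address>
   <name>Tanmay Patil</name>
   <mobile>(011) 123-4567</mobile>
   <landline>(011) 123-4568</landline>
</address>
```

Here company and note are omitted, which the ? and * operators allow, while the + operator requires at least one mobile or landline child.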

Rules

We need to follow certain rules if the content model contains more than one element −

  • Sequences − Often the elements within DTD documents must appear in a distinct order. If this is the case, you define the content using a sequence. For example, the following declaration indicates that the <address> element must have exactly three children, <name>, <company>, and <phone>, and that they must appear in this order −

<!ELEMENT address (name,company,phone)>
  • Choices − Suppose you need to allow one element or another, but not both. In such cases you must use the pipe (|) character. The pipe functions as an exclusive OR. For example −

<!ELEMENT address (mobile | landline)>

Mixed Element Content

Mixed content is a combination of (#PCDATA) and child elements. PCDATA stands for parsed character data, that is, text that is not markup. Within mixed content models, text can appear by itself or it can be interspersed between elements. The rules for mixed content models are stricter than those for element content discussed in the previous section, as listed below.

Syntax

Following is a generic syntax for mixed element content −

<!ELEMENT elementname (#PCDATA|child1|child2)*>
  • ELEMENT is the element declaration tag.

  • elementname is the name of the element.

  • PCDATA is the text that is not markup. #PCDATA must come first in the mixed content declaration.

  • child1, child2.. are the elements and each element must have its own definition within the DTD.

  • The operator (*) must follow the mixed content declaration if child elements are included.

  • The (#PCDATA) and children element declarations must be separated by the (|) operator.

Example

Following is a simple example demonstrating the mixed content element declaration in a DTD.

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

<!DOCTYPE address [
   <!ELEMENT address (#PCDATA|name)*>
   <!ELEMENT name (#PCDATA)>
]>

<address>
   Here's a bit of text mixed up with the child element.
   <name>
      Tanmay Patil
   </name>
</address>

ANY Element Content

You can declare an element using the ANY keyword as its content. Such an element is sometimes described as a mixed category element. ANY is useful when you have yet to decide the allowable contents of the element.

Syntax

Following is the syntax for declaring elements with ANY content −

<!ELEMENT elementname ANY>

Here, the ANY keyword indicates that text (PCDATA) and/or any elements declared within the DTD can be used within the content of the <elementname> element. They can be used in any order any number of times. However, the ANY keyword does not allow you to include elements that are not declared within the DTD.

Example

Following is a simple example demonstrating the element declaration with ANY content −

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

<!DOCTYPE address [
   <!ELEMENT address ANY>
]>

<address>
   Here's a bit of sample text
</address>

In this chapter we will discuss DTD attributes. An attribute gives more information about an element or, more precisely, defines a property of an element. An XML attribute is always in the form of a name-value pair. An element can have any number of unique attributes.

An attribute declaration is similar to an element declaration in many ways except one: instead of declaring the allowable content for an element, you declare a list of allowable attributes for each element. These lists are called ATTLIST declarations.

Syntax

Basic syntax of DTD attributes declaration is as follows −

<!ATTLIST element-name attribute-name attribute-type attribute-value>

In the above syntax −

  • A DTD attribute declaration starts with the <!ATTLIST keyword.

  • element-name specifies the name of the element to which the attribute applies.

  • attribute-name specifies the name of the attribute which is included with the element-name.

  • attribute-type defines the type of attributes. We will discuss more on this in the following sections.

  • attribute-value is the default declaration: a default value, or one of the keywords #REQUIRED, #IMPLIED, or #FIXED followed by a fixed value. We will discuss more on this in the following sections.

Example

Below is a simple example for attribute declaration in DTD −

<?xml version = "1.0"?>

<!DOCTYPE address [
   <!ELEMENT address ( name )>
   <!ELEMENT name ( #PCDATA )>
   <!ATTLIST name id CDATA #REQUIRED>
]>

<address>
   <name id = "123">Tanmay Patil</name>
</address>

Let us go through the above code −

  • Begin with the XML declaration with the following statement −

<?xml version = "1.0"?>
  • Immediately following the XML header is the document type declaration, commonly referred to as the DOCTYPE. It informs the parser that a DTD is associated with this XML document, and it has an exclamation mark (!) right after the opening angle bracket, as shown below −

<!DOCTYPE address [
  • Following is the body of the DTD. Here we have declared the elements −

<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
  • Attribute id for the element name is declared as given below. Here the attribute type is CDATA and its default declaration is #REQUIRED −

<!ATTLIST name id CDATA #REQUIRED>

Rules of Attribute Declaration

  • All attributes used in an XML document must be declared in the Document Type Definition (DTD) using an attribute-list declaration.

  • Attributes may only appear in start-tags or empty-element tags.

  • The keyword ATTLIST must be in upper case.

  • No duplicate attribute names are allowed within the attribute list for a given element.

Attribute Types

When declaring attributes, you can specify how the processor should handle the data that appears in the value. We can categorize attribute types in three main categories −

  • String type

  • Tokenized types

  • Enumerated types

The following list summarizes the different attribute types −

  • CDATA − Character data (text and not markup). It is a String Attribute Type.

  • ID − A unique identifier for the element that carries the attribute. The same value must not appear more than once in the document. It is a Tokenized Attribute Type.

  • IDREF − References the ID of another element. It is used to establish connections between elements. It is a Tokenized Attribute Type.

  • IDREFS − References multiple IDs, separated by white space. It is a Tokenized Attribute Type.

  • ENTITY − Names an unparsed external entity declared in the DTD. It is a Tokenized Attribute Type.

  • ENTITIES − Names a list of unparsed external entities declared in the DTD. It is a Tokenized Attribute Type.

  • NMTOKEN − Similar to CDATA, but the attribute value must be a single name token, that is, it may contain only XML name characters. It is a Tokenized Attribute Type.

  • NMTOKENS − Similar to CDATA, but the attribute value must be a list of name tokens separated by white space. It is a Tokenized Attribute Type.

  • NOTATION − The attribute value must match one of the listed notation names declared in the DTD. It is an Enumerated Attribute Type.

  • Enumeration − Defines a specific list of values, one of which the attribute value must match. It is an Enumerated Attribute Type.
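
As an illustration of several of these types in one place, here is a small sketch. The staff and employee elements, the attribute names, and the second employee are hypothetical and serve only to show the declarations in use.

```xml
<?xml version = "1.0"?>

<!DOCTYPE staff [
   <!ELEMENT staff (employee+)>
   <!ELEMENT employee (#PCDATA)>
   <!ATTLIST employee
      id      ID                  #REQUIRED
      manager IDREF               #IMPLIED
      status  (active | on-leave) "active"
      code    NMTOKEN             #IMPLIED>
]>

<staff>
   <!-- id values must be unique in the document and must be valid XML names -->
   <employee id = "e1" code = "42-A">Tanmay Patil</employee>
   <!-- manager must match the id of some element in the same document -->
   <employee id = "e2" manager = "e1" status = "on-leave">A. Colleague</employee>
</staff>
```

Note that an ID value such as e1 has to start with a letter or underscore, whereas an NMTOKEN value such as 42-A may start with a digit.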

Attribute Value Declaration

Within each attribute declaration, you must specify how the value will appear in the document. You can specify if an attribute −

  • can have a default value

  • can have a fixed value

  • is required

  • is implied

Default Values

The declaration contains the default value, which the parser supplies when the attribute is omitted. The value can be enclosed in single quotes (') or double quotes (").

Syntax

Following is the syntax of a default value −

<!ATTLIST element-name attribute-name attribute-type "default-value">

where default-value is the attribute value defined.

Example

Following is a simple example of attribute declaration with default value −

<?xml version = "1.0"?>

<!DOCTYPE address [
   <!ELEMENT address ( name )>
   <!ELEMENT name ( #PCDATA )>
   <!ATTLIST name id CDATA "0">
]>

<address>
   <name id = "123">
      Tanmay Patil
   </name>
</address>

In this example we have a name element with an attribute id whose default value is 0. The default value is enclosed within double quotes.
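
The effect of a default only shows when the attribute is left out. The variant below is a sketch of that case, reusing the same declarations; it assumes a parser that reads the DTD.

```xml
<?xml version = "1.0"?>

<!DOCTYPE address [
   <!ELEMENT address ( name )>
   <!ELEMENT name ( #PCDATA )>
   <!ATTLIST name id CDATA "0">
]>

<address>
   <!-- id is omitted here, so a parser that reads the DTD reports id = "0" -->
   <name>Tanmay Patil</name>
</address>
```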

FIXED Values

#FIXED keyword followed by the fixed value is used when you want to specify that the attribute value is constant and cannot be changed. A common use of fixed attributes is specifying version numbers.

Syntax

Following is the syntax of fixed values −

<!ATTLIST element-name attribute-name attribute-type #FIXED "value" >

where the value that follows #FIXED is the only value the attribute may take.

Example

Following is a simple example of attribute declaration with FIXED value −

<?xml version = "1.0"?>

<!DOCTYPE address [
  <!ELEMENT address (company)*>
  <!ELEMENT company (#PCDATA)>
  <!ATTLIST company name NMTOKEN #FIXED "tutorialspoint">
]>

<address>
  <company name = "tutorialspoint">we are a free online teaching faculty</company>
</address>

In this example we have used the keyword #FIXED, which indicates that the value "tutorialspoint" is the only value allowed for the attribute name of the element <company>. If we try to change the attribute value, a validating parser reports an error.

Following is a document that is invalid against this DTD, because the name attribute does not carry the fixed value −

<?xml version = "1.0"?>

<!DOCTYPE address [
  <!ELEMENT address (company)*>
  <!ELEMENT company (#PCDATA)>
  <!ATTLIST company name NMTOKEN #FIXED "tutorialspoint">
]>

<address>
  <company name = "abc">we are a free online teaching faculty</company>
</address>

REQUIRED values

Whenever you want to specify that an attribute is required, use the #REQUIRED keyword.

Syntax

Following is the syntax of #REQUIRED −

<!ATTLIST element-name attribute-name attribute-type #REQUIRED>

where #REQUIRED is the keyword that marks the attribute as mandatory.

Example

Following is a simple example of DTD attribute declaration with #REQUIRED keyword −

<?xml version = "1.0"?>

<!DOCTYPE address [
   <!ELEMENT address ( name )>
   <!ELEMENT name ( #PCDATA )>
   <!ATTLIST name id CDATA #REQUIRED>
]>

<address>
   <name id = "123">
      Tanmay Patil
   </name>
</address>

In this example we have used the #REQUIRED keyword to specify that the attribute id must be provided for the element name.

IMPLIED Values

When declaring attributes you must always specify a value declaration. If the attribute you are declaring has no default value, has no fixed value, and is not required, then you must declare the attribute as implied. The keyword #IMPLIED is used to specify an attribute as implied.

Syntax

Following is the syntax of #IMPLIED −

<!ATTLIST element-name attribute-name attribute-type #IMPLIED>

where #IMPLIED is the keyword that marks the attribute as optional with no default value.

Example

Following is a simple example of #IMPLIED −

<?xml version = "1.0"?>

<!DOCTYPE address [
   <!ELEMENT address ( name )>
   <!ELEMENT name ( #PCDATA )>
   <!ATTLIST name id CDATA #IMPLIED>
]>

<address>
   <name />
</address>

In this example we have used the keyword #IMPLIED because the id attribute does not have to be included in the element name. It is optional.

Entities are used to define shortcuts to special characters and to reusable text within XML documents. Entities can be primarily of four types −

  • Built-in entities

  • Character entities

  • General entities

  • Parameter entities

Entity Declaration Syntax

In general, entities can be declared internally or externally. Let us understand each of these and their syntax as follows −

Internal Entity

If the replacement text of an entity is given directly in its declaration within the DTD, it is called an internal entity.

Syntax

Following is the syntax for internal entity declaration −

<!ENTITY entity_name "entity_value">

In the above syntax −

  • entity_name is the name of the entity, followed by its value within double quotes or single quotes.

  • entity_value holds the value for the entity name.

  • The entity value of the internal entity is de-referenced by adding the prefix & and the suffix ; to the entity name, i.e. &entity_name;.

Example

Following is a simple example for internal entity declaration −

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>

<!DOCTYPE address [
   <!ELEMENT address (#PCDATA)>
   <!ENTITY name "Tanmay patil">
   <!ENTITY company "TutorialsPoint">
   <!ENTITY phone_no "(011) 123-4567">
]>

<address>
   &name;
   &company;
   &phone_no;
</address>

In the above example, the respective entity names name, company and phone_no are replaced by their values in the XML document. The entity values are de-referenced by adding prefix & to the entity name.

Save this file as sample.xml and open it in any browser. You will notice that the entity references for name, company, and phone_no are replaced by their respective values.

External Entity

If the content of an entity is stored outside its declaration, in a separate resource, it is called an external entity. You can refer to an external entity by using either system identifiers or public identifiers.

Syntax

Following is the syntax for External Entity declaration −

<!ENTITY name SYSTEM "URI/URL">

In the above syntax −

  • name is the name of entity.

  • SYSTEM is the keyword.

  • URI/URL is the address of the external source enclosed within the double or single quotes.

Types

You can refer to an external resource by using either of the following −

  • System Identifiers − A system identifier enables you to specify the location of an external file containing DTD declarations. It contains the keyword SYSTEM and a URI reference pointing to the document's location. The syntax is as follows −

<!DOCTYPE name SYSTEM "address.dtd" [...]>
  • Public Identifiers − Public identifiers provide a mechanism to locate DTD resources. A public identifier begins with the keyword PUBLIC, followed by a specialized identifier and then a system identifier that serves as a fallback location. Public identifiers are used to identify an entry in a catalog. They can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs. They are written as below −

<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">

Example

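The following example loads its declarations from an external file. The external DTD subset referenced in the DOCTYPE is itself an external entity, so standalone is set to no −

<?xml version = "1.0" encoding = "UTF-8" standalone = "no"?>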
<!DOCTYPE address SYSTEM "address.dtd">

<address>
   <name>
      Tanmay Patil
   </name>
   
   <company>
      TutorialsPoint
   </company>
   
   <phone>
      (011) 123-4567
   </phone>
</address>

Below is the content of the DTD file address.dtd −

<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
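
For comparison, here is a sketch of an external general entity declared with the <!ENTITY name SYSTEM "URI/URL"> syntax shown earlier. The entity name contact and the file contact.txt are hypothetical; the file is assumed to sit next to the document and to contain plain text only.

```xml
<?xml version = "1.0" encoding = "UTF-8" standalone = "no"?>

<!DOCTYPE address [
   <!ELEMENT address (#PCDATA)>
   <!-- the replacement text is read from the (hypothetical) file contact.txt -->
   <!ENTITY contact SYSTEM "contact.txt">
]>

<address>&contact;</address>
```

Many parsers do not load external entities unless they are explicitly configured to, largely for security reasons, so whether &contact; is expanded may depend on the parser settings.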

Built-in entities

All XML parsers must support the built-in entities. In general, you can use these entity references anywhere normal text is allowed in the XML document, such as in element content and attribute values.

There are five built-in entities that play their role in well-formed XML, they are −

  • ampersand: &amp;

  • Single quote: &apos;

  • Greater than: &gt;

  • Less than: &lt;

  • Double quote: &quot;

Example

The following example demonstrates the use of a built-in entity −

<?xml version = "1.0"?>

<note>
   <description>I'm a technical writer &amp; programmer</description>
</note>

As you can see here, the &amp; reference is replaced by the & character whenever the processor encounters it.
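
As a further sketch, the document below uses all five built-in entities, both in an attribute value and in element content. The title attribute and the wording of the text are hypothetical.

```xml
<?xml version = "1.0"?>

<!-- the attribute value is delimited by double quotes, so inner quotes use &quot; -->
<note title = "Tom &amp; Jerry&apos;s &quot;notes&quot;">
   <!-- a literal < or & is not allowed in element content, hence &lt; and &amp; -->
   <description>if a &lt; b &amp;&amp; b &gt; c</description>
</note>
```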

Character entities

Character entities are used to name symbolic representations of characters, i.e. characters that are difficult or impossible to type can be substituted by character entities.

Example

Following example demonstrates the character entity declaration −

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<!DOCTYPE author[
   <!ELEMENT author (#PCDATA)>
   <!ENTITY writer "Tanmay patil">
   <!ENTITY copyright "&#169;">
]>
<author>&writer;&copyright;</author>

You will notice that we have used &#169; as the value for the copyright character. Save this file as sample.xml and open it in your browser, and you will see that the copyright entity is replaced by the character ©.

General entities

General entities must be declared within the DTD before they can be used within an XML document. Instead of representing only a single character, general entities can represent characters, paragraphs, and even entire documents.

Syntax

To declare a general entity, use a declaration of this general form in your DTD −

<!ENTITY ename "text">

Example

Following example demonstrates the general entity declaration −

<?xml version = "1.0"?>

<!DOCTYPE note [
   <!ENTITY source-text "tutorialspoint">
]>

<note>
   &source-text;
</note>

Whenever an XML parser encounters a reference to source-text entity, it will supply the replacement text to the application at the point of the reference.

Parameter entities

The purpose of a parameter entity is to enable you to create reusable sections of replacement text.

Syntax

Following is the syntax for parameter entity declaration −

<!ENTITY % ename "entity_value">
  • entity_value is a string of characters that, apart from references, does not contain '&', '%' or '"'.

Example

Following example demonstrates the parameter entity declaration. Suppose you have element declarations as below −

<!ELEMENT residence (name, street, pincode, city, phone)>
<!ELEMENT apartment (name, street, pincode, city, phone)>
<!ELEMENT office (name, street, pincode, city, phone)>
<!ELEMENT shop (name, street, pincode, city, phone)>

Now suppose you want to add an additional element country; you would then need to add it to all four declarations. A parameter entity avoids this repetition. Using parameter entities, the shared part of the above example can be declared once −

<!ENTITY % area "name, street, pincode, city">
<!ENTITY % contact "phone">

Parameter entities are dereferenced in the same way as a general entity reference, only with a percent sign instead of an ampersand −

<!ELEMENT residence (%area;, %contact;)>
<!ELEMENT apartment (%area;, %contact;)>
<!ELEMENT office (%area;, %contact;)>
<!ELEMENT shop (%area;, %contact;)>

When the parser reads these declarations, it substitutes the entity's replacement text for the entity reference.
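
Putting the pieces together, the following is a sketch of a complete external DTD file with country added in a single place. The file name places.dtd is hypothetical. The sketch is written as an external DTD because XML does not allow parameter entity references inside a markup declaration in the internal subset.

```xml
<!-- places.dtd: hypothetical external DTD file -->
<!ENTITY % area "name, street, pincode, city, country">
<!ENTITY % contact "phone">

<!-- each reference expands to the replacement text declared above -->
<!ELEMENT residence (%area;, %contact;)>
<!ELEMENT apartment (%area;, %contact;)>
<!ELEMENT office (%area;, %contact;)>
<!ELEMENT shop (%area;, %contact;)>

<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT pincode (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
```

Adding or removing a child element now means editing only the area or contact entity, and all four content models follow.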

We use a DTD to describe an XML document precisely. DTDs check the validity of the structure and vocabulary of an XML document against the grammatical rules of the appropriate XML language. To check an XML document against its DTD, the following approaches can be used −

  • Using XML DTD validation tools − Tools such as the XML Spy IDE (not free) and the XMLStarlet command-line toolkit (open source) can be used to validate XML files against a DTD document.

  • Using XML DTD on-line validators − The W3C Markup Validation Service is designed to validate Web documents. You can use such an online validator to check the validity of your XML document against its DTD.

  • Write your own XML validators with XML DTD validation API − Newer versions of the JDK (above 1.4) support an XML DTD validation API. You can write your own validator code to check the validity of an XML document against a DTD.