See the Front Matter page for more detailed information.
Prose includes novels, short stories, essays, etc.
The body will generally take the following structure (with a few exceptions):
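A minimal sketch of such a structure, assuming standard TEI P5 nesting of front matter, body, and back matter. The heading, paragraph text, and comments are illustrative placeholders, not taken from a VWWP text:

```xml
<text>
  <front>
    <!-- title page, preface, etc. (see the Front Matter page) -->
  </front>
  <body>
    <div type="chapter">
      <head>Chapter I</head>
      <p>First paragraph of the chapter.</p>
      <p>Second paragraph of the chapter.</p>
    </div>
    <!-- further divisions follow the same pattern -->
  </body>
  <back>
    <!-- appendices, index, etc. (see the back matter section) -->
  </back>
</text>
```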
There are several main tags that we use to mark up the structural elements of prose.
Divisions usually correspond to the chapters, sections, etc. of a book. Nest as many divisions as necessary to properly represent the structure of the text. Be sure to maintain consistency among the division types within the body. For example, in a book of poetry you might have <div type="book"> and then <div type="chapter"> (second-level division). Unless absolutely necessary, do not deviate at some point in the book and start using <div type="section"> as a second-level division. In most cases, this will be a third-level division:
All division tags will have a type attribute. The value of the type attribute will be one of the following:
If none of these values correctly describes the section of text you are encoding, document the nature of the division on the VWWP Encoding Problems page.
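The nesting described above can be sketched as follows. The type values book, chapter, and section come from the text above; the headings and paragraph content are invented placeholders:

```xml
<body>
  <div type="book">                <!-- first-level division -->
    <head>Book the First</head>
    <div type="chapter">           <!-- second-level division -->
      <head>Chapter I</head>
      <div type="section">         <!-- third-level division -->
        <p>Text of the first section.</p>
      </div>
      <div type="section">
        <p>Text of the second section.</p>
      </div>
    </div>
    <div type="chapter">           <!-- same type at the same level throughout -->
      <head>Chapter II</head>
      <div type="section">
        <p>Text of the section.</p>
      </div>
    </div>
  </div>
</body>
```

Every second-level division here is a chapter and every third-level division is a section, which is the consistency the guideline asks for.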
Chapters are designated using divs, marked with an ID. The ID is formulated by ADD. The <div> tag encloses a chapter. Chapter titles (headings) are indicated using a <head> tag. Page breaks come within the chapter <div>. Chapters are the sections of a text directly below books, generally speaking.
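A hedged sketch of a chapter division with a heading and a page break inside it. The xml:id value shown is a hypothetical placeholder and does not reflect the actual VWWP ID scheme; the n attribute on <pb/> is standard TEI P5, but whether VWWP records page numbers this way is an assumption:

```xml
<!-- "ch01" is a placeholder ID, not the VWWP ID format -->
<div type="chapter" xml:id="ch01">
  <head>Chapter I.</head>
  <p>The opening paragraph of the chapter.</p>
  <!-- the page break sits inside the chapter div -->
  <pb n="2"/>
  <p>The first paragraph on the following page.</p>
</div>
```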
See VWWP TEI P5 Encoding Guidelines for more information about headings.
Paragraphs are marked with a <p> tag and can be used virtually anywhere in the text to mark a block of prose. Paragraphs are extremely versatile and are used in a wide variety of text encoding situations; generally speaking, if something is written as a paragraph, it can be marked as such. <div> tags cannot come within paragraphs, but <pb/> (page break) tags, <list> tags, <table> tags, <figure> tags, <note> tags, and many others can. So, for instance, if a paragraph is interrupted by a blank page and an image, you do not need to close the paragraph to include these features. This allows you to maintain bibliographic accuracy.
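A minimal sketch of a paragraph that continues across a blank page and an illustration. The page numbers, file name, and text are invented for illustration, and the use of <graphic> and <figDesc> inside <figure> assumes standard TEI P5 rather than documented VWWP practice:

```xml
<p>The first part of the paragraph runs to the foot of the page
  <pb n="14"/>
  <!-- page 14 is blank -->
  <pb n="15"/>
  <figure>
    <!-- hypothetical file name -->
    <graphic url="illustration.jpg"/>
    <figDesc>Full-page illustration.</figDesc>
  </figure>
  <pb n="16"/>
  and the sentence resumes here, still inside the same paragraph.</p>
```

The <p> is opened once and closed once, so the encoding records where the pages and the image fall without splitting the paragraph.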
See VWWP TEI P5 Encoding Guidelines for more information about photographs, graphics and other images.
See VWWP TEI P5 Encoding Guidelines for more information about lists.
See VWWP TEI P5 Encoding Guidelines for more information about tables.
Quotes are denoted by quotation marks. Only text that comes within quotation marks will be marked as a quotation for the purposes of encoding. There are two types of quotes: quotes that are external to the text and quotes that are internal. The <quote> element is used for passages that are external to the text, like a reference to a study or another book. Internal quotes are quotes that are from inside the text (e.g., character speeches or thoughts, notes written by characters, or terms used in the book) and have various TEI elements to represent them.
Quotes that are External to the Text: Outside Sources and Other References
Quotes that come from outside the text are marked by first using a <cit> tag, which denotes the external citation as a unit. Within the <cit> tag there are two smaller parts, <quote> and <bibl>. The <quote> tag encompasses the body of the quote, that is, the actual quoted text. The <bibl> tag encompasses any bibliographic reference given that identifies the source of the text, such as a title or author. For a more comprehensive discussion of the <bibl> tag, please see the <bibl> section of the guidelines. Other tags can also appear inside the <quote> tag; for instance, you can use an <l> tag to denote a line of poetry.
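As a sketch of the pattern just described, here is a quoted couplet with its source. The use of <author> and <title> inside <bibl> is standard TEI P5 but is an assumption about VWWP practice, as is the decision to keep the quotation marks in the transcription:

```xml
<cit>
  <quote>
    <l>"'Tis better to have loved and lost</l>
    <l>Than never to have loved at all."</l>
  </quote>
  <bibl><author>Tennyson</author>, <title>In Memoriam</title></bibl>
</cit>
```

The <cit> wraps the whole citation, <quote> holds only the quoted words, and <bibl> holds only the reference to the source.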
Sometimes, citations will occur within the text. In that case, you still use the <cit> tag and mark the quote as you normally would. You must remember, however, that all of the words within the <cit> must be within either a <bibl> or a <quote> tag. You do not need both <quote> and <bibl>, but you do need at least one.
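A hedged sketch of a citation embedded in running prose. The surrounding sentences are invented; the point is that every word inside <cit> falls within either <quote> or <bibl>, and that a <cit> may contain only one of the two:

```xml
<!-- quote and bibl together: the words "as ... has it" belong to the bibl -->
<p>Her aunt was fond of reminding her that
  <cit><quote>"all the world's a stage,"</quote>
  <bibl>as <author>Shakespeare</author> has it</bibl></cit>,
  and behaved accordingly.</p>

<!-- quote only: no source is named, so no bibl is needed -->
<p>She answered only that
  <cit><quote>"all the world's a stage,"</quote></cit>
  and left the room.</p>
```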
Quotes that are Internal to the Text: Thought, Speech, Writing
Quotations in the text that indicate speech, thought, writing, etc. by one or more characters are marked by various TEI elements. For instance, dialogue or notes written from one character to another would be indicated using the <q> element. The <q> tag will generally come inside a set of <p> tags, since most dialogue is set apart in the text as a separate paragraph. Quotes can come within quotes, such as when one speaker quotes someone else. If there is an external quote inside an internal quote (for instance, a character quotes the Bible), use the correct tags to distinguish between the two types of quotes. Sometimes, quotation marks
The emph, foreign, distinct, mentioned, term and soCalled elements indicate that a quote is linguistically set apart. For instance, emph is used to denote special emphasis placed on a word via quotation marks. The foreign tag indicates that quotation marks were used because the word is in a foreign language. The distinct tag signifies that the quote is in quotation marks to set it apart from the rest of the text due to some linguistic peculiarity, such as slang or regional dialect. Mentioned is used to indicate that the writer is talking about the word itself rather than using the word, for instance, when discussing the part of speech of the word "canary." Term indicates that the word was put in quotation marks because it is a discipline- or subject-specific term. For example, if the author uses quotation marks to demarcate medical terminology, then term would be used. Finally, soCalled is used to indicate scare quotes. If the author distances him or herself from the word via quotation marks, then you mark the term as "soCalled." Below is a reference list of the different TEI elements used to mark up internal quotes:
Quick Reference, Quote Type
*spoken: A quote is spoken out loud by a character in the text. Use said.
*thought: A character thinks a quote, rather than saying it out loud. Use said.
*written: A character internal to the text has written something that is quoted within the text. Use quotation.
*emph: A word or phrase is in quotation marks in order to emphasize it. Use emph.
*distinct: A word or phrase is in quotes because it is linguistically distinct, for instance, slang or regional dialect. Use distinct.
*mentioned: A word or phrase is in quotes because the author refers to the word itself, such as in a discussion of its part of speech, rather than using the word. Use mentioned.
*term: A word or phrase is in quotes because it is terminology, for instance, computer terms or scientific language. Use term.
*foreign: A word or phrase is in quotation marks because it does not belong to the predominant language used in the text. Use foreign.
*soCalled: An author uses scare quotes to distance him or herself from a word. Use soCalled.
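The reference list above can be illustrated with a few short, invented sentences. This is a sketch under assumptions: the aloud attribute on <said> and the xml:lang attribute on <foreign> are standard TEI P5, but whether VWWP uses them, and whether the quotation marks themselves are transcribed, is not stated here.

```xml
<!-- spoken -->
<p><said>"I shall go to town to-morrow,"</said> said Mrs. Grey.</p>

<!-- thought (aloud="false" is an assumption about project practice) -->
<p><said aloud="false">"He will never forgive me,"</said> she thought.</p>

<!-- external quote nested inside an internal one -->
<p><said>"Remember, my dear:
  <cit><quote>'Pride goeth before destruction,'</quote></cit>"</said>
  replied her aunt.</p>

<!-- linguistically set-apart words -->
<p>The <soCalled>"improvements"</soCalled> had ruined the garden.</p>
<p>He had a certain <foreign xml:lang="fr">"je ne sais quoi"</foreign> about him.</p>
<p>The word <mentioned>"canary"</mentioned> is a noun.</p>
<p>The doctor called it <term>"neuralgia."</term></p>
```

In the nested case, the character's speech is the internal quote and the biblical passage is the external one, so the two are marked with different elements.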
See VWWP TEI P5 Encoding Guidelines for more information about closers.
For more information on how to encode page breaks see the page break section of the general guidelines.
For how to encode the back matter of the text, see the back matter section.
If a part of the prose text that you are trying to encode does not fit any of the features described above, document the problem on the VWWP Encoding Problems page.