Introduction
The purpose of publishing and maintaining E&P Cataloguing
Standards is to facilitate and improve the long-term storage, discovery, and
retrieval of information for use, irrespective of form or subject. The goal is to
ensure that all relevant information is found with as few irrelevant
matches as possible, that is, high recall and high precision.
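To make the two measures concrete, the following is a minimal sketch of how recall and precision could be computed for a single catalogue search. The function name and the item identifiers are assumptions made for illustration; they are not part of the catalogue standards.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision: share of the retrieved items that are relevant.
    recall:    share of the relevant items that were retrieved.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical item identifiers, for illustration only.
relevant = {"doc-1", "doc-2", "doc-3", "doc-4"}
retrieved = {"doc-1", "doc-2", "doc-3", "doc-9"}
precision, recall = precision_recall(retrieved, relevant)
```

In this made-up search, one relevant item is missed (lowering recall) and one irrelevant item is returned (lowering precision).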
The Catalogue Standards (dubbed EPICAT) are maintained as an integral part of the POSC energy
eStandards. We invite the submission of suggested refinements or extensions for
consideration and possible inclusion in a future version of the catalogue
standards. Updated versions will be published
every six months, in January and July. Proposals for enhancements and changes
received by the middle of May and November will be considered for the next
update cycle.
The current version of the catalogue standards is based on material from work
done by Shell Expro, submitted to POSC beginning in March 2002, and on
refinements submitted by Flare Consultants through the end of 2002. The current
specifications are considered a draft baseline intended to evolve into stable
and complete industry standards. The catalogue standards cover a wide range of E&P
documents and digital data sources. To the extent possible, the catalogue
standards do not depend on or promulgate any specific organizational structure or technical tools.
POSC's commitment to evolving and managing cataloguing standards developed
during 2002 through three public workshops and an agreement by the POSC Data
Store Solutions Special Interest Group (DSS SIG) to serve as the user community
and source of requirements in this area.
High-Level Cataloguing Concepts
The motivation for implementing high-level E&P catalogues
was characterized by Shell Expro in their internal Discovery project as
"making a step change in the way Shell Expro manages knowledge, information
and data." A wide variety of problems were identified and addressed by
Shell Expro, including excessive rework time, lack of clear responsibilities, poor
quality and accessibility, loss of knowledge due to lack of context, and lack of
adequate conventions and standards.
One fundamental principle that came from this work is known as
the Principle of Data Entropy:
The quality and accessibility of
knowledge, information, and data will always deteriorate with time in active
environments unless effort (= energy) is spent.
Another fundamental principle is known as the K-I-D Principle:
Knowledge, information, and data
(K-I-D) are a continuum and should be managed by a coordinated set of management
processes. Whether something is knowledge, information, or data depends largely
on the viewpoint of the reader.

Active E&P projects tend to be short-term, dynamic,
innovative, and creative, making information management virtually too hot to
handle. Project results, however, require controlled information management when
they are ready to move to long-term, static preservation for future learning and
re-use. In the critical phase of publishing project results, effort must be
expended to add order, context, and structure to the results.
Another aspect of the completeness of the cataloguing approach
is the Principle of Document and Digital Database Integration or the Extended
Document Concept:
An information item (or more
correctly, a K-I-D item) is a carrier of knowledge, information, and/or data
independent of representation of its content. This includes both documents
(digital and physical) and the results of digital database queries.
Finally, the contrast between content and context led to the
Principle of Context Integration:
Besides its content
(what?), an information item (a K-I-D item) needs accurate and integrated
context describing the resources involved with its creation. These include
people (who?), technology (how?), and process (why? and when?).
A successful high-level catalogue requires a common language,
expressed in best practices and standards, to improve information item quality
(usability, reliability, etc.) and accessibility for individual companies, for
operators and partners, and for exchanging documents regionally and globally.
POSC is committed to evolving and improving cataloguing standards. To accomplish
this, POSC is committed to cooperating with oil companies and solution providers,
as well as government, national, regional, and industry organizations.
What Constitutes a Catalogue Entry?
Peter Drucker is quoted as saying that, "Ninety percent of
communication issues happen at the functional boundaries." We have been
very focused on managing technical detail and technology in functional or
limited databases and file systems. The focus of the cataloguing initiative is
to create catalogue entries at a higher level for information items that
exhibit key organizational roles, business processes, and work products.
Thus, a catalogue entry for an information item (i.e., document
or digital database query result) consists of populating a set of catalogue
attributes with explicit data (e.g., title, author, date) and classifying data
based on reference vocabularies.
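As a minimal sketch of this idea, a catalogue entry could be modelled as a record with explicit attributes plus classifying attributes that are only accepted when the term appears in a reference vocabulary. The class name, the attribute keys, and the vocabulary terms below are assumptions made for illustration; the published specifications define the actual attributes and vocabularies.

```python
import datetime
from dataclasses import dataclass, field

# Hypothetical reference vocabularies; the published ones are larger.
REFERENCE_VOCABULARIES = {
    "language": {"en", "nl", "no"},
    "asset_type": {"field", "well", "facility"},
}


@dataclass
class CatalogueEntry:
    # Explicit data, entered as free values.
    title: str
    author: str
    date: datetime.date
    # Classifying data, restricted to reference vocabulary terms.
    classifications: dict = field(default_factory=dict)

    def classify(self, attribute, term):
        """Record a classification only if the term is in the vocabulary."""
        vocabulary = REFERENCE_VOCABULARIES.get(attribute)
        if vocabulary is None:
            raise KeyError(f"no reference vocabulary for {attribute!r}")
        if term not in vocabulary:
            raise ValueError(f"{term!r} is not in the {attribute!r} vocabulary")
        self.classifications[attribute] = term


entry = CatalogueEntry(
    title="Well completion report",
    author="A. Author",
    date=datetime.date(2002, 12, 1),
)
entry.classify("language", "en")
entry.classify("asset_type", "well")
```

Rejecting terms outside the vocabulary is one possible design choice; it keeps classifications comparable across entries, which is what makes them useful for discovery.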
Ideally, all of us will use the same cataloguing standards. More
likely, each of us will use a core of common cataloguing attributes, usage
guidelines, and reference vocabularies. Cataloguing solutions can extend
this common core with local extensions and mappings.
Catalogue Attributes
As of the end of 2002, the cataloguing standards published here
by POSC represent a point in a process of evolution and refinement. Most of the
catalogue attributes are well founded, responding to general standards for
document catalogues, such as the Dublin Core. Experience using the catalogue
attributes is growing from the initial base at Shell Expro to several other oil
companies. POSC encourages all oil companies to put the catalogue attributes to
the test and to share both successes and failures with POSC to help improve the
standards in future releases (planned for June and December at least through
2004).
The catalogue attributes are specified in seven groups, known as
sub-catalogue templates. The groupings are: bibliographic, contextual, usage,
system, control, coverage, and relationships. See the detailed definitions in
the specifications accessible through the link below.
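The grouping can be pictured with a short sketch that sorts the attributes of an entry into the seven sub-catalogue templates. The template names come from the text above, and Information Item Class and Asset Type are described on this page as Contextual attributes. The assignment of title, author, and date to the bibliographic template is an assumption made for illustration, as are the function and variable names.

```python
from enum import Enum


class Template(Enum):
    BIBLIOGRAPHIC = "bibliographic"
    CONTEXTUAL = "contextual"
    USAGE = "usage"
    SYSTEM = "system"
    CONTROL = "control"
    COVERAGE = "coverage"
    RELATIONSHIPS = "relationships"


# Assumed assignment of a few attributes to templates (illustrative only).
ATTRIBUTE_TEMPLATE = {
    "title": Template.BIBLIOGRAPHIC,
    "author": Template.BIBLIOGRAPHIC,
    "date": Template.BIBLIOGRAPHIC,
    "information_item_class": Template.CONTEXTUAL,
    "asset_type": Template.CONTEXTUAL,
}


def group_by_template(entry):
    """Split a flat attribute dictionary into one dictionary per template."""
    grouped = {template: {} for template in Template}
    for attribute, value in entry.items():
        grouped[ATTRIBUTE_TEMPLATE[attribute]][attribute] = value
    return grouped


grouped = group_by_template({"title": "Well completion report", "asset_type": "well"})
```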
Reference Vocabularies
The reference vocabularies are less complete and less
stable than the attributes. The published vocabularies indicate
the form and level of granularity to be used, but in most cases are incomplete.
Efforts will continue to build up the vocabularies. In this effort, the re-use
of POSC and other existing standards is very important. One example is the
proposed re-use of an IETF- and ISO-based natural language standard to serve as
the vocabulary for the language catalogue attribute.
See the detailed definitions of the currently defined reference
vocabularies through the link below. The business process reference
model is also accessible independently from the Specifications page on the Web
site. The Information Item Type vocabulary is currently being reviewed
and will be published soon.
Access to the Catalogue Standards
The catalogue standards are presented as a single Web page that
traverses the hierarchy according to the traversal code presented in PDF and XML
formats. They are also presented in an interactive query and display
application.
Brief Contextual Attribute Tutorial
The catalogue attributes of the Contextual Template need
some preliminary explanation. These attributes include seven single-purpose
attributes and one composite attribute. We can think of the composite attribute,
the Information Item Type (K-I-D Type, Document Type), as a generic reference to
all Information Items (Documents, Query Responses) that are fundamentally
similar in terms of context. In that sense, the Information Item Type can be
thought of as a reference title.
In the process of creating a catalogue entry for an information
item, identifying the most appropriate Information Item Type is a desirable
short-cut, because a specific Information Item Type defines most if not all of
the other Contextual attributes.
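The short-cut can be sketched as a lookup: choosing an Information Item Type supplies values for the other Contextual attributes, and the cataloguer overrides only what differs for the item at hand. The type name, the attribute keys, and all values below are hypothetical; the Information Item Type vocabulary itself is still under review.

```python
# Hypothetical Information Item Type with the Contextual values it implies.
INFORMATION_ITEM_TYPES = {
    "Well Completion Report": {
        "information_item_class": "report",
        "information_type": "information",
        "asset_type": "well",
        "producer": "well engineer",
        "consumer": "petroleum engineer",
    },
}


def contextual_attributes(item_type, **overrides):
    """Start from the values implied by the type, then apply overrides."""
    attributes = dict(INFORMATION_ITEM_TYPES[item_type])  # copy, do not mutate
    attributes.update(overrides)
    return attributes


attributes = contextual_attributes("Well Completion Report", consumer="asset manager")
```

Here one choice fills five attributes, and only the consumer is adjusted for the specific item.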
Briefly, the single-purpose Contextual attributes are:
- Information Item Class - a fundamental classification
  characterizing the purpose and basic content format
- Information Type - a taxonomy of knowledge, information, and
  data, e.g.,
- Asset Type - a taxonomy of company assets
- Producer
- Consumer
Catalogue Implementation
There are a number of options and approaches to the
implementation of a high-level catalogue. Catalogue functionality may be
obtained from an Electronic Document Management System (EDMS) or from another
kind of database solution. A Catalogue may draw entries from an EDMS or may use
an EDMS as a dynamic source for its stored documents. A Catalogue may link with
one or more digital databases / data stores either for entries for stored
digital documents or entries for potential query responses.
User interfaces to a Catalogue may include query and access
capabilities. As for other kinds of query and access environments, both
text-based and map-based interfaces may be appropriate.
Access to underlying documents and query results may be enhanced
with suitable display, viewing, and complex application linkage capabilities.
The broader the coverage of a Catalogue, the more access entitlement
capabilities may be required.
Please consider sending us information about your implementation
approaches and results, so we can share them as guidelines with the
industry.