Monday 22 November 2010

Announcement: Classification & Ontology: International UDC Seminar 2011

Classification & Ontology:
Formal Approaches and Access to Knowledge

The Hague, 19-20 September 2011

Following the success of the 2009 conference, we are very pleased to announce the next in the series of biennial conferences devoted to the advancement of bibliographic classification research, organized by the UDC Consortium and hosted by the Koninklijke Bibliotheek.

The difference between bibliographic knowledge classification schemes and ontologies resides in their particular purpose and levels of formality. However, they are both based on observation and reasoning and share some structural principles and elements: categories, concepts, properties, class relationships, roles.

The objective of this conference is to promote collaboration and exchange of expertise between different fields dealing with knowledge classifications: bibliographic, web and AI. We hope to learn more about methods in ontology modelling and whether these may be used to improve and formalise data models of bibliographic classifications and enhance their value in information discovery.

Papers are now invited covering the following topics:
  1. Modelling and representation of knowledge classifications
  2. Standards and solutions for innovative and high-quality classification data processing
  3. Applications and implementations of classification structures as ontologies
  4. Theoretical considerations of the role of knowledge classifications
The proposals should be of interest to academic and research communities dealing with conceptual modelling, information systems design, knowledge organization, knowledge engineering, semantic interoperability & information integration, and natural-language processing.

To read more about the invited topics and to submit a contribution, go to the conference website.

Paper proposal submission deadline: 30 January 2011.

Thursday 18 November 2010

On Lonclass, UDC and RDF

An interesting read about sharing and linking data on Dan Brickley's blog "danbri’s foaf stories: the web, the world, us, you and them" explaining some of the ideas behind the NoTube 'semantic television' project.

Snippets:

"Lonclass is one of the BBC’s in-house classification systems – the “London classification”. I’ve had the privilege of investigating Lonclass within the NoTube project. It’s not currently public, but much of what I say here is also applicable to the UDC classification system upon which it was based. UDC is also not fully public yet; I’ve made a case elsewhere that it should be, and I hope we’ll see that within my lifetime. UDC and Lonclass have a fascinating history and are rich cultural heritage artifacts in their own right, but I’m concerned here only with their role as the keys to many of our digital and real-world archives.

Why would we want to map Lonclass or UDC subject classification codes into RDF?

[....] The work needs to be shared, and RDF is currently our best bet on how to create such work sharing, meaning sharing, information-linking systems in the Web. The hierarchies in UDC and Lonclass don’t attempt to represent all of objective reality; they instead show paths through information.

[...] Classification systems with compositional semantics can be enriched when we map their basic terms using identifiers from other shared data sets. And those in the UDC/Lonclass tradition, while in some ways they’re showing their age (weird numeric codes, huge monolithic, hard-to-maintain databases), … are also amongst the most interesting systems we have today for navigating information, especially when combined with Linked Data techniques and companion datasets."

Friday 30 July 2010

2010 UDC Update meeting at IFLA

Colleagues who are going to be at the 76th IFLA General Conference and Assembly "Open access to knowledge - promoting sustainable progress" in Gothenburg (Sweden) are cordially invited to join us at our traditional UDC Update Session on Friday 13 August 2010, 13:00-14:00, Room R2.

Sunday 28 March 2010

Mapping intricacies: UDC to DDC

Last week, I received an email from Yulia Skora (Ukraine), who was interested in the availability of a mapping between the UDC Summary and the summary of the Russian universal classification LBC (BBK, Библиотечно-библиографическая классификация; in English, Library Bibliographic Classification). It reminded me of yet another challenging area of work. When responding to Yulia I realised that mapping, for instance, the UDC Summary to the Dewey Summaries [pdf] is made more difficult because we have to deal with classification summaries in both systems and in many situations cannot use a known exactMatch.

In 2008, following advice received from colleagues in the HILT project, two of our colleagues quickly mapped the 1000 classes of the Dewey Summaries to the UDC Master Reference File as a whole. This appeared to be relatively simple. The mapping in this case is simply an answer to the question "and how would you say, e.g., Art metal work in UDC?"

But when in 2009 we realised that we were going to release 2000 classes of UDC Summary as linked data, we decided to wait until we had our UDC Summary set defined and completed to be able to publish it mapped to the Dewey Summaries.

As we arrived at this stage, little did we realise how much more complex the reversed mapping of UDC Summary to Dewey Summaries would turn out to be.

Mapping the Dewey Summaries to UDC highlighted situations in which the logic and structure of the two systems do not agree, especially because Dewey tends to enumerate combinations of subjects and attributes that do not always logically belong together. Take, for instance, 850 Literatures of Italian, Sardinian, Dalmatian, Romanian, Rhaeto-Romanic languages; Italian literature. This class mixes languages from three different subgroups of Romance languages: Italian and Sardinian belong to the Italo-Romance sub-family; Romanian and Dalmatian are Balkan Romance languages; and Rhaeto-Romance is the third subgroup, which includes Friulian, Ladin and Romansh. As UDC literature is based on a strict classification of language families, Dewey class 850 has to be mapped either to three narrower UDC classes (821.131 Literature of Italo-Romance languages, 821.132 Literature of Rhaeto-Romance languages and 821.135 Literature of Balkan-Romance languages) or to the broader class 821.13 Literature of Romance languages. Hence we have to be sure that all these classes are listed in the UDC Summary to be able to express UDC-DDC many-to-one, specific-to-broader relationships.
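The Dewey 850 case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Mapping` tuple, the `MAPPINGS` list and the helper function are hypothetical and do not reflect how the UDC Summary database stores its mappings. The relation labels are borrowed from the SKOS mapping vocabulary and are read from the UDC class to the DDC class.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """One UDC-to-DDC mapping statement (hypothetical structure)."""
    udc: str       # UDC notation
    relation: str  # SKOS-style relation, read from the UDC class to the DDC class
    ddc: str       # Dewey notation


# "broadMatch": the DDC class is broader than the UDC class.
# "narrowMatch": the DDC class is narrower than the UDC class.
MAPPINGS = [
    Mapping("821.131", "broadMatch", "850"),  # Italo-Romance literatures
    Mapping("821.132", "broadMatch", "850"),  # Rhaeto-Romance literatures
    Mapping("821.135", "broadMatch", "850"),  # Balkan-Romance literatures
    Mapping("821.13", "narrowMatch", "850"),  # Romance literatures as a whole
]


def udc_classes_for(ddc: str, relation: str) -> List[str]:
    """List the UDC classes linked to one Dewey class by the given relation."""
    return [m.udc for m in MAPPINGS if m.ddc == ddc and m.relation == relation]


specific = udc_classes_for("850", "broadMatch")   # the three narrower UDC classes
general = udc_classes_for("850", "narrowMatch")   # the single broader UDC class
```

Keeping the relation on each statement, rather than on the Dewey class, lets the same Dewey number carry both the many-to-one and the broader-class alternative side by side.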

Another challenge appears when mapping, e.g., Dewey class 890 Literatures of other specific languages and language families, which does not make sense in UDC, in which all languages and literatures have equal status. Standard UDC schedules do not have a selection of preferred literatures and other literatures. In principle, UDC does not allow classes entitled 'others' which do not have defined semantic content. If entities are subdivided and there is no provision for an item outside the listed subclasses, then this item is subsumed under a top class or a broader class where all unspecified or general members of that class may be expected. If specification is needed, the broader class can be divided by adding an alphabetical extension. Here we have to find and list in the UDC Summary all literatures that are 'unpreferred', i.e. lumped in the 890 classes, and map them again as a many-to-one, specific-to-broader match.

The example below illustrates another interesting case. Classes Dewey 061 and UDC 06 cover roughly the same semantic field, but in the subdivision the Dewey Summaries list a combination of subject and place and, as an enumerative classification, provide ready-made numbers for the combinations of place that are most common in an average (American?) library. This is a frequent approach in schemes created with the physical book arrangement, i.e. library shelves, in mind. UDC, designed as an indexing language for information retrieval, keeps subject and place in separate tables and allows any concept of place, e.g. (7) North America, to be used in combination with any subject, as these may coincide in documents. Thus combinations such as Newspapers in North America or Organizations in North America would not be offered as ready-made combinations. There is no selection of 'preferred' or 'most needed' countries, languages or cultures in the standard UDC edition:



If we map the Dewey Summaries to UDC in general and do not have to worry about a reverse relationship, the situation is very simple, as shown above.

Mapping of UDC Summary to Dewey Summaries requires more thought.

Firstly, UDC class (7) North America (a common auxiliary of place), which simply represents the place, has to be mapped to all occurrences in which this place is 'built in' to the Dewey subjects:

063 Organization of North America
073 Journalism of North America
917 Geography of North America
970 History of North America
277 Christianity in North America
317 General Statistics in North America
557 Earth Sciences of North America

The mapping from a general UDC concept of place, (7) North America, to a specific subject is clearly a broader-to-narrower match. Mapping, for instance, UDC class 07 Newspapers. The press (includes journalism) to DDC class 073 Journalism of North America is again a broader-to-narrower match.
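As a rough sketch of this one-to-many case, the Dewey classes listed above can be turned into mapping statements for (7). The dictionary and the `narrow_matches` helper are hypothetical names used only for illustration; the `narrowMatch` label follows the SKOS mapping vocabulary, meaning the Dewey class is narrower than the UDC class.

```python
from typing import Dict, List, Tuple

# Dewey classes from the list above in which North America is 'built in'.
DDC_WITH_NORTH_AMERICA: Dict[str, str] = {
    "063": "Organization of North America",
    "073": "Journalism of North America",
    "917": "Geography of North America",
    "970": "History of North America",
    "277": "Christianity in North America",
    "317": "General Statistics in North America",
    "557": "Earth Sciences of North America",
}


def narrow_matches(udc: str, ddc_classes: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return one (UDC, relation, DDC) statement per Dewey class, sorted by notation."""
    return [(udc, "narrowMatch", ddc) for ddc in sorted(ddc_classes)]


triples = narrow_matches("(7)", DDC_WITH_NORTH_AMERICA)
for udc, relation, ddc in triples:
    print(udc, relation, ddc, DDC_WITH_NORTH_AMERICA[ddc])
```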

Precombined subjects, such as those shown above from Dewey, may be expressed in the UDC Summary as examples of combination within various records. To express an exact match, UDC class 07 has to contain the example of combination 07(7) Journals. The Press - North America. In some cases we have, therefore, added examples to the UDC Summary that represent an exact match to the Dewey Summaries. It is unfortunate that DDC has so many classes on the top level that deal with a selection of countries or languages that are given a preferred status in the scheme; repeating these preferences in the examples of combinations in UDC emulates an unwelcome cultural bias which we have to balance out somehow.

This brings us to another challenge... UDC 913(7) Regional Geography - North America [contains 2 concepts, each of which has its own URI] is an exact match to Dewey 917 [represented as one concept, 1 URI]. It seems that, because they represent an exact match to Dewey numbers, these UDC examples of combinations may also need separate URIs so that they can be published as SKOS data.
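To make the "two concepts in one number" point concrete, here is a minimal sketch that splits a simple combination such as 913(7) into its main number and its place auxiliary. It assumes only the pattern "main number followed by one place auxiliary in parentheses"; real UDC notation has many more auxiliaries and connecting symbols that this does not handle, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# A main number followed by exactly one place auxiliary in parentheses, e.g. "913(7)".
_COMBINATION = re.compile(r"^(?P<main>\d[\d.]*)\((?P<place>\d[\d.]*)\)$")


def split_combination(notation: str) -> Optional[Tuple[str, str]]:
    """Return (main number, place auxiliary), or None if the pattern does not apply."""
    match = _COMBINATION.match(notation)
    if match is None:
        return None
    return match.group("main"), "(" + match.group("place") + ")"


parts = split_combination("913(7)")
```

Each of the two parts already has its own record and URI; the open question raised above is whether the combination as a whole should get a third one.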

Albeit challenging, mapping proves to be a very useful exercise and I am looking forward to future work here especially in relation to our plans to map UDC Summary to Colon Classification. We are discussing this project with colleagues from DRTC in Bangalore (India).

UDC Summary - translation in progress for 21 languages

The UDC Summary translation team had a busy week. We have uploaded the top classes for Estonian and Armenian languages, just a day after we uploaded the top classes for Hindi and over 800 classes of Norwegian that we managed to extract from TEKORD data (courtesy of Rurik Greenal).

We now have 21 languages online and over 30 volunteers working on translations.

Our online translation tool is being enhanced as we speak. A browsing list with a colour scheme indicating record completion and enabling easy selection of records for translation was also added last week.

The online editor now allows the editing of a subject index and mapping. Access to this is now available for contributors working in this area.

The translation progress statistics can now be viewed for all 21 languages.

The progress statistics page harvests up-to-the-minute completion statistics for each language from the UDCS database and displays them in graph format using jQuery and jqPlot. The percentage completion figures for each language (compared to English) are shown in tables such as the ones displayed on the right.
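The percentage figure itself is simple arithmetic: the number of translated records in a language divided by the number of English records. A minimal sketch follows; the function name and the record counts are invented for illustration and are not taken from the UDCS database, and the live page does this work with jQuery and jqPlot rather than Python.

```python
from typing import Dict


def completion_percentages(counts: Dict[str, int], reference: str = "en") -> Dict[str, float]:
    """Express each language's record count as a percentage of the reference language."""
    total = counts[reference]
    if total <= 0:
        raise ValueError("reference language has no records")
    return {lang: round(100.0 * n / total, 1) for lang, n in counts.items()}


# Hypothetical record counts per language code, for illustration only.
sample_counts = {"en": 2000, "no": 800, "hi": 10}
percentages = completion_percentages(sample_counts)
```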

Saturday 20 February 2010

UDC Summary: 17 languages online

This weekend we uploaded over 2000 UDC classes in the Ukrainian language into the UDC Summary. This is the 17th language so far.

Thanks to help from the publishers and editors of national editions and editors of the UDC Summary, we managed to import almost the complete set of UDC numbers that we needed for many languages.

Our online translator seems to do its job and is being expanded with further features as we speak. Most of the credit, however, goes to our hard-working volunteers, without whom the whole project would not be possible. We expect that many of the languages that are already online will be completed and proofread by June 2010.

The first alphabetical index and mapping to Dewey summary will appear in March. And we also hope to have the first useful exports available for download. We will be looking for other mappings that may be available and we welcome ideas and suggestions.

Friday 22 January 2010

December issue of the Classification & Indexing Section Newsletter


The latest issue of the Classification & Indexing Section Newsletter (IFLA) contains a short report from the UDC Seminar 2009 "Classification at a Crossroad: multiple directions to usability".

The issue also contains a short text about the multilingual UDC Summary project.