Wednesday, November 10, 2004

Semantic Web - Answers from My Contemporaries

1) Is all the excitement real? - Basically, I don't see any "excitement" about the semantic web outside the W3C and a few academic and government communities. Many of the academics seem to be cynical, actually - subtly repackaging their research as a "Semantic Web" effort to get funding. There is a lot of *concern* about the problems that the semantic web addresses (especially in the US government) and *interest* in this approach to solving them, but not a self-amplifying boom like we saw with the Web a decade ago.
2) The Semantic Web has few closed issues.
3) XML's value proposition for the semantic web is pretty much what it is everywhere else -- its sheer popularity outweighs its numerous flaws.
4) I suspect that something even smaller than OWL-Lite will cover most of the functionality that real people put in real ontologies for the foreseeable future.
5) The problem IMHO is that ontology building is an extremely difficult activity; think of how long it took the medical community to come up with their formal vocabulary *and* the biological theories on which it is based.
6) Adoption Curve -

First, ontologies are unlikely to get much traction, IMHO, as long as they are called "ontologies", and one is forced to be conversant with formal semantics, formal logics, etc. in order to use them in an enterprise IT project. Somehow this approach has to be repackaged in a way that rests cleanly on the foundation of semantic web theory/technology, but exposes only those concepts (and terminology) that are accessible to ordinary mortals.

Second, somehow the process of building usefully large but consistent enterprise ontologies must be made more feasible. I'm not sure how possible this is in principle (Gödel had something to say along these lines?) but presumably most enterprises have enough structured information in their glossaries, data dictionaries, business processes, and existing IT operations that can be captured and usefully reasoned about .... given time. Technology can only automate the tedium of what humans do; it cannot take garbage in and spit consistent ontologies out. Will technologies that effectively support what humans need to do to make this happen come onto the market? I see some hopeful signs, but I don't think Protege even comes close to being useful to the kinds of people who will have to do this in the mainstream world.

Third, if ontology-building is a top-down approach to supporting semantics, there's a question of whether the bottom-up approach of making sense of things by induction will actually work better. (The eternal induction vs deduction debate ...) See, for example, http://www.eetimes.com/article/showArticle.jhtml?articleID=51201131 : 'Sony Computer Science Laboratory is positioning its "emergent semantics" as a self-organizing alternative to the W3C's Semantic Web'. The bottom-up approaches (e.g. the Google approach to web search, the SpamBayes approach to spam filtering) seem to be awfully good at hitting 80:20 points in the real world while the top-down approaches are still research projects. I am personally convinced that the bottom-up approach will continue to rule in massively unstructured domains such as the web and email; I'm not so sure that the top-down ontology building might not be more efficient in situations where there is a lot of semantic structure (e.g. enterprise IT shops) that just has to be captured and exploited. Nevertheless, both seem quite viable at the moment, and different companies are betting on one, the other, or both. It may be that the rate of adoption of ontologies will be stifled by early successes from the bottom-up approach in real enterprise applications.
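
To make the induction side concrete, here is a toy of the bottom-up style in Python (an illustration only -- real SpamBayes combines token probabilities far more carefully): the "semantics" are induced from labeled examples rather than written down in advance.

    # Toy bottom-up classifier: learn per-token spam odds from examples.
    # A deliberately crude sketch, not how SpamBayes actually scores.
    from collections import Counter

    spam = ["buy cheap pills now", "cheap pills cheap"]
    ham = ["meeting notes attached", "notes on the semantic web"]

    spam_counts = Counter(w for msg in spam for w in msg.split())
    ham_counts = Counter(w for msg in ham for w in msg.split())

    def spamminess(message):
        # Average per-token spam probability, with add-one smoothing
        # so unseen tokens count as roughly neutral.
        words = message.split()
        score = 0.0
        for w in words:
            s, h = spam_counts[w] + 1, ham_counts[w] + 1
            score += s / (s + h)
        return score / len(words)

    print(spamminess("cheap pills now"))   # high: induced, never defined
    print(spamminess("meeting notes"))     # low

No ontology, no agreement on meaning, and it still hits an 80:20 point -- which is exactly the argument the bottom-up camp makes.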

>> I agree with Michael's comments, in particular the bit about the bottom-up approach being applicable. The other thing I'll note is that the Government has a vested interest in driving much of this. One can think of an ontology infrastructure as being the Web equivalent of the federal interstate system. It's a problem that needs to be solved on the national scale and there is economic benefit as well as all the indirect social (security, medical, you name it) benefit. However, the idea of the government being in charge of how universal knowledge discovery is done scares the heck out of me; some form of public oversight with real clout is needed.

>> Even if you have the technology for creating shared ontologies, the inherently local nature of meaning indicates that bottom-up approaches are likely to dominate. XML is successful precisely because it only constrains what is usefully sharable (mainly, syntax), and then utility drops off in proportion to the size of the community of interest.

As to the syntax, XMLers like/tolerate XML and benefit from the sharable tools. Otherwise, there is a near universal loathing of its verbosity, particularly in some AI and ontology circles. John Sowa lets fly about once a week on that topic.

>> One problem I see, considering how long people have been talking about the Semantic Web, is that there's still surprisingly little data to form into a web. (I'm just talking in terms of publicly available non-transient RDF.) I wonder how far XML would have gotten if we'd all spent the first few years writing DTDs and only occasionally created little document instances to demonstrate how our DTDs might be used. That didn't happen because people coming out of the SGML world already had plenty of real-world applications in which to use DTDs and documents, and the dot com boom gave people lots more ideas, but the amount of practical, usable RDF data still seems remarkably small. I've been compiling a list at rdfdata.org, and it's getting harder and harder to find new entries.

One could argue that we don't need RDF to build a viable semantic web, but RDF does address problems that need to be addressed, so if you pull it out of the equation something else needs to be plugged back in.

>> Actually, in the MISMO (Mortgage Industry Standards Maintenance Organization, which is the agreed upon standards body in the United States for Mortgage Technology) working groups this has been coming up quite a bit. In its standards process MISMO maintains a data dictionary of terms that work across the industry, as well as a variety of structures (grouped in process areas and transactions) where these terms are used.

It seems like a perfect candidate for a top-down approach of semantic description, possibly via OWL. To be honest, on a macro level the problem seems tenable -- much like the examples floating around the web of the wineries and wines, it seems like it would be pretty simple to develop a strategy for describing the data points, and ultimately the way in which they can/should be used (even on a process/transaction basis). Maybe that is because I mentally skipped some things that were important to understand...

But as Michael said, there is a lot of resistance to terminology -- ontology, description logics, KR, etc. -- and we don't have enough experts from that domain (i.e., I am not an expert in that domain). There is also an ingrained need for ROI. Unfortunately, predicting ROI in this space is difficult because of a lack of visible successes. It would help if the media stopped focusing on what-if and started focusing on what-happened.

But ultimately it strikes me that the solution is somewhere in between the top-down and bottom-up approaches. It would be really great if industry organizations such as MISMO created ontologies for their space and people could interact with them using their own local definitions and map them together using equivalence classes. Especially in the mortgage industry, if interfacing with a business partner was simply a matter of identifying like terms, and structure was invisible, then I think we will have made incredible progress. If you can eliminate the need for a programmer who understands the esoteric terms of the industry and enable the business experts to identify terms, you will greatly reduce the time and money spent interfacing.

Perhaps this is a limited or wrong view of the Semantic Web. But it is a small step.
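
A minimal sketch of that equivalence-class idea in Python with rdflib (every URI and term here is invented for illustration; the real MISMO dictionary would supply its own):

    # Hypothetical sketch: map a lender's local vocabulary onto an
    # industry ontology with owl:equivalentClass/owl:equivalentProperty.
    # All namespaces and term names below are invented.
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL

    MISMO = Namespace("http://example.org/mismo#")        # placeholder URI
    LOCAL = Namespace("http://example.org/acme-lender#")  # our own terms

    g = Graph()
    # Our in-house "Loan" is the same concept as the industry "MortgageLoan"
    g.add((LOCAL.Loan, OWL.equivalentClass, MISMO.MortgageLoan))
    # Our "rate" field means the same thing as the industry "interestRate"
    g.add((LOCAL.rate, OWL.equivalentProperty, MISMO.interestRate))

    print(g.serialize(format="turtle"))

An OWL-aware tool reading this graph can treat the paired terms interchangeably, which is the "identify like terms and let structure be invisible" scenario described above.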

>> BTW to the original poster: a better place to ask this question would be one where the ontology experts hang out. One such list is the Conceptual Graphs list: cg@cs.uah.edu

Code lists are a productive place to start. This seems easy, but it isn't, although it is the easiest of the problems once one gets past the syntax and terms of the semantic web app itself. Industry lists have been around for years. Getting those into formats that are readily processable is a step in the right direction. Then 'to do what?'

Local doesn't always mean 'in our shop'. An industry is a locale of sorts. The mushiness is domain overlap. For instance, we sell systems with jail commissaries. Some of the terminology is local to the 'jail business' but the items sold are items obtainable in most commercial stores. Then there are some items which one would only see in a detention or corrections facility but are, nonetheless, items one obtains at the jail. This sorting of the domains, if done well, can provide good code lists, but then one implements, say, a dropdown that has members from multiple codelists. Domain overlap (a domain subsuming multiple domains with some common members and slightly different definitions) and domain leakage (a member that is adopted from one domain into another with not so small differences in definition but the assumption of equivalence) are a part of the semantic drift problem.

If the semantic web has one very large hurdle, it is the very dynamic nature of meaning with regards to changing intent. Do the best you can, but no one can make time or meaning stand still. YMMV.

>> Sure they can, in the form of contracts. Essentially that is what OWL is for, right -- a contract about the nature/meaning of a particular piece of information? Sure, those considerations will change over time, but that is what versioning is for?

Semantic drift is to be expected, and I'll grant that it is a problem, but that doesn't mean it makes the whole process useless. I know that the fidelity of an MP3 recorded from a CD and an old cassette are two wildly different things. I know that converting the MP3 to another format and back will likely involve some loss -- but it doesn't mean that the information is useless, I just have to approach it soberly.

Code lists are great, shared code lists are more great -- but for each level you go out you have to keep in mind that there will be some lossiness. Fine. Still, sign me up -- if I have a program that can auto-map 1800 out of 2000 fields reliably, I'll use it.
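
For a feel of what even a dumb version of that auto-mapper looks like, here is a sketch in Python (all field names are invented; a real mapper would lean on the shared ontology rather than bare string similarity):

    # Toy field auto-mapper: propose matches by name similarity alone.
    # All field names below are invented for illustration.
    import difflib

    ours = ["LoanAmt", "IntRate", "BorrowerName"]
    theirs = ["LoanAmount", "InterestRate", "Borrower_Full_Name", "EscrowFlag"]

    for field in ours:
        match = difflib.get_close_matches(field, theirs, n=1, cutoff=0.6)
        print(field, "->", match[0] if match else "(unmapped: needs a human)")

The interesting engineering question is the one raised above: what to do with the couple hundred fields the program cannot map reliably.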

>> The problem is it isn't contract, but contracts. RFP by RFP. It is great if they can all reference one ontology, but for that to work, that ontology has to be the sum of their requirements; whaddaya git? Another bloated specification. Just whining, here.

It isn't that the ontology drifts: it is that meaning drifts. Will I accept a noise ratio of 5 to 1? Sure. Sobriety rules. One can't count on a large non-local community being sober all the time in all of the places where they make their decisions. So not just sober choice, but well-considered application. That is as good as it gets, and why many said that frictionless computing was/is nonsense, so YMMV.

Don't get me wrong. We're very happy to get standards for the codelists we use. Stuff them into an enumeration and let us suck them via an XMLReader right into the database, then to the dropdown. Very happy indeed. But the real trick is to detect, in near real time, that a user in a particular context chose the wrong value from that list. This is when the semantic stuff starts to have more value.
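
As a rough illustration of that codelist pipeline in Python (the XML shape and all names are invented; real industry codelists vary):

    # Hypothetical sketch: pull a shared codelist out of XML into the
    # (code, label) pairs a dropdown or lookup table needs.
    # The element and attribute names are invented for illustration.
    import xml.etree.ElementTree as ET

    codelist_xml = """
    <CodeList name="FacilityType">
      <Code value="JAIL">County jail</Code>
      <Code value="PRIS">State prison</Code>
      <Code value="DETC">Juvenile detention center</Code>
    </CodeList>
    """

    root = ET.fromstring(codelist_xml)
    options = [(c.get("value"), c.text) for c in root.findall("Code")]

    for value, label in options:
        print(value, "->", label)   # feed these to the database/dropdown

The easy part ends there; detecting in context that a user picked the *wrong* member of that list is the part none of this plumbing addresses.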

>> Kendall,

We discussed it before because I had said (a bit facetiously) that the current Semantic Web is mostly FOAF files, tools, and talk. I certainly wouldn't deny that FOAF files are part of the Semantic Web; without them, there'd be little left!

As I've mentioned on the rdf-interest list, I still haven't heard a use case that demonstrates what value RSS 1.0 files of transient data can play in a semantic web. If it was current practice to archive them (like monkeyfist does) and I was reading an article by someone and wanted to see more by that person, semantic web technology crawling RSS 1.0 archives would make it easy to turn up more articles by that person. Maybe not everything he ever wrote, because in some bylines he may use his middle initial or call himself "James" instead of "Jim", but I would have found something.

It's not that I'm against transient data having any role, period. Movie timetables are transient data, so if someone made those available as RDF files (haven't found anyone who does yet), I could obviously see why those would be useful. I'm just wondering how people can apply semantic web technology to take advantage of transient RSS 1.0 files to do things that they can't do with RSS .9, 2.0, etc. files. In other words, what makes them part of the semantic web? The mere fact that they're in RDF?

The SemWeb life sciences conference is a great example of how a specific domain, especially one currently suffering from data overload, is fertile ground for proving the value of semantic web technology, and publicly available data is appearing (http://www.rdfdata.org/data.html#bio). I was just telling a biomedical research professor about it over the weekend, and he was anxious to hear more.

Bob
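
A minimal sketch of that archive-crawling use case in Python with rdflib (the archive URLs and byline variants are placeholders; the RSS 1.0 and Dublin Core namespaces are the real ones):

    # Hypothetical sketch: trawl archived RSS 1.0 (RDF) feeds for items
    # whose dc:creator matches any known byline variant of one person.
    # The archive URLs below are placeholders and will not resolve.
    from rdflib import Graph, Namespace

    RSS = Namespace("http://purl.org/rss/1.0/")
    DC = Namespace("http://purl.org/dc/elements/1.1/")

    archives = [
        "http://example.org/archive/2004-10.rdf",
        "http://example.org/archive/2004-11.rdf",
    ]

    g = Graph()
    for url in archives:
        g.parse(url)   # RSS 1.0 is RDF/XML, so a plain RDF parse suffices

    bylines = {"Jim Smith", "James Smith", "James R. Smith"}  # variants
    for item in g.subjects(predicate=RSS.title):
        for creator in g.objects(item, DC.creator):
            if str(creator) in bylines:
                print(item, g.value(item, RSS.title))

Note where the byline problem lands: the variant set has to be curated by hand unless something else (FOAF, say) asserts that the names denote the same person.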

>> Bob,

We've talked about this before, but every FOAF and RSS 1.0 resource is an RDF file. I don't know why you discount that data as non-transient. That people don't archive all of their RSS 1.0 events seems a matter of a best practice. It doesn't change the fact that there are *lots* of RSS 1.0 (which are RDF) resources on the Web. (And there are good social reasons for why people might not want to maintain all their FOAF versions.)

It seems to me that we're maybe in the "intranet" phase of the Semantic Web, that is, lots of non-public RDF inside enterprise and institutional walls, while the amount of RDF on the public Web continues to grow (even if not exponentially).

Lots of folks are using RDF and OWL in the life sciences world, or so I learned at the W3C's workshop about SemWeb in LifeSci in Boston a few weeks ago, and the great majority of that isn't on the public Web.

My two cents, anyway. :>

Kendall Clark
Managing Editor, XML.com

>> I believe the notion behind the semantic web is many fairly small, intersecting ontologies, as described in TBL's underground map: http://www.w3.org/2003/Talks/0922-rsoc-tbl/slide23-0.html. Each colored line in this diagram corresponds to an ontology. No single line visits all the stations, but several stations are visited by more than one line. Information is shared within one ontology to interoperate between, say, the address book and events. Another ontology interoperates between events and photos. The result is interoperation of addresses and photos. This is done without requiring all stakeholders to agree upon a single interlingua that covers all information silos at once.

I can't really see how one ontology could be practical even in a much smaller environment than the Sem Web - such as a single company or a single department within a company. Often, even a single application will require multiple modular ontologies.

In theory, the modularity of ontology models should provide the flexibility needed to accommodate different contexts. One could also only reference/use part of an ontology - parts one can "agree with" - without committing to the entire ontology. In practice, we are still figuring out how this will all work.
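
A toy rendering of that "intersecting lines" picture in Python with rdflib (all namespaces and terms are invented): two small vocabularies that never mention each other still interoperate because each shares a "station" with the next.

    # Hypothetical sketch: separately authored graphs intersect on one
    # resource, so a plain graph union yields new connections.
    # All namespaces and terms below are invented for illustration.
    from rdflib import Graph, Literal, Namespace, URIRef

    AB = Namespace("http://example.org/addressbook#")
    EV = Namespace("http://example.org/events#")
    PH = Namespace("http://example.org/photos#")

    alice = URIRef("mailto:alice@example.org")
    party = URIRef("http://example.org/events/party42")

    address_book = Graph()
    address_book.add((alice, AB.homeCity, Literal("Boston")))

    events = Graph()
    events.add((alice, EV.attended, party))
    events.add((party, PH.shownIn, URIRef("http://example.org/photos/p1.jpg")))

    # Merging is just graph union; no single interlingua required.
    merged = address_book + events
    for s, p, o in merged.triples((alice, None, None)):
        print(s, p, o)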

> The problem is it isn't contract, but contracts.
> RFP by RFP. It is great if they can all reference
> one ontology, but for that to work, that ontology
> has to be the sum of their requirements;

I was going to say something similar, but from the enterprise integration context: It's great if you can get an ontology that describes the implicit semantics in a bunch of applications and databases by relating them back to the actual business functions they serve. BUT it is highly unlikely, in my experience anyway, that the ontology will remain the master "contract". Instead, the apps and DBs and business processes will evolve, as they always do, and IF &deity; smiles on us the ontology will be kept in synch.

&deity; is, however, a capricious god :-) and seldom smiles on the geeks trying to make life difficult for the people who are doing what they have to do to make the numbers this quarter or whatever.

>> Quite. No one expects a single interlingua, not before TBL or afterwards. These are the well-known problems of ontologies. Better authorities than TBL are people such as John Sowa, Pat Hayes, etc.

Until you map a working ontology to a working database, the practical aspects of size and modularity aren't apparent. Only a novice builds a database with one giant, very wide table. On the other hand, ensuring that one has used all of the terminology correctly to name tables and columns, keeping these semantically consistent, and avoiding full normalization that can create performance problems is quite an art. So the single upper-level ontology that would span cultures, users, and space-time is a pipedream. So no disagreement here.

XML works because it knows nothing of meaning. Networks are predicated on the notion that the choices are meaningless to the network (see the first page of Shannon and Weaver's work). Notion one is reproducibility, not interpretability. A meaningful network is almost an oxymoron. A network of users dynamically negotiating and validating the meaning of messages isn't.


>>
> In theory, the modularity of ontology models should provide the flexibility
> needed to accommodate different contexts. One could also only reference/use
> part of an ontology - parts one can "agree with" - without committing to the
> entire ontology. In practice, we are still figuring out how this will all
> work.

Forgive me if this is something I should have learned in SemWeb 101, but doesn't any inferencing mechanism based on logic assume that the ontologies are consistent? How does one ensure that the parts of multiple ontologies that one "agrees with" are consistent with one another? And if they're not, an inferencer could come to any conclusion whatsoever (e.g. the possibly apocryphal story of Bertrand Russell proving that he is the Pope from the premise that 2+2=5) ... or what am I missing here?

In practice, what DOES one do, other than work with simple and unitary ontologies that don't imply anything remotely interesting, but let software agents automate the grunt work of generating queries, transformations, etc. that are just too tedious for humans to do quickly and accurately? That's a use case for the semantic web technologies that I can both grok and see an application for, FWIW.
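
The worry here is the classical principle of explosion (ex falso quodlibet): once a theory proves both P and not-P, it proves everything. A toy illustration in Python -- a hand-rolled two-rule propositional gadget, not a real reasoner:

    # Toy ex falso quodlibet, using two classical inference rules:
    #   addition:               P            |-  P or Q
    #   disjunctive syllogism:  P or Q, ~P   |-  Q
    # Any contradiction in the fact base makes every goal "provable".

    def provable(goal, facts):
        if goal in facts:
            return True
        # If facts contain some P and ~P: addition gives (P or goal),
        # then disjunctive syllogism with ~P gives goal.
        return any("~" + p in facts for p in facts)

    facts = {"2+2=5", "~2+2=5"}   # one smuggled-in contradiction
    print(provable("Russell is the Pope", facts))   # True

Which is why merging fragments of independently authored ontologies without a consistency check is risky: the reasoner answers from the merged theory, not from either fragment alone.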

>>> Even then, looser can be better, at least until the number crunchers get into the act. The act of measurement is the surest expression of a semantic, or something like that. Otherwise, from the geek perspective, looser lasts longer. We can spend enormous amounts of time identifying all of the individually meaningful items, or we can implement two text boxes labeled Request and Response and get on with business.

This, of course, negates traceability. So when building an enterprise app, it can be useful to have a 50k-foot view of the end-to-end lifecycle of all of the documents and the items they control. Really precise data items make it harder than it has to be if the systems act mainly as transport/storage, not an interpreter. If the human is interpreting and taking all of the critical actions, labeled textboxes do just as well. The fear and loathing starts down in the queries, and particularly any place the system is performing hidden calculations. Caching and naming never get easier.

>> Yes, this is exactly right. The Semantic Web is all about working with simple, unitary ontologies and having software agents go at them.

I don't think you are missing anything. One of the motivations for common "upper" ontologies is that you support the interoperability of your ontologies by making them all consistent with the UO. So this could be a solution, but I have difficulty believing in the feasibility of making this happen, although there are people who swear by it. I know of some work on reasoners that manage contexts, so that you don't have to import all of your foreign ontology to do reasoning, but this still has the issue of how one knows it is consistent when you do.


>> One approach to the upper ontology, or any ontology really, is to accept that it is, like law, an artifice. It works as well as it works when it works, and that is as well as it will work. Like your car, it gets a job done and when it doesn't, you or someone else can fix it.

The question of the semantic web is the golem problem: how much power and authority will you give the artifice over your choices? Otherwise, don't mistake a tool for the truth of the results of using the tool. A computer doesn't know how to add 2 + 2. It can be used to simulate that operation and give a repeatable result. If 2 + 2 = 4 for an acceptable number of uses, it is a useful tool. If you hit the one context in which that isn't true, it fails. So understand in advance what you are committing to and what the bet is.

An interesting question might be: when is an ontology expressing something non-trivial? Where there are doubts about the value of the semantic web, they are related to that question. The cost of an expert system proved to be very high for the utility it provided over a deliberately limited domain. The assumption seems to be that some of the scaling magic of the WWW will be obtained for the Semantic Web, but again, networks scale precisely because they are NOT meaningful. So this bet may not be a good one.

Treat ontologies like law: to be useful, law must be testable or enforceable. Thus the notion of commitment to rule by law and to an ontology (see Thomas Gruber). In one view one might say an ontology is a computable means for expressing a precedent. Expressing and applying a precedent is a matter of judgement, not truth. It is also useful to inquire how often you will find a system useful, based on the frequency with which it halts and asks you a clarifying question, and the value in terms of work when it does that. Interrupts are expensive.

>> I have begun to see the value of this within the US federal government space. The federal government is extremely data-rich, and agencies want to share information to save money, earn efficiencies, and potentially increase overall data quality (through elimination of redundant, but possibly inconsistent, data sources). But in order to determine what information can be shared, it is first necessary to identify what information (or types of information) is available *to* share. Ontologies and taxonomies are, I believe, wonderful mechanisms by which to accomplish this.

In addition to identifying opportunities for information sharing, these artifacts can also identify opportunities for federated queries (perhaps using Enterprise Information Integration - EII). Consider a hypothetical situation in which two agencies have arrest information for an individual - but one has it on a domestic basis, and the other on an international basis. A federated query between these two data sources - which can be determined by comparing their ontologies and taxonomies - can yield an arrest record for a given individual on an international basis.

I have found that in educating unfamiliar folks on these artifacts, it works best to use examples within their own domain. Familiarity with their own data and concepts greatly eases the mental transition.
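
A toy sketch of that federated-query scenario in Python with rdflib (both agency datasets and all terms are invented; a real EII deployment would query the sources in place rather than copy them):

    # Hypothetical sketch: union two agencies' arrest data and ask one
    # question across both. All namespaces and terms are invented.
    from rdflib import Graph, Literal, Namespace, URIRef

    DOM = Namespace("http://example.org/agencyA#")    # domestic arrests
    INTL = Namespace("http://example.org/agencyB#")   # international
    SHARED = Namespace("http://example.org/shared#")

    person = URIRef("http://example.org/person/jdoe")

    agency_a = Graph()
    agency_a.add((person, DOM.arrestedIn, Literal("Ohio, 1999")))

    agency_b = Graph()
    agency_b.add((person, INTL.arrestedIn, Literal("France, 2002")))

    # The ontology-comparison step would establish that these two
    # properties mean the same thing; here the alignment is hard-coded.
    merged = Graph()
    for g, prop in ((agency_a, DOM.arrestedIn), (agency_b, INTL.arrestedIn)):
        for s, _, o in g.triples((None, prop, None)):
            merged.add((s, SHARED.arrestedIn, o))

    for _, _, where in merged.triples((person, SHARED.arrestedIn, None)):
        print("arrest record:", where)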

