• 244.00 KB
  • Published 2022-04-29 14:33:38

Database System Concepts, complete set of companion lecture slides (PPT): ch23.ppt

  • 56 pages
'Chapter 23: XML

XML
  • Structure of XML Data
  • XML Document Schema
  • Querying and Transformation
  • Application Program Interfaces to XML
  • Storage of XML Data
  • XML Applications

Introduction
  • XML: Extensible Markup Language
  • Defined by the WWW Consortium (W3C)
  • Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
  • Documents have tags giving extra information about sections of the document
      E.g. <title> XML </title> <slide> Introduction … </slide>
  • Extensible, unlike HTML
      Users can add new tags, and separately specify how the tag should be handled for display

XML Introduction (Cont.)
  • The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents.
      Much of the use of XML has been in data exchange applications, not as a replacement for HTML
  • Tags make data (relatively) self-documenting
      E.g.
      <university>
        <department>
          <dept_name> Comp. Sci. </dept_name>
          <building> Taylor </building>
          <budget> 100000 </budget>
        </department>
        <course>
          <course_id> CS-101 </course_id>
          <title> Intro. to Computer Science </title>
          <dept_name> Comp. Sci </dept_name>
          <credits> 4 </credits>
        </course>
      </university>

XML: Motivation
  • Data interchange is critical in today’s networked world
      Examples:
        Banking: funds transfer
        Order processing (especially inter-company orders)
        Scientific data
          Chemistry: ChemML, …
          Genetics: BSML (Bio-Sequence Markup Language), …
      Paper flow of information between organizations is being replaced by electronic flow of information
  • Each application area has its own set of standards for representing information
  • XML has become the basis for all new-generation data interchange formats

XML Motivation (Cont.)
  • Earlier-generation formats were based on plain text with line headers indicating the meaning of fields
      Similar in concept to email headers
      Does not allow for nested structures, no standard “type” language
      Tied too closely to low-level document structure (lines, spaces, etc.)
  • Each XML-based standard defines what are valid elements, using
      XML type specification languages to specify the syntax
        DTD (Document Type Definition)
        XML Schema
      Plus textual descriptions of the semantics
  • XML allows new tags to be defined as required
      However, this may be constrained by DTDs
  • A wide variety of tools is available for parsing, browsing and querying XML documents/data
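
Comparison with Relational Data
  • Inefficient: tags, which in effect represent schema information, are repeated
  • Better than relational tuples as a data-exchange format
      Unlike relational tuples, XML data is self-documenting due to the presence of tags
      Non-rigid format: tags can be added
      Allows nested structures
      Wide acceptance, not only in database systems, but also in browsers, tools, and applications

Structure of XML Data
  • Tag: label for a section of data
  • Element: section of data beginning with <tagname> and ending with matching </tagname>
  • Elements must be properly nested
      Proper nesting
        <course> … <title> …. </title> </course>
      Improper nesting
        <course> … <title> …. </course> </title>
      Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
  • Every document must have a single top-level element

Example of Nested Elements
      <purchase_order>
        <identifier> P-101 </identifier>
        <purchaser> …. </purchaser>
        <itemlist>
          <item>
            <identifier> RS1 </identifier>
            <description> Atom powered rocket sled </description>
            <quantity> 2 </quantity>
            <price> 199.95 </price>
          </item>
          <item>
            <identifier> SG2 </identifier>
            <description> Superb glue </description>
            <quantity> 1 </quantity>
            <unit-of-measure> liter </unit-of-measure>
            <price> 29.95 </price>
          </item>
        </itemlist>
      </purchase_order>

Motivation for Nesting
  • Nesting of data is useful in data transfer
      Example: elements representing item nested within an itemlist element
  • Nesting is not supported, or is discouraged, in relational databases
      With multiple orders, customer name and address are stored redundantly
      Normalization replaces nested structures in each order by a foreign key into a table storing customer name and address information
      Nesting is supported in object-relational databases
  • But nesting is appropriate when transferring data
      The external application does not have direct access to data referenced by a foreign key

Structure of XML Data (Cont.)
  • Mixture of text with sub-elements is legal in XML.
      Example:
      <course>
        This course is being offered for the first time in 2009.
        <course_id> BIO-399 </course_id>
        <title> Computational Biology </title>
        <dept_name> Biology </dept_name>
        <credits> 3 </credits>
      </course>
      Useful for document markup, but discouraged for data representation

Attributes
  • Elements can have attributes
      <course course_id="CS-101">
        <title> Intro. to Computer Science </title>
        <dept_name> Comp. Sci. </dept_name>
        <credits> 4 </credits>
      </course>
  • Attributes are specified by name=value pairs inside the starting tag of an element
  • An element may have several attributes, but each attribute name can only occur once

Attributes vs. Subelements
  • Distinction between subelement and attribute
      In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
      In the context of data representation, the difference is unclear and may be confusing
        The same information can be represented in two ways
          <course course_id="CS-101"> … </course>
          <course> <course_id> CS-101 </course_id> … </course>
      Suggestion: use attributes for identifiers of elements, and use subelements for contents

Namespaces
  • XML data has to be exchanged between organizations
  • The same tag name may have different meanings in different organizations, causing confusion on exchanged documents
  • Specifying a unique string as an element name avoids confusion
  • Better solution: use unique-name:element-name
  • Avoid using long unique names all over the document by using XML Namespaces
      <university xmlns:yale="http://www.yale.edu">
        …
        <yale:course>
          <yale:course_id> CS-101 </yale:course_id>
          <yale:title> Intro. to Computer Science </yale:title>
          <yale:dept_name> Comp. Sci. </yale:dept_name>
          <yale:credits> 4 </yale:credits>
        </yale:course>
        …
      </university>

More on XML Syntax
  • Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag
      <course course_id="CS-101" title="Intro. to Computer Science" dept_name="Comp. Sci." credits="4" />
  • To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
      <![CDATA[<course> … </course>]]>
      Here, <course> and </course> are treated as just strings
      CDATA stands for “character data”

XML Document Schema
  • Database schemas constrain what information can be stored, and the data types of stored values
  • XML documents are not required to have an associated schema
  • However, schemas are very important for XML data exchange
      Otherwise, a site cannot automatically interpret data received from another site
  • Two mechanisms for specifying XML schema
      Document Type Definition (DTD)
        Widely used
      XML Schema
        Newer, increasing use

Document Type Definition (DTD)
  • The type of an XML document can be specified using a DTD
  • A DTD constrains the structure of XML data
      What elements can occur
      What attributes can/must an element have
      What subelements can/must occur inside each element, and how many times
  • A DTD does not constrain data types
      All values are represented as strings in XML
  • DTD syntax
      <!ELEMENT element (subelements-specification) >
      <!ATTLIST element (attributes) >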
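
As a rough illustration of the nesting, attribute, and CDATA rules described above, the following Python sketch uses the standard-library xml.etree.ElementTree parser. The sample document and the helper name is_well_formed are assumptions made for this example; the element and attribute names simply mirror the slides' university data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names follow the slides' university examples.
DOC = """
<university>
  <course course_id="CS-101">
    <title>Intro. to Computer Science</title>
    <dept_name>Comp. Sci.</dept_name>
    <credits>4</credits>
  </course>
  <note><![CDATA[<course> is just text here </course>]]></note>
</university>
"""

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(DOC)
course = root.find("course")

course_id = course.get("course_id")   # value of an attribute
title = course.findtext("title")      # text content of a subelement
note = root.findtext("note")          # CDATA content arrives as plain text

# Start and end tags must match within the same parent element.
properly_nested = "<course><title>x</title></course>"
improperly_nested = "<course><title>x</course></title>"

print(course_id, title, note)
print(is_well_formed(properly_nested), is_well_formed(improperly_nested))
```

The parser only checks well-formedness here; it does not validate the document against a DTD or an XML Schema.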
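
Element Specification in DTD
  • Subelements can be specified as
      names of elements, or
      #PCDATA (parsed character data), i.e., character strings
      EMPTY (no subelements) or ANY (anything can be a subelement)
  • Example
      <!ELEMENT department (dept_name, building, budget)>
      <!ELEMENT dept_name (#PCDATA)>
      <!ELEMENT budget (#PCDATA)>
  • Subelement specification may have regular expressions
      <!ELEMENT university ( (department | course | instructor | teaches)+ )>
      Notation:
        “|” - alternatives
        “+” - 1 or more occurrences
        “*” - 0 or more occurrences

University DTD
      <!DOCTYPE university [
        <!ELEMENT university ( (department | course | instructor | teaches)+ )>
        <!ELEMENT department (dept_name, building, budget)>
        <!ELEMENT course (course_id, title, dept_name, credits)>
        <!ELEMENT instructor (IID, name, dept_name, salary)>
        <!ELEMENT teaches (IID, course_id)>
        <!ELEMENT dept_name (#PCDATA)>
        <!ELEMENT building (#PCDATA)>
        <!ELEMENT budget (#PCDATA)>
        <!ELEMENT course_id (#PCDATA)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT credits (#PCDATA)>
        <!ELEMENT IID (#PCDATA)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT salary (#PCDATA)>
      ]>

Attribute Specification in DTD
  • Attribute specification: for each attribute
      Name
      Type of attribute
        CDATA
        ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
          more on this later
      Whether
        mandatory (#REQUIRED)
        has a default value (value),
        or neither (#IMPLIED)
  • Examples
      <!ATTLIST course course_id CDATA #REQUIRED>, or
      <!ATTLIST course
          course_id   ID     #REQUIRED
          dept_name   IDREF  #REQUIRED
          instructors IDREFS #IMPLIED >

IDs and IDREFs
  • An element can have at most one attribute of type ID
  • The ID attribute value of each element in an XML document must be distinct
      Thus the ID attribute value is an object identifier
  • An attribute of type IDREF must contain the ID value of an element in the same document
  • An attribute of type IDREFS contains a set of (0 or more) ID values. Each value must be the ID value of an element in the same document

University DTD with Attributes
  • University DTD with ID and IDREF attribute types.
      <!DOCTYPE university-3 [
        <!ELEMENT university ( (department | course | instructor)+ )>
        <!ELEMENT department (building, budget)>
        <!ATTLIST department
            dept_name ID #REQUIRED >
        <!ELEMENT course (title, credits)>
        <!ATTLIST course
            course_id   ID     #REQUIRED
            dept_name   IDREF  #REQUIRED
            instructors IDREFS #IMPLIED >
        <!ELEMENT instructor (name, salary)>
        <!ATTLIST instructor
            IID       ID    #REQUIRED
            dept_name IDREF #REQUIRED >
        ··· declarations for title, credits, building, budget, name and salary ···
      ]>

XML data with ID and IDREF attributes
      <university-3>
        <department dept_name="Comp. Sci.">
          <building> Taylor </building>
          <budget> 100000 </budget>
        </department>
        <department dept_name="Biology">
          <building> Watson </building>
          <budget> 90000 </budget>
        </department>
        <course course_id="CS-101" dept_name="Comp. Sci." instructors="10101 83821">
          <title> Intro. to Computer Science </title>
          <credits> 4 </credits>
        </course>
        ….
        <instructor IID="10101" dept_name="Comp. Sci.">
          <name> Srinivasan </name>
          <salary> 65000 </salary>
        </instructor>
        ….
      </university-3>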
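
The ID and IDREF rules above (distinct ID values, every reference resolving to an ID in the same document) can be sketched as a small checker. This is a minimal sketch, not a DTD validator: Python's xml.etree.ElementTree does not validate against DTDs, so the attribute typing is supplied by hand in the three dictionaries below, which are assumptions modelled on the university-3 description. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed attribute typing (normally this would come from the DTD).
ID_ATTRS = {"department": "dept_name", "course": "course_id", "instructor": "IID"}
IDREF_ATTRS = {"course": ["dept_name"], "instructor": ["dept_name"]}
IDREFS_ATTRS = {"course": ["instructors"]}

def check_ids(root):
    """Return a list of violations of the ID / IDREF / IDREFS rules."""
    errors, seen = [], set()
    # Pass 1: every ID value must be distinct across the whole document.
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.get(attr)
            if value in seen:
                errors.append(f"duplicate ID {value!r}")
            seen.add(value)
    # Pass 2: every IDREF, and every token of an IDREFS, must match some ID.
    for elem in root.iter():
        refs = [elem.get(a) for a in IDREF_ATTRS.get(elem.tag, []) if a in elem.attrib]
        for a in IDREFS_ATTRS.get(elem.tag, []):
            refs.extend(elem.get(a, "").split())
        for ref in refs:
            if ref not in seen:
                errors.append(f"dangling reference {ref!r} on <{elem.tag}>")
    return errors

# Hypothetical documents: the second one refers to a department that is absent.
GOOD = """<university-3>
  <department dept_name="Biology"/>
  <course course_id="BIO-399" dept_name="Biology" instructors="76766"/>
  <instructor IID="76766" dept_name="Biology"/>
</university-3>"""
BAD = GOOD.replace('course_id="BIO-399" dept_name="Biology"',
                   'course_id="BIO-399" dept_name="Physics"')

good_errors = check_ids(ET.fromstring(GOOD))
bad_errors = check_ids(ET.fromstring(BAD))
print(good_errors, bad_errors)
```

Like a DTD, this check keeps all IDs in a single pool, so it cannot tell whether a reference points to the kind of element that was intended.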
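
Limitations of DTDs
  • No typing of text elements and attributes
      All values are strings, no integers, reals, etc.
  • Difficult to specify unordered sets of subelements
      Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
      (A | B)* allows specification of an unordered set, but
        cannot ensure that each of A and B occurs only once
  • IDs and IDREFs are untyped
      The instructors attribute of a course may contain a reference to another course, which is meaningless
        The instructors attribute should ideally be constrained to refer to instructor elements

XML Schema
  • XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
      Typing of values
        E.g. integer, string, etc.
        Also, constraints on min/max values
      User-defined, complex types
      Many more features, including
        uniqueness and foreign key constraints, inheritance
  • XML Schema is itself specified in XML syntax, unlike DTDs
      More-standard representation, but verbose
  • XML Schema is integrated with namespaces
  • BUT: XML Schema is significantly more complicated than DTDs.

XML Schema Version of Univ. DTD
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="university" type="universityType" />
      <xs:element name="department">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="dept_name" type="xs:string"/>
            <xs:element name="building" type="xs:string"/>
            <xs:element name="budget" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      ….
      … Contd.

XML Schema Version of Univ. DTD (Cont.)
      ….
      <xs:complexType name="UniversityType">
        <xs:sequence>
          <xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="instructor" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="teaches" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
      </xs:schema>
  • Choice of “xs:” was ours; any other namespace prefix could be chosen
  • Element “university” has type “universityType”, which is defined separately
      xs:complexType is used later to create the named complex type “UniversityType”

More features of XML Schema
  • Attributes are specified by the xs:attribute tag:
      <xs:attribute name="dept_name"/>
      Adding the attribute use="required" means a value must be specified
  • Key constraint: “department names form a key for department elements under the root university element”:
      <xs:key name="deptKey">
        <xs:selector xpath="/university/department"/>
        <xs:field xpath="dept_name"/>
      </xs:key>
  • Foreign key constraint from course to department:
      <xs:keyref name="courseDeptFKey" refer="deptKey">
        <xs:selector xpath="/university/course"/>
        <xs:field xpath="dept_name"/>
      </xs:keyref>

Querying and Transforming XML Data
  • Translation of information from one XML schema to another
  • Querying on XML data
  • The above two are closely related, and handled by the same tools
  • Standard XML querying/translation languages
      XPath
        Simple language consisting of path expressions
      XSLT
        Simple language designed for translation from XML to XML and XML to HTML
      XQuery
        An XML query language with a rich set of features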
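
To make the key and foreign-key constraints described under XML Schema more concrete, here is a minimal Python sketch that checks them by hand on a flat university document. It is not an XML Schema validator (the standard library has none); the function names check_key and check_keyref and the sample data are assumptions for illustration. The selector and field arguments play the roles of the schema's selector and field paths, written relative to the root element because that is how ElementTree evaluates paths.

```python
import xml.etree.ElementTree as ET

def check_key(root, selector, field):
    """Collect the field values of the selected elements; report repeats."""
    seen, duplicates = set(), []
    for elem in root.findall(selector):
        value = elem.findtext(field)
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return seen, duplicates

def check_keyref(root, selector, field, key_values):
    """Return field values of the selected elements that match no key value."""
    return [elem.findtext(field)
            for elem in root.findall(selector)
            if elem.findtext(field) not in key_values]

# Hypothetical data: the last course refers to a department that does not exist.
DOC = """<university>
  <department><dept_name>Comp. Sci.</dept_name></department>
  <department><dept_name>Biology</dept_name></department>
  <course><course_id>CS-101</course_id><dept_name>Comp. Sci.</dept_name></course>
  <course><course_id>PHY-101</course_id><dept_name>Physics</dept_name></course>
</university>"""

root = ET.fromstring(DOC)
dept_keys, dept_duplicates = check_key(root, "department", "dept_name")
missing = check_keyref(root, "course", "dept_name", dept_keys)
print(sorted(dept_keys), dept_duplicates, missing)
```

Unlike the untyped IDREF mechanism of DTDs, the reference check here is tied to one specific key, which is the property the keyref constraint adds.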

Tree Model of XML Data
  • Query and transformation languages are based on a tree model of XML data
  • An XML document is modeled as a tree, with nodes corresponding to elements and attributes
      Element nodes have child nodes, which can be attributes or subelements
      Text in an element is modeled as a text node child of the element
      Children of a node are ordered according to their order in the XML document
      Element and attribute nodes (except for the root node) have a single parent, which is an element node
      The root node has a single child, which is the root element of the document

XPath
  • XPath is used to address (select) parts of documents using path expressions
  • A path expression is a sequence of steps separated by “/”
      Think of file names in a directory hierarchy
  • Result of path expression: set of values that along with their containing elements/attributes match the specified path
  • E.g. /university-3/instructor/name evaluated on the university-3 data we saw earlier returns
      <name>Srinivasan</name>
      <name>Brandt</name>
  • E.g. /university-3/instructor/name/text()
      returns the same names, but without the enclosing tags

XPath (Cont.)
  • The initial “/” denotes the root of the document (above the top-level tag)
  • Path expressions are evaluated left to right
      Each step operates on the set of instances produced by the previous step
  • Selection predicates may follow any step in a path, in [ ]
      E.g. /university-3/course[credits >= 4]
        returns course elements with a credits value greater than or equal to 4
      /university-3/course[credits] returns course elements containing a credits subelement
  • Attributes are accessed using “@”
      E.g. /university-3/course[credits >= 4]/@course_id
        returns the course identifiers of courses with credits >= 4
      IDREF attributes are not dereferenced automatically (more on this later)
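
The path expressions above can be tried out, in a limited form, with Python's xml.etree.ElementTree, which implements only a small subset of XPath. In this sketch the sample document is hypothetical, paths are written relative to the root element, and the numeric comparison credits >= 4 is done in Python because ElementTree predicates do not support it.

```python
import xml.etree.ElementTree as ET

# Hypothetical university-3 style data: identifiers as attributes.
DOC = """<university-3>
  <course course_id="CS-101" dept_name="Comp. Sci.">
    <title>Intro. to Computer Science</title><credits>4</credits>
  </course>
  <course course_id="BIO-399" dept_name="Biology">
    <title>Computational Biology</title><credits>3</credits>
  </course>
  <instructor IID="10101"><name>Srinivasan</name></instructor>
  <instructor IID="83821"><name>Brandt</name></instructor>
</university-3>"""

root = ET.fromstring(DOC)

# /university-3/instructor/name  -> the name elements themselves
name_elements = root.findall("./instructor/name")

# /university-3/instructor/name/text()  -> just the text inside them
names = [n.text for n in name_elements]

# /university-3/course[credits]  -> courses that contain a credits subelement
with_credits = root.findall("./course[credits]")

# /university-3/course[credits >= 4]/@course_id
# (the comparison is evaluated in Python, then the attribute is read with get)
big_course_ids = [c.get("course_id")
                  for c in root.findall("./course")
                  if int(c.findtext("credits")) >= 4]

print(names, len(with_credits), big_course_ids)
```

Each step narrows or extends the node set produced by the previous one, which is the left-to-right evaluation the slides describe.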

Functions in XPath
  • XPath provides several functions
      The function count() at the end of a path counts the number of elements in the set generated by the path
        E.g. /university-2/instructor[count(./teaches/course) > 2]
          Returns instructors teaching more than 2 courses (on the university-2 schema)
      Also a function for testing the position (1, 2, ..) of a node w.r.t. its siblings
  • Boolean connectives and and or and the function not() can be used in predicates
  • IDREFs can be dereferenced using the function id()
      id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks
      E.g. /university-3/course/id(@dept_name)
        returns all department elements referred to from the dept_name attribute of course elements.

More XPath Features
  • Operator “|” used to implement union
      E.g. /university-3/course[@dept_name="Comp. Sci."] | /university-3/course[@dept_name="Biology"]
        Gives the union of Comp. Sci. and Biology courses
        However, “|” cannot be nested inside other operators.
  • “//” can be used to skip multiple levels of nodes
      E.g. /university-3//name
        finds any name element anywhere under the /university-3 element, regardless of the element in which it is contained.
  • A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
      “//”, described above, is a short form for specifying “all descendants”
      “..” specifies the parent.
  • doc(name) returns the root of a named document

XQuery
  • XQuery is a general-purpose query language for XML data
  • Currently being standardized by the World Wide Web Consortium (W3C)
      The textbook description is based on a January 2005 draft of the standard. The final version may differ, but major features are likely to stay unchanged.
  • XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL
  • XQuery uses a
        for … let … where … order by … return …
      syntax
        for       corresponds to SQL from
        where     corresponds to SQL where
        order by  corresponds to SQL order by
        return    corresponds to SQL select
        let allows temporary variables, and has no equivalent in SQL
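
The XPath features described above (count(), id(), union, and the descendant step) can be approximated in Python as follows. This is a sketch under stated assumptions: ElementTree has no id() or count() functions, so id() is emulated with a hand-built index that assumes dept_name is the ID attribute of department, and count() is replaced by len(). The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<university-3>
  <department dept_name="Comp. Sci."><building>Taylor</building></department>
  <department dept_name="Biology"><building>Watson</building></department>
  <course course_id="CS-101" dept_name="Comp. Sci."><title>Intro</title></course>
  <course course_id="BIO-399" dept_name="Biology"><title>Comp. Bio.</title></course>
  <instructor IID="10101" dept_name="Comp. Sci."><name>Srinivasan</name></instructor>
</university-3>"""

root = ET.fromstring(DOC)

# Emulating id(): an index from ID value to element.
# Assumption: dept_name is the ID attribute of department elements.
dept_by_id = {d.get("dept_name"): d for d in root.findall("department")}

# /university-3/course/id(@dept_name): departments referred to by some course.
referenced = sorted({c.get("dept_name") for c in root.findall("course")})
referenced_depts = [dept_by_id[r] for r in referenced if r in dept_by_id]

# count(): the size of the node set produced by a path.
course_count = len(root.findall("course"))

# Union with "|": here the two predicates select disjoint sets, so
# concatenating the two result lists gives the union without duplicates.
union = (root.findall("course[@dept_name='Comp. Sci.']")
         + root.findall("course[@dept_name='Biology']"))

# "//": all name elements anywhere below the root element.
all_names = [n.text for n in root.findall(".//name")]

print([d.get("dept_name") for d in referenced_depts], course_count,
      [c.get("course_id") for c in union], all_names)
```

When two paths could select overlapping nodes, a real union would also need to remove duplicates, which plain list concatenation does not do.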

FLWOR Syntax in XQuery
  • The for clause uses XPath expressions, and the variable in the for clause ranges over values in the set returned by XPath
  • Simple FLWOR expression in XQuery
      Find all courses with credits > 3, with each result enclosed in a <course_id> .. </course_id> tag
        for $x in /university-3/course
        let $courseId := $x/@course_id
        where $x/credits > 3
        return <course_id> { $courseId } </course_id>
      Items in the return clause are XML text unless enclosed in { }, in which case they are evaluated
  • The let clause is not really needed in this query, and the selection can be done in XPath. The query can be written as:
        for $x in /university-3/course[credits > 3]
        return <course_id> { $x/@course_id } </course_id>
  • Alternative notation for constructing elements:
        return element course_id { element $x/@course_id }

Joins
  • Joins are specified in a manner very similar to SQL
        for $c in /university/course,
            $i in /university/instructor,
            $t in /university/teaches
        where $c/course_id = $t/course_id and $t/IID = $i/IID
        return <course_instructor> { $c $i } </course_instructor>
  • The same query can be expressed with the selections specified as XPath selections:
        for $c in /university/course,
            $i in /university/instructor,
            $t in /university/teaches[ $c/course_id = $t/course_id and $t/IID = $i/IID ]
        return <course_instructor> { $c $i } </course_instructor>

Nested Queries
  • The following query converts data from the flat structure for university information into the nested structure used in university-1
        <university-1>
        { for $d in /university/department
          return <department>
                   { $d/* }
                   { for $c in /university/course[dept_name = $d/dept_name]
                     return $c }
                 </department>
        }
        { for $i in /university/instructor
          return <instructor>
                   { $i/* }
                   { for $c in /university/teaches[IID = $i/IID]
                     return $c/course_id }
                 </instructor>
        }
        </university-1>
  • $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag

Grouping and Aggregation
  • Nested queries are used for grouping
        for $d in /university/department
        return
          <department-total-salary>
            <dept_name> { $d/dept_name } </dept_name>
            <total_salary> { fn:sum(
                for $i in /university/instructor[dept_name = $d/dept_name]
                return $i/salary
              ) } </total_salary>
          </department-total-salary>

Sorting in XQuery
  • The order by clause can be used at the end of any expression. E.g. to return instructors sorted by name
        for $i in /university/instructor
        order by $i/name
        return <instructor> { $i/* } </instructor>
  • Use order by $i/name descending to sort in descending order
  • Can sort at multiple levels of nesting (sort departments by dept_name, and courses by course_id within each department)
        <university-1> {
          for $d in /university/department
          order by $d/dept_name
          return
            <department>
              { $d/* }
              { for $c in /university/course[dept_name = $d/dept_name]
                order by $c/course_id
                return <course> { $c/* } </course> }
            </department>
        } </university-1>

Functions and Other XQuery Features
  • User-defined functions with the type system of XML Schema
        declare function local:dept_courses($iid as xs:string) as element(course)*
        {
          for $i in /university/instructor[IID = $iid],
              $c in /university/course[dept_name = $i/dept_name]
          return $c
        }
  • Types are optional for function parameters and return values
  • The * (as in decimal*) indicates a sequence of values of that type
  • Universal and existential quantification in where clause predicates
      some $e in path satisfies P
      every $e in path satisfies P
      Add and fn:exists($e) to prevent an empty $e from satisfying the every clause
  • XQuery also supports if-then-else clauses

XSLT
  • A stylesheet stores formatting options for a document, usually separately from the document
      E.g. an HTML style sheet may specify font colors and sizes for headings, etc.
  • The XML Stylesheet Language (XSL) was originally designed for generating HTML from XML
  • XSLT is a general-purpose transformation language
      Can translate XML to XML, and XML to HTML
  • XSLT transformations are expressed using rules called templates
      Templates combine selection using XPath with construction of results

Application Program Interface
  • There are two standard application program interfaces to XML data:
      SAX (Simple API for XML)
        Based on a parser model; the user provides event handlers for parsing events
          E.g. start of element, end of element
      DOM (Document Object Model)
        XML data is parsed into a tree representation
        A variety of functions is provided for traversing the DOM tree
        E.g.: the Java DOM API provides a Node class with methods
          getParentNode(), getFirstChild(), getNextSibling()
          getAttribute(), getData() (for text node)
          getElementsByTagName(), …
        Also provides functions for updating the DOM tree

Storage of XML Data
  • XML data can be stored in
      Non-relational data stores
        Flat files
          Natural for storing XML
          But has all the problems discussed in Chapter 1 (no concurrency, no recovery, …)
        XML database
          Database built specifically for storing XML data, supporting the DOM model and declarative querying
          Currently no commercial-grade systems
      Relational databases
        Data must be translated into relational form
        Advantage: mature database systems
        Disadvantages: overhead of translating data and queries
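
Returning to the two application program interfaces described above, the contrast between SAX and DOM can be seen in a short Python sketch. The slides mention the Java DOM API; here Python's standard-library xml.sax and xml.dom.minidom modules are used instead, which expose the same ideas through attributes such as parentNode, firstChild and nextSibling rather than getter methods. The handler class NameHandler and the sample document are assumptions for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document, written without whitespace between elements so that
# the DOM tree contains no whitespace-only text nodes.
DOC = ("<university>"
       "<instructor><name>Srinivasan</name></instructor>"
       "<instructor><name>Brandt</name></instructor>"
       "</university>")

class NameHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these handlers as it meets each parsing event."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False

handler = NameHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

# DOM: the whole document is parsed into a tree that can then be traversed.
dom = minidom.parseString(DOC)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("name")]
first_instructor = dom.documentElement.firstChild
second_instructor = first_instructor.nextSibling

print(sax_names, dom_names, first_instructor.parentNode.tagName)
```

SAX never holds the whole document in memory, which suits large inputs; DOM keeps the full tree, which makes navigation and updates simpler.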

Storage of XML in Relational Databases
  • Alternatives:
      String Representation
      Tree Representation
      Map to relations

String Representation
  • Store each top-level element as a string field of a tuple in a relational database
      Use a single relation to store all elements, or
      Use a separate relation for each top-level element type
        E.g. account, customer, depositor relations
          Each with a string-valued attribute to store the element
  • Indexing:
      Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
        E.g. customer_name or account_number
      Some database systems support function indices, which use the result of a function as the key value.
        The function should return the value of the required subelement/attribute

String Representation (Cont.)
  • Benefits:
      Can store any XML data even without a DTD
      As long as there are many top-level elements in a document, strings are small compared to the full document
        Allows fast access to individual elements.
  • Drawback: need to parse strings to access values inside the elements
      Parsing is slow.

Tree Representation
  • Tree representation: model XML data as a tree and store it using the relation
        nodes(id, parent_id, type, label, value)
  • Each element/attribute is given a unique identifier
  • Type indicates element/attribute
  • Label specifies the tag name of the element/name of the attribute
  • Value is the text value of the element/attribute
  • Can add an extra attribute position to record the ordering of children
  [Figure: example tree with nodes university (id: 1), course (id: 2), department (id: 5), course_id (id: 3), dept_name (id: 7)]

Tree Representation (Cont.)
  • Benefit: can store any XML data, even without a DTD
  • Drawbacks:
      Data is broken up into too many pieces, increasing space overheads
      Even simple queries require a large number of joins, which can be slow
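
The tree representation above can be sketched with Python's sqlite3 and xml.etree.ElementTree modules. This is an illustrative sketch, not a storage engine: the function name shred_to_nodes, the sample document, and the identifier numbering (assigned in document order) are assumptions. The relation follows the slides' nodes(id, parent_id, type, label, value), with the optional position column added.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

def shred_to_nodes(conn, xml_text):
    """Store every element and attribute of the document as a row in nodes."""
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "type TEXT, label TEXT, value TEXT, position INTEGER)")
    ids = count(1)  # unique identifiers, handed out in document order

    def visit(elem, parent_id, position):
        node_id = next(ids)
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO nodes VALUES (?, ?, 'element', ?, ?, ?)",
            (node_id, parent_id, elem.tag, text, position))
        for name, value in elem.attrib.items():
            conn.execute(
                "INSERT INTO nodes VALUES (?, ?, 'attribute', ?, ?, NULL)",
                (next(ids), node_id, name, value))
        for pos, child in enumerate(elem, start=1):
            visit(child, node_id, pos)

    visit(ET.fromstring(xml_text), None, 1)

# Hypothetical document.
DOC = ("<university>"
       "<course><course_id>CS-101</course_id></course>"
       "<department><dept_name>Comp. Sci.</dept_name></department>"
       "</university>")

conn = sqlite3.connect(":memory:")
shred_to_nodes(conn, DOC)

node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]

# Even the simple path course/course_id needs a self-join on nodes.
course_ids = conn.execute(
    "SELECT child.value FROM nodes AS parent "
    "JOIN nodes AS child ON child.parent_id = parent.id "
    "WHERE parent.label = 'course' AND child.label = 'course_id'").fetchall()

print(node_count, course_ids)
```

Each additional step in a path adds another self-join, which is the drawback the slides point out for this representation.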

Mapping XML Data to Relations
  • A relation is created for each element type whose schema is known, with:
      An id attribute to store a unique id for each element
      A relation attribute corresponding to each element attribute
      A parent_id attribute to keep track of the parent element
        As in the tree representation
        Position information (i-th child) can be stored too
  • All subelements that occur only once can become relation attributes
      For text-valued subelements, store the text as the attribute value
      For complex subelements, can store the id of the subelement
  • Subelements that can occur multiple times are represented in a separate table
      Similar to the handling of multivalued attributes when converting ER diagrams to tables

Storing XML Data in Relational Systems
  • Applying the above ideas to department elements in the university-1 schema, with nested course elements, we get
        department(id, dept_name, building, budget)
        course(parent_id, course_id, dept_name, title, credits)
  • Publishing: process of converting relational data to an XML format
  • Shredding: process of converting an XML document into a set of tuples to be inserted into one or more relations
  • XML-enabled database systems support automated publishing and shredding
  • Many systems offer native storage of XML data using the xml data type. Special internal data structures and indices are used for efficiency

SQL/XML
  • New standard SQL extension that allows creation of nested XML output
      Each output tuple is mapped to an XML element row
        <university>
          <department>
            <row>
              <dept_name> Comp. Sci. </dept_name>
              <building> Taylor </building>
              <budget> 100000 </budget>
            </row>
            …. more rows if there are more output tuples …
          </department>
          … other relations ..
        </university>
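
Shredding and publishing, as defined above, can be sketched for the two relations department(id, dept_name, building, budget) and course(parent_id, course_id, dept_name, title, credits). The following Python code uses sqlite3 and ElementTree; the function names shred and publish, the column types, and the sample document are assumptions for illustration, not the behaviour of any particular XML-enabled database system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical university-1 style document: courses nested inside departments.
DOC = """<university-1>
  <department>
    <dept_name>Comp. Sci.</dept_name><building>Taylor</building><budget>100000</budget>
    <course>
      <course_id>CS-101</course_id><title>Intro. to Computer Science</title>
      <dept_name>Comp. Sci.</dept_name><credits>4</credits>
    </course>
  </department>
</university-1>"""

def shred(conn, xml_text):
    """Shredding: turn the XML document into tuples of two relations."""
    conn.execute("CREATE TABLE department "
                 "(id INTEGER PRIMARY KEY, dept_name TEXT, building TEXT, budget REAL)")
    conn.execute("CREATE TABLE course "
                 "(parent_id INTEGER, course_id TEXT, dept_name TEXT, title TEXT, credits INTEGER)")
    departments = ET.fromstring(xml_text).findall("department")
    for dept_id, dept in enumerate(departments, start=1):
        conn.execute("INSERT INTO department VALUES (?, ?, ?, ?)",
                     (dept_id, dept.findtext("dept_name"),
                      dept.findtext("building"), float(dept.findtext("budget"))))
        # course can occur many times, so it gets its own relation with parent_id.
        for c in dept.findall("course"):
            conn.execute("INSERT INTO course VALUES (?, ?, ?, ?, ?)",
                         (dept_id, c.findtext("course_id"), c.findtext("dept_name"),
                          c.findtext("title"), int(c.findtext("credits"))))

def publish(conn):
    """Publishing: rebuild nested XML from the relational tuples."""
    root = ET.Element("university-1")
    depts = conn.execute(
        "SELECT id, dept_name, building, budget FROM department ORDER BY id").fetchall()
    for dept_id, name, building, budget in depts:
        d = ET.SubElement(root, "department")
        ET.SubElement(d, "dept_name").text = name
        ET.SubElement(d, "building").text = building
        ET.SubElement(d, "budget").text = str(budget)
        courses = conn.execute(
            "SELECT course_id, dept_name, title, credits FROM course "
            "WHERE parent_id = ? ORDER BY course_id", (dept_id,)).fetchall()
        for course_id, dept_name, title, credits in courses:
            c = ET.SubElement(d, "course")
            ET.SubElement(c, "course_id").text = course_id
            ET.SubElement(c, "dept_name").text = dept_name
            ET.SubElement(c, "title").text = title
            ET.SubElement(c, "credits").text = str(credits)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
shred(conn, DOC)
course_rows = conn.execute("SELECT parent_id, course_id, credits FROM course").fetchall()
published = publish(conn)
print(course_rows)
print(published)
```

Because the schema is known in advance, queries on these relations need far fewer joins than the generic nodes relation, at the cost of handling only documents that fit the schema.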

SQL Extensions
  • xmlelement creates XML elements
  • xmlattributes creates attributes
        select xmlelement (name "course",
                 xmlattributes (course_id as course_id, dept_name as dept_name),
                 xmlelement (name "title", title),
                 xmlelement (name "credits", credits))
        from course
  • xmlagg creates a forest of XML elements
        select xmlelement (name "department",
                 dept_name,
                 xmlagg (xmlforest(course_id)
                         order by course_id))
        from course
        group by dept_name

XML Applications
  • Storing and exchanging data with complex structures
      E.g. the Open Document Format (ODF) standard for storing Open Office documents, and the Office Open XML (OOXML) standard for storing Microsoft Office documents
      Numerous other standards for a variety of applications
        ChemML, MathML
  • Standard for data exchange for Web services
      Remote method invocation over the HTTP protocol
      More in next slide
  • Data mediation
      Common data representation format to bridge different systems

Web Services
  • The Simple Object Access Protocol (SOAP) standard:
      Invocation of procedures across applications with distinct databases
      XML used to represent procedure input and output
  • A Web service is a site providing a collection of SOAP procedures
      Described using the Web Services Description Language (WSDL)
      Directories of Web services are described using the Universal Description, Discovery, and Integration (UDDI) standard

End of Chapter 23'