• 5.14 MB
  • 2022-04-29 14:31:55 发布

数据库系统概念全套配套课件PPT ch11.ppt

  • 90页
  • 当前文档由用户上传发布,收益归属用户
  1. 1、本文档共5页,可阅读全部内容。
  2. 2、本文档内容版权归属内容提供方,所产生的收益全部归内容提供方所有。如果您对本文有版权争议,可选择认领,认领后既往收益都归您。
  3. 3、本文档由用户上传,本站不保证质量和数量令人满意,可能有诸多瑕疵,付费之前,请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形,可联系本站下载客服投诉处理。
  4. 文档侵权举报电话:19940600175。
'Chapter11:IndexingandHashing Chapter11:IndexingandHashingBasicConceptsOrderedIndicesB+-TreeIndexFilesB-TreeIndexFilesStaticHashingDynamicHashingComparisonofOrderedIndexingandHashingIndexDefinitioninSQLMultiple-KeyAccess BasicConceptsIndexingmechanismsusedtospeedupaccesstodesireddata.E.g.,authorcataloginlibrarySearchKey-attributetosetofattributesusedtolookuprecordsinafile.Anindexfileconsistsofrecords(calledindexentries)oftheformIndexfilesaretypicallymuchsmallerthantheoriginalfileTwobasickindsofindices:Orderedindices:searchkeysarestoredinsortedorderHashindices:searchkeysaredistributeduniformlyacross“buckets”usinga“hashfunction”.search-keypointer IndexEvaluationMetricsAccesstypessupportedefficiently.E.g.,recordswithaspecifiedvalueintheattributeorrecordswithanattributevaluefallinginaspecifiedrangeofvalues.AccesstimeInsertiontimeDeletiontimeSpaceoverhead OrderedIndicesInanorderedindex,indexentriesarestoredsortedonthesearchkeyvalue.E.g.,authorcataloginlibrary.Primaryindex:inasequentiallyorderedfile,theindexwhosesearchkeyspecifiesthesequentialorderofthefile.AlsocalledclusteringindexThesearchkeyofaprimaryindexisusuallybutnotnecessarilytheprimarykey.Secondaryindex:anindexwhosesearchkeyspecifiesanorderdifferentfromthesequentialorderofthefile.Alsocallednon-clusteringindex.Index-sequentialfile:orderedsequentialfilewithaprimaryindex. DenseIndexFilesDenseindex—Indexrecordappearsforeverysearch-keyvalueinthefile.E.g.indexonIDattributeofinstructorrelation DenseIndexFiles(Cont.)Denseindexondept_name,withinstructorfilesortedondept_name SparseIndexFilesSparseIndex:containsindexrecordsforonlysomesearch-keyvalues.Applicablewhenrecordsaresequentiallyorderedonsearch-keyTolocatearecordwithsearch-keyvalueKwe:Findindexrecordwithlargestsearch-keyvalueij(morethanonepointertobucketj)allocateanewbucketz,andsetij=iz=(ij+1)Updatethesecondhalfofthebucketaddresstableentriesoriginallypointingtoj,topointtozremoveeachrecordinbucketjandreinsert(injorz)recomputenewbucketforKjandinsertrecordinthebucket(furthersplittingisrequiredifthebucketisstillfull)Ifi=ij(onlyonepointertobucketj)Ifireachessomelimitb,ortoomanysplitshavehappenedinthisinsertion,createanoverflowbucketElseincrementianddoublethesizeofthebucketaddresstable.replaceeachentryinthetablebytwoentriesthatpointtothesamebucket.recomputenewbucketaddresstableentryforKjNowi>ijsousethefirstcaseabove.Tosplitabucketjwheninsertingrecordwithsearch-keyvalueKj: DeletioninExtendableHashStructureTodeleteakeyvalue,locateitinitsbucketandremoveit.Thebucketitselfcanberemovedifitbecomesempty(withappropriateupdatestothebucketaddresstable).Coalescingofbucketscanbedone(cancoalesceonlywitha“buddy”buckethavingsamevalueofijandsameij–1prefix,ifitispresent)DecreasingbucketaddresstablesizeisalsopossibleNote:decreasingbucketaddresstablesizeisanexpensiveoperationandshouldbedoneonlyifnumberofbucketsbecomesmuchsmallerthanthesizeofthetable UseofExtendableHashStructure:Example Example(Cont.)InitialHashstructure;bucketsize=2 Example(Cont.)Hashstructureafterinsertionof“Mozart”,“Srinivasan”, and“Wu”records Example(Cont.)HashstructureafterinsertionofEinsteinrecord Example(Cont.)HashstructureafterinsertionofGoldandElSaidrecords Example(Cont.)HashstructureafterinsertionofKatzrecord Example(Cont.)Andafterinsertionofelevenrecords Example(Cont.)AndafterinsertionofKimrecordinprevioushashstructure ExtendableHashingvs.OtherSchemesBenefitsofextendablehashing:HashperformancedoesnotdegradewithgrowthoffileMinimalspaceoverheadDisadvantagesofextendablehashingExtralevelofindirectiontofinddesiredrecordBucketaddresstablemayitselfbecomeverybig(largerthanmemory)CannotallocateverylargecontiguousareasondiskeitherSolution:B+-treestructuretolocatedesiredrecordinbucketaddresstableChangingsizeofbucketaddresstableisanexpensiveoperationLinearhashingisanalternativemechanismAllowsincrementalgrowthofitsdirectory(equivalenttobucketaddresstable)Atthecostofmorebucketoverflows ComparisonofOrderedIndexingandHashingCostofperiodicre-organizationRelativefrequencyofinsertionsanddeletionsIsitdesirabletooptimizeaverageaccesstimeattheexpenseofworst-caseaccesstime?Expectedtypeofqueries:Hashingisgenerallybetteratretrievingrecordshavingaspecifiedvalueofthekey.Ifrangequeriesarecommon,orderedindicesaretobepreferredInpractice:PostgreSQLsupportshashindices,butdiscouragesuseduetopoorperformanceOraclesupportsstatichashorganization,butnothashindicesSQLServersupportsonlyB+-trees BitmapIndicesBitmapindicesareaspecialtypeofindexdesignedforefficientqueryingonmultiplekeysRecordsinarelationareassumedtobenumberedsequentiallyfrom,say,0GivenanumbernitmustbeeasytoretrieverecordnParticularlyeasyifrecordsareoffixedsizeApplicableonattributesthattakeonarelativelysmallnumberofdistinctvaluesE.g.gender,country,state,…E.g.income-level(incomebrokenupintoasmallnumberoflevelssuchas0-9999,10000-19999,20000-50000,50000-infinity)Abitmapissimplyanarrayofbits BitmapIndices(Cont.)InitssimplestformabitmapindexonanattributehasabitmapforeachvalueoftheattributeBitmaphasasmanybitsasrecordsInabitmapforvaluev,thebitforarecordis1iftherecordhasthevaluevfortheattribute,andis0otherwise BitmapIndices(Cont.)BitmapindicesareusefulforqueriesonmultipleattributesnotparticularlyusefulforsingleattributequeriesQueriesareansweredusingbitmapoperationsIntersection(and)Union(or)Complementation(not)EachoperationtakestwobitmapsofthesamesizeandappliestheoperationoncorrespondingbitstogettheresultbitmapE.g.100110AND110011=100010100110OR110011=110111 NOT100110=011001MaleswithincomelevelL1:10010AND10100=10000Canthenretrieverequiredtuples.Countingnumberofmatchingtuplesisevenfaster BitmapIndices(Cont.)BitmapindicesgenerallyverysmallcomparedwithrelationsizeE.g.ifrecordis100bytes,spaceforasinglebitmapis1/800ofspaceusedbyrelation.Ifnumberofdistinctattributevaluesis8,bitmapisonly1%ofrelationsizeDeletionneedstobehandledproperlyExistencebitmaptonoteifthereisavalidrecordatarecordlocationNeededforcomplementationnot(A=v):(NOTbitmap-A-v)ANDExistenceBitmapShouldkeepbitmapsforallvalues,evennullvalueTocorrectlyhandleSQLnullsemanticsforNOT(A=v):intersectaboveresultwith(NOTbitmap-A-Null) EfficientImplementationofBitmapOperationsBitmapsarepackedintowords;asinglewordand(abasicCPUinstruction)computesandof32or64bitsatonceE.g.1-million-bitmapscanbeand-edwithjust31,250instructionCountingnumberof1scanbedonefastbyatrick:Useeachbytetoindexintoaprecomputedarrayof256elementseachstoringthecountof1sinthebinaryrepresentationCanusepairsofbytestospeedupfurtheratahighermemorycostAdduptheretrievedcountsBitmapscanbeusedinsteadofTuple-IDlistsatleaflevelsof B+-trees,forvaluesthathavealargenumberofmatchingrecordsWorthwhileif>1/64oftherecordshavethatvalue,assumingatuple-idis64bitsAbovetechniquemergesbenefitsofbitmapandB+-treeindices IndexDefinitioninSQLCreateanindexcreateindexon ()E.g.:createindexdept_indexoninstructor(dept_name)Usecreateuniqueindextoindirectlyspecifyandenforcetheconditionthatthesearchkeyisacandidatekeyisacandidatekey.NotreallyrequiredifSQLuniqueintegrityconstraintissupportedTodropanindexdropindexMostdatabasesystemsallowspecificationoftypeofindex,andclustering. EndofChapter Figure11.01 Figure11.15 PartitionedHashingHashvaluesaresplitintosegmentsthatdependoneachattributeofthesearch-key.(A1,A2,...,An)fornattributesearch-keyExample:n=2,forcustomer,search-keybeing (customer-street,customer-city)search-keyvaluehashvalue(Main,Harrison)101111 (Main,Brooklyn)101001 (Park,PaloAlto)010010(Spring,Brooklyn)001001(Alma,PaloAlto)110010Toanswerequalityqueryonsingleattribute,needtolookupmultiplebuckets.Similarineffecttogridfiles. GridFilesStructureusedtospeedtheprocessingofgeneralmultiplesearch-keyqueriesinvolvingoneormorecomparisonoperators.Thegridfilehasasinglegridarrayandonelinearscaleforeachsearch-keyattribute.Thegridarrayhasnumberofdimensionsequaltonumberofsearch-keyattributes.MultiplecellsofgridarraycanpointtosamebucketTofindthebucketforasearch-keyvalue,locatetherowandcolumnofitscellusingthelinearscalesandfollowpointer ExampleGridFileforaccount QueriesonaGridFileAgridfileontwoattributesAandBcanhandlequeriesofallfollowingformswithreasonableefficiency(a1Aa2)(b1Bb2)(a1Aa2b1Bb2),.E.g.,toanswer(a1Aa2b1Bb2),uselinearscalestofindcorrespondingcandidategridarraycells,andlookupallthebucketspointedtofromthosecells. GridFiles(Cont.)Duringinsertion,ifabucketbecomesfull,newbucketcanbecreatedifmorethanonecellpointstoit.Ideasimilartoextendablehashing,butonmultipledimensionsIfonlyonecellpointstoit,eitheranoverflowbucketmustbecreatedorthegridsizemustbeincreasedLinearscalesmustbechosentouniformlydistributerecordsacrosscells.Otherwisetherewillbetoomanyoverflowbuckets.Periodicre-organizationtoincreasegridsizewillhelp.Butreorganizationcanbeveryexpensive.Spaceoverheadofgridarraycanbehigh.R-trees(Chapter23)areanalternative'