- 5.14 MB
- 2022-04-29 14:31:55 发布
- 1、本文档共5页,可阅读全部内容。
- 2、本文档内容版权归属内容提供方,所产生的收益全部归内容提供方所有。如果您对本文有版权争议,可选择认领,认领后既往收益都归您。
- 3、本文档由用户上传,本站不保证质量和数量令人满意,可能有诸多瑕疵,付费之前,请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形,可联系本站下载客服投诉处理。
- 文档侵权举报电话:19940600175。
'Chapter11:IndexingandHashing
Chapter11:IndexingandHashingBasicConceptsOrderedIndicesB+-TreeIndexFilesB-TreeIndexFilesStaticHashingDynamicHashingComparisonofOrderedIndexingandHashingIndexDefinitioninSQLMultiple-KeyAccess
BasicConceptsIndexingmechanismsusedtospeedupaccesstodesireddata.E.g.,authorcataloginlibrarySearchKey-attributetosetofattributesusedtolookuprecordsinafile.Anindexfileconsistsofrecords(calledindexentries)oftheformIndexfilesaretypicallymuchsmallerthantheoriginalfileTwobasickindsofindices:Orderedindices:searchkeysarestoredinsortedorderHashindices:searchkeysaredistributeduniformlyacross“buckets”usinga“hashfunction”.search-keypointer
IndexEvaluationMetricsAccesstypessupportedefficiently.E.g.,recordswithaspecifiedvalueintheattributeorrecordswithanattributevaluefallinginaspecifiedrangeofvalues.AccesstimeInsertiontimeDeletiontimeSpaceoverhead
OrderedIndicesInanorderedindex,indexentriesarestoredsortedonthesearchkeyvalue.E.g.,authorcataloginlibrary.Primaryindex:inasequentiallyorderedfile,theindexwhosesearchkeyspecifiesthesequentialorderofthefile.AlsocalledclusteringindexThesearchkeyofaprimaryindexisusuallybutnotnecessarilytheprimarykey.Secondaryindex:anindexwhosesearchkeyspecifiesanorderdifferentfromthesequentialorderofthefile.Alsocallednon-clusteringindex.Index-sequentialfile:orderedsequentialfilewithaprimaryindex.
DenseIndexFilesDenseindex—Indexrecordappearsforeverysearch-keyvalueinthefile.E.g.indexonIDattributeofinstructorrelation
DenseIndexFiles(Cont.)Denseindexondept_name,withinstructorfilesortedondept_name
SparseIndexFilesSparseIndex:containsindexrecordsforonlysomesearch-keyvalues.Applicablewhenrecordsaresequentiallyorderedonsearch-keyTolocatearecordwithsearch-keyvalueKwe:Findindexrecordwithlargestsearch-keyvalueij(morethanonepointertobucketj)allocateanewbucketz,andsetij=iz=(ij+1)Updatethesecondhalfofthebucketaddresstableentriesoriginallypointingtoj,topointtozremoveeachrecordinbucketjandreinsert(injorz)recomputenewbucketforKjandinsertrecordinthebucket(furthersplittingisrequiredifthebucketisstillfull)Ifi=ij(onlyonepointertobucketj)Ifireachessomelimitb,ortoomanysplitshavehappenedinthisinsertion,createanoverflowbucketElseincrementianddoublethesizeofthebucketaddresstable.replaceeachentryinthetablebytwoentriesthatpointtothesamebucket.recomputenewbucketaddresstableentryforKjNowi>ijsousethefirstcaseabove.Tosplitabucketjwheninsertingrecordwithsearch-keyvalueKj:
DeletioninExtendableHashStructureTodeleteakeyvalue,locateitinitsbucketandremoveit.Thebucketitselfcanberemovedifitbecomesempty(withappropriateupdatestothebucketaddresstable).Coalescingofbucketscanbedone(cancoalesceonlywitha“buddy”buckethavingsamevalueofijandsameij–1prefix,ifitispresent)DecreasingbucketaddresstablesizeisalsopossibleNote:decreasingbucketaddresstablesizeisanexpensiveoperationandshouldbedoneonlyifnumberofbucketsbecomesmuchsmallerthanthesizeofthetable
UseofExtendableHashStructure:Example
Example(Cont.)InitialHashstructure;bucketsize=2
Example(Cont.)Hashstructureafterinsertionof“Mozart”,“Srinivasan”,and“Wu”records
Example(Cont.)HashstructureafterinsertionofEinsteinrecord
Example(Cont.)HashstructureafterinsertionofGoldandElSaidrecords
Example(Cont.)HashstructureafterinsertionofKatzrecord
Example(Cont.)Andafterinsertionofelevenrecords
Example(Cont.)AndafterinsertionofKimrecordinprevioushashstructure
ExtendableHashingvs.OtherSchemesBenefitsofextendablehashing:HashperformancedoesnotdegradewithgrowthoffileMinimalspaceoverheadDisadvantagesofextendablehashingExtralevelofindirectiontofinddesiredrecordBucketaddresstablemayitselfbecomeverybig(largerthanmemory)CannotallocateverylargecontiguousareasondiskeitherSolution:B+-treestructuretolocatedesiredrecordinbucketaddresstableChangingsizeofbucketaddresstableisanexpensiveoperationLinearhashingisanalternativemechanismAllowsincrementalgrowthofitsdirectory(equivalenttobucketaddresstable)Atthecostofmorebucketoverflows
ComparisonofOrderedIndexingandHashingCostofperiodicre-organizationRelativefrequencyofinsertionsanddeletionsIsitdesirabletooptimizeaverageaccesstimeattheexpenseofworst-caseaccesstime?Expectedtypeofqueries:Hashingisgenerallybetteratretrievingrecordshavingaspecifiedvalueofthekey.Ifrangequeriesarecommon,orderedindicesaretobepreferredInpractice:PostgreSQLsupportshashindices,butdiscouragesuseduetopoorperformanceOraclesupportsstatichashorganization,butnothashindicesSQLServersupportsonlyB+-trees
BitmapIndicesBitmapindicesareaspecialtypeofindexdesignedforefficientqueryingonmultiplekeysRecordsinarelationareassumedtobenumberedsequentiallyfrom,say,0GivenanumbernitmustbeeasytoretrieverecordnParticularlyeasyifrecordsareoffixedsizeApplicableonattributesthattakeonarelativelysmallnumberofdistinctvaluesE.g.gender,country,state,…E.g.income-level(incomebrokenupintoasmallnumberoflevelssuchas0-9999,10000-19999,20000-50000,50000-infinity)Abitmapissimplyanarrayofbits
BitmapIndices(Cont.)InitssimplestformabitmapindexonanattributehasabitmapforeachvalueoftheattributeBitmaphasasmanybitsasrecordsInabitmapforvaluev,thebitforarecordis1iftherecordhasthevaluevfortheattribute,andis0otherwise
BitmapIndices(Cont.)BitmapindicesareusefulforqueriesonmultipleattributesnotparticularlyusefulforsingleattributequeriesQueriesareansweredusingbitmapoperationsIntersection(and)Union(or)Complementation(not)EachoperationtakestwobitmapsofthesamesizeandappliestheoperationoncorrespondingbitstogettheresultbitmapE.g.100110AND110011=100010100110OR110011=110111NOT100110=011001MaleswithincomelevelL1:10010AND10100=10000Canthenretrieverequiredtuples.Countingnumberofmatchingtuplesisevenfaster
BitmapIndices(Cont.)BitmapindicesgenerallyverysmallcomparedwithrelationsizeE.g.ifrecordis100bytes,spaceforasinglebitmapis1/800ofspaceusedbyrelation.Ifnumberofdistinctattributevaluesis8,bitmapisonly1%ofrelationsizeDeletionneedstobehandledproperlyExistencebitmaptonoteifthereisavalidrecordatarecordlocationNeededforcomplementationnot(A=v):(NOTbitmap-A-v)ANDExistenceBitmapShouldkeepbitmapsforallvalues,evennullvalueTocorrectlyhandleSQLnullsemanticsforNOT(A=v):intersectaboveresultwith(NOTbitmap-A-Null)
EfficientImplementationofBitmapOperationsBitmapsarepackedintowords;asinglewordand(abasicCPUinstruction)computesandof32or64bitsatonceE.g.1-million-bitmapscanbeand-edwithjust31,250instructionCountingnumberof1scanbedonefastbyatrick:Useeachbytetoindexintoaprecomputedarrayof256elementseachstoringthecountof1sinthebinaryrepresentationCanusepairsofbytestospeedupfurtheratahighermemorycostAdduptheretrievedcountsBitmapscanbeusedinsteadofTuple-IDlistsatleaflevelsofB+-trees,forvaluesthathavealargenumberofmatchingrecordsWorthwhileif>1/64oftherecordshavethatvalue,assumingatuple-idis64bitsAbovetechniquemergesbenefitsofbitmapandB+-treeindices
IndexDefinitioninSQLCreateanindexcreateindexon()E.g.:createindexdept_indexoninstructor(dept_name)Usecreateuniqueindextoindirectlyspecifyandenforcetheconditionthatthesearchkeyisacandidatekeyisacandidatekey.NotreallyrequiredifSQLuniqueintegrityconstraintissupportedTodropanindexdropindexMostdatabasesystemsallowspecificationoftypeofindex,andclustering.
EndofChapter
Figure11.01
Figure11.15
PartitionedHashingHashvaluesaresplitintosegmentsthatdependoneachattributeofthesearch-key.(A1,A2,...,An)fornattributesearch-keyExample:n=2,forcustomer,search-keybeing(customer-street,customer-city)search-keyvaluehashvalue(Main,Harrison)101111(Main,Brooklyn)101001(Park,PaloAlto)010010(Spring,Brooklyn)001001(Alma,PaloAlto)110010Toanswerequalityqueryonsingleattribute,needtolookupmultiplebuckets.Similarineffecttogridfiles.
GridFilesStructureusedtospeedtheprocessingofgeneralmultiplesearch-keyqueriesinvolvingoneormorecomparisonoperators.Thegridfilehasasinglegridarrayandonelinearscaleforeachsearch-keyattribute.Thegridarrayhasnumberofdimensionsequaltonumberofsearch-keyattributes.MultiplecellsofgridarraycanpointtosamebucketTofindthebucketforasearch-keyvalue,locatetherowandcolumnofitscellusingthelinearscalesandfollowpointer
ExampleGridFileforaccount
QueriesonaGridFileAgridfileontwoattributesAandBcanhandlequeriesofallfollowingformswithreasonableefficiency(a1Aa2)(b1Bb2)(a1Aa2b1Bb2),.E.g.,toanswer(a1Aa2b1Bb2),uselinearscalestofindcorrespondingcandidategridarraycells,andlookupallthebucketspointedtofromthosecells.
GridFiles(Cont.)Duringinsertion,ifabucketbecomesfull,newbucketcanbecreatedifmorethanonecellpointstoit.Ideasimilartoextendablehashing,butonmultipledimensionsIfonlyonecellpointstoit,eitheranoverflowbucketmustbecreatedorthegridsizemustbeincreasedLinearscalesmustbechosentouniformlydistributerecordsacrosscells.Otherwisetherewillbetoomanyoverflowbuckets.Periodicre-organizationtoincreasegridsizewillhelp.Butreorganizationcanbeveryexpensive.Spaceoverheadofgridarraycanbehigh.R-trees(Chapter23)areanalternative'
您可能关注的文档
- 数据库系统概念全套配套课件PPT ch13.ppt
- 数据库系统概念全套配套课件PPT ch5.ppt
- 数据库系统概念全套配套课件PPT ch3.ppt
- 数据库系统概念全套配套课件PPT ch1.ppt
- 数据库系统概念全套配套课件PPT ch25.ppt
- 数据库系统概念全套配套课件PPT ch24.ppt
- 数据库系统概念全套配套课件PPT ch17.ppt
- 数据库系统概念全套配套课件PPT ch20.ppt
- 数据库系统概念全套配套课件PPT ch19.ppt
- 数据库系统概念全套配套课件PPT ch14.ppt
- 数据库系统概念全套配套课件PPT ch10.ppt
- 数据库系统概念全套配套课件PPT ch8.ppt
- 数据库系统概念全套配套课件PPT appC.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 03土木工程材料.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 02土木工程材料.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 15土木工程材料.ppt
- 新编经济法高职财经大类专业基础课 配套教学课件PPT 第四章.ppt
- 新编经济法高职财经大类专业基础课 配套教学课件PPT 第三章.ppt