• 3.20 MB
  • 2022-04-29 14:31:58 发布

数据库系统概念全套配套课件PPT ch10.ppt

  • 60页
  • 当前文档由用户上传发布,收益归属用户
  1. 1、本文档共5页,可阅读全部内容。
  2. 2、本文档内容版权归属内容提供方,所产生的收益全部归内容提供方所有。如果您对本文有版权争议,可选择认领,认领后既往收益都归您。
  3. 3、本文档由用户上传,本站不保证质量和数量令人满意,可能有诸多瑕疵,付费之前,请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形,可联系本站下载客服投诉处理。
  4. 文档侵权举报电话:19940600175。
'Chapter10:StorageandFileStructure Chapter10:StorageandFileStructureOverviewofPhysicalStorageMediaMagneticDisksRAIDTertiaryStorageStorageAccessFileOrganizationOrganizationofRecordsinFilesData-DictionaryStorage ClassificationofPhysicalStorageMediaSpeedwithwhichdatacanbeaccessedCostperunitofdataReliabilitydatalossonpowerfailureorsystemcrashphysicalfailureofthestoragedeviceCandifferentiatestorageinto:volatilestorage:losescontentswhenpowerisswitchedoffnon-volatilestorage:Contentspersistevenwhenpowerisswitchedoff.Includessecondaryandtertiarystorage,aswellasbatter-backedupmain-memory. PhysicalStorageMediaCache–fastestandmostcostlyformofstorage;volatile;managedbythecomputersystemhardware.Mainmemory:fastaccess(10sto100sofnanoseconds;1nanosecond=10–9seconds)generallytoosmall(ortooexpensive)tostoretheentiredatabasecapacitiesofuptoafewGigabyteswidelyusedcurrentlyCapacitieshavegoneupandper-bytecostshavedecreasedsteadilyandrapidly(roughlyfactorof2every2to3years)Volatile—contentsofmainmemoryareusuallylostifapowerfailureorsystemcrashoccurs. PhysicalStorageMedia(Cont.)FlashmemoryDatasurvivespowerfailureDatacanbewrittenatalocationonlyonce,butlocationcanbeerasedandwrittentoagainCansupportonlyalimitednumber(10K–1M)ofwrite/erasecycles.ErasingofmemoryhastobedonetoanentirebankofmemoryReadsareroughlyasfastasmainmemoryButwritesareslow(fewmicroseconds),eraseisslowerWidelyusedinembeddeddevicessuchasdigitalcameras,phones,andUSBkeys PhysicalStorageMedia(Cont.)Magnetic-diskDataisstoredonspinningdisk,andread/writtenmagneticallyPrimarymediumforthelong-termstorageofdata;typicallystoresentiredatabase.Datamustbemovedfromdisktomainmemoryforaccess,andwrittenbackforstorageMuchsloweraccessthanmainmemory(moreonthislater)direct-access–possibletoreaddataondiskinanyorder,unlikemagnetictapeCapacitiesrangeuptoroughly1.5TBasof2009Muchlargercapacityandcost/bytethanmainmemory/flashmemoryGrowingconstantlyandrapidlywithtechnologyimprovements(factorof2to3every2years)Survivespowerfailuresandsystemcrashesdiskfailurecandestroydata,butisrare PhysicalStorageMedia(Cont.)Opticalstoragenon-volatile,dataisreadopticallyfromaspinningdiskusingalaserCD-ROM(640MB)andDVD(4.7to17GB)mostpopularformsBlu-raydisks:27GBto54GBWrite-one,read-many(WORM)opticaldisksusedforarchivalstorage(CD-R,DVD-R,DVD+R)Multiplewriteversionsalsoavailable(CD-RW,DVD-RW,DVD+RW,andDVD-RAM)ReadsandwritesareslowerthanwithmagneticdiskJuke-boxsystems,withlargenumbersofremovabledisks,afewdrives,andamechanismforautomaticloading/unloadingofdisksavailableforstoringlargevolumesofdata PhysicalStorageMedia(Cont.)Tapestoragenon-volatile,usedprimarilyforbackup(torecoverfromdiskfailure),andforarchivaldatasequential-access–muchslowerthandiskveryhighcapacity(40to300GBtapesavailable)tapecanberemovedfromdrivestoragecostsmuchcheaperthandisk,butdrivesareexpensiveTapejukeboxesavailableforstoringmassiveamountsofdatahundredsofterabytes(1terabyte=109bytes)toevenmultiplepetabytes(1petabyte=1012bytes) StorageHierarchy StorageHierarchy(Cont.)primarystorage:Fastestmediabutvolatile(cache,mainmemory).secondarystorage:nextlevelinhierarchy,non-volatile,moderatelyfastaccesstimealsocalledon-linestorageE.g.flashmemory,magneticdiskstertiarystorage:lowestlevelinhierarchy,non-volatile,slowaccesstimealsocalledoff-linestorageE.g.magnetictape,opticalstorage MagneticHardDiskMechanismNOTE:Diagramisschematic,andsimplifiesthestructureofactualdiskdrives MagneticDisksRead-writeheadPositionedveryclosetotheplattersurface(almosttouchingit)Readsorwritesmagneticallyencodedinformation.SurfaceofplatterdividedintocirculartracksOver50K-100KtracksperplatterontypicalharddisksEachtrackisdividedintosectors.Asectoristhesmallestunitofdatathatcanbereadorwritten.Sectorsizetypically512bytesTypicalsectorspertrack:500to1000(oninnertracks)to1000to2000(onoutertracks)Toread/writeasectordiskarmswingstopositionheadonrighttrackplatterspinscontinually;dataisread/writtenassectorpassesunderheadHead-diskassembliesmultiplediskplattersonasinglespindle(1to5usually)oneheadperplatter,mountedonacommonarm.Cylindericonsistsofithtrackofalltheplatters MagneticDisks(Cont.)Earliergenerationdisksweresusceptibletohead-crashesSurfaceofearliergenerationdiskshadmetal-oxidecoatingswhichwoulddisintegrateonheadcrashanddamagealldataondiskCurrentgenerationdisksarelesssusceptibletosuchdisastrousfailures,althoughindividualsectorsmaygetcorruptedDiskcontroller–interfacesbetweenthecomputersystemandthediskdrivehardware.acceptshigh-levelcommandstoreadorwriteasectorinitiatesactionssuchasmovingthediskarmtotherighttrackandactuallyreadingorwritingthedataComputesandattacheschecksumstoeachsectortoverifythatdataisreadbackcorrectlyIfdataiscorrupted,withveryhighprobabilitystoredchecksumwon’tmatchrecomputedchecksumEnsuressuccessfulwritingbyreadingbacksectorafterwritingitPerformsremappingofbadsectors DiskSubsystemMultipledisksconnectedtoacomputersystemthroughacontrollerControllersfunctionality(checksum,badsectorremapping)oftencarriedoutbyindividualdisks;reducesloadoncontrollerDiskinterfacestandardsfamiliesATA(ATadaptor)rangeofstandardsSATA(SerialATA)SCSI(SmallComputerSystemInterconnect)rangeofstandardsSAS(SerialAttachedSCSI)Severalvariantsofeachstandard(differentspeedsandcapabilities) DiskSubsystemDisksusuallyconnecteddirectlytocomputersystemInStorageAreaNetworks(SAN),alargenumberofdisksareconnectedbyahigh-speednetworktoanumberofserversInNetworkAttachedStorage(NAS)networkedstorageprovidesafilesysteminterfaceusingnetworkedfilesystemprotocol,insteadofprovidingadisksysteminterface PerformanceMeasuresofDisksAccesstime–thetimeittakesfromwhenareadorwriterequestisissuedtowhendatatransferbegins.Consistsof:Seektime–timeittakestorepositionthearmoverthecorrecttrack.Averageseektimeis1/2theworstcaseseektime.Wouldbe1/3ifalltrackshadthesamenumberofsectors,andweignorethetimetostartandstoparmmovement4to10millisecondsontypicaldisksRotationallatency–timeittakesforthesectortobeaccessedtoappearunderthehead.Averagelatencyis1/2oftheworstcaselatency.4to11millisecondsontypicaldisks(5400to15000r.p.m.)Data-transferrate–therateatwhichdatacanberetrievedfromorstoredtothedisk.25to100MBpersecondmaxrate,lowerforinnertracksMultipledisksmayshareacontroller,soratethatcontrollercanhandleisalsoimportantE.g.SATA:150MB/sec,SATA-II3Gb(300MB/sec)Ultra320SCSI:320MB/s,SAS(3to6Gb/sec)FiberChannel(FC2Gbor4Gb):256to512MB/s PerformanceMeasures(Cont.)Meantimetofailure(MTTF)–theaveragetimethediskisexpectedtoruncontinuouslywithoutanyfailure.Typically3to5yearsProbabilityoffailureofnewdisksisquitelow,correspondingtoa “theoreticalMTTF”of500,000to1,200,000hoursforanewdiskE.g.,anMTTFof1,200,000hoursforanewdiskmeansthatgiven1000relativelynewdisks,onanaverageonewillfailevery1200hoursMTTFdecreasesasdiskages OptimizationofDisk-BlockAccessBlock–acontiguoussequenceofsectorsfromasingletrackdataistransferredbetweendiskandmainmemoryinblockssizesrangefrom512bytestoseveralkilobytesSmallerblocks:moretransfersfromdiskLargerblocks:morespacewastedduetopartiallyfilledblocksTypicalblocksizestodayrangefrom4to16kilobytesDisk-arm-schedulingalgorithmsorderpendingaccessestotrackssothatdiskarmmovementisminimizedelevatoralgorithm:R1R5R2R4R3R6InnertrackOutertrack OptimizationofDiskBlockAccess(Cont.)Fileorganization–optimizeblockaccesstimebyorganizingtheblockstocorrespondtohowdatawillbeaccessedE.g.Storerelatedinformationonthesameornearbycylinders.FilesmaygetfragmentedovertimeE.g.ifdataisinsertedto/deletedfromthefileOrfreeblocksondiskarescattered,andnewlycreatedfilehasitsblocksscatteredoverthediskSequentialaccesstoafragmentedfileresultsinincreaseddiskarmmovementSomesystemshaveutilitiestodefragmentthefilesystem,inordertospeedupfileaccess Nonvolatilewritebuffersspeedupdiskwritesbywritingblockstoanon-volatileRAMbufferimmediatelyNon-volatileRAM:batterybackedupRAMorflashmemoryEvenifpowerfails,thedataissafeandwillbewrittentodiskwhenpowerreturnsControllerthenwritestodiskwheneverthediskhasnootherrequestsorrequesthasbeenpendingforsometimeDatabaseoperationsthatrequiredatatobesafelystoredbeforecontinuingcancontinuewithoutwaitingfordatatobewrittentodiskWritescanbereorderedtominimizediskarmmovementLogdisk–adiskdevotedtowritingasequentiallogofblockupdatesUsedexactlylikenonvolatileRAMWritetologdiskisveryfastsincenoseeksarerequiredNoneedforspecialhardware(NV-RAM)FilesystemstypicallyreorderwritestodisktoimproveperformanceJournalingfilesystemswritedatainsafeordertoNV-RAMorlogdiskReorderingwithoutjournaling:riskofcorruptionoffilesystemdataOptimizationofDiskBlockAccess(Cont.) FlashStorageNORflashvsNANDflashNANDflashusedwidelyforstorage,sinceitismuchcheaperthanNORflashrequirespage-at-a-timeread(page:512bytesto4KB)transferratearound20MB/secsolidstatedisks:usemultipleflashstoragedevicestoprovidehighertransferrateof100to200MB/seceraseisveryslow(1to2millisecs)eraseblockcontainsmultiplepagesremappingoflogicalpageaddressestophysicalpageaddressesavoidswaitingforerasetranslationtabletracksmappingalsostoredinalabelfieldofflashpageremappingcarriedoutbyflashtranslationlayerafter100,000to1,000,000erases,eraseblockbecomesunreliableandcannotbeusedwearleveling RAIDRAID:RedundantArraysofIndependentDisksdiskorganizationtechniquesthatmanagealargenumbersofdisks,providingaviewofasinglediskofhighcapacityandhighspeedbyusingmultipledisksinparallel,highreliabilitybystoringdataredundantly,sothatdatacanberecoveredevenifadiskfailsThechancethatsomediskoutofasetofNdiskswillfailismuchhigherthanthechancethataspecificsinglediskwillfail.E.g.,asystemwith100disks,eachwithMTTFof100,000hours(approx.11years),willhaveasystemMTTFof1000hours(approx.41days)TechniquesforusingredundancytoavoiddatalossarecriticalwithlargenumbersofdisksOriginallyacost-effectivealternativetolarge,expensivedisksIinRAIDoriginallystoodfor``inexpensive’’TodayRAIDsareusedfortheirhigherreliabilityandbandwidth.The“I”isinterpretedasindependent ImprovementofReliabilityviaRedundancyRedundancy–storeextrainformationthatcanbeusedtorebuildinformationlostinadiskfailureE.g.,Mirroring(orshadowing)Duplicateeverydisk.Logicaldiskconsistsoftwophysicaldisks.EverywriteiscarriedoutonbothdisksReadscantakeplacefromeitherdiskIfonediskinapairfails,datastillavailableintheotherDatalosswouldoccuronlyifadiskfails,anditsmirrordiskalsofailsbeforethesystemisrepairedProbabilityofcombinedeventisverysmallExceptfordependentfailuremodessuchasfireorbuildingcollapseorelectricalpowersurgesMeantimetodatalossdependsonmeantimetofailure, andmeantimetorepairE.g.MTTFof100,000hours,meantimetorepairof10hoursgivesmeantimetodatalossof500*106hours(or57,000years)foramirroredpairofdisks(ignoringdependentfailuremodes) ImprovementinPerformanceviaParallelismTwomaingoalsofparallelisminadisksystem:1.Loadbalancemultiplesmallaccessestoincreasethroughput2.Parallelizelargeaccessestoreduceresponsetime.Improvetransferratebystripingdataacrossmultipledisks.Bit-levelstriping–splitthebitsofeachbyteacrossmultipledisksInanarrayofeightdisks,writebitiofeachbytetodiski.Eachaccesscanreaddataateighttimestherateofasingledisk.Butseek/accesstimeworsethanforasinglediskBitlevelstripingisnotusedmuchanymoreBlock-levelstriping–withndisks,blockiofafilegoestodisk(imodn)+1RequestsfordifferentblockscanruninparalleliftheblocksresideondifferentdisksArequestforalongsequenceofblockscanutilizealldisksinparallel RAIDLevelsSchemestoprovideredundancyatlowercostbyusingdiskstripingcombinedwithparitybitsDifferentRAIDorganizations,orRAIDlevels,havedifferingcost,performanceandreliabilitycharacteristicsRAIDLevel1:MirroreddiskswithblockstripingOffersbestwriteperformance.Popularforapplicationssuchasstoringlogfilesinadatabasesystem.RAIDLevel0:Blockstriping;non-redundant.Usedinhigh-performanceapplicationswheredatalossisnotcritical. RAIDLevels(Cont.)RAIDLevel2:Memory-StyleError-Correcting-Codes(ECC)withbitstriping.RAIDLevel3:Bit-InterleavedParityasingleparitybitisenoughforerrorcorrection,notjustdetection,sinceweknowwhichdiskhasfailedWhenwritingdata,correspondingparitybitsmustalsobecomputedandwrittentoaparitybitdiskTorecoverdatainadamageddisk,computeXORofbitsfromotherdisks(includingparitybitdisk) RAIDLevels(Cont.)RAIDLevel3(Cont.)Fasterdatatransferthanwithasingledisk,butfewerI/OspersecondsinceeverydiskhastoparticipateineveryI/O.SubsumesLevel2(providesallitsbenefits,atlowercost).RAIDLevel4:Block-InterleavedParity;usesblock-levelstriping,andkeepsaparityblockonaseparatediskforcorrespondingblocksfromNotherdisks.Whenwritingdatablock,correspondingblockofparitybitsmustalsobecomputedandwrittentoparitydiskTofindvalueofadamagedblock,computeXORofbitsfromcorrespondingblocks(includingparityblock)fromotherdisks. RAIDLevels(Cont.)RAIDLevel4(Cont.)ProvideshigherI/OratesforindependentblockreadsthanLevel3blockreadgoestoasingledisk,soblocksstoredondifferentdiskscanbereadinparallelProvideshightransferratesforreadsofmultipleblocksthanno-stripingBeforewritingablock,paritydatamustbecomputedCanbedonebyusingoldparityblock,oldvalueofcurrentblockandnewvalueofcurrentblock(2blockreads+2blockwrites)OrbyrecomputingtheparityvalueusingthenewvaluesofblockscorrespondingtotheparityblockMoreefficientforwritinglargeamountsofdatasequentiallyParityblockbecomesabottleneckforindependentblockwritessinceeveryblockwritealsowritestoparitydisk RAIDLevels(Cont.)RAIDLevel5:Block-InterleavedDistributedParity;partitionsdataandparityamongallN+1disks,ratherthanstoringdatainNdisksandparityin1disk.E.g.,with5disks,parityblockfornthsetofblocksisstoredondisk(nmod5)+1,withthedatablocksstoredontheother4disks. RAIDLevels(Cont.)RAIDLevel5(Cont.)HigherI/OratesthanLevel4.Blockwritesoccurinparalleliftheblocksandtheirparityblocksareondifferentdisks.SubsumesLevel4:providessamebenefits,butavoidsbottleneckofparitydisk.RAIDLevel6:P+QRedundancyscheme;similartoLevel5,butstoresextraredundantinformationtoguardagainstmultiplediskfailures.BetterreliabilitythanLevel5atahighercost;notusedaswidely. ChoiceofRAIDLevelFactorsinchoosingRAIDlevelMonetarycostPerformance:NumberofI/Ooperationspersecond,andbandwidthduringnormaloperationPerformanceduringfailurePerformanceduringrebuildoffaileddiskIncludingtimetakentorebuildfaileddiskRAID0isusedonlywhendatasafetyisnotimportantE.g.datacanberecoveredquicklyfromothersourcesLevel2and4neverusedsincetheyaresubsumedby3and5Level3isnotusedanymoresincebit-stripingforcessingleblockreadstoaccessalldisks,wastingdiskarmmovement,whichblockstriping(level5)avoidsLevel6israrelyusedsincelevels1and5offeradequatesafetyformostapplications ChoiceofRAIDLevel(Cont.)Level1providesmuchbetterwriteperformancethanlevel5Level5requiresatleast2blockreadsand2blockwritestowriteasingleblock,whereasLevel1onlyrequires2blockwritesLevel1preferredforhighupdateenvironmentssuchaslogdisksLevel1hadhigherstoragecostthanlevel5diskdrivecapacitiesincreasingrapidly(50%/year)whereasdiskaccesstimeshavedecreasedmuchless(x3in10years)I/Orequirementshaveincreasedgreatly,e.g.forWebserversWhenenoughdiskshavebeenboughttosatisfyrequiredrateofI/O,theyoftenhavesparestoragecapacitysothereisoftennoextramonetarycostforLevel1!Level5ispreferredforapplicationswithlowupdaterate, andlargeamountsofdataLevel1ispreferredforallotherapplications HardwareIssuesSoftwareRAID:RAIDimplementationsdoneentirelyinsoftware,withnospecialhardwaresupportHardwareRAID:RAIDimplementationswithspecialhardwareUsenon-volatileRAMtorecordwritesthatarebeingexecutedBeware:powerfailureduringwritecanresultincorrupteddiskE.g.failureafterwritingoneblockbutbeforewritingthesecondinamirroredsystemSuchcorrupteddatamustbedetectedwhenpowerisrestoredRecoveryfromcorruptionissimilartorecoveryfromfaileddiskNV-RAMhelpstoefficientlydetectedpotentiallycorruptedblocksOtherwiseallblocksofdiskmustbereadandcomparedwithmirror/parityblock HardwareIssues(Cont.)Latentfailures:datasuccessfullywrittenearliergetsdamagedcanresultindatalossevenifonlyonediskfailsDatascrubbing:continuallyscanforlatentfailures,andrecoverfromcopy/parityHotswapping:replacementofdiskwhilesystemisrunning,withoutpowerdownSupportedbysomehardwareRAIDsystems,reducestimetorecovery,andimprovesavailabilitygreatlyManysystemsmaintainsparediskswhicharekeptonline,andusedasreplacementsforfaileddisksimmediatelyondetectionoffailureReducestimetorecoverygreatlyManyhardwareRAIDsystemsensurethatasinglepointoffailurewillnotstopthefunctioningofthesystembyusingRedundantpowersupplieswithbatterybackupMultiplecontrollersandmultipleinterconnectionstoguardagainstcontroller/interconnectionfailures OpticalDisksCompactdisk-readonlymemory(CD-ROM)Removabledisks,640MBperdiskSeektimeabout100msec(opticalreadheadisheavierandslower)Higherlatency(3000RPM)andlowerdata-transferrates(3-6MB/s)comparedtomagneticdisksDigitalVideoDisk(DVD)DVD-5holds4.7GB,andDVD-9holds8.5GBDVD-10andDVD-18aredoublesidedformatswithcapacitiesof9.4GBand17GBBlu-rayDVD:27GB(54GBfordoublesideddisk)Slowseektime,forsamereasonsasCD-ROMRecordonceversions(CD-RandDVD-R)arepopulardatacanonlybewrittenonce,andcannotbeerased.highcapacityandlonglifetime;usedforarchivalstorageMulti-writeversions(CD-RW,DVD-RW,DVD+RWandDVD-RAM)alsoavailable MagneticTapesHoldlargevolumesofdataandprovidehightransferratesFewGBforDAT(DigitalAudioTape)format,10-40GBwithDLT(DigitalLinearTape)format,100GB+withUltriumformat,and330GBwithAmpexhelicalscanformatTransferratesfromfewto10sofMB/sTapesarecheap,butcostofdrivesisveryhighVeryslowaccesstimeincomparisontomagneticandopticaldiskslimitedtosequentialaccess.Someformats(Accelis)providefasterseek(10sofseconds)atcostoflowercapacityUsedmainlyforbackup,forstorageofinfrequentlyusedinformation,andasanoff-linemediumfortransferringinformationfromonesystemtoanother.TapejukeboxesusedforverylargecapacitystorageMultiplepetabyes(1015bytes) FileOrganization,RecordOrganizationandStorageAccess FileOrganizationThedatabaseisstoredasacollectionoffiles.Eachfileisasequenceofrecords.Arecordisasequenceoffields.Oneapproach:assumerecordsizeisfixedeachfilehasrecordsofoneparticulartypeonlydifferentfilesareusedfordifferentrelationsThiscaseiseasiesttoimplement;willconsidervariablelengthrecordslater. Fixed-LengthRecordsSimpleapproach:Storerecordistartingfrombyten(i–1),wherenisthesizeofeachrecord.RecordaccessissimplebutrecordsmaycrossblocksModification:donotallowrecordstocrossblockboundariesDeletionofrecordi:alternatives:moverecordsi+1,...,ntoi,...,n–1moverecordntoidonotmoverecords,but linkallfreerecordsonafreelist Deletingrecord3andcompacting Deletingrecord3andmovinglastrecord FreeListsStoretheaddressofthefirstdeletedrecordinthefileheader.Usethisfirstrecordtostoretheaddressoftheseconddeletedrecord,andsoonCanthinkofthesestoredaddressesaspointerssincethey“point”tothelocationofarecord.Morespaceefficientrepresentation:reusespacefornormalattributesoffreerecordstostorepointers.(Nopointersstoredinin-userecords.) Variable-LengthRecordsVariable-lengthrecordsariseindatabasesystemsinseveralways:Storageofmultiplerecordtypesinafile.Recordtypesthatallowvariablelengthsforoneormorefieldssuchasstrings(varchar)Recordtypesthatallowrepeatingfields(usedinsomeolderdatamodels).AttributesarestoredinorderVariablelengthattributesrepresentedbyfixedsize(offset,length),withactualdatastoredafterallfixedlengthattributesNullvaluesrepresentedbynull-valuebitmap Variable-LengthRecords:SlottedPageStructureSlottedpageheadercontains:numberofrecordentriesendoffreespaceintheblocklocationandsizeofeachrecordRecordscanbemovedaroundwithinapagetokeepthemcontiguouswithnoemptyspacebetweenthem;entryintheheadermustbeupdated.Pointersshouldnotpointdirectlytorecord—insteadtheyshouldpointtotheentryfortherecordinheader. OrganizationofRecordsinFilesHeap–arecordcanbeplacedanywhereinthefilewherethereisspaceSequential–storerecordsinsequentialorder,basedonthevalueofthesearchkeyofeachrecordHashing–ahashfunctioncomputedonsomeattributeofeachrecord;theresultspecifiesinwhichblockofthefiletherecordshouldbeplacedRecordsofeachrelationmaybestoredinaseparatefile.InamultitableclusteringfileorganizationrecordsofseveraldifferentrelationscanbestoredinthesamefileMotivation:storerelatedrecordsonthesameblocktominimizeI/O SequentialFileOrganizationSuitableforapplicationsthatrequiresequentialprocessingoftheentirefileTherecordsinthefileareorderedbyasearch-key SequentialFileOrganization(Cont.)Deletion–usepointerchainsInsertion–locatethepositionwheretherecordistobeinsertedifthereisfreespaceinsertthereifnofreespace,inserttherecordinanoverflowblockIneithercase,pointerchainmustbeupdatedNeedtoreorganizethefile fromtimetotimetorestore sequentialorder MultitableClusteringFileOrganizationStoreseveralrelationsinonefileusingamultitableclusteringfileorganizationdepartmentinstructormultitableclusteringofdepartmentandinstructor MultitableClusteringFileOrganization(cont.)goodforqueriesinvolvingdepartmentinstructor,andforqueriesinvolvingonesingledepartmentanditsinstructorsbadforqueriesinvolvingonlydepartmentresultsinvariablesizerecordsCanaddpointerchainstolinkrecordsofaparticularrelation DataDictionaryStorageInformationaboutrelationsnamesofrelationsnames,typesandlengthsofattributesofeachrelationnamesanddefinitionsofviewsintegrityconstraintsUserandaccountinginformation,includingpasswordsStatisticalanddescriptivedatanumberoftuplesineachrelationPhysicalfileorganizationinformationHowrelationisstored(sequential/hash/…)PhysicallocationofrelationInformationaboutindices(Chapter11)TheDatadictionary(alsocalledsystemcatalog)storesmetadata;thatis,dataaboutdata,suchas RelationalRepresentationofSystemMetadataRelationalrepresentationondiskSpecializeddatastructuresdesignedforefficientaccess,inmemory StorageAccessAdatabasefileispartitionedintofixed-lengthstorageunitscalledblocks.Blocksareunitsofbothstorageallocationanddatatransfer.Databasesystemseekstominimizethenumberofblocktransfersbetweenthediskandmemory.Wecanreducethenumberofdiskaccessesbykeepingasmanyblocksaspossibleinmainmemory.Buffer–portionofmainmemoryavailabletostorecopiesofdiskblocks.Buffermanager–subsystemresponsibleforallocatingbufferspaceinmainmemory. BufferManagerProgramscallonthebuffermanagerwhentheyneedablockfromdisk.Iftheblockisalreadyinthebuffer,buffermanagerreturnstheaddressoftheblockinmainmemoryIftheblockisnotinthebuffer,thebuffermanagerAllocatesspaceinthebufferfortheblockReplacing(throwingout)someotherblock,ifrequired,tomakespaceforthenewblock.Replacedblockwrittenbacktodiskonlyifitwasmodifiedsincethemostrecenttimethatitwaswrittento/fetchedfromthedisk.Readstheblockfromthedisktothebuffer,andreturnstheaddressoftheblockinmainmemorytorequester. Buffer-ReplacementPoliciesMostoperatingsystemsreplacetheblockleastrecentlyused(LRUstrategy)IdeabehindLRU–usepastpatternofblockreferencesasapredictoroffuturereferencesQuerieshavewell-definedaccesspatterns(suchassequentialscans),andadatabasesystemcanusetheinformationinauser’squerytopredictfuturereferencesLRUcanbeabadstrategyforcertainaccesspatternsinvolvingrepeatedscansofdataForexample:whencomputingthejoinof2relationsrandsbyanestedloops foreachtupletrofrdo foreachtupletsofsdo ifthetuplestrandtsmatch…Mixedstrategywithhintsonreplacementstrategyprovided bythequeryoptimizerispreferable Buffer-ReplacementPolicies(Cont.)Pinnedblock–memoryblockthatisnotallowedtobewrittenbacktodisk.Toss-immediatestrategy–freesthespaceoccupiedbyablockassoonasthefinaltupleofthatblockhasbeenprocessedMostrecentlyused(MRU)strategy–systemmustpintheblockcurrentlybeingprocessed.Afterthefinaltupleofthatblockhasbeenprocessed,theblockisunpinned,anditbecomesthemostrecentlyusedblock.BuffermanagercanusestatisticalinformationregardingtheprobabilitythatarequestwillreferenceaparticularrelationE.g.,thedatadictionaryisfrequentlyaccessed.Heuristic:keepdata-dictionaryblocksinmainmemorybufferBuffermanagersalsosupportforcedoutputofblocksforthepurposeofrecovery(moreinChapter16) EndofChapter10 Figure10.03 Figure10.18 Figurein-10.1'