- 3.42 MB
- 2022-04-29 14:33:35 发布
- 1、本文档共5页,可阅读全部内容。
- 2、本文档内容版权归属内容提供方,所产生的收益全部归内容提供方所有。如果您对本文有版权争议,可选择认领,认领后既往收益都归您。
- 3、本文档由用户上传,本站不保证质量和数量令人满意,可能有诸多瑕疵,付费之前,请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形,可联系本站下载客服投诉处理。
- 文档侵权举报电话:19940600175。
'Chapter16:RecoverySystem
Chapter16:RecoverySystemFailureClassificationStorageStructureRecoveryandAtomicityLog-BasedRecoveryRemoteBackupSystems
FailureClassificationTransactionfailure:Logicalerrors:transactioncannotcompleteduetosomeinternalerrorconditionSystemerrors:thedatabasesystemmustterminateanactivetransactionduetoanerrorcondition(e.g.,deadlock)Systemcrash:apowerfailureorotherhardwareorsoftwarefailurecausesthesystemtocrash.Fail-stopassumption:non-volatilestoragecontentsareassumedtonotbecorruptedbysystemcrashDatabasesystemshavenumerousintegritycheckstopreventcorruptionofdiskdataDiskfailure:aheadcrashorsimilardiskfailuredestroysallorpartofdiskstorageDestructionisassumedtobedetectable:diskdrivesusechecksumstodetectfailures
RecoveryAlgorithmsConsidertransactionTithattransfers$50fromaccountAtoaccountBTwoupdates:subtract50fromAandadd50toBTransactionTirequiresupdatestoAandBtobeoutputtothedatabase.Afailuremayoccurafteroneofthesemodificationshavebeenmadebutbeforebothofthemaremade.ModifyingthedatabasewithoutensuringthatthetransactionwillcommitmayleavethedatabaseinaninconsistentstateNotmodifyingthedatabasemayresultinlostupdatesiffailureoccursjustaftertransactioncommitsRecoveryalgorithmshavetwopartsActionstakenduringnormaltransactionprocessingtoensureenoughinformationexiststorecoverfromfailuresActionstakenafterafailuretorecoverthedatabasecontentstoastatethatensuresatomicity,consistencyanddurability
StorageStructureVolatilestorage:doesnotsurvivesystemcrashesexamples:mainmemory,cachememoryNonvolatilestorage:survivessystemcrashesexamples:disk,tape,flashmemory,non-volatile(batterybackedup)RAMbutmaystillfail,losingdataStablestorage:amythicalformofstoragethatsurvivesallfailuresapproximatedbymaintainingmultiplecopiesondistinctnonvolatilemediaSeebookformoredetailsonhowtoimplementstablestorage
Stable-StorageImplementationMaintainmultiplecopiesofeachblockonseparatediskscopiescanbeatremotesitestoprotectagainstdisasterssuchasfireorflooding.Failureduringdatatransfercanstillresultininconsistentcopies:BlocktransfercanresultinSuccessfulcompletionPartialfailure:destinationblockhasincorrectinformationTotalfailure:destinationblockwasneverupdatedProtectingstoragemediafromfailureduringdatatransfer(onesolution):Executeoutputoperationasfollows(assumingtwocopiesofeachblock):Writetheinformationontothefirstphysicalblock.Whenthefirstwritesuccessfullycompletes,writethesameinformationontothesecondphysicalblock.Theoutputiscompletedonlyafterthesecondwritesuccessfullycompletes.
Stable-StorageImplementation(Cont.)Protectingstoragemediafromfailureduringdatatransfer(cont.):Copiesofablockmaydifferduetofailureduringoutputoperation.Torecoverfromfailure:Firstfindinconsistentblocks:Expensivesolution:Comparethetwocopiesofeverydiskblock.Bettersolution:Recordin-progressdiskwritesonnon-volatilestorage(Non-volatileRAMorspecialareaofdisk).Usethisinformationduringrecoverytofindblocksthatmaybeinconsistent,andonlycomparecopiesofthese.UsedinhardwareRAIDsystemsIfeithercopyofaninconsistentblockisdetectedtohaveanerror(badchecksum),overwriteitbytheothercopy.Ifbothhavenoerror,butaredifferent,overwritethesecondblockbythefirstblock.
DataAccessPhysicalblocksarethoseblocksresidingonthedisk.Bufferblocksaretheblocksresidingtemporarilyinmainmemory.Blockmovementsbetweendiskandmainmemoryareinitiatedthroughthefollowingtwooperations:input(B)transfersthephysicalblockBtomainmemory.output(B)transfersthebufferblockBtothedisk,andreplacestheappropriatephysicalblockthere.Weassume,forsimplicity,thateachdataitemfitsin,andisstoredinside,asingleblock.
ExampleofDataAccessXYABx1y1bufferBufferBlockABufferBlockBinput(A)output(B)read(X)write(Y)diskworkareaofT1workareaofT2memoryx2
DataAccess(Cont.)EachtransactionTihasitsprivatework-areainwhichlocalcopiesofalldataitemsaccessedandupdatedbyitarekept.Ti"slocalcopyofadataitemXiscalledxi.Transferringdataitemsbetweensystembufferblocksanditsprivatework-areadoneby:read(X)assignsthevalueofdataitemXtothelocalvariablexi.write(X)assignsthevalueoflocalvariablexitodataitem{X}inthebufferblock.Note:output(BX)neednotimmediatelyfollowwrite(X).Systemcanperformtheoutputoperationwhenitdeemsfit.TransactionsMustperformread(X)beforeaccessingXforthefirsttime(subsequentreadscanbefromlocalcopy)write(X)canbeexecutedatanytimebeforethetransactioncommits
RecoveryandAtomicityToensureatomicitydespitefailures,wefirstoutputinformationdescribingthemodificationstostablestoragewithoutmodifyingthedatabaseitself.Westudylog-basedrecoverymechanismsindetailWefirstpresentkeyconceptsAndthenpresenttheactualrecoveryalgorithmLessusedalternative:shadow-paging(briefdetailsinbook)
Log-BasedRecoveryAlogiskeptonstablestorage.Thelogisasequenceoflogrecords,andmaintainsarecordofupdateactivitiesonthedatabase.WhentransactionTistarts,itregistersitselfbywritingalogrecordBeforeTiexecuteswrite(X),alogrecordiswritten,whereV1isthevalueofXbeforethewrite(theoldvalue),andV2isthevaluetobewrittentoX(thenewvalue).WhenTifinishesitlaststatement,thelogrecordiswritten.TwoapproachesusinglogsDeferreddatabasemodificationImmediatedatabasemodification
ImmediateDatabaseModificationTheimmediate-modificationschemeallowsupdatesofanuncommittedtransactiontobemadetothebuffer,orthediskitself,beforethetransactioncommitsUpdatelogrecordmustbewrittenbeforedatabaseitemiswrittenWeassumethatthelogrecordisoutputdirectlytostablestorage(Willseelaterthathowtopostponelogrecordoutputtosomeextent)OutputofupdatedblockstostablestoragecantakeplaceatanytimebeforeoraftertransactioncommitOrderinwhichblocksareoutputcanbedifferentfromtheorderinwhichtheyarewritten.Thedeferred-modificationschemeperformsupdatestobuffer/diskonlyatthetimeoftransactioncommitSimplifiessomeaspectsofrecoveryButhasoverheadofstoringlocalcopy
TransactionCommitAtransactionissaidtohavecommittedwhenitscommitlogrecordisoutputtostablestorageallpreviouslogrecordsofthetransactionmusthavebeenoutputalreadyWritesperformedbyatransactionmaystillbeinthebufferwhenthetransactioncommits,andmaybeoutputlater
ImmediateDatabaseModificationExampleLogWriteOutputC=600BB,BCBANote:BXdenotesblockcontainingX.BCoutputbeforeT1commitsBAoutputafterT0commits
ConcurrencyControlandRecoveryWithconcurrenttransactions,alltransactionsshareasinglediskbufferandasinglelogAbufferblockcanhavedataitemsupdatedbyoneormoretransactionsWeassumethatifatransactionTihasmodifiedanitem,noothertransactioncanmodifythesameitemuntilTihascommittedorabortedi.e.theupdatesofuncommittedtransactionsshouldnotbevisibletoothertransactionsOtherwisehowtoperformundoifT1updatesA,thenT2updatesAandcommits,andfinallyT1hastoabort?Canbeensuredbyobtainingexclusivelocksonupdateditemsandholdingthelockstillendoftransaction(stricttwo-phaselocking)Logrecordsofdifferenttransactionsmaybeinterspersedinthelog.
UndoandRedoOperationsUndoofalogrecordwritestheoldvalueV1toXRedoofalogrecordwritesthenewvalueV2toXUndoandRedoofTransactionsundo(Ti)restoresthevalueofalldataitemsupdatedbyTitotheiroldvalues,goingbackwardsfromthelastlogrecordforTieachtimeadataitemXisrestoredtoitsoldvalueVaspeciallogrecordiswrittenoutwhenundoofatransactioniscomplete,alogrecordiswrittenout.redo(Ti)setsthevalueofalldataitemsupdatedbyTitothenewvalues,goingforwardfromthefirstlogrecordforTiNologgingisdoneinthiscase
UndoandRedoonRecoveringfromFailureWhenrecoveringafterfailure:TransactionTineedstobeundoneifthelogcontainstherecord,butdoesnotcontaineithertherecordor.TransactionTineedstoberedoneifthelogcontainstherecordsandcontainstherecordorNotethatIftransactionTiwasundoneearlierandtherecordwrittentothelog,andthenafailureoccurs,onrecoveryfromfailureTiisredonesucharedoredoesalltheoriginalactionsincludingthestepsthatrestoredoldvaluesKnownasrepeatinghistorySeemswasteful,butsimplifiesrecoverygreatly
ImmediateDBModificationRecoveryExampleBelowweshowthelogasitappearsatthreeinstancesoftime.Recoveryactionsineachcaseaboveare:(a)undo(T0):Bisrestoredto2000andAto1000,andlogrecords,,arewrittenout(b)redo(T0)andundo(T1):AandBaresetto950and2050andCisrestoredto700.Logrecords,arewrittenout.(c)redo(T0)andredo(T1):AandBaresetto950and2050respectively.ThenCissetto600
CheckpointsRedoing/undoingalltransactionsrecordedinthelogcanbeveryslowprocessingtheentirelogistime-consumingifthesystemhasrunforalongtimewemightunnecessarilyredotransactionswhichhavealreadyoutputtheirupdatestothedatabase.StreamlinerecoveryprocedurebyperiodicallyperformingcheckpointingOutputalllogrecordscurrentlyresidinginmainmemoryontostablestorage.Outputallmodifiedbufferblockstothedisk.WritealogrecordontostablestoragewhereLisalistofalltransactionsactiveatthetimeofcheckpoint.Allupdatesarestoppedwhiledoingcheckpointing
Checkpoints(Cont.)DuringrecoveryweneedtoconsideronlythemostrecenttransactionTithatstartedbeforethecheckpoint,andtransactionsthatstartedafterTi.ScanbackwardsfromendoflogtofindthemostrecentrecordOnlytransactionsthatareinLorstartedafterthecheckpointneedtoberedoneorundoneTransactionsthatcommittedorabortedbeforethecheckpointalreadyhavealltheirupdatesoutputtostablestorage.SomeearlierpartofthelogmaybeneededforundooperationsContinuescanningbackwardstillarecordisfoundforeverytransactionTiinL.Partsoflogpriortoearliestrecordabovearenotneededforrecovery,andcanbeerasedwheneverdesired.
ExampleofCheckpointsT1canbeignored(updatesalreadyoutputtodiskduetocheckpoint)T2andT3redone.T4undoneTcTfT1T2T3T4checkpointsystemfailure
RecoveryAlgorithmSofar:wecoveredkeyconceptsNow:wepresentthecomponentsofthebasicrecoveryalgorithmLater:wepresentextensionstoallowmoreconcurrency
RecoveryAlgorithmLogging(duringnormaloperation):attransactionstartforeachupdate,andattransactionendTransactionrollback(duringnormaloperation)LetTibethetransactiontoberolledbackScanlogbackwardsfromtheend,andforeachlogrecordofTioftheformperformtheundobywritingV1toXj,writealogrecordsuchlogrecordsarecalledcompensationlogrecordsOncetherecordisfoundstopthescanandwritethelogrecord
Recoveryfromfailure:TwophasesRedophase:replayupdatesofalltransactions,whethertheycommitted,aborted,orareincompleteUndophase:undoallincompletetransactionsRedophase:Findlastrecord,andsetundo-listtoL.ScanforwardfromaboverecordWheneverarecordisfound,redoitbywritingV2toXjWheneveralogrecordisfound,addTitoundo-listWheneveralogrecordorisfound,removeTifromundo-listRecoveryAlgorithm(Cont.)
RecoveryAlgorithm(Cont.)Undophase:ScanlogbackwardsfromendWheneveralogrecordisfoundwhereTiisinundo-listperformsameactionsasfortransactionrollback:performundobywritingV1toXj.writealogrecordWheneveralogrecordisfoundwhereTiisinundo-list,WritealogrecordRemoveTifromundo-listStopwhenundo-listisemptyi.e.hasbeenfoundforeverytransactioninundo-listAfterundophasecompletes,normaltransactionprocessingcancommence
ExampleofRecovery
LogRecordBufferingLogrecordbuffering:logrecordsarebufferedinmainmemory,insteadofofbeingoutputdirectlytostablestorage.Logrecordsareoutputtostablestoragewhenablockoflogrecordsinthebufferisfull,oralogforceoperationisexecuted.Logforceisperformedtocommitatransactionbyforcingallitslogrecords(includingthecommitrecord)tostablestorage.Severallogrecordscanthusbeoutputusingasingleoutputoperation,reducingtheI/Ocost.
LogRecordBuffering(Cont.)Therulesbelowmustbefollowediflogrecordsarebuffered:Logrecordsareoutputtostablestorageintheorderinwhichtheyarecreated.TransactionTientersthecommitstateonlywhenthelogrecordhasbeenoutputtostablestorage.Beforeablockofdatainmainmemoryisoutputtothedatabase,alllogrecordspertainingtodatainthatblockmusthavebeenoutputtostablestorage.Thisruleiscalledthewrite-aheadloggingorWALruleStrictlyspeakingWALonlyrequiresundoinformationtobeoutput
DatabaseBufferingDatabasemaintainsanin-memorybufferofdatablocksWhenanewblockisneeded,ifbufferisfullanexistingblockneedstoberemovedfrombufferIftheblockchosenforremovalhasbeenupdated,itmustbeoutputtodiskTherecoveryalgorithmsupportstheno-forcepolicy:i.e.,updatedblocksneednotbewrittentodiskwhentransactioncommitsforcepolicy:requiresupdatedblockstobewrittenatcommitMoreexpensivecommitTherecoveryalgorithmsupportsthestealpolicy:i.e.,blockscontainingupdatesofuncommittedtransactionscanbewrittentodisk,evenbeforethetransactioncommits
DatabaseBuffering(Cont.)Ifablockwithuncommittedupdatesisoutputtodisk,logrecordswithundoinformationfortheupdatesareoutputtothelogonstablestoragefirst(Writeaheadlogging)Noupdatesshouldbeinprogressonablockwhenitisoutputtodisk.Canbeensuredasfollows.Beforewritingadataitem,transactionacquiresexclusivelockonblockcontainingthedataitemLockcanbereleasedoncethewriteiscompleted.Suchlocksheldforshortdurationarecalledlatches.TooutputablocktodiskFirstacquireanexclusivelatchontheblockEnsuresnoupdatecanbeinprogressontheblockThenperformalogflushThenoutputtheblocktodiskFinallyreleasethelatchontheblock
BufferManagement(Cont.)Databasebuffercanbeimplementedeitherinanareaofrealmain-memoryreservedforthedatabase,orinvirtualmemoryImplementingbufferinreservedmain-memoryhasdrawbacks:Memoryispartitionedbefore-handbetweendatabasebufferandapplications,limitingflexibility.Needsmaychange,andalthoughoperatingsystemknowsbesthowmemoryshouldbedividedupatanytime,itcannotchangethepartitioningofmemory.
BufferManagement(Cont.)Databasebuffersaregenerallyimplementedinvirtualmemoryinspiteofsomedrawbacks:Whenoperatingsystemneedstoevictapagethathasbeenmodified,thepageiswrittentoswapspaceondisk.Whendatabasedecidestowritebufferpagetodisk,bufferpagemaybeinswapspace,andmayhavetobereadfromswapspaceondiskandoutputtothedatabaseondisk,resultinginextraI/O!Knownasdualpagingproblem.IdeallywhenOSneedstoevictapagefromthebuffer,itshouldpasscontroltodatabase,whichinturnshouldOutputthepagetodatabaseinsteadoftoswapspace(makingsuretooutputlogrecordsfirst),ifitismodifiedReleasethepagefromthebuffer,fortheOStouseDualpagingcanthusbeavoided,butcommonoperatingsystemsdonotsupportsuchfunctionality.
FuzzyCheckpointingToavoidlonginterruptionofnormalprocessingduringcheckpointing,allowupdatestohappenduringcheckpointingFuzzycheckpointingisdoneasfollows:TemporarilystopallupdatesbytransactionsWritealogrecordandforcelogtostablestorageNotelistMofmodifiedbufferblocksNowpermittransactionstoproceedwiththeiractionsOutputtodiskallmodifiedbufferblocksinlistMblocksshouldnotbeupdatedwhilebeingoutputFollowWAL:alllogrecordspertainingtoablockmustbeoutputbeforetheblockisoutputStoreapointertothecheckpointrecordinafixedpositionlast_checkpointondisk
FuzzyCheckpointing(Cont.)Whenrecoveringusingafuzzycheckpoint,startscanfromthecheckpointrecordpointedtobylast_checkpointLogrecordsbeforelast_checkpointhavetheirupdatesreflectedindatabaseondisk,andneednotberedone.Incompletecheckpoints,wheresystemhadcrashedwhileperformingcheckpoint,arehandledsafely………..…..Loglast_checkpoint
FailurewithLossofNonvolatileStorageSofarweassumednolossofnon-volatilestorageTechniquesimilartocheckpointingusedtodealwithlossofnon-volatilestoragePeriodicallydumptheentirecontentofthedatabasetostablestorageNotransactionmaybeactiveduringthedumpprocedure;aproceduresimilartocheckpointingmusttakeplaceOutputalllogrecordscurrentlyresidinginmainmemoryontostablestorage.Outputallbufferblocksontothedisk.Copythecontentsofthedatabasetostablestorage.Outputarecordtologonstablestorage.
RecoveringfromFailureofNon-VolatileStorageTorecoverfromdiskfailurerestoredatabasefrommostrecentdump.ConsultthelogandredoalltransactionsthatcommittedafterthedumpCanbeextendedtoallowtransactionstobeactiveduringdump;knownasfuzzydumporonlinedumpSimilartofuzzycheckpointing
RecoverywithEarlyLockReleaseandLogicalUndoOperations
RecoverywithEarlyLockReleaseSupportforhigh-concurrencylockingtechniques,suchasthoseusedforB+-treeconcurrencycontrol,whichreleaselocksearlySupports“logicalundo”Recoverybasedon“repeatinghistory”,wherebyrecoveryexecutesexactlythesameactionsasnormalprocessing
LogicalUndoLoggingOperationslikeB+-treeinsertionsanddeletionsreleaselocksearly.Theycannotbeundonebyrestoringoldvalues(physicalundo),sinceoncealockisreleased,othertransactionsmayhaveupdatedtheB+-tree.Instead,insertions(resp.deletions)areundonebyexecutingadeletion(resp.insertion)operation(knownaslogicalundo).Forsuchoperations,undologrecordsshouldcontaintheundooperationtobeexecutedSuchloggingiscalledlogicalundologging,incontrasttophysicalundologgingOperationsarecalledlogicaloperationsOtherexamples:deleteoftuple,toundoinsertoftupleallowsearlylockreleaseonspaceallocationinformationsubtractamountdeposited,toundodepositallowsearlylockreleaseonbankbalance
PhysicalRedoRedoinformationisloggedphysically(thatis,newvalueforeachwrite)evenforoperationswithlogicalundoLogicalredoisverycomplicatedsincedatabasestateondiskmaynotbe“operationconsistent”whenrecoverystartsPhysicalredologgingdoesnotconflictwithearlylockrelease
OperationLoggingOperationloggingisdoneasfollows:Whenoperationstarts,log.HereOjisauniqueidentifieroftheoperationinstance.Whileoperationisexecuting,normallogrecordswithphysicalredoandphysicalundoinformationarelogged.Whenoperationcompletes,islogged,whereUcontainsinformationneededtoperformalogicalundoinformation.Example:insertof(key,record-id)pair(K5,RID7)intoindexI9….Physicalredoofstepsininsert
OperationLogging(Cont.)Ifcrash/rollbackoccursbeforeoperationcompletes:theoperation-endlogrecordisnotfound,andthephysicalundoinformationisusedtoundooperation.Ifcrash/rollbackoccursaftertheoperationcompletes:theoperation-endlogrecordisfound,andinthiscaselogicalundoisperformedusingU;thephysicalundoinformationfortheoperationisignored.Redoofoperation(aftercrash)stillusesphysicalredoinformation.
TransactionRollbackwithLogicalUndoRollbackoftransactionTiisdoneasfollows:ScanthelogbackwardsIfalogrecordisfound,performtheundoandlogaal.IfarecordisfoundRollbacktheoperationlogicallyusingtheundoinformationU.Updatesperformedduringrollbackareloggedjustlikeduringnormaloperationexecution.Attheendoftheoperationrollback,insteadoflogginganoperation-endrecord,generatearecord.SkipallprecedinglogrecordsforTiuntiltherecordisfound
TransactionRollbackwithLogicalUndo(Cont.)Transactionrollback,scanningthelogbackwards(cont.):Ifaredo-onlyrecordisfoundignoreitIfarecordisfound:skipallprecedinglogrecordsforTiuntiltherecordisfound.StopthescanwhentherecordisfoundAddarecordtothelogSomepointstonote:Cases3and4abovecanoccuronlyifthedatabasecrasheswhileatransactionisbeingrolledback.Skippingoflogrecordsasincase4isimportanttopreventmultiplerollbackofthesameoperation.
TransactionRollbackwithLogicalUndoTransactionrollbackduringnormaloperation
FailureRecoverywithLogicalUndo
TransactionRollback:AnotherExampleExamplewithacompleteandanincompleteoperation….T1Rollbackbeginshereredo-onlylogrecordduringphysicalundo(ofincompleteO2)NormalredorecordsforlogicalundoofO1…Whatifcrashoccurredimmediatelyafterthis?
RecoveryAlgorithmwithLogicalUndoBasicallysameasearlieralgorithm,exceptforchangesdescribedearlierfortransactionrollback(Redophase):ScanlogforwardfromlastrecordtillendoflogRepeathistorybyphysicallyredoingallupdatesofalltransactions,Createanundo-listduringthescanasfollowsundo-listissettoLinitiallyWheneverisfoundTiisaddedtoundo-listWheneverorisfound,Tiisdeletedfromundo-listThisbringsdatabasetostateasofcrash,withcommittedaswellasuncommittedtransactionshavingbeenredone.Nowundo-listcontainstransactionsthatareincomplete,thatis,haveneithercommittednorbeenfullyrolledback.
RecoverywithLogicalUndo(Cont.)Recoveryfromsystemcrash(cont.)(Undophase):Scanlogbackwards,performingundoonlogrecordsoftransactionsfoundinundo-list.Logrecordsoftransactionsbeingrolledbackareprocessedasdescribedearlier,astheyarefoundSinglesharedscanforalltransactionsbeingundoneWhenisfoundforatransactionTiinundo-list,writealogrecord.StopscanwhenrecordshavebeenfoundforallTiinundo-listThisundoestheeffectsofincompletetransactions(thosewithneithercommitnorabortlogrecords).Recoveryisnowcomplete.
ARIESRecoveryAlgorithm
ARIESARIESisastateoftheartrecoverymethodIncorporatesnumerousoptimizationstoreduceoverheadsduringnormalprocessingandtospeeduprecoveryTherecoveryalgorithmwestudiedearlierismodeledafterARIES,butgreatlysimplifiedbyremovingoptimizationsUnliketherecoveryalgorithmdescribedearlier,ARIESUseslogsequencenumber(LSN)toidentifylogrecordsStoresLSNsinpagestoidentifywhatupdateshavealreadybeenappliedtoadatabasepagePhysiologicalredoDirtypagetabletoavoidunnecessaryredosduringrecoveryFuzzycheckpointingthatonlyrecordsinformationaboutdirtypages,anddoesnotrequiredirtypagestobewrittenoutatcheckpointtimeMorecominguponeachoftheabove…
ARIESOptimizationsPhysiologicalredoAffectedpageisphysicallyidentified,actionwithinpagecanbelogicalUsedtoreduceloggingoverheadse.g.whenarecordisdeletedandallotherrecordshavetobemovedtofillholePhysiologicalredocanlogjusttherecorddeletionPhysicalredowouldrequireloggingofoldandnewvaluesformuchofthepageRequirespagetobeoutputtodiskatomicallyEasytoachievewithhardwareRAID,alsosupportedbysomedisksystemsIncompletepageoutputcanbedetectedbychecksumtechniques,ButextraactionsarerequiredforrecoveryTreatedasamediafailure
ARIESDataStructuresARIESusesseveraldatastructuresLogsequencenumber(LSN)identifieseachlogrecordMustbesequentiallyincreasingTypicallyanoffsetfrombeginningoflogfiletoallowfastaccessEasilyextendedtohandlemultiplelogfilesPageLSNLogrecordsofseveraldifferenttypesDirtypagetable
ARIESDataStructures:PageLSNEachpagecontainsaPageLSNwhichistheLSNofthelastlogrecordwhoseeffectsarereflectedonthepageToupdateapage:X-latchthepage,andwritethelogrecordUpdatethepageRecordtheLSNofthelogrecordinPageLSNUnlockpageToflushpagetodisk,mustfirstS-latchpageThuspagestateondiskisoperationconsistentRequiredtosupportphysiologicalredoPageLSNisusedduringrecoverytopreventrepeatedredoThusensuringidempotence
ARIESDataStructures:LogRecordEachlogrecordcontainsLSNofpreviouslogrecordofthesametransactionLSNinlogrecordmaybeimplicitSpecialredo-onlylogrecordcalledcompensationlogrecord(CLR)usedtologactionstakenduringrecoverythatneverneedtobeundoneServestheroleofoperation-abortlogrecordsusedinearlierrecoveryalgorithmHasafieldUndoNextLSNtonotenext(earlier)recordtobeundoneRecordsinbetweenwouldhavealreadybeenundoneRequiredtoavoidrepeatedundoofalreadyundoneactionsLSNTransIDPrevLSNRedoInfoUndoInfoLSNTransIDUndoNextLSNRedoInfo12344"3"2"1"
ARIESDataStructures:DirtyPageTableDirtyPageTableListofpagesinthebufferthathavebeenupdatedContains,foreachsuchpagePageLSNofthepageRecLSNisanLSNsuchthatlogrecordsbeforethisLSNhavealreadybeenappliedtothepageversionondiskSettocurrentendoflogwhenapageisinsertedintodirtypagetable(justbeforebeingupdated)Recordedincheckpoints,helpstominimizeredowork
ARIESDataStructures
ARIESDataStructures:CheckpointLogCheckpointlogrecordContains:DirtyPageTableandlistofactivetransactionsForeachactivetransaction,LastLSN,theLSNofthelastlogrecordwrittenbythetransactionFixedpositionondisknotesLSNoflastcompletedcheckpointlogrecordDirtypagesarenotwrittenoutatcheckpointtimeInstead,theyareflushedoutcontinuously,inthebackgroundCheckpointisthusverylowoverheadcanbedonefrequently
ARIESRecoveryAlgorithmARIESrecoveryinvolvesthreepassesAnalysispass:DeterminesWhichtransactionstoundoWhichpagesweredirty(diskversionnotuptodate)attimeofcrashRedoLSN:LSNfromwhichredoshouldstartRedopass:Repeatshistory,redoingallactionsfromRedoLSNRecLSNandPageLSNsareusedtoavoidredoingactionsalreadyreflectedonpageUndopass:RollsbackallincompletetransactionsTransactionswhoseabortwascompleteearlierarenotundoneKeyidea:noneedtoundothesetransactions:earlierundoactionswerelogged,andareredoneasrequired
AriesRecovery:3PassesAnalysis,redoandundopassesAnalysisdetermineswhereredoshouldstartUndohastogobacktillstartofearliestincompletetransactionLastcheckpointLogTimeEndofLogAnalysispassRedopassUndopass
ARIESRecovery:AnalysisAnalysispassStartsfromlastcompletecheckpointlogrecordReadsDirtyPageTablefromlogrecordSetsRedoLSN=minofRecLSNsofallpagesinDirtyPageTableIncasenopagesaredirty,RedoLSN=checkpointrecord’sLSNSetsundo-list=listoftransactionsincheckpointlogrecordReadsLSNoflastlogrecordforeachtransactioninundo-listfromcheckpointlogrecordScansforwardfromcheckpoint..Cont.onnextpage…
ARIESRecovery:Analysis(Cont.)Analysispass(cont.)ScansforwardfromcheckpointIfanylogrecordfoundfortransactionnotinundo-list,addstransactiontoundo-listWheneveranupdatelogrecordisfoundIfpageisnotinDirtyPageTable,itisaddedwithRecLSNsettoLSNoftheupdatelogrecordIftransactionendlogrecordfound,deletetransactionfromundo-listKeepstrackoflastlogrecordforeachtransactioninundo-listMaybeneededforlaterundoAtendofanalysispass:RedoLSNdetermineswheretostartredopassRecLSNforeachpageinDirtyPageTableusedtominimizeredoworkAlltransactionsinundo-listneedtoberolledback
ARIESRedoPassRedoPass:Repeatshistorybyreplayingeveryactionnotalreadyreflectedinthepageondisk,asfollows:ScansforwardfromRedoLSN.Wheneveranupdatelogrecordisfound:IfthepageisnotinDirtyPageTableortheLSNofthelogrecordislessthantheRecLSNofthepageinDirtyPageTable,thenskipthelogrecordOtherwisefetchthepagefromdisk.IfthePageLSNofthepagefetchedfromdiskislessthantheLSNofthelogrecord,redothelogrecordNOTE:ifeithertestisnegativetheeffectsofthelogrecordhavealreadyappearedonthepage.Firsttestavoidsevenfetchingthepagefromdisk!
ARIESUndoActionsWhenanundoisperformedforanupdatelogrecordGenerateaCLRcontainingtheundoactionperformed(actionsperformedduringundoareloggedphysicalyorphysiologically).CLRforrecordnnotedasn’infigurebelowSetUndoNextLSNoftheCLRtothePrevLSNvalueoftheupdatelogrecordArrowsindicateUndoNextLSNvalueARIESsupportspartialrollbackUsede.g.tohandledeadlocksbyrollingbackjustenoughtoreleasereqd.locksFigureindicatesforwardactionsafterpartialrollbacksrecords3and4initially,later5and6,thenfullrollback12344"3"565"2"1"6"
ARIES:UndoPassUndopass:Performsbackwardscanonlogundoingalltransactioninundo-listBackwardscanoptimizedbyskippingunneededlogrecordsasfollows:NextLSNtobeundoneforeachtransactionsettoLSNoflastlogrecordfortransactionfoundbyanalysispass.AteachsteppicklargestoftheseLSNstoundo,skipbacktoitandundoitAfterundoingalogrecordForordinarylogrecords,setnextLSNtobeundonefortransactiontoPrevLSNnotedinthelogrecordForcompensationlogrecords(CLRs)setnextLSNtobeundotoUndoNextLSNnotedinthelogrecordAllinterveningrecordsareskippedsincetheywouldhavebeenundonealreadyUndosperformedasdescribedearlier
RecoveryActionsinARIES
OtherARIESFeaturesRecoveryIndependencePagescanberecoveredindependentlyofothersE.g.ifsomediskpagesfailtheycanberecoveredfromabackupwhileotherpagesarebeingusedSavepoints:TransactionscanrecordsavepointsandrollbacktoasavepointUsefulforcomplextransactionsAlsousedtorollbackjustenoughtoreleaselocksondeadlock
OtherARIESFeatures(Cont.)Fine-grainedlocking:IndexconcurrencyalgorithmsthatpermittuplelevellockingonindicescanbeusedTheserequirelogicalundo,ratherthanphysicalundo,asinearlierrecoveryalgorithmRecoveryoptimizations:Forexample:DirtypagetablecanbeusedtoprefetchpagesduringredoOutoforderredoispossible:redocanbepostponedonapagebeingfetchedfromdisk,andperformedwhenpageisfetched.Meanwhileotherlogrecordscancontinuetobeprocessed
RemoteBackupSystems
RemoteBackupSystemsRemotebackupsystemsprovidehighavailabilitybyallowingtransactionprocessingtocontinueeveniftheprimarysiteisdestroyed.
RemoteBackupSystems(Cont.)Detectionoffailure:Backupsitemustdetectwhenprimarysitehasfailedtodistinguishprimarysitefailurefromlinkfailuremaintainseveralcommunicationlinksbetweentheprimaryandtheremotebackup.Heart-beatmessagesTransferofcontrol:Totakeovercontrolbackupsitefirstperformrecoveryusingitscopyofthedatabaseandallthelongrecordsithasreceivedfromtheprimary.Thus,completedtransactionsareredoneandincompletetransactionsarerolledback.WhenthebackupsitetakesoverprocessingitbecomesthenewprimaryTotransfercontrolbacktooldprimarywhenitrecovers,oldprimarymustreceiveredologsfromtheoldbackupandapplyallupdateslocally.
RemoteBackupSystems(Cont.)Timetorecover:Toreducedelayintakeover,backupsiteperiodicallyprocesestheredologrecords(ineffect,performingrecoveryfrompreviousdatabasestate),performsacheckpoint,andcanthendeleteearlierpartsofthelog.Hot-Spareconfigurationpermitsveryfasttakeover:Backupcontinuallyprocessesredologrecordastheyarrive,applyingtheupdateslocally.Whenfailureoftheprimaryisdetectedthebackuprollsbackincompletetransactions,andisreadytoprocessnewtransactions.Alternativetoremotebackup:distributeddatabasewithreplicateddataRemotebackupisfasterandcheaper,butlesstoleranttofailuremoreonthisinChapter19
RemoteBackupSystems(Cont.)Ensuredurabilityofupdatesbydelayingtransactioncommituntilupdateisloggedatbackup;avoidthisdelaybypermittinglowerdegreesofdurability.One-safe:commitassoonastransaction’scommitlogrecordiswrittenatprimaryProblem:updatesmaynotarriveatbackupbeforeittakesover.Two-very-safe:commitwhentransaction’scommitlogrecordiswrittenatprimaryandbackupReducesavailabilitysincetransactionscannotcommitifeithersitefails.Two-safe:proceedasintwo-very-safeifbothprimaryandbackupareactive.Ifonlytheprimaryisactive,thetransactioncommitsassoonasiscommitlogrecordiswrittenattheprimary.Betteravailabilitythantwo-very-safe;avoidsproblemoflosttransactionsinone-safe.
EndofChapter16
Figure16.01
Figure16.02
Figure16.03
Figure16.04
Figure16.05
Figure16.06
Figure16.08
Figure16.09
Figure16.10
Extra
ShadowPagingShadowpagingisanalternativetolog-basedrecovery;thisschemeisusefuliftransactionsexecuteseriallyIdea:maintaintwopagetablesduringthelifetimeofatransaction–thecurrentpagetable,andtheshadowpagetableStoretheshadowpagetableinnonvolatilestorage,suchthatstateofthedatabasepriortotransactionexecutionmayberecovered.ShadowpagetableisnevermodifiedduringexecutionTostartwith,boththepagetablesareidentical.Onlycurrentpagetableisusedfordataitemaccessesduringexecutionofthetransaction.WheneveranypageisabouttobewrittenforthefirsttimeAcopyofthispageismadeontoanunusedpage.ThecurrentpagetableisthenmadetopointtothecopyTheupdateisperformedonthecopy
SamplePageTable
ExampleofShadowPagingShadowandcurrentpagetablesafterwritetopage4
ShadowPaging(Cont.)Tocommitatransaction:1.Flushallmodifiedpagesinmainmemorytodisk2.Outputcurrentpagetabletodisk3.Makethecurrentpagetablethenewshadowpagetable,asfollows:keepapointertotheshadowpagetableatafixed(known)locationondisk.tomakethecurrentpagetablethenewshadowpagetable,simplyupdatethepointertopointtocurrentpagetableondiskOncepointertoshadowpagetablehasbeenwritten,transactioniscommitted.Norecoveryisneededafteracrash—newtransactionscanstartrightaway,usingtheshadowpagetable.Pagesnotpointedtofromcurrent/shadowpagetableshouldbefreed(garbagecollected).
ShowPaging(Cont.)Advantagesofshadow-pagingoverlog-basedschemesnooverheadofwritinglogrecordsrecoveryistrivialDisadvantages:CopyingtheentirepagetableisveryexpensiveCanbereducedbyusingapagetablestructuredlikeaB+-treeNoneedtocopyentiretree,onlyneedtocopypathsinthetreethatleadtoupdatedleafnodesCommitoverheadishighevenwithaboveextensionNeedtoflusheveryupdatedpage,andpagetableDatagetsfragmented(relatedpagesgetseparatedondisk)Aftereverytransactioncompletion,thedatabasepagescontainingoldversionsofmodifieddataneedtobegarbagecollectedHardtoextendalgorithmtoallowtransactionstorunconcurrentlyEasiertoextendlogbasedschemes
BlockStorageOperations
PortionoftheDatabaseLogCorrespondingtoT0andT1
StateoftheLogandDatabaseCorrespondingtoT0andT1
PortionoftheSystemLogCorrespondingtoT0andT1
StateofSystemLogandDatabaseCorrespondingtoT0andT1'
您可能关注的文档
- 施工企业会计第二版辛艳红配套教学课件PPT 第1章_总论.ppt
- 数据结构课件PPT110章全 第四章 串.ppt
- 数据结构课件PPT110章全 第五章 数组和广义表.ppt
- 数据结构课件PPT110章全 第六章 树和二叉树.ppt
- 数据结构课件PPT110章全 第七章 图.ppt
- 数据库系统概念全套配套课件PPT ch12.ppt
- 数据库系统概念全套配套课件PPT ch6.ppt
- 数据库系统概念全套配套课件PPT ch2.ppt
- 数据库系统概念全套配套课件PPT appB.ppt
- 数据库系统概念全套配套课件PPT ch7.ppt
- 数据库系统概念全套配套课件PPT ch26.ppt
- 数据库系统概念全套配套课件PPT ch22.ppt
- 数据库系统概念全套配套课件PPT ch23.ppt
- 数据库系统概念全套配套课件PPT ch21.ppt
- 数据库系统概念全套配套课件PPT ch15.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 06土木工程材料.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 01土木工程材料.ppt
- 土木工程材料第2版柯国军配套教学课件PPT 07土木工程材料.ppt