• 3.42 MB
  • 2022-04-29 14:33:35 发布

数据库系统概念全套配套课件PPT ch16.ppt

  • 95页
  • 当前文档由用户上传发布,收益归属用户
  1. 1、本文档共5页,可阅读全部内容。
  2. 2、本文档内容版权归属内容提供方,所产生的收益全部归内容提供方所有。如果您对本文有版权争议,可选择认领,认领后既往收益都归您。
  3. 3、本文档由用户上传,本站不保证质量和数量令人满意,可能有诸多瑕疵,付费之前,请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形,可联系本站下载客服投诉处理。
  4. 文档侵权举报电话:19940600175。
'Chapter16:RecoverySystem Chapter16:RecoverySystemFailureClassificationStorageStructureRecoveryandAtomicityLog-BasedRecoveryRemoteBackupSystems FailureClassificationTransactionfailure:Logicalerrors:transactioncannotcompleteduetosomeinternalerrorconditionSystemerrors:thedatabasesystemmustterminateanactivetransactionduetoanerrorcondition(e.g.,deadlock)Systemcrash:apowerfailureorotherhardwareorsoftwarefailurecausesthesystemtocrash.Fail-stopassumption:non-volatilestoragecontentsareassumedtonotbecorruptedbysystemcrashDatabasesystemshavenumerousintegritycheckstopreventcorruptionofdiskdataDiskfailure:aheadcrashorsimilardiskfailuredestroysallorpartofdiskstorageDestructionisassumedtobedetectable:diskdrivesusechecksumstodetectfailures RecoveryAlgorithmsConsidertransactionTithattransfers$50fromaccountAtoaccountBTwoupdates:subtract50fromAandadd50toBTransactionTirequiresupdatestoAandBtobeoutputtothedatabase.Afailuremayoccurafteroneofthesemodificationshavebeenmadebutbeforebothofthemaremade.ModifyingthedatabasewithoutensuringthatthetransactionwillcommitmayleavethedatabaseinaninconsistentstateNotmodifyingthedatabasemayresultinlostupdatesiffailureoccursjustaftertransactioncommitsRecoveryalgorithmshavetwopartsActionstakenduringnormaltransactionprocessingtoensureenoughinformationexiststorecoverfromfailuresActionstakenafterafailuretorecoverthedatabasecontentstoastatethatensuresatomicity,consistencyanddurability StorageStructureVolatilestorage:doesnotsurvivesystemcrashesexamples:mainmemory,cachememoryNonvolatilestorage:survivessystemcrashesexamples:disk,tape,flashmemory, non-volatile(batterybackedup)RAMbutmaystillfail,losingdataStablestorage:amythicalformofstoragethatsurvivesallfailuresapproximatedbymaintainingmultiplecopiesondistinctnonvolatilemediaSeebookformoredetailsonhowtoimplementstablestorage Stable-StorageImplementationMaintainmultiplecopiesofeachblockonseparatediskscopiescanbeatremotesitestoprotectagainstdisasterssuchasfireorflooding.Failureduringdatatransfercanstillresultininconsistentcopies:BlocktransfercanresultinSuccessfulcompletionPartialfailure:destinationblockhasincorrectinformationTotalfailure:destinationblockwasneverupdatedProtectingstoragemediafromfailureduringdatatransfer(onesolution):Executeoutputoperationasfollows(assumingtwocopiesofeachblock):Writetheinformationontothefirstphysicalblock.Whenthefirstwritesuccessfullycompletes,writethesameinformationontothesecondphysicalblock.Theoutputiscompletedonlyafterthesecondwritesuccessfullycompletes. Stable-StorageImplementation(Cont.)Protectingstoragemediafromfailureduringdatatransfer(cont.):Copiesofablockmaydifferduetofailureduringoutputoperation.Torecoverfromfailure:Firstfindinconsistentblocks:Expensivesolution:Comparethetwocopiesofeverydiskblock.Bettersolution:Recordin-progressdiskwritesonnon-volatilestorage(Non-volatileRAMorspecialareaofdisk).Usethisinformationduringrecoverytofindblocksthatmaybeinconsistent,andonlycomparecopiesofthese.UsedinhardwareRAIDsystemsIfeithercopyofaninconsistentblockisdetectedtohaveanerror(badchecksum),overwriteitbytheothercopy.Ifbothhavenoerror,butaredifferent,overwritethesecondblockbythefirstblock. DataAccessPhysicalblocksarethoseblocksresidingonthedisk.Bufferblocksaretheblocksresidingtemporarilyinmainmemory.Blockmovementsbetweendiskandmainmemoryareinitiatedthroughthefollowingtwooperations:input(B)transfersthephysicalblockBtomainmemory.output(B)transfersthebufferblockBtothedisk,andreplacestheappropriatephysicalblockthere.Weassume,forsimplicity,thateachdataitemfitsin,andisstoredinside,asingleblock. ExampleofDataAccessXYABx1y1bufferBufferBlockABufferBlockBinput(A)output(B)read(X)write(Y)diskworkareaofT1workareaofT2memoryx2 DataAccess(Cont.)EachtransactionTihasitsprivatework-areainwhichlocalcopiesofalldataitemsaccessedandupdatedbyitarekept.Ti"slocalcopyofadataitemXiscalledxi.Transferringdataitemsbetweensystembufferblocksanditsprivatework-areadoneby:read(X)assignsthevalueofdataitemXtothelocalvariablexi.write(X)assignsthevalueoflocalvariablexitodataitem{X}inthebufferblock.Note:output(BX)neednotimmediatelyfollowwrite(X).Systemcanperformtheoutputoperationwhenitdeemsfit.TransactionsMustperformread(X)beforeaccessingXforthefirsttime(subsequentreadscanbefromlocalcopy)write(X)canbeexecutedatanytimebeforethetransactioncommits RecoveryandAtomicityToensureatomicitydespitefailures,wefirstoutputinformationdescribingthemodificationstostablestoragewithoutmodifyingthedatabaseitself.Westudylog-basedrecoverymechanismsindetailWefirstpresentkeyconceptsAndthenpresenttheactualrecoveryalgorithmLessusedalternative:shadow-paging(briefdetailsinbook) Log-BasedRecoveryAlogiskeptonstablestorage.Thelogisasequenceoflogrecords,andmaintainsarecordofupdateactivitiesonthedatabase.WhentransactionTistarts,itregistersitselfbywritingalogrecordBeforeTiexecuteswrite(X),alogrecordiswritten,whereV1isthevalueofXbeforethewrite(theoldvalue),andV2isthevaluetobewrittentoX(thenewvalue).WhenTifinishesitlaststatement,thelogrecordiswritten.TwoapproachesusinglogsDeferreddatabasemodificationImmediatedatabasemodification ImmediateDatabaseModificationTheimmediate-modificationschemeallowsupdatesofanuncommittedtransactiontobemadetothebuffer,orthediskitself,beforethetransactioncommitsUpdatelogrecordmustbewrittenbeforedatabaseitemiswrittenWeassumethatthelogrecordisoutputdirectlytostablestorage(Willseelaterthathowtopostponelogrecordoutputtosomeextent)OutputofupdatedblockstostablestoragecantakeplaceatanytimebeforeoraftertransactioncommitOrderinwhichblocksareoutputcanbedifferentfromtheorderinwhichtheyarewritten.Thedeferred-modificationschemeperformsupdatestobuffer/diskonlyatthetimeoftransactioncommitSimplifiessomeaspectsofrecoveryButhasoverheadofstoringlocalcopy TransactionCommitAtransactionissaidtohavecommittedwhenitscommitlogrecordisoutputtostablestorageallpreviouslogrecordsofthetransactionmusthavebeenoutputalreadyWritesperformedbyatransactionmaystillbeinthebufferwhenthetransactioncommits,andmaybeoutputlater ImmediateDatabaseModificationExampleLogWriteOutputC=600BB,BCBANote:BXdenotesblockcontainingX.BCoutputbeforeT1commitsBAoutputafterT0commits ConcurrencyControlandRecoveryWithconcurrenttransactions,alltransactionsshareasinglediskbufferandasinglelogAbufferblockcanhavedataitemsupdatedbyoneormoretransactionsWeassumethatifatransactionTihasmodifiedanitem,noothertransactioncanmodifythesameitemuntilTihascommittedorabortedi.e.theupdatesofuncommittedtransactionsshouldnotbevisibletoothertransactionsOtherwisehowtoperformundoifT1updatesA,thenT2updatesAandcommits,andfinallyT1hastoabort?Canbeensuredbyobtainingexclusivelocksonupdateditemsandholdingthelockstillendoftransaction(stricttwo-phaselocking)Logrecordsofdifferenttransactionsmaybeinterspersedinthelog. UndoandRedoOperationsUndoofalogrecordwritestheoldvalueV1toXRedoofalogrecordwritesthenewvalueV2toXUndoandRedoofTransactionsundo(Ti)restoresthevalueofalldataitemsupdatedbyTitotheiroldvalues,goingbackwardsfromthelastlogrecordforTieachtimeadataitemXisrestoredtoitsoldvalueVaspeciallogrecordiswrittenoutwhenundoofatransactioniscomplete,alogrecordiswrittenout.redo(Ti)setsthevalueofalldataitemsupdatedbyTitothenewvalues,goingforwardfromthefirstlogrecordforTiNologgingisdoneinthiscase UndoandRedoonRecoveringfromFailureWhenrecoveringafterfailure:TransactionTineedstobeundoneifthelogcontainstherecord,butdoesnotcontaineithertherecordor.TransactionTineedstoberedoneifthelogcontainstherecordsandcontainstherecordorNotethatIftransactionTiwasundoneearlierandtherecordwrittentothelog,andthenafailureoccurs,onrecoveryfromfailureTiisredonesucharedoredoesalltheoriginalactionsincludingthestepsthatrestoredoldvaluesKnownasrepeatinghistorySeemswasteful,butsimplifiesrecoverygreatly ImmediateDBModificationRecoveryExampleBelowweshowthelogasitappearsatthreeinstancesoftime.Recoveryactionsineachcaseaboveare:(a)undo(T0):Bisrestoredto2000andAto1000,andlogrecords ,,arewrittenout(b)redo(T0)andundo(T1):AandBaresetto950and2050andCisrestoredto700.Logrecords,arewrittenout.(c)redo(T0)andredo(T1):AandBaresetto950and2050respectively.ThenCissetto600 CheckpointsRedoing/undoingalltransactionsrecordedinthelogcanbeveryslowprocessingtheentirelogistime-consumingifthesystemhasrunforalongtimewemightunnecessarilyredotransactionswhichhavealreadyoutputtheirupdatestothedatabase.StreamlinerecoveryprocedurebyperiodicallyperformingcheckpointingOutputalllogrecordscurrentlyresidinginmainmemoryontostablestorage.Outputallmodifiedbufferblockstothedisk.WritealogrecordontostablestoragewhereLisalistofalltransactionsactiveatthetimeofcheckpoint.Allupdatesarestoppedwhiledoingcheckpointing Checkpoints(Cont.)DuringrecoveryweneedtoconsideronlythemostrecenttransactionTithatstartedbeforethecheckpoint,andtransactionsthatstartedafterTi.ScanbackwardsfromendoflogtofindthemostrecentrecordOnlytransactionsthatareinLorstartedafterthecheckpointneedtoberedoneorundoneTransactionsthatcommittedorabortedbeforethecheckpointalreadyhavealltheirupdatesoutputtostablestorage.SomeearlierpartofthelogmaybeneededforundooperationsContinuescanningbackwardstillarecordisfoundforeverytransactionTiinL.Partsoflogpriortoearliestrecordabovearenotneededforrecovery,andcanbeerasedwheneverdesired. ExampleofCheckpointsT1canbeignored(updatesalreadyoutputtodiskduetocheckpoint)T2andT3redone.T4undoneTcTfT1T2T3T4checkpointsystemfailure RecoveryAlgorithmSofar:wecoveredkeyconceptsNow:wepresentthecomponentsofthebasicrecoveryalgorithmLater:wepresentextensionstoallowmoreconcurrency RecoveryAlgorithmLogging(duringnormaloperation):attransactionstartforeachupdate,andattransactionendTransactionrollback(duringnormaloperation)LetTibethetransactiontoberolledbackScanlogbackwardsfromtheend,andforeachlogrecordofTioftheformperformtheundobywritingV1toXj,writealogrecordsuchlogrecordsarecalledcompensationlogrecordsOncetherecordisfoundstopthescanandwritethelogrecord Recoveryfromfailure:TwophasesRedophase:replayupdatesofalltransactions,whethertheycommitted,aborted,orareincompleteUndophase:undoallincompletetransactionsRedophase:Findlastrecord,andsetundo-listtoL.ScanforwardfromaboverecordWheneverarecordisfound,redoitbywritingV2toXjWheneveralogrecordisfound,addTitoundo-listWheneveralogrecordorisfound,removeTifromundo-listRecoveryAlgorithm(Cont.) RecoveryAlgorithm(Cont.)Undophase:ScanlogbackwardsfromendWheneveralogrecordisfoundwhereTiisinundo-listperformsameactionsasfortransactionrollback:performundobywritingV1toXj.writealogrecordWheneveralogrecordisfoundwhereTiisinundo-list,WritealogrecordRemoveTifromundo-listStopwhenundo-listisemptyi.e.hasbeenfoundforeverytransactioninundo-listAfterundophasecompletes,normaltransactionprocessingcancommence ExampleofRecovery LogRecordBufferingLogrecordbuffering:logrecordsarebufferedinmainmemory,insteadofofbeingoutputdirectlytostablestorage.Logrecordsareoutputtostablestoragewhenablockoflogrecordsinthebufferisfull,oralogforceoperationisexecuted.Logforceisperformedtocommitatransactionbyforcingallitslogrecords(includingthecommitrecord)tostablestorage.Severallogrecordscanthusbeoutputusingasingleoutputoperation,reducingtheI/Ocost. LogRecordBuffering(Cont.)Therulesbelowmustbefollowediflogrecordsarebuffered:Logrecordsareoutputtostablestorageintheorderinwhichtheyarecreated.TransactionTientersthecommitstateonlywhenthelogrecord hasbeenoutputtostablestorage.Beforeablockofdatainmainmemoryisoutputtothedatabase,alllogrecordspertainingtodatainthatblockmusthavebeenoutputtostablestorage.Thisruleiscalledthewrite-aheadloggingorWALruleStrictlyspeakingWALonlyrequiresundoinformationtobeoutput DatabaseBufferingDatabasemaintainsanin-memorybufferofdatablocksWhenanewblockisneeded,ifbufferisfullanexistingblockneedstoberemovedfrombufferIftheblockchosenforremovalhasbeenupdated,itmustbeoutputtodiskTherecoveryalgorithmsupportstheno-forcepolicy:i.e.,updatedblocksneednotbewrittentodiskwhentransactioncommitsforcepolicy:requiresupdatedblockstobewrittenatcommitMoreexpensivecommitTherecoveryalgorithmsupportsthestealpolicy:i.e.,blockscontainingupdatesofuncommittedtransactionscanbewrittentodisk,evenbeforethetransactioncommits DatabaseBuffering(Cont.)Ifablockwithuncommittedupdatesisoutputtodisk,logrecordswithundoinformationfortheupdatesareoutputtothelogonstablestoragefirst(Writeaheadlogging)Noupdatesshouldbeinprogressonablockwhenitisoutputtodisk.Canbeensuredasfollows.Beforewritingadataitem,transactionacquiresexclusivelockonblockcontainingthedataitemLockcanbereleasedoncethewriteiscompleted.Suchlocksheldforshortdurationarecalledlatches.TooutputablocktodiskFirstacquireanexclusivelatchontheblockEnsuresnoupdatecanbeinprogressontheblockThenperformalogflushThenoutputtheblocktodiskFinallyreleasethelatchontheblock BufferManagement(Cont.)Databasebuffercanbeimplementedeitherinanareaofrealmain-memoryreservedforthedatabase,orinvirtualmemoryImplementingbufferinreservedmain-memoryhasdrawbacks:Memoryispartitionedbefore-handbetweendatabasebufferandapplications,limitingflexibility.Needsmaychange,andalthoughoperatingsystemknowsbesthowmemoryshouldbedividedupatanytime,itcannotchangethepartitioningofmemory. BufferManagement(Cont.)Databasebuffersaregenerallyimplementedinvirtualmemoryinspiteofsomedrawbacks:Whenoperatingsystemneedstoevictapagethathasbeenmodified,thepageiswrittentoswapspaceondisk.Whendatabasedecidestowritebufferpagetodisk,bufferpagemaybeinswapspace,andmayhavetobereadfromswapspaceondiskandoutputtothedatabaseondisk,resultinginextraI/O!Knownasdualpagingproblem.IdeallywhenOSneedstoevictapagefromthebuffer,itshouldpasscontroltodatabase,whichinturnshouldOutputthepagetodatabaseinsteadoftoswapspace(makingsuretooutputlogrecordsfirst),ifitismodifiedReleasethepagefromthebuffer,fortheOStouseDualpagingcanthusbeavoided,butcommonoperatingsystemsdonotsupportsuchfunctionality. FuzzyCheckpointingToavoidlonginterruptionofnormalprocessingduringcheckpointing,allowupdatestohappenduringcheckpointingFuzzycheckpointingisdoneasfollows:TemporarilystopallupdatesbytransactionsWritealogrecordandforcelogtostablestorageNotelistMofmodifiedbufferblocksNowpermittransactionstoproceedwiththeiractionsOutputtodiskallmodifiedbufferblocksinlistMblocksshouldnotbeupdatedwhilebeingoutputFollowWAL:alllogrecordspertainingtoablockmustbeoutputbeforetheblockisoutputStoreapointertothecheckpointrecordinafixedpositionlast_checkpointondisk FuzzyCheckpointing(Cont.)Whenrecoveringusingafuzzycheckpoint,startscanfromthecheckpointrecordpointedtobylast_checkpointLogrecordsbeforelast_checkpointhavetheirupdatesreflectedindatabaseondisk,andneednotberedone.Incompletecheckpoints,wheresystemhadcrashedwhileperformingcheckpoint,arehandledsafely………..…..Loglast_checkpoint FailurewithLossofNonvolatileStorageSofarweassumednolossofnon-volatilestorageTechniquesimilartocheckpointingusedtodealwithlossofnon-volatilestoragePeriodicallydumptheentirecontentofthedatabasetostablestorageNotransactionmaybeactiveduringthedumpprocedure;aproceduresimilartocheckpointingmusttakeplaceOutputalllogrecordscurrentlyresidinginmainmemoryontostablestorage.Outputallbufferblocksontothedisk.Copythecontentsofthedatabasetostablestorage.Outputarecordtologonstablestorage. RecoveringfromFailureofNon-VolatileStorageTorecoverfromdiskfailurerestoredatabasefrommostrecentdump.ConsultthelogandredoalltransactionsthatcommittedafterthedumpCanbeextendedtoallowtransactionstobeactiveduringdump; knownasfuzzydumporonlinedumpSimilartofuzzycheckpointing RecoverywithEarlyLockReleaseandLogicalUndoOperations RecoverywithEarlyLockReleaseSupportforhigh-concurrencylockingtechniques,suchasthoseusedforB+-treeconcurrencycontrol,whichreleaselocksearlySupports“logicalundo”Recoverybasedon“repeatinghistory”,wherebyrecoveryexecutesexactlythesameactionsasnormalprocessing LogicalUndoLoggingOperationslikeB+-treeinsertionsanddeletionsreleaselocksearly.Theycannotbeundonebyrestoringoldvalues(physicalundo),sinceoncealockisreleased,othertransactionsmayhaveupdatedtheB+-tree.Instead,insertions(resp.deletions)areundonebyexecutingadeletion(resp.insertion)operation(knownaslogicalundo).Forsuchoperations,undologrecordsshouldcontaintheundooperationtobeexecutedSuchloggingiscalledlogicalundologging,incontrasttophysicalundologgingOperationsarecalledlogicaloperationsOtherexamples:deleteoftuple,toundoinsertoftupleallowsearlylockreleaseonspaceallocationinformationsubtractamountdeposited,toundodepositallowsearlylockreleaseonbankbalance PhysicalRedoRedoinformationisloggedphysically(thatis,newvalueforeachwrite)evenforoperationswithlogicalundoLogicalredoisverycomplicatedsincedatabasestateondiskmaynotbe“operationconsistent”whenrecoverystartsPhysicalredologgingdoesnotconflictwithearlylockrelease OperationLoggingOperationloggingisdoneasfollows:Whenoperationstarts,log.HereOjisauniqueidentifieroftheoperationinstance.Whileoperationisexecuting,normallogrecordswithphysicalredoandphysicalundoinformationarelogged.Whenoperationcompletes,islogged,whereUcontainsinformationneededtoperformalogicalundoinformation.Example:insertof(key,record-id)pair(K5,RID7)intoindexI9….Physicalredoofstepsininsert OperationLogging(Cont.)Ifcrash/rollbackoccursbeforeoperationcompletes:theoperation-endlogrecordisnotfound,andthephysicalundoinformationisusedtoundooperation.Ifcrash/rollbackoccursaftertheoperationcompletes:theoperation-endlogrecordisfound,andinthiscaselogicalundoisperformedusingU;thephysicalundoinformationfortheoperationisignored.Redoofoperation(aftercrash)stillusesphysicalredoinformation. TransactionRollbackwithLogicalUndoRollbackoftransactionTiisdoneasfollows:ScanthelogbackwardsIfalogrecordisfound,performtheundoandlogaal.IfarecordisfoundRollbacktheoperationlogicallyusingtheundoinformationU.Updatesperformedduringrollbackareloggedjustlikeduringnormaloperationexecution.Attheendoftheoperationrollback,insteadoflogginganoperation-endrecord,generatearecord.SkipallprecedinglogrecordsforTiuntiltherecord isfound TransactionRollbackwithLogicalUndo(Cont.)Transactionrollback,scanningthelogbackwards(cont.):Ifaredo-onlyrecordisfoundignoreitIfarecordisfound:skipallprecedinglogrecordsforTiuntiltherecord isfound.StopthescanwhentherecordisfoundAddarecordtothelogSomepointstonote:Cases3and4abovecanoccuronlyifthedatabasecrasheswhileatransactionisbeingrolledback.Skippingoflogrecordsasincase4isimportanttopreventmultiplerollbackofthesameoperation. TransactionRollbackwithLogicalUndoTransactionrollbackduringnormaloperation FailureRecoverywithLogicalUndo TransactionRollback:AnotherExampleExamplewithacompleteandanincompleteoperation….T1Rollbackbeginshereredo-onlylogrecordduringphysicalundo(ofincompleteO2)NormalredorecordsforlogicalundoofO1…Whatifcrashoccurredimmediatelyafterthis? RecoveryAlgorithmwithLogicalUndoBasicallysameasearlieralgorithm,exceptforchangesdescribedearlierfortransactionrollback(Redophase):ScanlogforwardfromlastrecordtillendoflogRepeathistorybyphysicallyredoingallupdatesofalltransactions,Createanundo-listduringthescanasfollowsundo-listissettoLinitiallyWheneverisfoundTiisaddedtoundo-listWheneverorisfound,Tiisdeletedfromundo-listThisbringsdatabasetostateasofcrash,withcommittedaswellasuncommittedtransactionshavingbeenredone.Nowundo-listcontainstransactionsthatareincomplete,thatis,haveneithercommittednorbeenfullyrolledback. RecoverywithLogicalUndo(Cont.)Recoveryfromsystemcrash(cont.)(Undophase):Scanlogbackwards,performingundoonlogrecordsoftransactionsfoundinundo-list.Logrecordsoftransactionsbeingrolledbackareprocessedasdescribedearlier,astheyarefoundSinglesharedscanforalltransactionsbeingundoneWhenisfoundforatransactionTiinundo-list,writealogrecord.StopscanwhenrecordshavebeenfoundforallTiinundo-listThisundoestheeffectsofincompletetransactions(thosewithneithercommitnorabortlogrecords).Recoveryisnowcomplete. ARIESRecoveryAlgorithm ARIESARIESisastateoftheartrecoverymethodIncorporatesnumerousoptimizationstoreduceoverheadsduringnormalprocessingandtospeeduprecoveryTherecoveryalgorithmwestudiedearlierismodeledafterARIES,butgreatlysimplifiedbyremovingoptimizationsUnliketherecoveryalgorithmdescribedearlier,ARIESUseslogsequencenumber(LSN)toidentifylogrecordsStoresLSNsinpagestoidentifywhatupdateshavealreadybeenappliedtoadatabasepagePhysiologicalredoDirtypagetabletoavoidunnecessaryredosduringrecoveryFuzzycheckpointingthatonlyrecordsinformationaboutdirtypages,anddoesnotrequiredirtypagestobewrittenoutatcheckpointtimeMorecominguponeachoftheabove… ARIESOptimizationsPhysiologicalredoAffectedpageisphysicallyidentified,actionwithinpagecanbelogicalUsedtoreduceloggingoverheadse.g.whenarecordisdeletedandallotherrecordshavetobemovedtofillholePhysiologicalredocanlogjusttherecorddeletionPhysicalredowouldrequireloggingofoldandnewvaluesformuchofthepageRequirespagetobeoutputtodiskatomicallyEasytoachievewithhardwareRAID,alsosupportedbysomedisksystemsIncompletepageoutputcanbedetectedbychecksumtechniques,ButextraactionsarerequiredforrecoveryTreatedasamediafailure ARIESDataStructuresARIESusesseveraldatastructuresLogsequencenumber(LSN)identifieseachlogrecordMustbesequentiallyincreasingTypicallyanoffsetfrombeginningoflogfiletoallowfastaccessEasilyextendedtohandlemultiplelogfilesPageLSNLogrecordsofseveraldifferenttypesDirtypagetable ARIESDataStructures:PageLSNEachpagecontainsaPageLSNwhichistheLSNofthelastlogrecordwhoseeffectsarereflectedonthepageToupdateapage:X-latchthepage,andwritethelogrecordUpdatethepageRecordtheLSNofthelogrecordinPageLSNUnlockpageToflushpagetodisk,mustfirstS-latchpageThuspagestateondiskisoperationconsistentRequiredtosupportphysiologicalredoPageLSNisusedduringrecoverytopreventrepeatedredoThusensuringidempotence ARIESDataStructures:LogRecordEachlogrecordcontainsLSNofpreviouslogrecordofthesametransactionLSNinlogrecordmaybeimplicitSpecialredo-onlylogrecordcalledcompensationlogrecord(CLR)usedtologactionstakenduringrecoverythatneverneedtobeundoneServestheroleofoperation-abortlogrecordsusedinearlierrecoveryalgorithmHasafieldUndoNextLSNtonotenext(earlier)recordtobeundoneRecordsinbetweenwouldhavealreadybeenundoneRequiredtoavoidrepeatedundoofalreadyundoneactionsLSNTransIDPrevLSNRedoInfoUndoInfoLSNTransIDUndoNextLSNRedoInfo12344"3"2"1" ARIESDataStructures:DirtyPageTableDirtyPageTableListofpagesinthebufferthathavebeenupdatedContains,foreachsuchpagePageLSNofthepageRecLSNisanLSNsuchthatlogrecordsbeforethisLSNhavealreadybeenappliedtothepageversionondiskSettocurrentendoflogwhenapageisinsertedintodirtypagetable(justbeforebeingupdated)Recordedincheckpoints,helpstominimizeredowork ARIESDataStructures ARIESDataStructures:CheckpointLogCheckpointlogrecordContains:DirtyPageTableandlistofactivetransactionsForeachactivetransaction,LastLSN,theLSNofthelastlogrecordwrittenbythetransactionFixedpositionondisknotesLSNoflastcompleted checkpointlogrecordDirtypagesarenotwrittenoutatcheckpointtimeInstead,theyareflushedoutcontinuously,inthebackgroundCheckpointisthusverylowoverheadcanbedonefrequently ARIESRecoveryAlgorithmARIESrecoveryinvolvesthreepassesAnalysispass:DeterminesWhichtransactionstoundoWhichpagesweredirty(diskversionnotuptodate)attimeofcrashRedoLSN:LSNfromwhichredoshouldstartRedopass:Repeatshistory,redoingallactionsfromRedoLSNRecLSNandPageLSNsareusedtoavoidredoingactionsalreadyreflectedonpageUndopass:RollsbackallincompletetransactionsTransactionswhoseabortwascompleteearlierarenotundoneKeyidea:noneedtoundothesetransactions:earlierundoactionswerelogged,andareredoneasrequired AriesRecovery:3PassesAnalysis,redoandundopassesAnalysisdetermineswhereredoshouldstartUndohastogobacktillstartofearliestincompletetransactionLastcheckpointLogTimeEndofLogAnalysispassRedopassUndopass ARIESRecovery:AnalysisAnalysispassStartsfromlastcompletecheckpointlogrecordReadsDirtyPageTablefromlogrecordSetsRedoLSN=minofRecLSNsofallpagesinDirtyPageTableIncasenopagesaredirty,RedoLSN=checkpointrecord’sLSNSetsundo-list=listoftransactionsincheckpointlogrecordReadsLSNoflastlogrecordforeachtransactioninundo-listfromcheckpointlogrecordScansforwardfromcheckpoint..Cont.onnextpage… ARIESRecovery:Analysis(Cont.)Analysispass(cont.)ScansforwardfromcheckpointIfanylogrecordfoundfortransactionnotinundo-list,addstransactiontoundo-listWheneveranupdatelogrecordisfoundIfpageisnotinDirtyPageTable,itisaddedwithRecLSNsettoLSNoftheupdatelogrecordIftransactionendlogrecordfound,deletetransactionfromundo-listKeepstrackoflastlogrecordforeachtransactioninundo-listMaybeneededforlaterundoAtendofanalysispass:RedoLSNdetermineswheretostartredopassRecLSNforeachpageinDirtyPageTableusedtominimizeredoworkAlltransactionsinundo-listneedtoberolledback ARIESRedoPassRedoPass:Repeatshistorybyreplayingeveryactionnotalreadyreflectedinthepageondisk,asfollows:ScansforwardfromRedoLSN.Wheneveranupdatelogrecordisfound:IfthepageisnotinDirtyPageTableortheLSNofthelogrecordislessthantheRecLSNofthepageinDirtyPageTable,thenskipthelogrecordOtherwisefetchthepagefromdisk.IfthePageLSNofthepagefetchedfromdiskislessthantheLSNofthelogrecord,redothelogrecordNOTE:ifeithertestisnegativetheeffectsofthelogrecordhavealreadyappearedonthepage.Firsttestavoidsevenfetchingthepagefromdisk! ARIESUndoActionsWhenanundoisperformedforanupdatelogrecordGenerateaCLRcontainingtheundoactionperformed(actionsperformedduringundoareloggedphysicalyorphysiologically).CLRforrecordnnotedasn’infigurebelowSetUndoNextLSNoftheCLRtothePrevLSNvalueoftheupdatelogrecordArrowsindicateUndoNextLSNvalueARIESsupportspartialrollbackUsede.g.tohandledeadlocksbyrollingbackjustenoughtoreleasereqd.locksFigureindicatesforwardactionsafterpartialrollbacksrecords3and4initially,later5and6,thenfullrollback12344"3"565"2"1"6" ARIES:UndoPassUndopass:Performsbackwardscanonlogundoingalltransactioninundo-listBackwardscanoptimizedbyskippingunneededlogrecordsasfollows:NextLSNtobeundoneforeachtransactionsettoLSNoflastlogrecordfortransactionfoundbyanalysispass.AteachsteppicklargestoftheseLSNstoundo,skipbacktoitandundoitAfterundoingalogrecordForordinarylogrecords,setnextLSNtobeundonefortransactiontoPrevLSNnotedinthelogrecordForcompensationlogrecords(CLRs)setnextLSNtobeundotoUndoNextLSNnotedinthelogrecordAllinterveningrecordsareskippedsincetheywouldhavebeenundonealreadyUndosperformedasdescribedearlier RecoveryActionsinARIES OtherARIESFeaturesRecoveryIndependencePagescanberecoveredindependentlyofothersE.g.ifsomediskpagesfailtheycanberecoveredfromabackupwhileotherpagesarebeingusedSavepoints:TransactionscanrecordsavepointsandrollbacktoasavepointUsefulforcomplextransactionsAlsousedtorollbackjustenoughtoreleaselocksondeadlock OtherARIESFeatures(Cont.)Fine-grainedlocking:IndexconcurrencyalgorithmsthatpermittuplelevellockingonindicescanbeusedTheserequirelogicalundo,ratherthanphysicalundo,asinearlierrecoveryalgorithmRecoveryoptimizations:Forexample:DirtypagetablecanbeusedtoprefetchpagesduringredoOutoforderredoispossible:redocanbepostponedonapagebeingfetchedfromdisk,and performedwhenpageisfetched.Meanwhileotherlogrecordscancontinuetobeprocessed RemoteBackupSystems RemoteBackupSystemsRemotebackupsystemsprovidehighavailabilitybyallowingtransactionprocessingtocontinueeveniftheprimarysiteisdestroyed. RemoteBackupSystems(Cont.)Detectionoffailure:Backupsitemustdetectwhenprimarysitehasfailedtodistinguishprimarysitefailurefromlinkfailuremaintainseveralcommunicationlinksbetweentheprimaryandtheremotebackup.Heart-beatmessagesTransferofcontrol:Totakeovercontrolbackupsitefirstperformrecoveryusingitscopyofthedatabaseandallthelongrecordsithasreceivedfromtheprimary.Thus,completedtransactionsareredoneandincompletetransactionsarerolledback.WhenthebackupsitetakesoverprocessingitbecomesthenewprimaryTotransfercontrolbacktooldprimarywhenitrecovers,oldprimarymustreceiveredologsfromtheoldbackupandapplyallupdateslocally. RemoteBackupSystems(Cont.)Timetorecover:Toreducedelayintakeover,backupsiteperiodicallyprocesestheredologrecords(ineffect,performingrecoveryfrompreviousdatabasestate),performsacheckpoint,andcanthendeleteearlierpartsofthelog.Hot-Spareconfigurationpermitsveryfasttakeover:Backupcontinuallyprocessesredologrecordastheyarrive,applyingtheupdateslocally.Whenfailureoftheprimaryisdetectedthebackuprollsbackincompletetransactions,andisreadytoprocessnewtransactions.Alternativetoremotebackup:distributeddatabasewithreplicateddataRemotebackupisfasterandcheaper,butlesstoleranttofailuremoreonthisinChapter19 RemoteBackupSystems(Cont.)Ensuredurabilityofupdatesbydelayingtransactioncommituntilupdateisloggedatbackup;avoidthisdelaybypermittinglowerdegreesofdurability.One-safe:commitassoonastransaction’scommitlogrecordiswrittenatprimaryProblem:updatesmaynotarriveatbackupbeforeittakesover.Two-very-safe:commitwhentransaction’scommitlogrecordiswrittenatprimaryandbackupReducesavailabilitysincetransactionscannotcommitifeithersitefails.Two-safe:proceedasintwo-very-safeifbothprimaryandbackupareactive.Ifonlytheprimaryisactive,thetransactioncommitsassoonasiscommitlogrecordiswrittenattheprimary.Betteravailabilitythantwo-very-safe;avoidsproblemoflosttransactionsinone-safe. EndofChapter16 Figure16.01 Figure16.02 Figure16.03 Figure16.04 Figure16.05 Figure16.06 Figure16.08 Figure16.09 Figure16.10 Extra ShadowPagingShadowpagingisanalternativetolog-basedrecovery;thisschemeisusefuliftransactionsexecuteseriallyIdea:maintaintwopagetablesduringthelifetimeofatransaction–thecurrentpagetable,andtheshadowpagetableStoretheshadowpagetableinnonvolatilestorage,suchthatstateofthedatabasepriortotransactionexecutionmayberecovered.ShadowpagetableisnevermodifiedduringexecutionTostartwith,boththepagetablesareidentical.Onlycurrentpagetableisusedfordataitemaccessesduringexecutionofthetransaction.WheneveranypageisabouttobewrittenforthefirsttimeAcopyofthispageismadeontoanunusedpage.ThecurrentpagetableisthenmadetopointtothecopyTheupdateisperformedonthecopy SamplePageTable ExampleofShadowPagingShadowandcurrentpagetablesafterwritetopage4 ShadowPaging(Cont.)Tocommitatransaction:1.Flushallmodifiedpagesinmainmemorytodisk2.Outputcurrentpagetabletodisk3.Makethecurrentpagetablethenewshadowpagetable,asfollows:keepapointertotheshadowpagetableatafixed(known)locationondisk.tomakethecurrentpagetablethenewshadowpagetable,simplyupdatethepointertopointtocurrentpagetableondiskOncepointertoshadowpagetablehasbeenwritten,transactioniscommitted.Norecoveryisneededafteracrash—newtransactionscanstartrightaway,usingtheshadowpagetable.Pagesnotpointedtofromcurrent/shadowpagetableshouldbefreed(garbagecollected). ShowPaging(Cont.)Advantagesofshadow-pagingoverlog-basedschemesnooverheadofwritinglogrecordsrecoveryistrivialDisadvantages:CopyingtheentirepagetableisveryexpensiveCanbereducedbyusingapagetablestructuredlikeaB+-treeNoneedtocopyentiretree,onlyneedtocopypathsinthetreethatleadtoupdatedleafnodesCommitoverheadishighevenwithaboveextensionNeedtoflusheveryupdatedpage,andpagetableDatagetsfragmented(relatedpagesgetseparatedondisk)Aftereverytransactioncompletion,thedatabasepagescontainingoldversionsofmodifieddataneedtobegarbagecollectedHardtoextendalgorithmtoallowtransactionstorunconcurrentlyEasiertoextendlogbasedschemes BlockStorageOperations PortionoftheDatabaseLogCorrespondingtoT0andT1 StateoftheLogandDatabaseCorrespondingtoT0andT1 PortionoftheSystemLogCorrespondingtoT0andT1 StateofSystemLogandDatabaseCorrespondingtoT0andT1'