Thursday, March 23, 2017

Back-Ups, Up and Away!


"Of all the things I've lost, I miss my mind the most" - apparently attributed to Mark Twain

Well, I've been a real slacker in adding anything resembling new content to this blog for the last, oh, 4 years. Time to try to make an effort to correct that.

Let's talk about backups.

The definition of backup, at least with respect to computers, as given in the Oxford English Dictionary is "A copy of a file or other item of data made in case the original is lost or damaged."

So, all we need to do is write a script to copy all the files that make up our database off to a secondary location, right?

Right?

Not so much.

If the database is down, you could take a copy of all of the files that make up your database purely with a copy at the operating system level. This was historically called a "cold" backup. This, of course, incurs downtime. In years gone by, regular nightly downtime for things like backups was a common affair. I remember when I first got into IT as a computer operator (back at the dawn of time) - we'd start up the machines around 6AM, they'd go from 7AM to 6PM and then we'd shut down the applications, run backups and batch jobs and the like.

However, that was another time. In our modern, internet world, users expect 99.99% uptime (well, they expect 100%, but lots of SLAs seem to float in the 99.99%+ area). This means that you don't have the liberty to shut things down for hours while you copy your database files to secondary disk or tape.
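To put a number on that, here is a quick back-of-the-envelope sketch of how much downtime a given uptime target leaves you in a year. The function name is just something I made up for illustration, and the year is assumed to be 365 days.

```python
# Downtime budget implied by an availability target (simple arithmetic sketch).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99% that works out to less than an hour for the whole year, which a single old-style nightly cold backup window would blow through on its own.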

So why can't we just copy the database files to backup media with the rest of the files on our server and call it good?

Oracle knows about changes to the database by a number called the SCN - System Change Number.

Every datafile in an Oracle database has a header block. This header block contains lots of information but among it all, recorded in every datafile, is the SCN at last checkpoint.

The database writer (DBWR) likes to take frequent power naps. If it's not doing anything, it sleeps. It wakes up every 3 seconds if the database isn't really busy, or more frequently than that if it is, and flushes unwritten blocks to disk, and the database performs a checkpoint. As part of that process, the header block of every datafile in the database is updated with the same SCN. To maintain database integrity, the SCNs across the datafile headers must be consistent.

So, if you're copying a database while the instance is active, the SCN in the datafile headers is constantly being updated. You copy the first file and the SCN is 1000, you copy the second datafile and the SCN is 1100 and the third is at 1200. So then when you try to restore from your backup, the headers are inconsistent and the database won't open.
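Here is a toy simulation of that scenario, purely to illustrate the idea. It is not how Oracle works internally: the `Datafile` class, the file names, and the SCN values are all made up, and the model assumes a checkpoint happens between each file copy.

```python
from dataclasses import dataclass


@dataclass
class Datafile:
    name: str
    checkpoint_scn: int  # stand-in for the SCN stored in the file's header block


def checkpoint(datafiles, scn):
    """Stamp every header with the same SCN, as a checkpoint is described doing."""
    for f in datafiles:
        f.checkpoint_scn = scn


def naive_hot_copy(datafiles, scn_start, scn_step):
    """Copy files one at a time while checkpoints keep advancing the SCN."""
    copies = []
    scn = scn_start
    for f in datafiles:
        checkpoint(datafiles, scn)  # the live database keeps running
        copies.append(Datafile(f.name, f.checkpoint_scn))  # copy just this file
        scn += scn_step
    return copies


def headers_consistent(datafiles):
    """True when every header carries the same checkpoint SCN."""
    return len({f.checkpoint_scn for f in datafiles}) == 1


# Hypothetical file names, for illustration only.
live = [Datafile("system01.dbf", 0), Datafile("users01.dbf", 0), Datafile("undo01.dbf", 0)]
backup = naive_hot_copy(live, scn_start=1000, scn_step=100)

print([(f.name, f.checkpoint_scn) for f in backup])
print("live consistent:", headers_consistent(live))
print("backup consistent:", headers_consistent(backup))
```

The live files always agree with each other because every checkpoint stamps them all at once. The copies each froze at a different moment, so the backup set ends up with headers at 1000, 1100 and 1200.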

There are several ways to implement backup strategies to handle this requirement.

Oracle provides an integrated backup tool, Recovery Manager (RMAN), to perform backups - and most DBAs will probably want to go that route. Any type of backup not done using RMAN is what Oracle calls a User Managed Backup.

Several things can complicate user managed backups. If you use ASM, for example, you really need to use RMAN.

I think I'll wrap up this first entry here, since we've covered the why of backups and a couple of different options for performing Oracle backups. In the next article, I'll cover user managed backups and how to take them.

Until next time.