Arun Kumar for AWS Community Builders

DMS Configuration for Oracle to RDS migration

Introduction

The following will help you set up an Oracle to RDS database migration.

Goals

  • This guide will help bootstrap your customers' Oracle on-premises → RDS migrations, Oracle RDS → Oracle RDS migrations, etc.

Details

DMS consists of 3 main components:

  1. An instance that does all the heavy lifting
  2. Endpoints, for the source and target databases
  3. A task that describes the loading and replication activities

Tasks are where the bulk of the effort is in tuning the replication and load activities.

There are 3 types of tasks

  1. Full Load
  2. Replication (CDC)
  3. Full load and replication
  • Source databases are expected to take a 2–5% hit in resource consumption during the full load and sync operations. For a database of any reasonable size, expect to push about 20 Mbps of traffic consistently during replication; full load operations will use considerably more bandwidth.

  • For tables with a primary key we will only enable supplemental logging on the columns that make up the primary key; for tables with no primary key we need it enabled on all columns. If you have a particularly busy Oracle schema, this will impact your performance, so be mindful of potential resource contention issues.
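
To make that concrete, here is a minimal sketch of the two forms; the table names are hypothetical (ORDERS has a primary key, ORDER_NOTES does not):

-- Hypothetical tables for illustration only.
ALTER TABLE APP.ORDERS ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;  -- PK table: key columns only
ALTER TABLE APP.ORDER_NOTES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;     -- no PK: all columns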

Getting Setup

VPC configuration

  • Don’t use FQDNs. Just use the primary IP address of your RDS (target) and of your on-premises Oracle server reachable over Direct Connect. AWS have confirmed that custom DNS servers do not work: the resources created by DMS do not inherit custom DNS configuration.
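
Once your replication instance and endpoints exist (see the later sections), you can confirm that the IP-based connectivity actually works with the DMS test-connection call; the ARNs below are placeholders:

aws dms test-connection \
  --replication-instance-arn <REPLICATION_INSTANCE_ARN> \
  --endpoint-arn <ENDPOINT_ARN>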

Source Database/Schema Preparation (On Premise)

  • You need to create a user and grant them some pretty heavy permissions on both the Source and Target database instances.

  • You can find the AWS-provided permissions here, or use the script below to help set up your DMS user on the source Oracle DB.

Outcomes

  • Correct Permissions for DMS user to replicate data
  • Enable archive logging if it's not already enabled, and identify the archive log destination id
  • Find the tables with a primary key and list them, then create your alter statement around that list
  • Find the tables without a primary key, then create your alter statement around that list

If you know there are tables in the schema that don’t need to move, it is a good idea to identify those now.

For execution, I use SQL Developer or SQL*Plus, depending on what I need to do and my connectivity options.

If you are setting up RDS Oracle to RDS Oracle migrations the Source preparation is slightly different.

-- ON SOURCE DB (NON RDS) (as SYSDBA)
-- DMS connection attributes: addSupplementalLogging=Y;readTableSpaceName=true;archivedLogDestId=1;exposeViews=true
-- other attributes
-- https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html
CREATE user DMS identified by <PASSWORD> default tablespace DATA temporary tablespace DATA_TEMP;
-- Change the above to whatever tablespaces you have configured
Grant CREATE session to DMS;
Grant ALTER ANY TABLE to DMS; 
Grant EXECUTE on dbms_crypto to DMS;
Grant SELECT on ALL_VIEWS to DMS;
Grant SELECT ANY TABLE to DMS;
Grant SELECT ANY TRANSACTION to DMS;
Grant SELECT on V_$ARCHIVED_LOG to DMS;
Grant SELECT on V_$LOG to DMS;
Grant SELECT on V_$LOGFILE to DMS;
Grant SELECT on V_$DATABASE to DMS;
Grant SELECT on V_$THREAD to DMS;
Grant SELECT on V_$PARAMETER to DMS;
Grant SELECT on V_$NLS_PARAMETERS to DMS;
Grant SELECT on V_$TIMEZONE_NAMES to DMS;
Grant SELECT on V_$TRANSACTION to DMS;
Grant SELECT on ALL_INDEXES to DMS;
Grant SELECT on ALL_OBJECTS to DMS;
Grant SELECT on DBA_OBJECTS to DMS; 
Grant SELECT on ALL_TABLES to DMS;
Grant SELECT on ALL_USERS to DMS;
Grant SELECT on ALL_CATALOG to DMS;
Grant SELECT on ALL_CONSTRAINTS to DMS;
Grant SELECT on ALL_CONS_COLUMNS to DMS;
Grant SELECT on ALL_TAB_COLS to DMS;
Grant SELECT on ALL_IND_COLUMNS to DMS;
Grant SELECT on ALL_LOG_GROUPS to DMS;
Grant SELECT on SYS.DBA_REGISTRY to DMS;
Grant SELECT on SYS.OBJ$ to DMS;
Grant SELECT on DBA_TABLESPACES to DMS;
Grant SELECT on ALL_TAB_PARTITIONS to DMS;
Grant SELECT on ALL_ENCRYPTED_COLUMNS to DMS;
Grant SELECT on V_$LOGMNR_LOGS to DMS;
Grant SELECT on V_$LOGMNR_CONTENTS to DMS;
Grant LOGMINING TO DMS;
Grant EXECUTE ON dbms_logmnr TO DMS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- Ensure archive logging is enabled; if it's not, here is how to enable it. (CAUTION: this shuts down the database, if "shutdown immediate" wasn't clear enough.)

shutdown immediate;
startup mount;
alter database archivelog;
alter database open;
select dest_id,dest_name, status, destination from v$archive_dest;  

-- find tables which have a primary key.

select at.TABLE_NAME
from all_tables at
where exists (select 1
from all_constraints ac
where ac.owner = at.owner
and ac.table_name = at.table_name
and ac.constraint_type = 'P')
and at.owner = '<SCHEMA>';

-- This will list the tables with a primary key; you can then construct your supplemental logging statement based on the columns which form the primary key.
-- ALTER TABLE <TABLE_NAME> ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
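
If you'd rather generate those statements than write them by hand, here is a sketch in the same spirit as the ALL COLUMNS generator further down; review the output before executing it:

-- Generates one ALTER per primary-keyed table in the schema.
select 'ALTER TABLE '||at.owner||'.'||at.table_name||' ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;'
from all_tables at
where exists (select 1 from all_constraints ac
where ac.owner = at.owner
and ac.table_name = at.table_name
and ac.constraint_type = 'P')
and at.owner = '<SCHEMA>';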

SELECT at.table_name
FROM all_tables at
WHERE NOT EXISTS (SELECT 1
FROM all_constraints ac
WHERE ac.owner = at.owner
AND ac.table_name = at.table_name
AND ac.constraint_type = 'P')
AND at.owner = '<SCHEMA>';

-- This will list all the tables without a primary key; these tables need supplemental logging on all columns.
-- If you run the following and execute every statement it generates, supplemental logging will be enabled on ALL tables and columns,
-- which will cause a greater performance hit. YMMV.
-- select 'ALTER TABLE '||table_name||' ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;' from all_tables where owner = '<SCHEMA>';

Target Preparation (RDS)

  • If you have tablespaces with specific block sizes (4k, 8k, 16k, etc.) then you need to set up your RDS parameter group to allow for caching according to the block size of those tablespaces. Example below.
ParameterGroup:
  Type: AWS::RDS::DBParameterGroup
  Properties:
    Description: saa-oracle-requirements
    Family: oracle-ee-12.1
    Parameters:
      db_4k_cache_size: '61440'

You can find the required AWS-provided permissions here, or use the following to help set up your schema user on the target RDS Oracle DB.

Outcomes:

  • Correct permissions for user to ingest data
  • Create some tablespaces
-- ON TARGET RDS DATABASE
 -- connection attributes: n/a
 -- other attributes: n/a
 -- Use the name of the schema you want to import below.
 create user SCHEMA identified by <PASSWORD>;
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$THREAD','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$NLS_PARAMETERS','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$TIMEZONE_NAMES','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('DBA_REGISTRY','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('OBJ$','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_ENCRYPTED_COLUMNS','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SCHEMA','SELECT');
 exec rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SCHEMA','EXECUTE');
 exec rdsadmin.rdsadmin_util.grant_sys_object('DBMS_CRYPTO', 'SCHEMA','EXECUTE');
 grant SELECT ANY TRANSACTION to SCHEMA;
 grant SELECT on V$NLS_PARAMETERS to SCHEMA;
 grant SELECT on V$TIMEZONE_NAMES to SCHEMA;
 grant SELECT on ALL_INDEXES to SCHEMA;
 grant SELECT on ALL_OBJECTS to SCHEMA;
 grant SELECT on DBA_OBJECTS to SCHEMA;
 grant SELECT on ALL_TABLES to SCHEMA;
 grant SELECT on ALL_USERS to SCHEMA;
 grant SELECT on ALL_CATALOG to SCHEMA;
 grant SELECT on ALL_CONSTRAINTS to SCHEMA;
 grant SELECT on ALL_CONS_COLUMNS to SCHEMA;
 grant SELECT on ALL_TAB_COLS to SCHEMA;
 grant SELECT on ALL_IND_COLUMNS to SCHEMA;
 grant DROP ANY TABLE to SCHEMA;
 grant SELECT ANY TABLE to SCHEMA;
 grant INSERT ANY TABLE to SCHEMA;
 grant UPDATE ANY TABLE to SCHEMA;
 grant CREATE ANY TABLE to SCHEMA;
 grant CREATE ANY VIEW to SCHEMA;
 grant DROP ANY VIEW to SCHEMA;
 grant CREATE ANY PROCEDURE to SCHEMA;
 grant ALTER ANY PROCEDURE to SCHEMA;
 grant DROP ANY PROCEDURE to SCHEMA;
 grant CREATE ANY SEQUENCE to SCHEMA;
 grant CREATE ANY TABLESPACE to SCHEMA;
 grant CREATE ANY TABLE to SCHEMA;
 grant ALTER ANY SEQUENCE to SCHEMA;
 grant DROP ANY SEQUENCE to SCHEMA;
 grant select on DBA_USERS to SCHEMA;
 grant select on DBA_TAB_PRIVS to SCHEMA;
 grant select on DBA_OBJECTS to SCHEMA;
 grant select on DBA_SYNONYMS to SCHEMA;
 grant select on DBA_SEQUENCES to SCHEMA;
 grant select on DBA_TYPES to SCHEMA;
 grant select on DBA_INDEXES to SCHEMA;
 grant select on DBA_TABLES to SCHEMA;
 grant select on DBA_TRIGGERS to SCHEMA;
 grant UNLIMITED TABLESPACE to SCHEMA;
 grant CREATE SESSION to SCHEMA;
 grant DROP ANY TABLE to SCHEMA;
 grant ALTER ANY TABLE to SCHEMA;
 grant CREATE ANY INDEX to SCHEMA;
 grant LOCK ANY TABLE to SCHEMA;
 create tablespace DATA;
 create tablespace DATA_4k blocksize 4K;

Replication Instances

[https://docs.aws.amazon.com/cli/latest/reference/dms/create-replication-instance.html]

  • Provision a well-sized replication instance; "r" class instances are recommended because they are memory optimized.

  • If this is for production, make it Multi-AZ. Give the instance roughly 150% of your database's storage.
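
As a sketch via the CLI (identifier, class, sizes, and network values are placeholders; adjust to your database):

aws dms create-replication-instance \
  --replication-instance-identifier oracle-migration-01 \
  --replication-instance-class dms.r5.xlarge \
  --allocated-storage 500 \
  --multi-az \
  --vpc-security-group-ids <SECURITY_GROUP_ID> \
  --replication-subnet-group-identifier <SUBNET_GROUP>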

Endpoint Creation

Source

  • Remember to use the IP address, not FQDN.

  • Remember to add the extra connection attributes from the SQL file above.
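
A sketch of the source endpoint creation; server, database, and credentials are placeholders, and the attributes match the comment at the top of the source SQL script:

aws dms create-endpoint \
  --endpoint-identifier oracle-source \
  --endpoint-type source \
  --engine-name oracle \
  --server-name 10.0.0.10 \
  --port 1521 \
  --database-name <SID> \
  --username DMS \
  --password '<PASSWORD>' \
  --extra-connection-attributes "addSupplementalLogging=Y;readTableSpaceName=true;archivedLogDestId=1;exposeViews=true"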

Target

  • Again, IP address not FQDN.

  • No additional connection attributes should be necessary, unless you need to disable useDirectPathFullLoad. See the documentation for why you might want to do that.

[https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Oracle.html]
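
And a matching sketch for the target (again, placeholders); the extra connection attribute is shown only for the case where you do need to disable direct path:

aws dms create-endpoint \
  --endpoint-identifier oracle-target \
  --endpoint-type target \
  --engine-name oracle \
  --server-name 10.0.1.25 \
  --port 1521 \
  --database-name <SID> \
  --username SCHEMA \
  --password '<PASSWORD>' \
  --extra-connection-attributes "useDirectPathFullLoad=false"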

Tasks

Right. You are finally ready to start debugging replication. Good luck and godspeed.

Don’t use the AWS Console for creating or modifying tasks. Use the CLI if you’re doing this manually. Half the options are hidden in the console.

I highly recommend that you do this schema by schema, with a separate task for each schema you need to migrate. It will make debugging a lot easier.

Also note: global stored procs and triggers aren't coming with us. Drop these to SQL and manually create them after you finish the full load operation. Schema-level indexes, views, etc. should come with us, and table-level constraints will as well. If you are having issues with triggers etc., I would recommend that you again drop these to SQL and apply them before the full load, then disable the trigger and re-enable it after full load is complete. I have included some actions in the debugging section that will help.
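
For the "drop these to SQL" step, DBMS_METADATA does the heavy lifting; a sketch, run from SQL*Plus or SQL Developer, where the schema name is a placeholder:

-- Pull the DDL for procedures and triggers so they can be re-applied manually after full load.
set long 100000 pagesize 0
select dbms_metadata.get_ddl(object_type, object_name, owner)
from dba_objects
where owner = '<SCHEMA>'
and object_type in ('PROCEDURE', 'TRIGGER');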

What do you want to import?

You’ll need to construct a JSON file which has an include statement and an ordered list of the schema tables you want to import.

I don’t recommend using a wildcard (%) and then excluding the tables you don’t want. The reason: during debugging, if you have a problematic table you can just remove that section from your configuration, drop the table from the target and try again, or create a separate task just for that table while you debug the issue.

More information here: [https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html]

ImportTables.json

{
    "rules": [{
            "rule-type": "selection",
            "rule-id": "1", // unique
            "rule-name": "1", // unique 
            "object-locator": {
                "schema-name": "<SCHEMA>",
                "table-name": "TABLE_NAME_ACCEPTS_WILDCARDS_%"
            },
            "rule-action": "include"
        }
    ]
}

Task Config
Next you need to configure the task.

There are a few things we want to change from the defaults; otherwise, I will leave the investigation of each individual option up to you.

[https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TaskSettings.html]

  1. Only truncate tables that already exist, so we keep the metadata. We do this in case we had to create any tables in advance, because dropping a table removes its metadata and we just want to clear the contents.
  2. Logging — We want detailed debug logs, or at least debug logs. The defaults don’t give you a lot to work with.

TaskConfig.json

{
    "TargetMetadata": {
        "TargetSchema": "<SCHEMA>",      // the schema name we want to import to.
        "SupportLobs": true,
        "FullLobMode": true,
        "LobChunkSize": 64,
        "LimitedSizeLobMode": false,
        "LobMaxSize": 0,
        "LoadMaxFileSize": 0,
        "ParallelLoadThreads": 0,
        "ParallelLoadBufferSize": 0,
        "BatchApplyEnabled": true,
        "TaskRecoveryTableEnabled": true
    },
    "FullLoadSettings": {
        "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD", // this one. 
        "CreatePkAfterFullLoad": false,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 8,
        "TransactionConsistencyTimeout": 600,
        "CommitRate": 10000
    },
    "Logging": {                    // This is where we change the severity of the logging information. 
        "EnableLogging": true,
        "LogComponents": [{
                "Id": "SOURCE_UNLOAD",
                "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
            },
            {
                "Id": "TARGET_LOAD",
                "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
            },
            {
                "Id": "SOURCE_CAPTURE",
                "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
            },
            {
                "Id": "TARGET_APPLY",
                "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
            }, {
                "Id": "TASK_MANAGER",
                "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
            }
        ],
        "CloudWatchLogGroup": "",
        "CloudWatchLogStream": ""
    },
    "ControlTablesSettings": {
        "historyTimeslotInMinutes": 5,
        "ControlSchema": "",
        "HistoryTimeslotInMinutes": 5,
        "HistoryTableEnabled": true,
        "SuspendedTablesTableEnabled": true,
        "StatusTableEnabled": true
    },
    "StreamBufferSettings": {
        "StreamBufferCount": 3,
        "StreamBufferSizeInMB": 8,
        "CtrlStreamBufferSizeInMB": 5
    },
    "ChangeProcessingDdlHandlingPolicy": {
        "HandleSourceTableDropped": true,
        "HandleSourceTableTruncated": true,
        "HandleSourceTableAltered": true
    },
    "ErrorBehavior": {
        "DataErrorPolicy": "LOG_ERROR",
        "DataTruncationErrorPolicy": "LOG_ERROR",
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",
        "DataErrorEscalationCount": 0,
        "TableErrorPolicy": "SUSPEND_TABLE",
        "TableErrorEscalationPolicy": "STOP_TASK",
        "TableErrorEscalationCount": 0,
        "RecoverableErrorCount": -1,
        "RecoverableErrorInterval": 5,
        "RecoverableErrorThrottling": true,
        "RecoverableErrorThrottlingMax": 1800,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",
        "ApplyErrorInsertPolicy": "LOG_ERROR",
        "ApplyErrorUpdatePolicy": "LOG_ERROR",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",
        "ApplyErrorEscalationCount": 0,
        "ApplyErrorFailOnTruncationDdl": false,
        "FullLoadIgnoreConflicts": true,
        "FailOnTransactionConsistencyBreached": false,
        "FailOnNoTablesCaptured": false
    },
    "ChangeProcessingTuning": {
        "BatchApplyPreserveTransaction": true,
        "BatchApplyTimeoutMin": 1,
        "BatchApplyTimeoutMax": 30,
        "BatchApplyMemoryLimit": 500,
        "BatchSplitSize": 0,
        "MinTransactionSize": 1000,
        "CommitTimeout": 1,
        "MemoryLimitTotal": 1024,
        "MemoryKeepTime": 60,
        "StatementCacheSize": 50
    },
    "ValidationSettings": {
        "EnableValidation": true,
        "ValidationMode": "ROW_LEVEL",
        "ThreadCount": 5,
        "PartitionSize": 10000,
        "FailureMaxCount": 10000,
        "RecordFailureDelayInMinutes": 5,
        "RecordSuspendDelayInMinutes": 30,
        "MaxKeyColumnSize": 8096,
        "TableFailureMaxCount": 1000,
        "ValidationOnly": false,
        "HandleCollationDiff": false,
        "RecordFailureDelayLimitInMinutes": 0
    }
}

Now you can create and run the task. (Strip the // comments from the JSON files first; the CLI expects plain JSON.) [https://docs.aws.amazon.com/cli/latest/reference/dms/create-replication-task.html]

aws dms create-replication-task --replication-task-identifier <value> \
 --source-endpoint-arn <value> \
 --target-endpoint-arn <value> \
 --replication-instance-arn <value> \
 --migration-type full-load-and-cdc \
 --table-mappings file://ImportTables.json \
 --replication-task-settings file://TaskConfig.json
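
Note that create-replication-task does not start the task; once it reports ready, kick it off separately (the ARN is a placeholder):

aws dms start-replication-task \
  --replication-task-arn <REPLICATION_TASK_ARN> \
  --start-replication-task-type start-replication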

Debugging
If you followed the above initial setup, configuration issues should be minimal.

CloudWatch logs are enabled for DMS, and in the setup above we configured them at detailed debug level; this should give you an accurate error message in CloudWatch when a table import misbehaves.

Unfortunately this is where things begin to fall into DBA territory… but essentially: read the logs, weaponise Google, and get your DBA to help you out. It's not impossible, and in nonproduction you basically get unlimited retries.

Suggestions

  • Prefer multiple smaller replication tasks over 1 all-inclusive replication task. This will increase your throughput (so be mindful of any resource constraints), and it makes debugging a lot easier.
  • I used 3 tasks, 1 for the majority of tables and 2 smaller jobs for misc tables or tables that were identified as troublesome during our nonproduction DMS migrations. Order your imports.
  • If you have a table that isn’t importing properly due to a constraint or trigger, disable those on the target during full load and then once that is complete, enable them for CDC replication.
  • Constraints can be turned on and off with the anonymous block below.
-- Disables all user constraints (foreign keys first), optionally truncates tables, then re-enables them (primary keys first).
declare
begin
for c1 in (select y1.table_name, y1.constraint_name from user_constraints y1, user_tables x1 where x1.table_name = y1.table_name order by y1.r_constraint_name nulls last) loop
    begin
        dbms_output.put_line('alter table '||c1.table_name||' disable constraint '||c1.constraint_name || ';');
        execute immediate  ('alter table '||c1.table_name||' disable constraint '||c1.constraint_name);
    end;
end loop;

-- uncomment to truncate the table after disabling the constraint.
-- for t1 in (select table_name from user_tables) loop
--     begin
--         dbms_output.put_line('truncate table '||t1.table_name || ';');   
--         execute immediate ('truncate table '||t1.table_name);
--     end;
-- end loop;

for c2 in (select y2.table_name, y2.constraint_name from user_constraints y2, user_tables x2 where x2.table_name = y2.table_name order by y2.r_constraint_name nulls first) loop
    begin
        dbms_output.put_line('alter table '||c2.table_name||' enable constraint '||c2.constraint_name || ';');       
        execute immediate ('alter table '||c2.table_name||' enable constraint '||c2.constraint_name);
    end;
end loop;
end;
/

Disable/Enable Triggers

CREATE OR REPLACE PROCEDURE ALTER_ALL_TRIGGERS(status VARCHAR2) IS
  -- One ALTER TRIGGER statement per trigger owned by the current user.
  CURSOR c_tr IS SELECT 'ALTER TRIGGER ' || trigger_name AS stmnt FROM user_triggers;
BEGIN
  IF status NOT IN ('ENABLE', 'enable', 'DISABLE', 'disable') THEN
    DBMS_OUTPUT.PUT_LINE('ONLY ''ENABLE''/''DISABLE'' ACCEPTED AS PARAMETERS');
    RAISE VALUE_ERROR;
  END IF;
  FOR tr IN c_tr LOOP
    EXECUTE IMMEDIATE tr.stmnt || ' ' || status;
  END LOOP;
END;
/
EXEC ALTER_ALL_TRIGGERS('DISABLE');
EXEC ALTER_ALL_TRIGGERS('ENABLE');
  • Use the table statistics tab to check the progress of the DMS task. From here you can also drop a table and reload its data; this can be a useful tool for small tables where validation fails and you make a change but don't want to restart the whole task (see the CLI sketch after this list).
  • Read the logs. They will often write out the SQL that failed which you can then replay to debug.
  • Finally, if all else is failing, remember that dropping an entire table to SQL and manually importing it isn't the end of the world (depending, of course, on the number of rows). Just exclude it from your ImportTables.json and move on.
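
The per-table reload mentioned above is also available from the CLI; a sketch with placeholder values, where the key names follow the DMS API's TableToReload structure:

aws dms reload-tables \
  --replication-task-arn <REPLICATION_TASK_ARN> \
  --tables-to-reload '[{"SchemaName": "<SCHEMA>", "TableName": "<TABLE_NAME>"}]'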

Validation

After a successful full load, once CDC has kicked off, your tasks will show a status of "Load complete, replication ongoing".

AWS DMS runs its own validation to ensure that the tables are aligned; when it finds a mismatch, it will throw a status of "TableError" in table statistics and "Error" on the Tasks screen.

It is now time to get the Application team and DBA to check the schema and table data. You can also run your own validation steps, write some SQL and do a spot check across both databases. The simplest test is of course, if the application works and passes regression testing on the new database.
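
For a quick spot check of your own, even simple row counts catch most gross mismatches; run the same queries against both source and target (<SCHEMA> and the table name are placeholders):

-- Approximate comparison from optimizer statistics (fast, but only as fresh as your stats).
select table_name, num_rows from all_tables where owner = '<SCHEMA>' order by table_name;

-- Exact count for any table that looks suspicious.
select count(*) from <SCHEMA>.<TABLE_NAME>;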
