SQL Loader



Data Loading Methods :
Several Methods are available for Loading Data into Tables in an Oracle Database.
a)  Direct Load ( Path ) INSERT :
Direct Load INSERT can be used to Copy Data from One Table to Another within the Same Database. It Speeds Up the Insert Operation, bypassing the Database Buffer Cache & Writing Data directly into the Datafiles.
b )  SQL*Loader :
SQL*Loader loads Data from External Files into Tables of an Oracle Database.
c )  Export & Import

A)SQL*Loader Features :

1)  SQL*Loader can Use  One or  More Input Files
2)  Data can be Loaded into Several Tables in One Run.
3)  Several Input Records can be Combined into One Logical Record for Loading.
4)  Input Fields can be of  Fixed Or Variable Lengths
5)  Input Data can be in any Format : Character , Binary , Date , Packed Decimal , & Zoned Decimal.
6)  Data can be Loaded from Different Type of Media : Disk , Tape , or  Named Pipes
7)  Options are available  To Replace  OR  To Append to Existing Data in the Tables.
8)  SQL Functions can be Applied on the Input Data before the Row is Stored in the Database.
9)  Column Values can be Auto Generated based on Rules.
     Eg :  Sequential Key value can be Generated & Stored in a Column.
10) Data can be Loaded Directly into the Table ( using Direct Path-Load ), bypassing the Database Buffer Cache.

B)Loading Methods in SQL*Loader :

SQL*Loader provides 2 Methods for Loading Data :
1)  Conventional Path
2)  Direct Path

1)  Conventional Path Load :
During a Conventional Path Load operation, Oracle Execute SQL INSERT Statement to Load the Data.
During a Conventional Path Load operation, Input Records are Parsed based on Field Specifications, And an Array of Records is Built & Inserted into the Table specified in Control File.
Records that do not confirm to the Field Specifications are Rejected  & Records that do not satisfy the Selection Criteria are Discarded.
Conventional Path Load can be use to load Data into Both Clustered & Un-Clustered Tables.
Redo Log Generation is Controlled by the Logging Mode for the Table that is being Loaded.

2)  Direct Path Load :
Direct Path Load builds Blocks of Data in Memory & Saves these Blocks Directly into the Database Table being Loaded.
Direct Path Loads use the Field Specifications to Build whole Oracle Blocks of Data & Write the Blocks Directly to Oracle Datafiles.
Redo Log Entries are Not Generated Unless the Database is in ARCHIVELOG Mode.
Direct Path Load bypasses the Database Buffer Cache, And Accesses the SGA only for Extent Management & Adjustment of the High-Water Mark.
Direct Path Load is generally Faster than Conventional Path Load, But cannot be used in All situations.

Note : catldr.sql Script , supplied by Oracle , Creates Views that are used by Direct Path Load.
It is Automatically Invoked when catalog.sql Script is Run.

1) Method of Saving Data :
Conventional Path Load Uses SQL Processing & COMMITs for Saving Data.
Direct Path Load Uses Data Saves to Write Blocks of Data to Oracle Datafiles.
Indexes are not Updated at each Data Save.
2) Logging Changes :
Conventional Path Load generates Redo Log Entries just as any DML Statement.
In Direct Load Load, Redo Log Entries Are not Generated if :
-- Database is in NOARCHIVELOG Mode
-- Database is in ARCHIVELOG Mode, but Logging is Disabled.




3) Enforcing Constraints :
During Conventional Path Load, All Constraints that are Enabled are enforced.
During Direct Path Load, only UNIQUE, NOT NULL & PRIMARY KEY constraints are enforced, FOREIGN & CHECK are Not enforced.
4) Clustered Table :
Clustered Tables can be Loaded using Conventional Path Loads Only.
Direct Path Loads cannot be Used to Load Rows into Clustered Tables.
5) Locking :
While Conventional Path Load is in progress, other Users can make Changes to Tables that are being Loaded.
During Direct Path Load, Oracle Locks the Tables & Indexes at the Start of the Load & Releases them when the Load is Complete. So, while Direct Path Load is in Progress Other Transactions can’t make Changes to the Tables that are being Loaded. The only Exception to this Rule is when Several Parallel Direct Load Sessions are used concurrently.





              
Note : SQL*Loader Utility & Oracle Server should be the Same Version.

C)SQL*Loader Files :
SQL*Loader Utility obtains the Input from Control File & Data File.
The Output from the SQL*Loader is then Inserted into the Oracle Database.
There are other Output Files Like Log File, Bad File, & Discard File
That SQL*Loader generates during the Load operation.

a)  Loader Control File :
Specifies the Input Format, Output Tables, & Optional Conditions that can be used to Load only Part of the Records Found in the Input Datafiles.

b)  Input Datafiles :
Contains the Data in the Format defined in the Control File 

c)  Parameter File :
This is an Optional File that can be used to Define the Command Line Parameters for the Load.

d)  Log File :
Log File is Created by SQL*Loader. It contains a Record of the Load

e)  Bad File :
Bad File is used by SQL*Loader utility to Write the Records that are Rejected during the Load.

f)  Discard File :
Discard File is a File that can be Created, if necessary, To Store All Records that did not Satisfy the Selection Criteria.


1 )  SQL*Loader Control File : 

SQL*Loader Control File is a Text File that contains DDL instructions.
SQL*Loader Control File tells SQL*Loader :
--Where to Find the Data for Loading ( i.e., Input Data File )
--The Format of the Data
--Names of the Oracle Objects into which the Data must be Loaded.
--How to configure Memory Management, Rejected Records, Discarded Records, and so on  as it Loads the Data.
--Optional Conditions that can be used to Load only Part of the Records Found in the Input Datafiles.
--How to Manipulate the Data.

**  Sample SQL*Loader Control File :
1         -- This is a Sample Control File
2        LOAD DATA        WHEN (57) = ‘.’
9        TRAILING NULLCOLS
10     (   hiredate SYSDATE, “TO_NUMBER(:sal,’$99,999.99’)”, comm INTEGER EXTERNAL ENCLOSED BY ‘(‘ AND ‘%’ “:comm * 100”)

                   
2)  LOAD DATA :
LOAD DATA statement tells SQL*Loader that this is the Beginning of a New Data Load.
If you were Continuing a Load that had been Interrupted in Progress, you would use CONTINUE LOAD DATA statement.

3)  INFILE  keyword :
INFILE  keyword specifies the Namne of a Input Datafile Containing Data that you want to Load.

4)  BADFILE  keyword :
BADFILE  keyword specifies the Name of a File into which Rejected Records are placed.

5)  DISCARDFILE  keyword :
DISCARDFILE  keyword specifies the Name of a File into which Discarded Records are placed.

6)  APPEND :
You can perform 4 Types of Load Operation:
(a)  INSERT : Loads Data into an Empty Table.
(b)  APPEND : Loads Data into a Table that already has Records.
(c)  REPLACE : Loads Data into a Table after Deleting the Existing Table Data.
(d)  TRUNCATE : Loads Data into a Table after Truncating the Table Data.

7)  INTO TABLE  clause :
INTO TABLE  clause Enables you to identify Tables, Fields, & Data-Types.
It Defines the Relationship between Records in the Datafile & Tables in the Database .

SQL*Loader organizes the Input Data into Physical Records, Accordint to the One of the following specified Record Format :
--  Combine a Fixed Number of Physical Records to form each Logical Record.
--  Combine Physical Records into Logical Records while a certain Condition is True.

a)  Using CONCATENATE to Assemble Logical Record :
CONCATENATE is used when SQL*Loader should always Add the Same Number of Physical Records to form One Logical Record.
CONCATENATE  integer
integer : It specifies the Number of Physical Records to Combine.

b )  Using CONTINUEIF to Assemble Logical Record :
CONTINUEIF must be Used if the Number of Physical Records to be continued varies. 
CONTINUEIF Keyword is Followed by a Condition that is Evaluated for each Physical Record as it is Read.
Eg : 2 Records might be Combined if there was a Pound Sign ( # ) in Character Position 80 of the First Record. If there was any other Character, Second Record would Not be Added to the First. 


2)  SQL*Loader Data Files  &  Input Data :


SQL*Loader Reads ( Obtains ) Data from One or More Datafiles specified in  the Control File. Data in the Datafile is Organized as Records.


Datafile can be One of following 3 Formats :
(a)  Fixed-Record Format
(b)  Variable-Record Format
(c)  Stream-Record Format


Default Format is the Stream-Record Format. Record Format can be Defined at the INFILE parameter of the Control File.Note : SQL*Loader Utility provides you the Facility of specifying the Data in the Control File itself. In such cases, you Must Define the INFILE parameter in the Control File as  INFILE * 


a)  Fixed-Record Format :
File is in Fixed-Record Format when all Records in a Datafile are the Same Byte Length. This Format is the Least Flexible, But Performance is Better than Variable Or Sream-Record Format.
Syntax :
INFILE   ‘<Datafile_Name>’  “fix n”
It specifies that SQL*Loader shoult interpret the particular Datafile as being in Fixed-Record Format where Every Record in n Byte Long.
Example :
--  Control File
        LOAD DATA
        INFILE ‘example.dat’ “fix 10”
        INTO TABLE example
        FIELDS TERMINATED BY  ‘,’
        ( col1 , col2 )
 --  DataFile  example.dat :
         0001,abcd
        0002,efgh
        0003,ijkl
This Example shows a Control File that specifies a Fixed-Record Format Datafile.
Datafile contains 4 Physical Records.
First Record is [0001, abcd], which is exactly 9 Bytes long (using a Single-Byte Character Set) & the Carriage Return is the Tenth Byte. 

b)  Variable-Record Format :
File is in Variable-Record Format when the Length of each Record in a Character Field is included at the Beginning of each Record in the Datafile.
This Format is more Flexible than Fixed-Record Format  & have Better Performance than Stream-Record Format.
Syntax :
          INFILE   ‘<Datafile_Name>’  “var n”
        Here,    n = Number of Bytes in the Record Length Field.
                 If n is not specified , SQL*Loader assumes a Length of 5.
           n > 40 Results in an Error.
Example :
        --  Control File
        LOAD DATA
        INFILE ‘example.dat’ “var 3”
        INTO TABLE example
        FIELDS TERMINATED BY  ‘,’ OPTIONALLY ENCLOSED BY ‘”’
        ( col1 char(5) , col2 char(7) )
--  DataFile  example.dat :
         009hello,cd,
        010world,im,
        012my,name is,
Here , Control File tells SQL*Loader to Look for Data in Fields are 3 Bytes long.
Datafile example.dat contains 3 Physical Records.
First is specified to be 009 ( that is, 9 ) Bytes Long, Second is 010 Bytes Long ( including 1 Character NewLine ), & Third is 012 Byte Long.

c)  Fixed-Record Format :
File is in Stream-Record Format when the Records are Not specified by Size; instead SQL*Loader forms Records by Scanning for the Record Terminator.
This Format is the Most Flexible, But Performance is Not good compared to Variable Or Sream-Record Format.
Syntax :
          INFILE   ‘<Datafile_Name>’  [“str Terminator_String”]
Terminator_String is specified by Either ‘char_string’  Or  X’hex_string
‘char_string’  : It is a String of Characters Enclosed in Single Or Double Quotation Marks 
X’hex_string’  : It is a Byte String in HexaDecimal Format.
Note : If no  Terminator_String is specified, it Defaults to the New-Line Character ( End-Of-Line Character ).
The New-Line Character is Connected to the Character Set of the Datafile.
Example :
--  Control File
        LOAD DATA
        INFILE ‘example.dat’ “str ‘|\n’”
        INTO TABLE example
        FIELDS TERMINATED BY  ‘,’ OPTIONALLY ENCLOSED BY ‘”’
        ( col1 char(5) , col2 char(7) )
--  DataFile  example.dat :
         hello,world,|
        james,bond,|
Here , Control File tells SQL*Loader to Load Data in Stream-Record Format where Terminator String is specified using a Character String ‘|\n’.
The Use of the BackSlash ( \ ) Character allows the Character String to specify the Non-Printable Linefeed Character.

3)  Bad File :
Bad File contains Records that were Rejected, Either  by SQL*Loader  OR  by the Oracle Database.
(a)  SQL*Loader Rejects : 
SQL*Loader Rejects Records when the specified Input Format is Invalid. Rejected Records are placed in the Bad File.
e.g., If the Second Enclosure Delimiter is Missing Or If a Delimited Field exceeds its Maximum Length, then SQL*Loader rejects the Record.  
(b)  Oracle Rejects :
Records that are Accepted by SQL*Loader for processing, Are forwarded to the Oracle Database for Insertion.
If Oracle determines that the Row is Valid, then the Row is Inserted into the Database.
If the Row ( Record ) is termed as Invalid, due to Inconsistent Data-Types Or Constraint Violations on the Database Objects, then these Records are stored in the Bad File.
Note : Bad File is specified Using Command Prompt Parameter BAD  Or in the Control File . 
The File is not Created if there are no Bad Records, & If the File already Exists, then it is OverWritten with the Same Name.
(4)  Discard File :
As SQL*Loader Executes , it may Create a Discard File.
This File is Created only if you have specified that a Discard File should be Enabled.
Discard File contains Records that do not meet the Record-Selection Criteria specified WHEN clauses in the Control File. 
Hence, Discard File contains Records that were Not Inserted into any Table in the Database. 
(5)  Log File :
Log File Stores the Detailed Summary of the Load Operation, Including the Description of any Error that occure.
It is Created when SQL*Loader Begins the Execution.
If the Log File Can’t be Created, then the Execution is Terminated.
**  Log File Contents :
(a)  Header Information Section :
--  Date of the Run
--  Software Version Number
(b)  Global Information Section :
--  Names of All Input / Output Files
--  Echo if Command-Line Arguments
--  Continuation Character Specification 
(c)  Table Information Section :
Table Information Section provides the following entries for each Table Loaded:
--  Table Name
--  Load Condition, if any.
i.e., Whether All Records were Loaded  Or Only those meeting the WHEN-clause Criteria.
--  INSERT,  APPEND,  or  REPLACE  specification. 
It also contains following Column Information:
--  If found in Datafile,  the Position, Length, Data Type, & Delimiter.
--  If specified,  RECNUM, SEQUENCE, CONSTANT, or  EXPRESSION.
--  If specified,  DEFAULT  or  NULLIF.
(d)  Table Load Information Section :
Table Load Information Section provides the following entries for each Table that was Loaded :
--  Number of Rows Loaded.
--  Number of Rows that Qualified for Loading But were Rejected due to Data Errors.
--  Number of Rows that were Discarded because they do Not meet all of WHEN clause Condition.
--  Number of Rows whose relevant Fields were all NULL.
(e)  Datafile Information Section :
Datafile Information Section appears only for Datafiles with Data Errors & Provides the following entries :
--  SQL*Loader & Oracle Data Record Error .
--  Records Discarded .
(f)  Summary Statistics Section :
Summary Statistics Section displays following Data :
--  Amount of Space Used :
--  Cumulative Load Statistics; that is, For all Datafiles, the Number of Records that were Skipped, Read, or Rejected.
(g)  Additional Statistics :
Following Statistics are Logged when a Table is Loaded :
--  Direct Path Load of a Partitioned Table Reports per-partition statistics
--  Conventional-Path Load cannot Report per-partition statistics
(6)  Parameter File :
Parameter File stores All the Commonly used Command-Line Parameters.
i.e., You can specify the following at Command Line For performing a Load using the SQL*Loader : 
       C:\>  SQLLDR  PARFILE=sample.par 
In turn , the Parameter File  sample.par would contain the Command-Line Parameter as follows :
USERID=scott/tiger
CONTROL=sample.ctl
LOG=sample.log
DIRECT=true
ERRORS=99
BINDSIZE=100000
**  List of Command-Line Parameters :
(1)  USERID :
This Parameter enables you to specify the Oracle Username & Password.
(2)  CONTROL :
This Parameter Enables you to specify the Control Filename.
(3)  LOG : 
This Parameter Enables you to specify the Log Filename.
Default would be the Control File Name with the .log as the Extension.
(4)  BAD : 
This Parameter Enables you to specify the BAD File.
Default would be the Control File Name with the .bad as the Extension. 
(5)  DICARD : 
This Parameter Enables you to specify the File where            Oracle stores all the Records that Fail to Satisfy the WHEN clause.
(6)  DISCARDMAX : 
You can specify the Maximum Number of Discard to Allow. By Default , All Discard would be Allowed.
(7)  DATA : 
This Parameter specifies the Input Datafile Names. If the Filename is Not specified then Oracle looks for the File With the Control File Name with the .dat as the Extension.
(8)  SKIP : 
You can specifies the Number of Records to Skip.
This Parameter is Typically used to Continue the Load Operation after the Failure.
You can also Skip the Initial Comments & Headings specified in the Data Files.
(9)  LOAD :
You can specifies the Number of Records to be Loaded from the Data File. 
(10) ERRORS :
You can specifies the Maximum Number of Bad ( Rejected ) Records to Allow.
(11) ROWS :
This Parameter Enables you to specify the Number of Rows In the Array to be built before each Insert for Conventional Loads.
While performing a Direct-Path Load, this Defines the Approximate Number of Rows Read from Input for each Data Save.
(12) BINDSIZE :
This Parameter specifies the Maximum Size ( Bytes ) of the Bind Array.
The Size Defined here would Override the Default Size (which is System Dependent ) & any Size Determined by ROWS.
(13) DIRECT :
If this Parameter is Set to TRUE, SQL*Loader then uses the Direct-Path Load.
If this Parameter is Set to FALSE, SQL*Loader then uses the Conventional-Path Load.
(14) PARFILE :
This Parameter specifis the Name of the Parameter File where All the SQL*Loader Parameters are Stored.
(15) FILE :
This Parameter specifis the Database File to Allocate Extents from. It is used only for Parallel Loads. 
(16) COLUMNARRAY ROWS :
This Parameter enables you to specify the Number of Rows to Allocate for Direct Path Column Arrays. 
(17 ) EXTERNAL_TABLE :
This Parameter instructs the Utility whether to Load Data using the External Tables Option.
Possible values are NOT_USED, GENERATE_ONLY, & EXECUTE.
(18 ) MULTITHREADING :
This Parameter can be specified while performing a Load operation on a System with More than One CPU.
Default value is TRUE on a Multi-CPU System & FALSE on Single-CPU System.
(19) PARALLEL :
This Parameter enables you to specify whether Direct Loads can operate in Multiple Concurrent Sessions to Load Data into the same Table.
(20) READSIZE :
This Parameter specify the Size ( Bytes ) of the Read Buffer.
You can specify this in Conventional-Path Loads where more Data can be Read before a Commit is required.
(21) RESUMABLE :
This Parameter is used to Enable Or Disable Resumable Space Allocation. 
You can only use the RESUMABLE_NAME & RESUMABLE_TIMEOUT, if this Parameter is Set to TRUE.
(22) RESUMABLE_NAME  :
This value is a User-Defined Text String that is Inserted in either  USER_RESUMABLE  or  DBA_RESUMABLE  View to help you identify a specific Resumable Statement that has been Suspended. 
(23) RESUMABLE_TIMEOUT :
The value of this Parameter specifies the Time Period during which an Error must be Fixed , else the Execution of the Statement would be Aborted. 
Default is 7200 Seconds. 
(24 ) SILENT :
This Parameter Enables you to Suppress the Header & Feedback Messages that normally appear during Command-Line Execution.
Possible Values are : HEADER, FEEDBACK, ERRORS, DISCARDS, PARTITIONS, ALL.
(25) SKIP_INDEX_MAINTENANCE :
This Parameter, when Set to TRUE, Stops Index Maintenance for Direct-Path Loads, but does Not Apply to Conventional-Path Loads. 
(26) SKIP_UNUSABLE_INDEXES :
This Parameter, when Set to TRUE, allows SQL*Loader To Load a Table with Indexes that are Unusable State prior to Beginning the Load. 
(27) STREAMSIZE :
This Parameter specifies the Size in Bytes for Direct Path Streams.

No comments:

Post a Comment