Friday, August 10, 2012

MSSQL Split, Pad Zeroes, Get Alphabets, and Substring

Got a request to create a data table get from some existing data to some required data formatting.
The Source data is like 'C10 L1_B21773197D.xls', 'EV1 L1_B21773602F.xls', 'EV116 L1_B21773659G.xls', etc.
The following were the requirements.

CategoryCode
No
NVARCHAR(10)
C
Contained in legacy data file name
Sample legacy data file name : “C10 L1_B21773197D
1.       Alphabets prior to Numeric characters)
FolderNo
No
VARCHAR(10)
C001
Contained in legacy data file name
Sample legacy data file name : “C10 L1_B21773197D
1.       convert numeric characters prior to “white space” to a 3 digit number (e.g. “1” --> “001”)
2.       Append 3 digit number to Category Code
VolumeNo
No
VARCHAR(10)
C001L01
Contained in legacy data file name
Sample legacy data file name : “C10 L1_B21773197D
1.       Extract the 2nd occurrence of an Alphabets in File Name after “white Space” (It will be value L or V)
2.       convert the numeric characters prior to “_”to a 2 digit number (e.g. “1” --> “01”)
3.       Append the 2 digit number (step2) to Alphabet (step1) to get volume code like L01
4.       Append volume code to Folder No to get Volume No like C001L01
VolumeBarcode
No
VARCHAR(12)
B21773203C
Contained in legacy data file name
Sample legacy data file name : “C10 L1_B21773197D
·         The 10 Alphanumeric characters after “_”

I created a temporary table with required values and get the desired data on another SELECT.
Follows the SQL statements.

SELECT SUBSTRING([Filename],1,(len([Filename]) - 4)) as TFilename  --This line will remove the file extension 
 ,dbo.f_get_alphabets(PARSENAME(REPLACE(SUBSTRING([Filename],1,(len([Filename]) - 4)), ' ', '.'), 2)) as Code -- This is to get only the alphabets from first portion
,dbo.f_get_numbers(PARSENAME(REPLACE(SUBSTRING([Filename],1,(len([Filename]) - 4)), ' ', '.'), 2)) AS TCat --This is to get only the numbers from first portion
,PARSENAME(REPLACE(SUBSTRING([Filename],1,(len([Filename]) - 4)), ' ', '.'), 1) as Vol -- This part is to get the second portion
into #TempLD -- Store data into temp table
FROM [dbo].[SourceTable]
  
  SELECT Code as CategoryCode,
  Code + RIGHT('000'+ TCat, 3) as FolderNo, -- Padding Zeroes infront to make it 3 digit number with zeroes in front
  Code + RIGHT('000'+ TCat, 3) + LEFT(PARSENAME(REPLACE(Vol, '_', '.'), 2),1)+RIGHT('00'+ RIGHT(PARSENAME(REPLACE(Vol, '_', '.'), 2),1),2) as VolumeNo, -- Padded numbers and Alphabets concatenated to make the desired string
  PARSENAME(REPLACE(Vol, '_', '.'), 1) as VolumeBarcode, -- Split the portion behind _ character
  from #TempLD

Follow is the function used on the above query. For numbers, the IF condition would be changed as @code between 48 and 57.

CREATE FUNCTION [dbo].[f_get_alphabets](
        @string                varchar(255)
) RETURNS varchar(255)
BEGIN
    
    DECLARE @newstring varchar(255)
    DECLARE @num int
    DECLARE @code int
    SELECT @newstring = ''
    SELECT @num = 1

    WHILE @num < LEN(@string)+1
    BEGIN
        SET @code = ASCII(SUBSTRING(@string, @num, 1))
        --- ascii: 48-57 (numbers 0-9)
        --- ascii: 65-90 (uppercase letters A-Z)
        --- ascii: 97-122 (lowercase letters a-z)
        IF (@code between 65 and 90) or (@code between 97 and 122)
        BEGIN
            SET @newstring = @newstring + SUBSTRING(@string, @num, 1)
        END
        SET @num = @num + 1
    END
    RETURN @newstring
END