Tuesday, March 23, 2010

How to Remove Duplicate Rows from a table in SQL Server 2005

Sometimes it may happen. There will be a table without primary key. There are many reasons to get duplicate rows in table. Now we need to remove the duplicate rows. How to delete duplicate rows without Primary Key. In SQL Server 2005, we can use the following Common Table Expression (CTE) query to delete duplicate rows.

Create Table Employees
(
EmployeeID varchar(15),
EmployeeName varchar(30)
)

Insert into Employees(EmployeeID,EmployeeName) values ('TCS/IT/0001','Ram')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/IT/0001','Ram')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/IT/0002','Ragul')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/IT/0002','Ragul')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/ENG/0001','Joseph')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/ENG/0001','Joseph')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/ENG/0001','Joseph')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/ENG/0003','Abdul')
Insert into Employees(EmployeeID,EmployeeName) values ('TCS/ENG/0003','Abdul')

Before deleting duplicates.

select * from Employees

EmployeeID EmployeeName
TCS/IT/0001 Ram
TCS/IT/0001 Ram
TCS/IT/0002 Ragul
TCS/IT/0002 Ragul
TCS/ENG/0001 Joseph
TCS/ENG/0001 Joseph
TCS/ENG/0001 Joseph
TCS/ENG/0003 Abdul
TCS/ENG/0003 Abdul

Syntax to delete duplicate rows.

;with DelDuplicate as
(
select
row_number() over (partition by EmployeeID order by EmployeeID ) EmpID,
EmployeeID ,
EmployeeName
from
Employees
)
Delete from DelDuplicate where EmpID > 1

After Executing this query, we achieved our goal.

Please find the below result.

select * from Employees

EmployeeID EmployeeName
TCS/IT/0001 Ram
TCS/IT/0002 Ragul
TCS/ENG/0001 Joseph
TCS/ENG/0003 Abdul

Saturday, February 27, 2010

Computed Columns in SQL Server 2005

Computed Column is one of the Hidden Gem of SQL Server 2005. When doing tedious calculation for one column , we can create computed column for that column.
Create table Employees
(
EmployeeID int primary key identity(1,1),
EmpName varchar(60),
Salary Decimal(14,2),
Bonus as Salary*12*0.0833 persisted
)
We can create index for that computed column. It will give the more performance.
create index Idx_Employees_Bonus on Employees(Bonus)
For already existing table, the following syntax can be used.
Alter table Employees add PF as Salary*0.12 persisted

Sunday, January 24, 2010

Recursive Queries using Common Table Expressions (CTE) in SQL Server 2005

A common table expression (CTE) provides significant advantage of being able to reference itself. So It is used to write recursive queries. In earlier versions of SQL Server, a recursive query usually requires using temporary tables, cursors, and logic to control the flow of the recursive steps.

Create table Employees
(
EmployeeID int primary key identity(1,1),
EmpName varchar(50),
Designation varchar(25),
ReportingTo int references Employees (EmployeeID)
)

Insert into Employees(EmpName,Designation,ReportingTo) values ('Raj','Project Manager',NULL)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Ragul','Project Leader',1)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Ramesh','Project Leader',1)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Prakash','Team Leader',2)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Karthik','Team Leader',2)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Arun','Team Leader',3)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Kannan','Team Leader',3)
Insert into Employees(EmpName,Designation,ReportingTo) values ('David','Developer',4)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Javid','Developer',4)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Kishore','Developer',4)
Insert into Employees(EmpName,Designation,ReportingTo) values ('Rishi','Developer',4)

Top to Bottom Search :

This query is used to search the records from top level to bottom level.

With Emp(EmployeeID,EmpName,Designation,ReportingTo,Level) as
(
Select E.EmployeeID,E.EmpName,E.Designation,E.ReportingTo ,0 Level
from Employees E where EmpName='Ragul' --- Anchor member definition
Union All
Select E.EmployeeID,E.EmpName,E.Designation,E.ReportingTo ,Level+1 Level
from Employees E inner join Emp CE on CE.EmployeeID=E.ReportingTo --Recursive member definition
) Select E.EmpName,E.Designation,E.Level from Emp E option(maxrecursion 0)

EmpName Designation Level
Ragul Project Leader 0
Prakash Team Leader 1
Karthik Team Leader 1
David Developer 2
Javid Developer 2
Kishore Developer 2
Rishi Developer 2

Bottom to Top Search :

This query is used to search the records from bottom level to top level.

With Emp(EmployeeID,EmpName,Designation,ReportingTo,Level) as
(
Select E.EmployeeID,E.EmpName,E.Designation,E.ReportingTo , 0 Level
from Employees E where EmpName='Rishi' --- Anchor member definition
Union All
Select E.EmployeeID,E.EmpName,E.Designation,E.ReportingTo , Level+1 Level
from Employees E inner join Emp CE on E.EmployeeID=CE.ReportingTo --Recursive member definition
) Select E.EmpName,E.Designation,E.Level from Emp E option(maxrecursion 0)


EmpName Designation Level
Rishi Developer 0
Prakash Team Leader 1
Ragul Project Leader 2
Raj Project Manager 3

Thursday, December 10, 2009

Crosstab Queries using PIVOT in Sql Server 2005

SQL Server 2005 introduces this new feature. The PIVOT operator is useful for generating cross tab reports. It is rotating rows into columns.
For Example, We are storing EmployeeName, MonthOfSales , SalesAmount in a table. If We need to analyse the monthwise sales for each and every employee, we can use the crosstab query by using PIVOT and UNPIVOT.

Create table EmpTable
(

EmployeeID int primary key identity(1,1),
EmployeeName varchar(60),
MonthOfSales varchar(15),
SalesAmount Decimal(14,2)
)

Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Success','Jan','5000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Failure','Jan','2000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Average','Jan','3500.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Success','Jan','6000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Failure','Jan','2000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Average','Jan','4000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Success','Jan','9000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Failure','Jan','3000.00')
Insert into EmpTable(EmployeeName,MonthOfSales ,SalesAmount) values ('Average','Jan','6000.00')

Select EmployeeName, MonthOfSales , SalesAmount from EmpTable

EmployeeName MonthOfSales SalesAmount
Success Jan 5000.00
Failure Jan 2000.00
Average Jan 3500.00
Success Feb 6000.00
Failure Feb 2000.00
Average Feb 4000.00
Success Mar 9000.00
Failure Mar 3000.00
Average Mar 6000.00

If MonthOfsales column is fixed means, We can use this query

select s.*
from
( select EmployeeName,MonthOfSales ,SalesAmount
from EmpTable
) p
pivot
(
sum(p.SalesAmount) for p.MonthOfSales in ([Jan],[Feb],[Mar])
) s


If we don’t know the column values, We can have the column values dynamically. To achieve this, we are able to use the following dynamic sql query.

Declare @Str nvarchar(max)
Set @Str=N'select s.*
from
( select EmployeeName,MonthOfSales ,SalesAmount
from EmpTable
) p
pivot
(
sum(p.SalesAmount) for p.MonthOfSales in (' +
stuff((select distinct ' ,' + quotename(MonthofSales)
from EmpTable for XML path('')),1,2,'') + ')
) s'
Exec(@Str)

The Results of these queries will be like this.


EmployeeName Jan Feb Mar
Average 3500.00 4000.00 6000.00
Failure 2000.00 2000.00 3000.00
Success 5000.00 6000.00 9000.00

Friday, August 7, 2009

DateTime DataType in SQL Server 2005

When we are using DateTime DataType in SQL Server 2005, values are stored as Date with time upto milliseconds. If we are selecting records between ranges, it will display some inconsistent data. So We can use the following format.

Dateadd(d,datediff(d,0,columnname),0)

For large tables, it will reduce performance.

So we can add computed column for this column and we should create index for this column.