Contents

Find columns with special characters

Problem

/140919-im_special.jpg

Imagine a database where you want to know how many times a certain column contains a special character. The reason I had to do this was that I needed to know the impact of an export interface that wasn’t working properly with special characters.

But as lazy as I am when it comes to doing repetitive tasks, I had to find a way could easily get all columns with special characters in a data set without a lot of hassle.

The solution

Fortunately for me, I can use the internal INFORMATION views of a database in SQL Server to get all the columns with their table, schema, and type of data.

Because this database only used varchar columns for saving text data I only had to search for this data type. You can easily extend the selection by adding the necessary types like TEXT, NVARCHAR, etc.

To create the script I had to accomplish the following things:

  1. Retrieve all the columns that could contain text
  2. Set up a pattern to check the columns with
  3. Loop through the columns
  4. In every loop save the results
  5. Make up a data set of the data retrieved

I eventually got to this script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
-- Replace the mydatabase with the name of your database 
USE [mydatabase] 

-- Delete possible present temporary tables 
IF OBJECT_ID('tempdb..#columns') IS NOT NULL 
  DROP TABLE #columns 

IF OBJECT_ID('tempdb..#results') IS NOT NULL 
  DROP TABLE #results 

IF OBJECT_ID('tempdb..#rows') IS NOT NULL 
  DROP TABLE #rows 

-- Create a table for holding all the columns 
CREATE TABLE #columns 
  ( 
     id            INT NOT NULL PRIMARY KEY IDENTITY(1, 1), 
     database_name VARCHAR(100), 
     schema_name   VARCHAR(50), 
     table_name    VARCHAR(200), 
     column_name   VARCHAR(200), 
     position      INT 
  ) 

-- Create a table for holding the results 
CREATE TABLE #results 
  ( 
     database_name VARCHAR(100), 
     schema_name   VARCHAR(50), 
     table_name    VARCHAR(200), 
     column_name   VARCHAR(200), 
     count         INT 
  ) 

-- Declare the variables needed 
DECLARE @max INT, 
        @cnt INT 
DECLARE @sql     NVARCHAR(max), 
        @pattern VARCHAR(200) 

-- Get all the columns 
INSERT INTO #columns 
SELECT table_catalog, 
       table_schema, 
       table_name, 
       column_name, 
       ordinal_position 
FROM   information_schema.columns 
WHERE  data_type = 'varchar' 
ORDER  BY table_catalog, 
          table_name, 
          ordinal_position 

-- Get the minimum id and max that are needed for the loop 
SELECT @cnt = Min(id) 
FROM   #columns 

SELECT @max = Max(id) 
FROM   #columns 

--Setup the pattern 
SELECT @pattern = 'ëïöüäéíóáèìòàËÏÖÄÉÍÓÁÈÌÒÀ' 

-- Make sure the @sql variable is empty 
SELECT @sql = '' 

-- Start loop 
WHILE( @cnt < ��@max ) 
  BEGIN 
      -- Setup the sql statement 
      SELECT @sql = 'INSERT INTO #results SELECT ''' + database_name + ''', ''' 
                    + schema_name + ''', ''' + table_name + ''', ''' 
                    + column_name + ''', COUNT(1) AS count FROM [' + database_name + '].[' 
                    + schema_name + '].[' + table_name + '] WHERE [' 
                    + column_name 
                    + '] COLLATE Latin1_General_BIN2 LIKE ''%[' 
                    + @pattern + ']%'' ESCAPE ''E'' ' 
      FROM   #columns 
      WHERE  id = @cnt 

      -- Execute the sql statement 
      EXEC sp_executesql 
        @sql 

      -- Increase the counter 
      SELECT @cnt += 1 
  END 

-- Select the data 
SELECT results.database_name, 
       results.schema_name, 
       results.table_name, 
       results.column_name, 
       results.count AS 'count_results' 
FROM   #results results 
WHERE  results.count >= 1

This might take a while when you need to search through a lot of columns and records.

When the script you will get a result like the below:

/posts/2014/find-columns-with-special-characters/140919-results_specialcharacters.png

The pattern can contain any character that you want. I only searched for some of the most common special characters but any character will do. I hope you find this post helpful and I appreciate any comments.