Friday, October 28, 2005
Average mistake
Mathematics teaches average is sum of elements divided by number of the elements.
I.e.,
Average = SUM(elements) / NUMBER(elements).
Let us try to represent this into a T-SQL statement. SUM() function can be used to find the sum of elements and COUNT(*) function can be used for counting the elements. In the other hand we have AVG() used to find the average.
So, SUM(column) / COUNT(*) should be equal to AVG(column). But the reality is, they are not always equal. Let we try this T-SQL code,
-------- Begining of Code -------
CREATE TABLE #tbl_elements (element int)
GO
INSERT INTO #tbl_elements VALUES (5)
INSERT INTO #tbl_elements VALUES (10)
INSERT INTO #tbl_elements VALUES (NULL)
INSERT INTO #tbl_elements VALUES (15)
GO
SELECT
SUM(element)/COUNT(*) AS [Average through Formula],
AVG(element) AS [Average Through Function]
FROM #tbl_elements
GO
DROP TABLE #tbl_elements
GO
-------- End of Code -------
The result of the T-SQL statement is:
Average through Formula - 7
Average Through Function - 10
Why this difference???
The aggregate functions ignore NULL values when aggregating a column which contains both NULL and non-null values. COUNT(*) aggregate function is an exception. This function includes NULL values when it counts. If we need to exclude NULL values in the counting, specify the column name instead of ‘*’ in the function. That is COUNT(*) will count all elements including NULLs. However, COUNT(column) counts only the non-NULL values.
Now, let us re-write the T-SQL code as,
SELECT
SUM(element)/COUNT(element) AS [Average through Formula],
AVG(element) AS [Average Through Function]
FROM #tbl_elements
GO
We need to give extra care when we deal with NULL values to avoid this kind of mistakes.
I.e.,
Average = SUM(elements) / NUMBER(elements).
Let us try to represent this into a T-SQL statement. SUM() function can be used to find the sum of elements and COUNT(*) function can be used for counting the elements. In the other hand we have AVG() used to find the average.
So, SUM(column) / COUNT(*) should be equal to AVG(column). But the reality is, they are not always equal. Let we try this T-SQL code,
-------- Begining of Code -------
CREATE TABLE #tbl_elements (element int)
GO
INSERT INTO #tbl_elements VALUES (5)
INSERT INTO #tbl_elements VALUES (10)
INSERT INTO #tbl_elements VALUES (NULL)
INSERT INTO #tbl_elements VALUES (15)
GO
SELECT
SUM(element)/COUNT(*) AS [Average through Formula],
AVG(element) AS [Average Through Function]
FROM #tbl_elements
GO
DROP TABLE #tbl_elements
GO
-------- End of Code -------
The result of the T-SQL statement is:
Average through Formula - 7
Average Through Function - 10
Why this difference???
The aggregate functions ignore NULL values when aggregating a column which contains both NULL and non-null values. COUNT(*) aggregate function is an exception. This function includes NULL values when it counts. If we need to exclude NULL values in the counting, specify the column name instead of ‘*’ in the function. That is COUNT(*) will count all elements including NULLs. However, COUNT(column) counts only the non-NULL values.
Now, let us re-write the T-SQL code as,
SELECT
SUM(element)/COUNT(element) AS [Average through Formula],
AVG(element) AS [Average Through Function]
FROM #tbl_elements
GO
We need to give extra care when we deal with NULL values to avoid this kind of mistakes.