Friday, October 28, 2005

 

Average mistake

Mathematics teaches average is sum of elements divided by number of the elements.

I.e.,
Average = SUM(elements) / NUMBER(elements).

Let us try to represent this into a T-SQL statement. SUM() function can be used to find the sum of elements and COUNT(*) function can be used for counting the elements. In the other hand we have AVG() used to find the average.

So, SUM(column) / COUNT(*) should be equal to AVG(column). But the reality is, they are not always equal. Let we try this T-SQL code,

-------- Begining of Code -------
CREATE TABLE #tbl_elements (element int)
GO

INSERT INTO #tbl_elements VALUES (5)
INSERT INTO #tbl_elements VALUES (10)
INSERT INTO #tbl_elements VALUES (NULL)
INSERT INTO #tbl_elements VALUES (15)
GO

SELECT
SUM(element)/COUNT(*) AS [Average through Formula],
AVG(element) AS [Average Through Function]
FROM #tbl_elements
GO

DROP TABLE #tbl_elements
GO

-------- End of Code -------

The result of the T-SQL statement is:

Average through Formula - 7
Average Through Function - 10



Why this difference???

The aggregate functions ignore NULL values when aggregating a column which contains both NULL and non-null values. COUNT(*) aggregate function is an exception. This function includes NULL values when it counts. If we need to exclude NULL values in the counting, specify the column name instead of ‘*’ in the function. That is COUNT(*) will count all elements including NULLs. However, COUNT(column) counts only the non-NULL values.

Now, let us re-write the T-SQL code as,

SELECT
SUM(element)/COUNT(element) AS [Average through Formula],
AVG(element) AS [Average Through Function]
FROM #tbl_elements
GO


We need to give extra care when we deal with NULL values to avoid this kind of mistakes.

Comments: Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?