Anton Staykov's Blog: December 2007

Thursday, December 20, 2007

SQL, Joins, Unions and more .. continued II

Last night I decided to let MySQL think about that query ... It seemed to have completed calculations in the morning. Not surprisingly it has returned the very same data like did the MS SQL - The multiplication of the results. But it has taken 6604156 milliseconds (~6604 seconds) to obtain the result. The same result that MS SQL EXPRESS returned in 274709 milliseconds (~275 seconds).

No further comments.
Case closed.

Wednesday, December 19, 2007

SQL, Joins, Unions and more .. continued

Hi again.

Today I made a test - transferred the data into MS SQL Server 2005 Express SP 2.0.

Guess What ? :)

The situation is almost the same ! Wondering why the result was messed, the execution time was arround '274709' - I guess this is the "ms" - "duration" column in the MS SQL Server Profiler. Which is arround 5 minutes.
The mess I am talking about is the following:
The situation is that in of the tables I have about 120k records, which should be displayed, on the other one I have about 5k records. Insted of showing "120000" for "di_count", and "5000" for "ei_count", in the result I got '594163480' (which is the number of rows in one of the left joined tables multiplied by the number of rows in the other one) for each of them - the same figure for the two columns. What is that ?

I just want to make a select from one table, than join second table, then join third table to the FIRST one, then join the last table AGAIN to THE FIRST one. Why is it joining the result alltogether?

Anyway .. using UNION, VIEW and work with the VIEW solves the problem, but I was just wondering why this happens ...

I j ust can't help showing you these two screens:

For the MSSQL Profiler:

And for the MySQL (note: the execution is still not finished!):

I have no comment!

Tuesday, December 18, 2007

SQL, Joins, Unions and more

Ok, this is will be about MySQL once again.

I had that problem. Let's assume we have 3 tables:
1. smetki
2. di_invoices
3. dk_invoices
4. ei_invoices
Each of the XX_invoices has a "foreign_key" to PK of "smetki". I am using MyISAM.
If I perform the following 3 queries separately:
---------------------------------------------------------
select
s.smetka_id, s.created_on,
count(di.invoice_id) as di_count
from
smetki as s
left outer join di_invoices as di on (s.smetka_id = di.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id
-----------------------------------------------------------
select
s.smetka_id, s.created_on,
count(dk.invoice_id) as dk_count
from
smetki as s
left outer join dk_invoices as dk on (s.smetka_id = dk.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id
------------------------------------------------------------
select
s.smetka_id, s.created_on,
count(ei.invoice_id) as ei_count
from
smetki as s
left outer join ei_invoices as ei on (s.smetka_id = ei.smetki_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id
------------------------------------------------------------
I have no problem, I have proper indexes and I got the results for less then 200 ms from each query.
But I need the 3 results into 1 single resultset so I decided to make a join like that:
select
s.smetka_id, s.created_on,
count(di.invoice_id) as di_count,
count(dk.invoice_id) as dk_count,
count(ei.invoice_id) as ei_count
from
smetki as s
left outer join di_invoices as di on (s.smetka_id = di.smetka_id)
left outer join dk_cor_invoices as dk on (s.smetka_id = dk.smetka_id)
left outer join ei_invoices as ei on (s.smetka_id = ei.smetki_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id
Now I have query which executes more than 15 minutes !! I couldn't wait to the end and killed it. There must be something wrong somewhere?
I sould notice that for the given period there are only 6 rows in the "smetki" table, and different number of rows in each xx_invoices table that correspond to the whole resultset from "smetki". I have to mention that there is one intersection between two of the XX_invoices tables where there are about 120k rows for one "smetka_id", and for the same "smetka_id" there are about 5k rows in the other.
Also - there are no other queries running by the time of measuring the performance.

What could be the solution. A friend of mines helped me. Make a UNION he said. So I did it like that:

(select
s.smetka_id, s.created_on,
count(di.invoice_id) as di_count, 0 as dk_count, 0 as ei_count
from
smetki as s
left outer join di_invoices as di on (s.smetka_id = di.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id)
UNION
(select
s.smetka_id, s.created_on,
0 as di_count, count(dk.invoice_id) as dk_count, 0 as ei_count
from
smetki as s
left outer join dk_invoices as dk on (s.smetka_id = dk.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id)
UNION
(select
s.smetka_id, s.created_on, 0 as di_count, 0 as ei_count,
count(ei.invoice_id) as ei_count
from
smetki as s
left outer join ei_invoices as ei on (s.smetka_id = ei.smetki_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id )