Tuesday, December 18, 2007

SQL, Joins, Unions and more

Ok, this is will be about MySQL once again.

I had that problem. Let's assume we have 3 tables:
1. smetki
2. di_invoices
3. dk_invoices
4. ei_invoices

Each of the XX_invoices has a "foreign_key" to PK of "smetki". I am using MyISAM.
If I perform the following 3 queries separately:
---------------------------------------------------------
select
s.smetka_id, s.created_on,
count(di.invoice_id) as di_count
from
smetki as s
left outer join di_invoices as di on (s.smetka_id = di.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id

-----------------------------------------------------------
select
s.smetka_id, s.created_on,
count(dk.invoice_id) as dk_count
from
smetki as s
left outer join dk_invoices as dk on (s.smetka_id = dk.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id

------------------------------------------------------------
select
s.smetka_id, s.created_on,
count(ei.invoice_id) as ei_count
from
smetki as s
left outer join ei_invoices as ei on (s.smetka_id = ei.smetki_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id
------------------------------------------------------------
I have no problem, I have proper indexes and I got the results for less then 200 ms from each query.
But I need the 3 results into 1 single resultset so I decided to make a join like that:
select
s.smetka_id, s.created_on,
count(di.invoice_id) as di_count,
count(dk.invoice_id) as dk_count,
count(ei.invoice_id) as ei_count
from
smetki as s
left outer join di_invoices as di on (s.smetka_id = di.smetka_id)
left outer join dk_cor_invoices as dk on (s.smetka_id = dk.smetka_id)
left outer join ei_invoices as ei on (s.smetka_id = ei.smetki_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id

Now I have query which executes more than 15 minutes !! I couldn't wait to the end and killed it. There must be something wrong somewhere?
I sould notice that for the given period there are only 6 rows in the "smetki" table, and different number of rows in each xx_invoices table that correspond to the whole resultset from "smetki". I have to mention that there is one intersection between two of the XX_invoices tables where there are about 120k rows for one "smetka_id", and for the same "smetka_id" there are about 5k rows in the other.
Also - there are no other queries running by the time of measuring the performance.

What could be the solution. A friend of mines helped me. Make a UNION he said. So I did it like that:

(select
s.smetka_id, s.created_on,
count(di.invoice_id) as di_count, 0 as dk_count, 0 as ei_count
from
smetki as s
left outer join di_invoices as di on (s.smetka_id = di.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id)
UNION
(select
s.smetka_id, s.created_on,
0 as di_count, count(dk.invoice_id) as dk_count, 0 as ei_count
from
smetki as s
left outer join dk_invoices as dk on (s.smetka_id = dk.smetka_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id)
UNION
(select
s.smetka_id, s.created_on, 0 as di_count, 0 as ei_count,
count(ei.invoice_id) as ei_count
from
smetki as s
left outer join ei_invoices as ei on (s.smetka_id = ei.smetki_id)
where
s.created_on between '2007-12-10' and '2007-12-15'
group by
s.smetka_id )

OK, but now I have 3 rows for each single "smetka_id" - one row for the results from each table. And then we decided to make a VIEW over that UNION.

Gues what. Now I have the simple query:

select smetka_id, sum(di_count), sum(dk_count), sum(ei_count) from v_test group by smetka_id

Which returns me the desired result for less than a second !

I hope this would help your daily work with MySQL and SQL in general.

No comments: