The other day I tweeted that MySQL unions are my new best friend. I just want to take that back. I just spent the past six hours (it's now 5 am) overhauling a MySQL query and removing one of the unions actually sped up performance. Dazah is really my first foray into the world of big data. I've found the hardest problem so far has been writing queries that will work regardless of the subset of the data size they are working against. For example, there are multiple bubbles and audience segments on Dazah, of all sizes and each favoring a different behavior type (some bubbles are more chatty than others), and I've found that every time I optimize a query for one dataset it fubars with a different one. Just a rant.
pty 882 Posting Pro
Remember that UNION
(and its cousins INTERSECT
and EXCEPT
) essentially require each of the provided queries to be run separately and the results collated. In this example (using PostgreSQL, but the same applies in MySQL), we can see exactly what's happening.
I have a small table with some users and their favourite colours:
peter=# select * from users;
┌────┬─────────┬──────────────────┐
│ id │ name │ favourite_colour │
├────┼─────────┼──────────────────┤
│ 3 │ Francis │ blue │
│ 4 │ Toby │ blue │
│ 1 │ Joey │ red │
│ 2 │ Dwayne │ purple │
└────┴─────────┴──────────────────┘
(4 rows)
Time: 0.400 ms
Now, let's say we want all users with a favourite colour of blue
or red
; if we use a UNION
the following happens:
peter=# explain select name from users where favourite_colour = 'blue' union select name from users where favourite_colour = 'red' ;
┌───────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────┤
│ HashAggregate (cost=35.33..35.39 rows=6 width=58) │
│ Group Key: users.name │
│ -> Append (cost=0.00..35.31 rows=6 width=58) │
│ -> Seq Scan on users (cost=0.00..17.62 rows=3 width=58) │
│ Filter: ((favourite_colour)::text = 'blue'::text) │
│ -> Seq Scan on users users_1 (cost=0.00..17.62 rows=3 width=58) │
│ Filter: ((favourite_colour)::text = 'red'::text) │
└───────────────────────────────────────────────────────────────────────────┘
(7 rows)
Time: 3.868 ms
As you can see, the query plan involves
- two sequental scans (
Seq Scan
) that each perform aFilter
, - an
Append
operation which is actually performing theUNION
, - plus
Group Key
andHashAggregate
steps, from which the resulting recordset can be returned.
Now, if we perform the same query in another fashion, we can see how much simpler the query plan is:
peter=# explain select name from users where favourite_colour in ('blue', 'red') ;
┌───────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────┤
│ Seq Scan on users (cost=0.00..17.62 rows=6 width=58) │
│ Filter: ((favourite_colour)::text = ANY ('{blue,red}'::text[])) │
└───────────────────────────────────────────────────────────────────┘
(2 rows)
Time: 0.510 ms
Unions, along with the other set operations, can be incredibly useful when combining data from different sources but they are unlikely to optimise queries from a single table or relation.
Additionally, two things I noticed while typing this reply:
- the editor isn't resizable, this makes it difficult to refer to what I've typed while formulating a response. This is more prevelant in posts with big code snippets, tables, etc.
- there appears to be a bug that occurs if a multi-line code snippet follows a list directly; the code snippet isn't recognised until there's a line of text between them
Edited by pty
Lee_15 1 Newbie Poster
This is very well written. It highlights some of the issues I have as well. Great article to explain how to go about cleaning up code. I've spent years doing just that but it's often unappreciated.
Edited by Lee_15
Be a part of the DaniWeb community
We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.