|2010-10-29 00:09:07 - In reply to message 1 from jpaulo69|
I'm pretty sure lib_event can be combined with MySQLnd to do exactly what you're saying cannot be done :)
Nobody knows about it, but PHP does have some level of asyncronous programming. If lib_event doesn't play nice, then you can still block on multiple descriptors.
We do just this to parallelize and shard. A lot of the things people say PHP can't do have been done ...
|2010-11-02 00:02:02 - In reply to message 2 from Jaimie Sirovich|
|I do not think libevent is really necessary, given that there is already the stream_select function for waiting on activity of given file/stream descriptors.|
As for MySQLND, I was not aware that it could provide file descriptors that we can listen for activity. Actually, looking at the existing documentation, I could not figure how to do it.
Anyway, another problem of most database client libraries is that they were not meant for executing multiple parallel queries at the same time, unless you use multiple connections (different transactions). Basically you need to fetch the results of one query before sending another. This way you are limited when it comes to exploiting the benefits of asynchronous programming.
|2010-11-02 19:16:16 - In reply to message 3 from Manuel Lemos|
I can send you the snippet from our DB wrapper if you'd like (yep, it's proprietary, which is why this is not a problem for us).
Or I can write a blog on it.
Basically it lets you open multiple connections, issue a query per connection, and block until all connections complete. It could be multiple servers, but we usually use it to parallelize to 2 threads/queries.
Currently MySQL will not parallelize a query itself.
|2010-11-02 22:05:59 - In reply to message 4 from Jaimie Sirovich|
|Well I just commented that I could not find in the PHP documentation how to get the MySQL connection handle so I could use stream_select or equivalent to listen on responses from the server asynchronously.|
It seems the documentation is incomplete. If you know where it is documented, just let me know.
You could also be interesting for you to write an article about that in the PHPClasses blog, if you would like. I think it is not necessary to disclose your proprietary wrapper code.
I do not know of any database that supports issuing multiple queries over the same connection. But just being able to do other things while the server does not return the query results would be advantageous.
|2010-11-03 01:45:37 - In reply to message 5 from Manuel Lemos|
|I could extract the relevant lines from our library and refactor it not to use our DB:: code. If you think it's a good post, let me know, and I'll whip it up.|
All you need is mysqli_poll(). It sounds dirty, but it's not a spin lock or anything.
You can, in fact, issue multiple queries over the same connection, but they won't be parallelized. They're run in series. You need that for transactions, but it also avoids a round trip.
MySQL's documentation alludes to leveraging partitioning to parallelize queries one day.
"Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized. A simple example of such a query might be SELECT salesperson_id, COUNT(orders) as order_total FROM sales GROUP BY salesperson_id;. By “parallelized,” we mean that the query can be run simultaneously on each partition, and the final result obtained merely by summing the results obtained for all partitions."
Basically a limited map-reduce.
I'll do the post if you think it'll be interesting.
|2010-11-06 00:10:55 - In reply to message 6 from Jaimie Sirovich|
|Yes, an article describing how to do use that would be great. Feel free to submit a blog article going to:|
I was not aware of that mysqli_poll function. Actually it is even underdocumented.
It is not ideal because you have to call it regularly, as opposed to have stream_select or similar call to return only when the query ended and returned results. But it is better than nothing.
Using multiple links to execute queries in parallel is not such a good idea, not only because each link will have different transaction contexts and the query results may be inconsistent, but also because each link will consume a lot of server RAM.
Having too many database server links opened may exhaust your server RAM faster and it will limit further the number of PHP processes you may have started using persistent connections.
Serialized queries is also useful as your PHP script can process the results of one query in parallel while the server is executing and sending the results of the next query.
|2010-11-06 01:12:29 - In reply to message 7 from Manuel Lemos|
|looking forward to Jaimie's post.|