PHP Classes

Is the Hack Language Going to Replace PHP? - Lately in PHP podcast episode 46



Categories: PHP Security, Lately in PHP Podcast, PHP opinions

The release of the Facebook Hack language has shaken the PHP community because it implements several frequently requested features that PHP never implemented. Many users are considering dropping PHP in favor of Hack.

This was one of the main topics discussed by Manuel Lemos and Arturs Sosins in episode 46 of the Lately in PHP podcast.

They also talked about whether the OpenSSL Heartbleed security bug may affect PHP sites, ideas for the PHP 6 engine, the need for an official PHP specification, and an advanced email validation that can suggest corrections for address typos, like Google's "did you mean" feature.

Now listen to the podcast, watch the hangout video, or read the transcript to learn more about these interesting PHP topics.





Contents

Introduction (0:20)

PHP Releases 5.5.10, 5.4.26, 5.4.27 (1:34)

The HeartBleed OpenSSL Security Bug (5:06)

Conditional code with #IFDEF (15:35)

Ideas for PHP 6 engine (21:38)

The Need for an Official PHP Specification (26:42)

Is the Hack Language Going to Replace PHP? (32:52)

Did You Mean Advanced Email Validation in PHP (49:19)

JavaScript Innovation Award Winners of January 2014 (1:05:19)

PHP Innovation Award Winners of January 2014 (1:10:25)

Conclusion (1:17:51)



Contents

Listen or download the podcast, RSS feed and subscribe in iTunes

Watch the podcast video, subscribe to the podcast YouTube channel

Read the podcast transcript




Download size: 62MB

Introduction music "Harbour" used with explicit permission from the author, Danilo Ercole, from Curitiba, Brazil

View Podcast in iTunes

In iTunes, use the Subscribe to Podcast... item of the Advanced menu, and then enter the URL above to subscribe to this podcast.

Watch the podcast video

Note that the timestamps in the transcript below may not match the same positions in the video, because they are based on the audio timestamps and the audio was edited to remove periods of silence.

See the Lately in PHP podcast playlist on YouTube and subscribe to the channel there.

Show notes

Introduction (0:20)

[Music]

Manuel Lemos: Hello. Welcome to the Lately in PHP podcast. This is episode 46, yet another podcast. And this time, I have a different guest host, Arturs Sosins from Latvia.

Hello, Arturs, how are you doing?

Arturs Sosins: Hello. I'm actually fine. This is probably the first time we record in the daytime, so I'm really energetic and looking forward to it.

Manuel Lemos: Yeah, most people will watch the recording afterward, so they probably have no idea what time it is here. But I guess from the lighting they can tell something is different: today, I have the sun coming from my left. Well, it's always a good day to talk about PHP.

Arturs, although lately you have been the guest on the JSClasses site, this time you are filling in for Cesar Rodas, who unfortunately could not make it this time for personal reasons. I hope he can come back next month.

PHP Releases 5.5.10, 5.4.26, 5.4.27 (1:34)

Manuel Lemos: And this month, we are going to start, as usual, by talking about the latest PHP releases. On PHP 5.5, there was the tenth revision, 5.5.10. As always, it was mostly a bug fix release. PHP 5.5 is mostly in maintenance mode: most changes are just bug fixes and minor modifications.

At the same time, there were also a couple of PHP 5.4 releases. The latest, 5.4.27, came out on April 3. The previous one was in March, about one month before. They had mostly the same kind of changes, because as we get closer to the PHP 5.6 release, most of the work on previous versions is just bug fixes.

Arturs, I wonder if you have been using any PHP lately. I know that you are working more in the game development world. So, it's probably not so much PHP related, but what have you been using in terms of PHP, if any?

Arturs Sosins: Well, honestly, I've been using a little less JavaScript than I used to, and even less PHP, so maybe I'm not following the latest developments so closely. I know there are many changes and many great features and ideas coming. That's actually what I'm looking forward to discussing in this podcast.

But what I noticed in the change log, and thought was interesting, is that the 5.4 and 5.5 versions have the same bug fixes. So, as I understand it, they are handled simultaneously: both versions are being fixed and managed as different branches. Will PHP 5.6 be another branch?

Manuel Lemos: Yeah. Well, even PHP 5.6 does not have many new features, because, as we mentioned before, there are already plans for PHP 6, or whatever the next major PHP version will be called. At the same time, there is a lot of parallel development outside the PHP core, like the Hack language, but we'll get back to that later.

The HeartBleed OpenSSL Security Bug (5:06)

Manuel Lemos: Now, talking about the latest developments, these latest PHP releases have some security bug fixes, some even related to OpenSSL. But the biggest announcement of the last few weeks was the discovery of the OpenSSL Heartbleed security bug.

Actually, it was announced recently, but I think it was introduced two or three years ago. Only recently was it announced, together with a fix. And I bet the CIA already knew and will probably be taking advantage of it to exploit some sites and watch some more people that need to be watched, I guess.

Regarding the OpenSSL security bug fix that was announced recently, I thought it would be interesting to write an article, because many PHP developers were concerned whether PHP, or at least the Web servers that use OpenSSL, would be vulnerable.

So I wrote this article to tell everybody who was concerned that this security problem only affects sites that use SSL. That's the basic condition. If your site is not running SSL (HTTPS), you should not be concerned about this bug. Well, you probably should be concerned about the data that goes back and forth unencrypted, but that is not a matter of the internal security of your Web server.

Another point, because not everybody is affected by this bug, is that the problem only affects OpenSSL 1.0.1 releases. Unless you have already upgraded to 1.0.1g, you should upgrade, because the earlier 1.0.1 releases (1.0.1 through 1.0.1f) are the ones affected.

But if you are running 1.0.0 or 0.9.8, you probably should not be concerned about this specific vulnerability. You probably should be concerned about other past security problems or other related matters.

So, in this article, I basically talk about who is affected and who is not. Although this vulnerability mostly affects Web servers, it may also affect clients. For instance, if you have a PHP script that sends SSL requests to a Web server, in theory you could be subjected to an exploit, for instance, if the server was compromised.

The server could take advantage of this vulnerability, for instance, to make your client leak private information it holds. Well, I'm not really a security expert. I'm just trying to interpret how far this vulnerability can affect the different setups in which OpenSSL is used. But that's what I got.

First, you can rule out the case in which you are not using SSL at all, and then you can also rule out the cases in which you are not running a 1.0.1 release prior to 1.0.1g.
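The version rule above can be expressed as a small check. This is a rough sketch, not something from the article: the function name `isHeartbleedVulnerable` is hypothetical, and it assumes a version string in the format OpenSSL reports, such as "OpenSSL 1.0.1e 11 Feb 2013".

```php
<?php
// Hypothetical helper: classifies an OpenSSL version string against the
// Heartbleed range (1.0.1 through 1.0.1f). 1.0.1g carries the fix, and the
// 1.0.0 and 0.9.8 branches never had the vulnerable heartbeat code.
// Assumes strings shaped like "OpenSSL 1.0.1e 11 Feb 2013".
function isHeartbleedVulnerable($versionText)
{
    // Capture the optional letter suffix of a 1.0.1 release.
    if (!preg_match('/OpenSSL 1\.0\.1([a-z]?)\b/', $versionText, $m)) {
        return false; // not a 1.0.1 release, so outside the affected range
    }
    // Plain "1.0.1" and letters "a" to "f" are affected; "g" and later are fixed.
    return $m[1] === '' || $m[1] <= 'f';
}

// OPENSSL_VERSION_TEXT only exists when PHP's openssl extension is loaded.
if (defined('OPENSSL_VERSION_TEXT')) {
    $status = isHeartbleedVulnerable(OPENSSL_VERSION_TEXT) ? 'in the affected range' : 'not in the affected range';
    echo OPENSSL_VERSION_TEXT, ': ', $status, PHP_EOL;
}
```

Keep in mind this only inspects the OpenSSL library PHP itself was built against. The Web server may link a different copy, which is why an external test of the live server remains the more reliable check.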

Well, in any case, a company called Qualys has a site, SSL Labs, with an SSL Server Test that can show whether different sites are vulnerable. It runs several tests, not only for the Heartbleed vulnerability, but also for other security problems, for instance, the encryption ciphers in use.

Anyway, if you are not sure whether your site is vulnerable, you should go to this site and test it to see what the issues are, whether they are related to Heartbleed or not.

Here is just an example of a results page, which got an F. That is lousy, but it's great that it tells you exactly what the issues are. You should do the same with your server.

Arturs, have you tested any servers, maybe something related to your current work that could be affected by this vulnerability?

Arturs Sosins: Yeah, well, while I develop lots of websites using PHP, I was never responsible for server setups. That was usually done by another department. So I don't have many websites to test, except my own private ones, which I think got a C or something. They only use SSL because they need it for Facebook apps: for a Facebook app, you have to support HTTPS and the SSL layer.

But here is an interesting fact about Heartbleed: I recently read an interview with the developer who introduced it, and he actually committed the code with the bug just one minute before midnight on New Year's Eve.

It was 2011, just one minute before the New Year, when he submitted this bug. He basically explained it was an oversight, and it's actually quite frustrating that it took two years to notice it. Most probably, it was exploited during that time by those who knew about it.

Manuel Lemos: I guess the CIA detected it much, much earlier, right?

[Laughter]

Arturs Sosins: Yeah.

Manuel Lemos: Because they have the best security experts. They have to.

Well, jokes aside, if you were concerned whether your Web servers are vulnerable or whether you need to do anything, I hope that with this article you now know what to do. You can run this very quick test on the SSL Labs site and then figure out what to do. If you are not able to do the server maintenance yourself, talk with whoever needs to do it, if they have not done it already.

On my part, I got a lot of emails from companies saying, "Oh, wow, our servers are secure, don't be concerned."

Arturs Sosins: You still should...

Manuel Lemos: The CIA has not done anything; they are the ones doing it.

Arturs Sosins: You should still be concerned if you use the same password on different websites and one of them wasn't secured. Basically, all your accounts are compromised.

Manuel Lemos: Right. That's a whole different problem.

[Laughter]

Arturs Sosins: That's not a developer problem, yeah?

Manuel Lemos: Yeah. There are so many security problems that most people have never heard of. One typical problem I noticed is that on secure sites, if you have SSL 2 enabled, you already have an old problem.

Other than that, even if you have only newer versions of SSL or TLS enabled, you still have to be concerned about ciphers you are using that are considered too weak. They could be broken, and exploits are possible.

OK, I hope this article and our comments are already useful enough for whoever needs to be concerned about security matters.

Conditional code with #IFDEF (15:35)

Manuel Lemos: Now, let's move on to the next topic, in which we'll just comment on an idea that appeared on the PHP Internals list for a possible feature in the next PHP versions.

Actually, I don't think this is going to pass. It was just an idea to have something like the C language's #ifdef, actually a directive rather than a statement, that you could put in your code.

You would use it in PHP code to enable or disable sections of code. Some of the people who objected said, "Well, you can already run that code conditionally by checking the PHP version." The problem is that such a check is only done at runtime, after the whole file has already been parsed.

The defense of this proposal was that if newer versions introduce syntax that older versions cannot parse, that code would not run on older PHP versions, and the ifdef directive would avoid those syntax errors. I thought it was actually a pretty sensible idea. I did not see many people accepting it, but that's my opinion.
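PHP has no #ifdef, so one common workaround today (not the proposal itself) is to keep code that uses newer syntax in a separate file and include it only after a runtime version_compare() check; an older engine then never parses the file it cannot understand. A minimal sketch, where the file names and the `pickImplementation` helper are hypothetical:

```php
<?php
// Hypothetical demo files, written to the temp directory so the sketch runs standalone.
$dir = sys_get_temp_dir();
$modernFile = $dir . '/impl_modern.php';
$legacyFile = $dir . '/impl_legacy.php';

// The "modern" file uses short array syntax, which PHP 5.3 and older cannot parse.
file_put_contents($modernFile, '<?php function greetings() { return ["hello", "modern"]; }');
file_put_contents($legacyFile, '<?php function greetings() { return array("hello", "legacy"); }');

// Choose a file at runtime. Putting the new syntax inside an `if` in this same
// file would not help: the whole file is parsed before any `if` is evaluated.
function pickImplementation($minVersion, $newFile, $oldFile)
{
    return version_compare(PHP_VERSION, $minVersion, '>=') ? $newFile : $oldFile;
}

require_once pickImplementation('5.4.0', $modernFile, $legacyFile);
$words = greetings(); // ["hello", "modern"] on PHP 5.4 or newer
```

The same pattern can gate code on an extension by replacing the version check with extension_loaded('name'). The cost is an extra file per variant, which is the overhead an #ifdef-style directive would remove.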

What do you think, Arturs?

Arturs Sosins: Yeah, that is an awesome and valid point, and I would really like something like that. If you are a developer building something for an end client, you basically take whatever server they have, or provide the server, and build for that, and it works.

But if you're a developer who builds libraries and packages for other developers, you want maximum compatibility across all versions. And as you said, if the syntax changes, and it probably will in PHP 6 or whatever it will be called, then this kind of conditional check that would not even parse the incompatible code, but completely ignore it, would be an awesome addition. I'd say it's a must.

Manuel Lemos: Right. And that's interesting, because once in a while, in the PHPClasses error logs, I get warnings about users submitting code that has syntax errors, or at least apparent syntax errors, that are actually reported by the PHP functions used to highlight code.

When those functions find a syntax problem, they throw a warning that goes to the error logs. But it could also be caused by code that is meant for newer PHP versions than the one running on the site. With that directive, that problem would probably be avoided.

Well, I don't know. I'm not sure if there is a final decision regarding this feature.

Arturs Sosins: Well, I think the consequence of not implementing such a feature would be that the version that introduces the most changes, possibly PHP 8, would not be adopted so easily. It would take a long time to reach all the servers. That version would be hurt the most.

Manuel Lemos: Yeah. Another point is that the Facebook Hack language, which we'll talk about in more detail later, introduced new syntax constructs that, without something like this, would generate warnings, or actually syntax errors when parsing fails.

The way they solved it in the Hack language is that code that is not compatible with regular PHP starts with <?hh, as in HipHop. The regular PHP interpreter ignores code that starts with that, because it thinks it is HTML. It would probably output it, which is really not a good idea.

Arturs Sosins: Not a good idea.

Manuel Lemos: So, in this case, I think the ifdef directive would probably be a better solution. And I think it is not just for PHP versions: it could also detect the presence of PHP extensions. Well, that's what I got from this proposal. Sometimes we interpret the original proposals and maybe get them wrong, but that's what I got from this idea.

Ideas for PHP 6 engine (21:38)

Manuel Lemos: And speaking of future PHP versions, a new page was created on the PHP wiki to gather thoughts and ideas specifically for the Zend Engine. Well, for starters, I am not so sure the Zend Engine will be the engine of PHP 6.

I think, depending on Rasmus, it will probably be something interesting like a JIT compiler, more like HipHop VM. That is maybe not for PHP 6, but for PHP 7. Between now and then, I'm sure there will be some intense discussions, because I can see the community is split.

On one side, there are those who are against rushing things and introducing substantial changes, even in major versions, because the lesson of the past PHP 6 failure was that they tried to change too many things at once and it was taking too long.

So now they are pushing more toward changes that take a smaller effort and are introduced one at a time, so they do not cause as many problems as the previous PHP 6 attempt did.

So, on this page, we can see some ideas, assuming the Zend Engine will still be the one used in PHP 6. We have already talked about some of them.

Arturs, did you find any ideas you think are worth mentioning?

Arturs Sosins: Well, I just skimmed through it, because it's basically engine related, maybe not so interesting for end PHP users. Basically, they want to encourage some good practices, like creating C or C++ API wrappers for libraries, so they can be easily abstracted, used from PHP, and kept more separate from each other instead of tightly integrated. That would make things more modular, easier to extend, and so on.

And it all makes sense, yes. But given the large current code base, it might be hard to change habits and re-implement everything. So I don't know what would be usable and what would not.

Manuel Lemos: Yeah. Some people mentioned the possibility of using a JIT compiler, a JIT engine. But what we see here is basically that they want to implement a JIT engine in the Zend Engine.

So they are not considering replacing the Zend Engine with the HipHop compiler, which would probably be the most sensible idea, because Facebook has many engineers working on it and they are certainly maintaining it. On the other hand, some people may not be so interested in relying on Facebook. But we have already commented on that.

Other than that, I don't know if the other ideas are so relevant. They are probably interesting, but in my opinion they do not introduce changes as significant as a JIT engine would. But OK, these are just a few ideas, and the debate is only starting. There will be plenty of other discussions before they decide what to do for PHP 6.

The Need for an Official PHP Specification (26:42)

Manuel Lemos: And now, also related to future PHP versions, there was a discussion I thought was really relevant, because now there are different PHP engines: the Zend Engine-based one, HipHop VM, and some other solutions on .NET and the Java Virtual Machine.

So the proposal here is that at some point there would be an effort to create an official specification of the PHP language, because there is none.

This makes it hard for other implementers to create a new engine that supports PHP, because what we have in terms of specification is mostly the PHP documentation and the test suites, which probably help verify that features are implemented correctly, but there is nothing formal that documents the language.

And as always, there is already an intense debate about this, because some people are clearly not interested in promoting other implementations, probably because they are too attached to the current official PHP version. Their attitude is: if you want a PHP specification, go write it yourself, because that's a lot of trouble.

That's the usual excuse when people do not want certain things to happen: they start raising artificial difficulties. That is silly, because the PHP project has gone through so much work and development that creating a specification, at least in my opinion, would not be such a big effort. It's probably not an amusing task, probably boring to write, but I'm sure there are people who would enjoy doing it.

Arturs Sosins: Yup, there are.

Manuel Lemos: So, why oppose it? What do you think, Arturs?

Arturs Sosins: Well, writing a specification is one thing; enforcing it is another, since you have to make other developers follow its conventions. But if a specification is written for PHP, I think it might lead to more PHP engines being created for different environments and machines, running PHP on a much wider variety of devices. That could help PHP become even more popular.

Manuel Lemos: Right. But I think some people in the PHP core are too attached to their own implementation. Well, the discussions are still going on, I think. Whether or not there is a specification, that would not stop people from creating new engines that run PHP code.

I think it would be nice, and in reality it would also certainly be really necessary. But OK, that's just my point of view.

Arturs Sosins: If someone thinks a specification is not necessary, just let them look at JavaScript and CSS across different browser versions, and they will understand.

Manuel Lemos: Yeah, but somehow that would happen anyway. I remember that the developers of Phalanger, the PHP engine that runs on .NET, were already supporting namespaces before PHP did.

They used a namespace syntax more like Java's, with domain names in inverted order, separated by dots. And it worked. It did not stop them from evolving the language. It is even more remarkable that they used dots to separate namespace components instead of backslashes.

I remember there was a huge discussion about the namespace separator character, and people complained that using dots would cause problems, but the .NET people did it. Well, actually, the Phalanger people.

I remember they were interviewed in the past and said that the first thing they needed to do when developing their own engine was to create a specification for PHP, at least according to their own interpretation of what PHP should do.

Well, it seems that helped them create tests to make sure their engine was implemented correctly. I don't know, maybe I'm not interpreting the need for a PHP specification correctly.

Is the Hack Language Going to Replace PHP? (32:52)

Manuel Lemos: Now, as I mentioned before, we're going to comment on the announcement of the Hack language that happened last month. Facebook had been developing it for a few months. Actually, I did not notice they were doing it, even though, if I'm not mistaken, the repositories were public.

Well, anyway, they announced it now. Basically, it introduces some features I think are great. Some of them probably could not be proposed for PHP, because there are other means to implement them, or at least they are not so important. But others make perfect sense.

In this article, I grouped the features by their benefits. One benefit is bug prevention: features that help prevent bugs. Then there are features that enable performance optimizations, and others that are meant more for code reuse.

These features are really interesting. But if I had to pick just one as the most interesting, it would be the asynchronous programming support. As we frequently comment in the JSClasses hangout, in JavaScript this requires lots of callbacks to handle things like I/O operations, which are necessarily asynchronous.

But in regular PHP those operations are synchronous, which means PHP waits for them to finish before it proceeds to execute any more opcodes. This prevents PHP from running multiple asynchronous operations in parallel, which could provide some performance enhancement, as I wrote in a past article about asynchronous programming.

But what I found most interesting about the Hack language implementation is that, unlike JavaScript, it does not rely on regular callbacks: it introduces the keywords async and await.

The await keyword is the one I found most interesting, because it takes some code as a parameter, and the code that follows the await is only resumed after the asynchronous operations running inside the awaited code have finished.

This is different from the typical callback system, because there are situations in which callbacks make it impossible to write asynchronous code the way we write synchronous code.

For instance, if you have asynchronous code inside a for loop, you cannot break out of that loop from inside the callback that is invoked when the asynchronous operation completes. The await keyword makes that possible. You take advantage of all the features of asynchronous programming, but write the code as if you were doing synchronous programming.
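Plain PHP has no async/await, but PHP generators (PHP 5.5 and later) can illustrate the idea: `yield` suspends a function until a driver resumes it with a result, so the code reads top to bottom and a loop can still `break`. This is a swapped-in technique, not Hack's implementation, and there is no real I/O here. `Pending`, `readUntilEmpty` and `run` are hypothetical names, and returning a value from a generator requires PHP 7.

```php
<?php
// Hypothetical stand-in for an asynchronous operation whose result arrives later.
final class Pending
{
    public $value;
    public function __construct($value) { $this->value = $value; }
}

// Coroutine: "awaits" chunks one by one and can leave its loop early,
// which a callback-per-chunk style cannot do with a plain `break`.
function readUntilEmpty(array $chunks)
{
    $collected = [];
    foreach ($chunks as $chunk) {
        // `yield` plays the role of `await`: execution pauses here until resumed.
        $data = yield new Pending($chunk);
        if ($data === '') {
            break; // an ordinary break, even though each read "completed" later
        }
        $collected[] = $data;
    }
    return $collected;
}

// Tiny driver: completes each pending operation and resumes the coroutine.
function run(Generator $coroutine)
{
    $pending = $coroutine->current(); // runs up to the first yield
    while ($coroutine->valid()) {
        // A real event loop would wait for I/O here; this sketch resolves immediately.
        $pending = $coroutine->send($pending->value);
    }
    return $coroutine->getReturn();
}

$result = run(readUntilEmpty(['a', 'b', '', 'c'])); // ['a', 'b']
```

The coroutine never sees a callback: the driver decides when to resume it, which is the property that makes the loop and the `break` behave like ordinary synchronous code.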

Well, this is my favorite new feature of the Hack language, for the reasons I mentioned and others described in the article. What about you, Arturs? Do you also like this asynchronous programming feature, or do you have other favorite features?

Arturs Sosins: I'd say not only that. The first time I heard of the Hack language, I was like, "Oh, come on, really? Another language? Do we really need it?"

Manuel Lemos: Right.

Arturs Sosins: But then, after reading more about it, I understood the whole idea behind it. Basically, you can take your current PHP code, put it on a Hack server, and it will run great. But if you need more speed, you can define types and get faster, more efficient memory allocation.

You can have heavy tasks run in the background using the async features. And as you said yourself, there are the await and async keywords. The language is designed so that we simply add those keywords and use this Hack feature without heavily modifying our existing PHP code.

So basically, and I hope PHP fans won't trash me for this, Hack provides a sense of scalability to PHP implementations. That is an awesome idea. Kudos to the authors and designers, because I think if Facebook continues with it and we see a strong backbone behind this project, many more developers could adopt it.

Manuel Lemos: Right. Well, the idea I got from the people behind this project, which started last year and so has had only a few months of development, is that Facebook is sending a message: we can bring all the features PHP probably needs, and we can do it very fast.

We do not need to wait years for lengthy discussions, with people boycotting each other's ideas even though many people are pleading for them.

I don't know how the core developers are reacting to this, but if PHP development does not get up to speed, I suspect many PHP developers will simply switch to the Facebook Hack language and not look back.

Arturs Sosins: I think, yeah, it could be divided: regular developers could still use the normal PHP version, while Hack could be like an enterprise version used for highly scalable apps.

And well, if it really takes off, there are no major flaws, and Facebook stands solidly behind it, then I think by 2020 we might have a HackClasses podcast here.

[Laughter]

Manuel Lemos: Yeah, that could be an idea.

Well, if we get closer to that, I'm thinking it would probably be some other version of PHPClasses, for a different language that is not even compatible with PHP, because it's different.

Anyway, another part of the article gives my opinion on whether Hack will kill PHP. For several reasons, I think maybe not. For instance, Hack code is not backwards compatible with PHP.

Once you switch to Hack code, you cannot go back, and some people may still only have PHP-based environments for their code. So they will probably stick with PHP, because PHP code runs on Hack but Hack code may not run on the PHP engine.

Another complaint we hear is that HHVM, which is the base of the Hack language, does not yet support all platforms and PHP extensions. That is very true: I tried to compile it on openSUSE and simply failed. Or maybe I did not have enough patience to figure out which packages I needed to install, because it was taking too long and I sort of gave up. That's regarding the platforms.

One detail I mentioned is that, based on PHPClasses user statistics, 77% of PHP developers use Windows on the desktop. They probably prefer Windows, and there is no version of the Hack language for Windows. Even if nobody uses Hack on Windows in production, it is important for developers who prefer doing everything on Windows.

Arturs Sosins: Exactly. That's the only reason I haven't even considered trying Hack for now, because I'm running Windows.

Manuel Lemos: Yeah. Actually, I was not thinking about you, but now that you mention it, you are part of the large majority, the 77% that use PHP on Windows. Whether or not PHP developers like Microsoft and Windows, that's a fact.

Even though PHP mostly runs on Linux production servers, many people prefer Windows, at least for development, if they are comfortable with it. That's something the Facebook Hack developers should consider if they want the Hack language to be successful.

Then there is another reason: some people do not trust Facebook. They would probably be afraid that Facebook might do something against their will and that they would regret having migrated to the Hack language. This is more a psychological reason than a factual one, basically a fear, but it should also be considered.

On the other side, one reason the Hack language could kill PHP is that the Hack engine implements many features that PHP users want in the language but that the PHP core developers, for some reason, turned down when they were proposed.

For instance, annotations. Annotations are often used to declare the types of variables, functions, and so on. The proposal to support annotations built into the language was always rejected, and the existing implementations are based on comments. Now, with Hack's type hinting, most of the need to declare types in annotations goes away.
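For context, the comment-based annotations mentioned above are typically read through reflection. A minimal sketch using ReflectionFunction::getDocComment(); the `describe` function and the `paramTypesFromDocblock` helper are hypothetical, and the regular expression only handles simple `@param type $name` tags.

```php
<?php
/**
 * Hypothetical function whose parameter types exist only in its docblock.
 *
 * @param int $quantity
 * @param string $name
 * @return string
 */
function describe($quantity, $name)
{
    return $quantity . ' x ' . $name;
}

// Extract "@param type $name" pairs from a function's docblock.
function paramTypesFromDocblock($functionName)
{
    $doc = (new ReflectionFunction($functionName))->getDocComment();
    $types = [];
    if ($doc !== false && preg_match_all('/@param\s+(\S+)\s+\$(\w+)/', $doc, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $types[$match[2]] = $match[1]; // parameter name => declared type
        }
    }
    return $types;
}

$types = paramTypesFromDocblock('describe'); // ['quantity' => 'int', 'name' => 'string']
```

Because these types live in comments, nothing enforces them unless a library does, and if OPcache is configured with opcache.save_comments=0, getDocComment() returns false. Built-in type hints avoid both problems.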

So this is just one example of a feature that PHP users would like to have in the language but that was turned down when it was proposed.

Another reason is that the Hack language is based on HHVM, which compiles code with a JIT engine. This is great for many reasons, not just speed: since HHVM can run in a multi-threaded environment, it makes more efficient use of memory, so eventually we would be able to make better use of server resources.

And finally, I proposed that the Hack language could support a PHP backwards-compatible mode, other than the one based on <?hh, by using comments.

Comments can contain anything, so they could be used for type hints, and this would allow Hack-enabled code to remain compatible with PHP. This is just an idea I'm presenting; I do not recall anybody else suggesting it.
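To make the idea concrete: a type written inside a comment is ignored by the regular PHP engine, so the file stays valid PHP, while a tool can still recover the hint with PHP's tokenizer. This is only a sketch of the concept under an assumed convention (a `/* type */` comment placed right before each parameter). It is not Hack syntax, and `inlineTypeHints` is hypothetical.

```php
<?php
// Hypothetical source written in plain PHP, with the types kept in comments.
$source = <<<'CODE'
<?php
function total(/* int */ $price, /* int */ $quantity) {
    return $price * $quantity;
}
CODE;

// Map variable names to types from "/* type */" comments directly preceding them.
function inlineTypeHints($code)
{
    $hints = [];
    $pendingType = null;
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            $pendingType = null; // single-character tokens such as "(" or ","
            continue;
        }
        list($id, $text) = $token;
        if ($id === T_COMMENT && preg_match('#^/\*\s*(\w+)\s*\*/$#', $text, $m)) {
            $pendingType = $m[1];
        } elseif ($id === T_VARIABLE && $pendingType !== null) {
            $hints[substr($text, 1)] = $pendingType; // strip the leading "$"
            $pendingType = null;
        } elseif ($id !== T_WHITESPACE) {
            $pendingType = null; // a hint only applies to the very next variable
        }
    }
    return $hints;
}

$hints = inlineTypeHints($source); // ['price' => 'int', 'quantity' => 'int']
```

The regular PHP engine would run `total()` unchanged, since it skips the comments, while a Hack-aware tool could read the same file and treat the comments as real type hints.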

So these are the reasons why I think there is a chance the Hack language could kill PHP. Do you have any other comments on this?

Arturs Sosins: Only that your idea might save the PHP community from a Hack invasion. Bringing backwards compatibility would be a great solution.

Manuel Lemos: Yes, because I believe it would be silly... Hack is in practice a fork of PHP, not in the sense of forking the code base, but of the language concepts. If the PHP community starts splitting into Hack supporters and non-Hack supporters, I think this may weaken the PHP community, and that wouldn't be good. At least that's the way I see it.

OK, well, we covered a lot about the Hack language.

Did You Mean Advanced Email Validation in PHP (49:19)

Manuel Lemos: Now, moving on to another topic. This time I'd like to comment briefly on an article that I wrote, which appeared on the PHPClasses site, although not exactly as part of the main PHPClasses blog.

The article is related to a package that I have been developing for many years, since 1999. The package is for email address validation.

Basic email address validation mostly consists of using a regular expression to check whether an email address meets certain syntax requirements. But that does not mean the address is valid. There are many other checks that can be done to test whether an address is valid.
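A minimal Python sketch of the syntax-only check described above, assuming a simplified pattern (dot-atom local part, dot-separated domain labels, alphabetic top-level domain) rather than the full RFC 5322 grammar. It is not the regular expression used by the class discussed here, and passing it says nothing about whether the address can actually receive mail.

```python
import re

# Simplified grammar: dot-atom local part, "@", one or more dot-separated
# domain labels, then an alphabetic top-level domain of 2+ letters.
# Quoted local parts, IP-literal domains and other rarer RFC 5322 forms
# are intentionally not accepted.
_LOCAL = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
_DOMAIN = r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}"
_EMAIL_RE = re.compile(_LOCAL + "@" + _DOMAIN)


def has_valid_syntax(address: str) -> bool:
    """True if the whole address matches the simplified pattern."""
    return _EMAIL_RE.fullmatch(address) is not None
```

For example, `has_valid_syntax("user@gmail")` is false because the domain has no top-level label, while a mistyped but well-formed address such as `user@yaho.com` passes, which is why syntax alone is not enough.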

So I developed this library, which I've been improving over the years to test more cases than just the basic syntax of the email address, and nowadays it can do a lot more. To explain that, in the article I described six forms of invalid email addresses.

For instance, typing mistakes. Some people write yaho.com and forget an O. That is a common mistake. Although that domain exists, such an address is usually the result of a typing mistake. Or people type 'gamail' instead of 'gmail'.

This is a situation that is worth detecting as an invalid email address, because email addresses are used for registration on sites. If a user enters an invalid address because of a typing mistake, the user will be frustrated because he will not be able to complete the registration. So a component that detects this form of invalid email address is very useful.

Another form, not exactly a typing mistake, comes from people who are frustrated with sites that require registration and type some bogus domain that does not exist. It would be great to always detect that as an invalid email address.

The same goes for temporary domains, like those associated with the IP address of the user's machine, for instance from No-IP.com or similar services.

Another case is disposable mailboxes. Some users do not really want to register on some sites because it's a bureaucratic process, so they supply temporary email addresses created by sites that provide disposable mailboxes.

This means that after they register, they will not look at those mailboxes again; they probably just want to confirm their registration. But a site that accepts this type of invalid email address can end up frustrated, because it will not be able to contact the user to send important messages.

If you send newsletters, for instance, you'll be spending your CPU and bandwidth resources for nothing, because those users will never see the newsletter. So it is better not to accept these disposable email addresses as valid.

And then there is a case that is not so common: some domains are used to create what are called spam traps. Spam traps are basically a way to catch spammers that harvest email addresses from web pages.

So some people who decided to nail those spammers plant spam trap email addresses on sites. These addresses belong to domains where any message sent to them goes to what they call a honey pot, which causes the IP address of the server sending to those addresses to be blacklisted.

The problem is that this type of spam trap is sometimes abused: users with malicious intentions supply those addresses to real sites, for instance during registration, and when the site sends the confirmation email message, the innocent site gets blacklisted.

So it's also good to avoid, as much as possible, email addresses served by spam trap domains and servers. But this is not something that is widely supported, since, like blacklists, it is open to potential abuse.
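The disposable, temporary and spam trap cases all come down to recognizing the domain part of the address. A minimal Python sketch, assuming hypothetical hard-coded domain lists (the reserved `.example` names are placeholders; a real validator would load curated lists kept up to date). It also matches subdomains, because dynamic DNS services hand out host names under a shared parent domain.

```python
# Hypothetical placeholder lists; real ones would be loaded from curated sources.
DISPOSABLE_DOMAINS = {"throwaway.example", "tempinbox.example"}
TEMPORARY_DOMAINS = {"dyn-host.example"}
SPAM_TRAP_DOMAINS = {"trap.example"}


def domain_and_parents(domain: str) -> list:
    """'a.b.example' -> ['a.b.example', 'b.example'] (never the bare TLD)."""
    labels = domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def classify_domain(address: str) -> str:
    """Return 'spam_trap', 'disposable', 'temporary' or 'ok' for an address."""
    domain = address.rsplit("@", 1)[-1].lower().rstrip(".")
    candidates = domain_and_parents(domain)
    # Check the most harmful category first: mailing a spam trap can get
    # the sending server blacklisted.
    for listed, reason in (
        (SPAM_TRAP_DOMAINS, "spam_trap"),
        (DISPOSABLE_DOMAINS, "disposable"),
        (TEMPORARY_DOMAINS, "temporary"),
    ):
        if any(candidate in listed for candidate in candidates):
            return reason
    return "ok"
```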

Another type of invalid address involves servers that reject messages coming from certain sites or addressed to certain mailboxes, because the user decided to reject messages on that address, maybe specifically from a certain origin site. This is another type of invalid email address that should also be avoided.

Finally, the last situation is not really a permanently invalid address but a temporarily invalid one: mailboxes that are full of messages. Until the user deletes some messages, they cannot accept any more email, so messages may be refused for a while.

The same goes for grey lists. Greylisting is an approach used by some servers that temporarily reject the first attempt to deliver a message but accept it later if the sending server tries again. This is basically to avoid getting messages from spammers that only try once to deliver each message, as most spam-sending software does.
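Both the full-mailbox and the greylisting cases surface as SMTP reply codes, which RFC 5321 splits into 2xx (success), 4xx (transient failure, worth retrying) and 5xx (permanent failure). A minimal Python sketch, assuming a hypothetical `send` callable that performs one delivery attempt and returns the reply code, and retry delays chosen arbitrarily for illustration:

```python
import time
from typing import Callable, Sequence


def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to a coarse delivery verdict."""
    if 200 <= code < 300:
        return "accepted"
    if 400 <= code < 500:
        # Transient: greylisting (for example 450 or 451) or a full
        # mailbox / insufficient storage (452). Worth retrying later.
        return "retry_later"
    if 500 <= code < 600:
        # Permanent, e.g. 550 mailbox unavailable or delivery refused.
        return "rejected"
    return "unknown"


def deliver_with_retries(
    send: Callable[[], int],
    delays: Sequence[float] = (300, 900, 3600),  # hypothetical back-off, seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry transient failures after each delay; return the final verdict."""
    verdict = classify_smtp_reply(send())
    for delay in delays:
        if verdict != "retry_later":
            break
        sleep(delay)
        verdict = classify_smtp_reply(send())
    return verdict
```

A greylisting server typically answers the first attempt with a 4xx code and accepts a later retry, so the second call to `send` succeeds; a full mailbox keeps answering 4xx until the user frees space, and the sketch gives up with `"retry_later"` once the delays are exhausted.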

Anyway, these are the types of invalid email addresses that can be detected by this class. Not all of them were supported from the start. For instance, detecting email addresses that are the result of typing mistakes is something I only implemented recently.

Basically, what I did was look at the email addresses of users who registered on the PHP Classes site but did not confirm the registration. I gathered those email addresses, grouped them by domain, and that's how I found many common typing mistakes that people make. So these are not mistakes that happened once, but ones that happened many times.

This class is capable of detecting some typing mistakes, obviously not all, because the possible combinations of mistyped letters are practically infinite. But it can detect at least some frequent typing mistakes, and that is the type of invalid email address supported most recently, in the latest versions.
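The grouping step described above can be sketched in Python. Where the speaker reviewed the grouped domains himself, this sketch swaps in `difflib` string similarity against a hypothetical list of well-known domains to shortlist likely typos; the `min_count` and `cutoff` values are arbitrary assumptions.

```python
import difflib
from collections import Counter
from typing import Iterable, List, Tuple


def typo_candidates(
    unconfirmed: Iterable[str],
    known_domains: Iterable[str],
    min_count: int = 2,
    cutoff: float = 0.8,
) -> List[Tuple[str, str, int]]:
    """Return (suspect_domain, likely_intended_domain, occurrences) tuples."""
    # Group the unconfirmed addresses by their lower-cased domain part.
    counts = Counter(
        address.rsplit("@", 1)[1].lower() for address in unconfirmed if "@" in address
    )
    known = [domain.lower() for domain in known_domains]
    found = []
    for domain, occurrences in counts.most_common():
        # Ignore one-off mistakes and domains that are already known-good.
        if occurrences < min_count or domain in known:
            continue
        # Shortlist the most similar well-known domain, if one is close enough.
        close = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
        if close:
            found.append((domain, close[0], occurrences))
    return found
```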

Basically, the article explains how to use this class. Users do not need to completely understand all the possible invalid email address scenarios. The article tries to explain them, but readers can skip that part and jump to the section that presents the code showing how to use the class.

Whoever is interested in this class, probably because they have a site that needs to prevent users from entering invalid email addresses, can use it. The only thing I would like to add concerns precisely the feature of detecting addresses that result from typing mistakes: in that case, the class not only returns a success or failure code for the email address but also returns a suggestion to fix the address the user probably meant to enter.
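The success-plus-suggestion behavior can be illustrated with a lookup table of known mistyped domains. A minimal Python sketch, assuming a hypothetical `DOMAIN_TYPOS` table and an `(is_ok, suggestion)` return shape; this is not the actual interface of the class described in the article.

```python
from typing import Optional, Tuple

# Hypothetical table of frequently observed mistyped domains.
DOMAIN_TYPOS = {"yaho.com": "yahoo.com", "gamail.com": "gmail.com"}


def check_typo(address: str) -> Tuple[bool, Optional[str]]:
    """Return (is_ok, suggestion); suggestion is set only for known typos."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False, None  # no "@" at all: not an address
    fixed = DOMAIN_TYPOS.get(domain.lower())
    if fixed is None:
        return True, None
    return False, f"{local}@{fixed}"
```

A registration form would then show something like "Did you mean joe@yahoo.com?" instead of silently accepting the address.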

If you want to see this in action, go to the PHPClasses site, try to register, type one of those mistyped addresses, and see the site suggest a correction. It has proven to be very useful: although typing mistakes are not very frequent, they happen once in a while. I hope this solution makes the class useful to many people.

Arturs, did you use any type of email validation on sites where you take registrations?

Arturs Sosins: I usually don't use registrations as such, but mostly third-party authentication like Facebook or something similar. But I said it on the JSClasses podcast and I want to repeat it here: what you have gathered here is actually awesome, and you should make it a paid premium service and sell it to others.

[Laughter]

Arturs Sosins: Because, well, it not only improves the experience by correcting typos but also prevents spam registrations and reduces the load on mail servers. So this is really an awesome package.

Manuel Lemos: Yeah, well, I did it out of need; I really needed to avoid this problem. This is actually something I have considered promoting as a product, but I would need the time to put it in my development queue.

But now that you emphasize that suggestion again, I will consider it twice. Maybe, who knows, I'll give it some priority, because there is clearly a need for it.

Also, regarding what you said, at the end of the article I mentioned precisely that: authenticating users with their accounts on other sites like Facebook, Google, Hotmail and Twitter. This is interesting because it is another possible solution.

Arturs Sosins: Let the other servers handle all the registration and probably the...

Manuel Lemos: Exactly, exactly. It also provides a better user experience because users do not have to wait for the confirmation email. For instance, on the PHPClasses site, when you register you can choose Facebook, Google, Hotmail (actually Microsoft account), and also Yahoo!, Stack Overflow and GitHub.

Those are the sites most frequently used by developers. And this reduces the frustration of users who do not want to go through a tedious registration process.

In the end, it's mostly two clicks: one to confirm that you want to authenticate with a certain server, say Facebook, and a second to confirm that you authorize the site to access your account details, including the email address.

Facebook not only can supply the email address, with the permission of the user of course, but it can also tell whether the email address is validated. So if you don't want to accept email addresses that were not validated, you can tell the user: your email address is not validated, we will not accept it. I think Google's and Microsoft's OAuth servers support that too.
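When sign-in goes through OpenID Connect, the standard `email` and `email_verified` claims in the user info carry exactly this information. A minimal Python sketch, assuming the user info has already been fetched and decoded into a dictionary; providers that use plain OAuth with their own APIs may name these fields differently, so treat the field names as an assumption for them.

```python
from typing import Any, Mapping, Optional


def accept_verified_email(userinfo: Mapping[str, Any]) -> Optional[str]:
    """Return the email only when the provider marks it as verified."""
    email = userinfo.get("email")
    # Require a real boolean True; a missing claim counts as unverified.
    if not email or userinfo.get("email_verified") is not True:
        return None
    return email
```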

So, as I was mentioning, this is achieved with the OAuth protocol. In the article I also mention another class of mine that can be used for that purpose. That class is actually very popular; if you do not know it already, you can follow the link to it in the article and find out more.

JavaScript Innovation Award Winners of January 2014 (1:05:19)

Manuel Lemos: And now, moving on to one of our final sections. This podcast has been very interesting but is already quite long, so we are wrapping up with the final sections, in which we comment on the Innovation Award nominees.

We're going to start with the Innovation Award nominees of the JSClasses site. We'll talk about the nominees of January: they were voted on in February, and the results came out in March. There were two nominees this month.

Which one would you like to comment, Arturs?

Arturs Sosins: Let me comment... Let me screen share first. Where was the screen share?

So let me show Dixan Santiesteban's class, the Deobfuscator tool, which basically takes compressed, minified JavaScript, or other languages (I think PHP is also supported), and makes it more readable, mainly by splitting it into more lines, for instance one line per variable definition.

Of course, it can't restore meaningful names for the variables that were stripped, but as you see, the code becomes more readable. So basically it does what it says: it deobfuscates the code.
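The line-splitting part of this kind of tool can be approximated with a small state machine. A minimal Python sketch, not the nominated tool's algorithm: it breaks lines after `;` and `{` and before `}`, indents by brace depth, and skips over quoted strings. It knowingly mishandles `for(;;)` headers and regular-expression literals, and it cannot restore the original variable names.

```python
def split_minified(code: str, indent: str = "  ") -> str:
    """Re-insert line breaks and indentation into minified C-like code."""
    lines: list = []
    current: list = []
    depth = 0
    quote = None      # active string delimiter, or None outside strings
    escaped = False   # previous character inside a string was a backslash

    def flush() -> None:
        # Emit the pending text as one line at the current brace depth.
        text = "".join(current).strip()
        if text:
            lines.append(indent * depth + text)
        current.clear()

    for ch in code:
        if quote:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "'\"`":
            quote = ch
            current.append(ch)
        elif ch == "{":
            current.append(ch)
            flush()
            depth += 1
        elif ch == "}":
            flush()
            depth = max(depth - 1, 0)
            current.append(ch)  # stays pending so "};" or "}else{" stay together
        elif ch == ";":
            current.append(ch)
            flush()
        else:
            current.append(ch)
    flush()
    return "\n".join(lines)
```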

Manuel Lemos: Right, that's great. And it's good to note that it supports not only JavaScript but also PHP.

Arturs Sosins: Yeah.

Manuel Lemos: Although for PHP it's easier, because PHP has the tokenizer extension, which parses the code and does most of the complex work; then you just need to write some code to rewrite it. And there are plenty of classes in JSClasses that do that.

On my part, I would like to comment on the other class, which is Node nmap. Basically, nmap is a security tool that scans networks, I mean computers, and checks which ports are open on those computers.

The Node-nmap tool is basically a wrapper around the nmap command that makes it usable from JavaScript, actually Node.js-based JavaScript. So this is a very interesting package.

This one was developed by Jason Gerfen from the United States. So kudos to Jason. He has been a great contributor. I hope he sends more interesting packages so we can talk more about them in the future.

Regarding the JSClasses Innovation Award, I would just like to mention one thing that we'll be looking at every month: the ranking of the Innovation Award winners. Every month, the site computes the ranking, not only by author but also by country.

Considering just 2014, we don't have many packages to talk about yet. But you can already see which countries have accumulated more points, and this is interesting because you can start seeing which countries are more productive.

As I mentioned before, the country that wins the Innovation Award of the year will earn a special prize. So far we are considering just one month, so you cannot yet see a trend of which country will win, but this ranking already gives an outlook of which countries tend to send more innovative packages, because not all packages are nominated.

So, that was for JSClasses.

PHP Innovation Award Winners of January 2014 (1:10:25)

Manuel Lemos: Now, we are going to move on to the nominees of the PHP Innovation Award of January. Arturs, which one would you like to comment?

Arturs Sosins: OK, let me comment. Let me screen share again. The first one I would like to emphasize is a class by Roger Baklund from Norway, which does a kind of word parsing and processing.

It tries to determine whether a word is a real word or a made-up word based on vowels and consonants, hence the name VowCon. It goes through a list of words and assigns each one a score of how likely it is to be real.

As we see, the threshold is 3.5, so if a word's score is higher than the threshold, it is probably not a real word. So it's actually quite interesting.
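A crude Python sketch of scoring words by their vowel and consonant runs. VowCon's actual scoring is not described here, so this uses a made-up metric (the longest run of consecutive vowels or consonants, treating "y" as a consonant) and a hypothetical threshold of 4 that is not comparable to the 3.5 shown in the demo.

```python
import re


def run_score(word: str) -> int:
    """Length of the longest run of consecutive vowels or consonants."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    runs = re.findall(r"[aeiou]+|[b-df-hj-np-tv-z]+", letters)
    return max((len(run) for run in runs), default=0)


def looks_made_up(word: str, threshold: int = 4) -> bool:
    """Flag words whose longest run exceeds the (hypothetical) threshold."""
    return run_score(word) > threshold
```

Real words such as "strengths" (a five-consonant run) would be flagged by this crude score, which suggests why a practical scorer has to combine more signals.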

Manuel Lemos: This package was also used by another package that was also nominated. This one just estimates the probability of a word not being real, and the other package uses that as one factor to detect possible spam. So it was quite interesting.

Arturs Sosins: OK, I wanted to mention another class. Basically, what it does... just let me show you... it's a class by zeageorge or something like that. He's from Greece, and his class does an interesting thing.

Lately I'm more into Lua development, and Lua has something called metamethods: when you access certain properties, you can define callbacks, and that allows you to implement lots of interesting things like bridges between languages or bindings. And this class does exactly the same thing in PHP.

I was actually thinking recently about porting some library from Lua to PHP or JavaScript to use it in a Web-based environment, and I was wondering whether it would be possible to do such a thing.

And basically, yes, thanks to the author, it would be possible to define callbacks that are called when you change the values of the object's properties. So yeah, that's a nice class, and thanks to the author.
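In PHP, intercepting property access is normally built on the `__get` and `__set` magic methods. The same idea in Python, sketched with `__setattr__` under the assumption that callbacks receive the property name, the old value and the new value; this is illustrative and not the API of the nominated class.

```python
from typing import Any, Callable


class Observable:
    """Object whose attribute assignments trigger registered callbacks."""

    def __init__(self) -> None:
        # Use object.__setattr__ so creating the registry does not go
        # through our own __setattr__ hook.
        object.__setattr__(self, "_watchers", {})

    def watch(self, name: str, callback: Callable[[str, Any, Any], None]) -> None:
        """Register callback(name, old_value, new_value) for one attribute."""
        self._watchers.setdefault(name, []).append(callback)

    def __setattr__(self, name: str, value: Any) -> None:
        old = getattr(self, name, None)  # None if the attribute is new
        object.__setattr__(self, name, value)
        for callback in self._watchers.get(name, []):
            callback(name, old, value)
```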

Manuel Lemos: Yeah, it's always good to have a class that goes beyond what the language can do. On my part, I would also like to comment on a couple of classes, especially since we don't have time to comment on all six nominees. So we will just comment on four, and one that I would like to comment on is this one, MySQL Hot Backup by Orazio Principe from Italy.

Basically, this class lets you back up a whole database, or just a set of tables, while the application is still running. You do not need to stop it. It incrementally copies the records of the tables that were modified since the last backup, so the initial backup probably takes longer, but subsequent backups take less time because only the records that changed are copied.
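The incremental idea, copying only rows modified since the previous run, can be sketched with Python's built-in `sqlite3` module so the example is self-contained. It assumes a hypothetical `items` table with an `updated_at` column maintained by the application; the MySQL Hot Backup class's actual mechanism is not described in this conversation. Deleted rows are not detected by this approach, and a row written during a run with the same timestamp as the new watermark could be missed.

```python
import sqlite3
from typing import Tuple


def incremental_backup(
    src: sqlite3.Connection, dst: sqlite3.Connection, since: int
) -> Tuple[int, int]:
    """Copy rows of the hypothetical `items` table changed after `since`.

    Returns (new_watermark, rows_copied); pass new_watermark as `since`
    on the next run so only later changes are copied.
    """
    rows = src.execute(
        "SELECT id, name, updated_at FROM items WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    # Upsert by primary key: new rows are inserted, changed rows overwritten.
    dst.executemany(
        "INSERT OR REPLACE INTO items (id, name, updated_at) VALUES (?, ?, ?)", rows
    )
    dst.commit()
    new_since = max((row[2] for row in rows), default=since)
    return new_since, len(rows)
```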

This is an interesting approach. We have seen other solutions that emulate the mysqldump command that comes with the MySQL distribution. That is a rough kind of backup: it not only takes more time but may even miss data that changes while it runs, or it requires the database tables to be locked, which is not a good thing. So kudos to Orazio for his solution. It's really, really interesting.

The other nominated package I wanted to comment on does something interesting: it analyzes the characters of a text file, or a text string, and detects the type of encoding. It could be, for instance, UTF-8, UTF-16 or UTF-32.

It looks at the patterns of the bytes that appear there and returns which of the supported encodings matches. If it is some other encoding, it just returns false, meaning it cannot be sure what the encoding is.
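A minimal Python sketch of byte-pattern encoding detection, assuming only UTF-8, UTF-16 and UTF-32 are of interest. It checks for a byte order mark first, then for the zero-byte layout that ASCII-range text has in UTF-16/UTF-32, then for valid UTF-8, and returns `None` (the counterpart of the class returning false) when nothing fits. It is not the nominated package's code, and the zero-byte heuristic only recognizes BOM-less UTF-16/32 text made of ASCII characters.

```python
import codecs
from typing import Optional

# UTF-32 BOMs come first: the UTF-32LE BOM begins with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def _all_zero(chunk: bytes) -> bool:
    return not any(chunk)


def detect_unicode_encoding(data: bytes) -> Optional[str]:
    """Guess UTF-8/16/32 from byte patterns; None when unsure."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # ASCII-range text in UTF-32 has three zero bytes per 4-byte unit.
    if data and len(data) % 4 == 0:
        if _all_zero(data[1::4] + data[2::4] + data[3::4]) and any(data[0::4]):
            return "UTF-32LE"
        if _all_zero(data[0::4] + data[1::4] + data[2::4]) and any(data[3::4]):
            return "UTF-32BE"
    # ASCII-range text in UTF-16 has one zero byte per 2-byte unit.
    if data and len(data) % 2 == 0:
        if _all_zero(data[1::2]) and any(data[0::2]):
            return "UTF-16LE"
        if _all_zero(data[0::2]) and any(data[1::2]):
            return "UTF-16BE"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "UTF-8"
```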

So it's interesting. This package is by Marcelo Entraigas from Argentina.

Now, mentioning the Innovation Award ranking for PHPClasses as well: here we have more nominated packages and the competition seems to be stronger. Italy is in front, because it already has three nominated packages, in this case from two authors.

This already shows that multiple authors from a country, together, may help their country win even if they do not win individually. We are still very early in the year's championship, for both individual authors and countries, but we'll keep up with these rankings every month to see how they evolve.

Conclusion (1:17:51)

Manuel Lemos: Well, with this we have basically ended this podcast. We talked a lot about several interesting topics, so it took quite a long time. It only remains to thank you, Arturs, for coming and filling in for Cesar. It's great to have you.

On my part, that is all for now. Bye.

Arturs Sosins: Bye.

[Music]





Comments:

1. Thank you, some feedback - The Digital Orchard (2014-05-14 01:36)
Some feedback about the podcast... - 3 replies

2. Integrating Hack into PHP - Jurgen_v_O (2014-05-14 01:36)
Integrating Hack into PHP... - 1 reply

3. PHP developers on WIndows - Dave Kennard (2014-04-25 22:09)
Site visitors != developers... - 1 reply


