Something I didn’t cover in my last blog post is collections. Hack comes with a variety of collections for organizing data.
Data structures represent a fundamental part of a programming language, because they will determine the information flow in the application.
PHP up to version 5 had a single type for data collections, called “array”. This data type can have three uses: array, hash table, or a combination of the two.
To facilitate the construction of new structures, a number of iterators were introduced in PHP 5. Unfortunately, the resulting structures only had the purpose of accessing objects in a fashion similar to arrays.
It was not until PHP 5.3 that truly different data structures, such as SplStack and many others, were introduced.
However, structures like vectors and tuples were never natively introduced. They can be built, but it is neither simple, nor intuitive.
HHVM’s Hack comes with a different approach, a series of native collections that are ready to be used.
Collection types
The list of collections is:
- Vector – indexed list of items,
- Map – dictionary-type hash table,
- Set – collection that only stores unique values,
- Pair – a particular case of a vector that has exactly two elements.
Vector, Map, and Set also have immutable (read-only) equivalents: ImmVector, ImmMap, and ImmSet. The purpose of these data types is to expose information for reading without allowing modifications. An immutable collection can be created directly with its constructor, or from a mutable collection with the methods toImmVector, toImmMap, and toImmSet, respectively.
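As a minimal sketch of the two ways of obtaining an immutable collection (assuming an HHVM release from the same period as this article, where top-level statements in a `<?hh` file are allowed; the variable names are only illustrative):

```hack
<?hh

// A regular, mutable Vector.
$vector = Vector{1, 2, 3};

// Read-only copy created from the mutable collection.
$frozen = $vector->toImmVector();

// The original can still change; the immutable copy is not affected.
$vector[] = 4;

echo count($vector) . PHP_EOL; // 4
echo count($frozen) . PHP_EOL; // 3

// Immutable collections created directly.
$literal = ImmVector{1, 2, 3};
$fromArray = new ImmVector(array(1, 2, 3));
```

The immutable copy is a snapshot, so code that only needs to read the data can be handed $frozen without any risk of it modifying $vector.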
In addition, there is a series of abstract classes that make it easier to implement similar structures.
Vector
The advantage of a vector is that it will always have the keys in sequence and the order of the elements is not going to change. When it comes to arrays, there isn’t any simple way to check if it should behave as a hash table or as a vector. For vectors, unlike for hash tables, the key value is not relevant, only the sequence and the number of elements are important.
Let’s take an example:
    <?hh

    function listVector($vector) {
        echo 'Listing array: ' . PHP_EOL;
        for($i = 0; $i < count($vector); $i++) {
            echo $i . ' - ' . $vector[$i] . PHP_EOL;
        }
    }

    $array = array(1, 2, 3);

    listVector($array);

    // eliminating an element from the array
    unset($array[1]);

    listVector($array);
The result will be:
    Listing array:
    0 - 1
    1 - 2
    2 - 3
    Listing array:
    0 - 1

    Notice: Undefined index: 1 in ../vector.hh on line 6
    1 -
The reason is very simple: count returns the real number of elements, but the indexes are not guaranteed to be sequential. When the second element of the array was removed, the number of elements was reduced by one, but index 1 was no longer set, and the last index (2) is now equal to the size of the array, so the loop never reaches it.
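One way around the gap, sketched below under the assumption that plain PHP array functions behave in a `<?hh` file as they do in PHP, is to iterate with foreach or to re-index the array with array_values. The helper listItems is hypothetical and not part of the original example:

```hack
<?hh

// Hypothetical helper: foreach visits only the keys that actually exist.
function listItems($items) {
    foreach ($items as $key => $value) {
        echo $key . ' - ' . $value . PHP_EOL;
    }
}

$array = array(1, 2, 3);
unset($array[1]);

// Keys are now 0 and 2; no notice is raised.
listItems($array);

// array_values() re-indexes the array, so the keys are 0 and 1 again.
listItems(array_values($array));
```

Both workarounds have to be remembered at every call site, which is exactly the bookkeeping a Vector removes.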
Let’s take the same example but using a vector:
    <?hh
    …
    $vector = Vector{1, 2, 3};

    listVector($vector);

    // eliminating an element from the vector
    $vector->removeKey(1);

    listVector($vector);
As anticipated, the result is:
    Listing array:
    0 - 1
    1 - 2
    2 - 3
    Listing array:
    0 - 1
    1 - 3
It is worth mentioning that “unset” cannot be used, because it is not a key that is eliminated but the element itself, and the next value in the vector takes its place.
Another important thing to mention is that when an index doesn’t exist, an exception of the type “OutOfBoundsException” will be thrown.
Some examples that will trigger the exception above:
    <?hh

    $vector = Vector{1,2,3,4};

    // it will work because the key with value 1 exists
    $vector->set(1, 2);

    // it will not work because the key with value 4 doesn't exist yet
    $vector->set(4, 5);

    // it will not work for the same reason as above
    $vector[4] = 5;

    // for adding elements, only the methods that don't take a key work
    $vector[] = 5;

    // or
    array_push($vector, 5);
For accessing elements, the “OutOfBoundsException” problem remains the same. For instance, if $unsetKey holds an index that doesn’t exist, such as 10:
    var_dump($vector[$unsetKey]);
A more special case is when the element doesn’t exist, but the method “get” is used:
    var_dump($vector->get($unsetKey));
The example above will not generate an error, but the result will be “null” when the key doesn’t exist. I find this strange, because an element with the value null can exist in the vector, and the result will be the same.
To avoid the confusion between undefined elements and elements that are null, there is a special method to check if the key exists:
    var_dump($vector->containsKey($unsetKey));
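The difference between the three ways of accessing an element can be sketched as follows. This is only an illustration, assuming an HHVM release from the period of this article; the Vector deliberately stores a null so that the ambiguity of “get” is visible:

```hack
<?hh

// The second element is deliberately null.
$vector = Vector{'a', null};

var_dump($vector->get(1));         // NULL, the stored value is null
var_dump($vector->get(5));         // NULL, the key does not exist
var_dump($vector->containsKey(1)); // bool(true)
var_dump($vector->containsKey(5)); // bool(false)

// The [] operator throws instead of returning null.
try {
    $value = $vector[5];
} catch (OutOfBoundsException $e) {
    echo 'Index 5 is not set' . PHP_EOL;
}
```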
Removing elements from the vector is done with the same method used in the earlier example:
    $vector->removeKey($key);
Or, to remove the last element:
    $vector->pop();
Map
In a hash table, unlike a vector, the order of the elements is not very relevant, but the key-value association is very important. For this reason, a Map is also called a “dictionary”, because you can easily get from a key to a value, since they are “mapped”, hence the name “Map”.
The HHVM implementation will also retain the order in which the elements were introduced.
In PHP, the equivalent of a Map is an associative array.
Unlike Vector, Map needs a key that is permanently bound to the element, even if other values are added to or removed from the collection.
The functions array_push and array_unshift will not work for Map, because no key is passed and the key-value association could not be controlled:
    <?hh

    $map = Map{0 => 'a', 1 => 'b', 3 => 'c'};

    array_push($map, 'd');

    array_unshift($map, 'e');

    var_dump($map);
This will generate the following result:
    Warning: Invalid operand type was used: array_push expects array(s) or collection(s) in ../map.hh on line 5

    Warning: array_unshift() expects parameter 1 to be an array, Vector, or Set in ../map.hh on line 7
    object(HH\Map)#1 (3) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "b"
      [3]=>
      string(1) "c"
    }
As you can see, the elements were not added and each of the cases generated a Warning.
The actual insert can be done using:
    <?hh

    $map = Map{0 => 'a', 1 => 'b', 3 => 'c'};

    // adding an element using the array syntax
    $map['new'] = 'd';

    // adding an element using the method provided by the structure
    $map->set('newer', 'e');

    var_dump($map);
The result will be:
    object(HH\Map)#1 (5) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "b"
      [3]=>
      string(1) "c"
      ["new"]=>
      string(1) "d"
      ["newer"]=>
      string(1) "e"
    }
Unlike Vector, because the element is closely linked with the key, unset is a viable method for removing an element:
    unset($map[$key]);
The structure also has a method for removing the element with a particular key:
    $map->remove($key);
In this case, neither option will generate an error if the key is not set.
The “OutOfBoundsException” exception is also found here for keys that are not defined, and just like for Vectors, there is a method to test if the key exists:
    $map->contains($key);
Similarly to Vector, there is a method that will return the value if the key exists and null if not:
    $map->get($key);
To make sure that an “OutOfBoundsException” will not be raised, a loop over a Map should not be done using “for”, but rather “foreach”.
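A short sketch of why foreach is the safer choice, reusing the Map with a missing key 2 from the examples above (assuming an HHVM release from the period of this article):

```hack
<?hh

// Keys are not sequential: there is no key 2.
$map = Map{0 => 'a', 1 => 'b', 3 => 'c'};

// foreach only visits keys that exist, in insertion order.
foreach ($map as $key => $val) {
    echo $key . ' - ' . $val . PHP_EOL;
}

// For a single lookup, get() returns null instead of throwing.
var_dump($map->get(2)); // NULL
```

A “for” loop counting from 0 to count($map) - 1 would hit the missing key 2 and would never reach key 3.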
Because the vector’s method “pop” does not use a key, it isn’t present in the Map structure.
Set
Set has the purpose of keeping the values unique. For this structure, the values are restricted to the scalar types: string and integer.
The interface for this structure is much simpler than Vector and Map, because the purpose is a lot more limited.
For Sets the key can not be accessed, but it is relevant in a special way.
Let’s take an example to illustrate this:
    <?hh
    $set = Set{'a', 'b', 'c'};

    foreach($set as $key => $val) {
        echo $key . ' - ' . $val . PHP_EOL;
    }
The result will be:
    a - a
    b - b
    c - c
The key and the value are identical, a clever way to guarantee uniqueness.
However, the process is transparent, which allows adding elements without the need for a key:
    <?hh

    $set = Set{'a', 'b', 'c'};

    array_push($set, 'd');

    array_unshift($set, 'e');

    $set[] = 'f';

    var_dump($set);
The result is similar to the one for vectors:
    object(HH\Set)#1 (6) {
      string(1) "e"
      string(1) "a"
      string(1) "b"
      string(1) "c"
      string(1) "d"
      string(1) "f"
    }
Even though new values can be added using the “[]” operator, they can’t be referenced using this operator:
    <?hh

    $set = Set{'a', 'b', 'c'};

    echo $set['a'];
It will generate the following error:
    Fatal error: Uncaught exception 'RuntimeException' with message '[] operator not supported for accessing elements of Sets' in ../set.hh:5
    Stack trace:
    #0 {main}
For removing elements only the native method (remove) and methods that don’t require a key can be used:
    <?hh

    $set = Set{'a', 'b', 'c', 'd'};

    array_pop($set);

    array_shift($set);

    $set->remove('b');

    var_dump($set);
The result will be:
    object(HH\Set)#1 (1) {
      string(1) "c"
    }
Unlike the removal methods of Vector and Map, which take a key, Set’s “remove” method takes the value to be removed.
A Set has no key for accessing elements, so about all we can do is check if an element exists, using “contains”:
    $set->contains($value);
The method will return a bool showing if the element exists or not.
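The uniqueness guarantee and the value-based methods can be put together in a small sketch (an illustration only, assuming an HHVM release from the period of this article):

```hack
<?hh

$set = Set{'a', 'b'};

// Adding a value that is already present leaves the Set unchanged.
$set->add('a');
$set[] = 'c';

echo count($set) . PHP_EOL;    // 3
var_dump($set->contains('a')); // bool(true)

// remove() takes the value itself.
$set->remove('a');
var_dump($set->contains('a')); // bool(false)
```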
Pair
A pair is a collection with two elements. It can’t have more or fewer. Just like in Vectors, the elements are indexed using a key, which in this particular case can have only two values: 0 and 1.
There aren’t a lot of things to be said about this data structure, because the elements cannot be removed, added or replaced. This is the reason why it doesn’t have an immutable equivalent: the structure itself is not flexible:
    <?hh

    $pair = Pair{'a', 'b'};

    foreach($pair as $key => $val) {
        echo $key . ' - ' . $val . PHP_EOL;
    }
The result will be:
    0 - a
    1 - b
A very simple structure for a very simple purpose.
Common ground
Almost all the structures presented above share a few common methods and behaviors. Almost all, because Set and especially Pair are more restrictive by nature and lack some of the features that Vector and Map have.
Filter
It’s a filtering function that comes from functional programming. The purpose is to filter a data structure and to generate a new one of the same type. The exception is Pair, because of the number of elements restriction. The equivalent in PHP is array_filter.
Vector and Map have two methods: filter and filterWithKey. These methods take an argument of type “callable”, in other words a function:
    <?hh

    $vector = Vector{'a', 'b', 'c', 'd', 'e'};

    // eliminate the element with value 'a'
    $result = $vector->filter($val ==> $val != 'a');

    // eliminate every other element using the key
    $result2 = $vector->filterWithKey(($key, $val) ==> ($key % 2) == 0);

    var_dump($vector);
    var_dump($result);
    var_dump($result2);
The result will be:
    object(HH\Vector)#1 (5) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "b"
      [2]=>
      string(1) "c"
      [3]=>
      string(1) "d"
      [4]=>
      string(1) "e"
    }
    object(HH\Vector)#3 (4) {
      [0]=>
      string(1) "b"
      [1]=>
      string(1) "c"
      [2]=>
      string(1) "d"
      [3]=>
      string(1) "e"
    }
    object(HH\Vector)#5 (3) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "c"
      [2]=>
      string(1) "e"
    }
As you’ve noticed, the result of the “callable” function is treated as a bool and according to this the elements are added to the resulting structure.
Map behaves identically to Vector; the only difference is in the nature of the keys.
Something interesting is that the collection can also be immutable, because the operation doesn’t modify the original structure. The result will also have the type of the original structure:
    <?hh

    $vector = Vector{'a', 'b', 'c'};

    $vector = $vector->toImmVector();

    // eliminate the element with value 'a'
    $result = $vector->filter($val ==> $val != 'a');

    var_dump($vector);
    var_dump($result);
The result will be:
    object(HH\ImmVector)#2 (3) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "b"
      [2]=>
      string(1) "c"
    }
    object(HH\ImmVector)#4 (2) {
      [0]=>
      string(1) "b"
      [1]=>
      string(1) "c"
    }
Pair also has the same functions as Vector and Map, but the behavior is not identical, because a Pair can only have 2 elements, no fewer, no more. For this reason, when a Pair is filtered, the result will be an ImmVector, a structure similar to Pair but with a variable number of elements:
    <?hh

    $pair = Pair{'a', 'b'};

    // eliminate the element with value 'a'
    $result = $pair->filter($val ==> $val != 'a');

    var_dump($result);
The resulting structure will be:
    object(HH\ImmVector)#3 (1) {
      [0]=>
      string(1) "b"
    }
Set only has the “filter” method, because, as was demonstrated earlier, the keys are identical with the values. If it had had a method with keys, it would have worked the same.
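For completeness, a minimal sketch of “filter” on a Set, assuming an HHVM release from the period of this article (the values are arbitrary):

```hack
<?hh

$set = Set{1, 2, 3, 4};

// Keep only the even values; the original Set is not modified.
$even = $set->filter($val ==> $val % 2 == 0);

var_dump($even->contains(2)); // bool(true)
var_dump($even->contains(3)); // bool(false)
echo count($set) . PHP_EOL;   // 4
```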
Map
Another function coming from functional languages is “Map”. This aims to modify the values of a structure using a function, the resulting structure having the type of the source. In PHP, the equivalent is array_map.
As with filter, Vector and Map have the common methods “map” and “mapWithKey”. In this case also, they take a “callable” as an argument:
    <?hh

    $vector = Vector {'a', 'b', 'c'};

    $result = $vector->map($val ==> $val . $val);

    $result2 = $vector->mapWithKey(($key, $val) ==> str_repeat($val, 1 + $key));

    var_dump($vector);
    var_dump($result);
    var_dump($result2);
The result will be:
    object(HH\Vector)#1 (3) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "b"
      [2]=>
      string(1) "c"
    }
    object(HH\Vector)#3 (3) {
      [0]=>
      string(2) "aa"
      [1]=>
      string(2) "bb"
      [2]=>
      string(2) "cc"
    }
    object(HH\Vector)#5 (3) {
      [0]=>
      string(1) "a"
      [1]=>
      string(2) "bb"
      [2]=>
      string(3) "ccc"
    }
The result of the “callable” function is the new value of the element in the structure.
Just like with “filter”, an immutable collection will result in a new immutable collection.
Also as with “filter”, the “map” function applied to a Pair will result in an ImmVector:
    <?hh

    $pair = Pair{'a', 'b'};

    $result = $pair->map($val ==> $val . $val);

    var_dump($result);
This will result in:
    object(HH\ImmVector)#3 (2) {
      [0]=>
      string(2) "aa"
      [1]=>
      string(2) "bb"
    }
Conversion
Some of the structures can be converted to different types:

| from \ to | Vector | Map | Set | Pair | Array |
|-----------|--------|-----|-----|------|-------|
| Vector    | yes    | yes | yes | no   | yes   |
| Map       | yes    | yes | yes | no   | yes   |
| Set       | yes    | no  | yes | no   | yes   |
| Pair      | yes    | yes | yes | no   | yes   |
| Array     | yes    | yes | yes | no   | yes   |

There are several structural restrictions to the table above:
- Any structure that is getting converted to Set must only contain scalar values of type int and string:
    (Map{})->add(Pair {'a', new stdClass()})
        ->toSet();
Will generate the error:
    Fatal error: Uncaught exception 'InvalidArgumentException' with message 'Only integer values and string values may be used with Sets' in …
- When a Map is converted to any other structure, except array, it will lose the keys in most cases.
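The loss of keys can be seen in a short sketch (an illustration only, assuming an HHVM release from the period of this article; the keys 'x' and 'y' are arbitrary):

```hack
<?hh

$map = Map{'x' => 'a', 'y' => 'b'};

// The Vector keeps only the values, re-indexed from 0.
var_dump($map->toVector());

// The keys can be extracted separately if they are needed.
var_dump($map->keys());

// Converting to an array keeps the key-value association.
var_dump($map->toArray());
```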
The conversion from an array to other structures is done using:
    $vector = new Vector ($array);
Besides Pair, the constructors of all the structures above take a single parameter that implements Traversable.
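Since arrays and the collections themselves are all Traversable, the same constructor call covers both cases. A minimal sketch, assuming an HHVM release from the period of this article (the array contents are arbitrary):

```hack
<?hh

$array = array('x' => 1, 'y' => 2);

// Vector and Set keep only the values of the array.
$vector = new Vector($array);
$set = new Set($array);

// Map keeps the keys as well.
$map = new Map($array);

// Collections are Traversable too, so one can be built from another.
$copy = new Vector($set);

echo count($vector) . PHP_EOL;    // 2
var_dump($map->containsKey('x')); // bool(true)
var_dump($copy->containsKey(1));  // bool(true)
```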
Conclusions
Hack brings a new perspective over the most popular data type in PHP. Facebook’s reason is a simple one, optimization. If you have a consistent behavior, that particular structure can be optimized. In PHP that’s not exactly possible, because of the fact that an array in PHP can be any type of collection.
From the data structure point of view, I find it interesting to have this kind of data types. In frameworks, there are usually structures that emulate the behavior of the collections introduced by Hack. For instance, in an ORM, a collection of objects is usually represented as a vector, because the purpose is to iterate over its values. An object that represents the values of the fields from a table will be a Map-like structure, because the value of the field is closely related to the field name.
I find it very interesting not only that we now have these structures, but also that we have the interfaces to implement new ones.
I hope Hack will influence PHP to bring purpose specific structures into the language.
Introduction
About a month ago, Facebook released the Hack programming language.
Since then, apocalyptic articles related to this language and to how it will replace PHP have appeared everywhere. The title of this article was inspired by “Will Hack Kill PHP?“.
What is even stranger to me is that it was followed by a wave of negative assessments of PHP: apparently Hack “fixes” the previously mentioned language. In my opinion, a language has to be “broken” in the first place to be “fixed”.
Of course PHP has many drawbacks, like any other programming language, but there must be a good reason why it is the most popular language for the Web. After all, Facebook used it for a long time, and now it is not being replaced, they are improving it… aren’t they?
One thing is for certain: it is probably one of the least inspired names. When you search for “Facebook hack” you will find anything but this programming language…
About Hack
Hack runs on HHVM. HHVM is Facebook’s attempt to optimize the PHP language with Just In Time compilation, the latest approach to optimizing the language. Basically, Facebook is trying to reduce its costs by optimizing the language interpreter, and now with a new language altogether. After all, if we think about Facebook’s infrastructure, it’s normal for them to do this: even a relatively minor optimization will lead to a considerable cost reduction.
Initially, I thought it was an “improved” version, but it seems like it is another language altogether, basically it’s a PHP with something extra!
A small tutorial of the language is at: http://hacklang.org/tutorial/.
The tutorial doesn’t cover all the features of the language, more details are at: http://docs.hhvm.com/manual/en/hacklangref.php.
Practically all the features in which Hack differs from PHP are optional. You can almost write PHP and it will work. Contrary to expectations, not even specifying the types of function inputs and outputs is required.
Because after all it’s a different programming language altogether, I’m not going to go into much detail on all the new features. That is more the purpose of a book, not an article.
I only want to point out a few features that I found interesting.
Strangely, at least at the beginning, the return type, even though it is specified, is not necessarily enforced at runtime.
Let’s take an example:
    <?hh

    function a($a): void {
        return true;
    }

    echo a('a');
This example will have the output… 1.
Basically, at runtime only the input type is checked, not the output one.
To run the Typechecker in the current directory, an empty file must be added:
    $ touch .hhconfig
Then run:
    $ hh_client
As you can see, the data types are checked in a step that is separate from runtime.
When the Typechecker is run manually, it will find all the inconsistencies. The purpose is for it to identify the problems before runtime, for instance when you edit a file, not when the app is actually running.
The Typechecker output is:
    ../test.php:4:9,12: Invalid return type
    ../test.php:3:17,20: This is void
    ../test.php:4:9,12: It is incompatible with a bool
Unfortunately, there is still a lot of work to be done with this feature. If we try to validate only at runtime, using “type hinting” like in PHP, the function becomes:
    function a(int $a): void {
        return true;
    }

    echo a('a');
The output of the Typechecker is not changed, but upon execution the result will be:
    Fatal error: Argument 1 passed to a() must be an instance of int, string given in ../test2.php on line 5
Basically, the Typechecker is doing what type hinting is not, and the latter now also accepts scalar type arguments.
Even if the only change was to add scalar arguments checking, I would have still considered it an important improvement.
The lambda expression syntax is one that is more popular in functional languages.
An example:
    <?hh

    $sqr = $x ==> $x * $x;

    echo $sqr(5) . PHP_EOL;
Of course the result will be 25.
I find it a very interesting and clear way to represent small logic.
In the new syntax, a function can also return another function:
    $add = $x ==> $y ==> $x + $y;

    $result = $add(1);

    echo $result(2) . PHP_EOL;
The result will be 3.
If a variable from inside a lambda expression doesn’t exist in the scope of the function definition, then it will be retrieved from the environment in which the expression was declared:
    // variable in the current scope
    $z = 5;

    $addZ = $x ==> $x + $z;
    // change the variable in the current scope
    $z = 6;

    // perform the add
    echo $addZ(1) . PHP_EOL;
The result will be… 6!
The equivalent in PHP is:
    $addZ = function ($x) use ($z) {
        return $x + $z;
    };
The value of $z is taken from the environment at the moment the function is defined; it is not a reference to the variable.
Of course, that’s not the case when the variable from outside the function is an object; in that case it is passed by reference:
    <?hh

    class a {
        public function __construct(public string $x) {}
        public function __toString() {
            return $this->x;
        }
    }

    // variable in the current scope
    $z = new a('Claudiu');

    $addZ = $x ==> $x . ' ' . $z . '!';

    // change the variable that will be used for concatenation
    $z->x = 'World';

    // run the concatenation
    echo $addZ('Hello') . PHP_EOL;
The output will be:
    Hello World!
Shapes are, again, a syntax more popular in functional programming languages. The purpose is to validate a more specific type of structure than an array.
The motivation is a very good one: validating simple data structures. The structures that are checked should contain the elements defined in the shape.
    <?hh

    // defining a structure
    newtype Circle = shape('radius' => int);

    // a function that uses the structure type defined above
    function areaCircle(Circle $param) {
        return M_PI * $param['radius'] * $param['radius'];
    }

    // a series of shapes that use the structure
    $circle = shape('radius' => 10);
    $cilinder = shape('radius' => 10, 'height' => 15);

    // a structure that should not pass as a Circle
    $sqr = shape('side' => 10);

    echo areaCircle($circle) . PHP_EOL;
    echo areaCircle($cilinder) . PHP_EOL;
    echo areaCircle($sqr) . PHP_EOL;
The output is:
    314.15926535898
    314.15926535898

    Notice: Undefined index: radius in /home/brand/test.hh on line 6

    Notice: Undefined index: radius in /home/brand/test.hh on line 6
    0
A little disappointing, I was hoping that the parameter that doesn’t match the structure will trigger an error, but it passes.
Not even the Typechecker finds anything wrong.
The intention is very good, now we just have to wait for the working version.
Conclusion
Probably Hack will influence PHP, which is normal in the end, it happens all the time with programming languages.
Will it replace PHP? I don’t think so, probably there will be a lot of adopters for cost reduction or a better structuring of the code.
It is not very likely for this language to be successful in the following years other than for projects of medium and large sizes. Small projects usually use shared hosting, which generally doesn’t have the latest PHP version and is even less likely to have the latest HHVM. This is probably one of the least interesting arguments, but in the end most of the websites on the web are small or very small; they make up the “mass”.
An easier approach to optimization is to only use HHVM. In theory, you don’t have to change anything and the results should be visible immediately! Practically HHVM is not 100% compatible with Zend Engine, but this problem is getting less of an issue with each version. One of the priorities for HHVM is to interpret the code the same way Zend Engine does, but to be much more efficient!