Collecting Hack

Something I didn’t cover in my last blog post is collections. Hack comes with a variety of collections for organizing data.

Data structures are a fundamental part of a programming language, because they determine how information flows through an application.

PHP up to version 5 had a single type for data collections, called “array”. This data type can be used in three ways: as an array, as a hash table, or as a combination of the two.

To make building new structures easier, a number of iterators were introduced in PHP 5. Unfortunately, the resulting structures only served to access objects in a fashion similar to arrays.

Only in PHP 5.3 were truly different data structures introduced, such as SplStack and many others.

However, structures like vectors and tuples were never natively introduced. They can be built, but it is neither simple, nor intuitive.

HHVM’s Hack takes a different approach: a series of native collections that are ready to be used.

Collection types

The list of collections is:

  • Vector – indexed list of items,
  • Map – dictionary type hash table,
  • Set – list of items that only stores unique values,
  • Pair – a particular vector case that only has two elements.

Vector, Map, and Set also have immutable (read-only) equivalents: ImmVector, ImmMap and ImmSet. The purpose of these data types is to expose the information for reading and not allow modifications. An immutable collection can be created directly using the constructor, or using the methods toImmVector, toImmMap and toImmSet respectively.
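
Both ways of obtaining an immutable collection can be sketched as follows. This is a minimal example, assuming the same HHVM version as the rest of the post; the variable names are only illustrative:

```hack
<?hh

// Built directly through the constructor, from any Traversable
$fromConstructor = new ImmVector(array(1, 2, 3));

// Built from a mutable collection
$vector = Vector{1, 2, 3};
$fromVector = $vector->toImmVector();

// The mutable source can still change; the immutable copy does not follow it
$vector[] = 4;

var_dump(count($vector));     // int(4)
var_dump(count($fromVector)); // int(3)
```

The immutable copy keeps its three elements even after the original Vector grows.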

Moreover, there is a series of abstract classes that make it easier to implement similar structures.

Vector

The advantage of a vector is that its keys are always sequential and the order of the elements does not change. With arrays, there isn’t any simple way to check whether one should behave as a hash table or as a vector. For vectors, unlike for hash tables, the key value is not relevant; only the sequence and the number of elements are important.

Let’s take an example:

<?hh

function listVector($vector) {
    echo 'Listing array: ' . PHP_EOL;
    for($i = 0; $i < count($vector); $i++) {
        echo $i . ' - ' . $vector[$i] . PHP_EOL;
    }
}

$array = array(1, 2, 3);

listVector($array);

// eliminating an element from the array
unset($array[1]);

listVector($array);

The result will be:

Listing array:
0 - 1
1 - 2
2 - 3
Listing array:
0 - 1

Notice: Undefined index: 1 in ../vector.hh on line 6
1 -

The reason is very simple: count returns the real number of elements, but the indexes are not guaranteed to be sequential. When the second element of the array was removed, the number of elements dropped to 2, but index 1 was no longer set, and the last index (2) is now equal to the size of the array, so the loop never reaches it.
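
The gap in the keys can be seen directly. The sketch below is not part of the original example; it relies on the standard PHP functions array_keys and array_values, which I assume behave the same under HHVM, to show the hole and one plain-array way of closing it:

```hack
<?hh

$array = array(1, 2, 3);
unset($array[1]);

// The remaining keys are 0 and 2; key 1 is gone
echo implode(', ', array_keys($array)) . PHP_EOL;  // 0, 2

// array_values() rebuilds the array with sequential keys
$array = array_values($array);
echo implode(', ', array_keys($array)) . PHP_EOL;  // 0, 1
```

This works, but the array has to be re-indexed by hand after every removal.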

Let’s take the same example but using a vector:

<?hh

$vector = Vector{1, 2, 3};

listVector($vector);

// eliminating an element from the vector
$vector->removeKey(1);

listVector($vector);

As expected, the result is:

Listing array:
0 - 1
1 - 2
2 - 3
Listing array:
0 - 1
1 - 3

It is worth mentioning that “unset” cannot be used here, because what gets removed is not a key but the element itself, and the next value in the vector takes its place.

Another important thing to mention is that when an index doesn’t exist, an exception of the type “OutOfBoundsException” will be thrown.

Some examples that will trigger the exception above:

<?hh

$vector = Vector{1,2,3,4};

// it will work because the key with value 1 exists
$vector->set(1, 2);

// it will not work because the key with value 4 doesn't exist yet
$vector->set(4, 5);

// it will not work for the same reason as above
$vector[4] = 5;

// for adding elements, only methods that don't take a key work
$vector[] = 5;

// or
array_push($vector, 5);

For accessing elements, the “OutOfBoundsException” problem remains the same. For instance, if $unsetKey holds an index that doesn’t exist, such as 10:

var_dump($vector[$unsetKey]);
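
Since this is a regular exception, it can be caught. A minimal sketch, where the values of $vector and $unsetKey are only illustrative:

```hack
<?hh

$vector = Vector{1, 2, 3};
$unsetKey = 10;
$caught = false;

try {
    var_dump($vector[$unsetKey]);
} catch (OutOfBoundsException $e) {
    // Reached because index 10 does not exist in the vector
    $caught = true;
    echo 'Caught: ' . get_class($e) . PHP_EOL;
}
```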

Another, more special case is when the element doesn’t exist, but the “get” method is used:

var_dump($vector->get($unsetKey));

The example above will not generate an error, but the result will be “null” when the key doesn’t exist. I find this strange, because an element with the value null can exist in the vector, and the result will be the same.

To avoid the confusion between undefined elements and elements that are null, there is a special method to check if the key exists:

var_dump($vector->containsKey($unsetKey));
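
The difference between the two methods shows up when the vector really contains a null. A small sketch, with values chosen only for illustration:

```hack
<?hh

$vector = Vector{'a', null};

// Index 1 exists and holds null; index 5 does not exist at all
var_dump($vector->get(1));          // NULL
var_dump($vector->get(5));          // NULL

// Only containsKey tells the two cases apart
var_dump($vector->containsKey(1));  // bool(true)
var_dump($vector->containsKey(5));  // bool(false)
```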

Removing elements from the vector is done with:

$vector->removeKey($key);

Or to remove the last element:

$vector->pop();
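
Both removals can be seen side by side in this short sketch (the values are illustrative):

```hack
<?hh

$vector = Vector{'a', 'b', 'c', 'd'};

// Remove the element at index 1; the following elements shift down
$vector->removeKey(1);
var_dump($vector[1]);      // string(1) "c"

// pop() removes the last element and returns it
$last = $vector->pop();
var_dump($last);           // string(1) "d"
var_dump(count($vector));  // int(2)
```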

Map

In a hash table, unlike a vector, the order of the elements is not very relevant, but the key-value association is very important. For this reason, a Map is also called a “dictionary”: you can easily get from a key to its value because the two are “mapped” to each other, hence the name “Map”.

The HHVM implementation will also retain the order in which the elements were introduced.

In PHP, the equivalent of a Map is an associative array.

Unlike Vector, Map needs a key that stays permanently bound to its element, even when other values are added to or removed from the collection.

The functions array_push and array_unshift will not work for Map, because no key is passed and the key-value association could not be controlled:

<?hh

$map = Map{0 => 'a', 1 => 'b', 3 => 'c'};

array_push($map, 'd');

array_unshift($map, 'e');

var_dump($map);

This will generate the following result:

Warning: Invalid operand type was used: array_push expects array(s) or collection(s) in ../map.hh on line 5

Warning: array_unshift() expects parameter 1 to be an array, Vector, or Set in ../map.hh on line 7
object(HH\Map)#1 (3) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "b"
  [3]=>
  string(1) "c"
}

As you can see, the elements were not added and each of the cases generated a Warning.

The actual insert can be done using:

<?hh

$map = Map{0 => 'a', 1 => 'b', 3 => 'c'};

// adding an element using the array syntax
$map['new'] = 'd';

// adding an element using the method provided by the structure
$map->set('newer', 'e');

var_dump($map);

The result will be:

object(HH\Map)#1 (5) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "b"
  [3]=>
  string(1) "c"
  ["new"]=>
  string(1) "d"
  ["newer"]=>
  string(1) "e"
}

Unlike Vector, because the element is closely linked with the key, unset is a viable method for removing an element:

unset($map[$key]);

The structure also has a method for removing the element with a particular key:

$map->remove($key);

In this case, neither option will generate an error if the key is not set.

The “OutOfBoundsException” exception is also found here for keys that are not defined, and just like for Vectors, there is a method to test if the key exists:

$map->contains($key);

Similarly to Vector, there is a method that will return the value if the key exists and null if not:

$map->get($key);
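
As with Vector, “get” cannot tell a missing key from a key that holds null, so “contains” is the reliable check. A minimal sketch with illustrative keys and values:

```hack
<?hh

$map = Map{'a' => 1, 'b' => null};

var_dump($map->get('a'));       // int(1)
var_dump($map->get('b'));       // NULL, the key exists but holds null
var_dump($map->get('z'));       // NULL, the key is missing

var_dump($map->contains('b'));  // bool(true)
var_dump($map->contains('z'));  // bool(false)
```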

To make sure that an “OutOfBoundsException” will not be raised, a loop over a Map should not be done using “for”, but rather “foreach”.
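
A short sketch of such a loop, reusing the Map with the non-sequential keys 0, 1 and 3 from the earlier examples:

```hack
<?hh

$map = Map{0 => 'a', 1 => 'b', 3 => 'c'};
$seen = array();

// foreach only visits keys that exist, so the missing key 2 is never touched
foreach ($map as $key => $val) {
    $seen[] = $key;
    echo $key . ' - ' . $val . PHP_EOL;
}
```

This prints “0 - a”, “1 - b” and “3 - c”, in insertion order.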

Because the vector’s method “pop” does not use a key, it isn’t present in the Map structure.

Set

Set has the purpose of keeping the values unique. For this structure, the values are restricted to the scalar types: string and integer.

The interface for this structure is much simpler than Vector and Map, because the purpose is a lot more limited.

For Sets the key can not be accessed, but it is relevant in a special way.

Let’s take an example to illustrate this:

<?hh
$set = Set{'a', 'b', 'c'};

foreach($set as $key => $val) {
    echo $key . ' - ' . $val . PHP_EOL;
}

The result will be:

a - a
b - b
c - c

The key and the value are identical, a clever way to guarantee uniqueness.

However, the process is transparent, which allows adding elements without the need for a key:

<?hh

$set = Set{'a', 'b', 'c'};

array_push($set, 'd');

array_unshift($set, 'e');

$set[] = 'f';

var_dump($set);

There will be a result similar to the one from vectors:

object(HH\Set)#1 (6) {
  string(1) "e"
  string(1) "a"
  string(1) "b"
  string(1) "c"
  string(1) "d"
  string(1) "f"
}

Even though new values can be added using the “[]” operator, they can’t be referenced using this operator:

<?hh

$set = Set{'a', 'b', 'c'};

echo $set['a'];

It will generate the following error:

Fatal error: Uncaught exception 'RuntimeException' with message '[] operator not supported for accessing elements of Sets' in ../set.hh:5
Stack trace:
#0 {main}

For removing elements only the native method (remove) and methods that don’t require a key can be used:

<?hh

$set = Set{'a', 'b', 'c', 'd'};

array_pop($set);

array_shift($set);

$set->remove('b');

var_dump($set);

The result will be:

object(HH\Set)#1 (1) {
  string(1) "c"
}

Unlike the removal methods of Vector and Map, Set’s “remove” method receives the value to be removed, not the key.

For Set there isn’t any access key for elements, therefore about all we can do is to check if an element exists, using “contains”:

$set->contains($value);

The method will return a bool showing if the element exists or not.
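
The uniqueness guarantee and “contains” can be seen together in the sketch below. It uses the “add” method, which the post only shows on Map; the values are illustrative:

```hack
<?hh

$set = Set{'a', 'b'};

$set->add('a');  // already present, so the Set is unchanged
$set->add('c');  // a new value, so it is stored

var_dump(count($set));          // int(3)
var_dump($set->contains('c'));  // bool(true)
var_dump($set->contains('z'));  // bool(false)
```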

Pair

A pair is a collection with two elements. It can’t have more or fewer. Just like in Vectors, the elements are indexed using a key, which in this particular case can have only two values: 0 and 1.

There aren’t a lot of things to be said about this data structure, because the elements cannot be removed, added or replaced. This is the reason why it doesn’t have an immutable equivalent: the structure itself is not flexible:

<?hh

$pair = Pair{'a', 'b'};

foreach($pair as $key => $val) {
    echo $key . ' - ' . $val . PHP_EOL;
}

The result will be:

0 - a
1 - b

A very simple structure for a very simple purpose.
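
Because the keys are always 0 and 1, the two elements can also be read directly. A minimal sketch:

```hack
<?hh

$pair = Pair{'a', 'b'};

var_dump($pair[0]);      // string(1) "a"
var_dump($pair[1]);      // string(1) "b"
var_dump(count($pair));  // int(2)
```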

Common ground

Almost all structures presented above have a few common methods and behaviors. Almost all, because Set and especially Pair are more restrictive by nature and lack some features that Vector and Map have.

Filter

It’s a filtering function that comes from functional programming. The purpose is to filter a data structure and to generate a new one of the same type. The exception is Pair, because of the number of elements restriction. The equivalent in PHP is array_filter.

Vector and Map have two methods: filter and filterWithKey. These methods take an argument of type “callable”, in other words a function:

<?hh

$vector = Vector{'a', 'b', 'c', 'd', 'e'};

// eliminate the element with value 'a'
$result = $vector->filter($val ==> $val != 'a');

// eliminate every other element using the key
$result2 = $vector->filterWithKey(($key, $val) ==> ($key % 2) == 0);

var_dump($vector);
var_dump($result);
var_dump($result2);

The result will be:

object(HH\Vector)#1 (5) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "b"
  [2]=>
  string(1) "c"
  [3]=>
  string(1) "d"
  [4]=>
  string(1) "e"
}
object(HH\Vector)#3 (4) {
  [0]=>
  string(1) "b"
  [1]=>
  string(1) "c"
  [2]=>
  string(1) "d"
  [3]=>
  string(1) "e"
}
object(HH\Vector)#5 (3) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "c"
  [2]=>
  string(1) "e"
}

As you’ve noticed, the result of the “callable” function is treated as a bool and according to this the elements are added to the resulting structure.

Map behaves identically to Vector; the only difference is in the nature of the keys.
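
A minimal sketch of the same two methods on a Map, with illustrative keys and values. Unlike in a Vector, the elements that survive keep their original keys:

```hack
<?hh

$map = Map{'a' => 1, 'b' => 2, 'c' => 3};

// keep only the odd values
$odd = $map->filter($val ==> ($val % 2) == 1);

// drop the element stored under the key 'b'
$withoutB = $map->filterWithKey(($key, $val) ==> $key != 'b');

foreach ($odd as $key => $val) {
    echo $key . ' - ' . $val . PHP_EOL;
}
```

The loop prints “a - 1” and “c - 3”; the original $map still has all three elements.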

Interestingly, the collection can also be immutable, because the operation doesn’t modify the original structure; the result will also have the type of the original structure:

<?hh

$vector = Vector{'a', 'b', 'c'};

$vector = $vector->toImmVector();

// eliminate the element with value 'a'
$result = $vector->filter($val ==> $val != 'a');

var_dump($vector);
var_dump($result);

The result will be:

object(HH\ImmVector)#2 (3) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "b"
  [2]=>
  string(1) "c"
}
object(HH\ImmVector)#4 (2) {
  [0]=>
  string(1) "b"
  [1]=>
  string(1) "c"
}

 

Pair also has the same methods as Vector and Map, but the behavior is not identical, because a Pair can only have 2 elements, no less, no more. For this reason, when a Pair is filtered, the result will be an ImmVector, a structure similar to Pair but with a variable number of elements:

<?hh

$pair = Pair{'a', 'b'};

// eliminate the element with value 'a'
$result = $pair->filter($val ==> $val != 'a');

var_dump($result);

The resulting structure will be:

object(HH\ImmVector)#3 (1) {
  [0]=>
  string(1) "b"
}

Set only has the “filter” method, because, as was demonstrated earlier, the keys are identical with the values. If it had had a method with keys, it would have worked the same.
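
For completeness, a minimal sketch of “filter” on a Set, using the same lambda as the Vector example:

```hack
<?hh

$set = Set{'a', 'b', 'c'};

// eliminate the element with value 'a'
$result = $set->filter($val ==> $val != 'a');

var_dump(count($result));          // int(2)
var_dump($result->contains('a'));  // bool(false)
var_dump($set->contains('a'));     // bool(true), the original is untouched
```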

Map

Another function coming from functional languages is “map”. It transforms the values of a structure using a function, and the resulting structure has the type of the source. In PHP, the equivalent is array_map.

As with filter, Vector and Map share two methods: “map” and “mapWithKey”. Here too, they take a “callable” as an argument:

<?hh

$vector = Vector {'a', 'b', 'c'};

$result = $vector->map($val ==> $val . $val);

$result2 = $vector->mapWithKey(($key, $val) ==> str_repeat($val, 1 + $key));

var_dump($vector);
var_dump($result);
var_dump($result2);

The result will be:

object(HH\Vector)#1 (3) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "b"
  [2]=>
  string(1) "c"
}
object(HH\Vector)#3 (3) {
  [0]=>
  string(2) "aa"
  [1]=>
  string(2) "bb"
  [2]=>
  string(2) "cc"
}
object(HH\Vector)#5 (3) {
  [0]=>
  string(1) "a"
  [1]=>
  string(2) "bb"
  [2]=>
  string(3) "ccc"
}

The result of the “callable” function is the new value of the element in the structure.

Just like with “filter”, an immutable collection will result in a new immutable collection.
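
A short sketch of that immutable case, following the same pattern as the “filter” example:

```hack
<?hh

$vector = Vector{'a', 'b', 'c'};
$immutable = $vector->toImmVector();

$result = $immutable->map($val ==> $val . $val);

var_dump($result instanceof ImmVector);  // bool(true)
var_dump($result[0]);                    // string(2) "aa"
var_dump($immutable[0]);                 // string(1) "a", the source is unchanged
```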

Also similar to “filter” is the fact that the “map” function applied to a Pair will result in an ImmVector:

<?hh

$pair = Pair{'a', 'b'};

$result = $pair->map($val ==> $val . $val);

var_dump($result);

This will result in:

object(HH\ImmVector)#3 (2) {
  [0]=>
  string(2) "aa"
  [1]=>
  string(2) "bb"
}

Conversion

Some of the collections can be converted to different types:

from \ to   Vector   Map   Set   Pair   Array
Vector      yes      yes   yes   no     yes
Map         yes      yes   yes   no     yes
Set         yes      no    yes   no     yes
Pair        yes      yes   yes   no     yes
Array       yes      yes   yes   no     yes

There are several structural restrictions to the table above:

  1. Any structure that is converted to a Set must only contain scalar values of type int or string:

(Map{})->add(Pair {'a', new stdClass()})
    ->toSet();

Will generate the error:

Fatal error: Uncaught exception 'InvalidArgumentException' with message 'Only integer values and string values may be used with Sets' in …

  2. When a Map is converted to any other structure, except array, it will lose the keys in most cases.
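
The second restriction can be sketched as follows, assuming the toVector and toArray conversion methods of the same HHVM version; the keys and values are illustrative:

```hack
<?hh

$map = Map{'x' => 'a', 'y' => 'b'};

// Converting to a Vector keeps the values but re-indexes them from 0
$vector = $map->toVector();
var_dump($vector[0]);               // string(1) "a"
var_dump($vector->containsKey(1));  // bool(true)

// Converting to an array keeps the original keys
$array = $map->toArray();
echo implode(', ', array_keys($array)) . PHP_EOL;  // x, y
```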

The conversion from an array to other structures is done using:

$vector = new Vector($array);

Besides Pair, the constructors of all the structures above take a single parameter that implements Traversable.
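
A minimal sketch of these constructors applied to the same illustrative array:

```hack
<?hh

$array = array('x' => 'a', 'y' => 'b', 'z' => 'a');

$vector = new Vector($array);  // values only: 'a', 'b', 'a'
$map    = new Map($array);     // keys and values are preserved
$set    = new Set($array);     // duplicate values collapse: 'a', 'b'

var_dump(count($vector));  // int(3)
var_dump($map['y']);       // string(1) "b"
var_dump(count($set));     // int(2)

// Pair is the exception mentioned above, so the literal syntax is used
$pair = Pair{'a', 'b'};
```

The same array gives three different shapes, depending only on which collection it is passed to.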

Conclusions

Hack brings a new perspective on the most popular data type in PHP. Facebook’s reason is a simple one: optimization. If a structure has consistent behavior, it can be optimized. In PHP that’s not really possible, because an array in PHP can be any type of collection.

From the data structure point of view, I find it interesting to have this kind of data types. In frameworks, there are usually structures that emulate the behavior of the collections introduced by Hack. For instance in an ORM, a collection of objects is usually represented as a vector, because the purpose is to iterate over its values. An object that represents the values of the fields from a table will be a Map-like structure, because the value of the field is closely related to the field name.

I find it very interesting not only that we now have these structures, but also that we have the interfaces to implement new ones.

I hope Hack will influence PHP to bring purpose specific structures into the language.