2 Dictionaries and General Hash Tables

People and computers spend a large amount of time searching. A dictionary is an abstract data structure that facilitates searching for particular objects. An important way of implementing dictionaries is via hash tables.

The functions and operations described in this chapter were added very recently and are still under development. The names and variants of this functionality might change in future versions. If you plan to use these functions in your own code, please contact us.

2.1 Dictionaries

IsDictionary( obj ) C

A dictionary is a growable collection of objects that permits adding objects (with associated values) and checking whether an object is already known.

IsLookupDictionary( obj ) C

A lookup dictionary is a dictionary that permits not only checking whether an object is contained, but also retrieving the associated value, using the operation LookupDictionary.

KnowsDictionary( dict, key ) O

checks whether key is known to the dictionary dict, and returns true or false accordingly. key must be an object of the kind for which the dictionary was specified; otherwise the results are unpredictable.

LookupDictionary( dict, key ) O

looks up key in the lookup dictionary dict and returns the associated value. If key is not known to the dictionary, fail is returned.
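The difference between an ordinary dictionary and a lookup dictionary can be modelled in a few lines. The following is a minimal sketch in Python, not GAP code; the class names, the method names and the FAIL sentinel (a stand-in for GAP's fail) are all hypothetical and chosen only for illustration.

```python
FAIL = object()  # hypothetical stand-in for GAP's `fail`


class SketchDictionary:
    """Toy model of an ordinary dictionary: it only records known keys."""

    def __init__(self):
        self._known = {}

    def add(self, key, value=True):
        # Adding a key makes it "known"; a value may be attached.
        self._known[key] = value

    def knows(self, key):
        # Models KnowsDictionary: True or False, never an error.
        return key in self._known


class SketchLookupDictionary(SketchDictionary):
    """Toy model of a lookup dictionary: values can also be retrieved."""

    def lookup(self, key):
        # Models LookupDictionary: unknown keys yield FAIL.
        return self._known.get(key, FAIL)


d = SketchLookupDictionary()
d.add((1, 2), "transposition")
print(d.knows((1, 2)))           # True
print(d.lookup((1, 2)))          # transposition
print(d.lookup((1, 3)) is FAIL)  # True
```

In this model the ordinary dictionary simply has no lookup method, which mirrors the fact that LookupDictionary is only defined for lookup dictionaries.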

Dictionaries can be implemented in several ways: as lists, as sorted lists, as hash tables or via binary lists. A user, however, just has to call NewDictionary to obtain a "suitable" dictionary for the kind of objects she wants to store. It is nevertheless possible to create hash tables explicitly (see General hash table definitions and operations), as well as dictionaries using binary lists (see DictionaryByPosition).

NewDictionary( obj, look[, objcoll] ) F

creates a new dictionary for objects such as obj. If objcoll is given, the dictionary will hold only objects from this collection; knowing this can improve performance. If objcoll is given, obj may be replaced by false, i.e. no sample object is needed.

The function tries to find the right kind of dictionary for the basic dictionary functions to be quick. If look is true, the dictionary will be a lookup dictionary, otherwise it is an ordinary dictionary.

The use of two objects, obj and objcoll, to parametrize the objects a dictionary is able to store might look confusing. However, there are situations where either of them might be needed:

The first situation is that of objects for which no formal "collection object" has been defined. A typical example might be subspaces of a vector space. GAP does not formally define a "Grassmannian" or anything else to represent the multitude of all subspaces. So it is only possible to give the dictionary a "sample object".

The other situation is that of an object which might represent quite varied domains. The permutation (1,10⁶) might be the nontrivial element of a cyclic group of order 2, or it might be a representative of the symmetric group S_{10⁶}. In the first situation the best approach might be just to have two entries for the two possible objects; in the second situation a much more elaborate approach might be needed.

An algorithm that creates a dictionary will usually know a priori from which domain all the objects will come. Giving this domain permits the use of a more efficient dictionary.

This is particularly true for vectors. From a single vector one cannot decide whether a calculation will take place over the smallest field containing all its entries or over a larger field.

As there are situations where the approach via binary lists is explicitly desired, such dictionaries can be created deliberately.

DictionaryByPosition( list, lookup ) F

creates a new (lookup) dictionary which uses PositionCanonical in list for indexing. The dictionary will have an entry dict!.blist which is a bit list corresponding to list indicating the known values. If lookup is true, the dictionary will be a lookup dictionary, otherwise it is an ordinary dictionary.
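The idea of indexing by position can be illustrated as follows. This is a rough sketch in Python, not GAP code: a precomputed position map stands in for PositionCanonical, the attribute blist mirrors dict!.blist, and every other name (including the FAIL sentinel) is an assumption made for the example.

```python
FAIL = object()  # hypothetical stand-in for GAP's `fail`


class PositionDictionary:
    """Toy dictionary indexed by the position of a key in a fixed list."""

    def __init__(self, universe, lookup):
        self.universe = list(universe)
        # Stand-in for PositionCanonical: map each object to its position.
        self.position = {obj: i for i, obj in enumerate(self.universe)}
        # Bit list: blist[i] is True once universe[i] has been added.
        self.blist = [False] * len(self.universe)
        # Values are only stored for a lookup dictionary.
        self.values = [None] * len(self.universe) if lookup else None

    def add(self, key, value=None):
        i = self.position[key]  # KeyError if key is not in the list
        self.blist[i] = True
        if self.values is not None:
            self.values[i] = value

    def knows(self, key):
        return self.blist[self.position[key]]

    def lookup(self, key):
        if self.values is None:
            raise TypeError("not a lookup dictionary")
        i = self.position[key]
        return self.values[i] if self.blist[i] else FAIL


d = PositionDictionary(["a", "b", "c"], lookup=True)
d.add("b", 42)
print(d.blist)        # [False, True, False]
print(d.lookup("b"))  # 42
```

Membership tests reduce to reading one bit, which is why this representation can be attractive when all possible keys are known in advance.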

2.2 General Hash Tables

This chapter describes hash tables for general objects. We hash by keys and also store a value. Keys cannot be removed from the table, but the corresponding value can be changed. Fast access to the last hash index allows you to efficiently store more than one array of values; this facility should be used with care.

This code works for any kind of object, provided you have a DenseIntKey or SparseIntKey method to convert the key into a positive integer. These methods should ideally be implemented efficiently in the core.

Note that, for efficiency, it is currently impossible to create a hash table whose integer keys are non-positive.

2.3 General hash table definitions and operations

IsHash( obj ) C

The category of hash tables for arbitrary objects (provided an IntKey function is defined).

PrintHashWithNames( hash, keyName, valueName ) O

Print a hash table with the given names for the keys and values.

GetHashEntry( hash, key ) O

If the key is in hash, return the corresponding value. Otherwise return fail. Note that it is not a good idea to use fail as a value, because a stored fail cannot be distinguished from a missing key.

AddHashEntry( hash, key, value ) O

Add the key and value to the hash table.

RandomHashKey( hash ) O

Return a random key from the hash table (Random returns a random value).

HashKeyEnumerator( hash ) O

Enumerates the keys of the hash table (Enumerator enumerates values).

2.4 Hash keys

The crucial step of hashing is to transform key objects into integers such that equal objects produce the same integer.

TableHasIntKeyFun( hash ) P

If this filter is set, the hash table has an IntKey function in its component hash!.intKeyFun.

The actual function used depends very much on the type of the objects. However, GAP already provides key functions for some commonly encountered objects.

DenseIntKey( objcoll, obj ) O

returns a function that can be used as hash key function for objects such as obj in the collection objcoll. objcoll will typically be a large domain. If the domain is not available, it can be given as false, in which case the hash key function will be determined based only on obj. (For a further discussion of these two arguments see NewDictionary.)

The function returned by DenseIntKey is guaranteed to give different values for different objects. If no suitable hash key function has been predefined, fail is returned.

SparseIntKey( objcoll, obj ) O

returns a function that can be used as hash key function for objects such as obj in the collection objcoll. In contrast to DenseIntKey, the function returned may return the same key value for different objects. If no suitable hash key function has been predefined, fail is returned.
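The contrast between the two kinds of key function can be made concrete. The sketch below is Python, not GAP, and both functions are invented for illustration: the "dense" one numbers vectors with entries in range(p) injectively by reading them as base-p numbers, while the "sparse" one maps strings to a sum of character codes and therefore may collide. Both are shifted by one so that keys are positive.

```python
from itertools import product


def dense_key_for_vectors(p):
    """Return an injective key function for vectors with entries in range(p).

    The function is injective on vectors of one fixed length.
    """
    def key(vec):
        k = 0
        for entry in vec:
            k = k * p + entry  # read the vector as a base-p number
        return k + 1           # shift so that every key is positive
    return key


def sparse_key_for_strings():
    """Return a cheap key function for strings that is NOT injective."""
    def key(s):
        return sum(ord(c) for c in s) + 1
    return key


dense = dense_key_for_vectors(3)
keys = {dense(v) for v in product(range(3), repeat=3)}
print(len(keys))  # 27: one distinct key per vector of length 3

sparse = sparse_key_for_strings()
print(sparse("ab") == sparse("ba"))  # True: different objects, same key
```

A table built on the first kind of function can use the key directly as an array position; a table built on the second kind has to cope with collisions.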

2.5 Dense hash tables

Dense hash tables are used for hashing dense sets without collisions, in particular integers. A dense hash table stores its keys as an unordered list and its values as an array with holes. The position of a value is given by the attribute IntKeyFun or by the function returned by DenseIntKey, so this function must be one-to-one.

DenseHashTable( ) F

Construct an empty dense hash table. This is the only correct way to construct such a table.
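The storage layout described for dense hash tables can be sketched as follows. This is a simplified Python model, not the GAP implementation: the class and method names are hypothetical, None marks a hole (so None itself cannot be stored as a value in this sketch), and positions are 1-based to match the requirement that keys map to positive integers.

```python
FAIL = object()  # hypothetical stand-in for GAP's `fail`


class DenseTableSketch:
    """Toy dense table: the integer key is used directly as the position."""

    def __init__(self, int_key=lambda k: k):
        self.int_key = int_key  # must be one-to-one and positive
        self.keys = []          # unordered list of the keys added so far
        self.values = []        # array with holes (None marks a hole)

    def add(self, key, value):
        i = self.int_key(key)
        if i < 1:
            raise ValueError("integer keys must be positive")
        if i > len(self.values):
            # Grow the value array, leaving holes in between.
            self.values.extend([None] * (i - len(self.values)))
        if self.values[i - 1] is None:
            self.keys.append(key)
        self.values[i - 1] = value

    def get(self, key):
        i = self.int_key(key)
        if i < 1 or i > len(self.values) or self.values[i - 1] is None:
            return FAIL
        return self.values[i - 1]


t = DenseTableSketch()
t.add(5, "five")
t.add(2, "two")
print(t.values)         # [None, 'two', None, None, 'five']
print(t.keys)           # [5, 2]
print(t.get(3) is FAIL)  # True
```

Because no collisions can occur, lookups need no probing, at the price of an array as long as the largest integer key.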

2.6 Sparse hash tables

Sparse hash tables are used for hashing sparse sets. A sparse hash table stores its keys as an array in which fail denotes an empty position, and its values as an array with holes. The index is computed by applying HashFunct to the integer obtained from the key via IntKeyFun (or via the function returned by SparseIntKey). DefaultHashLength is the default starting length of the hash table; the table is doubled in size when it becomes half full.

SparseHashTable( [intkeyfun] ) F

Construct an empty sparse hash table. This is the only correct way to construct such a table. If the argument intkeyfun is given, this function will be used to obtain numbers for the keys passed to it.

GetHashEntryIndex( hash, key ) F

If the key is in hash, return its index in the hash array.

DoubleHashArraySize( hash ) F

Double the size of the hash array and rehash all the entries. This will also happen automatically when the hash array is half full.

In sparse hash tables, the integer obtained from the hash key is then transformed into an index position. This transformation is done using the hash function HashFunct:

HashFunct( key, i, size ) F

This is a good double hashing function for any reasonable integer key function (see Cormen, Leiserson and Rivest, Introduction to Algorithms, 1e, p. 235).
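To illustrate how double hashing and doubling at half full fit together, here is a small Python sketch. It is not GAP's HashFunct: the probe formula, the power-of-two table length, the EMPTY marker (GAP uses fail for empty key slots) and all class and method names are assumptions chosen so that the example is easy to verify.

```python
FAIL = object()   # hypothetical stand-in for GAP's `fail`
EMPTY = object()  # marks an unused key slot in this sketch


def hash_funct(key, i, size):
    """Return the i-th probe position (0-based) for an integer key.

    Assumes size is a power of two >= 2. The step is forced to be odd,
    so it is coprime to size and the probes visit every slot.
    """
    step = (key % (size - 1)) | 1
    return (key % size + i * step) % size


class SparseTableSketch:
    def __init__(self, int_key=lambda k: k, length=8):
        self.int_key = int_key
        self.keys = [EMPTY] * length
        self.values = [None] * length
        self.count = 0

    def _index(self, key):
        """Slot holding key, or the empty slot where it would be stored."""
        k = self.int_key(key)
        size = len(self.keys)
        for i in range(size):
            pos = hash_funct(k, i, size)
            if self.keys[pos] is EMPTY or self.keys[pos] == key:
                return pos
        raise RuntimeError("table full")  # not reached while under half full

    def add(self, key, value):
        pos = self._index(key)
        if self.keys[pos] is EMPTY:
            self.keys[pos] = key
            self.count += 1
        self.values[pos] = value
        if 2 * self.count >= len(self.keys):
            self._double()

    def _double(self):
        """Double the arrays and rehash every stored entry."""
        old = [(k, v) for k, v in zip(self.keys, self.values) if k is not EMPTY]
        self.keys = [EMPTY] * (2 * len(self.keys))
        self.values = [None] * len(self.keys)
        for k, v in old:
            pos = self._index(k)
            self.keys[pos] = k
            self.values[pos] = v

    def get(self, key):
        pos = self._index(key)
        return self.values[pos] if self.keys[pos] is not EMPTY else FAIL


t = SparseTableSketch()
for k in range(1, 101):
    t.add(8 * k, k * k)  # all keys collide on the first probe at length 8
print(t.get(80))         # 100
print(len(t.keys))       # 256
```

The second hash only changes the step between probes, so keys that share a first probe position usually separate on the next probe instead of forming one long cluster.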

2.7 Fast access to last hash index

These functions allow you to use the index of the last hash access or modification. Note that this index is global across all hash tables. If you want to have two hash tables with identical layouts, the following works:

    GetHashEntry( hashTable1, object );
    GetHashEntryAtLastIndex( hashTable2 );

These functions should be used with extreme care, as they bypass most of the inbuilt error checking for hash tables.

GetHashEntryAtLastIndex( hash ) O

Returns the value of the last hash entry accessed.

SetHashEntryAtLastIndex( hash, newValue ) O

Resets the value of the last hash entry accessed.

SetHashEntry( hash, key, value ) O

Resets the value corresponding to key.

GAP 4 manual
May 2002