KBOC16: Participant block
Algorithms have at least one input and one output. All algorithm endpoints are organized in groups. Groups are used by the platform to indicate which inputs and outputs are synchronized together. The first group is automatically synchronized with the channel defined by the block in which the algorithm is deployed.
Endpoint Name | Data Format | Nature |
---|---|---|
keystroke | system/kboc16_keystroke/1 | Input |
file | system/uint64/1 | Input |
client_id | system/text/1 | Input |
score_file | robertodaza/competition_kboc16/1 | Output |
Endpoint Name | Data Format | Nature |
---|---|---|
features | system/kboc16_keystroke/1 | Input |
id | system/text/1 | Input |
Algorithms may use functions and classes declared in libraries. Here you can see the libraries and import names used by this algorithm. You don't need to import the library manually in your code; the platform will do it for you. Just use the object as it has been imported with the selected name. For example, if you choose to import a library using the name lib, then access function f within your code as lib.f().
Library | Import as |
---|---|
robertodaza/kboc16_baseline_matchers/5 | kboc16_baseline_matchers |
# Keystroke Biometric Ongoing Competition (KBOC) is an official competition of
# the IEEE Eighth International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2016),
# organized by ATVS Biometric Research Group.
# Participant Block: this code (in Python) comprises the evaluation block of the KBOC16 competition.
# The genuine and impostor samples are unknown except for the training samples (first 4 samples).
# To avoid overfitting of the systems and any possible misconduct, the performance evaluation is made over 100 of the 300 users.
# These first 100 users are representative of the complete set of 300 users: for example, the performance of the baseline algorithms differs by less than 1% between the two sets.
# The evaluation over the 300 users will be done during the final weeks of the competition.
# Together with this block, you can access the library kboc16_baseline_matchers (robertodaza/kboc16_baseline_matchers/5) with 3 baseline systems (see the examples below).
# HOW TO PARTICIPATE: participants can modify the code of this algorithm to include their keystroke recognition systems. Libraries and toolboxes other than those included in this example may be used. The participant code can be private, but its results must be available to the competition organizers (in order to include them in the final competition report).
import numpy
import bob  # Useful toolbox with popular machine learning and pattern recognition functions (see http://idiap.github.io/bob/)


class Algorithm:

    def __init__(self):
        # Instance variables shared across calls to process()
        self._enrollment_set = []  # Enrollment samples of the user currently being read
        self._templates = None  # Will hold the enrollment samples of all users

    def process(self, inputs, outputs):
        ###############################################################
        ###                       TRAINING                          ###
        ###############################################################
        if self._templates is None:  # Only true on the first call
            self._templates = {}  # Dictionary indexed by user id
            # The block has two sets of inputs that are not synchronized with
            # each other, so the enrollment data is read through its own group.
            group = inputs.groupOf('features')
            while group.hasMoreData():
                group.next()  # Request the next data unit of the group
                # In this example we use only the timestamps. Information about
                # the key events is also available in group['features'].data.
                pre_features = group['features'].data.timestamps
                # Samples are case insensitive, so their lengths can differ:
                # some samples include extra keys (mainly "shift") that do not
                # change the text but do change the number of keys pressed.
                if len(self._enrollment_set) == 0:
                    l1 = len(pre_features)  # Length of the first sample of this user
                    feature_vector = []
                    for i in range(1, l1):
                        # The initial time value (reference, always 0) is removed
                        feature_vector.append(float(pre_features[i]))
                    self._enrollment_set.append(feature_vector)
                else:
                    # Length mismatch: this example equalizes the lengths in a
                    # very simplistic way (feel free to improve it).
                    l2 = len(pre_features)
                    feature_vector = []
                    if l1 <= l2:
                        for i in range(1, l1):
                            # Truncate to the length of the first vector
                            feature_vector.append(float(pre_features[i]))
                        self._enrollment_set.append(feature_vector)
                    else:
                        for i in range(1, l1):
                            if i < l2:
                                feature_vector.append(float(pre_features[i]))
                            else:
                                feature_vector.append(float(0))  # Pad with zeros to the same length
                        self._enrollment_set.append(feature_vector)
                # 'id' is the user identifier. Once all the enrollment samples
                # of this user have been received, store the feature length and
                # the enrollment set in self._templates. You can use this
                # enrollment set to train your models.
                if inputs["id"].isDataUnitDone():
                    template_id = group['id'].data.text  # User identifier
                    self._templates[template_id] = dict(
                        f=l1,
                        enrollment_set=numpy.array(self._enrollment_set)
                    )
                    self._enrollment_set = []  # Reset for the next user
        ###################################################################
        ####    TEST (on the remaining twenty samples of each user)    ####
        ###################################################################
        # The probe samples are processed after all the template samples
        comparison_ids = inputs['client_id'].data.text  # User identifier claimed by the probe sample
        file_number = str(inputs['file'].data.value)  # File number of the probe sample
        data = inputs['keystroke'].data  # Probe sample
        pre_features = data.timestamps  # Timestamps of the probe sample
        # Once again, the lengths must be equalized (case-insensitive samples)
        l2 = len(pre_features)
        l1 = self._templates[comparison_ids]['f']
        feature_vector = []
        if l1 <= l2:
            for i in range(1, l1):
                # The initial time value (reference, always 0) is removed
                feature_vector.append(float(pre_features[i]))
        else:
            for i in range(1, l1):
                if i < l2:
                    feature_vector.append(float(pre_features[i]))
                else:
                    feature_vector.append(float(0))  # Pad with zeros to the same length
        probe_features = numpy.array(feature_vector)
        score_info = []
        # As a baseline, the library kboc16_baseline_matchers includes 3 matchers. You can add your own matchers here.
        d = kboc16_baseline_matchers.classifier_manhattand_scaled_modified(self._templates[comparison_ids]['enrollment_set'], probe_features)
        #d = kboc16_baseline_matchers.classifier_knn_norm(self._templates[comparison_ids]['enrollment_set'], probe_features)
        #d = kboc16_baseline_matchers.classifier_knn_mahal(self._templates[comparison_ids]['enrollment_set'], probe_features)
        # score = -d is preferred over score = 1/(d+0.001): in some cases (with
        # a large dynamic margin between scores) the inverse of the distance
        # can be problematic.
        score = -d
        score_info.append({  # 'score_info' holds the 'score' and the 'file_id'
            'score': score,
            'file_id': file_number,
        })
        # Write 'score_info' and 'client_identity' (the user identifier) to the output
        outputs['score_file'].write({
            'client_identity': comparison_ids,
            'scores': score_info
        })
        return True
The code for this algorithm in Python
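The participant code repeats the same length-equalization loop for the first enrollment sample, the later enrollment samples, and the probe sample. As a minimal sketch, that logic could be factored into one helper. The name `equalize_length` and the toy timestamp values are assumptions made for illustration; they are not part of the platform or of kboc16_baseline_matchers.

```python
import numpy


def equalize_length(timestamps, target_length):
    """Drop the first timestamp (time reference, always 0), then truncate or
    zero-pad so that exactly target_length - 1 values remain.

    Hypothetical helper mirroring the loops in the participant code.
    """
    # Slicing keeps at most target_length - 1 values after the reference
    values = [float(t) for t in timestamps[1:target_length]]
    # Pad with zeros when the sample is shorter than the reference length
    values.extend([0.0] * (target_length - 1 - len(values)))
    return numpy.array(values)


# Toy data (invented): the first enrollment sample fixes the reference length
reference = [0, 110, 250, 400]
longer = [0, 100, 240, 390, 520]   # one extra key event
shorter = [0, 120, 260]            # one missing key event

l1 = len(reference)
same = equalize_length(reference, l1)
truncated = equalize_length(longer, l1)
padded = equalize_length(shorter, l1)
print(same, truncated, padded)
```

Zero-padding and truncation are what the example code does, but both discard or distort timing information; aligning samples using the key events would be one possible improvement.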
Keystroke Biometric Ongoing Competition (KBOC) is an official competition of the IEEE Eighth International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2016) organized by ATVS Biometric Research Group.
Participant Block: this code (in Python) comprises the evaluation block of the KBOC16 competition.
The genuine and impostor samples are unknown except for the training samples (first 4 samples). To avoid overfitting of the systems and any possible misconduct, the performance evaluation is made over 100 of the 300 users. These first 100 users are representative of the complete set of 300 users: for example, the performance of the baseline algorithms differs by less than 1% between the two sets. The evaluation over the 300 users will be done during the final weeks of the competition. Together with this block, you can access the library kboc16_baseline_matchers (robertodaza/kboc16_baseline_matchers/5) with 3 baseline systems (see the examples below).
HOW TO PARTICIPATE: participants can modify the code of this algorithm to include their keystroke recognition systems. Libraries and toolboxes other than those included in this example may be used. The participant code can be private, but its results must be available to the competition organizers (in order to include them in the final competition report).
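As an illustration of what a participant-supplied matcher might look like, the sketch below follows the calling convention used in the example code: it takes the enrollment matrix and the probe vector and returns a distance. The function `my_scaled_manhattan` is hypothetical, and it is not the implementation found in kboc16_baseline_matchers; it is a plain scaled Manhattan distance, written under the assumption that enrollment samples have already been equalized to the same length. The toy data is invented.

```python
import numpy


def my_scaled_manhattan(enrollment_set, probe_features):
    """Hypothetical matcher: Manhattan distance between the probe and the
    enrollment mean, with each feature scaled by its mean absolute deviation."""
    enrollment_set = numpy.asarray(enrollment_set, dtype=float)
    probe_features = numpy.asarray(probe_features, dtype=float)
    mean = enrollment_set.mean(axis=0)
    # Per-feature mean absolute deviation; the floor avoids division by zero
    deviation = numpy.abs(enrollment_set - mean).mean(axis=0)
    deviation = numpy.maximum(deviation, 1e-6)
    return float(numpy.sum(numpy.abs(probe_features - mean) / deviation))


# Toy enrollment set: 4 samples of 3 timing features each (invented values)
enrollment = numpy.array([
    [100.0, 250.0, 400.0],
    [110.0, 240.0, 410.0],
    [90.0, 260.0, 390.0],
    [100.0, 250.0, 400.0],
])
genuine_like = numpy.array([102.0, 248.0, 401.0])
impostor_like = numpy.array([180.0, 320.0, 520.0])

d_genuine = my_scaled_manhattan(enrollment, genuine_like)
d_impostor = my_scaled_manhattan(enrollment, impostor_like)
print(d_genuine, d_impostor)
```

Inside `process`, such a function would take the place of the call to `kboc16_baseline_matchers.classifier_manhattand_scaled_modified`, with the returned distance converted to a score in the same way.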
Modified in version 2: score=1/(d+0.001) was replaced by score=-d. In some cases (with a large dynamic margin between scores), the inverse of the distance can be problematic.
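One plausible reading of this change can be illustrated numerically. Both conversions rank probes in the same order, but the inverse squeezes large distances together near zero and blows up for distances close to zero, so the scores span several orders of magnitude. The helper names and the distance values below are assumptions chosen for illustration.

```python
def score_negated(distance):
    """Version 2 conversion: the score is the negated distance."""
    return -distance


def score_inverse(distance, epsilon=0.001):
    """Earlier conversion: the score is the inverse of the distance."""
    return 1.0 / (distance + epsilon)


# Invented distances covering a wide range
distances = [0.0, 0.01, 1.0, 50.0, 100.0]
for d in distances:
    print(d, score_negated(d), score_inverse(d))

# Gap between two clearly different large distances under each conversion
gap_negated = score_negated(50.0) - score_negated(100.0)
gap_inverse = score_inverse(50.0) - score_inverse(100.0)
print(gap_negated, gap_inverse)
```

With the negated distance, the gap between distances 50 and 100 stays at 50 score units, while with the inverse it shrinks to about 0.01 even though a distance of 0 maps to a score near 1000.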
This table shows the number of times this algorithm has been successfully run using the given environment. Note this does not provide sufficient information to evaluate if the algorithm will run when submitted to different conditions.