
To get familiar with the approach:

I switched to absolute paths in the later steps because it was getting late. The test run succeeded.


Code Block

(log2vec) marius@mleng:~/source/Log2Vec$ python pipeline.py -i data/HDFS.log -t HDFS -o results/
rawlogs:/home/marius/source/Log2Vec/data/HDFS.log
variables have been removed
logs without variables:/home/marius/source/Log2Vec/results/HDFS/without_variables.log
input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
syn_file /home/marius/source/Log2Vec/results/HDFS/sys.txt
ant_file /home/marius/source/Log2Vec/results/HDFS/ants.txt
delete is added
INFO is added
dfs Got exception
thread transfer block
python code/getTempLogs.py -input /home/marius/source/Log2Vec/results/HDFS/without_variables.log -output /home/marius/source/Log2Vec/results/HDFS/for_training.log
input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log 
output: /home/marius/source/Log2Vec/results/HDFS/for_training.log
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.300000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file /home/marius/source/Log2Vec/results/HDFS/for_training.log
train_file: /home/marius/source/Log2Vec/results/HDFS/for_training.log 
word_num:54
Vocab size: 55
Words in train file: 16350
triplet file total line: 5, relation num: 3, match: 5
synonyms file total line: 21, words: 20, ignore words: 0
antonyms file total line: 1, words: 1, ignore words: 0
------
code/LRWE/src/lrcwe -train /home/marius/source/Log2Vec/results/HDFS/for_training.log -synonym /home/marius/source/Log2Vec/results/HDFS/sys.txt -antonym /home/marius/source/Log2Vec/results/HDFS/ants.txt -output /home/marius/source/Log2Vec/results/HDFS/embedding.model -save-vocab /home/marius/source/Log2Vec/results/HDFS/embedding.vocab -belta-rel 0.8 -alpha-rel 0.01 -alpha-ant 0.3 -size 32 -min-count 1 -window 2 -triplet triples.txt
Total in Embeddings vocabulary: 55
Training set character count:  41
------
python code/mimick/make_dataset.py --vectors /home/marius/source/Log2Vec/results/HDFS/embedding.model --w2v-format --output /home/marius/source/Log2Vec/results/HDFS/words.pkl
[dynet] random seed: 3040219324
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
The dy.parameter(...) call is now DEPRECATED.                                                     |
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
100% |############################################################################################|
[lr=0.006 clips=13 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=5 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=6 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=9 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=11 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=5 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
------
python code/mimick/model.py --dataset /home/marius/source/Log2Vec/results/HDFS/words.pkl  --vocab /home/marius/source/Log2Vec/results/HDFS/changed_log/vocab.txt --output /home/marius/source/Log2Vec/results/HDFS/oov.vector --num-epochs 20 --learning-rate 0.006000 --num-lstm-layers 1 --cosine --dropout -1.000000 --all-from-mimick --hidden-dim 250 --char-dim 36
log input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
word vectors input: /home/marius/source/Log2Vec/results/HDFS/embedding.model
log vectors output: /home/marius/source/Log2Vec/results/HDFS/log.vector
end~~
------
 python code/Log2Vec.py -logs /home/marius/source/Log2Vec/results/HDFS/without_variables.log -word_model /home/marius/source/Log2Vec/results/HDFS/embedding.model -log_vector_file /home/marius/source/Log2Vec/results/HDFS/log.vector -dimension 32
---------
0.9438120169363016

Code Block

(log2vec) marius@mleng:~/source/Log2Vec$ python log2vec.py -i results -t HDFS
# no errors

(log2vec) marius@mleng:~/source/Log2Vec$ python code/preprocessing.py -rawlog ./code/data/BGL.log
rawlogs:./code/data/BGL.log
variables have been removed
logs without variables:./code/data/BGL_without_variables.log

(log2vec) marius@mleng:~/source/Log2Vec$ python code/get_syn_ant.py -logs ./code/data/BGL_without_variables.log  -ant_file ./middle/ants.txt
input: ./code/data/BGL_without_variables.log
syn_file ./middle/syns.txt
ant_file ./middle/ants.txt

(log2vec) marius@mleng:~/source/Log2Vec$ python code/get_triplet.py data/BGL_without_variables.log  middle/bgl_triplet.txt
(log2vec) marius@mleng:~/source/Log2Vec$ 

(log2vec) marius@mleng:~/source/Log2Vec$ python code/getTempLogs.py -input data/BGL_without_variables.log -output middle/BGL_without_variables_for_training.log
input: data/BGL_without_variables.log 
output: middle/BGL_without_variables_for_training.log

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ ./lrcwe -train ../../../middle/BGL_without_variables_for_training.log 
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.001000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file ../../../middle/BGL_without_variables_for_training.log
train_file: ../../../middle/BGL_without_variables_for_training.log 
word_num:0
Vocab size: 1
Words in train file: 1

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ ./lrcwe -train /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log -synonym /home/marius/source/Log2Vec/middle/syns.txt -antonym /home/marius/source/Log2Vec/middle/ants.txt -output /home/marius/source/Log2Vec/middle/bgl_words.model -save-vocab /home/marius/source/Log2Vec/middle/bgl.vocab -belta-rel 0.8 - alpha-rel 0.01  -alpha-ant 0.3 -size 32 -min-count 1 /home/marius/source/Log2Vec/middle/bgl_triplet.txt 
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.300000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log
train_file: /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log 
word_num:0
Vocab size: 1
Words in train file: 1
synonyms file total line: 0, words: 0, ignore words: 407
antonyms file total line: 0, words: 0, ignore words: 10

(log2vec) marius@mleng:~/source/Log2Vec$ python code/mimick/make_dataset.py --vectors /home/marius/source/Log2Vec/middle/bgl_words.model --w2v-format --output /home/marius/source/Log2Vec/middle/bgl_words.pkl
Total in Embeddings vocabulary: 1
Training set character count:  4

(log2vec) marius@mleng:~/source/Log2Vec$ python code/mimick/model.py --dataset /home/marius/source/Log2Vec/middle/bgl_words.pkl --vocab code/mimick/testdir/testvocab.txt --output middle/oov.vector
[dynet] random seed: 1179517440
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
The dy.parameter(...) call is now DEPRECATED.                                                     |
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|

Very interesting. This appears to be a multistage and fairly advanced vectorization technique.
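To make the stages easier to see, here is a small sketch that lists the scripts in the order they show up in the console output above. The script names are copied from the logged commands; the one-line descriptions are only my reading of the log messages, and the names `STAGES` and `describe_stages` are made up for this illustration.

```python
# Stages as they appear in the console output above.
# Script paths are taken from the logged commands; the descriptions are
# an interpretation of the log messages, not official documentation.
STAGES = [
    ("code/preprocessing.py", "remove variables from the raw log"),
    ("code/get_syn_ant.py", "write synonym and antonym files"),
    ("code/get_triplet.py", "write a triplet (relation) file"),
    ("code/getTempLogs.py", "write the log file used for training"),
    ("code/LRWE/src/lrcwe", "train word embeddings (-size 32 in the run above)"),
    ("code/mimick/make_dataset.py", "build a .pkl dataset from the embeddings"),
    ("code/mimick/model.py", "train the mimick model and write oov.vector"),
    ("code/Log2Vec.py", "turn logs plus word vectors into log vectors"),
]


def describe_stages(stages=STAGES):
    """Return one numbered, human-readable line per stage."""
    return [
        f"{i}. {script}: {what}"
        for i, (script, what) in enumerate(stages, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(describe_stages()))
```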

conda env as YAML (Python 3.9, 16.1.2024)

This makes it possible to build a fully functional Log2Vec environment based on Python 3.9. The original release targeted Python 3.6. There will be some deprecation warnings, but I believe they can be safely ignored.

Code: gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml


Code Block

curl https://gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml -o log2vec_conda.yml
conda env create -f log2vec_conda.yml
conda activate log2vec
... # the conda env is stored in the user's home directory
git clone https://github.com/NetManAIOps/Log2Vec
# follow the steps

Wrapper for the Log2Vec libraries for automated Log file vectorization

This makes it possible to use the Log2Vec library for automated log file vectorization, based on the semantic embedding and NLP approach demonstrated in the paper.

Code:

gist.githubusercontent.com/norandom/86a701a56b7de8c800a83eac293da813/raw/a9c7db1d46be633f344b4a07ff05d8985530b162/log2vec_wrapper.sh
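For illustration only, the same idea can be sketched in Python. This is not the shell wrapper from the gist. It only reuses the `pipeline.py` flags (`-i`, `-t`, `-o`) seen in the HDFS run above, and the function names, the `repo_dir` parameter and the `dry_run` switch are assumptions of this sketch.

```python
import subprocess
from pathlib import Path


def build_pipeline_command(log_file, log_type, out_dir="results/"):
    """Build the argv list for one pipeline.py run.

    The flags mirror the HDFS invocation shown above:
    python pipeline.py -i data/HDFS.log -t HDFS -o results/
    """
    return [
        "python", "pipeline.py",
        "-i", str(log_file),
        "-t", log_type,
        "-o", str(out_dir),
    ]


def vectorize(log_file, log_type, repo_dir, out_dir="results/", dry_run=True):
    """Run the pipeline inside the cloned Log2Vec repository.

    With dry_run=True (the default) nothing is executed and the command
    is only returned, which is handy for checking paths first.
    """
    cmd = build_pipeline_command(log_file, log_type, out_dir)
    if not dry_run:
        # check=True raises CalledProcessError if the pipeline fails.
        subprocess.run(cmd, cwd=Path(repo_dir), check=True)
    return cmd


if __name__ == "__main__":
    # repo_dir is a placeholder path; adjust it to your clone.
    print(vectorize("data/HDFS.log", "HDFS", repo_dir="Log2Vec"))
```

Returning the command as a list avoids shell quoting issues with file names that contain spaces.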


Understanding the .vector versus the .log

The format is line-based, with up to 32 vector dimensions per line.

Code Block

marius@mleng:~/source/sample_logs$ wc -l syslog.log 
12266 syslog.log
marius@mleng:~/source/sample_logs$ wc -l syslog.vector 
12267 syslog.vector
marius@mleng:~/source/sample_logs$ head -n 1 syslog.vector 
12266 32

A header line is added with the number of lines (samples) and the number of dimensions (32). Therefore, the .vector file has one more line than the .log file.
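A minimal sketch for reading such a file back in, assuming that every line after the header holds whitespace-separated floats. Only the header layout is confirmed by the output above; the layout of the data lines and the function name `read_vector_file` are assumptions.

```python
def read_vector_file(path):
    """Read a .vector file: one header line, then one vector per line.

    Returns (n_samples, n_dims, rows). The header is "<samples> <dims>",
    as in "12266 32" above. Data lines are ASSUMED to contain
    whitespace-separated floats.
    """
    with open(path, encoding="utf-8") as fh:
        header = fh.readline().split()
        n_samples, n_dims = int(header[0]), int(header[1])
        # An empty line becomes an empty vector so the count stays aligned
        # with the lines of the original .log file.
        rows = [[float(tok) for tok in line.split()] for line in fh]

    if len(rows) != n_samples:
        raise ValueError(
            f"header announces {n_samples} vectors, found {len(rows)}"
        )
    return n_samples, n_dims, rows
```

The length check reproduces the `wc -l` comparison above: the number of data lines should equal the sample count in the header.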

The vectors can be consumed by an ML pipeline.
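As one possible example of such a consumer, the sketch below finds the log line whose vector is closest to a given one by cosine similarity. It works on plain lists of floats of equal length. The function names are hypothetical and this is not part of Log2Vec itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, vectors):
    """Return (index, score) of the vector closest to vectors[query_index]."""
    best_index, best_score = None, -2.0  # cosine is always >= -1
    for i, vec in enumerate(vectors):
        if i == query_index:
            continue
        score = cosine_similarity(vectors[query_index], vec)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score
```

Because the vectors are line-aligned with the log file, the returned index points at the matching log line.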
