Log2Vec conda Python 3.9

Log2Vec

Paper

Our paper is published on The 29th International Conference on Computer Communications and Networks (ICCCN 2020,). The information can be found here:

Weibin Meng, Ying Liu, Yuheng Huang, Shenglin Zhang, Federico Zaiter, Bingjin Chen, Dan Pei. A Semantic-aware Representation Framework for Online Log Analysis. ICCCN 2020. August 3 - August 6, 2020, Honolulu, Hawaii, USA.

Install Log2Vec on Linux Mint 21 AMD64 in 2024

I use separate conda environments for older research-grade software.

https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#

The following dependencies need to be present:

1. nltk, nltk.download("wordnet")
2. spacy, spacy.load("en_core_web_md")
3. progressbar
4. dynet (python3)

gensim
A C++ compiler tool chain

dynet

https://github.com/clab/dynet

Last release is from 2020 (state of this information 15 Jan 2024 ). In Python 3.12 disutils became deprecated, which will cause build errors.

https://github.com/Asana/python-asana/issues/80

I chose to use 3.9, but your requirements may be stricter.

conda create --name log2vec python=3.9
conda activate log2vec
pip install 'setuptools<57.0.0'
pip install --verbose dynet --no-build-isolation

build-essential and cmake (Linux Mint 21)

This is straight forward, but I document the compiler version here for the sake of completeness.

apt install build-essential
apt install cmake

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ dpkg -l | grep build-essential
ii  build-essential                            12.9ubuntu3                                amd64        Informational list of build-essential packages

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ dpkg -l | grep cmake
ii  cmake                                      3.22.1-1ubuntu1.22.04.1                    amd64        cross-platform, open-source make system
ii  cmake-data                                 3.22.1-1ubuntu1.22.04.1                    all          CMake data files (modules, templates and documentation)

Click here to expand...

marius@mleng:~/source/Log2Vec/code/LRWE/src$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

marius@mleng:~/source/Log2Vec/code/LRWE/src$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

CMake Error: Run 'cmake --help' for all supported options.
marius@mleng:~/source/Log2Vec/code/LRWE/src$ cmake --version
cmake version 3.22.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

nltk and wordnet

conda install anaconda::nltk

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ python                                      
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download("wordnet")
[nltk_data] Downloading package wordnet to /home/marius/nltk_data...
True
>>> quit()

spacy

conda install anaconda::spacy
conda install conda-forge::spacy-model-en_core_web_md

Build

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ make clean
rm -rf word2vec lrcwe
(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ make -j 4
g++ word2vec.c -o word2vec -lm -pthread -Ofast -march=native -Wall -funroll-loops -Wno-unused-result
g++ lrcwe.c -o lrcwe -lm -pthread -Ofast -march=native -Wall -funroll-loops -Wno-unused-result

gensim 3.x

4.x introduced changes.

conda install conda-forge::gensim=3.8.3

Test trace

Click here to expand...

(log2vec) marius@mleng:~/source/Log2Vec$ python pipeline.py -i data/HDFS.log -t HDFS -o results/
rawlogs:/home/marius/source/Log2Vec/data/HDFS.log
variables have been removed
logs without variables:/home/marius/source/Log2Vec/results/HDFS/without_variables.log
input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
syn_file /home/marius/source/Log2Vec/results/HDFS/sys.txt
ant_file /home/marius/source/Log2Vec/results/HDFS/ants.txt
delete is added
INFO is added
dfs Got exception
thread transfer block
python code/getTempLogs.py -input /home/marius/source/Log2Vec/results/HDFS/without_variables.log -output /home/marius/source/Log2Vec/results/HDFS/for_training.log
input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log 
output: /home/marius/source/Log2Vec/results/HDFS/for_training.log
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.300000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file /home/marius/source/Log2Vec/results/HDFS/for_training.log
train_file: /home/marius/source/Log2Vec/results/HDFS/for_training.log 
word_num:54
Vocab size: 55
Words in train file: 16350
triplet file total line: 5, relation num: 3, match: 5
synonyms file total line: 21, words: 20, ignore words: 0
antonyms file total line: 1, words: 1, ignore words: 0
------
code/LRWE/src/lrcwe -train /home/marius/source/Log2Vec/results/HDFS/for_training.log -synonym /home/marius/source/Log2Vec/results/HDFS/sys.txt -antonym /home/marius/source/Log2Vec/results/HDFS/ants.txt -output /home/marius/source/Log2Vec/results/HDFS/embedding.model -save-vocab /home/marius/source/Log2Vec/results/HDFS/embedding.vocab -belta-rel 0.8 -alpha-rel 0.01 -alpha-ant 0.3 -size 32 -min-count 1 -window 2 -triplet triples.txt
Total in Embeddings vocabulary: 55
Training set character count:  41
------
python code/mimick/make_dataset.py --vectors /home/marius/source/Log2Vec/results/HDFS/embedding.model --w2v-format --output /home/marius/source/Log2Vec/results/HDFS/words.pkl
[dynet] random seed: 3040219324
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
The dy.parameter(...) call is now DEPRECATED.                                                     |
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
100% |############################################################################################|
[lr=0.006 clips=13 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=5 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=6 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=9 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=11 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=5 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
------
python code/mimick/model.py --dataset /home/marius/source/Log2Vec/results/HDFS/words.pkl  --vocab /home/marius/source/Log2Vec/results/HDFS/changed_log/vocab.txt --output /home/marius/source/Log2Vec/results/HDFS/oov.vector --num-epochs 20 --learning-rate 0.006000 --num-lstm-layers 1 --cosine --dropout -1.000000 --all-from-mimick --hidden-dim 250 --char-dim 36
log input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
word vectors input: /home/marius/source/Log2Vec/results/HDFS/embedding.model
log vectors output: /home/marius/source/Log2Vec/results/HDFS/log.vector
end~~
------
 python code/Log2Vec.py -logs /home/marius/source/Log2Vec/results/HDFS/without_variables.log -word_model /home/marius/source/Log2Vec/results/HDFS/embedding.model -log_vector_file /home/marius/source/Log2Vec/results/HDFS/log.vector -dimension 32
---------
0.9438120169363016

(log2vec) marius@mleng:~/source/Log2Vec$ python log2vec.py -i results -t HDFS
# no errors

(log2vec) marius@mleng:~/source/Log2Vec$ python code/preprocessing.py -rawlog ./code/data/BGL.log
rawlogs:./code/data/BGL.log
variables have been removed
logs without variables:./code/data/BGL_without_variables.log

(log2vec) marius@mleng:~/source/Log2Vec$ python code/get_syn_ant.py -logs ./code/data/BGL_without_variables.log  -ant_file ./middle/ants.txt
input: ./code/data/BGL_without_variables.log
syn_file ./middle/syns.txt
ant_file ./middle/ants.txt

(log2vec) marius@mleng:~/source/Log2Vec$ python code/get_triplet.py data/BGL_without_variables.log  middle/bgl_triplet.txt
(log2vec) marius@mleng:~/source/Log2Vec$ 

(log2vec) marius@mleng:~/source/Log2Vec$ python code/getTempLogs.py -input data/BGL_without_variables.log -output middle/BGL_without_variables_for_training.log
input: data/BGL_without_variables.log 
output: middle/BGL_without_variables_for_training.log

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ ./lrcwe -train ../../../middle/BGL_without_variables_for_training.log 
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.001000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file ../../../middle/BGL_without_variables_for_training.log
train_file: ../../../middle/BGL_without_variables_for_training.log 
word_num:0
Vocab size: 1
Words in train file: 1

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ ./lrcwe -train /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log -synonym /home/marius/source/Log2Vec/middle/syns.txt -antonym /home/marius/source/Log2Vec/middle/ants.txt -output /home/marius/source/Log2Vec/middle/bgl_words.model -save-vocab /home/marius/source/Log2Vec/middle/bgl.vocab -belta-rel 0.8 - alpha-rel 0.01  -alpha-ant 0.3 -size 32 -min-count 1 /home/marius/source/Log2Vec/middle/bgl_triplet.txt 
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.300000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log
train_file: /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log 
word_num:0
Vocab size: 1
Words in train file: 1
synonyms file total line: 0, words: 0, ignore words: 407
antonyms file total line: 0, words: 0, ignore words: 10

(log2vec) marius@mleng:~/source/Log2Vec$ python code/mimick/make_dataset.py --vectors /home/marius/source/Log2Vec/middle/bgl_words.model --w2v-format --output /home/marius/source/Log2Vec/middle/bgl_words.pkl
Total in Embeddings vocabulary: 1
Training set character count:  4

(log2vec) marius@mleng:~/source/Log2Vec$ python code/mimick/model.py --dataset /home/marius/source/Log2Vec/middle/bgl_words.pkl --vocab code/mimick/testdir/testvocab.txt --output middle/oov.vector
[dynet] random seed: 1179517440
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
The dy.parameter(...) call is now DEPRECATED.                                                     |
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|