

I am looking into Log2Vec as a TF-IDF alternative for log vectorization. Later on, I am primarily interested in consuming Sysmon logs.

Log2Vec

Attached: Log2Vec-icccn20.pdf

Abstract—Logs are one of the most valuable data sources for large-scale service management. Log representation, which converts unstructured texts to structured vectors or matrices, serves as the first step towards automated log analysis. However, the current log representation methods neither represent domain-specific semantic information of logs, nor handle the out-of-vocabulary (OOV) words of new types of logs at runtime. We propose Log2Vec, a semantic-aware representation framework for log analysis. Log2Vec combines a log-specific word embedding method to accurately extract the semantic information of logs, with an OOV word processor to embed OOV words into vectors at runtime. We present an analysis on the impact of OOV words and evaluate the performance of the OOV word processor. The evaluation experiments on four public production log datasets demonstrate that Log2Vec not only fixes the issue presented by OOV words, but also significantly improves the performance of two popular log-based service management tasks, including log classification and anomaly detection. We have packaged Log2Vec into an open-source toolkit and hope that it can be used for future research.
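The abstract describes two components. The first is a log-specific word embedding. The second is an OOV word processor that embeds unseen words at runtime; the test trace below runs code/mimick/model.py for this part.

The following is a minimal sketch of how the two pieces could fit together. It rests on two assumptions: a log line's vector is the mean of its word vectors, and unknown words go through a pluggable fallback. The trigram_fallback below is a deliberately simple, hypothetical stand-in that averages the vectors of known words sharing a character trigram. It is not Mimick, which learns a character-level model.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Embeddings = Dict[str, Vector]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def trigrams(word: str) -> set:
    """Character trigrams of `word`, padded with < and > boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_fallback(word: str, embeddings: Embeddings) -> Optional[Vector]:
    """Hypothetical OOV stand-in (NOT Log2Vec's Mimick model): average the
    vectors of known words that share at least one trigram with `word`."""
    grams = trigrams(word)
    similar = [vec for known, vec in embeddings.items() if grams & trigrams(known)]
    return mean(similar) if similar else None


def log_vector(tokens: List[str], embeddings: Embeddings,
               oov: Callable[[str, Embeddings], Optional[Vector]] = trigram_fallback
               ) -> Optional[Vector]:
    """Assumed representation: mean of the word vectors of one log line.
    Unknown tokens go through `oov`; tokens it cannot embed are skipped."""
    vectors = []
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            vec = oov(token, embeddings)
        if vec is not None:
            vectors.append(vec)
    return mean(vectors) if vectors else None


if __name__ == "__main__":
    toy = {"block": [1.0, 0.0], "blocks": [0.0, 1.0], "received": [2.0, 2.0]}
    # "blockid" is OOV; it shares trigrams with "block" and "blocks".
    print(log_vector(["received", "blockid"], toy))  # [1.25, 1.25]
```

Keeping the OOV handler as a parameter means a real model (such as the vectors written to oov.vector in the trace) could replace the toy fallback without touching the averaging step.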

https://github.com/NetManAIOps/Log2Vec

The work was supported by National Key R&D Program of China (Grant No. 2019YFB1802504, 2018YFB1800405), the National Natural Science Foundation of China (Grant Nos. 61772307, 61902200 and 61402257), the China Postdoctoral Science Foundation (2019M651015) and the Beijing National Research Center for Information Science and Technology (BNRist).

Paper

The paper was published at the 29th International Conference on Computer Communications and Networks (ICCCN 2020):

Weibin Meng, Ying Liu, Yuheng Huang, Shenglin Zhang, Federico Zaiter, Bingjin Chen, Dan Pei. A Semantic-aware Representation Framework for Online Log Analysis. ICCCN 2020. August 3 - August 6, 2020, Honolulu, Hawaii, USA.

Install Log2Vec on Linux Mint 21 AMD64 in 2024

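I use separate conda environments for older research-grade software. Software moves at a rapid pace, and an isolated environment keeps the old, pinned dependencies away from the rest of the system.

Getting it to run requires a little bit of software engineering skill.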

https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#

The following dependencies need to be present:

Code Block
1. nltk, nltk.download("wordnet")
2. spacy, spacy.load("en_core_web_md")
3. progressbar
4. dynet (python3)

gensim 3.x
A C++ compiler toolchain (my build environment is documented below)
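A small preflight script can confirm the Python-side dependencies before you start building. Below is a minimal sketch that uses only the standard library. The import names are assumptions, in particular progressbar. The script does not check the spaCy model or the WordNet data, which are separate downloads.

```python
"""Pre-flight check for the Log2Vec Python dependencies (a sketch)."""
import importlib.util
from importlib import metadata
from typing import Dict, Optional

# Import names assumed for the dependencies listed above.
MODULES = ["nltk", "spacy", "progressbar", "dynet", "gensim"]


def installed(module: str) -> bool:
    """True if `module` can be found (not imported) in the active environment."""
    return importlib.util.find_spec(module) is not None


def gensim_major() -> Optional[int]:
    """Major version of the installed gensim distribution, or None if absent."""
    try:
        return int(metadata.version("gensim").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def check_dependencies() -> Dict[str, bool]:
    report = {name: installed(name) for name in MODULES}
    # Log2Vec is run with gensim 3.x here; 4.x introduced breaking changes.
    report["gensim 3.x"] = gensim_major() == 3
    return report


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:12} {'ok' if ok else 'MISSING/WRONG'}")
```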

dynet

Its last release is from 2020 (as of 15 Jan 2024). Python 3.12 removed distutils from the standard library (it had been deprecated since Python 3.10), which causes build errors.

I chose Python 3.9, which avoids this error, but your requirements may be stricter.

Code Block
conda create --name log2vec python=3.9
conda activate log2vec
pip install 'setuptools<57.0.0'
pip install --verbose dynet --no-build-isolation
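The two constraints above can be checked up front:

- Python must be older than 3.12, because distutils is gone in 3.12.
- setuptools must be below 57.0.0 for the dynet build.

Below is a minimal sketch. build_problems and version_tuple are hypothetical helpers. The version parsing is deliberately simple and only looks at leading digits.

```python
"""Check the interpreter and the setuptools pin used for building dynet (a sketch)."""
import sys
from importlib import metadata
from typing import List, Optional, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Leading numeric parts of a version string, e.g. "58.0.0rc1" -> (58, 0, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def build_problems(python: Tuple[int, ...], setuptools: Optional[str]) -> List[str]:
    problems = []
    # distutils was removed from the standard library in Python 3.12.
    if python[:2] >= (3, 12):
        problems.append("Python >= 3.12 has no distutils; use e.g. 3.9")
    if setuptools is None:
        problems.append("setuptools is not installed")
    elif version_tuple(setuptools) >= (57,):
        problems.append(f"setuptools {setuptools} is not < 57.0.0")
    return problems


if __name__ == "__main__":
    try:
        installed_setuptools = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        installed_setuptools = None
    for line in build_problems(tuple(sys.version_info[:3]), installed_setuptools) or ["ok"]:
        print(line)
```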

build-essential and cmake (Linux Mint 21)

This is straightforward, but I document the compiler version here for the sake of completeness.

Code Block
sudo apt install build-essential
sudo apt install cmake

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ dpkg -l | grep build-essential
ii  build-essential  12.9ubuntu3  amd64  Informational list of build-essential packages
(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ dpkg -l | grep cmake
ii  cmake       3.22.1-1ubuntu1.22.04.1  amd64  cross-platform, open-source make system
ii  cmake-data  3.22.1-1ubuntu1.22.04.1  all    CMake data files (modules, templates and documentation)

Code Block
marius@mleng:~/source/Log2Vec/code/LRWE/src$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

marius@mleng:~/source/Log2Vec/code/LRWE/src$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

CMake Error: Run 'cmake --help' for all supported options.
marius@mleng:~/source/Log2Vec/code/LRWE/src$ cmake --version
cmake version 3.22.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).
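You can record the toolchain on another machine without copying the full gcc -v output: a short standard-library script can print the first --version line of each tool. Below is a minimal sketch. The tool list is my assumption:

- gcc and g++ for compiling.
- make for the LRWE Makefile.
- cmake, which I assume the dynet source build uses.

```python
"""Report the build tools (first --version line of each) using only the standard library."""
import shutil
import subprocess
from typing import Dict, Optional

TOOLS = ["gcc", "g++", "make", "cmake"]


def first_line(cmd: str, flag: str) -> Optional[str]:
    """Run `cmd flag` and return the first output line, or None if `cmd` is not on PATH."""
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def toolchain() -> Dict[str, Optional[str]]:
    # --version prints a short banner; `gcc -v` (shown above) prints the full configuration.
    return {tool: first_line(tool, "--version") for tool in TOOLS}


if __name__ == "__main__":
    for tool, banner in toolchain().items():
        print(f"{tool:6} {banner or 'not found'}")
```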

nltk and wordnet

Code Block

conda install anaconda::nltk

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ python                                      
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download("wordnet")
[nltk_data] Downloading package wordnet to /home/marius/nltk_data...
True
>>> quit()
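With the wordnet corpus downloaded, NLTK's corpus reader gives access to synonym and antonym relations. Below is a minimal sketch of such a lookup. I assume this is the kind of information get_syn_ant.py extracts for the synonym and antonym files, but I have not checked its code. wordnet_relations is a hypothetical name. When NLTK or the corpus is missing, the function returns empty sets instead of downloading anything.

```python
"""Look up WordNet synonyms and antonyms for a word (a sketch, not Log2Vec's code)."""
from typing import Set, Tuple


def wordnet_relations(word: str) -> Tuple[Set[str], Set[str]]:
    """Return (synonyms, antonyms) from WordNet; empty sets if NLTK or the
    wordnet corpus is unavailable. Never triggers a download."""
    try:
        from nltk.corpus import wordnet as wn
        synsets = wn.synsets(word)  # loading the lazy corpus may raise LookupError
    except (ImportError, LookupError):
        return set(), set()
    synonyms, antonyms = set(), set()
    for synset in synsets:
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            for opposite in lemma.antonyms():
                antonyms.add(opposite.name())
    synonyms.discard(word)
    return synonyms, antonyms


if __name__ == "__main__":
    syns, ants = wordnet_relations("start")
    print("synonyms:", sorted(syns)[:10])
    print("antonyms:", sorted(ants))
```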

spacy

Code Block
conda install anaconda::spacy
conda install conda-forge::spacy-model-en_core_web_md
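The medium English model (en_core_web_md) ships with word vectors. Below is a minimal sketch that confirms spaCy finds the model and that tokens carry vectors. spacy_status is a hypothetical helper, and the example sentence is made up.

```python
"""Check that spaCy loads a model and that its tokens have vectors (a sketch)."""


def spacy_status(model: str = "en_core_web_md") -> str:
    try:
        import spacy
    except ImportError:
        return "spacy not installed"
    try:
        nlp = spacy.load(model)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        return f"model {model} not installed"
    doc = nlp("Received block from datanode")  # made-up log-like example
    with_vectors = sum(token.has_vector for token in doc)
    return f"ok: {with_vectors}/{len(doc)} tokens have vectors"


if __name__ == "__main__":
    print(spacy_status())
```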

Build the C++ project (make)

Code Block

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ make clean
rm -rf word2vec lrcwe
(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ make -j 4
g++ word2vec.c -o word2vec -lm -pthread -Ofast -march=native -Wall -funroll-loops -Wno-unused-result
g++ lrcwe.c -o lrcwe -lm -pthread -Ofast -march=native -Wall -funroll-loops -Wno-unused-result
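The make output shows two targets in code/LRWE/src: word2vec and lrcwe. Below is a minimal sketch that rebuilds them and reports any binary that did not appear. The checkout path in the example is an assumption based on the prompt shown above.

```python
"""Rebuild LRWE and verify that both binaries exist (a sketch)."""
import os
import subprocess
from typing import List

BINARIES = ["word2vec", "lrcwe"]  # targets produced by `make` in code/LRWE/src


def missing_binaries(src_dir: str) -> List[str]:
    """Names of expected executables that are absent or not executable."""
    return [name for name in BINARIES
            if not os.access(os.path.join(src_dir, name), os.X_OK)]


def build(src_dir: str, jobs: int = 4) -> List[str]:
    """Run `make clean` and `make -j N` in src_dir, then report missing binaries."""
    subprocess.run(["make", "clean"], cwd=src_dir, check=True)
    subprocess.run(["make", "-j", str(jobs)], cwd=src_dir, check=True)
    return missing_binaries(src_dir)


if __name__ == "__main__":
    # Hypothetical checkout location; adjust to your clone.
    print(build(os.path.expanduser("~/source/Log2Vec/code/LRWE/src")) or "build ok")
```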

gensim 3.x

gensim 4.x introduced breaking API changes. Pinning 3.x (here 3.8.3) avoids the resulting errors.

Code Block
conda install conda-forge::gensim=3.8.3
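Two examples of the gensim 4.0 breaking changes:

- KeyedVectors.vocab was replaced by key_to_index.
- The Word2Vec constructor arguments size and iter were renamed to vector_size and epochs.

I have not checked which of these Log2Vec's scripts rely on, so pinning 3.8.3 is the safe option. If you write your own code around the embeddings, a small helper can cope with both versions. Below is a sketch. It assumes the embedding file written by lrcwe is in the text word2vec format, since make_dataset.py reads it with --w2v-format.

```python
"""Small gensim 3.x / 4.x compatibility helpers (a sketch)."""
from typing import Any, List


def vocab_words(kv: Any) -> List[str]:
    """Vocabulary of a KeyedVectors object.
    gensim 4.x: kv.key_to_index (word -> index); gensim 3.x: kv.vocab."""
    if hasattr(kv, "key_to_index"):
        return list(kv.key_to_index)
    return list(kv.vocab)


def load_embedding(path: str) -> Any:
    """Load a word2vec-format file, assuming the text format (binary=False).
    KeyedVectors.load_word2vec_format exists in both gensim 3.8 and 4.x."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=False)
```

For example, vocab_words(load_embedding("results/HDFS/embedding.model")) should list the 55 vocabulary entries reported in the test trace, if the format assumption holds.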

Test trace

I ran the HDFS test trace to get familiar with the approach. The test succeeded.

For the later manual steps, I had to use some absolute paths because it was getting late.


Code Block

(log2vec) marius@mleng:~/source/Log2Vec$ python pipeline.py -i data/HDFS.log -t HDFS -o results/
rawlogs:/home/marius/source/Log2Vec/data/HDFS.log
variables have been removed
logs without variables:/home/marius/source/Log2Vec/results/HDFS/without_variables.log
input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
syn_file /home/marius/source/Log2Vec/results/HDFS/sys.txt
ant_file /home/marius/source/Log2Vec/results/HDFS/ants.txt
delete is added
INFO is added
dfs Got exception
thread transfer block
python code/getTempLogs.py -input /home/marius/source/Log2Vec/results/HDFS/without_variables.log -output /home/marius/source/Log2Vec/results/HDFS/for_training.log
input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
output: /home/marius/source/Log2Vec/results/HDFS/for_training.log
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.300000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file /home/marius/source/Log2Vec/results/HDFS/for_training.log
train_file: /home/marius/source/Log2Vec/results/HDFS/for_training.log
word_num:54
Vocab size: 55
Words in train file: 16350
triplet file total line: 5, relation num: 3, match: 5
synonyms file total line: 21, words: 20, ignore words: 0
antonyms file total line: 1, words: 1, ignore words: 0
------
code/LRWE/src/lrcwe -train /home/marius/source/Log2Vec/results/HDFS/for_training.log -synonym /home/marius/source/Log2Vec/results/HDFS/sys.txt -antonym /home/marius/source/Log2Vec/results/HDFS/ants.txt -output /home/marius/source/Log2Vec/results/HDFS/embedding.model -save-vocab /home/marius/source/Log2Vec/results/HDFS/embedding.vocab -belta-rel 0.8 -alpha-rel 0.01 -alpha-ant 0.3 -size 32 -min-count 1 -window 2 -triplet triples.txt
Total in Embeddings vocabulary: 55
Training set character count:  41
------
python code/mimick/make_dataset.py --vectors /home/marius/source/Log2Vec/results/HDFS/embedding.model --w2v-format --output /home/marius/source/Log2Vec/results/HDFS/words.pkl
[dynet] random seed: 3040219324
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
The dy.parameter(...) call is now DEPRECATED.
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
100% |############################################################################################|
[lr=0.006 clips=13 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=0 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=5 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=6 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=9 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=11 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=5 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=4 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
100% |############################################################################################|
[lr=0.006 clips=3 updates=54] None
100% |############################################################################################|
------
python code/mimick/model.py --dataset /home/marius/source/Log2Vec/results/HDFS/words.pkl  --vocab /home/marius/source/Log2Vec/results/HDFS/changed_log/vocab.txt --output /home/marius/source/Log2Vec/results/HDFS/oov.vector --num-epochs 20 --learning-rate 0.006000 --num-lstm-layers 1 --cosine --dropout -1.000000 --all-from-mimick --hidden-dim 250 --char-dim 36
log input: /home/marius/source/Log2Vec/results/HDFS/without_variables.log
word vectors input: /home/marius/source/Log2Vec/results/HDFS/embedding.model
log vectors output: /home/marius/source/Log2Vec/results/HDFS/log.vector
end~~
------
 python code/Log2Vec.py -logs /home/marius/source/Log2Vec/results/HDFS/without_variables.log -word_model /home/marius/source/Log2Vec/results/HDFS/embedding.model -log_vector_file /home/marius/source/Log2Vec/results/HDFS/log.vector -dimension 32
---------
0.9438120169363016
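The trace shows the multistage structure:

1. Preprocessing (variable removal).
2. Synonym/antonym extraction.
3. Training-log generation.
4. The LRWE embedding (lrcwe).
5. The Mimick dataset and OOV model.
6. Finally, code/Log2Vec.py.

Below is a minimal sketch that writes these stages down as data and checks that every stage's inputs are either produced upstream or supplied from outside. The input and output lists are my reading of the printed commands. I have not checked where triples.txt and changed_log/vocab.txt come from in pipeline.py (the manual BGL run creates a triplet file with get_triplet.py), so they are treated as external inputs.

```python
"""The HDFS test-trace stages as data, with an input-resolution check (a sketch)."""
from typing import Iterable, List, NamedTuple, Tuple


class Stage(NamedTuple):
    name: str
    program: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


# File names as printed in the trace (outputs live under results/HDFS/).
STAGES = [
    Stage("remove variables", "code/preprocessing.py", ("HDFS.log",), ("without_variables.log",)),
    Stage("synonyms/antonyms", "code/get_syn_ant.py", ("without_variables.log",), ("sys.txt", "ants.txt")),
    Stage("training logs", "code/getTempLogs.py", ("without_variables.log",), ("for_training.log",)),
    Stage("LRWE embedding", "code/LRWE/src/lrcwe",
          ("for_training.log", "sys.txt", "ants.txt", "triples.txt"),
          ("embedding.model", "embedding.vocab")),
    Stage("Mimick dataset", "code/mimick/make_dataset.py", ("embedding.model",), ("words.pkl",)),
    Stage("Mimick OOV model", "code/mimick/model.py",
          ("words.pkl", "changed_log/vocab.txt"), ("oov.vector",)),
    Stage("log vectors", "code/Log2Vec.py",
          ("without_variables.log", "embedding.model"), ("log.vector",)),
]


def unresolved(stages: Iterable[Stage], external: Iterable[str]) -> List[Tuple[str, str]]:
    """(stage, input) pairs whose input is neither external nor produced earlier."""
    available = set(external)
    problems: List[Tuple[str, str]] = []
    for stage in stages:
        problems += [(stage.name, f) for f in stage.inputs if f not in available]
        available.update(stage.outputs)
    return problems


if __name__ == "__main__":
    print(unresolved(STAGES, ["HDFS.log"]))
    # [('LRWE embedding', 'triples.txt'), ('Mimick OOV model', 'changed_log/vocab.txt')]
```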

Code Block

(log2vec) marius@mleng:~/source/Log2Vec$ python log2vec.py -i results -t HDFS
# no errors

(log2vec) marius@mleng:~/source/Log2Vec$ python code/preprocessing.py -rawlog ./code/data/BGL.log
rawlogs:./code/data/BGL.log
variables have been removed
logs without variables:./code/data/BGL_without_variables.log

(log2vec) marius@mleng:~/source/Log2Vec$ python code/get_syn_ant.py -logs ./code/data/BGL_without_variables.log  -ant_file ./middle/ants.txt
input: ./code/data/BGL_without_variables.log
syn_file ./middle/syns.txt
ant_file ./middle/ants.txt

(log2vec) marius@mleng:~/source/Log2Vec$ python code/get_triplet.py data/BGL_without_variables.log  middle/bgl_triplet.txt
(log2vec) marius@mleng:~/source/Log2Vec$ 

(log2vec) marius@mleng:~/source/Log2Vec$ python code/getTempLogs.py -input data/BGL_without_variables.log -output middle/BGL_without_variables_for_training.log
input: data/BGL_without_variables.log 
output: middle/BGL_without_variables_for_training.log

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ ./lrcwe -train ../../../middle/BGL_without_variables_for_training.log 
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.001000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file ../../../middle/BGL_without_variables_for_training.log
train_file: ../../../middle/BGL_without_variables_for_training.log 
word_num:0
Vocab size: 1
Words in train file: 1

(log2vec) marius@mleng:~/source/Log2Vec/code/LRWE/src$ ./lrcwe -train /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log -synonym /home/marius/source/Log2Vec/middle/syns.txt -antonym /home/marius/source/Log2Vec/middle/ants.txt -output /home/marius/source/Log2Vec/middle/bgl_words.model -save-vocab /home/marius/source/Log2Vec/middle/bgl.vocab -belta-rel 0.8 - alpha-rel 0.01  -alpha-ant 0.3 -size 32 -min-count 1 /home/marius/source/Log2Vec/middle/bgl_triplet.txt 
alpha:0.050000, alpha_syn:0.025000, alpha_ant:0.300000, alpha_rel:0.010000
belta_syn:0.700000, belta_ant:0.200000, belta_rel:0.800000
Starting training using file /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log
train_file: /home/marius/source/Log2Vec/middle/BGL_without_variables_for_training.log 
word_num:0
Vocab size: 1
Words in train file: 1
synonyms file total line: 0, words: 0, ignore words: 407
antonyms file total line: 0, words: 0, ignore words: 10

(log2vec) marius@mleng:~/source/Log2Vec$ python code/mimick/make_dataset.py --vectors /home/marius/source/Log2Vec/middle/bgl_words.model --w2v-format --output /home/marius/source/Log2Vec/middle/bgl_words.pkl
Total in Embeddings vocabulary: 1
Training set character count:  4

(log2vec) marius@mleng:~/source/Log2Vec$ python code/mimick/model.py --dataset /home/marius/source/Log2Vec/middle/bgl_words.pkl --vocab code/mimick/testdir/testvocab.txt --output middle/oov.vector
[dynet] random seed: 1179517440
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
The dy.parameter(...) call is now DEPRECATED.                                                     |
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|
100% |                                                                                            |
[lr=0.01 clips=0 updates=0] None
100% |############################################################################################|

Very interesting. This appears to be a multi-stage and quite advanced vectorization technique.
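In the manual BGL run above, lrcwe reports word_num:0 and Vocab size: 1, so the training file it read appears to have been effectively empty. One possible cause: preprocessing.py wrote ./code/data/BGL_without_variables.log, while get_triplet.py and getTempLogs.py were given data/BGL_without_variables.log. The second lrcwe call also has a stray space in "- alpha-rel" and passes the triplet file without -triplet.

Below is a minimal sketch that builds the lrcwe argument list with the flags pipeline.py used in the HDFS trace and refuses to start on a missing or empty training file. count_tokens, lrcwe_args and checked_lrcwe_args are hypothetical helper names. I have not confirmed that lrcwe needs every one of these flags.

```python
"""Build an lrcwe command line like the one pipeline.py ran for HDFS (a sketch)."""
import os
from typing import List


def count_tokens(path: str) -> int:
    """Whitespace-separated tokens in a text file (0 if the file is missing)."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(len(line.split()) for line in handle)


def lrcwe_args(binary: str, train: str, synonym: str, antonym: str, triplet: str,
               output: str, vocab: str, size: int = 32) -> List[str]:
    """Flags copied from the HDFS trace; each flag and its value are separate items."""
    return [binary, "-train", train, "-synonym", synonym, "-antonym", antonym,
            "-output", output, "-save-vocab", vocab,
            "-belta-rel", "0.8", "-alpha-rel", "0.01", "-alpha-ant", "0.3",
            "-size", str(size), "-min-count", "1", "-window", "2",
            "-triplet", triplet]


def checked_lrcwe_args(binary: str, train: str, **kwargs) -> List[str]:
    """Refuse to train on a missing or empty file (the BGL run shows word_num:0)."""
    if count_tokens(train) == 0:
        raise ValueError(f"training file {train!r} is missing or empty")
    return lrcwe_args(binary, train, **kwargs)
```

Pass the returned list to subprocess.run(..., check=True). A list avoids shell quoting problems and makes typos like "- alpha-rel" impossible.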

conda env as YAML (Python 3.9, 16.1.2024)

This YAML file lets you build a fully functional Log2Vec environment based on Python 3.9. The original release was based on Python 3.6. You will see some deprecation warnings, such as DyNet's dy.parameter(...) notice in the test trace, but I believe they can be safely ignored.

Code: gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml

Gist: https://gist.github.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02

Code Block

curl https://gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml -o log2vec_conda.yml
conda env create -f log2vec_conda.yml
conda activate log2vec
... # conda stores the environment in its envs/ directory, i.e. in your home directory for a per-user install
git clone https://github.com/NetManAIOps/Log2Vec
# follow the steps above

Wrapper for the Log2Vec libraries for automated Log file vectorization

This wrapper uses the Log2Vec library to vectorize log files automatically, based on the semantic embedding and NLP approach described in the paper.

Code: gist.githubusercontent.com/norandom/86a701a56b7de8c800a83eac293da813/raw/a9c7db1d46be633f344b4a07ff05d8985530b162/log2vec_wrapper.sh

Gist: https://gist.github.com/norandom/86a701a56b7de8c800a83eac293da813
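I have not reproduced the gist's shell script here. As an alternative illustration only, below is a minimal Python sketch that calls pipeline.py once per log file, using the -i/-t/-o flags from the test trace. pipeline_commands and run_all are hypothetical names.

Using the file name stem as the -t value is an assumption. In the trace, -t HDFS names the results/HDFS/ output directory, but I have not checked whether pipeline.py treats -t as more than an output name.

```python
"""Run Log2Vec's pipeline.py once per log file (a sketch, not the gist's wrapper)."""
import subprocess
import sys
from pathlib import Path
from typing import List


def pipeline_commands(log_dir: str, out_dir: str = "results/", pattern: str = "*.log") -> List[List[str]]:
    """One pipeline.py invocation per matching file, sorted for reproducibility."""
    base = Path(log_dir)
    if not base.is_dir():
        return []
    return [[sys.executable, "pipeline.py", "-i", str(path), "-t", path.stem, "-o", out_dir]
            for path in sorted(base.glob(pattern))]


def run_all(log_dir: str, out_dir: str = "results/", dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the commands; run this from the Log2Vec checkout."""
    commands = pipeline_commands(log_dir, out_dir)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all("data")  # dry run: only prints what would be executed
```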

Understanding the .vector versus the .log

The format is line-based: one vector per line, with 32 dimensions here (matching the -dimension 32 argument passed to code/Log2Vec.py in the test trace).

Code Block
marius@mleng:~/source/sample_logs$ wc -l syslog.log
12266 syslog.log
marius@mleng:~/source/sample_logs$ wc -l syslog.vector
12267 syslog.vector
marius@mleng:~/source/sample_logs$ head -n 1 syslog.vector
12266 32

Log2Vec writes a header line with the number of vectors (samples) and their dimensionality (32). That is why syslog.vector has one line more than syslog.log.

The vectors can be consumed by an ML pipeline.
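Below is a minimal sketch of loading the .vector file for such a pipeline. It assumes the header "<samples> <dimensions>" described above and whitespace-separated numbers on every following line. I have not checked whether lines also carry a leading identifier, so the sketch keeps only the last <dimensions> fields of each line. read_vectors is a hypothetical name.

```python
"""Load a Log2Vec .vector file into a list of rows (a sketch; see format assumptions)."""
from typing import Iterable, List, Tuple


def read_vectors(lines: Iterable[str]) -> Tuple[int, int, List[List[float]]]:
    """Parse the "<samples> <dimensions>" header and one vector per following line.
    Assumption: only the last `dim` fields of a line are vector components."""
    it = iter(lines)
    count, dim = (int(x) for x in next(it).split())
    rows = []
    for line in it:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) < dim:
            raise ValueError(f"expected {dim} values, got {len(fields)}")
        rows.append([float(x) for x in fields[-dim:]])
    if len(rows) != count:
        raise ValueError(f"header says {count} vectors, found {len(rows)}")
    return count, dim, rows
```

A file object works as the input. For example, open syslog.vector, pass the handle to read_vectors, and hand the rows to numpy.asarray before feeding them to a model. For the file above, the header check expects 12266 rows of 32 values.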
