Very interesting. This appears to be a multistage and very advanced vectorization technique.

conda env as

...

YAML (Python 3.9, 16.1.2024)

This YAML file builds a fully functional environment for Log2Vec based on Python 3.9. The original release targeted Python 3.6. There will be some deprecation warnings, but I believe they can be safely ignored.

Code: gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml

Code Block
curl https://gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml -o log2vec_conda.yml
conda env create -f log2vec_conda.yml
conda activate log2vec
... # the conda env is stored in the user's home directory
git clone https://github.com/NetManAIOps/Log2Vec
# follow the steps

Wrapper for the Log2Vec libraries for automated log file vectorization

This wrapper makes it possible to use the Log2Vec library for automated log file vectorization, based on the semantic embedding and NLP approach demonstrated in the paper.

Code:

gist.githubusercontent.com/norandom/86a701a56b7de8c800a83eac293da813/raw/a9c7db1d46be633f344b4a07ff05d8985530b162/log2vec_wrapper.sh


Understanding the .vector file versus the .log file

The .vector format is line-based, with up to 32 vector dimensions per line.

Code Block
marius@mleng:~/source/sample_logs$ wc -l syslog.log 
12266 syslog.log
marius@mleng:~/source/sample_logs$ wc -l syslog.vector 
12267 syslog.vector
marius@mleng:~/source/sample_logs$ head -n 1 syslog.vector 
12266 32

A header line is added to the .vector file with the number of lines (samples) and the number of dimensions (32). This is why the .vector file has one more line than the .log file.

The vectors can be consumed by an ML pipeline.
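
As a rough illustration of how a pipeline might read such a file, here is a minimal Python sketch. It relies only on what is shown above: a header line with the sample count and the dimensions, followed by one line per sample. The layout of each data line (whitespace-separated floating-point values) is an assumption, as is the choice to pad rows shorter than the header dimension with zeros. The function name load_vectors is hypothetical and not part of Log2Vec.

```python
from pathlib import Path


def load_vectors(path):
    """Read a .vector file into a list of equally sized float lists.

    Assumptions (not confirmed by the Log2Vec sources):
    - the first line is "<samples> <dims>", as in the sample above
    - every following line holds whitespace-separated floats
    - rows with fewer than <dims> values are padded with 0.0
    """
    lines = Path(path).read_text().splitlines()
    samples, dims = (int(x) for x in lines[0].split())

    vectors = []
    for number, line in enumerate(lines[1:], start=2):
        values = [float(v) for v in line.split()]
        if len(values) > dims:
            raise ValueError(f"line {number}: {len(values)} values, expected at most {dims}")
        # Pad short rows so that every sample has the same length.
        values += [0.0] * (dims - len(values))
        vectors.append(values)

    # The header should match the number of data lines that were read.
    if len(vectors) != samples:
        raise ValueError(f"header announces {samples} samples, found {len(vectors)}")
    return vectors
```

Called as load_vectors("syslog.vector") on the sample above, this should return 12266 rows of 32 values each, provided the file really follows the assumed layout. The row order matches the line order of the .log file, so row i can be mapped back to log line i.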