...
Very interesting. This appears to be a multi-stage and very advanced vectorization technique.
conda env as YAML (Python 3.9, 16.1.2024)
This YAML file builds a fully functional Log2Vec environment on Python 3.9. The original release targeted Python 3.6. There will be some deprecation warnings, but I believe they can be safely ignored.
Code Block |
---|
curl https://gist.githubusercontent.com/norandom/a1fd048d7d870a90aa72c9c45fd44e02/raw/f8c6ad9c5470b5380d4bcea8eaa237dd64217f9d/conda_env_log2vec.yml -o conda_env_log2vec.yml
conda env create -f conda_env_log2vec.yml
conda activate log2vec
... # the conda env is stored in the user's home directory
git clone https://github.com/NetManAIOps/Log2Vec
# follow the steps |
Wrapper for the Log2Vec libraries for automated log file vectorization
The wrapper uses the Log2Vec library to vectorize log files automatically, based on the semantic embedding and NLP approach demonstrated in the paper.
Code:
Understanding the .vector versus the .log
The .vector format is line-based: each log line becomes one line with up to 32 vector dimensions.
Code Block |
---|
marius@mleng:~/source/sample_logs$ wc -l syslog.log
12266 syslog.log
marius@mleng:~/source/sample_logs$ wc -l syslog.vector
12267 syslog.vector
marius@mleng:~/source/sample_logs$ head -n 1 syslog.vector
12266 32
|
A header line is added with the number of lines (samples, here 12266) and the number of dimensions (32). This is why the .vector file has one more line than the .log file.
The vectors can be consumed by an ML pipeline.
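As a minimal sketch of the consuming side, the snippet below parses the header and checks it against the number of vector lines. The function name `read_vector_lines` is hypothetical, and the row layout is an assumption: the listing above only shows the header, so the code assumes every following line holds whitespace-separated floats.

```python
def read_vector_lines(lines):
    """Parse a header line "<samples> <dims>" followed by one vector per line.

    Hypothetical helper, not part of Log2Vec.
    """
    it = iter(lines)
    header = next(it).split()
    n_samples, n_dims = int(header[0]), int(header[1])

    rows = []
    for line in it:
        # Assumption: each vector line is a run of whitespace-separated floats.
        values = [float(tok) for tok in line.split()]
        # The header gives the maximum width; shorter rows are accepted as-is.
        if len(values) > n_dims:
            raise ValueError(f"row has {len(values)} values, header says {n_dims}")
        rows.append(values)

    # The header must agree with the number of vector lines that follow it.
    if len(rows) != n_samples:
        raise ValueError(f"header says {n_samples} samples, found {len(rows)}")
    return n_samples, n_dims, rows


# Tiny in-memory example with 2 samples and 3 dimensions.
sample = "2 3\n0.1 0.2 0.3\n1 2 3\n"
n_samples, n_dims, rows = read_vector_lines(sample.splitlines())
print(n_samples, n_dims, len(rows))
```

For a real file, an open file handle can be passed directly, for example `read_vector_lines(open("syslog.vector"))`. If the rows turn out to carry a leading token or a different separator, the parsing line needs to be adjusted accordingly.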