CodeQL

It’s your checkup, no matter where the code comes from: copy-paste, ChatGPT, Copilot and so on. Some source code is difficult to write. You write it once, then you correct it ten times. CodeQL helps with that process, from both a QA and a security testing perspective.

CodeQL

CodeQL is a code query language and a static code analysis tool. Grepping through source code to find all variables, functions and so on can be difficult, because a plain text search knows nothing about the language. Other tools do this based on the syntax and structure of the programming language instead.
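To make the difference between grep and a syntax-aware search concrete, here is a minimal sketch using Python's standard `ast` module. This is not CodeQL and not how CodeQL is implemented; it only illustrates the idea of asking questions about the structure of the code. The sample source and the `collect_names` helper are made up for this example.

```python
import ast

# Made-up sample source to analyze
SOURCE = '''
def greet(name):
    message = "Hello, " + name
    return message

def main():
    text = greet("world")
    print(text)
'''


def collect_names(source):
    """Collect defined functions, assigned variables and called functions."""
    tree = ast.parse(source)
    found = {"functions": [], "variables": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # A name in "store" context is an assignment target
            found["variables"].append(node.id)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            found["calls"].append(node.func.id)
    return found


result = collect_names(SOURCE)
print(result)
```

A grep for `greet` would also match comments and string literals. The tree walk only reports real definitions and calls.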

CodeQL is one such tool, and it can also be used for Static Application Security Testing (SAST). It focuses on code quality and security, but isn’t limited to one kind of task.

Features

It is based on a database representation of the code structure that you query with a SQL-like syntax, so that you can build idiomatic search expressions.

- CodeQL works from the CLI, Visual Studio Code, IntelliJ products etc.
- You can run it from a Jenkins agent, a GitHub Actions workflow etc.
- You can define your own queries (relatively easy).
- CodeQL cannot perform instrumented analysis. It is purely static and does not observe the running program.
- It works on the actual source syntax, not on bytecode (Veracode, for example, works on JVM bytecode). This limited the ability to check some code, for example Kotlin.
- MS / GitHub is very slow at adding new languages or adopting new backends.
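The CLI and CI points above can be sketched as a small Python wrapper, for example for a Jenkins job. It only assembles the two usual CLI steps (`codeql database create` and `codeql database analyze`) and prints them in dry-run mode. The function names, the database directory and the SARIF file name are assumptions for illustration. The sketch also assumes that the `codeql` binary is on the PATH and that the analyze step can find a default query suite for the language.

```python
import shlex
import subprocess


def build_codeql_commands(source_root, language, db_dir="codeql-db", sarif_file="results.sarif"):
    """Assemble the CLI calls for creating and analyzing a CodeQL database."""
    create = [
        "codeql", "database", "create", db_dir,
        f"--language={language}",
        f"--source-root={source_root}",
    ]
    analyze = [
        "codeql", "database", "analyze", db_dir,
        "--format=sarif-latest",
        f"--output={sarif_file}",
    ]
    return [create, analyze]


def run_codeql(commands, dry_run=True):
    """Print each command; execute it only if dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = build_codeql_commands(".", "python")
run_codeql(commands)  # dry run: prints the commands, executes nothing
```

Compiled languages usually need the build to be observed during `database create`, so a real pipeline may need more options than shown here.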

REST API

The following code shows how to get the results per repository. You can call the function in a loop. It uses Pandas and requests. The API endpoint can be set for both on-premises and SaaS cloud instances.

https://pandas.pydata.org/

https://docs.github.com/en/rest?apiVersion=2022-11-28

The set_score function is explained separately. It generates basic metrics.
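Before calling the function, you need the header dict and the API base URL. The following is a minimal sketch of both, plus the loop over repositories. `build_github_api_settings` and `score_repositories` are hypothetical helper names. The value `"repos"` for `scope` is inferred from the URL layout of the code scanning endpoint. The `/api/v3` path is the usual base for GitHub Enterprise Server, and older server versions may not accept the API version header.

```python
import os


def build_github_api_settings(token, enterprise_host=None):
    """Return (rest_api_endpoint, hed) for github.com or an on-premises host."""
    if enterprise_host:
        rest_api_endpoint = f"https://{enterprise_host}/api/v3"
    else:
        rest_api_endpoint = "https://api.github.com"
    hed = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return rest_api_endpoint, hed


def score_repositories(score_fn, hed, rest_api_endpoint, org, repos):
    """Call score_fn once per repository and collect the scores by repo name."""
    scores = {}
    for repo in repos:
        code_score, _severity_table = score_fn(
            hed=hed,
            scope="repos",  # assumption: matches /repos/{org}/{repo}/code-scanning/alerts
            org=org,
            repo=repo,
            rest_api_endpoint=rest_api_endpoint,
        )
        scores[repo] = code_score
    return scores


# The token is read from the environment; the fallback is a placeholder only
endpoint, hed = build_github_api_settings(os.environ.get("GITHUB_TOKEN", "placeholder-token"))
```

With the function from this post, the call would look like `score_repositories(get_sast_issues_score, hed, endpoint, "my-org", ["repo-a", "repo-b"])`, where the organization and repository names are placeholders.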

import json
import sys

import pandas as pd
import requests

# set_score is defined separately (see the note above)


def get_sast_issues_score(verbose=False, hed=None, scope="", org="", repo="", rest_api_endpoint=""):
    """
    Ask the GitHub code scanning API for the CodeQL findings of a repository and analyze them
    :param verbose: boolean, print intermediate results
    :param hed: dict, HTTP headers including the auth data
    :param scope: string, part of REST API call
    :param org: string, part of REST API call
    :param repo: string, part of REST API call
    :param rest_api_endpoint: REST API URL
    :return: code_score (int), code_severity_list (DataFrame)
    """

    req = "/code-scanning/alerts"
    url = rest_api_endpoint + "/" + scope + "/" + org + "/" + repo + req
    print("CodeQL Repo URL: " + url, file=sys.stdout)

    # Without a timeout, requests can wait forever on an unresponsive server
    response = requests.get(url, headers=hed, timeout=30)

    if response.status_code in (403, 404):
        # Code scanning is not enabled for the repo, or CodeQL does not support its language
        status = {"Status": "CodeQL disabled or unsupported language"}
        status_df = pd.DataFrame([status])
        print("CodeQL Status: " + "disabled or not supported for repo", file=sys.stdout)
        return 100, status_df

    # Any other error status (401, 5xx, ...) does not carry a list of alerts
    response.raise_for_status()
    parsed = response.json()

    if verbose:
        print(json.dumps(parsed, indent=4, sort_keys=True))

    df = pd.DataFrame(parsed)

    if len(df) == 0 and response.status_code == 200:
        status = {"Status": "No findings"}
        status_df = pd.DataFrame([status])
        print("CodeQL Status: " + "no findings for repo", file=sys.stdout)
        return 0, status_df

    if len(df) > 0 and response.status_code == 200:
        print("CodeQL Status: " + "processing findings for repo", file=sys.stdout)

    # Flatten the nested "rule" dict into top-level columns (name, severity, ...)
    df = pd.concat([df.drop(["rule"], axis=1), df["rule"].apply(pd.Series)], axis=1)
    df["state"] = df["state"].astype(str).str.lower()

    if verbose:
        print(df[["number", "name", "state", "severity", "security_severity_level"]])

    # filter out anything that's not "open"
    df_open = df.loc[df['state'] == 'open']
    code_severity_list = df_open["security_severity_level"].value_counts()

    if verbose:
        print("Code Severity List (open)")
        print(code_severity_list)
        print("Code Severity Score (open)")
    code_score = set_score(severity_df=code_severity_list)

    # for better table style
    code_severity_list = code_severity_list.reset_index()
    code_severity_list.columns = ['Risk', 'Code Findings Reported']

    return code_score, code_severity_list
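One caveat: the function above makes a single request, and the GitHub REST API returns list results in pages (30 items by default, at most 100 per page). For repositories with many alerts, the pages have to be followed. The following is a minimal sketch, assuming the HTTP call is passed in as `get` (for example `requests.get`) and that the response exposes the parsed `Link` header as `response.links`, as requests does. `fetch_all_pages` is a hypothetical helper name.

```python
def fetch_all_pages(get, url, headers=None, per_page=100):
    """Follow the "next" links of a paginated list endpoint and return all items."""
    items = []
    params = {"per_page": per_page}
    while url:
        response = get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        items.extend(response.json())
        # requests parses the Link header; there is no "next" entry on the last page
        url = response.links.get("next", {}).get("url")
        # The "next" URL already contains the query string
        params = None
    return items
```

In the function above, `parsed = fetch_all_pages(requests.get, url, headers=hed)` could then replace the single request, after the 403/404 check has been done on a first call.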

Use these results in an SSDLC

Secure Software Development Lifecycle

Here is how basic metrics can be generated:

AppSec Metrics with GH Advanced Security
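The real `set_score` is not shown in this post. To run the function above on its own, a stand-in is needed. The following is a hypothetical sketch, not the actual implementation: the weights are invented, and the cap of 100 is only inferred from the fact that a repository without CodeQL gets a score of 100. It accepts anything with an `.items()` method, such as the `value_counts()` Series from above or a plain dict.

```python
# Hypothetical weights per severity level; adjust them to your own risk model
HYPOTHETICAL_WEIGHTS = {"critical": 40, "high": 20, "medium": 5, "low": 1}


def set_score(severity_df, weights=HYPOTHETICAL_WEIGHTS, cap=100):
    """Turn counts per severity level into one score (higher means worse)."""
    total = 0
    for level, count in severity_df.items():
        # Unknown levels contribute nothing to the score
        total += weights.get(str(level).lower(), 0) * int(count)
    return min(total, cap)


example_score = set_score({"critical": 1, "high": 2, "low": 3})
print(example_score)
```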