AI & ML interests

Network-based Credibility Modelling



CrediNet


CrediNet is a set of tools that applies graph machine learning and computational methods to credibility modelling on the web. We build billion-scale webgraphs and use them to assess the credibility of websites; these scores can be used downstream to improve the robustness of Retrieval-Augmented Generation and to support fact-checking. This involves large-scale web scraping and text processing, as well as developing model architectures that interpret the different types of signals found on the web, including structural, temporal, and linguistic cues.
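To make the idea of a structural signal concrete, here is a minimal, purely illustrative sketch: a toy PageRank computation over a tiny hypothetical domain graph. The graph, domain names, and damping factor are made up for illustration; CrediNet's actual models combine many more signals (temporal, linguistic) at billion-node scale.

```python
# Illustrative only: a toy structural signal (PageRank) on a tiny domain graph.
# The hyperlink graph below is invented; it is NOT CrediNet's method or data.

def pagerank(graph, damping=0.85, iters=50):
    """Compute PageRank scores for a {node: [outlinks]} adjacency dict."""
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for node, outlinks in graph.items():
            if outlinks:
                share = damping * scores[node] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:  # dangling node: redistribute its mass uniformly
                for target in nodes:
                    new[target] += damping * scores[node] / n
        scores = new
    return scores

# Hypothetical mini webgraph: edges are hyperlinks between domains.
web = {
    "trusted-news.example": ["encyclopedia.example"],
    "encyclopedia.example": ["trusted-news.example"],
    "spam-farm.example": ["trusted-news.example"],
}
ranks = pagerank(web)
# Well-linked domains accumulate more authority than the unreferenced spam farm.
assert ranks["trusted-news.example"] > ranks["spam-farm.example"]
```

Structural cues like this capture how the web "votes" for a domain via links; temporal and linguistic cues are then layered on top by the learned models.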

See also: our GitHub codebases

License: CC BY 4.0


Projects

  • CrediBench: a benchmark of billion-scale temporal webgraphs at monthly granularity, sourced from Common Crawl. For the corresponding graph-construction pipeline, see CrediGraph - GitHub.
  • CrediPred: credibility scores inferred by our model (for details on the model architecture, see CrediPred - GitHub).
  • DomainRel: a dataset of 600k+ domains labelled as reliable or not (0-1), spanning four domains: phishing, malware, misinformation, and general knowledge.
  • CrediText: text embeddings extracted from scraped web content. The corresponding scraping and embedding pipelines are on CrediText - GitHub.
  • CrediNet: an API set up to query CrediPred scores easily on the client side (for details on the API setup and example usages, see CrediNet - GitHub).
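As a rough sketch of what a client-side score query could look like: the endpoint URL, parameter names, and response schema below are assumptions for illustration, not the documented API (refer to the CrediNet GitHub repository for the real setup).

```python
# Hypothetical sketch of a client-side CrediNet query. The endpoint URL,
# query parameter, and JSON schema are placeholders, not the actual API.
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.credinet.example/v1/score"  # placeholder endpoint

def build_score_request(domain: str) -> Request:
    """Build a GET request asking for the credibility score of one domain."""
    return Request(f"{BASE_URL}?{urlencode({'domain': domain})}")

def parse_score_response(body: str) -> float:
    """Extract the credibility score from a (hypothetical) JSON response."""
    return float(json.loads(body)["score"])

req = build_score_request("example.org")
# A real call would be: urllib.request.urlopen(req).read().decode()
mocked_body = '{"domain": "example.org", "score": 0.87}'
print(parse_score_response(mocked_body))  # → 0.87
```

Separating request construction from response parsing keeps the sketch testable without network access; a real client would wrap the two around an actual HTTP call.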
