LLMappCrazy

Supplementary Figures (for Paper)

The heatmaps below reveal that plagiarism clusters most heavily among FlowGPT, Poe, and GPT Store, indicating these platforms are particularly prone to cloning and squatting.

Project Overview

This repository contains LLMappCrazy, a tool for detecting impersonation attacks in Large Language Model (LLM) app stores, with a focus on app squatting and app cloning. LLMappCrazy covers 14 squatting generation techniques for detecting squatting apps, and integrates Levenshtein distance and BERT-based semantic analysis to identify cloned apps through functional similarity analysis.

Research Questions (RQs)

This project addresses the following research questions:

RQ1: To what extent are squatting apps present? Do they primarily target popular apps?
RQ2: How widespread are cloning apps in LLM app stores, as another form of impersonation?
RQ3: How many cases of potential cross-platform plagiarism exist, and how does the situation differ across stores?
RQ4: How many impersonation (squatting and cloning) apps exhibit malicious behavior?
RQ5: What is the impact of these impersonation apps on users and the LLM app ecosystem?

Features

Squatting Name Variants: Generates multiple variants of app names using techniques such as symbol expansion, character substitution, prefix/suffix addition, and emoji extension.
Levenshtein Distance Matching: Detects minor textual differences between app names.
BERT-Based Semantic Analysis: Identifies semantically similar apps, enabling detection of more nuanced cloning cases.
Cross-Platform Analysis: Identifies potential cross-platform plagiarism by analyzing similarities across multiple LLM app platforms.
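The first two features can be illustrated with a minimal sketch. This is not the code shipped in LLMappCrazy/: the function names, the look-alike table, the prefix/suffix lists, the reading of "symbol expansion" as separator replacement, and the distance threshold are all assumptions chosen for illustration. It covers only a handful of variant types, not the full set of 14 techniques.

```python
from typing import Iterable, List, Set


def generate_variants(name: str) -> Set[str]:
    """Generate a few illustrative squatting variants of an app name."""
    variants: Set[str] = set()

    # Character substitution: swap one character for a look-alike
    # (hypothetical table, not taken from the tool).
    lookalikes = {"o": "0", "l": "1", "i": "1", "e": "3", "a": "@", "s": "$"}
    for idx, ch in enumerate(name):
        sub = lookalikes.get(ch.lower())
        if sub:
            variants.add(name[:idx] + sub + name[idx + 1:])

    # Prefix/suffix addition (hypothetical word lists).
    for prefix in ("Pro ", "Official ", "The "):
        variants.add(prefix + name)
    for suffix in (" Pro", " Plus", " AI"):
        variants.add(name + suffix)

    # Symbol expansion, read here as replacing spaces with separators.
    if " " in name:
        for sep in ("-", "_", "."):
            variants.add(name.replace(" ", sep))

    # Emoji extension (hypothetical emoji list).
    for emoji in ("🔥", "✨"):
        variants.add(f"{name} {emoji}")

    variants.discard(name)
    return variants


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_squatting_candidates(target: str,
                              store_names: Iterable[str],
                              max_distance: int = 2) -> List[str]:
    """Return names that match a generated variant or sit within
    max_distance edits of the target (the threshold is an assumed default)."""
    target_key = target.casefold()
    variants = {v.casefold() for v in generate_variants(target)}
    hits = []
    for name in store_names:
        key = name.casefold()
        if key == target_key:
            continue  # the target itself is not a squatter
        if key in variants or levenshtein(key, target_key) <= max_distance:
            hits.append(name)
    return hits
```

For example, `find_squatting_candidates("ChatPDF", ["ChatPDF Pro", "Chat PDF", "Image Maker"])` flags the first two names and skips the third. A fixed edit-distance threshold is a blunt instrument for short names, so a real pipeline would likely scale it with name length.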

Repository Structure

dataset/: Contains the dataset used in this study, with application information gathered from major LLM app stores.
squatting/: Stores data on detected squatting apps, including generated name variants and their matches.
- merged_squatting_all.json: Combined results of the squatting experiments.
- top_1000_squatting_app.csv: Squatting app counts for the top 1,000 apps.
cloning/: Contains data on detected cloning apps, focusing on apps with high functional and semantic similarity.
- Exact-match-for-identical-content: Results of exact matching on identical content.
- Similarity-detection: Results of similarity detection.
- merged_cloning_all.json: Combined results of the cloning experiments.
LLMappCrazy/: The core tool developed for detecting squatting, featuring modules for name generation.
README.md: Project overview, research context, and usage instructions.

Data Collection

Data for this project was gathered from six major LLM app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Key fields collected for each application include application ID, author, description, and instructions, forming a structured dataset for analysis.
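As a rough sketch of how such records might be represented and checked for identical content, consider the following. The `AppRecord` class, its field names, the added `store` field, and the normalization rules are assumptions made for illustration. They do not describe the actual schema of the files in dataset/ or the matching procedure used in the study.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AppRecord:
    store: str         # assumed extra field: the store the app was collected from
    app_id: str        # application ID
    author: str
    description: str
    instructions: str


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial edits do not hide a copy."""
    return " ".join(text.split()).casefold()


def content_fingerprint(record: AppRecord) -> str:
    """Hash the normalized description and instructions.
    A unit separator keeps the two fields from blending into each other."""
    payload = _normalize(record.description) + "\x1f" + _normalize(record.instructions)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def group_identical(records: Iterable[AppRecord]) -> List[List[AppRecord]]:
    """Group records whose normalized content is identical."""
    buckets: Dict[str, List[AppRecord]] = defaultdict(list)
    for record in records:
        # Skip records with no content at all; they would all collide.
        if not (record.description.strip() or record.instructions.strip()):
            continue
        buckets[content_fingerprint(record)].append(record)
    return [group for group in buckets.values() if len(group) > 1]


def is_cross_platform(group: List[AppRecord]) -> bool:
    """True when identical content appears in more than one store."""
    return len({record.store for record in group}) > 1
```

Hashing only catches verbatim copies after normalization. Paraphrased clones would need the similarity-based analysis described under Features.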

Key Findings

Distribution of Squatting Apps: We found 5,187 squatting apps targeting the top 1,000 apps.
Prevalence of Cloning Apps: Among the 13,325 detected cloned apps, there were significant instances of cross-platform plagiarism.
Impact on Users and Ecosystem: The large presence of squatting and cloning apps negatively affects user experience and platform trust, presenting potential security risks.

Contributing

Contributions to improve detection methods, expand datasets, or provide feedback are welcome. Please submit a pull request or reach out to the repository maintainers.

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You may use it freely for academic research and non-commercial purposes with proper attribution.
