---
license: mit
---

# PyAutoCode: GPT-2-Based Python Auto-Code

PyAutoCode is a cut-down Python auto-suggestion model built on **GPT-2** *(inspired by GPyT)*. This baby model *(trained for only 3 epochs)* has **not been fine-tuned** yet, so I strongly recommend against using it in a production environment or incorporating PyAutoCode into any of your projects. It was trained on **112 GB** of Python data sourced from the best crowdsourcing platform ever -- **GitHub**.

*NOTE: Further training and fine-tuning would be highly appreciated, and I firmly believe it would significantly improve PyAutoCode's ability.*

## Some Model Features

- Built on *GPT-2*
- Tokenized with *ByteLevelBPETokenizer*
- Data sourced from *GitHub (almost 5 consecutive days of the latest Python repositories)*
- Uses *GPT2LMHeadModel* and *DataCollatorForLanguageModeling* for training
- Newline characters are custom-encoded as `<N>`
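
The `<N>` convention in the last point is a plain textual substitution applied before tokenizing and reversed after decoding; a minimal sketch of the round trip (no model required):

```python
# Encode real newlines as the <N> token the model was trained on,
# then decode them back after generation.
snippet = "def add(a, b):\n    return a + b"

encoded = snippet.replace("\n", "<N>")
decoded = encoded.replace("<N>", "\n")

print(encoded)  # def add(a, b):<N>    return a + b
assert decoded == snippet
```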

## Get a Glimpse of the Model

You can use the Hugging Face **Inference API** *(present on the right sidebar)* to load the model and check its output. Just enter any code snippet as input, something like:

```sh
for i in range(
```

## Usage

You can use my model too! Here's a quick tour of how to do it:

Install transformers:
```sh
$ pip install transformers
```

Call the API and get it to work!
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("P0intMaN/PyAutoCode")
model = AutoModelForCausalLM.from_pretrained("P0intMaN/PyAutoCode")

# input: single-line or multi-line. Doc-strings are highly recommended.
inp = """import pandas"""

# encode real newlines as <N>, tokenize, and generate
format_inp = inp.replace('\n', "<N>")
tokenize_inp = tokenizer.encode(format_inp, return_tensors='pt')
result = model.generate(tokenize_inp)

# decode the generated tokens and restore real newlines
decode_result = tokenizer.decode(result[0])
format_result = decode_result.replace('<N>', "\n")

# printing the result
print(format_result)
```
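
The format/generate/decode steps above can be wrapped into a small helper *(a sketch; the name `suggest` is hypothetical, and `tokenizer`/`model` are whatever you loaded above)*:

```python
def suggest(code, tokenizer, model):
    """Format input with <N>, generate a continuation, and restore newlines."""
    ids = tokenizer.encode(code.replace("\n", "<N>"), return_tensors="pt")
    out = model.generate(ids)
    return tokenizer.decode(out[0]).replace("<N>", "\n")
```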

Upon successful execution, the above should produce something like this *(your results may vary once the model is fine-tuned)*:
```sh
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

## Credits

##### *Developed as part of a university project by [Pratheek U](https://www.github.com/P0intMaN) and [Sourav Singh](https://github.com/Sourav11902312lpu)*