---
license: mit
---

# PyAutoCode: GPT-2-Based Python Auto-Code

PyAutoCode is a cut-down Python auto-suggestion model built on **GPT-2** *(inspired by GPyT)*. This baby model *(trained for only 3 epochs)* has **not been fine-tuned** yet, so I strongly recommend against using it in a production environment or incorporating PyAutoCode into any of your projects. It was trained on **112 GB** of Python data sourced from the best crowdsourcing platform ever -- **GitHub**.

*NOTE: Further training and fine-tuning would be highly appreciated, and I firmly believe it would significantly improve PyAutoCode's ability.*

## Some Model Features

- Built on *GPT-2*
- Tokenized with *ByteLevelBPETokenizer*
- Data sourced from *GitHub (almost 5 consecutive days of the latest Python repositories)*
- Uses *GPT2LMHeadModel* and *DataCollatorForLanguageModeling* for training
- Newline characters are custom-encoded as `<N>`
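
The `<N>` convention in the last point is a plain textual substitution applied before tokenizing and reversed after decoding; a minimal sketch of the round trip (no model required):

```python
# Encode real newlines as the <N> token the model was trained on,
# then decode them back after generation.
snippet = "def add(a, b):\n    return a + b"

encoded = snippet.replace("\n", "<N>")
decoded = encoded.replace("<N>", "\n")

print(encoded)  # def add(a, b):<N>    return a + b
assert decoded == snippet
```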

## Get a Glimpse of the Model

You can use the Hugging Face **Inference API** *(present on the right sidebar)* to load the model and check its output. Just enter any code snippet as input, something like:

```sh
for i in range(
```

## Usage

You can use my model too! Here's a quick tour of how to do it:

Install transformers:
```sh
$ pip install transformers
```

Call the API and get it to work!
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("P0intMaN/PyAutoCode")
model = AutoModelForCausalLM.from_pretrained("P0intMaN/PyAutoCode")

# input: single-line or multi-line. Doc-strings are highly recommended.
inp = """import pandas"""

# encode real newlines as <N>, tokenize, and generate
format_inp = inp.replace('\n', "<N>")
tokenize_inp = tokenizer.encode(format_inp, return_tensors='pt')
result = model.generate(tokenize_inp)

# decode the generated tokens and restore real newlines
decode_result = tokenizer.decode(result[0])
format_result = decode_result.replace('<N>', "\n")

# printing the result
print(format_result)
```
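
The format/generate/decode steps above can be wrapped into a small helper *(a sketch; the name `suggest` is hypothetical, and `tokenizer`/`model` are whatever you loaded above)*:

```python
def suggest(code, tokenizer, model):
    """Format input with <N>, generate a continuation, and restore newlines."""
    ids = tokenizer.encode(code.replace("\n", "<N>"), return_tensors="pt")
    out = model.generate(ids)
    return tokenizer.decode(out[0]).replace("<N>", "\n")
```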

Upon successful execution, the above should produce something like this *(your results may vary once the model is fine-tuned)*:
```sh
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

## Credits

##### *Developed as part of a university project by [Pratheek U](https://www.github.com/P0intMaN) and [Sourav Singh](https://github.com/Sourav11902312lpu)*