High-Level Project Summary
We use Artificial Intelligence models to build a tool that summarizes documents through Natural Language Processing (NLP) and Deep Learning, starting from reading the data and extracting information from it according to the user's requirements. The software solves this problem through the following tools and features:
- Natural Language Processing (NLP) to automatically read NTRS documents, which enables us to read the data from the API.
- Summarization, where the NLP pipeline removes stop words, collects the remaining words, and condenses them into a summary related to the data entered by the user.
- Text analytic data generation, where NLP analyzes the data in the database to obtain specific information from it.
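As an illustration of the stop-word removal and keyword-collection step, here is a minimal sketch using the NLTK library from our tech stack; the function name, parameters, and sample sentence are our own placeholders, not the exact code used in the app.

```python
# A minimal sketch of the preprocessing step described above, assuming plain
# English text; names and parameters here are illustrative only.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def extract_keywords(text, top_n=10):
    """Remove stop words and return the most frequent remaining words."""
    words = [w.lower() for w in word_tokenize(text) if w.isalpha()]
    stop_words = set(stopwords.words("english"))
    content_words = [w for w in words if w not in stop_words]
    return [word for word, _ in Counter(content_words).most_common(top_n)]

print(extract_keywords("NASA technical reports describe spacecraft design and testing."))
```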
Link to Final Project
Link to Project "Demo"
Detailed Project Description
Software that improves the accessibility and discoverability of records in the NTRS by automatically reading NTRS documents, summarizing them, generating text analytic data, and producing a list of topic keywords to help researchers find the documents they need.
The software allows users to classify, sort, and arrange information; examine relationships in the data; and combine analysis with linking, shaping, searching, and modeling.
The researcher or analyst can identify trends and cross-examine information in a multitude of ways using its search engine and query functions. They can make notes in the software and build a body of evidence to support their case or project.
The software's powerful query tools let you uncover subtle trends, and its automated analysis features let you drill down into your data. You can also collect data on multiple mobile devices.
It is a mobile application that is easy to use. The user adds a document to the app and receives a summary of that document, followed by the most important keywords, which help them decide whether this is the document they are looking for and want to read, rather than wasting time searching through every document in the NTRS.
- Accuracy
The user enters a brief into the tool. Based on this brief, the tool searches the PDFs and outputs the paragraphs related to the brief, assigning a relevance rate to each paragraph and, at the end, an overall evaluation of the search the user performed. This approach saves time while keeping quality and accuracy high.
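One way to picture the per-paragraph rating is a TF-IDF / cosine-similarity scorer built with scikit-learn (which is in our library list); this is only an assumed scoring scheme for illustration, not necessarily the exact method the tool uses.

```python
# A hedged sketch of rating paragraphs against the user's brief, assuming a
# simple TF-IDF / cosine-similarity scorer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rate_paragraphs(brief, paragraphs):
    """Return (paragraph, score) pairs sorted by relevance to the brief."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([brief] + paragraphs)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
    return sorted(zip(paragraphs, scores), key=lambda p: p[1], reverse=True)

ranked = rate_paragraphs(
    "heat shield materials for re-entry",
    ["Ablative materials protect the capsule during re-entry.",
     "The budget review covered staffing for the next quarter."],
)
for paragraph, score in ranked:
    print(f"{score:.2f}  {paragraph}")
```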
- Recommendations
Based on the user's input, the tool makes recommendations: it produces outputs related to the topic the user searched for, along with the user's own notes, and saves them in a draft.
- Notes Reminder
The user can save notes in a draft, and after a certain amount of time has passed, an alert reminds them of the notes saved in that draft.
Our goal is to help researchers save time when they search for a specific topic. By using the "TexArt App" they can do this easily.
Our future plan is to increase the accuracy of our NLP model and add features that are not yet in the app, such as searching the NTRS by the keywords we extract and finding all related documents that share the meaning of a keyword. We also want to add a text analytics feature to learn more about the data.
We used NLP and deep learning techniques (RNNs); specifically, a Seq2Seq model with bidirectional LSTMs (a sketch of this architecture follows the framework list below).
Libraries: Regular Expressions (re), pickle, NLTK, scikit-learn.
Programming languages: Python and Dart.
Frameworks: Flutter, TensorFlow, Keras.
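Below is a minimal Keras sketch of the kind of Seq2Seq architecture mentioned above (a bidirectional LSTM encoder feeding an LSTM decoder). The vocabulary size, embedding dimension, and latent dimension are placeholder assumptions rather than the values used in our final model.

```python
# A minimal sketch of a Seq2Seq summarization model with a bidirectional LSTM
# encoder in Keras; hyperparameters below are placeholder assumptions.
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Dense, Concatenate
from tensorflow.keras.models import Model

vocab_size, embed_dim, latent_dim = 20000, 128, 256  # assumed values

# Encoder: embeds the source document and encodes it with a bidirectional LSTM.
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(vocab_size, embed_dim)(encoder_inputs)
_, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(LSTM(latent_dim, return_state=True))(enc_emb)
state_h = Concatenate()([fwd_h, bwd_h])
state_c = Concatenate()([fwd_c, bwd_c])

# Decoder: generates the summary token by token, initialised with the
# concatenated encoder states.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embed_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(latent_dim * 2, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c]
)
outputs = Dense(vocab_size, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```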
Space Agency Data
We used the NTRS to learn about NASA data. We scraped the NTRS to collect data from the server, but we found that there was too little data to train our NLP model. So we used another dataset from Kaggle to train the model, and we used the NTRS data to test it and to produce summaries.
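For reference, a hedged sketch of pulling report metadata from the NTRS API with the requests library is shown below; the endpoint path and response fields are assumptions based on the OpenAPI documentation linked in the references, so they should be checked before reuse.

```python
# A hedged sketch of querying the NTRS API; endpoint and fields are assumed
# from the OpenAPI docs, not verified against the live service.
import requests

NTRS_SEARCH_URL = "https://ntrs.nasa.gov/api/citations/search"  # assumed endpoint

def fetch_abstracts(query):
    """Search NTRS for a query string and collect citation abstracts."""
    response = requests.get(NTRS_SEARCH_URL, params={"q": query}, timeout=30)
    response.raise_for_status()
    results = response.json().get("results", [])
    return [item.get("abstract", "") for item in results]

abstracts = fetch_abstracts("mars entry descent landing")
print(f"Fetched {len(abstracts)} abstracts")
```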
Hackathon Journey
This was our first experience with this kind of challenge, and the NASA experience was one of the strongest and most important experiences for us. We learned that we can do a lot in a short time by organizing the work among team members, and that working under pressure requires the cooperation of the whole team to reach the best results.
What inspired us to choose this challenge was the range of capabilities on our team, which includes artificial intelligence engineers, Flutter developers, a UI/UX designer, and a good presenter; we also have a solid background in business.
Our approach to developing the project was the Agile methodology, which we found to be the best way to use our time and organize the work to get good results quickly.
One of the problems we faced was the difficulty of obtaining data from the server. We tried more than one approach and settled on web scraping, but the scraped data was still not enough to train the model, so we trained on external data and then applied the model to the data we had collected. We overcame other problems by discussing and voting on opinions, presenting more than one solution from different points of view.
We would like to thank our team members for the effort they gave during 24 hours of continuous work on the challenge.
References
NTRS: https://ntrs.nasa.gov/
NASA API: https://ntrs.nasa.gov/api/openapi/
Kaggle dataset: https://www.kaggle.com/code/akashsdas/abstractive-text-summarization/data
Tags
#AI #NLP #DL

